0dd1ac40a0
[New commit on 18 Aug 2014: rebased, and DOM indexing only]

Issues to fix:

Most 6XX fields may contain a $2 that identifies the system used for indexing. It should not be indexed. In French libraries, $2 contains "rameau", so searching for books about the composer "Rameau" retrieves thousands of records!

For some 6XX fields, other subfields should not be indexed, for example dates of persons and families, or addresses.

In the UNIMARC guide, 600$t, 601$t and 602$t are said to exist but to be "not used". I keep them indexed.

Additionally, subject indexing could be improved by using specific indexes for each 6XX field where possible. In ccl.properties:
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs: Subject-name-personal, alias su-na

We can use these indexes and create new specific indexes by using existing bib1 attributes. We could also index the $j, $x, $y, $z subdivisions in specific indexes.

This patch makes the following changes:

1) For all 6XX: do not index $2 (LCSH, Rameau...), $3 and $5

2) Stop indexing some specific subfields, depending on the field:
   600: Personal name used as a subject // see Marc21 600
        not indexing c (additional elements), f (dates), p (address/affiliation)
   602: Family name used as a subject // see Marc21 600 3X
        not indexing f (dates)
   616: Trademark
        not indexing c, f

3) For all 6XX: index $j, $x, $y, $z in several indexes in addition to the specific index for their 6XX field

4) Define some specific indexes in ccl.properties:
   Subject-name-conference 1=1073 => alias su-conf
   Subject-name-corporate 1=1074 => alias su-corp
   Subject-genre-form 1=1075 => alias su-genre and su-form
   Subject-geographical 1=1076 => alias su-geo
   Subject-chronological 1=1077 => alias su-chrono
   Subject-title 1=1078 => alias su-ut and su-ti
   Subject-topical 1=1079 => alias su-to

5) Add new aliases in Search.pm: su-chrono, su-form, su-genre, su-corp, su-conf, su-ti

6) Use these new indexes:
   600: Subject and Subject-Personal-Name; all subfields except subdivisions in Personal-name
   601: Subject, Subject-name-conference, Subject-name-corporate and Subject-name-conf; all subfields except subdivisions in Corporate-name and Conference-name
   602: same as 600, but could be improved later
   604: Subject and Subject-title; $a in Subject-Personal-Name; all subfields except subdivisions in Name-and-Title
   605: Subject and Subject-title
   606: Subject and Subject-topical
   607: Subject and Subject-geographical; all subfields except subdivisions in Name-geographic
   608: Subject and Subject-genre-form

To test:

A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
   - the string "test9828" in 600$c, 600$f, 600$p, 602$f, 616$c, 616$f, 606$2, 600$2
   - the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) Search for "su:test9828". You should get no results
7) Search for "su-genre:subform". You should get 1 result: record A
8) Search for "su-geo:subgeo". You should get 1 result: record B
9) Search for "su-chrono:subdate". You should get 1 result: record C
10) On existing records, try the su-ut, su-to, su-na, su-form, su-corp and su-geo indexes, and check that the results are relevant

Indexing of subjects could maybe be improved later.

Signed-off-by: Nick Clemens <nick@quecheelibrary.org>
All seems to work as expected. I am not super-familiar with UNIMARC, but I wonder if in su-corp and su-conf the subdivisions might be useful (e.g. France-Gendarmerie / Staatsbibliothek-Berlin)

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
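
The exclusion rules described in the commit message can be modelled in a few lines of Python. This is only an illustrative sketch, not the patch itself: the real rules live in the Zebra indexing configuration, and the names `GLOBAL_SKIP`, `FIELD_SKIP`, `SUBDIVISION_INDEX` and `is_indexed` are hypothetical. The subdivision mapping lists only the three cases confirmed by the test plan ($j, $y, $z); where $x goes is not spelled out in the message, so it is left out.

```python
# Hypothetical model of the 6XX subfield rules from the commit message.
# The actual patch changes Zebra indexing configuration, not Python code.

# Subfields that are never indexed for any 6XX field.
GLOBAL_SKIP = frozenset({"2", "3", "5"})

# Extra subfields that are skipped for specific fields.
FIELD_SKIP = {
    "600": frozenset({"c", "f", "p"}),  # additional elements, dates, address
    "602": frozenset({"f"}),            # dates
    "616": frozenset({"c", "f"}),
}

# Subdivision subfields and the alias they should be searchable under,
# limited to the cases exercised by the test plan.
SUBDIVISION_INDEX = {
    "j": "su-genre",
    "y": "su-geo",
    "z": "su-chrono",
}


def is_indexed(tag: str, code: str) -> bool:
    """Return True if subfield `code` of 6XX field `tag` is indexed as a subject."""
    if len(tag) != 3 or not tag.startswith("6"):
        raise ValueError(f"only 6XX fields are modelled here, got {tag!r}")
    if code in GLOBAL_SKIP:
        return False
    return code not in FIELD_SKIP.get(tag, frozenset())


if __name__ == "__main__":
    # The "test9828" placements from the test plan must all be skipped.
    for tag, code in [("600", "c"), ("600", "f"), ("600", "p"), ("602", "f"),
                      ("616", "c"), ("616", "f"), ("606", "2"), ("600", "2")]:
        print(tag, code, is_indexed(tag, code))
```

Every pair printed by the loop corresponds to a place where the test plan puts "test9828", which is why step 6 expects "su:test9828" to return nothing.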
||
---|---|---|
.. | ||
marc21 | ||
normarc/biblios | ||
unimarc |