Koha/C4
Mathieu Saby 0dd1ac40a0 Bug 9828: More specific indexing of UNIMARC 6XX fields
[New commit on 18 Aug 2014 : rebased, and DOM indexing only]

Issues to fix :
Most 6XX fields may contain a $2 that identifies the system used for indexing. It should not be indexed.
In French libraries, $2 contains "rameau", so a search for books about the composer Rameau retrieves thousands of records!
For some 6XX fields, other subfields should not be indexed either, for example dates of persons and families, or addresses.
In the UNIMARC guide, 600$t, 601$t and 602$t are said to exist but to be "not used". I keep them indexed.

Additionally, subject indexing could be improved by using a specific index for each 6XX field where possible:
In ccl.properties :
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs : Subject-name-personal, alias su-na
We can use these indexes and create new specific indexes by using existing bib1 attributes.
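The alias mechanism can be illustrated with a minimal sketch of how aliases such as su-to would resolve to Bib-1 Use attributes once the specific indexes are defined. It assumes that each non-comment line of a ccl.properties-style file takes one of two forms: `Qualifier 1=NNNN`, which maps a qualifier to a single Use attribute, or `alias Qualifier`. The Use attribute numbers are the ones listed in step 4 below. The parser is hypothetical and does not handle multi-attribute lines.

```python
"""Sketch: resolve CCL aliases to Bib-1 Use attributes.

Assumption (hypothetical format): a line is either "Qualifier 1=NNNN"
or "alias Qualifier"; lines starting with '#' are comments.
"""

CCL_TEXT = """
# Specific subject indexes (Use attributes as listed in this patch)
Subject-name-conference 1=1073
Subject-name-corporate  1=1074
Subject-genre-form      1=1075
Subject-geographical    1=1076
Subject-chronological   1=1077
Subject-title           1=1078
Subject-topical         1=1079
su-conf   Subject-name-conference
su-corp   Subject-name-corporate
su-genre  Subject-genre-form
su-form   Subject-genre-form
su-geo    Subject-geographical
su-chrono Subject-chronological
su-ut     Subject-title
su-ti     Subject-title
su-to     Subject-topical
"""


def parse_ccl(text):
    """Split lines into qualifier->Use attribute and alias->target maps."""
    qualifiers, aliases = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split(None, 1)
        value = value.strip()
        if value.startswith("1="):
            qualifiers[name] = int(value[2:])  # Use attribute number
        else:
            aliases[name] = value  # alias pointing at another qualifier
    return qualifiers, aliases


def resolve(name, qualifiers, aliases):
    """Follow alias links until a qualifier with a Use attribute is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name, qualifiers[name]


if __name__ == "__main__":
    quals, als = parse_ccl(CCL_TEXT)
    for alias in ("su-to", "su-geo", "su-genre", "su-form"):
        print(alias, "->", resolve(alias, quals, als))
```

Two aliases (su-genre and su-form, or su-ut and su-ti) can point at the same qualifier. Users can therefore type either name and reach one index.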

We could also index the $j, $x, $y and $z subdivisions in specific indexes.

This patch does the following changes :
1) For all 6XX : Not indexing $2 (LCSH, Rameau...), $3 and $5
2) Suppressing the indexing of some specific subfields, depending on the field:
600 : Personal name used as a subject // see Marc21 600
not indexing $c (additional elements), $f (dates), $p (address/affiliation)
602 : Family name used as a subject // see Marc21 600 3X
not indexing $f (dates)
616 : Trademark
not indexing $c, $f
3) For all 6XX : index $j, $x, $y, $z in several indexes, in addition to the specific index for their 6XX field.
4) Define in ccl.properties some specific indexes :
Subject-name-conference 1=1073 => alias su-conf
Subject-name-corporate 1=1074 => alias su-corp
Subject-genre-form 1=1075 => alias su-genre and su-form
Subject-geographical 1=1076 => alias su-geo
Subject-chronological 1=1077 => alias su-chrono
Subject-title 1=1078 => alias su-ut and su-ti
Subject-topical 1=1079 => alias su-to
5) Adding new aliases in Search.pm :
su-chrono, su-form, su-genre, su-corp, su-conf, su-ti
6) Using these new indexes for the following fields:
600 : Subject and Subject-Personal-Name ; all subfields except subdivisions in Personal-name
601 : Subject, Subject-name-conference, Subject-name-corporate and Subject-name-conf ; all subfields except subdivisions in Corporate-name and Conference-name
602 : same as 600 but could be improved later
604 : Subject and Subject-title ; $a in Subject-Personal-Name ; all subfields except subdivisions in Name-and-Title
605 : Subject and Subject-title
606 : Subject and Subject-topical
607 : Subject and Subject-geographical ; all subfields except subdivisions in Name-geographic
608 : Subject and Subject-genre-form
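The rules in steps 1-3 and 6 can be mirrored in a small in-memory model. This is only a sketch: the real change is made in the Zebra DOM indexing configuration, not in Python code. The following simplifications are assumptions:

- A field is a tag plus a list of `(code, value)` pairs.
- Words are split with a simple `\w+` regex.
- Only the Subject-family indexes are modelled. Personal-name, Corporate-name and the other name indexes are left out.
- $x is routed to Subject-topical, while $j, $y and $z follow the test plan.
- The sample records are hypothetical.

```python
"""Sketch of the UNIMARC 6XX subject indexing rules from this patch."""
import re
from collections import defaultdict

# Never indexed in any 6XX: $2 (system code, e.g. "rameau"), $3, $5.
ALWAYS_SKIPPED = {"2", "3", "5"}

# Field-specific exclusions (additions, dates, addresses).
FIELD_SKIPPED = {"600": {"c", "f", "p"}, "602": {"f"}, "616": {"c", "f"}}

# Subdivisions also feed a specific index ($x -> topical is an assumption).
SUBDIVISIONS = {
    "j": "Subject-genre-form",
    "x": "Subject-topical",
    "y": "Subject-geographical",
    "z": "Subject-chronological",
}

# Field-level subject indexes (step 6); other 6XX tags get only "Subject".
FIELD_INDEXES = {
    "600": ["Subject", "Subject-name-personal"],
    "601": ["Subject", "Subject-name-conference", "Subject-name-corporate"],
    "602": ["Subject", "Subject-name-personal"],
    "604": ["Subject", "Subject-title"],
    "605": ["Subject", "Subject-title"],
    "606": ["Subject", "Subject-topical"],
    "607": ["Subject", "Subject-geographical"],
    "608": ["Subject", "Subject-genre-form"],
}

# Hypothetical records shaped like records A, B and C of the test plan,
# plus D, a record about music whose only "rameau" sits in $2.
SAMPLE_RECORDS = {
    "A": [
        ("600", [("a", "Rameau, Jean-Philippe"), ("c", "test9828"),
                 ("f", "test9828"), ("p", "test9828"), ("j", "subform"),
                 ("2", "test9828")]),
        ("602", [("a", "Bach"), ("f", "test9828")]),
        ("616", [("a", "Pleyel"), ("c", "test9828"), ("f", "test9828")]),
        ("606", [("a", "Opera"), ("2", "test9828")]),
    ],
    "B": [("606", [("a", "Musique"), ("y", "subgeo")])],
    "C": [("606", [("a", "Musique"), ("z", "subdate")])],
    "D": [("606", [("a", "Musique"), ("2", "rameau")])],
}


def build_index(records):
    """Map (index name, lowercased word) to the set of matching record ids."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for tag, subfields in fields:
            skipped = ALWAYS_SKIPPED | FIELD_SKIPPED.get(tag, set())
            targets = FIELD_INDEXES.get(tag, ["Subject"])
            for code, value in subfields:
                if code in skipped:
                    continue  # $2/$3/$5 and field-specific exclusions
                names = list(targets)
                if code in SUBDIVISIONS:
                    names.append(SUBDIVISIONS[code])
                for word in re.findall(r"\w+", value.lower()):
                    for name in names:
                        index[(name, word)].add(rec_id)
    return index


def search(index, name, term):
    """Return the ids of records whose `name` index contains `term`."""
    return set(index.get((name, term.lower()), set()))
```

With these rules, a Subject search for "rameau" returns only record A, where Rameau is the subject person. Record D is not returned, because its "rameau" appears only in $2.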

To test :

A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c, 600$f, 600$p, 602$f, 616$c, 616$f, 606$2, 600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) Search for "su:test9828". You should get no results.
7) Search for "su-genre:subform". You should get 1 result: record A.
8) Search for "su-geo:subgeo". You should get 1 result: record B.
9) Search for "su-chrono:subdate". You should get 1 result: record C.
10) On existing records, try the su-ut, su-to, su-na, su-form, su-corp and su-geo indexes and check that the results are relevant.
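Steps 6-9 can be run from a browser by building the search URLs directly. This sketch makes two assumptions. First, the staff client is served at a hypothetical `BASE_URL`. Second, catalogue searches go through `/cgi-bin/koha/catalogue/search.pl` with the CCL query in the `q` parameter. Adjust both for the instance under test.

```python
"""Sketch: staff-client search URLs for the manual test plan (steps 6-9)."""
from urllib.parse import urlencode

BASE_URL = "http://koha-staff.example.org"  # hypothetical host

# (CCL query, expected record labels) taken from the test plan.
TEST_QUERIES = [
    ("su:test9828", []),
    ("su-genre:subform", ["A"]),
    ("su-geo:subgeo", ["B"]),
    ("su-chrono:subdate", ["C"]),
]


def search_url(query, base=BASE_URL):
    """Return the catalogue search URL for one CCL query (query is URL-encoded)."""
    return f"{base}/cgi-bin/koha/catalogue/search.pl?{urlencode({'q': query})}"


if __name__ == "__main__":
    for query, expected in TEST_QUERIES:
        print(search_url(query), "expect:", expected or "no results")
```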

Subject indexing could be improved further later.

Signed-off-by: Nick Clemens <nick@quecheelibrary.org>

All seems to work as expected. I am not super-familiar with UNIMARC, but I wonder whether the subdivisions might be useful in su-corp and su-conf (e.g. France-Gendarmerie / Staatsbibliothek-Berlin).

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-10-27 12:46:42 -03:00
AuthoritiesMarc Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Barcodes Bug 11539: removing 2 unused files 2014-01-14 20:55:28 +00:00
Bookseller Bug 10402 follow-up: choose contacts for claims 2014-08-26 11:45:59 -03:00
ClassSortRoutine Bug 9770: fix sorting of Dewey call numbers that contain prefixes 2013-07-15 16:12:47 +00:00
Creators Bug 8375: (follow-up) adjust StrWidth to account for TTF fonts 2014-05-06 18:52:12 +00:00
External Bug 12041 - improve Koha::Cache 2014-06-19 13:05:04 -03:00
Form Bug 12100: ensure that messaging preferences displays saved Days in Advance 2014-04-28 21:35:18 +00:00
Heading Bug 10308 - local subjects can use authorities too 2014-09-01 10:45:07 -03:00
ILSDI Bug 12871 - wthdrawn instead of withdrawn in ILSDI 2014-10-27 10:59:47 -03:00
Installer Bug 12068: (rm followup) remove useless newline introduced on merging 2014-10-22 09:31:01 -03:00
Labels Bug 12068 - label-create-pdf.pl Add support for RTL language 2014-10-21 16:14:57 -03:00
Linker Bug 11650: multiplicated authorities after link_bibs_to_authorities.pl 2014-07-07 12:40:25 -03:00
Members Bug 12100: (follow-up) fix regression 2014-04-28 21:36:25 +00:00
OAI Bug 9295: Introduce operator equal/ notequal to OAI set mapping instead of hardcoded 'equal' value. 2013-10-10 23:03:30 +00:00
Output
Patroncards Bug 8315 - remove use C4::* version 2012-07-13 14:17:20 +02:00
Reports Bug 10126: (qa followup) fix tests 2014-10-16 10:24:10 -03:00
Search Bug 10807: (follow-up) use 24-hour time when storing search times to session 2014-05-05 02:55:41 +00:00
Serials Bug 7688: (follow-up) update license statements 2013-10-30 02:56:32 +00:00
SIP Bug 11633 Return Correct status if patron blocked by fines 2014-09-23 15:30:12 -03:00
Utils Bug 12833: Patron search should search on extended attributes 2014-09-09 10:08:59 -03:00
VirtualShelves Bug 8521 - Error in warning message when deleting list 2014-08-05 20:44:28 -03:00
Accounts.pm Bug 11230 - Refactor C4::Stats::UpdateStats and add UT 2014-07-27 11:29:28 -03:00
Acquisition.pm Bug 12827: NewOrder needs more unit tests 2014-09-17 21:22:56 -03:00
Auth.pm Bug 13114: Prevent Shibboleth Patches from spamming logs 2014-10-21 21:10:48 -03:00
Auth_cas_servers.yaml.orig Bug 5630 CAS improvements 2011-10-13 10:49:49 +13:00
Auth_with_cas.pm Bug 12398: Fix CAS authentication validation 2014-08-01 10:13:49 -03:00
Auth_with_ldap.pm Bug 8148 - LDAP auth_by_bind doesn't fallback to local auth 2014-08-07 16:22:49 -03:00
Auth_with_shibboleth.pm BUG8446, QA Followup: Use DBIx::Class 2014-10-16 12:28:01 -03:00
AuthoritiesMarc.pm Bug 12654 Correct incorrectly quoted regexp 2014-07-30 11:06:27 -03:00
BackgroundJob.pm Bug 10601: (follow-up) improvements to ->set() and ->get() 2013-09-18 17:23:44 +00:00
Barcodes.pm Bug 6679 - [SIGNED-OFF] fix 8 perlcritic violations in C4/Barcodes.pm 2012-09-20 12:01:36 +02:00
Biblio.pm Bug 12538: Remove Solr without breaking anything else 2014-10-11 16:59:04 -03:00
Bookseller.pm Bug 10402 follow-up: choose contacts for claims 2014-08-26 11:45:59 -03:00
Boolean.pm Bug 10080 - Change system pref IndependantBranches to IndependentBranches 2013-05-22 07:58:23 -07:00
Branch.pm Bug 9350: Making changes so that you can add the new fields to branches 2014-10-27 10:38:16 -03:00
Breeding.pm Bug 12898 - Z39.50 title search doesn't work with multiple words 2014-09-14 02:02:51 -03:00
Budgets.pm Bug 7498 - Cloning a budget, enable change of description 2014-09-05 10:21:30 -03:00
Calendar.pm Bug 7351 : feature that allows to delete a range of dates 2012-09-28 12:19:45 +02:00
Category.pm Bug 7919: FIX the "all" categories method 2013-01-02 16:50:52 -05:00
Charset.pm Bug 9859: fix nsb_clean side effect 2014-10-22 14:06:14 -03:00
Circulation.pm Bug 12914 - Wrong message 'Patron(..) is blocked for 2014-09-30 day(s). 2014-10-21 16:13:28 -03:00
ClassSortRoutine.pm Bug 12424 - ddc sorting of call numbers truncates long Cutter parts 2014-10-18 10:50:07 -03:00
ClassSource.pm Bug 10643: fix inappropriate uses of $sth->finish() in C4::ClassSource.pm 2013-08-09 15:32:22 +00:00
Context.pm Bug 12651: DOM indexing is the default 2014-10-27 12:35:44 -03:00
Contract.pm Bug 12487 [QA Followup] - GetContract must return undef with no params 2014-07-30 10:44:45 -03:00
CourseReserves.pm bug 8215: (followup) rename GetItemReservesInfo 2013-05-21 15:51:03 -07:00
Creators.pm Bug 8315 - remove use C4::* version 2012-07-13 14:17:20 +02:00
Csv.pm Bug 10853: All existing routing to get a CSV should return a MARC csv 2013-10-11 02:16:33 +00:00
Dates.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Debug.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Heading.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
HoldsQueue.pm Bug 11258: fix another case where holds queue made transfer requests that contradict the library holds policy 2014-04-18 15:23:23 +00:00
HTML5Media.pm Bug 8377: Followup move style in a css file and do not pass template to a pm 2012-12-27 09:28:00 -05:00
Images.pm Bug 8710 - Don't show the images tab in the OPAC if the record has no local images 2012-11-28 18:54:40 -05:00
ImportBatch.pm Bug 11254: make reservoir search normalize ISBNs 2014-04-19 21:44:30 +00:00
ImportExportFramework.pm Bug 11666: remove SQL as an option for MARC framework exports and imports 2014-02-05 19:48:27 +00:00
Input.pm Bug 766: remove disused routine buildCGISort 2014-05-04 23:03:24 +00:00
InstallAuth.pm Bug 11349: Remove unnecesary name translation 2014-07-17 11:05:58 -03:00
Installer.pm Bug 10523: Remove two obsolete routines from Installer.pm 2013-10-31 16:51:47 +00:00
ItemCirculationAlertPreference.pm Bug 6679 - [SIGNED-OFF] fix 2 perlcritic violations in C4/ItemCirculationAlertPreference.pm 2012-09-20 12:01:39 +02:00
Items.pm Bug 12874: A DB field without a default mapping is set to a default value on editing 2014-10-11 12:57:11 -03:00
ItemType.pm Bug 10513: display a warning/message when returning a chosen item type 2013-09-16 17:45:31 +00:00
Koha.pm Bug 12874: A DB field without a default mapping is set to a default value on editing 2014-10-11 12:57:11 -03:00
Labels.pm Bug 8315 - remove use C4::* version 2012-07-13 14:17:20 +02:00
Languages.pm Bug 12534 - PROG/CCSR deprecation: Make getlanguages() theme independent for opac 2014-07-14 09:01:08 -03:00
Letters.pm Bug 9530: Replace tabs with spaces 2014-10-27 10:38:51 -03:00
Linker.pm Bug 11650: multiplicated authorities after link_bibs_to_authorities.pl 2014-07-07 12:40:25 -03:00
Log.pm Bug 11331 - CSV export for viewlog.pl is missing newlines 2014-08-05 20:23:26 -03:00
MarcModificationTemplates.pm Bug 11479: Remove experimental given/when keywords 2014-02-20 15:55:21 +00:00
Matcher.pm Bug 10500: (follow-up) disable AggressiveMatchOnISBN if UseQueryParser is on 2014-05-05 19:31:00 +00:00
Members.pm Bug 13084 [Master] - QA Followup - Use DBIx::Class to simplify logic 2014-10-23 10:40:57 -03:00
Message.pm Bug 6679 - [SIGNED-OFF] fix 3 perlcritic violations in C4/Message.pm 2012-09-20 12:01:39 +02:00
NewsChannels.pm Bug 12507 - News does not always display in staff or OPAC 2014-08-24 12:37:06 -03:00
Output.pm Bug 10016: force zero browser-side caching of SCO pages 2013-10-21 18:05:12 +00:00
Overdues.pm Bug 9180: All branches should be returned if a default rule exists 2014-08-19 09:29:51 -03:00
Patroncards.pm Bug 8315 - remove use C4::* version 2012-07-13 14:17:20 +02:00
Print.pm Bug 6679 - [SIGNED-OFF] fix 2 perlcritic violations in C4/Print.pm 2012-09-20 12:17:43 +02:00
Ratings.pm Bug 12609: Add some unit tests for C4::Ratings 2014-09-17 22:08:57 -03:00
Record.pm Bug 12409: Follow up - Reflect from hash to array in comments 2014-07-07 10:13:17 -03:00
Reports.pm Bug 12696: Remove CGI::scrolling_list from C4/Reports.pm 2014-08-15 14:44:50 -03:00
Reserves.pm Bug 12876: Improve unit tests for CanReserveBeCanceledFromOpac 2014-09-23 20:46:57 -03:00
Review.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Ris.pm Bug 11066: make RIS and Bibtex exports RDA compatible 2014-01-03 15:54:38 +00:00
RotatingCollections.pm Bug 11384: rename the collections_tracking.ctId column 2013-12-23 16:14:57 +00:00
Scheduler.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Scrubber.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
Search.pm Bug 9828: More specific indexing of UNIMARC 6XX fields 2014-10-27 12:46:42 -03:00
Serials.pm Bug 12338: Remove smartmatch operator from C4/Serials.pm 2014-06-16 15:07:01 -03:00
Service.pm
ShelfBrowser.pm Bug 10856: (follow-up) if callnumbers are equal, order should be on itemnumber 2013-10-04 15:57:03 +00:00
SMS.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
SocialData.pm bug 7470 follow-up, fix POD doc 2012-03-26 17:53:28 +02:00
SQLHelper.pm Bug 11221: ensure that SQLHelper uses NULL rather than 0000-00-00 as default date value 2013-11-19 15:29:08 +00:00
Stats.pm Bug 11230: dereference hashes in keys (QA followup) 2014-07-27 11:29:36 -03:00
Suggestions.pm Bug 10277 - Add C4::Context->IsSuperLibrarian() 2013-12-30 15:47:23 +00:00
Tags.pm Bug 9136: C4::Tags not Plack-compatible 2012-12-22 15:47:48 -05:00
Templates.pm Bug 12716: Allow the import patrons form have drop-downs and datepickers 2014-08-15 15:26:44 -03:00
TmplToken.pm Bug 12131: Remove unused dependency on Exporter 2014-04-25 15:24:39 +00:00
TmplTokenType.pm Revert "Bug 6679 - [SIGNED-OFF] fix 9 perlcritic violations in C4/TmplTokenType.pm" 2012-09-20 13:29:59 +02:00
TTParser.pm Bug 12207: fix TTparser's handling of TT directives that contain "]" 2014-05-23 15:23:20 +00:00
UploadedFile.pm Bug 7941 : Fix version numbers in modules 2012-06-11 17:29:38 +02:00
UsageStats.pm Bug 11926: Follow-up - remove SearchEngine pref / fix POD 2014-10-22 15:17:14 -03:00
VirtualShelves.pm Bug 8262: explicitly warn that database admin account cannot create lists 2014-04-20 22:55:22 +00:00
XISBN.pm Bug 11096: support the retrieval of large MARCXML records 2014-02-28 19:50:09 +00:00
XSLT.pm Bug 12237: Remove the "horrible hack" in C4::Templates 2014-07-03 10:34:11 -03:00