Main Koha release repository https://koha-community.org
Mathieu Saby 0dd1ac40a0 Bug 9828: More specific indexing of UNIMARC 6XX fields
[New commit on 18 Aug 2014 : rebased, and DOM indexing only]

Issues to fix :
Most 6XX fields may contain a $2 that identifies the system used for indexing. It should not be indexed.
In French libraries, $2 contains "rameau", so searching for books about the composer "Rameau" retrieves thousands of records!
For some 6XX fields, other subfields should not be indexed, for example the dates of persons and families, or addresses.
In the UNIMARC guide, 600$t, 601$t and 602$t are said to exist but to be "not used". I keep them indexed.
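
The $2 problem can be illustrated with a toy model, sketched here in Python. The record layout and the `index_terms` helper are hypothetical and are not Koha or Zebra code; they only show why indexing every subfield makes a search for "rameau" match records whose only link to Rameau is the thesaurus code.

```python
# Toy model of subject indexing; not Koha or Zebra code.
# A field is a (tag, [(subfield_code, value), ...]) pair.

def index_terms(fields, skip_codes=frozenset()):
    """Return the set of lowercased words indexed from 6XX fields."""
    terms = set()
    for tag, subfields in fields:
        if not tag.startswith("6"):
            continue
        for code, value in subfields:
            if code in skip_codes:
                continue
            terms.update(value.lower().split())
    return terms

# A record about gardening, indexed with the RAMEAU thesaurus.
gardening = [("606", [("a", "Jardinage"), ("2", "rameau")])]

naive = index_terms(gardening)                   # every subfield indexed
fixed = index_terms(gardening, skip_codes={"2"})  # $2 left out

print("rameau" in naive)  # True: false hit for the composer
print("rameau" in fixed)  # False
```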

Additionally, subject indexing could be improved by using specific indexes for each 6XX where possible :
In ccl.properties :
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs : Subject-name-personal, alias su-na
We can use these indexes and create new specific indexes by using existing bib1 attributes.

We could also index the $j,$x,$y,$z subdivisions in specific indexes.
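
As a rough illustration of the current situation, the alias table described above can be modelled as a plain dictionary. This is only a sketch: the dictionary and the `indexes_reached` helper are hypothetical, and the "su" entry is assumed from the searches used in the test plan rather than stated above.

```python
# Alias table as described in the text, before this patch.
# Hypothetical structure; "su" -> "Subject" is an assumption.
ALIASES_BEFORE = {
    "su": "Subject",
    "su-to": "Subject",
    "su-geo": "Subject",
    "su-ut": "Subject",
    "su-na": "Subject-name-personal",  # defined, but not fed by record.abs
}

def indexes_reached(aliases, table):
    """Return the distinct index names a list of aliases resolves to."""
    return {table[alias] for alias in aliases}

subject_aliases = ["su", "su-to", "su-geo", "su-ut"]
distinct = indexes_reached(subject_aliases, ALIASES_BEFORE)

# All four aliases hit the same index, so "su-geo" is no more
# precise than a plain "su" search.
print(distinct)  # {'Subject'}
```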

This patch makes the following changes :
1) For all 6XX : Not indexing $2 (LCSH, Rameau...), $3 and $5
2) Suppressing the indexing of some specific subfields, depending on the field:
600 : Personal name used as a subject // see Marc21 600
not indexing c (additional elements),f (dates),p (address/affiliation)
602 : Family name used as a subject // see Marc21 600 3X
not indexing f (dates)
616 : Trademark
not indexing c,f
3) For all 6XX : index $j,$x,$y,$z in several indexes in addition to the specific index for their 6XX field
4) Define in ccl.properties some specific indexes :
Subject-name-conference 1=1073 => alias su-conf
Subject-name-corporate 1=1074 => alias su-corp
Subject-genre-form 1=1075 => alias su-genre and su-form
Subject-geographical 1=1076 => alias su-geo
Subject-chronological 1=1077 => alias su-chrono
Subject-title 1=1078 => alias su-ut and su-ti
Subject-topical 1=1079 => alias su-to
5) Adding new aliases in Search.pm :
su-chrono, su-form, su-genre, su-corp, su-conf, su-ti
6) Using these new indexes for :
600 : Subject and Subject-name-personal ; all subfields except subdivisions in Personal-name
601 : Subject, Subject-name-corporate and Subject-name-conference ; all subfields except subdivisions in Corporate-name and Conference-name
602 : same as 600 but could be improved later
604 : Subject and Subject-title ; $a in Subject-name-personal ; all subfields except subdivisions in Name-and-Title
605 : Subject and Subject-title
606 : Subject and Subject-topical
607 : Subject and Subject-geographical ; all subfields except subdivisions in Name-geographic
608 : Subject and Subject-genre-form
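
The index definitions from step 4 can be summarised as a small lookup, sketched below in Python. The index names, attribute numbers and aliases are the ones listed above; the dictionaries and the `to_attribute` helper are hypothetical and do not reflect how ccl.properties or Search.pm are actually parsed.

```python
# Index name -> bib1 use attribute, as listed in step 4.
BIB1_USE = {
    "Subject-name-conference": 1073,
    "Subject-name-corporate": 1074,
    "Subject-genre-form": 1075,
    "Subject-geographical": 1076,
    "Subject-chronological": 1077,
    "Subject-title": 1078,
    "Subject-topical": 1079,
}

# Alias -> index name, as listed in step 4.
ALIASES = {
    "su-conf": "Subject-name-conference",
    "su-corp": "Subject-name-corporate",
    "su-genre": "Subject-genre-form",
    "su-form": "Subject-genre-form",
    "su-geo": "Subject-geographical",
    "su-chrono": "Subject-chronological",
    "su-ut": "Subject-title",
    "su-ti": "Subject-title",
    "su-to": "Subject-topical",
}

def to_attribute(alias):
    """Resolve a search alias to its bib1 use attribute string."""
    return f"1={BIB1_USE[ALIASES[alias]]}"

print(to_attribute("su-geo"))    # 1=1076
print(to_attribute("su-genre"))  # 1=1075
print(to_attribute("su-form"))   # 1=1075
```

With this mapping, su-geo, su-ut and su-to each reach their own index instead of all pointing at Subject.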

To test :

A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c 600$f 600$p, 602$f, 616$c, 616$f, 606$2,600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) Try to search "su:test9828". You should have no results
7) Try to search "su-genre:subform". You should have 1 result : record A
8) Try to search "su-geo:subgeo". You should have 1 result : record B
9) Try to search "su-chrono:subdate". You should have 1 result : record C
10) On existing records, try the su-ut, su-to, su-na, su-form, su-corp and su-geo indexes, and see if the results are relevant
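
The expected outcome of steps 3 to 9 can be checked against a toy model of the rules above, sketched here in Python. Everything in it is hypothetical (the record layout, `build_index`, `search`, and the $a values added so the fields are not empty), and the mapping of $j, $y and $z to the genre/form, geographical and chronological indexes is inferred from the test plan. It is not a substitute for rebuilding Zebra.

```python
# Toy model of the indexing rules in this patch; not Koha or Zebra code.
SKIP_ALWAYS = {"2", "3", "5"}
SKIP_BY_TAG = {"600": {"c", "f", "p"}, "602": {"f"}, "616": {"c", "f"}}
# Subdivision -> extra index (inferred from the test plan).
SUBDIVISION_INDEX = {
    "j": "Subject-genre-form",
    "y": "Subject-geographical",
    "z": "Subject-chronological",
}
ALIASES = {
    "su": "Subject",
    "su-genre": "Subject-genre-form",
    "su-geo": "Subject-geographical",
    "su-chrono": "Subject-chronological",
}

def build_index(records):
    """Map (index name, term) to the set of record ids containing it."""
    index = {}
    for rec_id, fields in records.items():
        for tag, subfields in fields:
            skipped = SKIP_ALWAYS | SKIP_BY_TAG.get(tag, set())
            for code, value in subfields:
                if code in skipped:
                    continue
                targets = {"Subject"}
                if code in SUBDIVISION_INDEX:
                    targets.add(SUBDIVISION_INDEX[code])
                for name in targets:
                    index.setdefault((name, value.lower()), set()).add(rec_id)
    return index

def search(index, query):
    """Run an 'alias:term' query and return matching record ids."""
    alias, term = query.split(":", 1)
    return sorted(index.get((ALIASES[alias], term.lower()), set()))

records = {
    "A": [
        ("600", [("a", "Name"), ("c", "test9828"), ("f", "test9828"),
                 ("p", "test9828"), ("j", "subform"), ("2", "test9828")]),
        ("602", [("f", "test9828")]),
        ("616", [("c", "test9828"), ("f", "test9828")]),
        ("606", [("a", "Topic"), ("2", "test9828")]),
    ],
    "B": [("606", [("a", "Topic"), ("y", "subgeo")])],
    "C": [("606", [("a", "Topic"), ("z", "subdate")])],
}

index = build_index(records)
print(search(index, "su:test9828"))        # []
print(search(index, "su-genre:subform"))   # ['A']
print(search(index, "su-geo:subgeo"))      # ['B']
print(search(index, "su-chrono:subdate"))  # ['C']
```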

Indexing of subjects could be improved further later

Signed-off-by: Nick Clemens <nick@quecheelibrary.org>

All seems to work as expected. I am not very familiar with UNIMARC, but I wonder whether the subdivisions might be useful in su-corp and su-conf (e.g. France-Gendarmerie / Staatsbibliothek-Berlin)

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-10-27 12:46:42 -03:00
acqui Bug 12827: NewOrder should not return basketno 2014-09-17 21:22:26 -03:00
admin Bug 9350: Making changes so that you can add the new fields to branches 2014-10-27 10:38:16 -03:00
authorities Bug 12177 - Remove HTML from authorities.pl 2014-08-19 09:27:08 -03:00
basket Bug 9530 making changes to basket/sendbasket.pl 2014-10-27 10:38:20 -03:00
C4 Bug 9828: More specific indexing of UNIMARC 6XX fields 2014-10-27 12:46:42 -03:00
catalogue Bug 12330: [QA Follow-up] Consistency between opac-search and staff client 2014-09-05 12:06:04 -03:00
cataloguing Bug 12884: Get rid of redefined subroutine warnings in dateaccessioned.pl 2014-10-22 14:18:40 -03:00
circ Bug 11577: Add 'auto_renew' and 'auto_too_soon' to renewal page 2014-09-17 19:23:16 -03:00
course_reserves Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
debian Bug 11362 - increase zebra AUTH register sizes, from 4G to 20G 2014-10-24 09:41:04 -03:00
docs Bug 7143 Adding a new developer to the history 2014-10-11 16:23:08 -03:00
errors Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
etc Bug 9828: More specific indexing of UNIMARC 6XX fields 2014-10-27 12:46:42 -03:00
install_misc Bug 12068 - label-create-pdf.pl Add support for RTL language 2014-10-21 16:14:57 -03:00
installer Bug 3977: MARC21 321$b has wrong description 2014-10-27 12:38:52 -03:00
Koha Bug 9530: Update DBIx 2014-10-27 10:46:05 -03:00
koha-tmpl Bug 13139 - Move treeview jQuery plugin outside of language-specific directory 2014-10-27 12:36:53 -03:00
labels Bug 11614: Untranslatable label_element_title in label management 2014-08-19 09:42:42 -03:00
members Bug 12693 - colspan calculation done by members/statistics.pl should be moved to template 2014-08-11 15:47:14 -03:00
misc Bug 12651: DOM indexing is the default 2014-10-27 12:35:44 -03:00
offline_circ Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
opac Bug 9530 making changes to opac/opac-sendshelf.pl 2014-10-27 10:38:23 -03:00
OpenILS
patron_lists Bug 10565: (follow-up) add new user permission for patron list management 2013-10-14 22:43:03 +00:00
patroncards Bug 5502 - Patron card category search field should be menu 2014-08-10 09:30:47 -03:00
plugins Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
reports Bug 11672: Untranslatable dropdown on Guided Reports and dictionary 2014-09-23 15:32:21 -03:00
reserve Bug 12287: At the moment, found is 'W', 'T' or NULL 2014-08-16 09:06:51 -03:00
reviews Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
rotating_collections Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
selenium
serials Bug 11349: Make the QA script happy 2014-07-17 11:06:06 -03:00
services Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
skel Bug 11078: Add locking to rebuild_zebra 2014-02-28 22:21:41 +00:00
sms Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
suggestion Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
svc Bug 12590 - Support deletion of biblio in svc API 2014-10-27 11:13:49 -03:00
t Bug 12651: DOM indexing is the default 2014-10-27 12:35:44 -03:00
tags Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
test Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
tmp/modified_authorities
tools Bug 12031: [QA Follow-up] Undefined routine and change to koha-conf.xml 2014-10-27 10:38:11 -03:00
virtualshelves Bug 9530 making changes to virtualshelves/sendshelf.pl 2014-10-27 10:38:25 -03:00
xt Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
.editorconfig Bug 12545: Add EditorConfig.org file to the source tree 2014-08-22 11:07:45 -03:00
.htaccess Fix file permissions: if it is not a script, it should not be executable. 2010-04-16 00:40:34 -04:00
.mailmap Bug 12479: (QA followup) minor fixes, and tcohen added 2014-06-30 10:04:10 -03:00
about.pl Bug 13140: Add a notice on the About page about GRS-1 deprecated 2014-10-27 11:24:09 -03:00
changelanguage.pl
edithelp.pl Bug 11661: sanitize file names supplied to edithelp.pl 2014-02-05 01:36:10 +00:00
fix-perl-path.PL
help.pl Bug 11238: contruct links to the appropriate manual version dynamically 2013-11-23 19:30:16 +00:00
INSTALL
install-CPAN.pl Bug 5370: Fix all the references to koha.org 2010-11-08 09:41:49 +13:00
INSTALL.debian
INSTALL.fedora7 Bug 11757: remove dependency on POE 2014-02-15 01:38:15 +00:00
INSTALL.opensuse Bug 11757: remove dependency on POE 2014-02-15 01:38:15 +00:00
INSTALL.ubuntu
koha_perl_deps.pl bug 10548: fix count of missing required dependencies by koha_perl_deps.pl 2013-07-11 14:03:32 +00:00
kohaversion.pl Bug 13088: DBRev 3.17.00.033 2014-10-27 12:28:23 -03:00
LICENSE
mainpage.pl Bug 11349: Change .tmpl -> .tt in scripts using templates 2014-07-17 11:05:49 -03:00
Makefile.PL Bug 12651: DOM indexing is the default 2014-10-27 12:35:44 -03:00
MANIFEST.SKIP Bug 9546 : Updating make manifest tardist 2013-02-06 23:54:46 -05:00
README
README.robots Bug 6411 add another example to README.robots 2011-07-05 14:48:05 +12:00
rewrite-config.PL Bug 12031: [QA Follow-up] Undefined routine and change to koha-conf.xml 2014-10-27 10:38:11 -03:00

Koha is a free software integrated library system.

Koha is distributed under the GNU GPL version 3 or later.
Please read the file LICENSE for more details.

To install or upgrade Koha, please see the INSTALL file appropriate
to your platform.

Report bugs at http://bugs.koha-community.org/

Visit the Koha Project website at http://www.koha-community.org/