Bug 11202: Improve UNIMARC biblio indexing

This patch makes the following changes to UNIMARC biblio indexing :
A. Changes to UNIMARC conf files
1. add comments to biblio-koha-indexdefs.xml
2. make biblio-koha-indexdefs.xml more compact by grouping some
   declarations
   Ex : 200$f and 200$g => one declaration for 200$fg
3. suppress unneeded declarations (indexing of some 4XX fields and 6XX
   fields not in unimarc format)
4. unindex some (sub)fields unneeded by most users (318, 207, 230, 210a,
   215, 4XXd)
5. change the way 308 field is indexed (no visible changes)
6. replace Title-host with Host-item -- see bug 11119
7. index 208 in Material-Type -- see bug 11119
8. index 100 pos 9-12 and 13-16 in pubdate:y and pubdate:n
9. index 100 pos 9-12 in pubdate:s instead of 210$d
10. Index all subfields of note 334 and 327 in note index
11. Index 304 and 327 in title index as well as note index
    327 can contain a list of titles included in a work
    304 can contain the title of the original work in case of a
    translation
12. Index 314 in author index as well as note index
    314 can contain authors not mentioned in 200$f/g (the 4th, 5th etc.
    author)
13. Index 328 note in Dissertation-information as well as note
14. Index 328$t in Title
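
To make items 10-14 concrete, here is a minimal Python sketch (not part
of the patch) that lists the indexes fed by a GRS-1 melm declaration.
The parse_melm helper is hypothetical. It assumes that a term without an
explicit type is a word index (w), and it does not handle range(...)
terms such as those on the 100$a line.

```python
def parse_melm(line):
    """Split a 'melm' line into (field, [(index, type), ...])."""
    keyword, field, spec = line.split(None, 2)
    if keyword != "melm":
        raise ValueError("not a melm line: %r" % line)
    targets = []
    for term in spec.split(","):
        name, _, kind = term.strip().partition(":")
        # Assumption: a missing type means a word index ("w").
        targets.append((name, kind or "w"))
    return field, targets


# Declarations copied from the diff in this commit.
for declaration in (
    "melm 314$a Note,Note:p,Author:w,Author:p",
    "melm 328$t Note,Note:p,Dissertation-information:w,"
    "Dissertation-information:p,Title:w,Title:p",
):
    field, targets = parse_melm(declaration)
    print(field, sorted({name for name, _ in targets}))
```

Read this way, 314$a feeds both Note and Author, and 328$t feeds Note,
Dissertation-information and Title.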

B. Changes to ccl.properties :
1. add a new index Dissertation-information (1056)
2. fix EAN, pubdate and acqdate (they were not linked with bib1 attributes)
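
The fix in B.2 can be pictured with a small Python sketch (not part of
the patch) that flags qualifiers whose use attribute (type 1) is a name
rather than a bib-1 number. The function name and the two sample lists
are illustrative; the lines themselves are taken from the ccl.properties
hunks of this commit, and treating the name-based ones as the old style
is an assumption based on the description above.

```python
def non_numeric_use_attributes(lines):
    """Return qualifier names whose type-1 (use) attribute is not a number."""
    flagged = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, *attrs = line.split()
        for attr in attrs:
            attr_type, sep, value = attr.partition("=")
            # Alias lines such as "ean EAN" have no "=" and are ignored.
            if sep and attr_type == "1" and not value.isdigit():
                flagged.append(name)
    return flagged


old_style = [
    "Date-of-publication 1=pubdate r=r",
    "Date-of-acquisition 1=Date-of-acquisition",
    "ean 1=EAN",
]
new_style = [
    "Date-of-publication 1=31 r=r",
    "Date-of-acquisition 1=32",
    "EAN 1=1214",
    "ean EAN",
]

print(non_numeric_use_attributes(old_style))
print(non_numeric_use_attributes(new_style))
```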

C. Changes to Search.pm
1. add Dissertation-information and suppress Title-host and UPC

D. Changes to QP config file queryparser.yaml
1. add Dissertation-information
2. fix EAN, pubdate and acqdate

Test plan :
If you cannot test in GRS1, test only in DOM, as GRS will be deprecated.

1. Apply the patch in a UNIMARC Koha running with DOM and ICU
2. copy src/etc/searchengine/queryparser.yaml into the main config
   directory of QP
3. copy src/etc/zebradb/ccl.properties into the main config directory
   of Zebra
4. copy src/etc/zebradb/marc_defs/unimarc/biblio/* into the main config
   directory of Zebra
5. reindex biblios (rebuild_zebra.pl -r -b -x -v)
6. test note index : make some searches on 334$b or 327$b
7. test author index : make some searches on 314 field
8. test title index : make some searches on 304 and 327 field, make a
   search on 328$t subfield
9. test dissertation-information index : make some searches on 328 field
10. In a record, set the dates of field 100 to "1000" (1st date) and
    "1001" (2nd date); search for a book written in year 1000, you
    should find the record; likewise for year 1001
11. make some searches and sort by date. It should work better than before,
    especially if you have values like "c2009" or "impr. 2010" in 210
    field
12. Regression test : make some searches on several indexes, like EAN,
    etc. It should work as before
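
For step 10, the sketch below (Python, not part of the patch) shows
which characters of 100$a the range(data,start,length) terms in the
record.abs-style declaration pick up. It assumes a zero-based start
offset, which matches the position comments in the diff (for example
"Script of title 2 34-35" next to range(data,34,2)). The zebra_range
helper and the truncated sample value are hypothetical.

```python
def zebra_range(data, start, length):
    """Return `length` characters of `data` starting at zero-based `start`."""
    return data[start:start + length]


# Hypothetical start of a 100$a value (only positions 0-16 are shown).
field_100a = (
    "20131105"  # pos 0-7: date entered on file
    "d"         # pos 8: type of publication date
    "1000"      # pos 9-12: 1st date
    "1001"      # pos 13-16: 2nd date
)

first_date = zebra_range(field_100a, 9, 4)    # pubdate:n, pubdate:y, pubdate:s
second_date = zebra_range(field_100a, 13, 4)  # pubdate:n, pubdate:y
print(first_date, second_date)
```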

Run tests 10-12 with and without Queryparser activated.
Be careful: with Queryparser activated, the index names (title,
dissertation-information...) must be entered in lowercase only.
Of course, to test search and sort by dates, you need to have full
records, with dates in 100 field as well as 210 field.
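
Why sorting should improve can be sketched in Python (not part of the
patch). The example uses plain string comparison, which is only a
simplification of how a sort index behaves, and the sample values are
hypothetical: four records with free-text dates in 210$d and the
matching 4-digit years from 100$a pos 9-12.

```python
def string_sorted(values):
    """Sort values by plain string comparison (a simplified sort index)."""
    return sorted(values)


# Same four records, described two ways.
dates_210d = ["c2009", "impr. 2010", "2011", "1998"]
dates_100a = ["2009", "2010", "2011", "1998"]

print(string_sorted(dates_210d))  # prefixed values drift to the end
print(string_sorted(dates_100a))  # 4-digit years sort chronologically
```

With the free-text values, "c2009" and "impr. 2010" sort after "2011";
with the coded years, the order is chronological.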

Signed-off-by: Paola Rossi <paola.rossi@cineca.it>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Mathieu Saby 2013-11-05 12:33:04 +01:00 committed by Galen Charlton
parent aaff735269
commit b6118db2f5
6 changed files with 476 additions and 1327 deletions

@ -963,6 +963,7 @@ sub getIndexes{
'Date-of-acquisition',
'Date-of-publication',
'Dewey-classification',
'Dissertation-information',
'EAN',
'extent',
'fic',
@ -1044,7 +1045,6 @@ sub getIndexes{
'su-to',
'su-ut',
'ut',
'UPC',
'Term-genre-form',
'Term-genre-form-heading',
'Term-genre-form-see',
@ -1053,7 +1053,6 @@ sub getIndexes{
'Title',
'Title-cover',
'Title-series',
'Title-host',
'Title-uniform',
'Title-uniform-heading',
'Title-uniform-see',

@ -375,6 +375,15 @@ field_mappings:
aliases:
- datelastseen
label: Datelastseen
dissertation-information:
bib1_mapping:
biblioserver:
1: 1056
enabled: 1
index: dissertation-information
aliases:
- dissertation-information
label: Dissertation-information
dt-bks:
bib1_mapping:
biblioserver:
@ -423,7 +432,7 @@ field_mappings:
ean:
bib1_mapping:
biblioserver:
1: EAN
1: 1214
enabled: 1
index: ean
aliases:
@ -1240,7 +1249,7 @@ filter_mappings:
acqdate:
bib1_mapping:
biblioserver:
1: Date-of-acquisition
1: 32
4: 4
target_syntax_callback: date_filter_target_callback
enabled: 1
@ -1257,7 +1266,7 @@ filter_mappings:
pubdate:
bib1_mapping:
biblioserver:
1: pubdate
1: 31
4: 4
target_syntax_callback: date_filter_target_callback
enabled: 1

@ -332,21 +332,19 @@ sn Local-number
#Date 30 The point of time at which 005, 008/00-05,
# a transaction or event 008/07-10, 260$c,
# takes place. 008/11-14, 033,etc.
# interpreting this as the copyright date in 260$c
# interpreting this as the copyright date in 260$c (MARC21) and 210$d (UNIMARC)
copydate 1=30 r=r
#Date-publication 31 The date (usually year) in 008/07-10, 260$c
# which a document is published. 046, 533$d
Date-of-publication 1=pubdate r=r
#dp Date-of-publication
Date-of-publication 1=31 r=r
yr Date-of-publication
pubdate Date-of-publication
#Date-acquisition 32 The date when a document was 541$d
# acquired.
Date-of-acquisition 1=Date-of-acquisition
Date-of-acquisition 1=32
acqdate Date-of-acquisition
#da Date-of-acquisition
#Date/time added to 1011 The date and time that a 008/00-05
#database record was added to the
@ -357,6 +355,16 @@ acqdate Date-of-acquisition
#modified was last updated.
Date/time-last-modified 1=1012
#Dissertation- 1056 Information about a MARC21 502/UNIMARC 328
#information dissertation thesis, or another
# publication connected with an
# academic degree.
Dissertation-information 1=1056
#EAN 1214 European article number UNIMARC 073
EAN 1=1214
ean EAN
#Identifier-- 1013 Used in full-text searching
#authority/format to indicate to the target
# system the format of the
@ -447,8 +455,6 @@ music Identifier-publisher-for-music
#Identifier-standard 1=1007 4=6
Identifier-standard 1=1007 4=6
ident Identifier-standard
upc 1=UPC
ean 1=EAN
#
#Identifier-stock 1028 A stock number that could be 037
@ -832,7 +838,7 @@ se Title-series
# a work is to be identified subfield $t in the
# for cataloging purposes. following: 700,710,
# 711
Title-uniform 1=Title-uniform
Title-uniform 1=6
ut Title-uniform
Title-uniform-heading 1=Title-uniform-heading

@ -76,14 +76,15 @@ melm 071$a Identifier-publisher-for-music:w,Identifier-standard:w
melm 071$z Identifier-publisher-for-music:w,Identifier-standard:w
melm 071$b Publisher,Publisher:p
# UPC
melm 072$a UPC:w,Identifier-standard:w
melm 072$z UPC:w,Identifier-standard:w
melm 072$a Identifier-standard:w
melm 072$z Identifier-standard:w
# EAN
melm 073$a EAN:w,Identifier-standard:w
melm 073$z EAN:w,Identifier-standard:w
############ ITEM TYPE ##################
# FIXME index 200$b only in Material-type ?
# FIXME in standard installations, 200$b should probably NOT be indexed
melm 200$b itemtype:w,itemtype:p,itype:w,itype:p,Material-type:w,Material-type:p
melm 995$r itemtype:w,itemtype:p,itype:w,itype:p
@ -102,7 +103,7 @@ melm 995$r itemtype:w,itemtype:p,itype:w,itype:p
# Character Set (Mandatory) 4 26-29
# additional Character Set 4 28-33
# Script of title 2 34-35
melm 100$a tpubdate:s:range(data,8,1),ta:w:range(data,17,1),ta:w:range(data,18,1),ta:w:range(data,19,1),Modified-code:n:range(data,21,1),char-encoding:n:range(data,26,2),char-encoding:n:range(data,28,2),char-encoding:n:range(data,30,2),script-Title:n:range(data,34,2)
melm 100$a tpubdate:s:range(data,8,1),pubdate:n:range(data,9,4),pubdate:y:range(data,9,4),pubdate:s:range(data,9,4),pubdate:n:range(data,13,4),pubdate:y:range(data,13,4),ta:w:range(data,17,1),ta:w:range(data,18,1),ta:w:range(data,19,1),Modified-code:n:range(data,21,1),char-encoding:n:range(data,26,2),char-encoding:n:range(data,28,2),char-encoding:n:range(data,30,2),script-Title:n:range(data,34,2)
melm 101$a ln
melm 101$c language-original
melm 102$a Country-publication
@ -198,17 +199,21 @@ melm 205 Title,Title:p
########## MATERIAL SPECIFIC AREA #################
# TODO 206
melm 207 Serials,Serials:p
melm 208 Printed-music,Printed-music:p
melm 230$a Electronic-ressource
# 207 do not index
# 208
melm 208$a Material-type:w,Material-type:p
melm 208$d Material-type:w,Material-type:p
# Uncomment to index this field
# melm 230$a Electronic-ressource
########## PUBLISHER #################
melm 210$a pl,pl:p
#melm 210$a pl,pl:p
melm 210$c Publisher,Publisher:p
melm 210$d pubdate:n,pubdate:y,pubdate:s
melm 210$d pubdate:n,pubdate:y
########## DESCRIPTION #################
melm 215 Extent
# Uncomment to index this field
# melm 215 Extent
########## SERIES #################
melm 225$a Title-series,Title-series:p
@ -235,7 +240,7 @@ melm 302$a Note,Note:p
# Notes Pertaining to Descriptive Information
melm 303$a Note,Note:p
# Notes Pertaining to Title and Statement of Responsibility
melm 304$a Note,Note:p
melm 304$a Note,Note:p,Title:w,Title:p
# Notes Pertaining to Edition and Bibliographic History
melm 305$a Note,Note:p
# Notes Pertaining to Publication, Distribution, etc.
@ -253,15 +258,18 @@ melm 312$a Note,Note:p
# Notes Pertaining to Subject Access
melm 313$a Note,Note:p
# Notes Pertaining to Intellectual Responsability (in Sudoc catalogue, may contains the 4th, 5th etc. authors)
melm 314$a Note,Note:p
melm 314$a Note,Note:p,Author:w,Author:p
# Notes Pertaining to Material (or Type of Publication) Specific Information
melm 315$a Note,Note:p
# Note Relating to the Copy in Hand (ancient books)
# Do not index $u,$5
melm 316$a Note,Note:p
# Provenance Note (ancient books)
# Do not index $u,$5
melm 317$a Note,Note:p
# Action Note
melm 318$a Note,Note:p
# Do not index this note (useless for the public)
# melm 318 Note,Note:p
# Internal Bibliographies/Indexes Note
melm 320$a Note,Note:p
# External Indexes/Abstracts/References Note
@ -277,9 +285,23 @@ melm 325$a Note,Note:p
# Frequency Statement Note (Serials)
melm 326$a Note,Note:p
# Contents Note
melm 327$a Note,Note:p
melm 327$a Note,Note:p,Title:w,Title:p
melm 327$b Note,Note:p,Title:w,Title:p
melm 327$c Note,Note:p,Title:w,Title:p
melm 327$d Note,Note:p,Title:w,Title:p
melm 327$e Note,Note:p,Title:w,Title:p
melm 327$f Note,Note:p,Title:w,Title:p
melm 327$g Note,Note:p,Title:w,Title:p
melm 327$h Note,Note:p,Title:w,Title:p
melm 327$i Note,Note:p,Title:w,Title:p
# Dissertation note
melm 328 Note,Note:p
# Do not index $z ("Commercial edition : ")
melm 328$a Note,Note:p,Dissertation-information:w,Dissertation-information:p
melm 328$b Note,Note:p,Dissertation-information:w,Dissertation-information:p
melm 328$c Note,Note:p,Dissertation-information:w,Dissertation-information:p
melm 328$d Note,Note:p,Dissertation-information:w,Dissertation-information:p
melm 328$e Note,Note:p,Dissertation-information:w,Dissertation-information:p
melm 328$t Note,Note:p,Dissertation-information:w,Dissertation-information:p,Title:w,Title:p
# Summary or Abstract
melm 330$a Abstract:w,Note:w,Abstract:p,Note:p
# Preferred Citation of Described Materials
@ -287,18 +309,34 @@ melm 332$a Note,Note:p
# Users/Intended Audience Note
melm 333$a Note,Note:p
# Awards note
# Do not index $u,$z
melm 334$a Note,Note:p
melm 334$b Note,Note:p
melm 334$c Note,Note:p
melm 334$d Note,Note:p
# Type of electronic ressource note
melm 336$a Note,Note:p
# System requirements note
melm 337$a Note,Note:p
# Acquisition Information Note
melm 345$a Note,Note:p
# Table of contents note (Used in french libraries)
# Do not index $u,v,p
# Uncomment to index as note and title
# melm 359$a Note,Note:p,Title:w,Title:p
# melm 359$b Note,Note:p,Title:w,Title:p
# melm 359$c Note,Note:p,Title:w,Title:p
# melm 359$d Note,Note:p,Title:w,Title:p
# melm 359$e Note,Note:p,Title:w,Title:p
# melm 359$f Note,Note:p,Title:w,Title:p
# melm 359$g Note,Note:p,Title:w,Title:p
# melm 359$h Note,Note:p,Title:w,Title:p
# melm 359$i Note,Note:p,Title:w,Title:p
############## 4XX - LINKING ##################
# All 4XX indexed as Title, except for 410
melm 410$t Title-series,Title-series:p
melm 411$t Title,Title:p
melm 411$t Title-series,Title-series:p
melm 412$t Title,Title:p
melm 413$t Title,Title:p
melm 421$t Title,Title:p
@ -330,11 +368,11 @@ melm 454$t Title,Title:p
melm 455$t Title,Title:p
melm 456$t Title,Title:p
# FIXME Warning : field used by Koha for analytics, but also in Sudoc network
melm 461$t Title,Title-host:w,title-host:p
melm 461$t Title,Title:p,Host-item:w,Host-item:p
melm 462$t Title,Title:p
melm 463$t Title,Title:p
# FIXME Warning : field used by Koha for analytics, but also in Sudoc network
melm 464$t Title,Title-host:w,title-host:p,Title:p
melm 464$t Title,Title:p,Host-item:w,Host-item:p
melm 470$t Title,Title:p
melm 481$t Title,Title:p
melm 482$t Title,Title:p
@ -344,39 +382,6 @@ melm 488$t Title,Title:p
# FIXME Warning : field used by Koha for analytics, but also in Sudoc network
melm 461$9 Host-Item-Number
#FIXME Fields 400, 401, 403, 414, 415, 416, 417, 418, 419, 420 are not defined in Unimarc, but may be used by some libraries.
melm 400$t Title,Title:p
melm 401$t Title,Title:p
melm 403$t Title,Title:p,Title-Uniform,Title-Uniform:p
melm 414$t Title,Title:p
melm 415$t Title,Title:p
melm 416$t Title,Title:p
melm 417$t Title,Title:p
melm 418$t Title,Title:p
melm 419$t Title,Title:p
melm 420$t Title,Title:p
melm 400$d pubdate:n
melm 401$d pubdate:n
melm 403$d pubdate:n
melm 410$d pubdate:n
melm 412$d pubdate:n
melm 413$d pubdate:n
melm 414$d pubdate:n
melm 415$d pubdate:n
melm 416$d pubdate:n
melm 417$d pubdate:n
melm 418$d pubdate:n
melm 419$d pubdate:n
melm 420$d pubdate:n
melm 430$d pubdate:n
melm 431$d pubdate:n
melm 432$d pubdate:n
melm 440$d pubdate:n
melm 441$d pubdate:n
melm 445$d pubdate:n
melm 461$d pubdate:n
############## 5XX - TITLES ##################
melm 500$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 501$9 Koha-Auth-Number,Koha-Auth-Number:n
@ -435,19 +440,6 @@ melm 616$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 617$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 620$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 621$9 Koha-Auth-Number,Koha-Auth-Number:n
# melm 626$9 Koha-Auth-Number,Koha-Auth-Number:n
# melm 660$9 Koha-Auth-Number,Koha-Auth-Number:n
# melm 661$9 Koha-Auth-Number,Koha-Auth-Number:n
#FIXME Fields 603, 630, 631, 632, 633, 634, 635, 636, 646 are not defined in Unimarc, but may be used by some libraries.
melm 603$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 630$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 631$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 632$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 633$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 634$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 635$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 636$9 Koha-Auth-Number,Koha-Auth-Number:n
melm 600$a Personal-name,Personal-name:p,Subject,Subject:p
melm 600 Subject,Subject:p
@ -473,17 +465,6 @@ melm 621 Subject,Subject:p
# Chronological coverage code. Probably not useful
# melm 661 Subject,Subject:p
#FIXME Fields 603, 630, 631, 632, 633, 634, 635, 636, 646 are not defined in Unimarc, but may be used by some libraries.
melm 603 Subject,Subject:p
melm 630 Subject,Subject:p
melm 631 Subject,Subject:p
melm 632 Subject,Subject:p
melm 633 Subject,Subject:p
melm 634 Subject,Subject:p
melm 635 Subject,Subject:p
melm 636 Subject,Subject:p
melm 646 Subject,Subject:p
########### CLASSIFICATIONS (67x/68x) ##################
# PRECIS
melm 670 Subject-precis:w,Subject-precis:p
@ -497,7 +478,6 @@ melm 680 LC-call-number:w,LC-call-number:p
# Other class numbers // see Marc21 084
melm 686 Local-classification:w,Local-classification:p
############## KOHA ITEM INFORMATION (based on 995) ###############
# Koha specific : $1, $2, $3
melm 995$1 damaged,damaged:n