Same as before, but test with UNIMARC setup
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
To manage the linking of authorities to biblio records properly,
MARC21 uses index_heading and index_match_heading in the authorities Zebra configuration.
The UNIMARC configuration must do the same.
This patch adds index_heading and index_match_heading to each heading in the UNIMARC authorities Zebra configuration,
in order to be as close as possible to the MARC21 authorities Zebra configuration.
See changes made in MARC21 :
32cf2af700
It fixes some index names : Personal-name-see => Personal-name-see-from
It removes the useless Term-geographic index, a duplicate of Name-geographic.
Sometimes the parallel 7xx form was only on $a; it must contain the same subfields
as the main heading.
Test plan :
===========
1.0) Use a UNIMARC install without patch
1.1) Set sysprefs
BiblioAddsAuthorities = ON
AutoCreateAuthorities = ON
LinkerModule = First Match
1.2) Replace authorities zebra configuration files
cp $KOHA_CLONE/etc/zebradb/marc_defs/unimarc/authorities/authority-koha-indexdefs.xml $KOHA_CONF_DIR/zebradb/marc_defs/unimarc/authorities/authority-koha-indexdefs.xml
cp $KOHA_CLONE/etc/zebradb/marc_defs/unimarc/authorities/authority-zebra-indexdefs.xsl $KOHA_CONF_DIR/zebradb/marc_defs/unimarc/authorities/authority-zebra-indexdefs.xsl
1.3) Restart zebra server and indexer services
1.4) Reindex authorities
./misc/migration_tools/rebuild_zebra.pl -r -a -v
1.5) Search in Z3950 a record with complex heading (with subdivisions),
for example ISBN 2877620115 "Facteurs culturels et sociaux de la santé en Afrique de l'Oues"
1.6) Import this record and save it : authorities are created
go to staff:/cgi-bin/koha/cataloguing/addbooks.pl
1.7) Reimport the same record (when asked, say that it's not a duplicate)
1.8) The authority should have been duplicated :
different url and different $9 value
2.0) Apply this patch
2.1) Replace again the authorities zebra configuration files
2.2) Restart zebra server and indexer services
2.3) Reindex authorities
2.4) Reimport the same record
2.5) The authority should not have been duplicated. Compare with both
existing records to see which one the 3rd import has been matched against.
3.0) Play with authorities search to check every mode :
Search main heading ($a only)
Search main heading
Search all headings
Search entire record
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
As in MARC21, UNIMARC authorities have form, general,
chronological and geographic subdivisions.
In C4::Heading::UNIMARC, use subdivisions in _get_search_heading as in C4::Heading::MARC21.
Adds subdivision variables to the UNIMARC authorities Zebra configuration.
Note that, unlike MARC21, geographic is subfield $y and chronological is subfield $z.
See https://www.ifla.org/publications/unimarc-formats-and-related-documentation
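The subfield difference noted above can be summarised as a small lookup table. This is only a minimal sketch, not the code in C4::Heading::UNIMARC or C4::Heading::MARC21: the dictionary and function names are hypothetical, and the MARC21 codes ($v, $x, $y, $z) and the UNIMARC form subdivision ($j) are taken from the formats themselves rather than from this patch.

```python
# Subdivision subfield codes per MARC flavour (illustrative only).
SUBDIVISION_SUBFIELDS = {
    "MARC21": {"form": "v", "general": "x", "chronological": "y", "geographic": "z"},
    "UNIMARC": {"form": "j", "general": "x", "geographic": "y", "chronological": "z"},
}


def subdivision_type(flavour, code):
    """Return the subdivision type for a subfield code, or None if unknown."""
    for kind, subfield in SUBDIVISION_SUBFIELDS[flavour].items():
        if subfield == code:
            return kind
    return None


print(subdivision_type("MARC21", "y"))   # chronological
print(subdivision_type("UNIMARC", "y"))  # geographic
```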
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Removes the last remaining bit of configuration for the Titles
facet.
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
This patch adds the index definitions for zebra faceting of ccode in
koha for marc21, normarc and unimarc.
We also add lines to the templates to expose the new facet and enable
non-zebra faceting for ccode too.
Signed-off-by: David Cook <dcook@prosentient.com.au>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
This patch adds a numeric index 'not-onloan-count' containing the value
of 999$x. This subfield is filled by 'rebuild_zebra.pl' using the
'EmbedItemsAvailability' filter (bug 18208).
bib1.att and the index definitions are updated accordingly.
To test:
- Apply the patch
- Pick the right biblio-zebra-indexdefs.xsl file for your setup and
replace the one your Zebra uses [1]
- Replace your bib1.att
- Replace your ccl.properties
- Have at least one record with more than one item, and check out some
item(s) from that record.
- Rebuild zebra's indexes:
$ sudo koha-shell kohadev
k$ cd kohaclone
k$ misc/migration_tools/rebuild_zebra.pl -r -b -v -k
(notice the dump directory is kept, so you can try the XSLT yourself by
running the command below)
$ xsltproc \
etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
/tmp/the_dump_dir/biblios/exported_records | less
=> SUCCESS: There are records with the not-onloan-count index, and the
value is correct!
- Check Zebra yourself:
$ yaz-client unix:/var/run/koha/kohadev/bibliosocket
Z> base biblios
Z> find @attr 1=9013 @attr 2=5 @attr 4=109 0
=> SUCCESS: The search matches the amount of records with not-onloan
items.
Z> s 1+1
=> SUCCESS: Records with 999$x having a value higher than 0 are rendered
- Sign off :-D
Note: While this work is complete on its purpose, it is part of an
attempt to create a better way of filtering by availability.
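The yaz-client query in the test plan above can be read attribute by attribute. The following is a hedged sketch that rebuilds the same PQF string. The helper name is hypothetical, and the meanings given for relation 5 (greater than) and structure 109 (numeric string) come from the Bib-1 attribute set, not from this patch.

```python
# Bib-1 relation attribute (type 2) values.
RELATION = {"lt": 1, "le": 2, "eq": 3, "ge": 4, "gt": 5}
# Bib-1 structure attribute (type 4) value for a numeric string.
NUMERIC_STRUCTURE = 109


def numeric_pqf(use_attr, relation, value):
    """Build a PQF query comparing a numeric index with an integer value."""
    return "@attr 1=%d @attr 2=%d @attr 4=%d %d" % (
        use_attr, RELATION[relation], NUMERIC_STRUCTURE, value)


# 9013 is the use attribute queried in the test plan for not-onloan-count.
print(numeric_pqf(9013, "gt", 0))
# @attr 1=9013 @attr 2=5 @attr 4=109 0
```

Read this way, the query asks for records whose not-onloan-count is greater than 0.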
Sponsored-by: ByWater Solutions
[1] In kohadevbox this would be
/etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl
Edit: Added the missing XSLT changes for UNIMARC and NORMARC
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
In DOM config file :
etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml, the 608$9 is
defined a second time instead of 610$9. Just a typo, I think.
Test plan :
- Apply patch
- Install a UNIMARC + DOM instance
- Define in a framework 610 using a thesaurus
- Create a new biblio
- Create a new authority (same type as the thesaurus defined above)
- Index : rebuild_zebra.pl -a -b -x -z
- Link the field 610 to the new authority
- Index : rebuild_zebra.pl -a -b -x -z
- In authorities search, search for the new authority
=> You see Use in 1 Records(s)
Signed-off-by: Frederic Demians <f.demians@tamil.fr>
I confirm the typo.
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Make the shipped XSLTs for authorities (MARC21 and UNIMARC) the same as the generated version
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
In authority-koha-indexdefs.xml, all tags use the namespace "kohaidx" except the tag "id".
When re-generating authority-zebra-indexdefs.xsl, the line :
<xslo:variable name="idfield" select="normalize-space(marc:controlfield[@tag='001'])"/>
is modified :
<xslo:variable name="idfield" select="normalize-space()"/>
This is an error.
This patch adds the kohaidx namespace to the "id" tag to correct it.
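Why a missing prefix yields an empty select can be illustrated outside XSLT. This is a minimal sketch using Python's ElementTree, not the Koha stylesheet itself. The namespace URI, the root element name and the xpath attribute on the id tag are assumptions made for illustration and should be checked against the real authority-koha-indexdefs.xml.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the kohaidx prefix; check the real file.
NS = {"kohaidx": "http://www.koha-community.org/schemas/index-defs"}

# Hypothetical, heavily reduced index definition file.
TEMPLATE = """<kohaidx:index_defs xmlns:kohaidx="{uri}">
  <{tag} xpath="marc:controlfield[@tag='001']"/>
</kohaidx:index_defs>"""


def id_xpath(xml_text):
    """Return the xpath attribute of kohaidx:id, or '' when nothing matches."""
    root = ET.fromstring(xml_text)
    node = root.find("kohaidx:id", NS)
    return node.get("xpath", "") if node is not None else ""


without_prefix = TEMPLATE.format(uri=NS["kohaidx"], tag="id")
with_prefix = TEMPLATE.format(uri=NS["kohaidx"], tag="kohaidx:id")

print(repr(id_xpath(without_prefix)))  # ''
print(repr(id_xpath(with_prefix)))
```

A namespace-qualified lookup does not match the unprefixed tag, so nothing is copied into the generated expression.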
Test plan :
- Without patch
- go to etc/zebradb/marc_defs/marc21/authorities/
- run : xsltproc ../../../xsl/koha-indexdefs-to-zebra.xsl authority-koha-indexdefs.xml > authority-zebra-indexdefs.xsl
- read authority-zebra-indexdefs.xsl
=> the line has changed : <xslo:variable name="idfield" select="normalize-space()"/>
- Apply patch
- go to etc/zebradb/marc_defs/marc21/authorities/
- run : xsltproc ../../../xsl/koha-indexdefs-to-zebra.xsl authority-koha-indexdefs.xml > authority-zebra-indexdefs.xsl
- read authority-zebra-indexdefs.xsl
=> the line has not changed
(same for unimarc flavor)
Signed-off-by: Mirko Tietgen <mirko@abunchofthings.net>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
As Mirko mentioned, the xslt's now generate the facet-processing templates in
the authority xslt's too. They are harmless because we don't define facets
for authority records. If we did, it would be harmless too.
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
All of them were found and fixed using codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Fix a typo. No test plan required, just a look at the default UNIMARC framework.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
This followup
- changes some indexes in the Queryparser configuration file
- suppresses some clearly useless 6XX$9 in biblio-koha-indexdefs.xml and adds 2 new ones, probably useless (not sure of that)
- changes the name of index Subject-geographical to Subject-name-geographical in ccl.properties (to match bib1.att)
the xsl file zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl was generated with the following command:
xsltproc zebradb/xsl/koha-indexdefs-to-zebra.xsl zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml > zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
To test :
1) Apply the 3 patches
2) copy the modified files from the source directory to the directory where you store the config files for Zebra and Queryparser
The files modified by the 3 patches and that need to be copied are:
etc/zebradb/biblios/etc/bib1.att
etc/zebradb/ccl.properties
etc/searchengine/queryparser.yaml
.../unimarc/biblios/biblio-koha-indexdefs.xml
.../unimarc/biblios/biblio-zebra-indexdefs.xsl
3) Rebuild Zebra
4) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c 600$f 600$p, 602$f, 616$c, 616$f, 606$2,600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
WITHOUT QP activated in sysprefs ("Don't try to use QP"):
6) try to search "su:test9828". You should have no results
7) try to search "su-genre:subform". You should have 1 result : record A
8) try to search "su-geo:subgeo". You should have 1 result : record B
9) try to search "su-chrono:subdate". You should have 1 result : record C
10) on existing records, try su-ut, su-to, su-na, su-form, su-corp, su-geo indexes, and see if the results are relevant
WITH QP activated in sysprefs:
Same tests
Signed-off-by: Nick Clemens <nick@quecheelibrary.org>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Only cosmetic :
- the references to record.abs lines are now useless and outdated
- some comments added in record.abs could be useful in biblio-koha-indexdefs.xml
No change expected, only comments
Signed-off-by: Nick Clemens <nick@quecheelibrary.org>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
[New commit on 18 Aug 2014 : rebased, and DOM indexing only]
Issues to fix :
Most 6XX fields may contain a $2 that identifies the system used for indexing. It should not be indexed.
In French libraries, $2 contains "rameau". So searching for books about the music composer "Rameau" retrieves thousands of records!
For some 6XX fields, other subfields should not be indexed, for example dates of persons and families, or addresses.
In the UNIMARC guide, 600$t, 601$t and 602$t are said to exist but to be "not used". I keep them indexed.
Additionally, subject indexing could be improved by using specific indexes for each 6XX if possible :
In ccl.properties :
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs : Subject-name-personal, alias su-na
We can use these indexes and create new specific indexes by using existing bib1 attributes.
We could also index $j,$x,$y,$z subdivision in specific indexes.
This patch does the following changes :
1) For all 6XX : Not indexing $2 (LCSH, Rameau...), $3 and $5
2) Suppressing the indexing of some specific subfields, depending on the field:
600 : Personal name used as a subject // see Marc21 600
not indexing c (additional elements),f (dates),p (address/affiliation)
602 : Family name used as a subject // see Marc21 600 3X
not indexing f (dates)
616 : Trademark
not indexing c,f
3) For all 6XX : index $j,$x,$y,$z in several indexes in addition to the specific index for their 6XX field
4) Define in ccl.properties some specific indexes :
Subject-name-conference 1=1073 => alias su-conf
Subject-name-corporate 1=1074 => alias su-corp
Subject-genre-form 1=1075 => alias su-genre and su-form
Subject-geographical 1=1076 => alias su-geo
Subject-chronological 1=1077 => alias su-chrono
Subject-title 1=1078 => alias su-ut and su-ti
Subject-topical 1=1079 => alias su-to
5) Adding new aliases in Search.pm :
su-chrono, su-form, su-genre, su-corp, su-conf, su-ti
6) Using these new indexes for :
600 : Subject and Subject-Personal-Name ; all subfields except subdivisions in Personal-name
601 : Subject, Subject-name-conference and Subject-name-corporate and Subject-name-conf ; all subfields except subdivisions in Corporate-name and Conference-name
602 : same as 600 but could be improved later
604 : Subject and Subject-title ; $a in Subject-Personal-Name ; all subfields except subdivisions in Name-and-Title
605 : Subject and Subject-title
606 : Subject and Subject-topical
607 : Subject and Subject-geographical ; all subfields except subdivisions in Name-geographic
608 : Subject and Subject-genre-form
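The effect of items 1 and 2 above can be sketched with a toy filter. This is an illustration only, not how Zebra or the DOM configuration works internally. The function and variable names are hypothetical, and the skipped subfields are copied from the lists above.

```python
# Subfields never indexed for any 6XX field (item 1 above).
ALWAYS_SKIPPED = {"2", "3", "5"}
# Extra subfields skipped for specific fields (item 2 above).
FIELD_SKIPPED = {"600": {"c", "f", "p"}, "602": {"f"}, "616": {"c", "f"}}


def indexable_values(tag, subfields):
    """Return the values of the subfields that would still be indexed.

    `subfields` is a list of (code, value) pairs for one 6XX field.
    """
    skipped = ALWAYS_SKIPPED | FIELD_SKIPPED.get(tag, set())
    return [value for code, value in subfields if code not in skipped]


# A topical subject whose $2 names the indexing system.
field_606 = [("a", "Opéra"), ("y", "France"), ("2", "rameau")]
print(indexable_values("606", field_606))  # ['Opéra', 'France']
```

With $2 left out, a subject search for "Rameau" no longer matches every record whose headings merely come from the RAMEAU thesaurus.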
To test :
A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c 600$f 600$p, 602$f, 616$c, 616$f, 606$2,600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) try to search "su:test9828". You should have no results
7) try to search "su-genre:subform". You should have 1 result : record A
8) try to search "su-geo:subgeo". You should have 1 result : record B
9) try to search "su-chrono:subdate". You should have 1 result : record C
10) on existing records, try su-ut, su-to, su-na, su-form, su-corp, su-geo indexes, and see if the results are relevant
Indexing of subjects could maybe be improved later
Signed-off-by: Nick Clemens <nick@quecheelibrary.org>
All seems to work as expected, I am not super-familiar with UNIMARC but I wonder if in su-corp and su-conf the subdivisions might be useful (e.g. France-Gendarmie / Staatsbibliothek-Berlin)
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch updates the Zebra configuration for unimarc.
995$d and 995$j should not be indexed.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Note: NORMARC is missing the id field.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
This patch makes t/db_dependent/Search.t pass again.
NORMARC is currently not tested.
I checked the results before and after applying the patch
and the facets are now looking the same as before.
Passes all tests and QA script.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
The previous patches for facet extraction from Zebra indexes set a default
namespace on the following files:
etc/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml
etc/zebradb/marc_defs/normarc/biblios/biblio-koha-indexdefs.xml
etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml
and hence the XML file index_subfields can be cleaned by removing the namespace.
To test:
- Apply this patch
- Run
$ for i in marc21 normarc unimarc
do xsltproc etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl \
etc/zebradb/marc_defs/$i/biblios/biblio-koha-indexdefs.xml \
> etc/zebradb/marc_defs/$i/biblios/biblio-zebra-indexdefs.xsl
done
=> SUCCESS: no errors reported
- Run
$ git diff
=> SUCCESS: no differences on the xsl files
- Sign off :-D
Sponsored-by: Universidad Nacional de Cordoba
Signed-off-by: David Cook <dcook@prosentient.com.au>
Seems to work with DOM and MARC21.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch adds the facets definitions to the biblio-koha-indexdefs.xml, based
on what is hardcoded on C4::Koha::getFacets().
The biblio-zebra-indexdefs.xsl file for UNIMARC is generated using the usual:
xsltproc ...koha-indexdefs-to-zebra.xsl ...unimarc/biblios/biblio-koha-indexdefs.xml > \
...unimarc/biblios/biblio-zebra-indexdefs.xsl
Sponsored-by: Universidad Nacional de Cordoba
Signed-off-by: David Cook <dcook@prosentient.com.au>
Seems to work with DOM and MARC21.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Currently, in the default UNIMARC install, 461$9 is indexed as Host-Item-Number, meaning it is used for the analytical itemnumber.
But most UNIMARC catalogs build the analytical relation with the unimarc_field_4XX.pl plugin on 461$a. In fact, this plugin is defined in the default UNIMARC frameworks.
If Host-Item-Number is defined but 461$9 is used for something else, it leads to odd bugs. For example, records containing analytical items cannot be deleted.
This patch comments out the 461$9 indexing in the UNIMARC Zebra config.
Test plan :
- Create a fresh UNIMARC install
- Create a record with 461$9 containing a value
- Index the record
- Perform a search on Host-Item-Number : ccl=Host-Item-Number,alwaysmatches=''
=> Without the patch you get a result
=> With the patch you get no result
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Code is clean, commenting out all the indexing of 461$9.
Trusting the author that this is the correct thing to do :)
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Test plan :
- Create a fresh install UNIMARC flavor and GRS1 indexing for biblios
- Reindex the database
- Perform a search with index "itemtype" (and then "itype") on an
existing value of 995$r. For example : itemtype:BOOK
=> Check you get results
Signed-off-by: Mark Tompsett <mtompset@hotmail.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch makes the following changes to UNIMARC biblio indexing :
A. Changes to UNIMARC conf files
1. add comments to biblio-koha-indexdefs.xml
2. make biblio-koha-indexdefs.xml more compact by grouping some
declarations
Ex : 200$f and 200$g => one declaration for 200$fg
3. suppress unneeded declarations (indexing of some 4XX fields and 6XX
fields not in unimarc format)
4. unindex some (sub)fields unneeded by most users (318, 207,230,210a,
215, 4XXd)
5. change the way 308 field is indexed (no visible changes)
6. replace Title-host with Host-item -- see bug 11119
7. index 208 in Material-Type -- see bug 11119
8. index 100 pos 8-9 and 9-12 in pubdate:y and pubdate:n
9. index 100 pos 8-9 in pubdate:s instead of 210$d
10. Index all subfields of note 334 and 327 in note index
11. Index 304 and 327 in title index as well as note index
327 can contain a list of titles included in a work
304 can contain the title of the original work in case of a
translation
12. Index 314 in author index as well as note index
314 can contain authors not mentioned in 200$f/g (the 4th, 5th etc.
author)
13. Index 328 note in Dissertation-information as well as note
14. Index 328$t in Title
B. Changes to ccl.properties :
1. add a new index Dissertation-information (1056)
2. fix EAN, pubdate and acqdate (they were not linked with bib1 attributes)
C. Changes to Search.pm
1. add Dissertation-information and suppress Title-host and UPC
D. Changes to QP config file queryparser.yaml
1. add Dissertation-information
2. fix EAN, pubdate and acqdate
Test plan :
If you cannot test in GRS1, test only in DOM, as GRS will be deprecated.
1. Apply the patch in a UNIMARC Koha running with DOM and ICU
2. copy src/etc/searchengine/queryparser.yaml into the main config
directory of QP
3. copy src/etc/zebradb/ccl.properties into the main config directory
of Zebra
4. copy src/etc/zebradb/marc_defs/unimarc/biblio/* into the main config
directory of Zebra
5. reindex biblios (rebuild_zebra.pl -r -b -x -v)
6. test note index : make some searches on 334$b or 327$b
7. test author index : make some searches on 314 field
8. test title index : make some searches on 304 and 327 field, make a
search on 328$t subfield
9. test dissertation-information index : make some searches on 328 field
10. In a record, put in the dates of 100 fields the values "1000" (1st
date) and "1001" (2d date) ; try to search a book written in year
1000, you should find the record ; idem for year 1001
11. make some searches and sort by date. It should work better than before,
especially if you have values like "c2009" or "impr. 2010" in 210
field
12. Regression test : make some searches on several indexes, like EAN,
etc. It should work as before
Test 10-12 with and without Queryparser activated.
Be careful: with Queryparser activated, the index names (title,
dissertation-information...) must be entered in lowercase only.
Of course, to test search and sort by dates, you need to have full
records, with dates in 100 field as well as 210 field.
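The reason for taking sort dates from the coded 100 field rather than from 210$d can be sketched as follows. This is a hedged illustration, not the Zebra configuration. The function names are hypothetical, and the character positions used for the two dates (0-based 9-12 and 13-16 of 100$a) are an assumption taken from the UNIMARC bibliographic format, so check them against the index definitions.

```python
import re


def dates_from_100a(value):
    """Return (date1, date2) read from fixed positions of UNIMARC 100$a.

    Assumes 0-based positions 9-12 and 13-16.
    """
    return value[9:13], value[13:17]


def year_from_free_text(value):
    """Naive fallback: first 4-digit run in a free-text date such as 210$d."""
    match = re.search(r"\d{4}", value)
    return match.group(0) if match else None


# Date entered on file (8 chars) + type of date (1 char) + the two dates
# used in step 10 of the test plan.
sample_100a = "20140818" + "d" + "1000" + "1001"
print(dates_from_100a(sample_100a))  # ('1000', '1001')

# Free-text dates sort as strings, so prefixes decide the order.
print(sorted(["c2009", "impr. 2010", "2015"]))  # ['2015', 'c2009', 'impr. 2010']
print(year_from_free_text("impr. 2010"))  # 2010
```

The coded positions always hold a bare year, which is why sorting on them is more predictable than sorting on transcribed publication statements.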
Signed-off-by: Paola Rossi <paola.rossi@cineca.it>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
It could be useful to index the original language of a document (i.e.
"fre" for the English translation of a French novel).
This patch renames the Bib-1 use attribute 1095 from
Code-language-original to language-original and uses it to index:
- MARC21 041$h subfield
- UNIMARC 101$c subfield
It adds "language-original" in the list of index in Search.pm.
Test plan :
A. in a MARC21 GRS1 environment
1. Copy Zebra config files (zebradb/biblios/etc/bib1.att,
zebradb/ccl.properties, marc_defs/marc21/biblios/record.abs) from
your source etc/ directory to your main koha etc/ directory
2. Reindex zebra
3. Make some searches, like "language-original:fre"
B. in a MARC21 DOM environment
4. Copy Zebra config files (zebradb/biblios/etc/bib1.att, zebradb/ccl.properties,
marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl) from your source etc/
directory to your main koha etc/ directory
5. Reindex zebra
6. Make some searches, like "language-original:fre"
C. in a UNIMARC GRS1 environment
7. Copy Zebra config files (zebradb/biblios/etc/bib1.att,
zebradb/ccl.properties, marc_defs/unimarc/biblios/record.abs) from
your source etc/ directory to your main koha etc/ directory
8. Reindex zebra
9. Make some searches, like "language-original:fre"
D. in a UNIMARC DOM environment
10. Copy Zebra config files (zebradb/biblios/etc/bib1.att,
zebradb/ccl.properties, marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl)
from your source etc/ directory to your main koha etc/ directory
11. Reindex zebra
12. Make some searches, like "language-original:fre"
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
With this combination of sysprefs, and a UNIMARC configuration, it was
impossible to search on location, barcode and ccode indexes :
QueryWeightFields is activated
QueryAutoTruncate only if * is added
But in UNIMARC, location, barcode and ccode (995 $e, $f, $h) were indexed
only as "words". They also need to be indexed as "phrase".
Additionally, in UNIMARC, information about the damaged and withdrawn status
of items is not indexed, while it is in MARC21.
This patch
- adds 2 new indexes for 995$1 (damaged) and 995$3 (withdrawn)
- indexes location, barcode and ccode as "phrase" as well as "words"
Indexing of items in UNIMARC could be improved later, so this patch also
adds comments explaining the origin of the Koha 995 field; I think they could be
useful for further changes.
To test, on a UNIMARC configuration :
A. indexed with GRS-1
1) Set sysprefs QueryWeightFields as "activated" and QueryAutoTruncate
as "only if * is added"
2) Select location index in advanced search and search for a value
existing in your records in 995$e => 0 results
3) Apply patch
4) Rebuild zebra
5) Select location index in advanced search and search for a value
existing in your records in 995$e => x results
6) Mark an item as withdrawn; search "withdrawn:1" => x results, and
among them the biblio to which the item is attached
7) Mark an item as damaged ; search "damaged:1" => x results, and among
them the biblio to which the item is attached
B. indexed with DOM
Do the same operations
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Work as described. No koha-qa errors
Test
Apply the patch
Begin with GRS-1
Full reindex
Search by location, no results
cp files biblio-*-indexdefs.xml and record.abs to destination on etc/zebra
Full reindex
Search by location, got results
Switch to DOM
reset files
Full reindex
Search by location, no results
cp files
Full reindex
Search by location, results !
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
I took as a base the patch of F. Demians, but made a lot of changes,
so I think it is more logical to create a new patch as the behavior is
not the same as previous patch.
I tried to define the DOM config files as a "mirror" of record.abs, so that the
behavior is the same.
If it is OK, we will be able to improve indexing later, for example
suppressing warns, managing indicators or subdivisions, etc.
I made some little changes to record.abs :
- comments
- 216 was indexed in Conference-name as well as Trademark. I suppose
that "Conference-name" is an error, so I indexed only in Trademark
- index 2 new notes : 340 / 356
The only difference between record.abs and DOM is that the DOM config files
do not index complete fields, but subfields.
Ex :
melm 200 ===> <kohaidx:index_subfields tag="200" subfields="abcdfgjxyz">
I took all the subfields from the UNIMARC Authorities manual. The only
subfields not indexed are numeric subfields : $7, $8 for language of
record, and $0,2,3,5,6 for 4XX/5XX/7XX
To test :
- index a set of bib and auth records with GRS-1
- make some searches on different kind of authorities
- index the same records with DOM
- make the same searches
- You are not supposed to see differences
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
As I am not a UNIMARC user it's hard for me to test this, but
while testing other authority related patches I noticed that I couldn't
index the UNIMARC authorities of the sample base. The files are obviously
missing and reindex_zebra.pl notes this. With this patch applied,
indexing works and authorities are searchable in my installation.
Signed-off-by: Vitor Fernandes <fvernandes@keep.pt>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
In UNIMARC DOM indexing, "item" index was working only for subfields
of 995 field mapped with specific indexes, and also in index (ex :
$a, $b...). It was not working for the other subfields (ex : $g),
because a comment from record.abs was integrated in DOM config files.
This patch removes the comment.
To test, in a DOM UNIMARC environment :
1) In an item, write some value "Test10037" in 995$g
2) Search for this value in simple search, this way : item=Test10037
=> you should have no results
3) Apply the patch. if necessary, copy the modified
etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml and
etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl into
the /etc/... directory in your main Koha directory
4) Reindex Zebra biblios
5) Do the same search as 2) => you should have one result
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Work as described. No koha-qa errors.
Test
NOTE: default UNIMARC framework don't have 995g,
so I must add it first.
1) Added test string to 995b on some record
2) Reindex and search as indicated, no results
3) cp files to destination
4) reindex
5) search and result ok !
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch fixes biblio-zebra-indexdefs.xsl files.
It was generated from biblio-koha-indexdefs.xml with the new
koha-indexdefs-to-zebra.xsl amended by F. Démians's patch.
To test :
- Take a DOM UNIMARC Koha
- Apply all the patches of bug 8252, including this one
- Copy src/etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
to your etc/zebradb/marc_defs/unimarc/biblios/ located in your
installation directory
- Run rebuild_zebra.pl -b -x -r -v
- make advanced searches on staff interface and opac, on coded fields
indexes (Audience, Literary genre, Biography, Illustration, Content,
Video Types, Serial Type, Periodicity, Regularity, Picture)
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Ok for me. This patch put in sync indexes XSL definition with
authoritative XML definition. Subsequently, it won't be difficult to
amend the DOM UNIMARC index definitions if necessary. And, as it is, I don't
see any regression, whereas I can see huge improvements. Thanks Mathieu!
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This followup restores the original wording of the "Date/time-last-modified"
index, and changes the name of the "Music-number" index to
"Number-music-publisher"
To test :
1. In a UNIMARC Koha instance
2. Apply patches #1, #2 and this followup
3. Copy from src/etc/zebradb directory to the etc/zebradb/ in your main
Koha directory the following files:
-- zebradb/biblios/etc/bib1.att
-- zebradb/ccl.properties
-- zebradb/marc_defs/unimarc/biblios/record.abs
-- zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml
-- zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
4. Rebuild zebra with -b -x -v -r options
5. Write a value like "test071a" in 071$a field in a record
6. Check if you can find this record with this search:
"ccl=Number-music-publisher:test071a"
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
No koha-qa errors.
Test
Copy files
reindex full
Modify a couple of record to add 071a with test message
Reindex -v -z -b -x
Search test message as described and found modified records.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch makes the same changes in UNIMARC DOM configuration as patch
1 made for GRS-1.
Positions within subfields are indexed this way :
In biblio-koha-indexdefs.xml :
tag="100" subfields="a" offset="17" length="1"
In biblio-zebra-indexdefs.xsl :
xslo:value-of select="substring(., 17, 1)"
I had to edit biblio-zebra-indexdefs.xsl by hand, because
etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl only supports
"substring" in the handle-one-index-control-field template.
That is fine for MARC21 but not for UNIMARC : in MARC21, indexing
substrings is needed for control fields (001-009, with no subfields),
but in UNIMARC it is needed for subfields of 1XX fields.
So if DOM indexing works with these new files, we may need to
change koha-indexdefs-to-zebra.xsl.
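The substring call used in the hand-edited stylesheet counts characters from 1, which is easy to get wrong when comparing with zero-based offsets. The sketch below only mimics XPath 1.0 substring() for positive integer arguments. The helper name is hypothetical, and whether the offset attribute in biblio-koha-indexdefs.xml is meant to be 0-based or 1-based is not stated here, so it is deliberately left open.

```python
def xpath_substring(text, start, length):
    """Mimic XPath 1.0 substring() for positive integer arguments only.

    XPath counts characters from 1, Python from 0.
    """
    return text[start - 1:start - 1 + length]


value = "abcdefghijklmnopqrstuvwxyz"
print(xpath_substring(value, 17, 1))  # q (the 17th character)
print(value[17:18])                   # r (index 17, the 18th character)
```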
Test plan (not possible in a sandbox) :
1) In a Koha instance using UNIMARC and DOM indexing
2) Apply Patch 1 and Patch 2 (this one)
3) Copy the following files from the etc/zebradb directory of your
source into the etc/zebradb directory of your main Koha directory :
-- etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml
-- etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
-- etc/zebradb/ccl.properties
-- etc/zebradb/biblios/etc/bib1.att
4) rebuild zebra with -x -b -r -v options
5) check if coded filters in advanced search are usable in OPAC and
Staff interface
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works. No koha-qa errors.
Test for DOM
Apply patches
Don't forget to copy files
reindex
Search by coded fields works, also Country-publication
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Before fixing UNIMARC DOM indexing, we must fix GRS-1 indexing
1) In advanced search, some coded field indexes are not working: Print,
Illustration, Content
2) Country-heading index is not working
3) Some subfields are indexed in wrong indexes :
102$a should be in Country-publication instead of Country-heading
(not defined in bib1.att)
106$a, filled only for printed works, should be in ff88-23 (form of
item) instead of itype. (ff88-23 is made for Marc21 008 pos
23, which contains the same data as 106a)
200$b should be in Material-type instead of (or in addition to) itype
and itemtype: (Material-type :"free-form string, ... that
describes the material type of the item, e.g., cassette, kit,
computer database, computer file.")
100$a pos 22-24 should not be indexed as "ln" : it is the language of
the record, not the language of the resource
4) Index names are too long : if we index new positions of coded fields,
with existing names it breaks Zebra indexing (there must be a limit
on line length in record.abs?)
5) There are a lot of warnings when rebuilding Zebra.
This patch makes some changes in bib1.att (could be used later to improve
search) :
- fixing wording for att 51 and 1012
- adding comments for attributes based on MARC21 008 field (8800-8841)
- creating 8806 (tpubdate), 8838 (Modified-code), 8818 (ff8-18), 8840
(ff8-18-21), 8819 (ff8-19), 8821 (ff8-21), 8828 (ff8-28), 8830
(ff8-30), 8831 (ff8-31)
- creating attributes specific to UNIMARC : 9701-9707 (Video-mt,
Graphics-type, Graphics-support, Title-page-availability,
Cumulative-index-availability, script-Title, char-encoding)
- setting apart 3 blocks of attributes, so it could be easy to make
further changes :
-- common to Marc21 and UNIMARC : 8806, 8822, 8838
-- slightly different in Marc21 and UNIMARC (different meanings
according to the type of the record => don't match a single
UNIMARC field)
-- specific to UNIMARC : 9701-9707
In ccl.properties :
- creating a new index: Country-publication 1=1053
- suppressing some warnings by mapping with bib1 att:
Date-time-last-modified, Name, rtype, Music-number
- defining indexes using the 3 blocks attributes defined in bib1
(common to Marc21 and UNIMARC, slightly different, specific to UNIMARC)
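As a rough illustration of what a ccl.properties entry such as
"Country-publication 1=1053" expresses (an index name mapped to Bib-1
attribute type 1 with value 1053), here is a small Python sketch that reads
such lines into a dictionary. parse_ccl_qualifiers is a hypothetical helper,
and it only handles a simplified one-line-per-index form; treating "#" as a
comment marker and skipping lines that start with "@" are assumptions about
the file syntax.

```python
def parse_ccl_qualifiers(text):
    """Map each index name to its {attribute type: value} pairs."""
    qualifiers = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment, '@' lines are directives.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("@"):
            continue
        name, *pairs = line.split()
        attrs = {}
        for pair in pairs:
            if "=" in pair:
                attr_type, value = pair.split("=", 1)
                attrs[attr_type] = value
        qualifiers[name] = attrs
    return qualifiers


parsed = parse_ccl_qualifiers("# new index\nCountry-publication 1=1053\n")
```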
In record.abs :
- renaming some index for 100-105-110 fields
- correcting indexing of 102$a (country of publication)
106$a (ff88-23)
100$a pos 22-24 (language of record, no more
indexed)
105$a pos. 0-3 (illustration code)
200$b (for the moment, I keep it indexed in
itype and itemtype, but also Material-Type)
In C4/Search.pm :
- adding "Country-publication" index
In OPAC and staff interface template subtypes_unimarc.in :
- renaming indexes to take into account the changes made to Zebra
config files
To test (this cannot be done with a sandbox) :
1) Apply the patch in a UNIMARC GRS-1 Koha instance
2) Copy the following files from the etc/zebradb of your source
directory into the etc/zebradb of your main Koha directory:
-- etc/zebradb/biblios/etc/bib1.att
-- etc/zebradb/ccl.properties
-- etc/zebradb/marc_defs/unimarc/biblios/record.abs
3) Reindex your data (rebuild_zebra -x -b -r -v)
4) Try to use those Coded fields indexes in Advanced search, in OPAC
and Staff interface (available after clicking on "More options",
then on "Coded information filters"):
Audience, Print, Literary genre, Biography, Illustration, Content,
Video Types, Serials, Serial Type, Periodicity, Regularity
5) Try to search "Country-publication=FR" in simple search
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
No koha-qa errors.
Tests for GRS-1
Followed test plan
Search by coded fields works, but only on OPAC,
on staff there are few options
Search by Country-publication works after patch
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
See the bug for a description of the problem.
This patch tries to restore searching for marcflavour != MARC21 as well as
allowing instances with different marcflavors to co-exist on the same server.
To test:
- Do a package install with e.g. the official squeeze-dev packages and create at
least two instances, with different marcflavours, e.g.:
sudo koha-create --create-db --marcflavor marc21 test1
sudo koha-create --create-db --marcflavor normarc test2
- Run through the web installers for both instances and add a couple of
records to each. Wait for the records to be indexed or run indexing manually
with
sudo koha-rebuild-zebra -f test1
sudo koha-rebuild-zebra -f test2
- Try searching for the records you added. It should work in test1 but not in
test2.
- Apply the patch and build packages with the build-git-snapshot script
- Install the new koha-common package
- Create two instances (because of Bug 9754 it is probably best to give the
instances different names than the ones you created above, or to do this on
a fresh VM or similar) and add records, as described above. Searching should
now work equally well for both instances.
Please note: Because of Bug 9752 you will have to set marcflavour = NORMARC
by hand before you do the searching, if you choose NORMARC as the marc flavour
on one of the instances you create.
Please note too: I am not confident that this is the perfect solution, so
merciless and thorough testing is necessary! ;-)
Signed-off-by: Mirko Tietgen <mirko@abunchofthings.net>
Works for me for GRS-1 (package installation out of the box). Could not figure out how to set up DOM indexing and eventually stopped caring about it.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Build packages with the patch and checked that creating
instances and search within them works for both MARC21 and NORMARC.
All tests and QA script pass.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
This patch makes the following changes to record.abs, biblio-koha-indexdefs.xml and biblio-zebra-indexdefs.xsl :
- adding new (sub)fields to Identifier-standard index : 011f/g ; 012a ; 013a/z ; 014a/z ; 015a/z ; 016a/z ; 017a/z, 040a/z, 071z, 072z, 073z
- adding 1 new subfield to Publisher index : 071b (may contain the name of a music publisher)
- adding new (sub)fields to Author and Identifier-standard index (for the $9) : 716, 72X, 730
- adding new (sub)fields to Note : 334$a (award note)
- correcting 207 and 208
- suppressing 308a and 328a in Note (useless as complete fields are indexed in same index)
- adding (sub)fields to Title index : 411t, 421-425t, 433-437t, 442-444t, 446-456t, 462-463t, 470-488t, 560
- adding (sub)fields to Subject and Identifier-standard index (for the $9) : 608, 615, 616, 617, 620, 621
- adding some classifications index : 670, 675, 686
- adding some comments (to make easier further modifications and to identify non unimarc fields : 414-420, 603, 630-636, 646)
To test :
- take a record and fill some of the missing fields (e.g 488t, 608, 720, 012a) with some data as "field488", "field608" etc
- try to find the record => not possible
- apply the patch, copy the new record.abs in etc/zebradb/biblios/etc and rebuild zebra
- try to find the record => should be ok
- check nothing else is broken...
- same test with DOM indexing activated
http://bugs.koha-community.org/show_bug.cgi?id=8984
Signed-off-by: Zeno Tajoli <tajoli@cilea.it>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Tested with Zebra, marc21, grs1.
Discovered that paging through auth search results no longer works, but that is not related to these changes.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Tested with Zebra, marc21, dom.
All tests pass.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Use a user-specified field for z:id.
This patch also fixes an excess space before the index in the MARC21
biblio index definitions, which someone fixed in the generated file
but not in the source file it should have been fixed in.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
The superfluous whitespace after the definition of subject
tag $9s is causing an error when carried over into dom config
files so that the authority links fail to index
Also removed the (harmless) trailing space in the equivalent
Unimarc files
A good editor and git can help in not creating excess whitespace
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
I tested two UNIMARC Koha installations using the sample UNIMARC
data from the BibLibre sandbox, comparing the results with DOM
and with GRS-1 indexing. The results are very similar, though there
are some differences. Most noticeable:
* relevance and facets seem to be more accurate with DOM enabled
* the GRS-1 configuration returns approximately 10% more results with
random single keywords like "petit," but the DOM results contain
the most relevant items, and any lacks in the configuration can
easily be corrected as UNIMARC users identify fields that should be
indexed but aren't
* authority-controlled searches match exactly
* author and topic facets do not work with the out-of-the-box GRS-1
indexing configuration (?!?)
(adding second sign-off line below because all that probably looks like
a commit message and not a sign off)
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Add the option of sorting authority search results by authid, and instruct the
FirstMatch and LastMatch linkers to use that sort order rather than the default
search order.
To test:
1. Install new Zebra authorities config
etc/zebradb/marc_defs/marc21/authorities/authority-koha-indexdefs.xml,
etc/zebradb/marc_defs/marc21/authorities/authority-zebra-indexdefs.xsl,
etc/zebradb/marc_defs/marc21/authorities/record.abs, and
etc/zebradb/marc_defs/unimarc/authorities/record.abs
2. Reindex authorities in Zebra
3. Set LinkerModule to FirstMatch or LastMatch
4. Add two identical authority records, and a bib record with a heading that
matches them
5. Run misc/link_bibs_to_authorities.pl on that record
6. Confirm that the authid that's been inserted into subfield $9 of that
heading is the first, if you selected FirstMatch, or last if you selected
LastMatch
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
I followed the test plan and checked that for "Last match" and "First match"
the correct authority was selected and linked to the record.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Add the Match-heading and Match-heading-see-from indexes to the UNIMARC Zebra
configuration.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Tested with a UNIMARC setup to confirm that things work fine. They do.
Squashed patch incorporating all previous patches (there is no functional
change compared to the previous version of this patch, this patch merely
squashes the original patch and follow-up, and rebases on latest master).
=== TL;DR VERSION ===
*** Installation ***
1. Run installer/data/mysql/atomicupdate/bug_7284_authority_linking_pt1
and installer/data/mysql/atomicupdate/bug_7284_authority_linking_pt2
2. Make sure you copy the following files from kohaclone to koha-dev:
etc/zebradb/authorities/etc/bib1.att,
etc/zebradb/marc_defs/marc21/authorities/authority-koha-indexdefs.xml,
etc/zebradb/marc_defs/marc21/authorities/authority-zebra-indexdefs.xsl,
etc/zebradb/marc_defs/marc21/authorities/koha-indexdefs-to-zebra.xsl, and
etc/zebradb/marc_defs/unimarc/authorities/record.abs
3. Run misc/migration_tools/rebuild_zebra.pl -a -r
*** New sysprefs ***
* AutoCreateAuthorities
* CatalogModuleRelink
* LinkerModule
* LinkerOptions
* LinkerRelink
* LinkerKeepStale
*** Important notes ***
You must have rebuild_zebra processing the zebraqueue for bibs when testing this
patch.
=== DESCRIPTION ===
*** Cataloging module ***
* Added an additional box to the authority finder plugin for "Heading match,"
which consults not just the main entry but also See-from and See-also-from
headings.
* With this patch, the automatic authority linking will actually work properly
in the cataloging module. As Owen pointed out while testing the patch,
though, longtime users of Koha will not be expecting that. In keeping with
the principles of least surprise and maximum configurability, a new syspref,
CatalogModuleRelink makes it possible to disable authority relinking in the
cataloging module only (i.e. leaving it enabled for future runs of
link_bibs_to_authorities.pl). Note that though the default behavior matches
the current behavior of Koha, it does not match the intended behavior.
Libraries that want the intended behavior rather than the current behavior
will need to adjust the CatalogModuleRelink syspref.
*** misc/link_bibs_to_authorities.pl ***
Added the following options to the misc/link_bibs_to_authorities.pl script:
--auth-limit Only process those headings that match the authorities
matching the user-specified WHERE clause.
--bib-limit Only process those bib records that match the
user-specified WHERE clause.
--commit Commit the results to the database after every N records
are processed.
--link-report Display a report of all the headings that were processed.
Converted misc/link_bibs_to_authorities.pl to use POD.
Added a detailed report of headings that linked, did not link, and linked
in a "fuzzy" fashion (the exact semantics of fuzzy are up to the individual
linker modules) during the run.
*** C4::Linker ***
Implemented new C4::Linker functionality to make it possible to easily add
custom authority linker algorithms. Currently available linker options are:
* Default: retains the current behavior of only creating links when there is
an exact match to one and only one authority record; if the 'broader_headings'
option is enabled, it will try to link headings to authority records for
broader headings by removing subfields from the end of the heading (NOTE:
test the results before enabling broader_headings in a production system
because its usefulness is very much dependent on individual sites' authority
files)
* First Match: based on Default, creates a link to the *first* authority
record that matches a given heading, even if there is more than one
authority record that matches
* Last Match: based on Default, creates a link to the *last* authority
record that matches a given heading, even if there is more than one record
that matches
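The difference between the three linker options can be sketched in a few
lines of Python. This is only an illustration of the selection rule described
above, not the C4::Linker code: choose_authority is a hypothetical function,
and ordering the candidates by authid is an assumption made here so that
"first" and "last" are well defined.

```python
from typing import List, Optional


def choose_authority(matches: List[int], strategy: str = "Default") -> Optional[int]:
    """Pick an authid from the matching authids, or None if no link is made."""
    if not matches:
        return None
    ordered = sorted(matches)  # assumption: candidates are ordered by authid
    if strategy == "Default":
        # Link only when exactly one authority matches.
        return ordered[0] if len(ordered) == 1 else None
    if strategy == "FirstMatch":
        return ordered[0]
    if strategy == "LastMatch":
        return ordered[-1]
    raise ValueError("unknown linker strategy: %s" % strategy)


# Two identical authorities (hypothetical authids 7 and 12) match the heading.
default_choice = choose_authority([12, 7], "Default")
first_choice = choose_authority([12, 7], "FirstMatch")
last_choice = choose_authority([12, 7], "LastMatch")
```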
The API for linker modules is very simple. All modules should implement the
following two functions:
<get_link ($field)> - return the authid for the authority that should be
linked to the provided MARC::Field object, and a boolean to indicate whether
the match is "fuzzy" (the semantics of "fuzzy" are up to the individual plugin).
In order to handle authority limits, get_link should always end with:
return $self->SUPER::_handle_auth_limit($authid), $fuzzy;
<flip_heading ($field)> - return a MARC::Field object with the heading flipped
to the preferred form. At present this routine is not used, and can be a stub.
Made the linking functionality use the SearchAuthorities in C4::AuthoritiesMarc
rather than SimpleSearch in C4::Search. Once C4::Search has been refactored,
SearchAuthorities should be rewritten to simply call into C4::Search. However,
at this time C4::Search cannot handle authority searching. Also fixed numerous
performance issues in SearchAuthorities and the Linker script:
* Correctly destroy ZOOM recordsets in SearchAuthorities when finished. If left
undestroyed, efficiency appears to approach O(log n^n)
* Add an optional $skipmetadata flag to SearchAuthorities that can be used to
avoid additional calls into Zebra when all that is wanted are authority
records and not statistics about their use
*** New sysprefs ***
* AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned
on, automatically create authority records for headings that don't have
any authority link when cataloging. When BiblioAddsAuthorities is on and
AutoCreateAuthorities is turned off, do not automatically generate authority
records, but allow the user to enter headings that don't match an existing
authority. When BiblioAddsAuthorities is off, this has no effect.
* CatalogModuleRelink - when turned on, the automatic linker will relink
headings when a record is saved in the cataloging module when LinkerRelink
is turned on, even if the headings were manually linked to a different
authority by the cataloger. When turned off (the default), the automatic
linker will not relink any headings that have already been linked when a
record is saved.
* LinkerModule - Chooses which linker module to use for matching headings
(current options are as described above in the section on linker options:
"Default," "FirstMatch," and "LastMatch")
* LinkerOptions - A pipe-separated list of options to set for the authority
linker (at the moment, the only option available is "broader_headings," which
is described below)
* LinkerRelink - When turned on, the linker will confirm the links for headings
that have previously been linked to an authority record when it runs. When
turned off, any heading with an existing link will be ignored.
* LinkerKeepStale - When turned on, the linker will never *delete* a link to an
authority record, though, depending on the value of LinkerRelink, it may
change the link.
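One way to read the interaction of LinkerRelink and LinkerKeepStale is the
following Python sketch. It is an interpretation of the descriptions above
under simplifying assumptions, not the actual implementation: resolve_link is
a hypothetical function, None stands for "no link", and CatalogModuleRelink
is deliberately left out.

```python
from typing import Optional


def resolve_link(existing: Optional[int], matched: Optional[int],
                 relink: bool, keep_stale: bool) -> Optional[int]:
    """Return the authid that should end up in $9, or None for no link."""
    if existing is not None and not relink:
        # LinkerRelink off: headings that already have a link are ignored.
        return existing
    if matched is None:
        # No authority matches: keep the old link only if LinkerKeepStale is on.
        return existing if keep_stale else None
    # A match was found: the link is set (or changed) to it.
    return matched


# Hypothetical authids: the heading was linked to 10 and no longer matches anything.
unlinked = resolve_link(existing=10, matched=None, relink=True, keep_stale=False)
kept = resolve_link(existing=10, matched=None, relink=True, keep_stale=True)
```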
*** Other changes ***
* Cleaned up authorities code by removing unused functions, adding
unimplemented functions, and adding some unit tests.
* This patch also modifies the authority indexing to remove trailing punctuation
from Match indexes.
* Replace the old BiblioAddAuthorities subroutines with calls into the new
C4::Linker routines.
* Add a simple implementation for C4::Heading::UNIMARC. (With thanks to F.
Demians, 2011.01.09) Correct C4::Heading::UNIMARC class loading. Create
biblio tag to authority types data structure at initialization rather than
querying DB.
* Ran perltidy on all changed code.
*** Linker Options ***
Enter "broader_headings" in LinkerOptions. With this option, the linker will
try to match the following heading as follows:
=600 10$aCamins-Esakov, Jared$xCoin collections$vCatalogs$vEarly works to
1800.
First: Camins-Esakov, Jared--Coin collections--Catalogs--Early works to 1800
Next: Camins-Esakov, Jared--Coin collections--Catalogs
Next: Camins-Esakov, Jared--Coin collections
Next: Camins-Esakov, Jared (matches! if a previous attempt had matched, it
would not have tried this)
This is probably relevant only to MARC21 and LCSH, but could potentially be of
great use to libraries that make heavy use of floating subdivisions.
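The sequence of attempts listed above can be reproduced with a short Python
sketch. It illustrates the "drop subfields from the end" idea only:
broader_candidates and link_broader are hypothetical helpers, the heading is
represented as a plain list of subfield values joined with "--", and the
authority file is a dictionary with a made-up authid.

```python
def broader_candidates(subfields):
    """Full heading first, then progressively broader forms."""
    return ["--".join(subfields[:n]) for n in range(len(subfields), 0, -1)]


def link_broader(subfields, authorities):
    """Return the authid of the first candidate that matches, or None."""
    for candidate in broader_candidates(subfields):
        if candidate in authorities:
            return authorities[candidate]  # stop at the first (narrowest) match
    return None


heading = ["Camins-Esakov, Jared", "Coin collections", "Catalogs",
           "Early works to 1800"]
attempts = broader_candidates(heading)
# Hypothetical authority file with a single record (authid 42 is made up).
authid = link_broader(heading, {"Camins-Esakov, Jared": 42})
```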
=== TESTING PLAN ===
Note: all of these tests require that you have some authority records,
preferably for headings that actually appear in your bibliographic data. At
least one authority record must contain a "see from" reference (remember which
one contains this, as you'll need it for some of the tests). The number shown
in the "Used in" column in the authority module is populated using Zebra
searches of the bibliographic database, so you *must* have
rebuild_zebra.pl -b -z [-x] running in cron, or manually run it after running
the linker.
*** Testing the Heading match in the cataloging plugin ***
1. Create a new record, and open the cataloging plugin for an
authority-controlled field.
2. Search for an authority by entering the "see from" term in the Heading Match
box
3. Confirm that the appropriate heading shows up
4. Search for an authority by entering the preferred heading into the Main
entry or Main entry ($a only) box (i.e., repeat the procedure you usually
use for cataloging, whatever that may be)
5. Confirm that the appropriate heading shows up
*** Testing the cataloging interface ***
6. Turn off BiblioAddsAuthorities
7. Confirm that you cannot enter text directly in an authority-controlled field
8. Confirm that if you search for a heading using the authority control plugin
the heading is inserted (note, however, that this patch does not AND IS NOT
INTENDED TO fix the bugs in the authority plugin with duplicate subfields;
those are wholly out of scope- this check is for regressions)
9. Turn on BiblioAddsAuthorities and AutoCreateAuthorities
10. Confirm that you can enter text directly into an authority-controlled field,
and if you enter a heading that doesn't currently have an authority record,
an authority record stub is automatically created, and the heading you
entered linked
11. Confirm that if you enter a heading with only a subfield $a that fully
*matches* an existing heading (i.e. the existing heading has only
subfield $a populated), the authid for that heading is inserted into
subfield $9
12. Confirm that if you enter a heading with multiple subfields that *matches*
an existing heading, the authid for that heading is inserted into
subfield $9
13. Turn on BiblioAddsAuthorities and turn off AutoCreateAuthorities
14. Confirm that you can enter text directly into an authority-controlled field,
and if you enter a heading that doesn't currently have an authority record,
an authority record stub is *not* created
15. Confirm that if you enter a heading with only a subfield $a that *matches*
an existing heading, the authid for that heading is inserted into
subfield $9
16. Confirm that if you enter a heading with multiple subfields that *matches*
an existing heading, the authid for that heading is inserted into
subfield $9
17. Create a record and link an authority record to an authorized field using
the authority plugin.
18. Save the record. Ensure that the heading is linked to the appropriate
authority.
19. Open the record. Change the heading manually to something else, leaving
the link. Save the record.
20. Ensure that the heading remains linked to that same authority.
21. Change CatalogModuleRelink to "on."
22. Open the record. Use the authority plugin to link that heading to the
same authority record you did earlier.
23. Save the record. Ensure that the heading is linked to the appropriate
authority.
24. Open the record. Change the heading manually to something else, leaving
the link. Save the record.
25. Ensure that the heading is no longer linked to the old authority record.
*** Testing link_bibs_to_authorities.pl ***
26. Set LinkerModule to "Default," turn on LinkerRelink and
BiblioAddsAuthorities, and turn AutoCreateAuthorities and
LinkerKeepStale off
27. Edit one bib record so that an authority controlled field that has already
been linked (i.e. has data in $9) has a heading that does not match any
authority record in your database
28. Run misc/link_bibs_to_authorities.pl --link-report --verbose --test (you may
want to pipe the output into less or a file, as the result is quite a lot of
information)
29. Look over the report to see if the headings that you have authority records
for report being matched, that the heading you modified in step 27 is
reported as "unlinked," and confirm that no changes were actually made to
the database (to check this, look at the bib record you edited earlier, and
check that the authid in the field you edited hasn't changed)
30. Run misc/link_bibs_to_authorities.pl --link-report --verbose (you may want
to pipe the output into less or a file, as the result is quite a lot of
information)
31. Check that the heading you modified has been unlinked
32. Change the modified heading back to whatever it was, but don't use the
authority control plugin to populate $9
33. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber
of the record you've been editing)
34. Confirm that the heading has been linked to the correct authority record
35. Turn LinkerKeepStale on
36. Change that heading to something else
37. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber
of the record you've been editing)
38. Confirm that the $9 has not changed
39. Turn LinkerKeepStale off
40. Create two authorities with the same heading
41. Run misc/migration_tools/rebuild_zebra.pl -a -z
42. Enter that heading into the bibliographic record you are working with
43. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber
of the record you've been editing)
44. Confirm that the heading has not been linked
45. Change LinkerModule to "FirstMatch"
46. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber
of the record you've been editing)
47. Confirm that the heading has been linked to the first authority record it
matches
48. Change LinkerModule to "LastMatch"
49. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--bib-limit="biblionumber=${BIB}" (replacing ${BIB} with the biblionumber
of the record you've been editing)
50. Confirm that the heading has been linked to the second authority record it
matches
51. Run misc/link_bibs_to_authorities.pl --link-report --verbose
--auth-limit="authid=${AUTH}" (replacing ${AUTH} with an authid)
52. Confirm that only that heading is displayed in the report, and only those
bibs with that heading have been changed
If all those things worked, good news! You're ready to sign off on the patch
for bug 7284.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on latest master and squashed follow-up, 16 February 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on latest master, 21 February 2012
Signed-off-by: schuster <dschust1@gmail.com>
Display links to parent biblios, show linked items in holdings, allow holds on
linked items. This uses MARC to maintain relationships.
Sponsored by the Mississippi Department of Archives and History and RapidRadio
Solution. Originally developed by Savitra Sirohi and Amit Gupta at OSSLabs, with
UNIMARC support added by Zeno Tajoli. Commits squashed and merge conflicts
resolved by Chris Cormack from Catalyst. Respect for NORMARC and some small
framework portability fixes made by Jared Camins-Esakov of C & P Bibliography
Services.
IMPORTANT NOTE: A bug in the 773 coding for MARC21 was corrected from the
original OSS Labs code. The 773s generated by the pre-release code did not have
the first indicator set to '0', which means that they were not supposed to
display. Going forward, the first indicator will be set correctly, but existing
records created with this code will no longer appear (they appeared before only
due to another bug). To correct this, you could globally (or, to make sure you
only modify records created with the Analytics tool, for records with 773$0)
change the first indicator of the 773 from blank to '0'.
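The correction described in the note above could look roughly like this in
Python. This is a sketch over a simplified in-memory representation, not a
script shipped with the patch: fix_773_indicator is a hypothetical function,
and each field is a plain dictionary with "tag", "ind1" and "subfields" keys
chosen for the example.

```python
def fix_773_indicator(fields, only_with_subfield_0=True):
    """Set a blank first indicator of 773 to '0'; return how many fields changed."""
    changed = 0
    for field in fields:
        if field["tag"] != "773" or field["ind1"] != " ":
            continue
        has_0 = any(code == "0" for code, _ in field["subfields"])
        if only_with_subfield_0 and not has_0:
            continue  # probably not created with the Analytics tool
        field["ind1"] = "0"
        changed += 1
    return changed


record = [
    {"tag": "773", "ind1": " ", "subfields": [("0", "15"), ("9", "230")]},
    {"tag": "773", "ind1": " ", "subfields": [("t", "Some host title")]},
    {"tag": "245", "ind1": " ", "subfields": [("a", "Some title")]},
]
count = fix_773_indicator(record)
```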
== Background ==
An analytic record for an item is a more detailed, monographic biblio for an
item attached to a serial record. This is often used for special issues of a
journal that are released as books on their own (assigned an ISBN, as well as an
ISSN/volume/issue). It is important for researchers to be able to search for
these items both as issues of the serial, and as monographs. It is equally
important for the library to not have duplicate item records for the item in
question to have to keep synchronized.
== Establishing relationships ==
Analytical records are connected to items belonging to parent or host
bibliographic records. This can be accomplished by:
* From an analytical bibliographic record, linking to a host item by providing
the item barcode as input
* From a host item, using the "analyze" option, which creates a new empty
bibliographic record with field 773 (MARC21) populated
* Running a new CLI script that establishes a relationship between the
analytical record and the host item identified by the barcode in the
analytical record's 773$o (MARC21)
== Connecting Records ==
The relationships are maintained in the MARC records; we have not used database
tables at all.
== MARC Representation ==
In MARC21/NORMARC we have used:
* 773$9 to store the Koha item number of the host item
* 773$0 to store the Koha biblio number of the host bibliographic record
The above fields are used to display the relationships in various screens in the
OPAC and the staff interface. Additionally, when populating field 773 with host
item's details, we have used the following MARC21 mapping:
* 'a' <= 100/110/111 $a (author main)
* 'b' <= 250$a (edition)
* 'd' <= 260$a, 260$b, 260$c (place, publisher, year)
* 'o' <= barcode
* 't' <= 245$a (title)
* 'w' <= (003)001 --> if no 001 is available, we can populate biblionumber
* 'x' <= 022$a (issn)
* 'z' <= 020$a (isbn)
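The MARC21 mapping above can be sketched in Python as follows. This is an
illustration only, not the C4 routine: build_host_773 is a hypothetical
function, the host record is a flat dictionary keyed by tag and subfield
("245a", "260b", ...), joining the 260 subfields with a space and writing
'w' as "(003)001" are one reading of the list above, and the sample values
are made up.

```python
def build_host_773(host, barcode, biblionumber):
    """Return the 773 subfields for a host item as (code, value) pairs."""
    def first(*keys):
        # First non-empty value among the given host fields.
        for key in keys:
            if host.get(key):
                return host[key]
        return None

    publication = " ".join(host[k] for k in ("260a", "260b", "260c") if host.get(k))
    control = host.get("001")
    if control and host.get("003"):
        control = "(%s)%s" % (host["003"], control)

    candidates = [
        ("a", first("100a", "110a", "111a")),   # main author
        ("b", host.get("250a")),                # edition
        ("d", publication or None),             # place, publisher, year
        ("o", barcode),
        ("t", host.get("245a")),                # title
        ("w", control or str(biblionumber)),    # fall back to biblionumber
        ("x", host.get("022a")),                # issn
        ("z", host.get("020a")),                # isbn
    ]
    return [(code, value) for code, value in candidates if value]


host = {"100a": "Doe, Jane", "245a": "Special issue", "260a": "Paris :",
        "260b": "Example,", "260c": "2011", "022a": "1234-5678"}
subfields = build_host_773(host, barcode="B001", biblionumber=99)
```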
In UNIMARC, this code uses:
* 461$9 to store the Koha item number of the host item
* 461$0 to store the Koha biblio number of the host bibliographic record
When populating field 461 in UNIMARC, the following mapping is used:
* 't' <= 200$a (title)
== Treatment of Holds ==
A key requirement was to allow holds to be placed on host items from the
analytical record. We have accomplished this by allowing holds on specific
copies only. Biblio level holds are not allowed. This ensures that holds are
placed on specific items that are relevant to the analytical record.
== Deleting host items with linked analytical records ==
As we have not used database tables to maintain relationships, we had to use
search to find out if any linked analytical records are present. If one or more
analytical records are present, we do not allow deletion of items. This is similar to
what we see when we try to delete authority records.
== Importing analytical records ==
Analytical records can be imported using bulkmarcimport or the GUI tools. The
new CLI script can be executed after the import to establish relationships with
host items. The script will establish relationships using the host item's
barcode; the barcode must be present in 773$o of the analytical record.
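The barcode-based linking step can be pictured with the Python sketch below.
It is not the CLI script itself: link_analytic is a hypothetical function,
each 773 field is a dictionary of subfield code to value, and the item lookup
is a plain dictionary standing in for a database query.

```python
def link_analytic(host_fields, items_by_barcode):
    """Fill 773$0 and 773$9 from the barcode in 773$o; return the number linked."""
    linked = 0
    for subfields in host_fields:
        item = items_by_barcode.get(subfields.get("o"))
        if item is None:
            continue  # no barcode, or barcode unknown: nothing to link
        subfields["0"] = str(item["biblionumber"])  # host biblio number
        subfields["9"] = str(item["itemnumber"])    # host item number
        linked += 1
    return linked


# Made-up data: one 773 with a known barcode, one without any barcode.
fields_773 = [{"t": "Host serial", "o": "B001"}, {"t": "Other host"}]
items = {"B001": {"biblionumber": 15, "itemnumber": 230}}
linked_count = link_analytic(fields_773, items)
```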
== What if there are two or more copies of the host item? ==
The current design will require that there be two host (773) fields, one for
each copy.
== What if there is no barcode available for the host item? ==
It is still possible to establish a relationship, by populating 773$9 with the
host's item number. However the CLI script uses barcode in 773$o to establish
relationships, so it won't work where barcodes are unavailable. Likewise, the
option to link an analytical record to a host item by providing the barcode as
input will not be available.
Commits that added the following features were squashed by Chris Cormack (this
is not a list of every commit):
* Display links to host records from biblio detail screens
* Support for UNIMARC, respecting the system preference 'marcflavor'
* Support holds from the OPAC
* Ability to link to items belonging to host records from an analytical record
* Display items belonging to host records in the moredetail page
* Ability to edit items belonging to host records, also ability to delink from
them
* Move get host items code into a C4 routine, also calling the new routine in
related perl scripts
* Move host field population to a C4 routine, all changes in pl files to call
new routine
* Allow only specific copy holds for analytical records plus changes to use new
C4 routines
* Support for holds on items linked via host records
* Storing bibnumber and itemnumber in subfields 0 and 9, plus other mapping
changes
* New command line script that establishes relationships between analytical
records and host items and bibs. The script looks for host field (MARC21 773)
in records, and based on barcode in subfield 'o' populates host bibnumber in
subfield '0' and host itemnumber in subfield '9'. The script can be run after
an import of analytical records; it can also be run from the crontab to
maintain the relationships
* Ability to create analytical records from items, to view linked analytics, and
prevent deletion of items that have linked analytics
* New template for catalogue/detail.pl (NOTE: not a new template file, just a
new way of displaying analytics); the template displays linked analytics and
allows creation of analytical records
* New zebra index for item number in host fields. This index will be used to
display links to analytical records from host records
* Display title of host record instead of the phrase "host record"
* Using detail.tmpl for analytics tab instead of a new template file
* Improved qualification info preparation in Prephostmarcfield
* Check for linked analytics before deleting item
* Display link to host record and more meaningful anchor text for edit item link
* Analytical record: Unimarc index in record.abs and help in
create_analytical_rel.pl
* Adding a sys pref that controls display of options to create analytical
relationships
* Add host entry in XSLT stylesheet in staff item detail
* Added host record support to OPAC detail XSLT
* Adding 773$0 and 773$9 to all frameworks
* Adding 773 subfields 0 and 9 to default marc framework via updatedatabase.pl
* Display "Create analytics" and "Used in" links in catalog detail
* Fixed problem where analytical records were not showing in OPAC search results
because GetMarcBiblio now needs a flag to add item records
* Fixed problem where analytics count was set to 1 for all records, not just
those with analytics
* Fixed catalogue detail page not to show analytics counts if count is 0
Conflicts:
installer/data/mysql/updatedatabase.pl
koha-tmpl/intranet-tmpl/prog/en/modules/cataloguing/addbiblio.tt
kohaversion.pl
Co-author: Savitra Sirohi <savitra.sirohi@osslabs.biz>
Co-author: Zeno Tajoli <tajoli@cilea.it>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Ian Walls <ian.walls@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>