Commit graph

254 commits

Author SHA1 Message Date
Mathieu Saby
4d8b1ec786 Bug 7421: support indexing UNIMARC authority records using the DOM Filter
I took as a base the patch of F. Demians, but made a lot of changes,
so I think it is more logical to create a new patch as the behavior is
not the same as previous patch.

I tried to define DOM config files as a "miror" of record.abs, so the
behavior be the same.

If it is OK, we will be able to improve indexing later, for example
suppressing warns, managing indicators or subdivisions, etc.

I made some little changes to record.abs :
- comments
- 216 was indexed in Conference-name as well as Trademark. I suppose
  that "Conference-name" is an error, so I indexed only in Trademark
- index 2 new notes : 340 / 356

The only difference between record.abs and DOM is that DOM config files
does not index complete fields, but subfields.

Ex :
melm 200 ===> <kohaidx:index_subfields tag="200" subfields="abcdfgjxyz">
I took all the subfields from the UNIMARC Authorities manual. The only
subfields not indexed are numeric subfields : $7, $8 for language of
record, and $0,2,3,5,6 for 4XX/5XX/7XX

To test :
- index a set of bib and auth records with GRS-1
- make some searches on different kind of authorities
- index the same records with DOM
- make the same searches
- You are not supposed to see differences

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
As I am not a UNIMARC user it's hard for me to test this, but
while testing other authority related patches I noticed that I couldn't
index the UNIMARC authorities of the sample base. The files are obviously
missing and reindex_zebra.pl notes this. With this patch applied,
indexing works and authorities are searchable in my installation.

Signed-off-by: Vitor Fernandes <fvernandes@keep.pt>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 21:03:15 +00:00
Mathieu Saby
5298140c67 Bug 10037: fix item index in UNIMARC DOM indexing
In UNIMARC DOM indexing, "item" index was working only for subfields
of 995 field mapped with specific indexes, and also in index (ex :
$a, $b...). It was not working for the other subfields (ex : $g),
because a comment from record.abs was integrated in DOM config files.

This patch removes the comment.

To test, in a DOM UNIMARC environment :
1) In a item, write some value "Test10037" in 995$g
2) Search for this value in simple search, this way : item=Test10037
   => you should have no results
3) Apply the patch. if necessary, copy the modified
   etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml and
   etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl into
   the /etc/... directory in your main Koha directory
4) Reindex Zebra biblios
5) Do the same search as 2) => you should have one result

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Work as described. No koha-qa errors.

Test

NOTE: default UNIMARC framework don't have 995g,
so I must add it first.

1) Added test string to 995b on some record
2) Reindex and search as indicated, no results
3) cp files to destination
4) reindex
5) search and result ok !

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 18:54:12 +00:00
Galen Charlton
8ea3462517 Bug 8252: (follow-up) standardize name of Identifier-publisher-for-music index
To test:

[1] When running t/db_dependent/Search.t, veify that no warnings like
    this are shown:

15:52:07-10/10 zebraidx(2006) [warn] Index 'Number-music-publisher' not found in attset(s)

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 16:05:33 +00:00
Galen Charlton
45d0365d12 Bug 8620: (follow-up) apply to NORMARC and MARC21 authorities
This applies the fix for the Any index to NORMARC bib
and MARC21 authority DOM Zebra indexes.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:56:13 +00:00
Galen Charlton
5024e519ad Bug 8252: (follow-up) tidy up long lines in bib1.att
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:56:13 +00:00
Mathieu Saby
475a9d19d1 Bug 8252: (follow-up) fix biblio-zebra-indexdefs.xsl
This patch fixes biblio-zebra-indexdefs.xsl files.
It was generated from biblio-koha-indexdefs.xsm with the new
koha-indexdefs-to-zebra.xsl amended by F. Démians's patch.

To test :
- Take a DOM UNIMARC Koha
- Apply all the patchs of 8252 bug, including this one
- Copy src/etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
  to your etc/zebradb/marc_defs/unimarc/biblios/ located in your
  installation directory
- Run rebuid_zebra -b -x -r -v
- make advanced searches on staff interface and opac, on coded fields
  indexes (Audience, Literary genre, Biography, Illustration, Content,
  Video Types, Serial Type, Periodicity, Regularity, Picture)

Signed-off-by: Frédéric Demians <f.demians@tamil.fr>

Ok for me. This patch put in sync indexes XSL definition with
authoritative XML definition. Subsequently, it won't be difficult to
amend DOM UNIMARC indexes defintion if necessary. And, as it is, I don't
see any regression, whereas I can see huge improvements. Thanks Mathieu!

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:21:56 +00:00
f9addcc98b Bug 8852: DOM XSL now handles subfield substring extraction
This patch modify koha-indexdefs-to-zebra.xsl in order to add the
ability to populate indexes with subfield substring.

It's now possible to understand such construction as:

<index_subfields xmlns="http://..." tag="100" subfields="a" offset="7" length="1">
  <target_index>tpubdate:s</target_index>
</index_subfields>

Signed-off-by:Mathieu Saby <mathieu.saby@univ-rennes2.fr>

I applied the patch and ran
  xsltproc koha-indexdefs-to-zebra.xsl ../marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml \
     > ../marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
I looked at the generated file. It looks nice.
Then I copied it file in my INSTALLDIR/etc/zebra.... and reindexed my
records with rebuild_zebra.pl
I made some searches on coded position index and non coded position
indexes, everything works.

http://bugs.koha-community.org/show_bug.cgi?id=8252

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:19:20 +00:00
Mathieu Saby
43809d2835 Bug 8252: Followup for Date/time-last-modified and Music number
This followup restores the original wording of "Date/time-last-modified"
index, and change the name of "Music-number" index to
"Number-music-publisher"

To test :
1. In a UNIMARC Koha instance
2. Apply patchs #1, #2 and this followup
3. Copy from src/etc/zebradb directory to the etc/zebradb/ in your main
   Koha directory the following files:
-- zebradb/biblios/etc/bib1.att
-- zebradb/ccl.properties
-- zebradb/marc_defs/unimarc/biblios/record.abs
-- zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml
-- zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
4. Rebuild zebra with -b -x -v -r options
5. Write a value like "test071a" in 071$a field in a record
6. Check if you can find this record with this search:
   "ccl=Number-music-publisher:test071a"

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
No koha-qa errors.

Test
Copy files
reindex full
Modify a couple of record to add 071a with test message
Reindex -v -z -b -x
Search test message as described and found modified records.

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:16:27 +00:00
Mathieu Saby
8034566027 Bug 8252: Fix indexing of UNIMARC 1xx for DOM
This patch makes the same changes in UNIMARC DOM configuration as patch
1 made for GRS-1.

positions of subfields are indexed that way :
In biblio-koha-indexdefs.xml :
tag="100" subfields="a" offset="17" length="1"
In biblio-zebra-indexdefs.xsl :
xslo:value-of select="substring(., 17, 1)"

I had to edit biblio-zebra-indexdefs.xsl by hand, because
etc/zebdradb/xml/koha-indexdefs-to-zebra.xsl does only support
"subtring" in handle-one-index-control-field template.

It is good for MARC21, but not for UNIMARC : in MARC21, indexing
subtrings is needed for controled field (001-009, with no subfields)
But in UNIMARC it is needed for subfields of 1XX fields.
So if DOM indexing is working with these new files, we may need to
change koha-indexdefs-to-zebra.xsl.

Test plan (not possible in a sandbox) :
1) In a Koha instance using UNIMARC and DOM indexing
2) Apply Patch 1 and Patch 2 (this one)
3) Copy the following files from the etc/zebradb directory of your
   source into the etc/zebradb directory of your main Koha directory :
-- etc/zebradb/marc_defs/unimarc/biblios/biblio-koha-indexdefs.xml
-- etc/zebradb/marc_defs/unimarc/biblios/biblio-zebra-indexdefs.xsl
-- etc/zebradb/ccl.properties
-- etc/zebradb/biblios/etc/bib1.att
4) rebuild zebra with -x -b -r -v options
5) check if coded filters in advanced search are usable in OPAC and
   Staff interface

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works. No koha-qa errors.

Test for DOM
Apply patches
Don't forget to copy files
reindex
Search by coded fields works, also Country-publication

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:15:04 +00:00
Mathieu Saby
041e3603a1 Bug 8252: Fix indexing of UNIMARC 1xx for GRS-1
Before fixing UNIMARC DOM indexing, we must fix GRS-1 indexing

1) In advanced search, some Coded fields index are not working: Print,
   Illustration, Content
2) Country-heading index is not working
3) Some subfields are indexed in wrong indexes :

  102$a should be in Country-publication instead of Country-heading
        (non defined in bib1.att)
  106$a, filled only for printed works, should be in ff88-23 (form of
         item) instead of itype.  (ff88-23 is made for Marc21 008 pos
         23, which contains the same data as 106a)
  200$b should be in Material-type instead of (or in addition to) itype
        and itemtype: (Material-type :"free-form string, ... that
        describes the material type of the item, e.g., cassette, kit,
        computer database, computer file.")
  100$a pos 22-24 should not be indexed as "ln" : it is the language of
        the record, not the language of the ressource

4) Index names are too long : if we index new positions of coded fields,
   with existing names it breaks Zebra indexing (there must be a limit
   in line lenghth in record.abs?)
5) There are a lot of warns when rebuiding zebra.

This patch make some changes in bib1.att (could be used later to improve
search) :

- fixing wording for att 51 and 1012
- adding comments for attributes based on MARC21 008 field (8800-8841)
- creating 8806 (tpubdate), 8838 (Modified-code), 8818 (ff8-18), 8840
  (ff8-18-21), 8819 (ff8-19), 8821 (ff8-21), 8828 (ff8-28), 8830
  (ff8-30), 8831 (ff8-31)
- creating attributes specific to UNIMARC : 9701-9707 (Video-mt,
  Graphics-type, Graphics-support, Title-page-availability,
  Cumulative-index-availability, script-Title, char-encoding)
- setting apart 3 blocks of attributes, so it could be easy to make
  further changes :
-- common to Marc21 and UNIMARC : 8806, 8822, 8838
-- slightly different in Marc21 and UNIMARC (different meanings
   according to the type of the record => don't match a single
   UNIMARC field)
-- specific to UNIMARC : 9701-9707

In ccl.properties :
- creating a new index: Country-publication 1=1053
- suppressing some warns by mapping with bib1 att:
  Date-time-last-modified, Name, rtype, Music-number
- defining indexes using the 3 blocks attributes defined in bib1
  (common to Marc21 and UNIMARC, slightly different, specific to UNIMARC)

In record.abs :
- renaming some index for 100-105-110 fields
- correcting indexing of 102$a (country of publication)
                         106$a (ff88-23)
                         100$a pos 22-24 (language of record, no more
                               indexed)
                         105$a pos. 0-3 (illustration code)
                         200$b (for the moment, I keep it indexed in
                               itype and itemtype, but also Material-Type)

In C4/Search.pm :
- adding "Country-publication" index

In OPAC and staff interface template subtypes_unimarc.in :
- renaming indexes to take into account the changes made to Zebra
  config files

To test (this cannot be done with a sandbox) :
1) Apply the patch in a UNIMARC GRS-1 Koha instance
2) Copy the following files from the etc/zebradb of your source
   directory into the etc/zebradb of your main Koha directory:
-- etc/zebradb/biblios/etc/bib1.att
-- etc/zebradb/ccl.properties
-- etc/zebradb/marc_defs/unimarc/biblios/record.abs
3) Reindex your data (rebuild_zebra -x -b -r -v)
4) Try to use those Coded fields indexes in Advanced search, in OPAC
   and Staff interface (available after clicking on "More options",
   then on "Coded information filters"):
   Audience, Print, Literary genre, Biography, Illustration, Content,
   Video Types, Serials, Serial Type, Periodicity, Regularity
5) Try to search "Country-publication=FR" in simple search

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
No koha-qa errors.

Tests for GRS-1
Followed test plan
Search by coded fields works, but only on OPAC,
on staff there are few options
Search by Country-publication works after patch

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-10-10 15:06:10 +00:00
e1de36fbb9 Bug 9252 - Add option to send patron's home branch in AF field
This patch gives you the option of sending a patrons home branch code
in an AF field for patron status requests. It is controlled at the account
login level, so it can be enable on a per-sip-login basis.

Test Plan:
1) Apply patch
2) Edit SIPconfig.xml, add the parameter 'send_patron_home_library_in_af="1"'
   to the login you will be using to test.
3) Start your SIP2 server.
4) Connect to it via telnet ( something like: '9300CNterm1|COterm1|CPCPL|' )
5) Send a patron status request ( like: '2300120121110    82925AOCPL|AA23529000035676|ACterm1|ADletmein' )
6) Examine reponse you should see something like this:
      "24              00120121210    085332AEHenry Acevedo|AA23529000035676|BLY|CQN|AFGreetings from Koha. |AFMPL|AO|"
   Note the second AF field with the value MPL.

Signed-off-by: George Williams <georgew@latahlibrary.org>
Signed-off-by: Christopher Brannon <cbrannon@cdalibrary.org>
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-09-08 06:07:38 +00:00
90e2aafeb5 Bug 9531: Make SIP2 message terminator configurable via SIPconfig.xml
Add a terminator option to SIPConfig.xml, choices for 'terminator' are
'CR' or 'CRLF'.  The default continues to be 'CRLF' if 'terminator' is
undefined.

Test Plan:
1) Apply patch
2) Start SIP server
3) Run C4/SIP/t/04patron_status.t
4) Stop SIP server
5) Add terminator="CR" for account login 'term1'
6) Run 04patron_status.t again, you should see no change

Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Signed-off-by: Adrien Saurat <adrien.saurat@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-09-08 05:44:09 +00:00
Galen Charlton
a9719215b4 Bug 10325: (follow-up) fix typos and whitespace in httpd.conf example
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-09-08 02:16:59 +00:00
47bd795b82 Bug 10325 - Allow system preferences to be overridable from koha-httpd.conf
For Koha installations with multiple OPAC URLs, It would be nice to be
able to override systeprefs from the http conf file. Case in point,
a library wants to have two separate opacs, one the is only viewable
from within the library that allows patrons to place holds, and a second
public one that does not. In this case, overriding the system preference
RequestOnOpac would accomplish this simply, and with no ill affects.

This feature would of course be should only be used to override
cosmetic effects on the system, and should not be used for system
preferences such as CircControl, but would be great for preferences
such as OpacStarRatings, opacuserjs, OpacHighlightedWords and many
others!

Test Plan:
1) Apply this patch
2) Disable the system pref OpacHighlightedWords
3) Do a seach in the OPAC, not the term is not highlighted
4) Edit your koha-http.conf file, add the line
     SetEnv OVERRIDE_SYSPREF_OpacHighlightedWords "1"
   to your koha-http.conf file's OPAC section.
   Also add the line
     SetEnv OVERRIDE_SYSPREF_NAMES "OpacHighlightedWords"
   to the Intranet section
5) Restart your web server, or just reload it's config
6) Do a seach, now your search term should be highlighted!
7) From the intranet preference editor, view the pref,
   You should see a warning the this preference has been overridden.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-09-08 02:09:53 +00:00
0fe3d053ff Bug 8911: make sure history.txt gets installed where history.txt can see it
This patch makes Makefile.PL put the history.txt file in the right places
depending on the chosen setup layout, adds a reference to that place in
koha-conf.xml (and debian template version), and finally tweaks about.pl to
use it.

To test, apply the patch and verify that perl Makefile.PL runs fine, and
installing in
 - dev
 - single
 - standard
layouts works as expected. Then go to the about.pl page and see if Koha's
history shows there.

Then, build your packages and test on your newly created instances.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-07-22 14:11:48 +00:00
82e1b22794 Bug 10431 - Redundant mappings removed
There were 5 redundant mappings left in the file. They are removed.
Its almost a QA followup as everything went fine anyway.

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-07-05 06:56:44 -07:00
343c93ea43 Bug 10431 - Spanish Zebra character sorting file
This patch provides a definition file for Spanish (es) character
sorting in Zebra. It is based on the ideas from Hugo Agud <hagud@orex.es>
and Pablo Bianchi <pablo.bianchi@gmail.com>.

Makefile.PL is fixed to notice the existence of the 'es' language. The
docs for koha-create are touched too.

To test:
Tarball
=======
- Go through the install process, choosing 'es' for the Zebra's language step
- Koha should work as usual.
- Running this should show the lang definition is properly set.
$ grep -R "lang_defs/es" /etc/koha/*
(stuff like zebradb/zebra-biblios-dom.cfg:profilePath:...etc/koha/zebra/lang_defs/es... should show)
- This file should be present:
 /etc/koha/zebradb/lang_defs/es/sort-string-utf.chr

Packages
========
- Build your own package, it shouldn't break the packaging
- Try the new package, using koha-create to set an instance using --lang 'es'

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-07-05 06:55:49 -07:00
Magnus Enger
6e4851fa40 Bug 9804 - Fix name for NORMARC biblio-koha-indexdefs.xml
When i did bug 8805, I gave the biblio-koha-indexdefs.xml file the
wrong name, and called it biblio-zebra-indexdefs.xml. This patch
fixes that.

To reproduce:
- Check that etc/zebradb/marc_defs/normarc/biblios/biblio-zebra-
  indexdefs.xml exists

To test:
- Apply the patch and check that etc/zebradb/marc_defs/normarc/
  biblios/biblio-zebra-indexdefs.xml no longer exists, but that
  etc/zebradb/marc_defs/normarc/biblios/biblio-koha-indexdefs.xml
  does exist.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-24 09:16:17 -04:00
83cf63f7a8 Bug 9812 - Forbid access to several files through the browser
This patch hides (-Indexes) and forbids (Deny from all) access to some stuff through a browser.
Specifically "xlst", "modules" and "includes" dirs and its contents.

This is just a quick fix we talked about at IRC. The proper solution would be to remove this from htdocs which will still be needed.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
I do not have an installation that uses Apache at this point, but the
changes look correct and this was signed off and QAed by Chris and
Jonathan, both of whom have Apache installations.
2013-04-07 13:43:05 -04:00
Fridolyn SOMERS
b9c2c58fe6 Bug 9683: Allow disable rewrite apache mod
In Apache config koha-httpd.conf, URL-rewriting is enabled and does not
allow to disable mod_rewrite. Also, first rewriting rule
"RewriteRule (.+) $1?%1%2 [N,R,NE]" is enabled by default. This rule
rewrites nearly every URL. I propose to comment it in sources so that is
must be intentionally enabled.

This patch sets rewriting options into a conditional tag.

Test plan :
Test OPAC and intranet with and without mod_rewrite activated.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
The new Apache configuration seems to work fine (or at least as "fine"
as Apache gets) both with and without mod_rewrite enabled.
2013-04-03 07:02:54 -04:00
673a034e5c Bug 9883 - Add missing parameter to koha-conf.xml causing Plugins.t to fail
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-22 09:52:26 -04:00
Kyle M Hall
5eabc672fd Bug 7804 - Add Koha Plugin System
Adds support for custom plugins. At the moment the Plugins
feature supports two types of plugins, reports and tools.

Plugins are installed by uploading KPZ ( Koha Plugin Zip )
packages. A KPZ file is just a zip file containing the
perl files, template files, and any other files neccessary
to make the plugin work.

Test plan:
1) Apply patch
2) Run updatedatabase.pl
3) Create the directory /var/lib/koha/plugins
4) Add the lines
      <pluginsdir>/var/lib/koha/plugins</pluginsdir>
      <enable_plugins>1</enable_plugins>"
   to your koha-conf.xml file
5) Add the line
       Alias /plugin/ "/var/lib/koha/plugins/"
   to your koha-httpd.conf file
6) Restart your webserver
7) Access the plugins system from the "More" pulldown
8) Upload the example plugin file provided here
9) Try it out!

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-20 14:49:47 -04:00
Magnus Enger
3f7dd2730a Bug 9213 - Implement analytics for NORMARC XSLT
Problem:
Links between anaytics records were not being displayed for NORMARC setups.

What this patch does:
1. Add indexing for 773 subfield a, w and 9; both for GRS-1 and DOM indexing
   (The DOM indexing config was generated from the GRS-1 record.abs)
2. Add "analytics links" to NORMARC XSLT files, both for OPAC and intranet

To test:
- Make sure you have a NORMARC installation
- Set UseControlNumber = Use
- Create a parent record with LDR/07=c. Leave 001 empty.
- In the "Normal" view, do New > New child record and create another record. Do
  this twice (so you get a list of hits when you click on the "Show anaytics"
  links later on).

- Do the following steps both in the OPAC and the Intranet:
  - Search for the parent record in such a way that you can see the record in a
    *result list*
  - Check that the "Show analytics" link is displayed, and uses the title of the
    parent record for linking: ?q=Host-item:<Title of parent record>
  - Clik on the "Show analytics" link and check that you get a result list with
    the two child records you created earlier
  - Go back to the result list and click on the parent record, so you get the
    *detail view*
  - Check that the "Show analytics" link is displayed, and uses the title of the
    parent record for linking: ?q=Host-item:<Title of parent record>
  - Clik on the "Show analytics" link and check that you get a result list with
    the two child records you created earlier
  - Search for one or both of the child records in such a way that you can see
    the record(s) in a *result list*
  - Check that the "In: <Title of parent record>" link is displayed, and that it
    uses the biblionumber of the parent record for linking:
    ?q=Control-number:<biblionumber of parent record>
  - Click on the "In: <Title of parent record>" link, and check that the parent
    record is displayed
  - Go back to the result list and click on the child record, so you get the
    *detail view*
  - Check that the "In: <Title of parent record>" link is displayed, and that it
    uses the biblionumber of the parent record for linking:
    ?q=Control-number:<biblionumber of parent record>
  - Click on the "In: <Title of parent record>" link, and check that the parent
    record is displayed

- Now edit the parent record and put it's biblionumber in 001. Repeat the steps
  above, and check that everything still works, but that the links are different:
  - The "Show analytics" link on the parent record should look like this:
    ?q=rcn:<biblionumber of parent record>+and+(bib-level:a+or+bib-level:b)
  - The "In: <Title of parent record>" link on the child records should be the
    same as it was earlier

- Now set UseControlNumber = "Don't use" and repeat all of the steps above
  - All of the links should still be displayed and work, of course
  - The "In: <Title of parent record>" link on the child records should look
    like this: ?q=ti,phr:<Title of parent record>
  - The "Show analytics" link on the parent record should look like this:
    ?q=Host-item:<Title of parent record>

- Change LDR/07 to "s" and repeat all of the steps above
- Do all of this both for GRS-1 indexing and for DOM indexing...

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-20 14:40:47 -04:00
Magnus Enger
be69176982 Bug 9256 - Fix search for the packages
See the bug for a description of the problem.

This patch tries to restore searching for marcflavour != MARC21 as well as
allowing instances with different marcflavors to co-exist on the same server.

To test:
- Do a package install with e.g. the official squeeze-dev packages and create at
  least two instances, with different marcflavours, e.g.:
  sudo koha-create --create-db --marcflavor marc21 test1
  sudo koha-create --create-db --marcflavor normarc test2
- Run through the web installers for both instances and add a couple of
  records to each. Wait for the records to be indexed or run indexing manually
  with
  sudo koha-rebuild-zebra -f test1
  sudo koha-rebuild-zebra -f test2
- Try searching for the records you added. It should work in test1 but not in
  test2.
- Apply the patch and build packages with the build-git-snapshot script
- Install the new koha-common package
- Create two instances (because of Bug 9754 it is probably best to give the
  instances different names than the ones you created above, or to do this on
  a fresh VM or similar) and add records, as described above. Searching should
  now work equally well for both instances.

Please note: Because of Bug 9752 you will have to set marcflavour = NORMARC
by hand before you do the searching, if you choose NORMARC as the marc flavour
on one of the instances you create.

Please note too: I am not confident that this is the perfect solution, so
merciless and thorough testing is necessary! ;-)

Signed-off-by: Mirko Tietgen <mirko@abunchofthings.net>
Works for me for GRS-1 (package installation out of the box). Could not figure out how to set up DOM indexing and eventually stopped caring about it.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Build packages with the patch and checked that creating
instances and search within them works for both MARC21 and NORMARC.
All tests and QA script pass.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-19 19:34:12 -04:00
Jared Camins-Esakov
376c55dc4e Bug 9239 QA follow-up: remove stray debug code
Remove a line of debug code from EG, provide better error handling
when presented with weird data in the authority linker, and correct
queryparser configuration to reflect the correct configuration for
Zebra.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-16 21:32:34 -04:00
Jared Camins-Esakov
d88bd37f30 Bug 9239 QA follow-up: the last QA follow-up was missing a require
This patch also corrects the definition of the an= index, which was
missing exactness.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-16 21:32:34 -04:00
Jared Camins-Esakov
09de3b93be Bug 9239 follow-up: update koha-conf.xml
The previous patches were missing the koha-conf.xml updates. This patch
updates koha-conf.xml and makes the changes neccessary to include the
QueryParser configuration file in the packages.

To test:
1) Run Makefile and check generated koha-conf.xml to confirm that the
   line <queryparser_config>...</queryparser_config> is there with an
   absolute path.
2) That was it.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Tested update successfully - new line now shows up and
query parser is used.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-16 21:32:33 -04:00
Jared Camins-Esakov
2900bd14dc Bug 9239: Introduce QueryParser driver for PQF
Since the most expressive query language supported by Zebra is PQF, this
patch adds a PQF driver for QueryParser which will translate QueryParser
queries into standard PQF (guided by mappings which have been written to
match Koha's existing Zebra configuration) which can then be sent to
Zebra. This driver, Koha::QueryParser::Driver::PQF(::*) extends the
OpenILS::QueryParser(::*) class(es), so as to preserve maximum
interoperability between the various users of the QueryParser driver.

Initially, search syntax is as follows:
* AND operator: &&
* OR operator: ||
* GROUPING operators: ( )

Fields can mostly be searched using the ccl prefixes they have now. The
exception is the various date limits which are searched with a syntax
like this: pubdate(2008)

For sorting, you can simply add #title-sort-az (etc.) to your query.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolions.com>
Test Passed successfully after installing missing dep for Test::Deep

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-16 21:32:32 -04:00
Jared Camins-Esakov
e56a0a0e62 Bug 8620: Any index in DOM mode sensitive to -x flag of rebuild_zebra.pl
The definition of the Any index was sensitive to whether
spaces were present between (say) subfield elements in the
MARCXML representation of the bib being indexed.  When using
the -x option to rebuild_zebra.pl, spaces would be present
because of how MARC::File::XML emits MARCXML.

When not using the -x option, spaces would not be present
and the contents of a field would be run together, potentially
as one big token.

The visible behavior was that doing a keyword search by
item barcode would sometimes not work.

To test:
0) Make sure Zebra is using DOM mode
1) Create an item record.
2) Reindex using rebuild_zebra.pl -b -z, *without* -x
3) Do a keyword search by the barcode of the item just
   added; the search shouldn't work
4) Apply patch.
5) Update the following two files:
    etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl
    etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl
6) Reindex
7) Do a search that was previously failing.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Fixes the problem for me - formerly not working callnumbers
and barcodes are now found in keyword (any) searches.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
(revised commit description to better explain why it fixes the problem)

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Passes all my tests, happy to sign off
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-07 09:19:43 -05:00
eb4ebab07c Bug 9552 - BIB1 Relation "Greater Than" Attribute Not Mapped Properly in CCL.Properties
Currently, you can use "lt,le,eq,ge" in your CCL query to handle
"lesser than, lesser or equal to, equal to, greater than or equal
to" relationships.

The only one missing is "gt" (Bib1 2=5).

The mappings are also off "ne, phonetic, stem", but those are Bib1
attributes that Zebra doesn't support, so that's not really relevant.

To test:

[1] Before applying the patch, try the following query in the OPAC:

pubdate,gt:2006

You should get "no results found".

[2] After applying the patch (and note that ccl.properties will usually
need to be installed in the run-time Zebra configuration directory), try
the same search.  This time, you could get back the titles whose
publication date is after 2006.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-02-07 00:20:42 -05:00
Mathieu Saby
e86e3c24b8 Bug 8984: make Zebra more UNIMARC compliant
This patch makes the following changes to record.abs, biblio-koha-indexdefs.xml and biblio-zebra-indexdefs.xsl :
- adding new (sub)fields to Identifier-standard index : 011f/g ; 012a ; 013a/z ; 014a/z ; 015a/z ; 016a/z ; 017a/z, 040a/z, 071z, 072z, 073z
- adding 1 new subfield to Publisher index : 071b (may contain the name of a music publisher)
- adding new (sub)fields to Author and  Identifier-standard index (for the $9) : 716, 72X, 730 - adding new (sub)fields to Note : 334$a (award note)
- correcting 207 and 208
- suppressing 308a and 328a in Note (useless as complete fields are indexed in same index)
- adding (sub)fields to Title index : 411t, 421-425t, 433-437t, 442-444t, 446-456t, 462-463t, 470-488t, 560
- adding (sub)fields to Subject and  Identifier-standard index (for the $9) : 608, 615, 616, 617, 620, 621
- adding some classifications index : 670, 675, 686 - adding some comments (to make easier further modifications and to identify non unimarc fields : 414-420, 603, 630-636, 646)

To test :
- take a record and fill some of the missing fields (e.g 488t, 608, 720, 012a) with some data as "field488", "field608" etc
- try to find the record => not possible
- apply the patch, copy the new record.abs in etc/zebradb/biblios/etc and rebuild zebra
- try to find the record => should be ok
- check nothing else is broken...
- same test with DOM indexing activated

http://bugs.koha-community.org/show_bug.cgi?id=8984
Signed-off-by: Zeno Tajoli <tajoli@cilea.it>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-01-04 08:39:56 -05:00
Fridolyn SOMERS
6e62f58015 Bug 9123: Authorities search ordered by authid does not work
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Tested with Zebra, marc21, grs1.
Discovered that paging through auth search results does no longer work, but that is not related to these changes.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Tested with Zebra, marc21, dom.
All tests pass.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-02 16:28:12 -05:00
2624a45386 Bug 8750 - Chronological terms authorities not correctly indexed (trivial fix)
Patch re-done so it applies, had that double-utf8 problem

There was no entry in authority's record.abs for indexing chronological
terms. They couldn't be searched and (obviously) linked.

I've added those entries using the index names defined in
authorities/etc/bib1.att

Regards
To+

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-20 07:08:50 -05:00
Magnus Enger
9270d84c93 Bug 8805 - Add a biblio-zebra-indexdefs.xsl for NORMARC
This is required in order for Koha to support DOM indexing of the
NORMARC dialect, cf Bug "Bug 7818 - support DOM mode for Zebra
indexing of bibliographic records".

The two files in this patch were generated from the NORMARC
record.abs by doing the steps suggested at the bottom here:
http://wiki.koha-community.org/wiki/Switching_to_dom_indexing
No manual editing was involved.

To test:
- Do a fresh install, choosing NORMARC as the MARC dialect
- Run rebuild_zebra.pl and check it does not complain about missing
  files or other things
- Check that search works as expected. Using MARC21 records for
  the testing should be OK.

2012-10-31: New patch after an update to Bug 8665
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-08 12:35:49 -05:00
Jared Camins-Esakov
510a2397fb Bug 8665 follow-up: add missing line to XSLT
The DOM transformer was missing a line from a previous development,
resulting in the MARC21 authorities DOM indexing stylesheet being
regenerated with a missing line. This patch readds the missing line
to the transformer, and provides the corrected authority-zebra-indexdefs.

Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-10-29 19:12:41 +01:00
Jared Camins-Esakov
7d9b4d58e3 Bug 8665: DOM indexing fails to index some bib records
Use a user-specified field for z:id.

This patch also fixes an excess space before the index in the MARC21
biblio index definitions, which someone fixed in the generated file
but not in the source file it should have been fixed in.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
2012-10-29 19:12:38 +01:00
084367f6cb Bug 3087 Fix Z39.50 server to return the correct record syntax
Modify Makefile.PL and Zebra configuration files in order to parametrized
biblio record type returned by Zebra Z39.50 server.

How to test:

- Test with a MARC21 and a UNIMARC DB
- Do a new installation
- Search from OPAC
- Search from a Z39.50 client like yaz-client: syntax = MARC21/UNIMARC must be
  choosed
- It was working for MARC21: it continues to work
- It wasn't working for UNIMARC: it works now, both in OPAC and from a Z39.50
  client

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Works fine for MARC21. Frederic looked at UNIMARC. Magnus looked at NORMARC.
GRS1 works okay for me. I still have issues with DOM, but they are not directly related to changes in this patch.
A followup is still needed for packaging (debian/templates).

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-10-22 14:12:22 +02:00
Jared Camins-Esakov
91be607586 Bug 7475: Update configuration
In order to make matching rules more useful for MARC21 authorities,
this patch adds special indexes on previous see-from headings and
LCCN. This patch does not change UNIMARC authority configuration in
any way. Also modifies the Koha schema in preparation for adding
authority import and matching to the Staging tools.

To install:
1. Run installer/data/mysql/atomicupdate/importauthorities.pl
2. Update the following four files in your koha-dev:
    etc/zebradb/authorities/etc/bib1.att
    etc/zebradb/marc_defs/marc21/authorities/authority-koha-indexdefs.xml
    etc/zebradb/marc_defs/marc21/authorities/authority-zebra-indexdefs.xsl
    etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl
3. Reindex your authorities:
    misc/migration_tools/rebuild_zebra.pl -a -r -v

NOTE TO RM: this patch adds an atomicupdate file that needs to be
incorporated into updatedatabase.pl if bug 7167 is not pushed.

http://bugs.koha-community.org/show_bug.cgi?id=2060

Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 1 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012
2012-09-19 17:15:25 +02:00
Colin Campbell
1e8423167e Bug 8653 remove erroneous whitespace blocking indexing
The superfluous whitespace after the definition of subject
tag $9s is causing an error when carried over into dom config
files so that the authority links fail to index

Also removed the (harmless) trailing space in the equivalent
Unimarc files

A good editor and git can help in not creating excess whitespace

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-09-14 17:20:34 +02:00
Jared Camins-Esakov
66cee2f590 Bug 8206 follow-up: Add Match index to MARC21 record.abs
Although the Match index was correctly configured for UNIMARC
authorities and MARC21 authorities indexed with DOM, the Match
index was inadvertantly removed from the record.abs file for
MARC21 authorities at some point. Since the Match index is required
to make best use of the new search options, this patch adds it
back in.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
2012-09-07 15:16:45 +02:00
43fb058fcd Bug 8704 - Typo in etc/koha-conf.xml
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-09-05 15:27:12 +02:00
Jared Camins-Esakov
bbcb1d784b Bug 8268: Add database dump to export tool
This patch builds on work by Lars Wirzenius for the Koha packages.

To date, the only way for a Koha librarian to obtain a complete backup
of their system has been to log into the system via SSH (or FTP) to
download the mysqldump file. This patch makes it possible for
superlibrarians in properly configured systems to download night backups
via the staff client's Export tool.

Recognizing that this is functionality with potentially very grave
security implications, system administrators must manually enable these
features in the koha-conf.xml configuration file.

The following configuration settings have been added to the koha-conf.xml
file:
* backupdir => directory where backups should be stored.
* backup_db_via_tools => whether to allow superlibrarians to download
  database backups via the Export tool. The default is disabled, and
  there is no way -- by design -- to enable this option without manually
  editing koha-conf.xml.
* backup_conf_via_tools => whether to allow superlibrarians to download
  configuration backups via the Export tool (this may be applicable to
  packages only). The default is disabled, and there is no way -- by
  design -- to enable this option without manually editing koha-conf.xml.

This commit modifies the following scripts to make use of the new
backupdir configuration option:
* koha-dump and koha-run-backups in the Debian packages
* The sample backup script misc/cronjobs/backup.sh

Note that for security reasons, superlibrarians will not be allowed
to download files that are not owned by the web server's effective user.
This imposes a de facto dependency on ITK (for Apache) or running the
web server as the Koha user (as is done with Plack).

To test:
1. Apply patch.
2. Go to export page as a superlibrarian. Notice that no additional
   export options appear because they have not been enabled.
3. Add <backupdir>$KOHADEV/var/spool</backup> to the <config> section
   of your koha-conf.xml (note that you will need to adjust that so that
   it is pointing at a logical directory).
4. Create the aforementioned directory.
5. Go to export page as a superlibrarian. Notice that no additional
   export options appear because they have not been enabled.
6. Add <backup_db_via_tools>1</backup_db_via_tools> to the <config>
   section of your koha-conf.xml
7. Go to the export page as a superlibrarian. Notice the new tab.
8. Go to the export page as a non-superlibrarian. Notice there is no
   new tab.
9. Run: mysqldump -u koha -p koha | gzip > $BACKUPDIR/backup.sql.gz
   (substituting appropriate user, password, and database name)
10. Go to the export page as a superlibrarian, and look at the "Export
    database" tab. If you are running the web server as your Koha user,
    and ran the above command as your Koha user, you should now see the
    file listed as an option for download.
11. If you *did* see the file listed, change the ownership to something
    else: sudo chown root:root $BACKUPDIR/backup.sql.gz
11a. Confirm that you no longer see the file listed when you look at the
     "Export database" tab.
12. Change the ownership on the file to your web server (or Koha) user:
    sudo chown www-data:www-data backup.sql.gz
13. Go to the export page as a superlibrarian, and look at the "Export
    database" tab. You should now see backup.sql.gz listed.
14. Choose to download backup.sql.gz
15. Confirm that the downloaded file is what you were expecting.

If you are interested, you can repeat the above steps but replace
<backup_db_via_tools> with <backup_conf_via_tools>, and instead of
creating an sql file, create a tar file.

To test packaging: run koha-dump, confirm that it still creates a
usable backup.

------

This signoff contains two changes:

10-1. If no backup/conf files were present, then the message telling you
so doesn't appear and the download button does. Made them behave
correctly.
10-2. The test for a file existing required it to be owned by the
webserver UID. This change makes it so it only has to be readable.

Signed-off-by: Robin Sheat <robin@catalyst.net.nz>
2012-07-12 17:40:21 +02:00
Jonathan Druart
623f3a2c84 Bug 8233 : SearchEngine: Add a Koha::SearchEngine module
First draft introducing solr into Koha :-)

List of files :
  $ tree t/searchengine/
  t/searchengine
  |-- 000_conn
  |   `-- conn.t
  |-- 001_search
  |   `-- search_base.t
  |-- 002_index
  |   `-- index_base.t
  |-- 003_query
  |   `-- buildquery.t
  |-- 004_config
  |   `-- load_config.t
  `-- indexes.yaml
  just do `prove -r t/searchengine/**/*.t`

  t/lib
  |-- Mocks
  |   `-- Context.pm
  `-- Mocks.pm
  provide a mock to SearchEngine syspref (set_zebra and set_solr).

  $ tree Koha/SearchEngine
  Koha/SearchEngine
  |-- Config.pm
  |-- ConfigRole.pm
  |-- FacetsBuilder.pm
  |-- FacetsBuilderRole.pm
  |-- Index.pm
  |-- IndexRole.pm
  |-- QueryBuilder.pm
  |-- QueryBuilderRole.pm
  |-- Search.pm
  |-- SearchRole.pm
  |-- Solr
  |   |-- Config.pm
  |   |-- FacetsBuilder.pm
  |   |-- Index.pm
  |   |-- QueryBuilder.pm
  |   `-- Search.pm
  |-- Solr.pm
  |-- Zebra
  |   |-- QueryBuilder.pm
  |   `-- Search.pm
  `-- Zebra.pm

How to install and configure Solr ?
  See the wiki page: http://wiki.koha-community.org/wiki/SearchEngine_Layer_RFC

http://bugs.koha-community.org/show_bug.cgi?id=8233
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
2012-07-06 16:51:58 +02:00
Marc Veron
9c492a7fae Bug 7586 - Search: Language restriction does NOT show expected results (no items shown)
modified:   etc/zebradb/marc_defs/marc21/biblios/record.abs

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-10 11:00:14 +02:00
38b375b32c Bug 7818 Add UNIMARC biblio records zebra DOM def files
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
I tested two UNIMARC Koha installations using the sample UNIMARC
data from the BibLibre sandbox, comparing the results with DOM
and with GRS-1 indexing. The results are very similar, though there
are some differences. Most noticeable:
* relevance and facets seem to be more accurate with DOM enabled
* the GRS-1 configuration returns approximately 10% more results with
  random single keywords like "petit," but the DOM results contain
  the most relevant items, and any lacks in the configuration can
  easily be corrected as UNIMARC users identify fields that should be
  indexed but aren't
* authority-controlled searches match exactly
* author and topic facets do not work with the out-of-the-box GRS-1
  indexing configuration (?!?)
(adding second sign-off line below because all that probably looks like
a commit message and not a sign off)

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:16 +02:00
Galen Charlton
1f88669152 Bug 7818: add warning about not editing record.abs when using DOM filter
This commit also updates the authority and biblio DOM indexing definition
XSL to include updated header comments.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:14 +02:00
Galen Charlton
79c0158aab Bug 7818: update comment to clarify availability of DOM index mode
DOM indexing is now available for both bibs and authorities.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:12 +02:00
Galen Charlton
daca5edc52 Bug 7818: -x option of rebuild_zebra.pl now works with DOM filter
One consequence is that the -x and -a options are no longer
mutually exclusive.

Also, because of the way that the GRS-1 SGML filter works, if you're
indexing multiple documents, you can't just wrap them in a document
element, but the DOM filter *requires* it.  Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:

- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:09 +02:00
Galen Charlton
64680c18b3 Bug 7818: Zebra DOM filter index definitions for MARC21 bibs
The file biblio-zebra-indexdefs.xsl, which is the stylesheet that
is used by the Zebra DOM filter to convert an incoming MARC21 bib
to its indexed form, was generated by the following two steps:

misc/maintenance/make_zebra_dom_cfg_from_record_abs \
  --input  etc/zebradb/marc_defs/marc21/biblios/record.abs \
  --output etc/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml

xsltproc etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl \
  etc/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml \
  > etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl

Records indexed using this XSLTshould behave similarly to records
indexed using the GRS-1 filter and the old record.abs definition, with
the following big exception (and improvemwent): indexed phrases now
span subfield boundaries if a specific subfield wasn't specified in the
index definition.  For example, the GRS-1 filter index definition

melm 245 Title

would allow 245 $a Cats on boxes : $b cardboard fantasies

to be searched as the phrases "cats on boxes" or "cardboard fantasies",
but a title phrase seach of "cats on boxes cardboard fantasises"
wouldn't work.  The DOM filter equivalent,

<index_data_field xmlns="http://www.koha-community.org/schemas/index-defs" tag="245">
  <target_index>Title:w</target_index>
  <target_index>Title:p</target_index>
</index_data_field>

*does* allow phrase searches to span subfield boundaries.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:06 +02:00
Galen Charlton
76378ed202 Bug 7818: add index_data_field option to DOM indexing repertoire
Adds a new kohaidx:index_data_field index definition type which
indexes all of the subfields of a MARC data field as a single
phrase, separating the contents of each with a space.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:04 +02:00