At the last development meeting we have voted to remove the
QueryParser-related code
https://wiki.koha-community.org/wiki/Development_IRC_meeting_19_February_2020
Hea tells us that it has not been adopted, and the code/bug tracker that
it is not really usable as it. As nobody is willing to work on it, we
decided to remove it instead.
Test plan:
% prove t/db_dependent/Search.t
must return green
See commits from bug 9239 and confirm that the code is removed in this
patch.
Also play with the search on the UI and confirm that you do not see
obvious regressions
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Test plan:
* Add record matching rule:
Matching rule code: test
Description: 001 and 003
Match threshold: 1000
Record type: Bibliografic record
Match points:
Search index: Control-number
Score: 1000
Tag: 001
Subfields:
Offset:
Length:
Normalization rule: none
Required match checks:
Source (incoming) record check field
Tag: 003
Subfields:
Offset:
Length:
Normalization rule: none
Target (database) record check field
Tag: 003
Subfields:
Offset:
Length:
Normalization rule: none
* Note the match rule identity number.
* Stage a marc-file for import, for instance this one ftp://ftp.libris.kb.se/pub/export2/X/marc/X.20200104.marc
sudo /usr/sbin/koha-shell -c "perl /home/vagrant/kohaclone/misc/stage_file.pl --match 4 --file 'X.20200104.marc' --format ISO2709 --comment 'test'" kohadev
* Note the batch number and commit the file using the batch number:
sudo /usr/sbin/koha-shell -c "perl /home/vagrant/kohaclone/misc/commit_file.pl --batch-number 1" kohadev
* Again, stage the same marc-file for import:
sudo /usr/sbin/koha-shell -c "perl /home/vagrant/kohaclone/misc/stage_file.pl --match 4 --file 'X.20200104.marc' --format ISO2709 --comment 'test'" kohadev
* Note the number of records matched.
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
To test:
1 - Set SearchEngine to ElasticSearch
2 - Stage the sample file (import it if it doesn't already exist in your catalog and then stage again)
3 - Set matching rule to ISBN
4 - No matches found
5 - Apply patch
6 - Apply no matchign rule
7 - Change the ISBN matching rule to use ISBN normalizer
8 - Apply matching rule for ISBN
9 - It matches!
Signed-off-by: Ron Houk <rhouk@ottumwapubliclibrary.org>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
I did not include tests only because this is a very small reasonable change and routine has no tests.
It does not affect behaviour, it only touches syntax lightly.
To test:
0 - Set SearchEngine to ElasticSearch
1 - Import the example record (a version is included in sample data in devbox/testing docker so you can skip)
2 - Stage the attached example record
3 - Match using 'ISBN' rule
4 - No matches found
5 - Apply patch
6 - Restart all the things
7 - Reapply matching with no rule
8 - Reapply with ISBN matcing
9 - It matches!
10 - Set SearchEngine to Zebra
11 - Reapply matching with no rule
12 - Reapply with ISBN matching
13 - Matching works as before in Zebra
Signed-off-by: Ron Houk <rhouk@ottumwapubliclibrary.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
To test:
1 - Stage some records for import
2 - Manage the records
3 - Use several different matching rules and note the results
4 - Apply patch
5 - Try several matching rules again and note the results have not
changed
6 - Try under both search engines (Zebra and ES)
Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Also optimize it so it's actually usable.
Test plan:
1. To test it properly you need biblio and authority data. You might get away with enabling AutoCreateAuthorities and BiblioAddsAuthorities so that authorities are created in the process. Another option would be to import authorities first e.g. from LoC.
2. Make sure the authority index has been properly created with "misc/search_tools/rebuild_elastic_search.pl -a -d"
3. Run "misc/link_bibs_to_authorities.pl -v -l" twice and observe the results.
Sponsored-by: National Library of Finland
Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Fix:
Incorrect integer value: '' for column 'offset'
Incorrect integer value: '' for column 'score'
Test plan:
Add/edit matching rules
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Test Plan:
1-go to C4/Matcher.pm
2-verify there is no whitespace at line 25
Signed-off-by: Mark Tompsett <mtompset@hotmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Occurrences of C4::AuthoritiesMarc::SearchAuthorities must be replaced
by search_auth_compat.
You need to define the search index of matching rule with one of the
values defined in %koha_to_index_name (from
Koha::SearchEngine::Elasticsearch::QueryBuilder::build_authorities_query_compat)
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
The code in C4::Matches::get_matches is terrible and a bug has been introduced
by bug 12478 because of its way to handle uniqueness.
If search engine is elastic, simple_search_compat returns array ref of MARC::Record,
used as a string for the key of the matches hashref we get things like
"MARC::Record=HASH(0x8f76ab0)".
Yes, terrible...
The file is never staged and we get an internal server error:
stage-marc-import.pl: Can't locate object method "fields" via package "MARC::Record=HASH(0x8f76ab0)" (perhaps you forgot to load "MARC::Record=HASH(0x8f76ab0)"?) at /home/vagrant/kohaclone/C4/Biblio.pm line 2691
To recreate the issue:
- Set SearchEngine == Elastic
- Create a matching rule on 999$c (you need to edit the existing one and specify
'Local-number' as search index, not 'local-number')
- Import a file with bibliographic records and use the matching rule you defined.
Test plan:
Import authority and bibliographic records with Zebra and Elastic using a matching rule.
Everything should work correctly.
Note: I found a bug when importing authorities using Elastic, see bug 17255 comment 38.
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
C4::ImportBatch::GetBestRecordMatch uses SQL to sort by score descending
then candidate_match_id descending. With C4::Matcher::get_matches, I
implement the same sort but use Perl code to do it, since we're sorting
search results.
It's a simple change, but it's in a big block of code, so I don't have
unit tests.
Signed-off-by: Alex Buckley <alexbuckley@catalyst.net.nz>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
This patch allows you to use the "qualifier,qualifier" syntax
in the Record Matching Rules "Search Index" when using the QueryParser.
While QueryParser doesn't support this syntax, it will now fallback
correctly to non-QueryParser functionality. Without the patch, your search
will just fail silently.
This patch also adds a "skip_normalize" option to C4::Search::SimpleSearch(),
and uses the option during C4::Matcher::get_matches. This prevents
the s/:/=/g and s/=/:/g normalization. This normalization is heavy-handed,
and while it is necessary sometimes to generate a valid CCL query or
QueryParser query, C4::Matcher::get_matches() already produces a valid
CCL query, so we don't need to do this normalization.
Additionally, this normalization causes problems when you use a
Zebra register which isn't normalized: namely the "u" register.
Strings are stored "as is", so http://localhost/koha/resource/1 is
stored as is during indexing. When you search, you need to pass
the exact same thing through the query to get a match. Using
http=//localhost/koha/resource/1 in your query will yield zero results.
_TEST PLAN_
0) Apply patch
1) Create a Record Matching Rule in Koha with the following details:
Matching rule code: TEST
Description: Test
Match threshold: 100
Record type: Bibliographic
Match point 1:
Search index: id-other,st-urx
Score: 100
Tag: 024
Subfields: a
Normalization rule: None
2) Create a record in Koha with an indexable URI
2a) Default framework
2b) 024 $a http://koha-community.org/test $2 uri
2c) 040 $c test
2d) 245 $a This is a test record
2e) 942 $c Books
2f) Save (save again if cautioned about missing fields as these should autofill)
3) Do a full re-index of Zebra
4) Download your record from Koha as a .mrc file (ie isomarc, binary marc, etc)
5) Go to "Stage MARC records for import"
5a) Upload your .marc file.
5b) Change your "Record matching rule" to "Test"
5c) Click Stage for import
9) It should say "1 records with at least one match in catalog per matching rule "Test".
NOTE: For completeness, you can go through this process on a clean master branch,
and note that it will say '0 records with at least one match in catalog per matching rule "TEST"'
Signed-off-by: Alex Buckley <alexbuckley@catalyst.net.nz>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
This patch adds a syspref "AggressiveMatchOnISSN" allowing for a match
of ISSNs with or without hyphens. It uses Business::ISSN in order to
follow the use of Business::ISBN and allow for validation of ISSNs
To test:
1 - Find a record in your system with an ISSN (or add one)
2 - Stage a record containing the same ISSN but lacking a hyphen
3 - Matching on ISSN should find 0 matches
4 - Repeat with no hyphen ISSN in system and hyphen ISSN in import
5 - Matching should find 0
6 - Apply patch
7 - Update datbase and install Business::ISSN
8 - Leave AggressiveMatchOnISSN as don't and repeat original tests- no
change
9 - Set AggressiveMatchOnISSN as do and repeat original test
10 - You should find a match
11 - prove t/Koha.t - all tests pass
Sponsored by North Central Regional Library System (NCRL) www.ncrl.org
Signed-off-by: Chad Roseburg <croseburg@ncrl.org>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
The current implementation doesn't care about that parameter, and applies
a default normalization rule that seems counter-productive (in general) for
its aleged purpose.
This patch makes it handle the following values for 'norms':
- upper_case
- lower_case
- remove_spaces
- legacy_default
- none
They make it call the relevant Koha::Utils::Normalize routines. 'legacy_default'
is used only for backwards compatibility, but could be removed if there's consensus.
To test:
- Run:
$ prove t/Matcher.t
=> FAIL: most _get_match_keys tests fail
- Apply the patch
- Run:
$ prove t/Matcher.t
=> SUCCESS: Tests pass!
- Sign off :-D
Sponsored-by: FIT
Signed-off-by: Mark Tompsett <mtompset@hotmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
subroutines should not take $dbh in parameter.
C4::Biblio::TransformMarcToKoha has it and does not use it.
Test plan:
Look at the patch and confirm that all occurrences of
TransformMarcToKoha have been modified.
Signed-off-by: Jacek Ablewicz <abl@biblos.pk.edu.pl>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Brendan Gallagher brendan@bywatersolutions.com
Signed-off-by: Olli-Antti Kivilahti <olli-antti.kivilahti@jns.fi>
Also fixes ! and +
Rebased to master
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
It makes perfect sense and works as expected. This part of the code is too
under-tested so no point requiring a regression test for such a simple change.
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
perl -p -i -e 's/use vars qw\(\s*\);\n//' **/*.pm
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
perl -p -i -e 's/^.*set the version for version checking.*\n//' **/*.pm
+ manual adjustements
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
Mainly a
perl -p -i -e 's/^.*3.07.00.049.*\n//' **/*.pm
Then some adjustements
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
perl -p -i -e 's/^(use vars .*)\$VERSION\s?(.*)/$1$2/' **/*.pm
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
Signed-off-by: Chris Nighswonger <cnighswonger@foundations.edu>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
http://bugs.koha-community.org/show_bug.cgi?id=9987
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
The original patch did not correctly construction ISBN phrase
searches when QP is on. Unfortunately, when attempting to fix that,
I discovered that there's a deep bug in QP that makes it generate
incorrect search queries when combining more than two atoms
in with the || operator. Consequently, until that can be fixed,
this patch ensures that if UseQueryParser is on, AggressiveMatchOnISBN
has no effect.
To state it anther way, AggressiveMatchOnISBN works only when QP
is not in use.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
In UNIMARC, the isbn index is ISBN.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Test Plan:
1) Catalog a record with the ISBN "0394502884 (Random House)"
2) Export the record, edit it so the ISBN is now
"0394502884 (UnRandomHouse)"
3) Using the record import tool, import this record with matching
on ISBN.
4) You should not find a match
5) Apply this patch
6) Run updatedatabase.pl
7) Enable the new system preference AggressiveMatchOnISBN
8) Repeat step 3
9) The tool should now find a match
Signed-off-by: Tom McMurdo <thomas.mcmurdo@state.vp.us>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch makes Koha <-> Zebra use MARCXML for the serialization when
using DOM, and USMARC for GRS-1.
* The following functions are modified to set the Zebra record syntax
according to the current sysprefs and configuration:
- C4::Context->Zconn
- C4::Context-_new_Zconn
* A new function 'new_record_from_zebra' is introduced, which checks the
context we are in, and creates the MARC::Record object using the right
constructor.
The following packages get touched to make use of the new function:
- C4::Search
- C4::AuthoritiesMarc
and the same happens to the UI scripts that make use of them (both in
the OPAC and STAFF interfaces).
* Calls to the unsafe ZOOM::Record->render()[1] method are removed.
Due to this last change the code for building facets was rewritten. And
for performance on the facets creation I pushed higher version
dependencies for MARC::File::XML and MARC::Record (we rely on
MARC::Field->as_string).
* Calls to MARC::Record->new_from_xml and MARC::Record->new_from_usmarc
are wrapped with eval for catching problems [2].
* As of bug 3087, UNIMARC uses the 'unimarc' record syntax. this case is
correctly handled.
* As of bug 7818 misc/migration_tools/rebuild_zebra.pl behaves like:
- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')
here we do exactly the same.
To test:
- prove t/db_dependent/Search.t should pass.
- Searching should remain functional.
- Indexing and searching for a big record should work (that's what the
unit tests do).
- Test an index scan search (on the staff interface):
Search > More options > Check "Scan indexes".
- Enable 'itemBarcodeFallbackSearch' and try to circulate any word, it
shouldn't break.
- Searching for a biblio in a new subscription shouldn't break.
- Running bulkmarcimport.pl shouldn't break.
- And so on... for the rest of the .pl files.
[1] http://search.cpan.org/~mirk/Net-Z3950-ZOOM/lib/ZOOM.pod#render()
[2] a record that cannot be parsed by MARC::Record is simply skipped (bug 10684)
Sponsored-by: Universidad Nacional de Cordoba
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
When introducing QueryParser, I introduced a check for QueryParser at
too high a level, causing authority matching to try and use SimpleSearch
for authorities prematurely, when SearchAuthorities should be handling
it. This patch corrects the level of the check. This patch only moves
three lines, but thanks to the change in if level, it adjusts the
indentation quite a bit.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Comments on third patch of this series.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
With the inclusion of this patch, all searches will (try) to use
QueryParser for handling queries for both the bibliographic and authority
databases if UseQueryParser is enabled. If QueryParser is unavailable,
UseQueryParser is disabled, or the search uses CCL indexes, the old
search code will be used.
To test:
1) Apply patch.
2) Run the unit test with `prove t/QueryParser.t`
3) Enable the UseQueryParser syspref.
4) Try searches that should return results in the following places:
* OPAC (simple search)
* OPAC (advanced search)
* OPAC (authorities)
* Staff client (header search)
* Staff client (advanced search)
* Staff client (cataloging search)
* Staff client (authorities)
* Staff client (importing a batch using a match point)
* Staff client (searching for an item for adding to a label)
* Staff client (acquisitions)
* Staff client (searching for a record to create a serial)
* ANYWHERE ELSE I HAVE FORGOTTEN
5) Disable the UseQueryParser syspref. Repeat at least some of the
searches you did above.
6) If all searches worked, sign off.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolions.com>
Searching still works as expected for variuos places.
QueryParser syspref seemed to be enabled by default
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
* Add the code necessary to handle authorities with matching rules and
import batches.
* Update all the scripts that use the matcher and import batch code
to use the new API.
* Add authority records to the matching rules interface in the staff
client.
http://bugs.koha-community.org/show_bug.cgi?id=2060
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on latest master 11 September 2012
svc/import_bib:
* takes POST request with parameters in url and MARC XML as DATA
* pushes MARC XML to an impoort bach queue of type 'webservice'
* returns status and imported record XML
* is a drop-in replacement for svc/new_bib
misc/cronjobs/import_webservice_batch.pl:
* a cron job for processing impoort bach queues of type 'webservice'
* batches can also be processed through the UI
misc/bin/connexion_import_daemon.pl:
* a daemon that listens for OCLC Connexion requests and is compliant
with OCLC Gateway spec
* takes request with MARC XML
* takes import batch params from a config file and forwards the lot to
svc/import_bib
* returns status
ImportBatches:
* Added new import batch type of 'webservice'
* Changed interface to AddImportBatch() - now it takes a hashref
* Replaced batch_type = 'batch' with
batch_type IN ( 'batch', 'webservice' ) in some SELECTs
Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Remove some unnecessary checks when check of error is
sufficient. Make the order in some cases more logical
Should remove some possibilities of runtime warning noise.
Although some calls belong to the 'Nothing could
ever go wrong' school have added some warnings
Signed-off-by: Christophe Croullebois <christophe.croullebois@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Should fix any remaining warnings with 'podchecker'
Signed-off-by: Andrew Elwell <Andrew.Elwell@gmail.com>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Also fix issues with normalizing ISBNs and the default
normalizer in C4::Matcher.
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Recommended by Michele Maenpaa, this adds handling colon and end-punctuation stripping
in subroutine _normalize, and fixes length testing in subroutine _get_match_keys
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
C4::Search::SimpleSearch was alredy patched to let you pass in the number of results you want back.
These instances were not using the new API. This patch makes all calls to SimpleSearch specify a limit.
I improved the documentation of SimpleSearch a bit to include the third returned value.
I believe there's a bug in C4::Output::pagination_bar, in that it doesn't deal well with URLs
with only one pair of parameter=value passed to it. I'm getting around this by passing in a second
pair that does nothing.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Enhancement to store the matching rule associated with an
import batch and to allow the current matching rule in
effect to be changed and the duplicate detection redone.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>