For debugging purposes we may wish to see the requests and responses made to
Elasticsearch
To test:
1 - prove -v t/Koha/SearchEngine/Elasticsearch.t
2 - Set <trace_to>Stderr</trace_to> in koha-conf
3 - Restart all
4 - perl misc/search_tools/rebuild_elasticsearch.pl
5 - Note requests are shown
6 - Set
<trace_to>File</trace_to>
<trace_to>/var/log/koha/kohadev/plack-error.log</trace_to>
in koha-conf
7 - Restart all
8 - perl misc/search_tools/rebuild_elasticsearch.pl
9 - Check the plack log and see the ES requests
Signed-off-by: Bob Bennhoff <bbennhoff@clicweb.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Heather Hernandez <heather_hernandez@nps.gov>
Bug 23828: (follow-up) fix unit test merge
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 23828: (QA follow-up) Fix number of tests
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Bob Bennhoff <bbennhoff@clicweb.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Add a "year" search field type. Fields with this type will only
retain values that looks like years, so invalid values such as
whitespace or word characters will not be indexed.
This for instance improves the behaviour when sorting by
"date-of-publication". If all values are indexed, records with
junk data instead of valid years will appear first among the search
results, drowning out more relevant hits. If assigning this field
the "year" type these records will instead always appear last,
regarless of sort order.
To test:
1) Have at least two biblios, one with a valid year in 008 (pos 7-10)
and another with an invalid one ("uuuu" for example)
2) Perform a wildcard search (*) and sort results by publication date.
3) The record with invalid year of pulication in 008 should appear first
4) Apply patch and run database updates
5) Reindex ElasticSearch
6) Perform the same search as in 2)
7) The record with the invalid year should now appear last
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
.pm must not have -x
.t must have -x
.pl must have -x
Test plan:
Apply only the first patch, run the tests and confirm that the failures
make sense
Apply this patch and confirm that the test now returns green
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
The match-heading field is a special field used only by the linker, not accessible
to staff or patrons via the interface. This field is used to store the constructed
'search form' used for matching bib headings to authority fields.
In bug 24269 I attempted to use the mappings defined in the inferface and also inject the search term.
This did not work as too many subfields were indexed on their own and leading to false matches.
In this bug we remove the mappings for this field, and create it ourselves during
the indexing process. The C4::Headings module is still used to generate the correct form,
however, the mappings are set based on the authority types in the system. This gives the user
the ability to add new typoes, but prevents mapping changes from breaking linker functionality
To test:
1 - Start form a sample database with ElasticSearch working
2 - Download via Z39.50 2 authorities, one of which is a narrower heading of the other, e.g.:
Waterworks
Waterworks - Costs
3 - Place a heading for the broader term in a record. e.g. Waterworks
In 650$a, without the cataloguing authority plugin. We don't want
the link created now.
You need syspref BiblioAddsAuthorities => allow
4 - Make sure linker is set to default
5 - Attempt to link the records
misc/link_bibs_to_authorities.pl
6 - Linking fails
7 - Apply patch
8 - refresh index settings (if using a custom file, remove 'match-heading')
You can reset mappings in the UI or run this:
misc/search_tools/rebuild_elasticsearch.pl -v -d -r
9 - Reindex ES
10 - Try to link again
11 - It succeeds!
12 - Run the tests
prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 25273: (follow-up)
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Remove unused Elasticsearch parameters 'key_prefix' and 'request_timeout'.
Refactor so that get_elasticsearch_params only returns parameters used
by Search::Elasticsearch, add class accessor for index_name.
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
This patch simplay alters the data we use for the tests, doing so causes them to fail
To test:
1 - Apply only this patch
2 - prove -v t/Koha/SearchEngine/Elasticsearch.t
3 - It fails!
4 - Apply next patch
5 - It passes!
Signed-off-by: Andrew Fuerste-Henry <andrew@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Strip initial characters from search fields in accordance with
nonfiling character indicators.
To test:
1) Apply patch
2) Run tests in t/Koha/SearchEngine/Elasticsearch.t
3) All tests should pass
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
To test:
1 - Be using Elasticsearch
2 - Check that field 150 is mapped to 'Match-heading' or add it (the subfields don't matter)
3 - Add a topic term authority record like "150 $aCats$vFiction"
4 - Add a 650 with $aCats and $vFiction and $e depicted to a bibliographic record
5 - Run the linker for the bib
perl misc/link_bibs_to_authorities.pl --bib-limit "biblionumber=89" -v
6 - Confirm the record is not correctly linked to the record
7 - Apply patch
8 - Reindex authorities for ES
perl misc/search_tools/rebuild_elasticsearch.pl -v -d -a
9 - Run the linker and confirm record is correctly linked
perl misc/link_bibs_to_authorities.pl --bib-limit "biblionumber=89" -v
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Bouzid Fergani <bouzid.fergani@inlibro.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
marc_records_to_documents is now an arrayref of hashes, not an arrayref of arrays
_sanitise_records has been removed, we don't need those tests anymore
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Adds preference ElasticsearchMARCFormat that controls whether MARC records are stored as ISO2709/MARCXML or array. Array is searchable by field and also indexes all subfields in the _all field for searching.
Test plan:
1. Test that searching and indexing works with the patch without any changes.
2. Switch to array format and index some records.
3. Check e.g. the 008 field of a record and verify that the record can be found with the contents enclosed in quotes.
4. Check that it's possible to search for a specific field/subfield. Search query: marc_data_array.fields.655.subfields.a:Diaries
5. Check that tests still pass, especially t/Koha/SearchEngine/Elasticsearch.t
Signed-off-by: Michal Denar <black23@gmail.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Fixes handling of /0 selector in addition to several fixed field positions. Note that ff7-01-02 is in fact 00-01 in Zebra and that has been replicated here.
Test plan:
1. Before applying a patch, check from Elasticsearch (e.g. localhost:9200/koha_biblios/_search?q=_id:123) what is indexed in Elasticsearch for a record for date-entered-on-file, ff7-00, ff7-01, ff7-02 and lleader.
2. Apply the patch, update database and save the record again.
3. Verify that the contents of the forementioned fields is now correct in Elasticsearch.
4. Verify that tests pass: prove -v t/Koha/SearchEngine/Elasticsearch.t
Signed-off-by: Björn Nylén <bjorn.nylen@ub.lu.se>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Deduplicate multivalued fields and make sure sort fields are not excessively long. Also updates default mappings so that sort fields are not created for item fields where it doesn't make sense.
Test plan:
1. Reset ES mappings in administration
2. Check that sort is '0' for local-classification in biblio mappings.
3. Change sort back to '1' for local-classification for the next steps.
4. Create a record with 20 items, each with a 100 character long call number
5. Check that when indexed, the record in ES does not have duplicates in any of the item fields and local-classification__sort is truncated to 255 characters.
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
https://bugs.koha-community.org/show_bug.cgi?id=20244
Test plan:
1. Add a record with alternate script fields in 880 (sample attached in the bug).
2. Make sure the the record can be found e.g. with the alternate script title.
3. Add a record with multiple authors.
4. Check that in the index there is only a single author__sort field.
Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
https://bugs.koha-community.org/show_bug.cgi?id=20244
Test plan:
1. Add a record with an ISBN-10 or ISBN-13 that can be converted to ISBN-10 (e.g. 1-56619-909-3).
2. Verify that the record can be found by searching for "1-56619-909-3", "1566199093", "978-1-56619-909-4" and "9781566199094".
Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Add missing pods, remove obsolete syspref and add test for serialization format for records exceeding max record size
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
https://bugs.koha-community.org/show_bug.cgi?id=19893
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Implement optimized indexing for Elasticsearch
How to test:
1) Time a full elasticsearch re-index without this patch by running the
rebuild_elastic_search.pl with the -d flag:
`koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Apply this patch.
3) Time a full re-index again, it should be about twice at fast (for a
couple of thousand biblios, with fewer biblios results may be more
unpredictable).
Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Improvements:
1) Index settings moved from code to etc/searchengine/elasticsearch/index_config.yaml. An alternative can be specified in koha-conf.xml.
2) Field settings moved from code to etc/searchengine/elasticsearch/field_config.yaml. An alternative can be specified in koha-conf.xml.
3) mappings.yaml has been moved from admin/searchengine/elasticsearch to etc/searchengine/elasticsearch. An alternative can be specified in koha-conf.xml.
4) Default settings have been improved to remove punctuation from phrases used for sorting etc.
5) State variables are used for storing configuration to avoid parsing it multiple times.
6) A possibility to reset the fields too has been added to the reset operation of mappings administration.
7) mappings.yaml has been moved from admin/searchengine/elasticsearch to etc/searchengine/elasticsearch.
8) An stdno field type has been added for standard identifiers.
To test:
1) Run tests in t/Koha/SearchEngine/Elasticsearch.t
2) Clear tables search_fields and search_marc_map
3) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
4) Verify that admin/searchengine/elasticsearch/mappings.pl displays the mappings properly, including ISBN and other standard number fields.
5) Index some records using the -d parameter with misc/search_tools/rebuild_elastic_search.pl to recreate the index
6) Verify that you can find the records
7) Put <elasticsearch_index_mappings>non_existent</elasticsearch_index_mappings> to koha-conf.xml
8) Verify that admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1 fails because it can't find non_existent.
9) Copy etc/searchengine/elasticsearch/mappings.yaml to a new location and make elasticsearch_index_mappings setting in koha-conf.xml point to it.
10) Make a change in the new mappings.yaml.
11) Clear table search_fields (mappings reset doesn't do it yet, see bug 20248)
12) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
13) Verify that the changes you made are now visible in the mappings UI
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Bug 20073: Move Elasticsearch yaml files back to admin directory
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
This reverts commit f489d2034b.
This commit breaks the install process when using debian packages.
Reverting as we are very close to the 18.05.00 release
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Improvements:
1) Index settings moved from code to etc/searchengine/elasticsearch/index_config.yaml. An alternative can be specified in koha-conf.xml.
2) Field settings moved from code to etc/searchengine/elasticsearch/field_config.yaml. An alternative can be specified in koha-conf.xml.
3) mappings.yaml has been moved from admin/searchengine/elasticsearch to etc/searchengine/elasticsearch. An alternative can be specified in koha-conf.xml.
4) Default settings have been improved to remove punctuation from phrases used for sorting etc.
5) State variables are used for storing configuration to avoid parsing it multiple times.
6) A possibility to reset the fields too has been added to the reset operation of mappings administration.
7) mappings.yaml has been moved from admin/searchengine/elasticsearch to etc/searchengine/elasticsearch.
8) An stdno field type has been added for standard identifiers.
To test:
1) Run tests in t/Koha/SearchEngine/Elasticsearch.t
2) Clear tables search_fields and search_marc_map
3) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
4) Verify that admin/searchengine/elasticsearch/mappings.pl displays the mappings properly, including ISBN and other standard number fields.
5) Index some records using the -d parameter with misc/search_tools/rebuild_elastic_search.pl to recreate the index
6) Verify that you can find the records
7) Put <elasticsearch_index_mappings>non_existent</elasticsearch_index_mappings> to koha-conf.xml
8) Verify that admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1 fails because it can't find non_existent.
9) Copy etc/searchengine/elasticsearch/mappings.yaml to a new location and make elasticsearch_index_mappings setting in koha-conf.xml point to it.
10) Make a change in the new mappings.yaml.
11) Clear table search_fields (mappings reset doesn't do it yet, see bug 20248)
12) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
13) Verify that the changes you made are now visible in the mappings UI
Signed-off-by: Nicolas Legrand <nicolas.legrand@bulac.fr>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
This patch adds a new statuc function to Koha::SearchEngine::ElasticSearch
which is instended to replace most of get_elasticsearch_params. This function
reads the configuration from C4::Context->config('elasticsearch') and raises
relevant exceptions when mandatory entries are missing.
Its behaviour is covered by tests.
To test:
- Run:
$ kshell
k$ prove t/Koha/SearchEngine/Elasticsearch.t
=> SUCCESS: Tests pass!
- Sign off :-D
Signed-off-by: David Bourgault <david.bourgault@inlibro.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>