With Elasticsearch 6 (>6.4), we get a warning on index creation:
the default number of shards will change from [5] to [1] in 7.0.0
See https://github.com/elastic/elasticsearch/pull/30587
I propose adding number_of_shards to the index config.
Also add number_of_replicas, since it is better to be explicit.
If there is only one node, it must be 0.
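A minimal Python sketch of the explicit settings, for illustration only. The helper `index_settings` and its `node_count` argument are hypothetical. Only the setting names `number_of_shards` and `number_of_replicas` come from this patch, and the default of 5 shards matches the value expected in the test plan.

```python
def index_settings(node_count: int, shards: int = 5) -> dict:
    """Build explicit index settings (hypothetical helper, not Koha code).

    Elasticsearch never allocates a replica on the same node as its
    primary, so a single-node cluster needs 0 replicas or the index
    health stays yellow.
    """
    if node_count < 1:
        raise ValueError("need at least one node")
    replicas = 0 if node_count == 1 else 1
    return {
        "index": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


print(index_settings(1))
print(index_settings(3))
```

Making both values explicit means the index layout no longer depends on whichever default the running Elasticsearch version ships with.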
Test plan:
1) Use Elasticsearch
2) Apply patch and flush memcached
3) Rebuild indexes: misc/search_tools/rebuild_elasticsearch.pl -v -b -d
4) Check that you don't get a warning about the number of shards
5) Check the index settings:
curl '<cluster>:9200/<myindex>_biblios/_settings?pretty&filter_path=**.number_of_*'
6) You should see:
"number_of_shards" : "5",
"number_of_replicas" : "1"
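The check in step 5 can also be scripted. Below is a small Python sketch that parses the JSON returned by the curl command above. The index name `koha_biblios` in the sample response is hypothetical. Note that Elasticsearch returns both values as strings.

```python
import json


def shard_settings(response_text: str) -> dict:
    """Map each index name to (shards, replicas) from a _settings response."""
    data = json.loads(response_text)
    result = {}
    for index_name, body in data.items():
        index = body["settings"]["index"]
        # Elasticsearch reports these values as strings, e.g. "5".
        result[index_name] = (
            int(index["number_of_shards"]),
            int(index["number_of_replicas"]),
        )
    return result


sample = """{"koha_biblios": {"settings": {"index":
  {"number_of_shards": "5", "number_of_replicas": "1"}}}}"""
print(shard_settings(sample))
```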
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Rather than limiting initials to [A-Z], we should test for a broad
range of uppercase letters.
The ES/Zebra changes are slightly different because of Perl vs Java regex
conventions. Perl may support either, but I found 'Uppercase' to be a bit more explicit.
More info here:
https://perldoc.perl.org/perlunicode.html
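To illustrate the difference, here is a Python sketch that compares an ASCII-only [A-Z] check with a Unicode-aware one. Python's built-in re module has no \p{...} property classes, so str.isupper() stands in for the Unicode uppercase property used in the patch. This is a substitute technique, not the Perl or Java regex itself, and the sample names are made up.

```python
import re

# Old behaviour: only ASCII capitals count as initials.
ASCII_INITIAL = re.compile(r"[A-Z]\.$")


def ends_with_ascii_initial(value: str) -> bool:
    return ASCII_INITIAL.search(value) is not None


def ends_with_initial(value: str) -> bool:
    """True if value ends in '<uppercase letter>.' in any script."""
    # str.isupper() is Unicode-aware, unlike the [A-Z] class above.
    return len(value) >= 2 and value.endswith(".") and value[-2].isupper()


for name in ["Date, C.J.", "Petrov, Ж.", "Date, c.j."]:
    print(name, ends_with_ascii_initial(name), ends_with_initial(name))
```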
To test:
Same plan as before, but use Ж. as the ending initial.
Confirm that the period is preserved and other punctuation is removed.
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
The current code for facets doesn't strip ending punctuation from facets.
This causes duplicate facets for terms that should be combined.
Sometimes series can have different punctuation depending on the field they are in.
Punctuation after author initials should be preserved.
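Below is a Python sketch of the intended normalization, under stated assumptions. The set of trailing characters is hypothetical. The rule is inferred from the test plan, not taken from the patch: keep one period when it directly follows an uppercase letter, and strip the trailing punctuation otherwise.

```python
import re

# Hypothetical set of trailing characters stripped from facet values.
TRAILING = re.compile(r"[\s.,;:/=\-]+$")


def clean_facet(value: str) -> str:
    """Strip trailing punctuation, keeping one period after an initial."""
    match = TRAILING.search(value)
    if match is None:
        return value
    head, tail = value[:match.start()], value[match.start():]
    # An uppercase letter followed directly by '.' is an initial: keep it.
    if head and head[-1].isupper() and tail.startswith("."):
        return head + "."
    return head


series = ["Date, C.J. ;", "Date, C.J.;", "You wouldn't want to--",
          "You wouldn't want to", "You wouldn't want to.."]
# Variants that differ only in trailing punctuation collapse together.
print(sorted({clean_facet(s) for s in series}))
```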
To test:
1 - Do a search and pull up some records
2 - Edit some of the records to have authors like:
Date, C.J.
Date, C.j.
Date, C.J .
3 - Edit the records to have some series statements like:
830 $aDate, C.J. ;$v5
830 $aDate, C.J. ; $v5
830 $aDate, C.J.; $v5
4 - Add some 490s with first indicator 1 to the records, with series like:
You wouldn't want to--
You wouldn't want to
You wouldn't want to..
5 - Search again and note you have 3 facets each for author and series
6 - Apply patch
7 - Repeat
8 - Now you get 2 facets for author: the period is not removed when it immediately follows an uppercase letter, but is removed otherwise
9 - Now you should have a single series facet
10 - Switch the search engine to ES (using an index built before applying the patch)
11 - Note facets are separate again
12 - Reset mappings and reindex
perl misc/search_tools/rebuild_elasticsearch.pl -v -r
13 - Repeat search, facets combined as above
Signed-off-by: Sarah Cornell <sbcornell@cityofportsmouth.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Generate a list of fields for the query_string query "fields" parameter,
with optional boosts, instead of using the "_all" field. Also add a "search"
flag to the search_marc_to_field table so that certain mappings can be
excluded from searches. Add an option to include/exclude fields in the
query_string "fields" parameter depending on whether the search is made in
the OPAC or the staff client. Refactor code to remove all other
dependencies on the "_all" field.
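Below is a Python sketch of building the query_string "fields" list. The `SearchField` record and its attribute names are hypothetical. For brevity, the sketch puts the "search" flag on the field itself, although the patch stores it per MARC-to-field mapping. The `field^boost` syntax and the `fields` parameter are standard Elasticsearch query_string features.

```python
from dataclasses import dataclass


@dataclass
class SearchField:
    """Hypothetical stand-in for a search field and its flags."""
    name: str
    boost: float = 1.0
    searchable: bool = True     # the new "search" flag (simplified)
    opac: bool = True           # include when searching from the OPAC
    staff_client: bool = True   # include when searching from the staff client


def query_fields(fields: list, is_opac: bool) -> list:
    """Build the query_string "fields" list, e.g. ["title^2", "author"]."""
    result = []
    for field in fields:
        if not field.searchable:
            continue
        if is_opac and not field.opac:
            continue
        if not is_opac and not field.staff_client:
            continue
        # Elasticsearch accepts per-field boosts written as "field^boost".
        result.append(field.name if field.boost == 1 else f"{field.name}^{field.boost:g}")
    return result


def build_query(text: str, fields: list, is_opac: bool) -> dict:
    return {"query": {"query_string": {
        "query": text, "fields": query_fields(fields, is_opac)}}}


fields = [SearchField("title", boost=2), SearchField("author"),
          SearchField("note", opac=False),
          SearchField("local-number", searchable=False)]
print(build_query("twain", fields, is_opac=True))
```

Listing the fields explicitly replaces the catch-all "_all" field. It also gives each interface its own field list.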
How to test:
1) Reindex authorities and biblios.
2) Search biblios and try to verify that this works as expected.
3) Search authorities and try to verify that this works as expected.
4) Go to "Search engine configuration"
5) Change some "Boost", "Staff client", and "OPAC" settings and save.
6) Verify that those settings were saved accordingly.
7) Click the "Biblios" or "Authorities" tab and change one or more
"Searchable" settings.
8) Verify that those settings were saved accordingly.
9) Try to verify that these settings have taken effect by performing
some biblios and/or authorities searches.
Sponsored-by: Gothenburg University Library
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Alex Arnaud <alex.arnaud@biblibre.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Increases maximum field count from the default 1000 to 10000 to accommodate large records and MARC as an array.
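Below is a Python sketch of the raised limit, under an assumption: the "maximum field count" here is taken to be Elasticsearch's `index.mapping.total_fields.limit`, whose default is 1000. The counting function is only an approximation of how fields count toward that limit: each property, its object sub-properties, and its multi-fields. The sample mapping is made up.

```python
# Assumption: the limit raised by this patch is index.mapping.total_fields.limit.
TOTAL_FIELDS_SETTINGS = {"index": {"mapping": {"total_fields": {"limit": 10000}}}}


def count_fields(properties: dict) -> int:
    """Approximate mapped-field count: each property, plus its object
    sub-properties ("properties") and its multi-fields ("fields")."""
    total = 0
    for spec in properties.values():
        total += 1
        total += count_fields(spec.get("properties", {}))
        total += count_fields(spec.get("fields", {}))
    return total


mapping = {
    "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
    "author": {"type": "text"},
}
limit = TOTAL_FIELDS_SETTINGS["index"]["mapping"]["total_fields"]["limit"]
print(count_fields(mapping), count_fields(mapping) <= limit)
```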
Signed-off-by: Michal Denar <black23@gmail.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
To test:
1 - Do some authority searches in Zebra
2 - Switch to ES and repeat, results will vary and some may fail
3 - Apply patch and dependencies
4 - Reindex ES
5 - Repeat searches, they should succeed and results should be similar to
Zebra
6 - Slight differences are okay, but results should (mostly) meet
expectations
A few notes:
We add a 'normalizer' to ensure we get a single token from the heading
indexes; this makes 'starts with' work as expected.
We switch to 'AND' for fields searched from the cataloging editor; this
matches Zebra results.
We force the '__sort' fields for sorting. If sorting looks wrong, try
reducing the heading field to a single subfield. This will need to be
addressed in a future bug (multiple subfields create an array, and ES sorts
those randomly).
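The three notes above can be sketched in Python, with these assumptions: the normalizer name `heading_normalizer`, its lowercase-only filter list, and the field name `heading` are hypothetical. The `tokenize` and `normalize` helpers only imitate what an Elasticsearch analyzer and normalizer do. `normalizer`, `default_operator` and the `order` sort option are real Elasticsearch options.

```python
import re

# Hypothetical normalizer definition; the name and filters are illustrative.
HEADING_SETTINGS = {
    "analysis": {
        "normalizer": {
            "heading_normalizer": {"type": "custom", "filter": ["lowercase"]}
        }
    }
}


def tokenize(heading: str) -> list:
    """Crude stand-in for a word analyzer: one lowercased token per word."""
    return [w.lower() for w in re.findall(r"\w+", heading)]


def normalize(heading: str) -> list:
    """Keyword field with a normalizer: the whole heading is one token."""
    return [heading.lower()]


def starts_with(tokens: list, prefix: str) -> bool:
    return any(t.startswith(prefix.lower()) for t in tokens)


def editor_query(text: str, fields: list) -> dict:
    """query_string query that requires every term (AND), not any (OR)."""
    return {"query": {"query_string": {
        "query": text, "fields": fields, "default_operator": "AND"}}}


def sort_clause(field: str, descending: bool = False) -> list:
    """Sort on the dedicated '__sort' variant of a field."""
    return [{field + "__sort": {"order": "desc" if descending else "asc"}}]


heading = "Twain, Mark, 1835-1910"
print(starts_with(tokenize(heading), "mark"))   # matches via the second word
print(starts_with(normalize(heading), "mark"))  # heading really starts with "twain"
```

With per-word tokens, a 'starts with' search for "mark" wrongly matches a heading that begins with "Twain". With a single normalized token, it does not.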
Signed-off-by: Nicolas Legrand <nicolas.legrand@bulac.fr>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Improvements:
1) Index settings moved from code to etc/searchengine/elasticsearch/index_config.yaml. An alternative can be specified in koha-conf.xml.
2) Field settings moved from code to etc/searchengine/elasticsearch/field_config.yaml. An alternative can be specified in koha-conf.xml.
3) mappings.yaml has been moved from admin/searchengine/elasticsearch to etc/searchengine/elasticsearch. An alternative can be specified in koha-conf.xml.
4) Default settings have been improved to remove punctuation from phrases used for sorting etc.
5) State variables are used for storing configuration to avoid parsing it multiple times.
6) The reset operation in mappings administration can now reset the fields too.
7) An stdno field type has been added for standard identifiers.
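Items 1 and 5 combine a default config location, an optional override from koha-conf.xml, and parsing only once. Perl's `state` variables keep their value across calls. In Python, `functools.lru_cache` gives a similar parse-once effect. The sketch below assumes a hypothetical helper `index_config_path`, and it uses JSON in place of YAML because the Python standard library has no YAML parser. One caveat: the cached dict is shared, so callers must not mutate it.

```python
import functools
import json
import os
import tempfile
from typing import Optional

# Default location named in item 1; koha-conf.xml may point elsewhere.
DEFAULT_INDEX_CONFIG = "etc/searchengine/elasticsearch/index_config.yaml"
PARSE_COUNT = 0  # only here to show that parsing happens once


def index_config_path(override: Optional[str] = None) -> str:
    """Prefer the koha-conf.xml alternative when given (hypothetical helper)."""
    return override or DEFAULT_INDEX_CONFIG


@functools.lru_cache(maxsize=None)
def load_config(path: str) -> dict:
    """Parse the file on first use; later calls return the cached dict.
    JSON stands in for YAML here."""
    global PARSE_COUNT
    PARSE_COUNT += 1
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = index_config_path(os.path.join(tmp, "index_config.json"))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"index": {"number_of_shards": 5}}, fh)
    first = load_config(path)
    second = load_config(path)

print(first is second, PARSE_COUNT)
```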
To test:
1) Run tests in t/Koha/SearchEngine/Elasticsearch.t
2) Clear tables search_fields and search_marc_map
3) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
4) Verify that admin/searchengine/elasticsearch/mappings.pl displays the mappings properly, including ISBN and other standard number fields.
5) Index some records using the -d parameter with misc/search_tools/rebuild_elastic_search.pl to recreate the index
6) Verify that you can find the records
7) Add <elasticsearch_index_mappings>non_existent</elasticsearch_index_mappings> to koha-conf.xml
8) Verify that admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1 fails because it can't find non_existent.
9) Copy etc/searchengine/elasticsearch/mappings.yaml to a new location and make the elasticsearch_index_mappings setting in koha-conf.xml point to it.
10) Make a change in the new mappings.yaml.
11) Clear table search_fields (mappings reset doesn't do it yet, see bug 20248)
12) Go to admin/searchengine/elasticsearch/mappings.pl?op=reset&i_know_what_i_am_doing=1
13) Verify that the changes you made are now visible in the mappings UI
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Bug 20073: Move Elasticsearch yaml files back to admin directory
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>