36 commits

Author SHA1 Message Date
a562101fdc Bug 27673: Fix encoding issues - Dump
Same as Load, but for Dump.

Test plan:
Edit ES mappings, replace withdrawn's label with "withdrawn ✔️ ❤️ ★"
Export the mappings
  perl misc/search_tools/export_elasticsearch_mappings.pl > admin/searchengine/elasticsearch/mappings.yaml
Reset mappings from the UI
=> Notice that the label is correct

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Joonas Kylmälä <joonas.kylmala@helsinki.fi>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-02-16 14:54:50 +01:00
46f7239b08 Bug 27673: Replace YAML with YAML::XS
From the YAML pod:

"""
This module has been released to CPAN as YAML::Old, and soon YAML.pm will be changed to just be a frontend interface module for all the various Perl YAML implementation modules, including YAML::Old.

If you want robust and fast YAML processing using the normal Dump/Load API, please consider switching to YAML::XS. It is by far the best Perl module for YAML at this time. It requires that you have a C compiler, since it is written in C.
"""

See also
https://gitlab.com/koha-community/qa-test-tools/-/merge_requests/35

Test plan:
Try some place where YAML::XS is not used and confirm that it works
correctly

QA note: This patch removes some uses of YAML that were not useful

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Joonas Kylmälä <joonas.kylmala@helsinki.fi>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-02-16 14:54:50 +01:00
Björn Nylén
ba9c9cc794 Bug 26996: Convert Elasticsearch indexer commit buffer size to integer
When multithreaded indexing is used, the commit size is spread out among the
children, which can leave each child with a value of type float. As records are
processed and the commit counter is decreased, it may then never reach *exactly*
0, so records are never committed. This patch makes sure the counter is an
integer to avoid the problem.
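
The failure mode can be sketched in a few lines. This is a minimal illustration in Python rather than the script's own Perl; `count_commits` and the numbers used (a buffer of 5000 split across 3 children) are hypothetical and chosen only to show why an exact comparison against 0 can fail when the counter is fractional.

```python
def count_commits(record_count, commit_size):
    """Count how often a countdown buffer reaches exactly zero."""
    remaining = commit_size
    commits = 0
    for _ in range(record_count):
        remaining -= 1
        if remaining == 0:  # exact comparison against zero
            commits += 1
            remaining = commit_size
    return commits


# Hypothetical split of a 5000-record buffer across 3 children.
float_size = 5000 / 3      # fractional: the countdown steps past 0
int_size = int(5000 / 3)   # integer: the countdown lands on 0

print(count_commits(5000, float_size))
print(count_commits(5000, int_size))
```

With the fractional size the counter goes from a positive fraction straight to a negative one, so no commit is ever triggered; with the integer size the same 5000 records trigger three commits.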

To test you must find a set of circumstances that causes the issue. For me:
1. Run: ./rebuild_elasticsearch -v -b -p 2 -c 400
2. Note that only one process is logging "Committing xxx records..."
3. Kill processes.
4. Apply patch.
5. Repeat 1
6. Note that both processes are logging "Committing xxx records..."

Sponsored-by: Lund University Library
Signed-off-by: Joonas Kylmälä <joonas.kylmala@helsinki.fi>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-01-04 13:29:55 +01:00
e42b4088ed Bug 26180: Add descending option to rebuild_elasticsearch.pl
While the ES index is incremental and provides results as it commits, we currently index from the oldest records to the newest.

This patch provides the option to go the other direction

To test:
 1 - Have ES setup and running for Koha
 2 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -b
 3 - Note the biblios index from number 1 to the end
 4 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -a
 5 - Notice the same
 6 - Apply patch
 7 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -b
 8 - Still in ascending order
 9 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -b --desc
10 - Now records index in descending order
11 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -a
12 - Still ascending
13 - perl misc/search_tools/rebuild_elasticsearch.pl -v -v -a --desc
14 - Now descending

Signed-off-by: Séverine QUEUNE <severine.queune@bulac.fr>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

JD amended patch: fix typo "inde" vs "index" and add commit body

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-11-04 12:59:33 +01:00
9441537e54 Bug 26832: Make Elasticsearch mappings export use UTF-8
The script misc/search_tools/export_elasticsearch_mappings.pl exports the current search engine configuration to a YAML file.
This export should use UTF-8 encoding, like other exports.
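
As a rough illustration of why the encoding has to be explicit, the sketch below (Python, not the script's Perl) encodes the label from the test plan as UTF-8 and then decodes the same bytes two ways. The variable names are hypothetical, and the use of Latin-1 for the wrong decoding is an assumption made only to show what a broken diacritic can look like.

```python
label = "Numéro"

# Encode explicitly as UTF-8 when writing the export.
exported = label.encode("utf-8")

# Decoding the same bytes as UTF-8 restores the label.
round_trip = exported.decode("utf-8")

# Decoding them with a single-byte encoding yields "NumÃ©ro",
# the kind of breakage the test plan looks for.
broken = exported.decode("latin-1")

print(exported)
print(round_trip == label)
print(broken == label)
```

The two bytes that UTF-8 uses for "é" are read as two separate characters when the encoding is mismatched, which is why the label only survives when both the export and the reload agree on UTF-8.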

Test plan :
1) Go to Administration > Search engine configuration (Elasticsearch)
2) Edit a field label to use a diacritic, for example local-number => Numéro
3) Save
4) Edit file etc/koha-conf.xml to enable 'elasticsearch_index_mappings'
5) Export mappings to file via misc/search_tools/export_elasticsearch_mappings.pl -t $MARCFLAVOUR
6) Reset memcached and plack
7) Back to Administration > Search engine configuration (Elasticsearch)
8) Click on 'Reset Mappings' and accept
9) Look at field 'local-number'
=> Without patch diacritic 'é' is broken
10) You may try with an emoji B-)

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-11-04 12:59:32 +01:00
11bf5d7afa
Bug 23137: Move cache flushing to the method
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-04-29 17:02:15 +01:00
9fa6bb6401
Bug 23137: Add reset option to rebuild_elasticsearch.pl
Setup:
1 - Be using Elasticsearch
2 - Reload mappings from the db
    Admin->Search configuration
    Reset Mappings
3 - Reindex ES and confirm searching is working

To test:
1 - Apply patch
2 - Alter your mappings file for elastic (just change a description for a field)
3 - perl misc/search_tools/rebuild_elasticsearch.pl -r -v
    Verbose not necessary, but good for letting you know things are progressing
4 - Confirm the mapping change shows in the interface
5 - Confirm reindex worked and searching is working

Signed-off-by: Andrew Fuerste-Henry <andrew@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-04-29 17:01:55 +01:00
4249b20a39
Bug 22828: Elasticsearch - display errors encountered during indexing on the command line
To test:
 1 - Use the Koha sample data, or insert a blank 245$b into a record (easiest way is using advanced cataloging editor)
 2 - Reindex elasticsearch
 3 - Check the ES count on the about page
 4 - Check the count in the DB (SELECT count(*) FROM biblio)
 5 - They don't match!
 6 - perl misc/search_tools/rebuild_elastic_search.pl -v -v
 7 - No errors indicated
 8 - Apply patch
 9 - perl misc/search_tools/rebuild_elastic_search.pl -v
10 - You should be notified of an error
11 - perl misc/search_tools/rebuild_elastic_search.pl -v -v
12 - You should be notified of the specific biblio with an error and a (somewhat) readable reason
13 - perl misc/search_tools/rebuild_elastic_search.pl
14 - No output

Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Séverine QUEUNE <severine.queune@bulac.fr>
Signed-off-by: Bouzid Fergani <bouzid.fergani@inlibro.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-04-29 16:58:00 +01:00
93393036e5
Bug 23204: (RM follow-up) Use Koha::Script
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-03-27 12:08:54 +00:00
c7dbc27420
Bug 23204: Adjust copyright and license
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-03-27 11:55:52 +00:00
Alex Arnaud
e160af97a6
Bug 23204: Add exec permission on the script
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-03-27 11:55:14 +00:00
Alex Arnaud
85f3a3a302
Bug 23204: Move code in a unit tested sub
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-03-27 11:54:51 +00:00
Alex Arnaud
e1f8522391
Bug 23204: Add script for exporting Elasticsearch mappings
Test plan:

  - launch perl misc/search_tools/export_elasticsearch_mappings.pl >
    /path/to/my_mappings.yaml
  - set koha-conf.elasticsearch_index_mappings to
    /path/to/my_mappings.yaml,
  - go to admin -> Search engine configuration,
  - click on "Reset mappings",
  - check that your search fields and mappings are as expected

Signed-off-by: Bouzid Fergani <bouzid.fergani@inlibro.com>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-03-27 11:54:35 +00:00
7d8b96803f
Bug 24545: Fix license statements
Bug 9978 should have fixed them all, but some were missing.
We want all the license statements part of Koha to be identical, and
using the GPLv3 statement.

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-02-24 13:31:26 +00:00
fe5c65c5cd
Bug 22826: Allow indexing of individual authorities in Elasticsearch
To test:
1 - perl misc/search_tools/rebuild_elastic_search.pl -h
2 - Note it indicates the bn option can be passed to index individual authids
3 - perl misc/search_tools/rebuild_elastic_search.pl -a -bn 92 -v
4 - Note the error
5 - Apply patch
6 - perl misc/search_tools/rebuild_elastic_search.pl -h
7 - Note new option ai|authid for indexing individual authids
8 - Note updated text for bn|biblionumber option
9 - perl misc/search_tools/rebuild_elastic_search.pl -a -bn 92 -v
10 - No errors, but no records indexed
11 - perl misc/search_tools/rebuild_elastic_search.pl -a -ai 92 -v
12 - 1 record indexed
13 - perl misc/search_tools/rebuild_elastic_search.pl -ai 92 -bn 92 -v
14 - 1 authority record and 1 biblio record indexed

Signed-off-by: Michal Denar <black23@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2019-10-28 12:45:28 +00:00
9bc386a98f Bug 22892: Remove warning from rebuild_es if --processes not passed
Use of uninitialized value $processes in numeric lt (<) at
misc/search_tools/rebuild_elasticsearch.pl line 199.

We want the number of processes to be set to 1 by default, and then
assign it to $slice_count
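
The same idea, sketched in Python rather than Perl: give the option a default of 1 so that later numeric comparisons never see an unset value. The `-p`/`--processes` option names come from the commit messages in this log; `parse_args` and everything else here is a hypothetical stand-in, not the script's actual option handling.

```python
import argparse


def parse_args(argv):
    """Parse the command line, defaulting --processes to 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--processes", type=int, default=1)
    return parser.parse_args(argv)


# No --processes given: the default applies, so slice_count is a number.
args = parse_args([])
slice_count = args.processes
print(slice_count)
```

Setting the default at parse time keeps the rest of the code free of "is it defined?" checks.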

Test plan:
Run the script with and without --processes and confirm that the warning
went away.

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2019-05-15 12:10:53 +00:00
Ere Maijala
45c1cd1b66 Bug 21872: Fix name of rebuild_elasticsearch.pl
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2019-05-10 12:31:50 +00:00
David Gustafsson
f7e47847e4 Bug 21872: Simplify conditions and exit on invalid combination of arguments
Change to zero-based indexing for the slice index to simplify some
conditions. Exit with an error message if trying to combine the processes
and biblio numbers arguments.
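
A minimal sketch of what a zero-based slice index can look like, written in Python for illustration. Assigning records to slices by taking the record number modulo the slice count is an assumption here, not something this commit message states, and `slice_ids` is a hypothetical helper.

```python
def slice_ids(record_ids, slice_count, slice_index):
    """Return the record ids that belong to one zero-based slice."""
    if not 0 <= slice_index < slice_count:
        raise ValueError("slice_index must be in range(slice_count)")
    # With a zero-based index the modulo result can be compared directly.
    return [rid for rid in record_ids if rid % slice_count == slice_index]


record_ids = list(range(1, 11))
for index in range(3):
    print(index, slice_ids(record_ids, 3, index))
```

Because the remainder of a division by `slice_count` already runs from 0 to `slice_count - 1`, a zero-based index needs no offset in the comparison, and every record falls into exactly one slice.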

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2019-05-10 12:31:49 +00:00
Ere Maijala
abe7081109 Bug 21872: Add multiprocess support to Elasticsearch indexing utility
Test plan:
1. Time execution without -p parameter
2. Time execution with -p 2, -p 3, or -p 4, depending on CPU core count

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2019-05-10 12:31:49 +00:00
d2e189ca1c Bug 22600: Set 'commandline' interface appropriately
This patch changes Koha::Cron to be a more generic Koha::Script class and
updates all command-line driven scripts to use it.

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2019-04-10 19:43:11 +00:00
David Gustafsson
1d29d0e449 Bug 19893: Add pods, remove syspref, add tests for serialization format
Add missing pods, remove obsolete syspref and add test for serialization format for records exceeding max record size

Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2018-11-16 11:04:59 +00:00
David Gustafsson
53cf97bb2d Bug 19893: Add index status
Add persistent per index "index status" state to provide useful
user feedback when update of Elasticsearch server mappings fails

Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2018-11-16 11:04:57 +00:00
David Gustafsson
5389ffa38a Bug 19893: Alternative optimized indexing for Elasticsearch
Implement optimized indexing for Elasticsearch

How to test:
1) Time a full elasticsearch re-index without this patch by running the
   rebuild_elastic_search.pl with the -d flag:
   `koha-shell <instance_name> -c "time rebuild_elastic_search.pl -d"`.
2) Apply this patch.
3) Time a full re-index again, it should be about twice as fast (for a
   couple of thousand biblios, with fewer biblios results may be more
   unpredictable).

Sponsored-by: Gothenburg University Library
Signed-off-by: Ere Maijala <ere.maijala@helsinki.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2018-11-16 11:04:56 +00:00
Ere Maijala
23f78a6c2a Bug 20248: Improve Elasticsearch mappings UI and rebuild_elastic_search.pl.
Improvements:
1) Mappings UI now has button that allows one to reset the mappings.
2) Mappings UI displays the items in alphabetical order.
3) Indexing script drops and recreates the index right away, which
helps prevent ES from autocreating a bad index if someone does something
while the first batch of records is being processed.
4) Indexing script has nicer output.

To test:
1) Change mappings.yaml file
2) Reset mappings in UI in mappings.pl
3) Verify the mappings have been changed in UI
4) The field order is alphabetical
5) Rebuild script has clean output
6) Run test t/db_dependent/Koha_Elasticsearch_Indexer.t

Signed-off-by: Bouzid Fergani <bouzid.fergani@inlibro.com>
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2018-08-22 14:47:43 +00:00
df3df9cf71 Bug 17372: Standardize Elasticsearch paths
What we currently have:
Koha/ElasticSearch.pm
Koha/ElasticSearch/Indexer.pm
Koha/SearchEngine/Elasticsearch/QueryBuilder.pm
Koha/SearchEngine/Elasticsearch/Search.pm

What we want:
Koha/SearchEngine/Elasticsearch.pm
Koha/SearchEngine/Elasticsearch/Indexer.pm
Koha/SearchEngine/Elasticsearch/QueryBuilder.pm
Koha/SearchEngine/Elasticsearch/Search.pm

Test plan:
  % git grep -i Koha::ElasticSearch
  % git grep ElasticSearch|grep -v Catmandu::Store::ElasticSearch
should not return any result

Do a full reindex and search for records

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-10-11 01:07:03 +00:00
1e39ecc7f1 Bug 16708: Fix authority reindex for ElasticSearch
The changes made to Koha::Authority have not been correctly fixed.
The code of Koha::Authority was moved to
Koha::MetadataRecord::Authority by bug 15380.

Test plan:
  perl misc/search_tools/rebuild_elastic_search.pl -a -v
should succeed

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-06-24 11:59:40 +00:00
Robin Sheat
282ea1cdb4 Bug 12478: abort early if there's no elasticserch definition
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:10 +00:00
Robin Sheat
52c4b55a48 Bug 12478: fix errors from rebase and new upstream version
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:09 +00:00
Robin Sheat
0b9483a691 Bug 12478: fix some issues on rebase
There were rebase conflicts that it was just easier to postpone until
afterwards.

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:09 +00:00
2aa72529c0 Bug 12478: Fix error on indexing a specific record
% perl misc/search_tools/rebuild_elastic_search.pl -bn 42
Can't locate object method "idnumber" via package "MARC::Record" at
misc/search_tools/rebuild_elastic_search.pl line 171.

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:09 +00:00
c6ef16a50f Bug 12478: Fix pod in the rebuild_ES.pl script
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:08 +00:00
8dd3901080 Bug 12478: Fix the verbose flag on reindexing
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:08 +00:00
2ee681c630 Bug 12478: Change the commit count to 5k
It will improve the indexing time.

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:08 +00:00
Robin Sheat
374f1e6384 Bug 12478: ES is now updated when records are updated/deleted
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:07 +00:00
Robin Sheat
0536ef37cc Bug 12478 - authorities can now be stored in ES
(Not fetched yet though.)

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:03 +00:00
Robin Sheat
7dbd13e66f Bug 12478 - pile of elasticsearch code
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jesse Weaver <jweaver@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-04-26 20:20:03 +00:00