Bug 24807: [20.05.x] Add "year" type to improve sorting behaviour
author David Gustafsson <david.gustafsson@ub.gu.se>
Wed, 4 Mar 2020 16:07:11 +0000 (17:07 +0100)
committer Lucas Gass <lucas@bywatersolutions.com>
Fri, 16 Oct 2020 14:27:07 +0000 (14:27 +0000)
commit cbd56aa5fc7664c2dfb482398312f74751f1881f
tree 956dd184e18013a2e1bffe1ef0e7036a897539c5
parent 7054818646e3751dbcee87e82d6c8b51506f70b4

Add a "year" search field type. Fields with this type retain only
values that look like years, so invalid values such as whitespace
or word characters are not indexed.
This improves, for instance, the behaviour when sorting by
"date-of-publication". If all values are indexed, records with
junk data instead of valid years appear first among the search
results, drowning out more relevant hits. If this field is assigned
the "year" type, these records instead always appear last,
regardless of sort order.
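
The filtering idea can be sketched roughly as follows. This is an illustration in Python, not the Perl code in Koha/SearchEngine/Elasticsearch.pm; the function name keep_year_values and the "exactly four ASCII digits" rule are assumptions made for the example.

```python
import re

# Assumption for this sketch: a value "looks like a year" when it is
# exactly four ASCII digits. The actual patch may use a different rule.
YEAR_RE = re.compile(r"[0-9]{4}")


def keep_year_values(values):
    """Return only the values that look like a four-digit year."""
    return [v for v in values if YEAR_RE.fullmatch(v)]


# Junk such as "uuuu", whitespace, or words is dropped before indexing.
print(keep_year_values(["1984", "uuuu", "    ", "abcd", "2020"]))
```

A record whose only value is dropped ends up with no value for the field at all, which is what lets the search engine treat it as "missing" when sorting.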

To test:

1) Have at least two biblios, one with a valid year in 008 (pos 7-10)
and another with an invalid one ("uuuu" for example)
2) Perform a wildcard search (*) and sort results by publication date.
3) The record with the invalid year of publication in 008 should appear first
4) Apply patch and run database updates
5) Reindex ElasticSearch
6) Perform the same search as in 2)
7) The record with the invalid year should now appear last
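
The outcome expected in step 7 relies on records without a value for the sort field being placed last in both directions (Elasticsearch's sort option "missing" defaults to "_last"). A small Python simulation of that ordering, using hypothetical record dicts and a hypothetical helper sort_by_year, might look like this:

```python
def sort_by_year(records, descending=False):
    """Sort records by 'year'; records without a year go last either way."""
    with_year = [r for r in records if r.get("year") is not None]
    without_year = [r for r in records if r.get("year") is None]
    with_year.sort(key=lambda r: r["year"], reverse=descending)
    # Records with no indexed year are appended after the sorted ones,
    # regardless of the requested direction.
    return with_year + without_year


records = [
    {"id": 1, "year": 1999},
    {"id": 2, "year": None},  # e.g. "uuuu" in 008, so nothing was indexed
    {"id": 3, "year": 2005},
]
print([r["id"] for r in sort_by_year(records)])
print([r["id"] for r in sort_by_year(records, descending=True)])
```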

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Add database update script

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Update tests

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Add support for uncertain fields and ranges

To test:
1 - Have some records with uncertain dates in the 008
    19uu, 195u, etc.
2 - Index them in Elasticsearch
3 - Do a search that will return them
4 - Sort results by publication/copyright date
5 - Note odd results
6 - Apply patch
7 - Reindex
8 - Sorting should be improved
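
One possible way to make uncertain dates such as 19uu or 195u sortable is to replace the trailing unknown positions with zeros. The Python sketch below assumes that policy purely for illustration; the patch may resolve uncertain dates differently, and the helper name normalize_uncertain_year is hypothetical. Spaces are also treated as unknown here, in line with a later follow-up in this series.

```python
DIGITS = "0123456789"


def normalize_uncertain_year(value):
    """Map a 4-character year with trailing unknown digits to an integer.

    Trailing 'u' or space characters are treated as unknown and replaced
    with 0 (an assumed policy). Returns None when no leading digit is
    known or when the value is otherwise malformed.
    """
    if len(value) != 4:
        return None
    known = value.rstrip("u ")
    if not known or not all(c in DIGITS for c in known):
        return None
    return int(known.ljust(4, "0"))


for raw in ["1984", "195u", "19uu", "19  ", "uuuu", "abcd"]:
    print(repr(raw), normalize_uncertain_year(raw))
```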

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Refactor using tokenize_callbacks

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Simplify with new and improved value_callbacks

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: (follow-up) Fix spelling

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: (follow-up) Add support for spaces as unknown characters

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: (QA follow-up) Remove unnecessary tests

These tests now fail: the code in Indexer.pm expects a real response from ES,
but these tests mock 'bulk' and so do not have the necessary fields.

The same code is already tested above, so we can just add the _id == biblionumber test there.

Signed-off-by: Lucas Gass <lucas@bywatersolutions.com>
Koha/SearchEngine/Elasticsearch.pm
Koha/SearchEngine/Elasticsearch/Indexer.pm
admin/searchengine/elasticsearch/field_config.yaml
installer/data/mysql/atomicupdate/bug_24807-add-year-search-field-type.perl [new file with mode: 0644]
koha-tmpl/intranet-tmpl/prog/en/modules/admin/searchengine/elasticsearch/mappings.tt
t/db_dependent/Koha/SearchEngine/Elasticsearch.t
t/db_dependent/Koha/SearchEngine/Elasticsearch/Indexer.t