Bug 24807: [20.05.x] Add "year" type to improve sorting behaviour
Add a "year" search field type. Fields with this type will only
retain values that looks like years, so invalid values such as
whitespace or word characters will not be indexed.
This for instance improves the behaviour when sorting by
"date-of-publication". If all values are indexed, records with
junk data instead of valid years will appear first among the search
results, drowning out more relevant hits. If assigning this field
the "year" type these records will instead always appear last,
regarless of sort order.
To test:
1) Have at least two biblios, one with a valid year in 008 (pos 7-10)
and another with an invalid one ("uuuu" for example)
2) Perform a wildcard search (*) and sort results by publication date.
3) The record with invalid year of pulication in 008 should appear first
4) Apply patch and run database updates
5) Reindex ElasticSearch
6) Perform the same search as in 2)
7) The record with the invalid year should now appear last
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Add database update script
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Update tests
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Add suppport for uncertain fields and ranges
To test:
1 - Have some records with uncertain dates in the 008
19uu, 195u, etc.
2 - Index them in Elasticsearch
3 - Do a search that will return them
4 - Sort results by publication/copyright date
5 - Note odd results
6 - Apply patch
7 - Reindex
8 - Sorting should be improved
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Refactor using tokenize_callbacks
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: Simplify with new and imporved value_callbacks
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: (follow-up) Fix spelling
Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Bug 24807: (follow-up) Add support for spaces as unknown characters