Commit graph

97 commits

Author SHA1 Message Date
5f6e1a1cf7
Bug 34740: Update sort options in ES config to be Yes/No
At some point the option for 'undef' was removed from the sort options
and collapsed to yes/no.

The dropdowns used when adding a new field were missed; this patch corrects that.

While undef in a mappings file will still load, we should no longer provide undef when saving.

To test:
1 - Browse to bottom to add a new field on the 'Bibliographic records' tab in
    Administration > Search engine configuration (Elasticsearch)
2 - Set sortable column to undef, set other columns and provide a valid field
3 - Click '+Add'
4 - Click 'Save'
5 - At top of page you receive an error:
 An error occurred when updating mappings: DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::mysql::st execute failed: Column 'sort' cannot be null at /kohadevbox/koha/Koha/SearchField.pm line 37 .
6 - Apply patch, restart all
7 - Add a new mapping, your only choices are Yes/No
8 - Save mapping
9 - Confirm it saves correctly

Signed-off-by: Salah Ghedda <salah.ghedda@inLibro.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-09-18 15:31:52 -03:00
af505c9117
Bug 33594: Only sort on title main heading
This patch simply removes sort from all elements that are not strictly
the main title.

Note:
If multiple fields are set as sort, they are collapsed into a single entry in
the {field}__sort field in the ES index. The order is determined by the order of
the fields in the MARC record.
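A minimal sketch of why the order matters, assuming a record is modelled as a list of (tag, value) pairs in record order and that collapsed values are joined with a single space (the separator and the function name `collapse_sort_value` are illustrative, not Koha's indexer code):

```python
from typing import Iterable, Set, Tuple


def collapse_sort_value(
    record: Iterable[Tuple[str, str]], sort_tags: Set[str], sep: str = " "
) -> str:
    """Join the values of every sort-enabled field into one entry, in record order."""
    return sep.join(value for tag, value in record if tag in sort_tags)


# Hypothetical record: (tag, value) pairs in the order they appear in the MARC record.
record = [("240", "AAAAA"), ("245", "Main title")]

before = collapse_sort_value(record, {"240", "245"})  # 240 and 245 both sortable
after = collapse_sort_value(record, {"245"})          # only the main title sortable
```

With 240 flagged as sortable, a 240 placed before the 245 leads the collapsed value ("AAAAA Main title") and drives the sort; restricting sort to the main title leaves just "Main title".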

To test:
1 - Apply patch
2 - perl misc/search_tools/rebuild_elasticsearch.pl -r -v
3 - Search the catalog
4 - Sort by title
5 - Confirm records are correct
6 - Add a 240 (before the 245) with subfield a 'AAAAA'
7 - Confirm sorting is not affected
8 - View record details, click 'Elasticsearch record: Show'
9 - Find 'title__sort' and confirm it looks correct (does not include AAAAA)

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-05-18 11:03:28 -03:00
7f1f0bc5b7
Bug 33594: Update mappings and comment
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-05-18 11:03:28 -03:00
106adb320c
Bug 31695: Type standard number is missing field ci_raw in field_config.yaml
In the Elasticsearch field config (field_config.yaml), the default type has a field 'ci_raw', which is used for exact search.
This field is missing for the standard number type 'stdno'.

Test plan :
1) In the staff interface, go to Administration, and search for SearchEngine
2) Make sure that the SearchEngine preference is set to Elasticsearch and save
3) Return to Administration and select "Search engine configuration"
4) Change the type of "Heading-Main" to "Std. Number" and save
5) Rebuild the index (e.g. "koha-elasticsearch --rebuild -d kohadev")
6) Go to the main staff page and select Authorities
7) Search for a heading (e.g. "A Dual-language book")
=> Result is found with or without patch
8) Click on the sliders and select "is exactly" for the operator and search
=> No result is found (without the patch)
9) Apply the patch
10) Rebuild the index (e.g. "koha-elasticsearch --rebuild -d kohadev")
11) Click on the sliders and select "is exactly" for the operator and search
=> Result is now found

Signed-off-by: Kevin Carnes <kevin.carnes@ub.lu.se>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-04-14 11:35:35 -03:00
b0767f5eb6
Bug 33159: Simplify ES handling and fix zebra handling
Before this patch we used two indexes for the thesaurus values. We can
simply index both needed fields into a single index and just form the
search correctly.

This patch also ensures we pass the 'thesaurus' value for the heading
directly to the query builder: for Zebra it is passed through as is, and
for ES we convert it to the expected code.
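As an illustration of that conversion, here is a sketch assuming the heading's thesaurus is known as its MARC 21 6XX second indicator and the ES side stores the authority 008/11 code. The indicator-to-code table follows the MARC 21 standard; the names `IND2_TO_008_11` and `thesaurus_code` are hypothetical, and the values Koha actually passes to the query builder may be named differently.

```python
from typing import Optional

# MARC 21 bibliographic 6XX second indicator -> authority 008/11 code.
# Indicator 4 ("source not specified") has no direct 008/11 equivalent.
IND2_TO_008_11 = {
    "0": "a",  # Library of Congress Subject Headings
    "1": "b",  # LC subject headings for children's literature
    "2": "c",  # Medical Subject Headings
    "3": "d",  # National Agricultural Library subject authority file
    "5": "k",  # Canadian Subject Headings
    "6": "v",  # Répertoire de vedettes-matière
    "7": "z",  # Other: source given in $2 (040 $f in the authority record)
}


def thesaurus_code(ind2: str) -> Optional[str]:
    """Return the 008/11 code to search for, or None if there is no equivalent."""
    return IND2_TO_008_11.get(ind2)
```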

This patch also moves the necessary mappings out of the user-definable
mappings and hardcodes them. There is precedent for this with
'match-heading', and it ensures matching works as expected.

To test:
1 - Follow previous test plan in Zebra and ES

Signed-off-by: Phil Ringnalda <phil@chetcolibrary.org>
Signed-off-by: Frank Hansen <frank.hansen@ub.lu.se>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-31 11:56:53 +02:00
57ea65e725
Bug 15048: Index all possible searched subfields for index-term-genre
Currently we only index $a, but the system can be set up so that subfields a, v, x, y and z are searched.

To test:
 1 - define both a 655$a *and* 655$x value in a bib, save, reindex
 2 - Set system preferences:
      TraceSubjectSubdivisions: Include
      TraceCompleteSubfields: Force
 3 - View the record edited above in the OPAC
 4 - Click on the subject heading
 5 - No results found
 6 - Copy zebra files:
  sudo cp ./etc/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml \
  /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml
  sudo cp etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
  /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl
 7 - restart all and reindex
 8 - Click on the subject heading in OPAC
 9 - Success!
10 - Repeat with other fields (vyz)
11 - Repeat under ES, reindexing and resetting mappings

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-24 14:39:38 -03:00
Caroline Cyr La Rose
ae0688f29c
Bug 31690: Add see from tracings in See-from index (Elasticsearch, MARC21)
This patch adds the following fields to the See-from index

- 450(abvxyz)
- 451(avxyz)
- 455(avxyz)

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-24 14:09:18 -03:00
Caroline Cyr La Rose
bb405d2773
Bug 31689: Add see from tracings in Match-heading-see-from index (Elasticsearch, MARC21)
This patch adds the following fields to the Match-heading-see-from index

- 430(adfghklmnoprstvxyz)
- 448(avxyz)
- 450(abvxyz)
- 451(avxyz)
- 455(avxyz)

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-24 11:20:58 -03:00
Caroline Cyr La Rose
20557fdfe1
Bug 31693: Remove non-existent fields from the See-also-from index (Elasticsearch, MARC21)
This patch removes fields from the See-also-from index that don't exist in
MARC21.

The existing fields can be found here: https://www.loc.gov/marc/authority/ad5xx.html

The following fields are removed:

- 511$b
- 511$m
- 511$o
- 511$r
- 530$b
- 530$c
- 530$e
- 547$b
- 547$e
- 547$f
- 547$h
- 547$j
- 547$k
- 547$l
- 547$m
- 547$n
- 547$o
- 547$p
- 547$q
- 547$r
- 547$s
- 547$t
- 548$b
- 548$c
- 548$d
- 548$e
- 548$f
- 548$g
- 548$h
- 548$k
- 548$l
- 548$m
- 548$n
- 548$o
- 548$p
- 548$r
- 548$s
- 548$t
- 550$c
- 550$d
- 550$e
- 550$f
- 550$h
- 550$j
- 550$k
- 550$l
- 550$m
- 550$n
- 550$o
- 550$p
- 550$q
- 550$r
- 550$s
- 550$t
- 551$b
- 551$c
- 551$d
- 551$e
- 551$f
- 551$h
- 551$k
- 551$l
- 551$m
- 551$n
- 551$o
- 551$p
- 551$r
- 551$s
- 551$t
- 555$b
- 555$c
- 555$d
- 555$e
- 555$f
- 555$g
- 555$h
- 555$j
- 555$k
- 555$l
- 555$m
- 555$n
- 555$o
- 555$p
- 555$q
- 555$r
- 555$s
- 555$t
- 562$b
- 562$c
- 562$d
- 562$e
- 562$f
- 562$g
- 562$h
- 562$k
- 562$l
- 562$m
- 562$n
- 562$o
- 562$p
- 562$r
- 562$s
- 562$t

Furthermore, the format of the mapping for 511 has been corrected.
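Checks like the ones behind this cleanup can be automated. A sketch, assuming mappings in the `TAGcodes` or `TAG(codes)` form and a hand-maintained allow-list per tag; the `ALLOWED` table below is a deliberately partial, illustrative subset for 548 only (MARC 21 also defines $i, $w and the numeric control subfields) and should be filled in from the LC MARC 21 authority pages.

```python
from typing import Dict, List


def invalid_subfields(mappings: List[str], allowed: Dict[str, str]) -> List[str]:
    """Report mapped tag$code pairs whose subfield code is not allowed for the tag."""
    problems: List[str] = []
    for spec in mappings:
        tag, codes = spec[:3], spec[3:].strip("()")
        if tag not in allowed:
            problems.append(f"{tag} (tag not defined)")
            continue
        problems.extend(f"{tag}${code}" for code in codes if code not in allowed[tag])
    return problems


# Illustrative, partial allow-list (not the full MARC 21 definition).
ALLOWED = {"548": "avxyz"}
```

Running it over `["548(abv)"]` flags `548$b`, one of the subfields removed above, and any tag missing from the allow-list is reported as undefined.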

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-21 11:33:35 -03:00
Caroline Cyr La Rose
2da1ee6349
Bug 31691: Remove non-existent fields from the See-from index (Elasticsearch, MARC21)
This patch removes fields from the See-from index that don't exist in
MARC21.

The existing fields can be found here: https://www.loc.gov/marc/authority/ad4xx.html

The following fields are removed:

- 411$b
- 411$m
- 411$o
- 411$r
- 430$b
- 430$c
- 430$e
- 440(abcdefghjklmnopqrstvxyz) (all of them, 440 doesn't exist at all)
- 441(abcdefghklmnoprstvxyz) (all of them, 441 doesn't exist at all)
- 444(abcdefghjklmnopqrstvxyz) (all of them, 444 doesn't exist at all)
- 447$b
- 447$e
- 447$f
- 447$h
- 447$j
- 447$k
- 447$l
- 447$m
- 447$n
- 447$o
- 447$p
- 447$q
- 447$r
- 447$s
- 447$t
- 448$b
- 448$c
- 448$d
- 448$e
- 448$f
- 448$g
- 448$h
- 448$k
- 448$l
- 448$m
- 448$n
- 448$o
- 448$p
- 448$r
- 448$s
- 448$t
- 462$b
- 462$c
- 462$d
- 462$e
- 462$f
- 462$g
- 462$h
- 462$k
- 462$l
- 462$m
- 462$n
- 462$o
- 462$p
- 462$r
- 462$s
- 462$t
- 462$v
- 462$x
- 462$y
- 462$z

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-21 11:33:28 -03:00
Caroline Cyr La Rose
384bc416b8
Bug 31687: Add see from and see also from tracings in Match index
This patch adds see from and see also from terms for uniform title,
chronological term, topical term, geographic name, and genre/form term
to the Match index in Elasticsearch for MARC21.

Previously, only see from/see also from for personal names,
corporate names, and meeting names were indexed.

To test:

1. Without patch, import attached authority records
1.1. Download attached records
1.2. Go to Tools > Stage MARC records for import
1.3. Click 'Browse' and choose the downloaded file
1.4. Click 'Upload file'
1.5. Choose Record type = Authority
1.6. Click 'Stage for import'
1.7. From the job details, click 'View batch'
1.8. Click 'Import this batch into the catalog'

2. Without patch, search for see from and see also from tracings
2.1. Go to Authorities
2.2. In the 'Default' drop-down menu, choose 'Uniform title'
2.3. Choose the 'Search all headings' tab
2.4. Enter the search term 'Five hundred'
2.5. Click 'Submit' or press 'Enter'
 --> No results
2.6. Redo the search for the following search terms

Authority type       Search term              Should be found in
Uniform title        five hundred             430
Uniform title        films préférés           530
Chronological term   fifteenth                448
Chronological term   middle ages              548
Topical term         lalopathie               450
Topical term         troubles communication   550
Geographic name      cécropia                 451
Geographic name      canada francophone       551
Genre/Form term      filmiques                455
Genre/Form term      films                    555

3. Apply patch
4. Delete index, reset mappings and reindex authorities (with command line, -a -d -r)
5. Redo the searches from step 2, there should now be results

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-21 11:33:06 -03:00
Frank Hansen
aac04da489
Bug 30280: Elasticsearch - Add 040f to Subject-heading-thesaurus-conventions (new) authority mapping index field (MARC21)
This patch adds 040 $f to the authority index mappings, under a new Subject-heading-thesaurus-conventions field.

To test:
1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration, click "Reset mappings" and confirm
2b) Then reindex

Sponsored-by: Lund University Library, Sweden

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-10-03 10:00:00 -03:00
Julian Maurice
d79361c51e
Bug 25375: Fix 'available' facet in elasticsearch
Add a new boolean ES field named 'available', which is true if at least
one item is available, which means the item is not on loan, not
"notforloan", not withdrawn, not lost and not damaged

A full indexation is required
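The rule for the new 'available' field can be sketched as follows. The `Item` class is a simplified, hypothetical stand-in for an item row (its attribute names mirror the conditions listed above); this is not Koha's indexing code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    # Simplified, hypothetical item record.
    onloan: Optional[str] = None  # due date while checked out, else None
    notforloan: int = 0
    withdrawn: int = 0
    itemlost: int = 0
    damaged: int = 0


def item_is_available(item: Item) -> bool:
    """Not on loan, not 'notforloan', not withdrawn, not lost and not damaged."""
    return not (item.onloan or item.notforloan or item.withdrawn
                or item.itemlost or item.damaged)


def biblio_available(items: List[Item]) -> bool:
    """The boolean indexed for a biblio: true if at least one item is available."""
    return any(item_is_available(item) for item in items)
```

A biblio with one checked-out item and one item on the shelf is therefore `available`, while a biblio whose only item is lost (or that has no items) is not.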

Test plan:
1. Apply patch and run updatedatabase.pl
2. Run `misc/search_tools/rebuild_elasticsearch.pl -d -b`
3. Make sure you have some biblios whose items are all unavailable, some
   biblios whose items are all available, and some biblios with at least
   one item available and at least one item unavailable
4. Use the 'available' filter on both opac and intranet and make sure it
   works as expected.

Signed-off-by: Andrew Fuerste-Henry <andrew@bywatersolutions.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Joonas Kylmälä <joonas.kylmala@iki.fi>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-09-23 09:01:23 -03:00
Caroline Cyr La Rose
3affdad4c2
Bug 31537: Elasticsearch - index mapping for 003 control-number-identifier appears twice in mappings.yaml
This patch removes one of the two mappings for the 003 field to the
control-number-identifier index (for MARC21).
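A duplicate like this can be caught before it reaches mappings.yaml. A minimal sketch, assuming the mappings have been flattened into (search field, MARC field) pairs; the function name and the flattening step are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_mappings(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (search_field, marc_field) pairs that occur more than once."""
    counts = Counter(pairs)
    return [pair for pair, n in counts.items() if n > 1]
```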

To test:
1) Apply patch
2) reindex with mappings reset
3) try to search for cni:code (for example cni:OSt)
--> it should return the desired results
4) try to search for control-number-identifier:code (for example
control-number-identifier:OSt)
--> it should return the desired results
5) Optionally, try the test plan in Bug 11175 to make sure it still
works

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-09-12 15:33:20 -03:00
c8c51867c4
Bug 30879: Allow biblionumber as sort option in Elasticsearch
Repeat the previous tests with the Elasticsearch engine.
You will need to reindex and reset mappings to pick up the changes from the file.

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-08-02 09:33:13 -03:00
90ef5ec8c8
Bug 30882: Add max_result_window to index config
By default, the number of Elasticsearch results is limited by the setting "index.max_result_window", whose default value is 10000.
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-max-result-window

We use this setting:
44d6528b56/Koha/SearchEngine/Elasticsearch/Search.pm (L411)

I propose we add this setting in index config.
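The limit applies to from/size paging: Elasticsearch rejects a search whose `from + size` exceeds `index.max_result_window`. A sketch of that check, where `check_page` and its parameter names are illustrative rather than Koha's code:

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def check_page(offset: int, page_size: int,
               max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> None:
    """Raise if a from/size request would go past the index's result window."""
    if offset < 0 or page_size < 0:
        raise ValueError("offset and page_size must be non-negative")
    if offset + page_size > max_result_window:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds "
            f"max_result_window = {max_result_window}"
        )
```

With the default, requesting 20 results starting at offset 9990 fails; raising the index setting to 1000000, as this patch proposes, makes the same request valid.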

Test plan:
1) Use Elasticsearch
2) Apply patch and flush memcached
3) Rebuild indexes: misc/search_tools/rebuild_elasticsearch.pl -v -b -d
4) Check the settings of index (when using koha-testing-docker*):
   curl 'es:9200/koha_kohadev_biblios/_settings?pretty&filter_path=**.max_result_window'
5) You should see:
   "max_result_window" : "1000000"

* You also need to add this setting to the es section in koha-testing-docker's
docker-compose.yml (after the networks configuration):
     ports:
         - "9200:9300"

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-07-18 11:23:03 -03:00
b8b8a62f11
Bug 29632: Don't sort cn-sort numerically
When defining our sort fields, we defined them all as 'numeric'.

For other strings containing numbers this is likely correct; however,
for callnumbers it is not, e.g. E45 should sort before E7.

This patch adds a new 'callnumber' type, defines it for cn-sort, and
adds to the field mapping a sort without numeric set.
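The difference between the two orderings can be reproduced with the call numbers from the test plan below. This sketch uses a digit-aware "natural" key to stand in for numeric collation (an assumption about how 'numeric' behaves, not ES code) and compares it with a plain string sort:

```python
import re
from typing import List, Union


def natural_key(s: str) -> List[Union[str, int]]:
    """Split out digit runs so they compare as integers (numeric-style sort)."""
    parts = re.split(r"(\d+)", s)
    # re.split with a capture group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


callnumbers = ["VA65 E7 R63 1984", "VA65 E7 T35 1990", "VA65 E45 R67 1985"]

numeric = sorted(callnumbers, key=natural_key)  # E7 before E45: wrong for cutters
plain = sorted(callnumbers)                     # E45 before E7: decimal cutter order
```

Plain string order is only correct here because the class numbers match; it is not a general call number normalizer (it would put VA100 before VA65).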

To test:
0 - Be using ES with Koha
1 - On records with single item, add callnumbers:
    VA65 E7 R63 1984
    VA65 E7 T35 1990
    VA65 E45 R67 1985
2 - Add public note 'shrimp' or something to make them easily searchable as a group
3 - Search for 'shrimp', sort by callnumber
4 - Note E45 comes last, it should come first
5 - Apply patch
6 - Reset ES mappings
7 - Reindex ES
8 - Repeat search
9 - Sorting should be correct when set to callnumber

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Michal Urban <michalurban177@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-07-18 11:21:47 -03:00
Thomas Klausner
8a8c5b86e2 Bug 30142: Remove spaces from ElasticSearch mapping MARC fields
A first step to "validate" the MARC mappings: remove all whitespace, so
if a user enters "245a " (with trailing whitespace, which can easily
happen when copy/pasting) we only store "245a" in the DB. This is
necessary because the ES indexer will throw an exception on an invalid
MARC mapping.
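The normalization itself amounts to deleting every whitespace character before storing the value. A sketch in Python (the patch itself is in Perl; `normalize_marc_mapping` is an illustrative name):

```python
import re


def normalize_marc_mapping(value: str) -> str:
    """Remove all whitespace, e.g. '100 a b c ' becomes '100abc'."""
    return re.sub(r"\s+", "", value)
```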

Test Plan:
* Go to /cgi-bin/koha/admin/searchengine/elasticsearch/mappings.pl
* Go to the Bibliographic Records Tab
* Enter "100 a b c " (notice the whitespaces!) in the first "mapping"
  field
* Scroll down and save
* Go back to the Bibliographic Records Tab
* The spaces are still there

Now apply the patch

* Repeat the above steps
* After saving you should see "100abc" without any spaces in the
  "mapping" field

Sponsored-by: Steiermärkische Landesbibliothek

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Fridolin Somers <fridolin.somers@biblibre.com>
2022-04-13 15:55:39 +02:00
401ce06ffe Bug 29436: ES mappings not saved if zebra is configured
The mappings must be editable even if ES is not turned on yet.

Using a separate array to store the errors as we are testing for $@ ||
@messages.

There is still something wrong that should be improved, but this patch
should be safe for backport.

Signed-off-by: Fridolin Somers <fridolin.somers@biblibre.com>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Fridolin Somers <fridolin.somers@biblibre.com>
2021-12-21 20:49:53 -10:00
cc7fb7a7fb Bug 11175: Add Elasticsearch support
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Pasi Kallinen <pasi.kallinen@koha-suomi.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Andrew Nugged <nugged@gmail.com>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-10-26 16:46:02 +02:00
0760045e0b Bug 18984: Remove ES mapping for NORMARC
perl misc/search_tools/export_elasticsearch_mappings.pl > admin/searchengine/elasticsearch/mappings.yaml
grep -v 'mandatory: ~' admin/searchengine/elasticsearch/mappings.yaml|grep -v 'opac: 1'|grep -v 'staff_client: 1'

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-10-07 15:36:40 +02:00
deece7b76b Bug 28830: Add cni index for 003
This patch adds the cni/Control-number-identifier index to enable
searches to use the 003 field.

Test plan
1/ Apply patch
2/ Re-index using updated configurations
3/ Confirm cni:number searches yield the expected results
4/ Signoff

Split-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Pasi Kallinen <pasi.kallinen@koha-suomi.fi>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-30 17:02:07 +02:00
Caroline Cyr La Rose
395678474b Bug 28393: Elasticsearch - Add 050a to lc-call-number index mapping (MARC21)
This patch adds 050 $a to the mapping for the lc-call-number index.

To test:
1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration, click "Reset mappings"
and confirm
2b) Then reindex

I'm not sure how to search specifically for an LC call number.
You can confirm that 050 $a is displayed in the Search engine configuration page.

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:47 +02:00
Caroline Cyr La Rose
ba30e6a8d9 Bug 28381: Elasticsearch - Add 710 and 711 to default mappings for author index (MARC21)
This patch adds fields 710$a and 711$a to the default author mappings for elasticsearch indexing.

To test:
1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration, click "Reset mappings"
and confirm
2b) Then reindex

3) Search for an author name found only in 710 using the Author index
in advanced search
4) Repeat search for an author name in 711

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:45 +02:00
Caroline Cyr La Rose
dc673a85af Bug 28380: Elasticsearch - correct 024aa in mappings (MARC21)
This patch corrects a typo in the mappings.yaml file

To test:
1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration, click "Reset mappings"
and confirm
2b) Then reindex

3) Search for a standard number found in 024$a using the Standard number index
in the advanced search

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:43 +02:00
Caroline Cyr La Rose
5d53b8e85a Bug 28379: Elasticsearch - Add 710 to author-name-corporate index (MARC21)
This patch adds 710 to the default author-name-corporate index mappings for
elasticsearch.

To test:
1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration, click "Reset mappings"
and confirm
2b) Then reindex

3) Search for an author name found only in 710 using the Author (Corporate name) index
in advanced search

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:42 +02:00
Caroline Cyr La Rose
21b9e455f1 Bug 28378: Elasticsearch - Add 264c to default copydate mappings (MARC21)
Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:40 +02:00
Caroline Cyr La Rose
0baa439e3a Bug 28339: Elasticsearch - Add 8XX to default title-series index mappings (MARC21)
This patch adds series added entries titles (800 $t, 810 $t, 811 $t, and 830 $a) in the
default title-series index mappings.

To test:

1) Apply patch
2) Reindex using rebuild_elasticsearch.pl -r

If you don't have access to a terminal (in a sandbox for example)
2a) Go to Administration > Search engine configuration and click "Reset mappings" and confirm
2b) Then reindex

3) Search for a series title found only in an added entry field

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-16 09:56:38 +02:00
89b634abc1 Bug 17600: Fix missing imports from mappings.pl
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-06 12:04:16 +02:00
Caroline Cyr La Rose
9127ab6773 Bug 27848: Elasticsearch - include 245p to default title index mappings
Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Lucy Vaux-Harvey <lucy.vaux-harvey@ptfs-europe.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-04 14:06:43 +02:00
Caroline Cyr La Rose
8bfb85d7e1 Bug 27848: Elasticsearch - include 245b subtitle subfield in the default title index mappings
Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Lucy Vaux-Harvey <lucy.vaux-harvey@ptfs-europe.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-04 14:06:43 +02:00
Caroline Cyr La Rose
f331238ef4 Bug 28391: Elasticsearch - Add 264b to publisher index mapping
Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-08-04 14:06:43 +02:00
9d6d641d1f Bug 17600: Standardize our EXPORT_OK
On bug 17591 we discovered that there was something weird going on with
the way we export and use subroutines/modules.
This patch tries to standardize our EXPORT to use EXPORT_OK only.

That way we will need to explicitly define the subroutine we want to
use from a module.

This patch is a squashed version of:
Bug 17600: After export.pl
Bug 17600: After perlimport
Bug 17600: Manual changes
Bug 17600: Other manual changes after second perlimports run
Bug 17600: Fix tests

And a lot of other manual changes.

export.pl is a dirty script that can be found on bug 17600.

"perlimport" is:
git clone https://github.com/oalders/App-perlimports.git
cd App-perlimports/
cpanm --installdeps .
export PERL5LIB="$PERL5LIB:/kohadevbox/koha/App-perlimports/lib"
find . \( -name "*.pl" -o -name "*.pm" \) -exec perl App-perlimports/script/perlimports --inplace-edit --no-preserve-unused --filename {} \;

The ideas of this patch are to:
* use EXPORT_OK instead of EXPORT
* perltidy the EXPORT_OK list
* remove '&' before the subroutine names
* remove some unneeded use statements
* explicitly import the subroutines we need within the controllers or
modules

Note that the private subroutines (starting with _) should not be
exported (and not used from outside of the module except from tests).

EXPORT vs EXPORT_OK (from
https://www.thegeekstuff.com/2010/06/perl-exporter-examples/)
"""
Export allows to export the functions and variables of modules to user’s namespace using the standard import method. This way, we don’t need to create the objects for the modules to access it’s members.

@EXPORT and @EXPORT_OK are the two main variables used during export operation.

@EXPORT contains list of symbols (subroutines and variables) of the module to be exported into the caller namespace.

@EXPORT_OK does export of symbols on demand basis.
"""

If this patch caused a conflict with a patch you wrote prior to its
push:
* Make sure you are not reintroducing a "use" statement that has been
removed
* "$subroutine" is not exported by the C4::$MODULE module
means that you need to add the subroutine to the @EXPORT_OK list
* Bareword "$subroutine" not allowed while "strict subs"
means that you didn't import the subroutine from the module:
  - use $MODULE qw( $subroutine list );
You can also use the fully qualified namespace: C4::$MODULE::$subroutine

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-07-16 08:58:47 +02:00
8790fe908d Bug 26051: Add sort=1 on cn-sort in mappings.yaml
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-02-23 13:12:56 +01:00
fd8aee9270 Bug 25054: Display search field aliases in Search Engine Configuration
It'd be great if the Search Engine Configuration page would display
the various aliases (shortcuts) available : ti for title, sn for local-number, etc.

Patch changes Koha/SearchEngine/Elasticsearch/QueryBuilder.pm to move
hard-coded vars to the beginning and adds a method that provides %index_field_convert.
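Alias resolution of the kind shown in the new column can be sketched as a lookup on the field prefix of a query. The `ALIASES` table is a hypothetical subset of something like %index_field_convert, containing only the two aliases named above, and this sketch only handles a single leading `field:term` prefix:

```python
from typing import Dict

# Hypothetical subset: only the aliases mentioned in this commit.
ALIASES: Dict[str, str] = {"ti": "title", "sn": "local-number"}


def expand_alias(query: str) -> str:
    """Rewrite a leading 'alias:term' to 'search-field:term'; leave others unchanged."""
    field, sep, term = query.partition(":")
    if not sep:
        return query  # no field prefix at all
    return f"{ALIASES.get(field, field)}:{term}"
```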

Test plan :
1) Use Elasticsearch
2) Go to Administration > Search engine configuration (Elasticsearch)
3) Check you see new column 'Aliases' with for example ti for title.
4) Perform a search 'ti:<title>' and check you get results

Signed-off-by: Séverine QUEUNE <severine.queune@bulac.fr>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-01-14 14:03:49 +01:00
5bc62d240c Bug 26991: Add action logs to search engine administration
Search engine administration is very important;
we should log who changes it and when.
I did not add a system preference to disable it,
as there is none for system preference logs either.

Test plan :
1) Use searchengine Elasticsearch
2) Go to Administration > Search engine configuration (Elasticsearch)
3) Click on 'Reset Mappings' and accept
4) Edit some lines and save
5) Go to 'Tools' > 'Log viewer'
6) Select only 'Search engine' in Modules and submit
7) Select only 'Edit mappings' in Actions
8) Check you see a log
9) Select only 'Reset mappings' in Actions
10) Check you see a log

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-01-12 16:40:41 +01:00
3dc90c66c7 Bug 27043: Add number_of_replicas and number_of_shards to index config
With Elasticsearch 6 (>6.4), we have a warning on index creation :
  the default number of shards will change from [5] to [1] in 7.0.0

See https://github.com/elastic/elasticsearch/pull/30587

I propose to add number_of_shards in index config.

Also add number_of_replicas, which is better set explicitly.
With only one node, it must be 0.
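Why a single node needs 0 replicas: Elasticsearch never allocates a replica to the same node as its primary, so on one node any replica stays unassigned and the index health is yellow. A sketch that builds explicit index settings and caps replicas at `nodes - 1`; the capping rule and the `index_settings` helper are illustrative choices, not what the patch implements.

```python
from typing import Dict


def index_settings(shards: int, replicas: int, nodes: int) -> Dict[str, Dict[str, int]]:
    """Explicit shard/replica settings, capping replicas at what the cluster can host."""
    if nodes < 1 or shards < 1 or replicas < 0:
        raise ValueError("need >= 1 node, >= 1 shard and non-negative replicas")
    return {
        "index": {
            "number_of_shards": shards,
            # A replica cannot live on its primary's node: at most nodes - 1 copies.
            "number_of_replicas": min(replicas, nodes - 1),
        }
    }
```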

Test plan :
1) Use Elasticsearch
2) Apply patch and flush memcached
3) Rebuild indexes : misc/search_tools/rebuild_elasticsearch.pl -v -b -d
4) Check you don't have a warning about the number of shards
5) Check the settings of index :
   curl '<cluster>:9200/<myindex>_biblios/_settings?pretty&filter_path=**.number_of_*'
6) You should see :
   "number_of_shards" : "5",
   "number_of_replicas" : "1"

Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2021-01-04 13:29:51 +01:00
975e06bd7c Bug 19482: Add support for defining 'mandatory' mappings
To test:
1 - Apply patch
2 - ./installer/data/mysql/updatedatabase.pl
3 - Reset ES mapping: Administration->Search engine configuration , button at bottom of page
4 - 'issues' and 'title' mapping under 'search fields' should be mandatory and not editable
5 - On 'Bibliographic records' tab you should not be able to delete the single entry for issues
6 - You should be able to delete 'title' mappings, however, at the final one you should be stopped by javascript
7 - Bonus: force remove the last mapping from the page using developer tools - attempt to save and should be warned of missing mandatory mapping
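The server-side check in step 7 amounts to verifying that every mandatory search field still has at least one mapping. A minimal sketch, assuming mappings are grouped per search field; the data shape and `missing_mandatory` name are illustrative.

```python
from typing import Dict, List


def missing_mandatory(mandatory: List[str], mappings: Dict[str, List[str]]) -> List[str]:
    """Return mandatory search fields that have no MARC mapping left."""
    return [field for field in mandatory if not mappings.get(field)]
```

Saving should be refused whenever the returned list is non-empty, for example when the last 'title' mapping has been removed.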

Signed-off-by: Nicolas Legrand <nicolas.legrand@bulac.fr>
Signed-off-by: Bouzid Fergani <bouzid.fergani@inlibro.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-11-04 12:59:33 +01:00
0ee41b5316 Bug 26487: Add all MARC flavours for not-onloan-count search field
In admin/searchengine/elasticsearch/mappings.yaml the search field not-onloan-count is defined for MARC21 on 999x.
This should be for all the MARC flavours, like in Zebra config.

Test plan:
1) On a UNIMARC database
2) Reset Elasticsearch mappings
3) Check search engine config to see field 'not-onloan-count' on 999$x
4) Same on a NORMARC database

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-11-02 11:03:08 +01:00
Julian Maurice
96cc447045 Bug 25898: Prohibit indirect object notation
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-10-15 12:56:30 +02:00
David Gustafsson
d1c8d7ecce Bug 24807: Add "year" type to improve sorting behaviour
Add a "year" search field type. Fields with this type will only
retain values that look like years, so invalid values such as
whitespace or word characters will not be indexed.
This, for instance, improves the behaviour when sorting by
"date-of-publication". If all values are indexed, records with
junk data instead of valid years will appear first among the search
results, drowning out more relevant hits. If this field is assigned
the "year" type, these records will instead always appear last,
regardless of sort order.
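The filtering and sort behaviour described above can be sketched in Python. This is a minimal illustration, not Koha's implementation: it assumes a value "looks like a year" when it is exactly four digits, and `year_values` and `sort_by_year` are hypothetical helpers.

```python
import re

# Assumption: a value "looks like a year" if it is exactly four digits.
YEAR = re.compile(r"\d{4}")


def year_values(raw_values):
    """Keep only year-like values as integers; junk such as 'uuuu' is dropped."""
    return [int(v.strip()) for v in raw_values if YEAR.fullmatch(v.strip())]


def sort_by_year(records, descending=False):
    """Sort by first indexed year; records with no valid year always come last."""
    dated = [r for r in records if r["years"]]
    undated = [r for r in records if not r["years"]]
    dated.sort(key=lambda r: r["years"][0], reverse=descending)
    return dated + undated


records = [
    {"id": "junk", "years": year_values(["uuuu"])},
    {"id": "older", "years": year_values(["1999"])},
    {"id": "newer", "years": year_values(["2005"])},
]
print([r["id"] for r in sort_by_year(records)])                   # ['older', 'newer', 'junk']
print([r["id"] for r in sort_by_year(records, descending=True)])  # ['newer', 'older', 'junk']
```

Leaving junk out of the index, rather than indexing it as a string, is what lets undated records fall to the end in both directions; Elasticsearch likewise places documents with no sort value last by default.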

To test:

1) Have at least two biblios, one with a valid year in 008 (pos 7-10)
and another with an invalid one ("uuuu" for example)
2) Perform a wildcard search (*) and sort results by publication date.
3) The record with the invalid year of publication in 008 should appear first
4) Apply patch and run database updates
5) Reindex Elasticsearch
6) Perform the same search as in 2)
7) The record with the invalid year should now appear last

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-09-18 11:21:31 +02:00
638786e719 Bug 24663: Remove authnotrequired if set to 0
It defaults to 0 in get_template_and_user, so passing it explicitly is redundant.

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-09-03 10:40:35 +02:00
ce161fda9b Bug 25273: Make match-heading rely on authority type configuration
The match-heading field is a special field used only by the linker; it is not accessible
to staff or patrons via the interface. This field is used to store the constructed
'search form' used for matching bib headings to authority fields.

In bug 24269 I attempted to use the mappings defined in the interface and also inject the search term.
This did not work because too many subfields were indexed on their own, leading to false matches.
In this bug we remove the mappings for this field and create it ourselves during
the indexing process. The C4::Headings module is still used to generate the correct form;
however, the mappings are set based on the authority types in the system. This gives the user
the ability to add new types, but prevents mapping changes from breaking linker functionality.

To test:
 1 - Start from a sample database with Elasticsearch working
 2 - Download 2 authorities via Z39.50, one of which is a narrower heading of the other, e.g.:
    Waterworks
    Waterworks - Costs
 3 - Place a heading for the broader term in a record, e.g. Waterworks,
       in 650$a, without using the cataloguing authority plugin. We don't want
       the link created now.
       You need the syspref BiblioAddsAuthorities set to 'allow'
 4 - Make sure linker is set to default
 5 - Attempt to link the records
       misc/link_bibs_to_authorities.pl
 6 - Linking fails
 7 - Apply patch
 8 - refresh index settings (if using a custom file, remove 'match-heading')
       You can reset mappings in the UI or run this:
       misc/search_tools/rebuild_elasticsearch.pl -v -d -r
 9 - Reindex ES
10 - Try to link again
11 - It succeeds!
12 - Run the tests
     prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t

Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Bug 25273: (follow-up)

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-08-31 16:10:25 +02:00
224ac84aec Bug 17661: (follow-up) Update regex to support Unicode characters
Rather than limiting initials to [A-Z], we should test for a broad
range of uppercase letters.

The ES/Zebra changes are slightly different because of Perl vs Java regex
conventions. Perl may support either, but I found 'Uppercase' to be a bit more explicit.
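Why [A-Z] is not enough can be shown in a few lines of Python. Python's built-in re module has no \p{...} property classes, so this sketch uses str.isupper(), which consults Unicode character properties, as a stand-in for Perl's \p{Uppercase}; it illustrates the idea and is not the regex the patch uses.

```python
import re

initial = "Ж"  # Cyrillic capital letter, as in the test plan below

# An ASCII-only class misses non-Latin capitals.
print(re.fullmatch(r"[A-Z]", initial))  # None

# str.isupper() uses Unicode properties, closer in spirit to \p{Uppercase}.
print(initial.isupper())  # True
```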

More info here:
https://perldoc.perl.org/perlunicode.html

To test:
Same plan as before, but use Ж. as the ending initial
Confirm the period is preserved and other punctuation removed

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-08-31 16:10:25 +02:00
e34f95a1f5 Bug 17661: Ending punctuation causes duplicate facets
The current code for facets doesn't strip ending punctuation from facets.
This causes duplicate facets for terms that should be combined.

Sometimes series can have different punctuation depending on the field they are in.
Author initials punctuation should be preserved.
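A minimal Python sketch of the normalisation described above, using sample values from the test plan below. It is not the Perl/Java code the patch touches: it strips a trailing run of whitespace and punctuation but keeps a single period that directly follows an uppercase letter (an initial). The punctuation set in TRAILING and the name `normalize_facet` are assumptions.

```python
import re

# Assumed set of trailing characters to strip: whitespace and . , ; : / -
TRAILING = re.compile(r"[\s.,;:/\-]+$")


def normalize_facet(value: str) -> str:
    """Strip ending punctuation, preserving a period after an uppercase initial."""
    m = TRAILING.search(value)
    if not m:
        return value
    head, tail = value[: m.start()], value[m.start():]
    # Keep one period if it immediately follows an uppercase letter (Unicode-aware).
    if tail.startswith(".") and head and head[-1].isupper():
        return head + "."
    return head


for v in ["Date, C.J.", "Date, C.j.", "Date, C.J .", "Date, C.J. ;",
          "You wouldn't want to--", "You wouldn't want to.."]:
    print(repr(normalize_facet(v)))
```

Because isupper() is Unicode-aware, the same rule keeps the period after non-Latin initials such as Ж.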

To test:
1 - Do search and pull up some records
2 - Edit some of the records to have authors like:
    Date, C.J.
    Date, C.j.
    Date, C.J .
3 - Edit the records to have some series statements like:
    830 $aDate, C.J. ;$v5
    830 $aDate, C.J. ; $v5
    830 $aDate, C.J.; $v5
4 - Add some 490s to the record with first indicator 1 and series like:
    You wouldn't want to--
    You wouldn't want to
    You wouldn't want to..
5 - Search again and note you have 3 facets each for author and series
6 - Apply patch
7 - Repeat
8 - Now you get 2 facets for author: the period is kept when it immediately follows an uppercase letter and removed otherwise
9 - Now you should have a single series facet
10 - Switch search engine to ES (index before applying patch)
11 - Note facets are separate again
12 - Reset mappings and reindex
   perl misc/search_tools/rebuild_elasticsearch.pl -v -r
13 - Repeat search, facets combined as above

Signed-off-by: Sarah Cornell <sbcornell@cityofportsmouth.com>

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-08-31 16:10:25 +02:00
Julian Maurice
cb5acdc670 Bug 25873: Ignore malformed data for Elasticsearch integer fields
If we try to put malformed data into an integer field, Elasticsearch
rejects the whole document.
Setting 'ignore_malformed' to true allows Elasticsearch to ignore the malformed
data and process the other fields of the document normally.

https://www.elastic.co/guide/en/elasticsearch/reference/7.8/ignore-malformed.html
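ignore_malformed is an Elasticsearch mapping parameter for numeric fields (see the link above). The Python sketch below builds such an integer property and simulates its effect on one document. `integer_property` and `index_document` are hypothetical helpers; the simulation only models "reject the document" versus "skip the bad field" and is not Elasticsearch itself.

```python
def integer_property(ignore_malformed=True):
    """Mapping fragment for an integer field, as it might appear under 'properties'."""
    return {"type": "integer", "ignore_malformed": ignore_malformed}


def index_document(doc, properties):
    """Simulate indexing: return indexed fields, or raise if the document is rejected."""
    indexed = {}
    for name, value in doc.items():
        prop = properties.get(name, {"type": "text"})
        if prop["type"] == "integer":
            try:
                value = int(value)
            except (TypeError, ValueError):
                if prop.get("ignore_malformed"):
                    continue  # drop only this field; the rest of the document is kept
                raise ValueError(f"document rejected: malformed value for {name!r}")
        indexed[name] = value
    return indexed


doc = {"title": "Moby Dick", "author": "Melville"}
print(index_document(doc, {"title": integer_property(True)}))  # {'author': 'Melville'}
```

With integer_property(False), the same call raises, which matches the empty index seen in the "without the patch" steps below.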

Test plan:
* Without the patch
  1. In search engine configuration, change the type of a text field to
  'Number' (for instance 'title')
  2. misc/search_tools/rebuild_elasticsearch.pl -d -b
  3. See that the index is empty (unless you have titles consisting only
  of digits)
* With the patch
  1. misc/search_tools/rebuild_elasticsearch.pl -d -b
  2. Now records are correctly indexed

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2020-07-31 15:07:42 +02:00
067aa1c5f7
Bug 25278: add clear_search_fields_cache method
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-05-18 15:28:42 +01:00
0899d99b7c
Bug 25278: (follow-up) Fix other occurrences
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-05-18 15:28:37 +01:00
11bf5d7afa
Bug 23137: Move cache flushing to the method
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-04-29 17:02:15 +01:00
e94ea221df
Bug 20484: (RM follow-up) Highlight ES disablement
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
2020-04-21 12:14:13 +01:00