Commit graph

343 commits

Author SHA1 Message Date
Michael Hafen
eea7b9d20d Bug 5608 - command-line tool to switch information in 440 and 490 tags
With the MARC21 standard moving from the 440 tag to the 490, this tool is
to help libraries make the move.  It switches any information in 440 tags to
490 tags, and any information in 490 tags to 440 tags.  That seemed like the
best way to go to me.

To Test:
locate some biblios with 440 or 490 tags filled.
run bin/migration_tools/switch_marc21_series_info.pl -c
observe that the information in the biblios has switched 4xx tags.

http://bugs.koha-community.org/show_bug.cgi?id=5608
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Comment: Works as described. No errors.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-21 22:15:56 -04:00
Stéphane Delaune
cefa7c21e2 Bug 5635: bulkmarcimport new parameters & features
See the script's documentation for more details

New parameters are:
 - authtypes
 - filter
 - insert
 - update
 - all
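
An illustrative invocation using one of the new switches might look like this (the -b and -file switches and the file name are assumptions, not part of this commit; see the script's documentation for the actual semantics of each switch):

  misc/migration_tools/bulkmarcimport.pl -b -file incoming.mrc -update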

Signed-off-by: Pascale Nalon <pascale.nalon@gmail.com>
This patch is live in Mines ParisTech since 2012-07-24.
Signing off

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
 - Moved the sign-off from bugzilla to the commit message.
 - All tests and QA script pass.
 - Amended commit message to list new parameters.
 - Verified this patch works on a UNIMARC installation.
 - Verified normal import still works correct on a MARC21
   installation.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-21 20:21:54 -04:00
4dcee58a4d Bug 7440 - Remove NoZebra vestiges
Removed NoZebra vestiges. This comprises several code blocks that depend on the NoZebra syspref and NZ-related functions/methods.

C4::Biblio->
 GetNoZebraIndexes
 _DelBiblioNoZebra
 _AddBiblioNoZebra

C4::Search->
 NZgetRecords
 NZanalyse
 NZoperatorAND
 NZoperatorOR
 NZoperatorNOT
 NZorder

C4::Installer->
 set_indexing_engine

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-19 21:17:04 -04:00
Jared Camins-Esakov
144c7f4e4e Bug 9239: Allow the use of QueryParser for all queries
With the inclusion of this patch, all searches will (try) to use
QueryParser for handling queries for both the bibliographic and authority
databases if UseQueryParser is enabled. If QueryParser is unavailable,
UseQueryParser is disabled, or the search uses CCL indexes, the old
search code will be used.

To test:
1) Apply patch.
2) Run the unit test with `prove t/QueryParser.t`
3) Enable the UseQueryParser syspref.
4) Try searches that should return results in the following places:
   * OPAC (simple search)
   * OPAC (advanced search)
   * OPAC (authorities)
   * Staff client (header search)
   * Staff client (advanced search)
   * Staff client (cataloging search)
   * Staff client (authorities)
   * Staff client (importing a batch using a match point)
   * Staff client (searching for an item for adding to a label)
   * Staff client (acquisitions)
   * Staff client (searching for a record to create a serial)
   * ANYWHERE ELSE I HAVE FORGOTTEN
5) Disable the UseQueryParser syspref. Repeat at least some of the
   searches you did above.
6) If all searches worked, sign off.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolions.com>
Searching still works as expected in various places.
QueryParser syspref seemed to be enabled by default

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-16 21:32:32 -04:00
Robin Sheat
788e51adb1 Bug 9035 - delete bulkauthimport.pl
<dcook> Then bulkauthimport.pl?
<jcamins> bulkauthimport should not be used ever.
<eythian> it probably should be deleted
<jcamins> It should be.

Signed-off-by: David Cook <dcook@prosentient.com.au>

I've poked around in bulkmarcimport.pl and it certainly seems to have the functionality that Mason (and Jared and Robin) mention.

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-13 08:50:09 -04:00
Vitor FERNANDES
33e95ea3b9 Bug 9144 - bulkmarcimport.pl - Problem identifying errors
Replace \r with \n for newline in output for bulkmarcimport.pl

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-17 11:53:01 -05:00
Jared Camins-Esakov
49cadcf7c1 Bug 9049: Don't use shadow with rebuild_zebra -r
Due to a limitation of Zebra, the register must be cleared *before*
doing shadow indexing if you want to reset the indexes. In light of
that, it does not make sense to do shadow indexing at all when
rebuild_zebra.pl is run with the -r switch. This patch makes -r (reset)
imply -n (no shadow).

To test:
1) Run `rebuild_zebra.pl -b -r -v -v -v`
2) Note that the script never runs the merge phase

Without the patch I see log lines referring to the shadow cache (enabling shadow spec=/home/koha/koha-dev/var/lib/zebradb/biblios/shadow:20G)
With the patch I don't see anything in the logs about shadow. I do, however, see lines about merging. I think it could just be a misunderstanding of the logs.

Signed-off-by: wajasu <matted-34813@mypacks.net>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-08 09:46:30 -05:00
Robin Sheat
5d0bdbce59 Bug 9012 - --framework option for bulkmarcimport
This allows the --framework option to be specified when running
bulkmarcimport. This option allows a framework code to be specified for
the records being imported.
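
A hypothetical invocation might look like the following (the -b and -file switches and the file name are assumed; only -framework comes from this patch):

  misc/migration_tools/bulkmarcimport.pl -b -file records.mrc -framework FA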

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
All tests pass, perlcritic fails before and after.

Tested
- imported records with -framework FA, FA framework is used
- imported records without -framework, default framework is used
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-03 07:14:58 -05:00
Jared Camins-Esakov
deeeb068d9 Bug 9050: Use safer adelete when deleting records from Zebra index
Previously we used the "delete" command in zebraidx, which fails when
you try to delete a record that doesn't exist in the index. By changing
to the "adelete" command, we can reduce the likelihood of a failed
delete causing ghost records. A symptom of this problem is the warning
message occasionally encountered when indexing from the zebraqueue,
"[warn] cannot delete record above (seems new)."

To test:
1) Add a recordDelete action for a record that does not exist to
   zebraqueue in MySQL:
   INSERT INTO zebraqueue (biblio_auth_number, operation, server) \
       VALUES (999999999, 'recordDelete', 'biblioserver');
2) Run `rebuild_zebra.pl -b -z -v [-x]`.
3) Note that you do not get the message "[warn] cannot delete record
   above (seems new)".

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-12 18:53:49 -05:00
Colin Campbell
722701d596 Bug 8727 Minor stylistic change to help text
indexing not indexation
some minor grammatical changes

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-09-17 18:47:40 +02:00
Jared Camins-Esakov
bc05b5d163 Bug 7417: Include see from references in bibliographic searches
This patch adds the Koha::Indexer::RecordNormalizer and
Koha::Indexer::MARC::RecordNormalizer::EmbedSeeFromHeadings packages
to enable the inclusion of alternate forms of headings in bibliographic
searches. When the new syspref IncludeSeeFromInSearches is turned on
(default is off) rebuild_zebra.pl will insert see from headings from
authority records into bibliographic records when indexing, so that a
search on an obsolete term will turn up relevant records.

To test:
1) Enable IncludeSeeFromInSearches
2) Add a heading that has an alternate form to a record (for example,
   "Cooking" has the alternate form "Cookery," if you have authority
   records from LC)
3) Index the zebraqueue (or reindex if you haven't indexed your system
   yet)
4) Confirm that if you search for "Cookery" you get the record you
   just modified

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 5 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>

Also checked:
- Verified database update works correctly
- Checked system preference and its description
- Checked staff/opac detail pages with feature on/off
- Checked staff/opac search facets
- Downloaded and tested records in various formats
- Tried different searches for 'see from' entries of authorities
- Ran all unit tests

No problems found.
2012-09-13 14:19:28 +02:00
Jared Camins-Esakov
3616eee996 Bug 8384: Some Perl scripts do not compile
Fix syntax errors preventing the scripts misc/translator/text-extract2.pl
and misc/cronjobs/thirdparty/TalkingTech_itiva_inbound.pl from compiling.

Remove misc/migration_tools/build6xx.pl entirely since it refers to
columns that no longer exist in the Koha database, and has seemingly
had broken encoding since Koha switched from CVS to git (or before!).

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-10 10:50:58 +02:00
Christophe Croullebois
665136f8a0 Bug 6566 Checking if DB's records are properly indexed
Small script that checks if each biblio record in the DB is properly indexed
use -h to learn more
(MT #6389)

Signed-off-by: Robin Sheat <robin@catalyst.net.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-06 17:11:39 +02:00
Jonathan Druart
623f3a2c84 Bug 8233 : SearchEngine: Add a Koha::SearchEngine module
First draft introducing solr into Koha :-)

List of files :
  $ tree t/searchengine/
  t/searchengine
  |-- 000_conn
  |   `-- conn.t
  |-- 001_search
  |   `-- search_base.t
  |-- 002_index
  |   `-- index_base.t
  |-- 003_query
  |   `-- buildquery.t
  |-- 004_config
  |   `-- load_config.t
  `-- indexes.yaml
  just do `prove -r t/searchengine/**/*.t`

  t/lib
  |-- Mocks
  |   `-- Context.pm
  `-- Mocks.pm
  provide a mock to SearchEngine syspref (set_zebra and set_solr).

  $ tree Koha/SearchEngine
  Koha/SearchEngine
  |-- Config.pm
  |-- ConfigRole.pm
  |-- FacetsBuilder.pm
  |-- FacetsBuilderRole.pm
  |-- Index.pm
  |-- IndexRole.pm
  |-- QueryBuilder.pm
  |-- QueryBuilderRole.pm
  |-- Search.pm
  |-- SearchRole.pm
  |-- Solr
  |   |-- Config.pm
  |   |-- FacetsBuilder.pm
  |   |-- Index.pm
  |   |-- QueryBuilder.pm
  |   `-- Search.pm
  |-- Solr.pm
  |-- Zebra
  |   |-- QueryBuilder.pm
  |   `-- Search.pm
  `-- Zebra.pm

How to install and configure Solr ?
  See the wiki page: http://wiki.koha-community.org/wiki/SearchEngine_Layer_RFC

http://bugs.koha-community.org/show_bug.cgi?id=8233
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
2012-07-06 16:51:58 +02:00
Julian Maurice
57424a9fdc Bug 7286: rebuild_zebra_sliced for biblios and authorities
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
  - both biblio and authority records are handled
  - records are exported only once

It also adds an option --skip-index to rebuild_zebra.pl that permits using
rebuild_zebra.pl as an 'export only' script.

Description:
Index Koha records by chunks. It is useful when some records cause
errors and stop the indexing process. With this script, if indexing
of one chunk fails, the chunk is split into 2 (or 3) chunks, and
indexing continues on these chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).
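
A sketch of the export-only mode described above (option spelling taken from this description; the -b switch is assumed):

  misc/migration_tools/rebuild_zebra.pl -b --skip-index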

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-06 15:06:40 +02:00
christophe croullebois
082bb5049d Bug 8136 Changes the expected length of 100$a in rebuild_zebra.pl
In rebuild_zebra.pl, if we are in "unimarc" ("marcflavour" syspref), the sub "fix_unimarc_100" is called and checks if the 100$a length is equal to 35.
If that is not the case, the sub inserts the localtime and more, so we lose the data on reindexing.
The standard length is 36.
I have just changed 35 to 36.

Signed-off-by: Sophie Meynieux <sophie.meynieux@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-20 09:39:27 +02:00
Galen Charlton
daca5edc52 Bug 7818: -x option of rebuild_zebra.pl now works with DOM filter
One consequence is that the -x and -a options are no longer
mutually exclusive.

Also, because of the way that the GRS-1 SGML filter works, if you're
indexing multiple documents, you can't just wrap them in a document
element, but the DOM filter *requires* it.  Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:

- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:09 +02:00
Chris Cormack
dd864696de Bug 7213 : Follow up fixing license information
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-05-15 15:44:33 +02:00
Dobrica Pavlinusic
63bc7ebc39 Bug 7213 - simple /svc/ HTTP example
Simple command-line client which can authorize itself to Koha,
get MARC XML record based on biblio number and update record

This script can also be used as module using require "koha-svc.pl"
from other scripts which can implement MARC XML creation or parsing.

This is follow up version which now uses Content-type: text/xml
header when using POST method to be in sync with documentation at
http://wiki.koha-community.org/wiki/Koha_/svc/_HTTP_API

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-05-14 18:22:17 +02:00
Robin Sheat
b96c8b7ffa Bug 6199 - allow bulkmarcimport.pl to remove duplicate barcodes
This adds the -dedupbarcode option that allows bulkmarcimport to erase
a barcode but keep the item, for any items it finds with duplicate
barcodes.
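
A hypothetical invocation (the -b and -file switches and the file name are assumptions; only -dedupbarcode comes from this patch):

  misc/migration_tools/bulkmarcimport.pl -b -file records.mrc -dedupbarcode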

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-03-28 17:30:54 +02:00
Julian Maurice
3b0d4e04e0 Bug 6440: Implement OAI-PMH Sets
New sql tables:
  - oai_sets: contains the list of sets, described by a spec and a name
  - oai_sets_descriptions: contains a list of descriptions for each set
  - oai_sets_mappings: conditions on marc fields to match for biblio to be
    in a set
  - oai_sets_biblios: list of biblionumbers for each set

New admin page: allows configuring sets:
  - Creation, deletion, modification of spec, name and descriptions
  - Define mappings which will be used for building oai sets

Implements OAI Sets in opac/oai.pl:
  - ListSets, ListIdentifiers, ListRecords, GetRecord
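
As a rough illustration, requests against the new set support might look like this (hostname and URL prefix are assumptions; SPEC stands for a set spec you have defined):

  http://your.server/cgi-bin/koha/oai.pl?verb=ListSets
  http://your.server/cgi-bin/koha/oai.pl?verb=ListRecords&metadataPrefix=oai_dc&set=SPEC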

New script misc/migration_tools/build_oai_sets.pl:
  - Retrieve marcxml from all biblios and test if they belong to defined
    sets. The oai_sets_biblios table is then updated accordingly

New system preference OAI-PMH:AutoUpdateSets. If on, update sets
automatically when a biblio is created or updated.

Use OPACBaseURL in oai_dc xslt
2012-03-20 11:38:26 +01:00
Paul Poulain
1fd8c8a4de Bug 7246 add offset/length and where options to rebuild_zebra
This patch reimplements for Koha-community/master a feature that is on biblibre/master

It adds the following parameters:
* offset = the offset of record. Say 1000 to start rebuilding at the 1000th record of your database
* length = how many records to export. Say 400 to export only 400 records
* where = add a where clause to rebuild only a given itemtype, or anything you want to filter on

Another improvement resulting from offset & length limit is the rebuild_zebra_sliced.zsh
that will be submitted in another patch.
rebuild_zebra_sliced will slice your whole database into small chunks, and, if something goes wrong for a given slice, will slice the slice, and repeat, until you reach a slice size of 1, showing which record is wrong in your database.
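
A hypothetical invocation combining the new options (the exact option spelling and the where clause value are assumptions; the 1000/400 figures echo the examples above):

  misc/migration_tools/rebuild_zebra.pl -b -x -offset 1000 -length 400 -where "itemtype='BOOK'"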

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed mention of -l option for limiting number of items exported, as requested
by QA manager. This can be re-added in a later patch.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-02-17 10:59:23 +01:00
Colin Campbell
263dded818 Bug 6752: Be stricter with utf-8 encoding of output
Use :encoding(UTF-8) rather than :utf8 for stricter
encoding.
Marking output as ':utf8' only flags the data as utf8;
using :encoding(UTF-8) also checks it as valid utf-8.
See binmode in perlfunc for more details.
In accordance with the robustness principle, input
filehandles have not been changed, as code may make
the undocumented assumption that invalid utf-8 is present
in the input.
Fixes errors reported by t/00-testcritic.t
Where feasible, some filehandles have been made lexical rather than
reusing global filehandle vars.

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-27 12:11:06 +01:00
Dobrica Pavlinusic
90d68d6f5c Bug 7247 - rebuild_zebra.pl -v should show all Zebra log output
Currently, the -v option resets Zebra log output to default system values.

This produces the amount of logging specified in the system defaults, which is usually
too low for debugging.

This change explicitly forces all Zebra log output, which creates much more
chatter, so it is only triggered at verbosity level 2

Test scenario:
1. pick koha site to reindex
2. use -v -v options to rebuild_zebra.pl to see additional output

Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Verified help corrections and  loglevel 2 output vs. loglevel 1 output. No issues found.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-17 17:31:25 +01:00
Marc Balmer
c9c6bbdea8 Bug 7356 - Fix various typos and mis-spellings
Fix typos: the the -> the, wether -> whether, developper -> developer.

http://bugs.koha-community.org/show_bug.cgi?id=7356
Signed-off-by: Owen Leonard <oleonard@myacpl.org>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-13 11:51:26 +01:00
Robin Sheat
849547df68 Bug 7008 - create tmp dir for zebra
Sometimes zebra needs a tmp dir in order to work. This ensures that it
is created both by koha-create-dirs in the packages, and by
rebuild_zebra when it runs.
--

tested ok, signing off
Signed-off-by: Mason James <mtj@kohaaloha.com>
2011-12-03 07:56:44 +01:00
4ce57a102b Bug 6799 rebuild_zebra.pl -x produces invalid XML records
This patch allows items containing extended characters to be handled properly
and sends valid XML records to zebraidx

Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2011-11-18 23:29:08 +01:00
Nahuel ANGELINETTI
e0b029a4f5 (bug #4518) enhance 2.2 to 3.0 scripts
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-11-16 17:48:24 +01:00
05d35b0ae0 6094 Fixing ModAuthority problems
Pref MergeAuthoritiesOnUpdate does not exist; should be dontmerge
(AuthoritiesMarc.pm).

Instead of folder modified_authorities, now introducing a table for this
purpose: need_merge_authorities. This eliminates several permissions and
security issues. This change applies to AuthoritiesMarc.pm and
merge_authority.pl.

POD lines added for ModAuthority. Deprecated parameter $merge removed.

Test this patch by applying the db revision first from the second patch.

August 4, 2011: Rebased.

Signed-off-by: Frédéric Demians <f.demians@tamil.fr>

Thanks Marcel. It works as advertised. Both modes are functional
(again):

- Immediate with dontmerge=0: After modifying an authority record, its
  linked biblios are immediately modified. This isn't the case in 3.4.5.
- Delayed with dontmerge=1: After modifying an authority record, its
  linked biblios are not modified. But an entry is added to
  need_merge_authority new table and 'merge_authorities.pl -b' script
  updates biblios.

Comment: need_merge_authority, like zebraqueue, should be cleared
from time to time.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-20 11:28:53 +13:00
Ian Walls
4e95e94727 Bug 6789: biblios with many items can result in broken search results link
This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page.  This is due to Net::Z3950::ZOOM  not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields.  Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.

This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_ordered_fields,
so the 999$c will come before the item records.  It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.

To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last).  fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.

It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio.  Simpler and cleaner that way.

To verify bug issue:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. search for this biblio (in a search that would return multiple results, not just this title).  You should get the title in
the results list
3. attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404

To test solution:
1. Apply patch
2. modify the biblio slightly (click the 005 for example) and save
   OR manually add the biblio to zebraqueue for reindexing
3. after rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. click the link, and find yourself on the biblio detail page as desired

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-15 13:47:24 +13:00
Jared Camins-Esakov
f09e2ca27e Bug 5528: Analytic records support
Display links to parent biblios, show linked items in holdings, allow holds on
linked items. This uses MARC to maintain relationships.

Sponsored by the Mississippi Department of Archives and History and RapidRadio
Solution. Originally developed by Savitra Sirohi and Amit Gupta at OSSLabs, with
UNIMARC support added by Zeno Tajoli. Commits squashed and merge conflicts
resolved by Chris Cormack from Catalyst. Respect for NORMARC and some small
framework portability fixes made by Jared Camins-Esakov of C & P Bibliography
Services.

IMPORTANT NOTE: A bug in the 773 coding for MARC21 was corrected from the
original OSS Labs code. The 773s generated by the pre-release code did not have
the first indicator set to '0', which means that they were not supposed to
display. Going forward, the first indicator will be set correctly, but existing
records created with this code will no longer appear (they appeared before only
due to another bug). To correct this, you could globally (or, to make sure you
only modify records created with the Analytics tool, for records with 773$0)
change the first indicator of the 773 from blank to '0'.

== Background ==
An analytic record for an item is a more detailed, monographic biblio for an
item attached to a serial record. This is often used for special issues of a
journal that are released as books on their own (assigned an ISBN, as well as an
ISSN/volume/issue).  It is important for researchers to be able to search for
these items both as issues of the serial, and as monographs.  It is equally
important for the library not to have duplicate item records for the item in
question that it has to keep synchronized.

== Establishing relationships ==
Analytical records are connected to items belonging to parent or host
bibliographic records. This can be accomplished by:
* From an analytical bibliographic record linking to a host item by providing
  the item barcode as input
* From a host item by using option "analyze", this creates a new empty
  bibliographic record with field 773 (MARC21) populated
* Running a new CLI script that establishes a relationship between the
  analytical record and the host item identified by the barcode in the
  analytical record's 773$o (MARC21)

== Connecting Records ==
The relationships are maintained in the MARC records; we have not used database
tables at all.

== MARC Representation ==
In MARC21/NORMARC we have used:
* 773$9 to store the Koha item number of the host item
* 773$0 to store the Koha biblio number of the host bibliographic record

The above fields are used to display the relationships in various screens in the
OPAC and the staff interface. Additionally, when populating field 773 with host
item's details, we have used following MARC 21 mapping:
* 'a' <= 100/110/111 $a (author main)
* 'b' <= 250$a (edition)
* 'd' <= 260$a, 260$b, 260$c (place, publisher, year)
* 'o' <= barcode
* 't' <= 245$a (title)
* 'w' <= (003)001 --> if no 001 is available, we can populate biblionumber
* 'x' <= 022$a (issn)
* 'z' <= 020$a (isbn)

In UNIMARC, this code uses:
* 461$9 to store the Koha item number of the host item
* 461$0 to store the Koha biblio number of the host bibliographic record

When populating field 461 in UNIMARC, the following mapping is used:
* 't' <= 200$a (title)

== Treatment of Holds ==
A key requirement was to allow holds to be placed on host items from the
analytical record. We have accomplished this by allowing holds on specific
copies only. Biblio level holds are not allowed. This ensures that holds are
placed on specific items that are relevant to the analytical record.

== Deleting host items with linked analytical records ==
As we have not used database tables to maintain relationships, we had to use
search to find out if any linked analytical records are present. If 1 or more
analytical records are present, we do not allow deletion of items. This is similar to
what we see when we try to delete authority records.

== Importing analytical records ==
Analytical records can be imported using bulkmarcimport or the GUI tools. The
new CLI script can be executed after the import to establish relationships with
host items. The script will establish relationships using the host item's
barcode, the barcode must be present in 773$o of the analytical record.

== What if there are two or more copies of the host item? ==
The current design will require that there be two host (773) fields, one for
each copy.

== What if there is no barcode available for the host item? ==
It is still possible to establish a relationship by populating 773$9 with the
host's item number. However, the CLI script uses the barcode in 773$o to establish
relationships, so it won't work where barcodes are unavailable. Also, from an
analytical record it is possible to establish a relationship to a host item by
providing the barcode as input; this option will not be available either.

Commits that added the following features were squashed by Chris Cormack (this
is not a list of every commit):
* Display links to host records from biblio detail screens
* Support for UNIMARC, respecting the system preference 'marcflavor'
* Support holds from the OPAC
* Ability to link to items belonging to host records from an analytical record
* Display items belonging to host records in the moredetail page
* Ability to edit items belonging to host records, also ability to delink from
  them
* Move get host items code into a C4 routine, also calling the new routine in
  related perl scripts
* Move host field population to a C4 routine, all changes in pl files to call
  new routine
* Allow only specific copy holds for analytical records plus changes to use new
  C4 routines
* Support for holds on items linked via host records
* Storing bibnumber and itemnumber in subfields 0 and 9, plus other mapping
  changes
* New command line script that establishes relationships between analytical
  records and host items and bibs. The script looks for host field (MARC21 773)
  in records, and based on barcode in subfield 'o' populates host bibnumber in
  subfield '0' and host itemnumber in subfield '9'. The script can be run after
  an import of analytical records, it can also be run in the crontab to maintain
  the relationships
* Ability to create analytical records from items, to view linked analytics, and
  prevent deletion of items that have linked analytics
* New template for catalogue/detail.pl (NOTE: not a new template file, just a
  new way of displaying analytics), template displays linked analytics and
  allows creation of analytical records
* New zebra index for item number in host fields. This index will be used to
  display links to analytical records from host records
* Display title of host record instead of the phrase host record
* Using detail.tmpl for analytics tab instead of a new template file
* Improved qualification info preparation in Prephostmarcfield
* Check for linked analytics before deleting item
* Display link to host record and more meaningful anchor text for edit item link
* Analytical record: Unimarc index in record.abs and help in
  create_analytical_rel.pl
* Adding a sys pref that controls display of options to create analytical
  relationships
* Add host entry in XSLT stylesheet in staff item detail
* Added host record support to OPAC detail XSLT
* Adding 773$0 and 773$9 to all frameworks
* Adding 773 subfields 0 and 9 to default marc framework via updatedatabase.pl
* Display create analytics and used in links in catalog detail
* Fixed problem where analytical records were not showing in OPAC search results
  because GetMarcBiblio now needs a flag to add item records
* Fixed problem where analytics count was set to 1 for all records, not just
  those with analytics
* Fixed catalogue detail page not to show analytics counts if count is 0

Conflicts:
	installer/data/mysql/updatedatabase.pl
	koha-tmpl/intranet-tmpl/prog/en/modules/cataloguing/addbiblio.tt
	kohaversion.pl

Co-author: Savitra Sirohi <savitra.sirohi@osslabs.biz>
Co-author: Zeno Tajoli <tajoli@cilea.it>

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Ian Walls <ian.walls@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-13 10:03:39 +13:00
Jesse Weaver
048c0dc04e Bug 6492 - Deleted biblios cause rebuild_zebra to fail
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.

Minor amendment to remove duplication of 6433

Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-07-05 11:18:28 +12:00
3b8f1318e0 Bug 6050 Followup, edit a last function call
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-06-14 14:12:05 +12:00
Srdjan Janković
5829cef6d8 bug_6433: exception handling
Signed-off-by: Magnus Enger <magnus@enger.priv.no>
2011-06-10 11:27:25 +12:00
ce849240ad bug 5579: tweaks to bulkmarcimport.pl
Fixes bug where a bib record imported by bulkmarcimport.pl
could become unindexable by ensuring that ModBiblioMarc()
is always called by bulkmarcimport.pl to finalize saving the
bib record (as it was initially created by AddBiblio with the
defer_marc_save option).

Also introduces a utility routine, C4::Biblio::_strip_item_fields.

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-21 10:05:02 +12:00
e96315556b bug 5579: new routine to embed items in bib
Adds a new routine, C4::Biblio::EmbedItemsInMarcBiblio, to
embed the items in the bib record when necessary:

* cataloging/additem.pl
* rebuild_zebra.pl

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:34:21 +12:00
Henri-Damien LAURENT
3584c4426b Bug 5579: remove items from MARC bib
This is a squash of four patches by Henri-Damien Laurent
starting work on removing the copy of item record information
in the 9XX field of bibliographic records.  The reason
for doing this is primarily to improve performance, in particular,
the expense of having to add/modify the bib record whenever an
item changes.  Now, whenever an item changes, the bib record is
put in the queue to be reindexed; when the bib is indexed, the 9XX
fields are inserted into the version of the bib that Zebra indexes.
Since rebuild_zebra.pl runs in a separate process, the processing of the
bib record will not delay (e.g.) circulation.

As part of upgrading to 3.4, the following batch script should be run:

misc/maintenance/remove_items_from_biblioitems.pl --run

This should be followed by a complete reindexing of the bib records, e.g.,

misc/migration_tools/rebuild_zebra.pl -b -r

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:33:56 +12:00
c9d082bcdc Bug 5067 Add a cleanisbn param to bulkmarcimport.pl
The import script shouldn't remove information present in incoming biblio
records. With this patch, by default, ISBNs are not cleared anymore.

[2011.04.12] Rebased on HEAD

DOCUMENTATION: There is a new parameter --isbn|--noisbn
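
A hypothetical invocation passing the new switch explicitly (the -b and -file switches are assumptions; see the script's documentation for which behaviour each form selects):

  misc/migration_tools/bulkmarcimport.pl -b -file records.mrc --noisbn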

Signed-off-by: Colin Campbell <colin.campbell@ptfs-europe.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-13 11:41:04 +12:00
Colin Campbell
d8b362e0f9 Bug 5415 Let calls of SimpleSearch utilize a consistent interface
Remove some unnecessary checks where checking the error is
sufficient. Make the order in some cases more logical.
Should remove some possibilities of runtime warning noise.
Although some calls belong to the 'Nothing could
ever go wrong' school, some warnings have been added.

Signed-off-by: Christophe Croullebois <christophe.croullebois@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-08 13:52:57 +12:00
Alex Arnaud
e43da19e34 Bug #6044 - Authority is deleted when mergeto and mergefrom are the same
Signed-off-by: Stéphane Delaune <stephane.delaune@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-06 15:22:40 +12:00
Ian Walls
8dc56a0d2c Bug 5831: rebuild_zebra.pl doesn't respect -r
Reimplements support for -r, as well as for -reset

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-03-06 08:44:57 +13:00
Robin Sheat
8de1ef7e94 Bug 5228 - make rebuild_zebra handle fixing the zebra dirs
If the zebra server directories don't exist, zebra will spit the dummy.
This makes rebuild_zebra.pl smart enough to create them if they're not
there. If that fails, it'll scream loudly so you know zebra isn't
reindexing.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2010-12-13 21:59:49 +13:00
Chris Nighswonger
374cdb2678 Fixing up a regexp to stop a trivial warn 2010-11-04 14:24:46 -04:00
MJ Ray
65f8573b5d Display available error information during bulkmarcimport 2010-10-13 02:17:05 +01:00
Robin Sheat
57d11aee2c Bug 5077 - ensure rebuild_zebra will run somewhere it can read
This prevents it leaving files lying around in /tmp

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-10-06 08:00:17 -04:00
Lars Wirzenius
3e7e386148 Convert to UTF-8.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-05-06 17:58:24 -04:00
Donovan Jones
5e0b850d49 Bug 2505 - Add commented use warnings where missing in the misc/ directory 2010-04-21 20:26:44 +12:00
Lars Wirzenius
87d845969e Fix FSF address in directory misc/
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-03-16 20:17:54 -04:00
5916908e35 Bug 4125 - Reformat bulkmarcimport.pl doc with perldoc
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-02-06 08:06:07 -05:00
Paul Poulain
a6e1f838ae bulkmarcimport : removing warnings 2010-01-28 15:11:56 +01:00
Henri-Damien LAURENT
ce3adab2ee bulkmarcimport.pl bug fix: matching biblios enhanced
Matching biblios now also retrieves the biblioitemnumber so that items management can be performed
2009-11-23 21:40:13 +01:00
Paul Poulain
7cc1115cba adding error details 2009-11-17 16:27:12 +01:00
Henri-Damien LAURENT
663eb1edd6 Adding some error-proofing on GetMarcRecord
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-09-30 11:29:24 +02:00
Henri-Damien LAURENT
7eca37db4f Authorities bulkmarcimport
Adding some new options to bulkmarcimport :
-k idtagsubfield in order to store the id of the file record into another field
-match tagsubfield,index
-a to import authorities
-l logfilename to store logs
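
A hypothetical authority import using these options (the -file switch and the specific match tag/subfield, index name and log file name are assumptions, not part of this commit):

  misc/migration_tools/bulkmarcimport.pl -a -file authorities.mrc -match 001,Local-Number -l import.log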

Bug Fixing : C4/Charset.pm
Charset was incorrect for UNIMARC Authorities

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-09-30 11:22:21 +02:00
Ricardo Dias Marques
f8ff5879a5 Bug 3582: Missing usage information for -h / --help switch for rebuild_nozebra.pl
Fix for Bug 3582:  Missing usage information for -h / --help switch for rebuild_nozebra.pl

http://bugs.koha.org/cgi-bin/bugzilla3/show_bug.cgi?id=3582

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-09-06 12:48:35 -04:00
Sébastien Hinderer
2a8df0bc2f Get rid of a few warnings in the bulkmarcimport script:
C4/Biblio.pm, hunks #1, #2: a warning occurring in NoZebra configurations.
C4/Biblio.pm, hunk #3: a warning occurring in the Unimarc MARC flavour.
misc/migration_tools/bulkmarcimport.pl, hunk #1: a warning occurring when no default format is specified on the command line with the -m switch.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-09-06 09:54:11 -04:00
Colin Campbell
3199d032e5 Avoid numeric comparisons with leading zeroes
Numbers in perl with leading zeros are interpreted as octal.
Ensure that comparisons are done using string operators,
or, where appropriate, use the MARC::Field method.

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-08-20 21:01:52 -04:00
Henri-Damien LAURENT
731b82f764 3519 : mergeauthority and authority editing were not synched
mergeauthority and ModAuthority were working on two separate directories,
so no authority would ever be merged via cronjob or command-line script
when MergeAuthoritiesOnUpdate is disabled

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-08-11 19:30:14 -04:00
3caec55fd1 removed redundant license statement
The standard license statement in the header is fine; please
don't confuse things by doing anything different.

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-08-01 08:17:52 -04:00
Paul Poulain
6b1df98ddf script to remove authorities without biblio attached
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2009-08-01 08:10:01 -04:00
459d732180 Bug 3301 - Speed up rebuild_zebra script
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:

  rebuild_zebra -r -b -nosanitize

Biblio (authority) records are dumped directly to a file
from the marcxml field without being transformed into
MARC::Record objects and corrected.

DOCUMENTATION:

rebuild_zebra.pl new parameter:

-nosanitize  export biblio/authority records directly from the DB marcxml
             field without sanitizing records. It speeds up the
             dump process but could fail if the DB contains badly
             encoded records. Currently works only with -x and -b

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-29 07:52:46 -05:00
Brian Harrington
6a2d9ffcf2 Bug 3313, bulkauthimport.pl skips MARC21 subdivision records.
This patch adds the MARC21 subdivsion record tags (18x) to the
block which recognizes and assigns authtypecodes to imported
authority records.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-08 17:03:03 -05:00
Galen Charlton
da51de184c bug 2926: fix staging import hang
Fixes a hang of the staging import tool when it
attempts to process a MARC21 record that claims
that it's UTF-8 when it is not.  The staging import
will now attempt to fix the character encoding of such
records.

Also added a FIXME to bulkmarcimport.pl, which because
of its use of MARC::Batch will skip over such records -
better than the original hang of the staging import, but
worse than the staging import's new ability to fix such
records.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-07 13:17:06 -05:00
Galen Charlton
3f4641bf30 bug 3201: missing090field.pl - skip bad bibs
Patch courtesy of G. Henry <henry@cmi.univ-mrs.fr>

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-07 13:17:01 -05:00
J. David Bavousett
a7d1ab0041 Changes to bulkmarcimport.pl
Adds three new switches:

-idmap <filename> - optional output file of
                    map of source record ID numbers
                    to Koha biblionumber
-x                - if idmap is supplied, MARC tag
                    to get source record ID from
-y                - if idmap is supplied, MARC subfield
                    to get source record ID from
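
A hypothetical invocation (the -b and -file switches and the 035$a source-ID field are assumptions, not part of this commit):

  misc/migration_tools/bulkmarcimport.pl -b -file records.mrc -idmap idmap.txt -x 035 -y a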

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-04-03 19:18:29 -05:00
Henri-Damien LAURENT
911fddab4a merge_authority : Bug fixing
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-03-06 14:14:34 -06:00
Mason James
e9599f973c Fixes command-line 'number' arg in bulkauthimport.pl.
for HEAD and 3.0.x

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-03-04 10:43:49 -06:00
Brian Harrington
25cd35b3a1 bug 2924 fixed rebuild_zebra.pl to work when export is skipped
reindexing now occurs if $num_records_exported is non-zero or if
$skip_export is set

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-03-04 08:28:22 -06:00
Galen Charlton
8f07521a2d bug 2955: fix remaining calls to GetMarcFromKohaField
This includes part of a patch from Henri-Damien Laurent
that could not be applied because Chris and Joe patches
happened to win the race.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-02-12 16:29:19 -06:00
Joe Atzberger
11b90be284 Cleanup and perltidy.
Add "use warnings", remove unused variables and unnecessary finish/disconnect
at the end.  This script could be improved to run only on tables that need to
be altered instead of touching all of them.  It should also probably contain
warnings to the effect that it does not rescue your DATA that was forced into
whatever encoding the table used previously.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-01-28 17:29:43 -06:00
Michael Hafen
086b3ccf9a bug in rebuild_zebra verbose logging - found another print I didn't want to see all the time
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.

I recently had to split biblio and authority index updating in my cron
script ( had some really big records so had to add the -x switch which
should only be used on biblios according to the help ). So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.

This patch is to conditionalize those prints based on the verbose
logging switch.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-12-11 09:23:28 -06:00
Michael Hafen
62a590a954 Reduce logging from rebuild_zebra.pl with a command line option
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-10-01 13:05:20 -05:00
Henri-Damien LAURENT
ca8d24546e Bug Fixing merge_authority.pl
merge works on the fly now.
But for an obscure reason, merge_authority.pl fails to update the database when launched on the command line.
Adding one table to LOCK for noZebra UPDATE in Biblio.pm
You should remove C4::Search from merge_authority.pl

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-08-09 11:05:53 -05:00
Galen Charlton
df1f46f9da bug 2253: improve rebuild_zebra's handling of zebraqueue
Prior to this patch, rebuild_zebra.pl -z was effectively
hanging on to a lock on the zebraqueue table, preventing
other scripts from inserting new entries into the table.
This had the effect of causing circulation operations
to time out.

Refactored by having rebuild_zebra.pl pull the active
queue into memory, then mark entries done by zebraqueue.id.
Consequently, rebuild_zebra.pl should no longer
block adding new entries into zebraqueue.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-06-19 09:49:06 -05:00
Paul POULAIN
feae120738 BUGFIX : script to fix & fill onloan field in items table.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-05-12 09:24:43 -05:00
Galen Charlton
a78b115d35 kohabug 2076 - make biblioitems.marc longblob during upgrade
Change to match 3.0 definition of that column.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-05-11 05:37:18 -05:00
Paul POULAIN
8e1844d495 missing )
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-05-05 05:39:13 -05:00
Paul POULAIN
e7209ed02a UNIMARC specific: rebuild items correctly
Note: 995 for items is hardcoded, so it's really for UNIMARC only. The script exits if your marcflavour is not UNIMARC

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-22 17:34:41 -05:00
Galen Charlton
3109d5820e rebuild_zebra.pl - add -y option
rebuild_zebra.pl will now mark all zebraqueue entries
of the affected record type(s) done when run in
normal mode to index all records (as opposed to running
it with -z to just process the zebraqueue).  This prevents
any running zebraqueue_daemon processes from attempting
to reindex the same records, redundantly.

The new -y switch overrides this new behavior; in other words, if
running rebuild_zebra.pl without -z, you can specify
-y to *not* mark zebraqueue done.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-21 11:17:29 -05:00
004524584b Tweak bulkmarcimport.pl
* Add a new parameter -o to begin importing the input file after skipping
  n records.
* Enclose input file reading in an eval directive to avoid aborting the
  import if a few records are corrupted: they are now skipped.
* Help formatting.
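
A hypothetical invocation skipping the first 100 records (the -b and -file switches and the count are assumptions; only -o comes from this commit):

  misc/migration_tools/bulkmarcimport.pl -b -file records.mrc -o 100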

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-17 05:52:53 -05:00
Galen Charlton
e2c1f11715 fixed memory leak I introduced
Accidentally introducing a circular reference in a
MARC::Record object does not lead to goodness, particularly
if you export lots and lots of them.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-01 06:46:05 -05:00
Galen Charlton
4f001186b6 still more rebuild_zebra refactoring
Merged duplicate code for indexing bibs and
authorities into a single index_records() function.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:58:03 -05:00
Galen Charlton
a5576b8dfe IMPORTANT: added -z option to rebuild_zebra.pl
The -z option, when used in conjunction with -a and/or -b,
selects the records to reindex from the zebraqueue table.
Both record updates and record deletes are handled.

-z cannot be used with -s or -r: the updated records
must always be freshly exported, and if zebraqueue
is to be processed, it's assumed that you don't want
to drop the Zebra index first.

This means that rebuild_zebra.pl -b -a -z can be
used as a cronjob to update the indexes periodically; it
is believed that this will offer much better indexing
performance on some setups as compared to zebraqueue_daemon.pl,
which uses Z39.50 extended services to send record updates
to Zebra.
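
A hypothetical crontab entry along these lines (schedule and install path are assumptions):

  */10 * * * * /path/to/koha/misc/migration_tools/rebuild_zebra.pl -b -a -z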

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:58:01 -05:00
Galen Charlton
57d128f727 rebuild_zebra: exit if both -a and -x specified
At the moment, using both -a (index authorities) and
-x (export records as MARC XML) is not allowed -
if the Zebra authority database is using the DOM
filter, zebraidx will not be able to process the
exported records correctly.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:44 -05:00
Galen Charlton
f0d5da7448 more rebuild_zebra.pl refactoring
1. Logic to fix up record IDs, UNIMARC 100 field,
   and record leader now in separate functions.
2. Removed (incorrect) logic to save corrected record
   in database.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:43 -05:00
Galen Charlton
f98c27a8bc refactor rebuild_zebra: new routine for invoking zebraidx
Created a routine for calling zebraidx, replacing
separate invocations for bibs and authorities.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:42 -05:00
Galen Charlton
ae8a76dacc rebuild_zebra.pl: removed disused $limit option
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:41 -05:00
Galen Charlton
b4f39e5c58 do not let MARC::Batch open MARC files
The version of MARC::Batch->new() distributed with version
2.0.0 of MARC::Record, if given a file name, will
open it using the ':utf8' layer.  This results in an
incorrect character conversion when processing records
in the MARC-8 character encoding.

To avoid this, batch jobs that use MARC::Batch now
open the file themselves, then pass the file handle
to MARC::Batch->new().

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-21 21:46:39 -05:00
Galen Charlton
ad0639e548 remove some unneeded use statements
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-21 21:46:29 -05:00
Galen Charlton
4e95689287 bulkmarcimport.pl: XML input option documented
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-03 13:01:00 -06:00
Galen Charlton
d49873cc2f bulkauthimport.pl - various improvements
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-03 13:00:59 -06:00
Mason James
5057e74914 more \Q...\E wrapping on regexes, to handle occasionally problematic strings.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-28 07:58:45 -06:00
Mason James
d451f072f9 corrections to host-item, shelf_loc and collection-code indexes
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-28 07:58:43 -06:00
Mason James
be14507658 oops, removing un-needed $dbh->commit() calls
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-20 20:16:42 -06:00
Mason James
b57c146b26 setting $dbh->{AutoCommit} = 0, and adding a new --commit arg.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-20 20:16:41 -06:00
Paul POULAIN
0e2b065219 NoZebra : removing . and : before indexing
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-19 20:27:36 -06:00
Paul POULAIN
bcf36122a6 speeding up rebuild_nozebra a lot by using the autocommit OFF feature
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-19 20:27:24 -06:00
Mason James
bdd2afc747 a little speed tweak here, setting "SET FOREIGN_KEY_CHECKS = 0" *before* clearing bib/bi/items tables.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-18 22:07:09 -06:00
Ryan Higgins
71dd69d5ac add option to export and index xml to rebuild_zebra
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-15 08:25:46 -06:00
Mason James
3cb4ea7ecf added 440* and 490* 'series' indexes
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-11 16:14:54 -06:00
Galen Charlton
60a98d258a IMPORTANT - refactor MARC character set handling
* IsStringUTF8ish - determine if scalar contains a string in UTF8
* MarcToUTF8Record - convert MARC blob or MARC::Record to UTF8
* SetMarcUnicodeFlag - set appropriate MARC21 or UNIMARC field to
  indicate that record is in UTF-8.

Design points of this module include:

* No dependencies on other C4 modules, making it easier to add
  more test cases
* All character conversion code in one place
* Single entry point for doing a character conversion on a
  MARC record
* Capture of errors and warnings produced by Text::Iconv
  and MARC::Charset
* Start of support for guessing the source character set of
  a MARC record.

Several functions were moved from other scripts
or modules to C4::Charset:

* C4::Koha->FixEncoding (expanded and renamed
  MarcToUTF8Record)
* C4::Koha->char_decode5426
* fMARC8ToUTF8 from bulkmarcimport.pl (renamed
  _marc_marc8_to_utf8)

Several batch jobs were adjusted to use MarcToUTF8Record instead of
FixEncoding.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-03 07:23:56 -06:00
Daniel Bünzli
78f3e56e2c bulkauthimport fix
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-22 07:20:28 -06:00
Joshua Ferraro
2a37c19dac Rudimentary import of MARC21 authorities
Also adding support for ingesting the MARCXML format in bulkmarcimport and bulkauthimport

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-04 21:30:17 -06:00
Joshua Ferraro
9c25d6368a improvements to INSTALL.debian, adding symbols for currencies, adding \n to make bulkmarcimport.pl prettier
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 21:28:37 -06:00
Galen Charlton
8c60e82605 fixed variable masking warnings found by perl -w
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 20:23:59 -06:00
Galen Charlton
c2a0ed8077 item rework: replaced AddBiblioAndItems
Replace C4::Biblio::AddBiblioAndItems with two
things:

* An option to C4::Biblio::AddBiblio to defer writing
  biblioitems.marc and biblioitems.marcxml.  This
  option was created to give a significant
  speed boost to bulkmarcimport.pl, but is *not*
  recommended for general use.
* C4::Items::AddItemBatchFromMarc

This refactoring removes the need to have functions
in C4::Biblio and C4::Items that call each other's
private functions.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 16:26:16 -06:00
Galen Charlton
9d4d8897b2 item rework: various changes
* Move CheckItemPreSave to C4::Items (from C4::Biblio)
* Modified C4::Biblio::AddBiblioAndItems to use appropriate
   internal routines from C4::Items
* Moved GetItemnumberFromBarcode to C4::Items
* Removed duplicate C4::Biblio::_koha_new_items
* Removed disused C4::Biblio::MARCitemchange

Currently AddBiblioAndItems is a special routine that
uses private subs from both C4::Biblio and C4::Items.
This needs to be refactored.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 16:25:42 -06:00
Joshua Ferraro
5c23369af2 Fixing Database Definitions for Statuses *PARTIAL*
Prior to this fix, the status fields had three 'off' values, NULL, "",
and 0. I've reduced it to two in the db, removing the option for NULL, and
setting the default value to 0, however, we need to verify that we don't ever
write out as "" as this needlessly complicates the indexing process,
critical for searching or limiting by status (e.g., availability). Also,
queries that attempt to write a NULL value to one of these fields will fail
(based on my tests).

This patch includes the following changes:

* Updated the database definition for notforloan, damaged, itemlost, and
wthdrawn in kohastructure.sql to forbid NULL and default to 0; MySQL
can't forbid other values (such as empty ""), so this has to be handled
at the application layer and REQUIRES further patching.

* Fixed the 'limit by availability' query node in Search.pm to use a
much less confusing definition of 'available'

* Added code to set values to 0 where they are NULL or empty ( "" ) for
notforloan, damaged, itemlost or wthdrawn in both the MARC and the items
table:

  * Biblio.pm -> AddBiblioAndItems
  * catalogue/updateitem.pl
  * SEE NOTE BELOW, REQUIRES UPDATE TO THE REST OF KOHA'S ITEM MGT!

* Removed code in bulkmarcimport.pl that sets notforloan status depending
  on item-level or bib-level itemtype -- that flag is designed to be set
  only to override the notforloan setting for the item's (or bib's,
  depending on the syspref) assigned itemtype (it doesn't need to override
  to 'for loan', only to 'not for loan').

  Added $dbh->do("truncate zebraqueue") when the operation is 'delete'

* I updated some notes in catalogue/updateitem.pl as to why ModItem can't be
used -- we don't have _a_ place where we can change the item and marc :/

  I've tested the following:

  bulkmarcimport.pl..........................MARC/items OK
  Staged Records Import......................NOT OK
  updateitem.pl (via moredetail.pl)..........MARC/items OK
  circulation.pl.............................NOT OK
  returns.pl.................................NOT OK
  addbiblio.pl...............................NOT OK
  additem.pl.................................NOT OK

Basically, there isn't a single place where this patch can update both
item data and MARC data at once ... a future patch
needs to address this issue.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 16:23:04 -06:00
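
A minimal sketch of the status normalization the commit above asks for, assuming the item is available as a hashref of column values (illustrative only, not the code added to Biblio.pm or updateitem.pl):

    use strict;
    use warnings;

    sub normalize_item_statuses {
        my ($item) = @_;    # hashref of item column => value
        for my $col (qw(notforloan damaged itemlost wthdrawn)) {
            # NULL (undef) and "" both become 0 so the indexer only ever
            # sees numeric status values.
            $item->{$col} = 0
                if !defined $item->{$col} || $item->{$col} eq '';
        }
        return $item;
    }
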
Chris Cormack
c7215e7a93 Escaping the $title in the regexes with \Q and \E to handle nested quantifiers
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 01:20:40 -06:00
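
For reference, \Q...\E makes Perl treat the interpolated title as a literal string, so regex metacharacters in it no longer trigger errors such as "nested quantifiers". A tiny illustration with made-up values:

    my $title = 'C++ (2nd ed.)';             # contains regex metacharacters
    my $line  = 'C++ (2nd ed.) / Stroustrup';

    # Without \Q...\E the pattern /$title/ would die with
    # "Nested quantifiers in regex"; with it, the title matches literally.
    print "matched\n" if $line =~ /\Q$title\E/;
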
Paul POULAIN
319a32b16e rebuild_zebra : directories updated
the UNIMARC stuff has been moved to the marc_defs directory and the
language-specific files are in lang_defs

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 00:55:12 -06:00
Joshua Ferraro
554bbe1bda s/Waited/Expected/ for serials statuses; reformatting rebuild_nozebra.pl indexes
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-01 12:59:28 -06:00
Joshua Ferraro
dd3f557f53 fixing nomenclature on files in misc/, adding a few new utilities
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-30 12:13:34 -06:00
Joshua Ferraro
c6ddddad98 adding a new option, -w, which disables shadow indexing for the current batch (faster indexing of large sets where ACID isn't critical)
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-30 12:13:27 -06:00
Galen Charlton
3508933c66 bulkmarcimport: enable MARC-8 to UTF-8 conversion
Enabled automatic conversion of MARC-8 records to
UTF-8.  Record is converted if its Leader/09 contains
a blank and the -s (skip) option hasn't been supplied
on the command-line.  Any record that cannot be converted
to UTF-8 is skipped.

Also now use Unicode Normalization Form C (NFC) for
records converted from MARC-8.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-25 09:08:38 -06:00
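
A hedged sketch of the decision and conversion logic described above, using MARC::Charset and Unicode::Normalize; this is an illustration only and leaves out the per-field rewriting the real script has to do:

    use MARC::Charset qw(marc8_to_utf8);
    use Unicode::Normalize qw(NFC);

    # Leader/09 is ' ' for MARC-8 records and 'a' for UTF-8 records.
    sub needs_marc8_conversion {
        my ($record, $skip) = @_;            # $record is a MARC::Record
        return 0 if $skip;                   # the -s command-line switch
        return substr($record->leader(), 9, 1) eq ' ';
    }

    # Convert one subfield value from MARC-8 to NFC-normalized UTF-8.
    # Returns undef if conversion fails, so the caller can skip the record.
    sub marc8_value_to_utf8 {
        my ($value) = @_;
        my $utf8 = eval { marc8_to_utf8($value) };
        return if $@ || !defined $utf8;
        return NFC($utf8);
    }
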
Galen Charlton
d426a91d0e removed extraneous comments
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-25 09:08:35 -06:00
Galen Charlton
cb6cf680bc improved error detection in AddBiblioAndItems
Introduced new C4::Biblio function CheckItemPreSave,
which checks for duplicate barcodes and invalid
branch codes.  Not yet sure whether this function
needs to be exported or whether it will just be
used internally in C4::Biblio.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-25 09:08:34 -06:00
Galen Charlton
6b49df4c3f removed superfluous comments
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-25 09:08:31 -06:00
Galen Charlton
7d47666f7e bulk MARC record import - speed improved
Changes to improve speed of MARC bib and item
imports:

[1] Turn off autocommit and commit database
    transactions in larger batches.
[2] Introduce a new C4::Biblio function (AddBiblioAndItems)
    to combine AddBiblio and AddItems -- this is faster
    because we are not parsing the MARC XML of the biblio
    every time we add an item.
[3] Introduce FasterTransformMarcToKoha, which is much
    faster than TransformMarcToKoha.  The new version,
    which will replace the old one once it has been
    fully tested, scans through each field in the
    MARC record just once, instead of potentially
    dozens of times.
[4] Remove code in bulkmarcexport that moved the
    item tags to separate MARC::Record objects.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-25 09:08:28 -06:00
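
The batching in point [1] can be sketched with plain DBI; the record source and the import call below are hypothetical stand-ins for the real bulkmarcimport.pl logic:

    use DBI;

    # Turn off AutoCommit and commit every $batch_size records instead of
    # paying for one transaction per biblio.
    my $dbh = DBI->connect('dbi:mysql:database=koha', 'kohauser', 'secret',
                           { RaiseError => 1, AutoCommit => 0 });
    my $batch_size = 100;
    my $count      = 0;

    while ( my $record = next_marc_record() ) {   # hypothetical record source
        add_biblio_and_items($dbh, $record);      # hypothetical import call
        $dbh->commit() if ++$count % $batch_size == 0;
    }
    $dbh->commit();                               # flush the last partial batch
    $dbh->disconnect();
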
Galen Charlton
4609608ccc allow use of older version of File::Temp
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-22 22:58:12 -06:00
Galen Charlton
93beb943c0 bug 1661: rebuild_zebra.pl changes
[1] Use File::Temp to create and manage
    export directory if -d is not specified.
[2] Added usage message.
[3] Code that attempts to fix up Zebra
    configuration files changed so that it
    is invoked only if --munge-config option
    is supplied; this code will ultimately
    either be removed or moved to a separate
    script -- the sorts of errors that it
    tries to fix should no longer be appearing
    in a standard install.
[4] Fixed Win32 portability problem when removing
    temporary directory.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-20 19:19:43 -06:00
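
Point [1] above (falling back to a temporary export directory when -d is not given) can be done with File::Temp roughly as follows; $opt_directory is assumed to come from the script's option parsing:

    use File::Temp qw(tempdir);

    # Use the directory given with -d if any; otherwise create a temporary
    # one that File::Temp removes automatically at program exit.
    my $export_dir = $opt_directory
        ? $opt_directory
        : tempdir( 'koha-rebuild-zebra-XXXXXX', TMPDIR => 1, CLEANUP => 1 );
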
Henri-Damien LAURENT
bdade9bc9d Adding rebuilding of field 100$a for Unimarc
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-20 18:40:43 -06:00
Henri-Damien LAURENT
c9fb20928b Generating index for authorities on AUTHtypecode from table auth_header
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-20 18:30:50 -06:00
Galen Charlton
b8a58c4934 installer: command-line scripts improve finding C4 modules
Command-line scripts now use a new SCRIPT_DIR/kohalib.pl
to put the installed location of Koha's Perl modules
into @INC.
2007-12-17 09:13:54 -06:00
Mason James
d9a9e06556 updated MARC21 indexes, with authorities too. v2
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-14 07:43:44 -06:00
Joe Atzberger
377db43117 C4 and misc: permissions fixes
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-13 19:00:34 -06:00
Galen Charlton
ad4e02f91d warn on attempts to add duplicate item barcodes during batch import
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-02 15:06:24 -06:00
Paul POULAIN
262a6e2a9a Updating rebuild_zebra.pl : now uses etc config files
There are only 2 UNIMARC-specific files (.abs and .chr); they have been moved to etc/zebradb

rebuild_zebra.pl takes all config files from this location now.
The misc/zebra/ directory can be removed (and will be soon)

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-25 17:07:46 -06:00
Joshua Ferraro
cee40a741d adding $DEBUG warnings to nozebra
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-24 09:07:44 -06:00
Paul POULAIN
f38b7598fc still handling dirty MARC records better
this time it's when a biblio doesn't have a biblionumber, has a 100$a field, and it's invalid.

1 biblio in my 300 000 DB (and it was biblio 294 359, of course !)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-20 16:20:50 -06:00
Mason James
78abbe94d3 little SQL typo fix, now builds 'NoZebraIndexes' index mapping correctly.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-20 16:19:05 -06:00
Paul POULAIN
f1bca9ba50 missing biblionumber AND missing unimarc 100 were not properly handled
now adding both on the fly when needed.
(had 2 biblios like that in a 290 000 DB, but it was enough to have M::F::X complaining & dying !)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-17 11:25:07 -06:00
Paul POULAIN
ef1ac56857 handling wrong MARC records better
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-12 17:13:00 -06:00
Mason James
c846ed00db utf8 handling fixes 'Wide character in print at' encoding errors.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-12 17:10:17 -06:00
Mason James
a51118833c wrapping AddBiblio() and AddItem() in eval{} to protect the import from failure due to bad records.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-11 18:44:13 -06:00
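
The pattern described above is Perl's block eval; a simplified sketch ($batch, $framework and the argument lists are placeholders, not the real import loop):

    while ( my $record = $batch->next() ) {       # MARC::Batch-style loop
        eval {
            my ($biblionumber) = AddBiblio($record, $framework);
            AddItem($record, $biblionumber);      # simplified, illustrative call
        };
        if ($@) {
            warn "Skipping bad record: $@";       # log and keep importing
            next;
        }
    }
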
Mason James
f6b17c1de9 wrapping the write to the *.iso file in eval{} to handle failures caused by bad records.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-11 18:44:12 -06:00
Paul POULAIN
9149a711fb bugfixes to config files for zebra 2.0.18
those 2 lines are invalid

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-08 17:50:00 -06:00
Paul POULAIN
b7eb9e1b5c rebuild_zebra now handles improper authority records correctly
(missing 100 fields are automatically added)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-07 08:18:24 -06:00
Paul POULAIN
bb5cea8e56 deal with wrong authorities when exporting for zebra
(authorities that don't have a 001 field containing the authid)

also comments out some code when exporting biblios (NOT tested, hdl, please confirm this commit)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-07 08:18:19 -06:00
Paul POULAIN
89b9e8f8c1 skip empty records (new GetMarcRecord behaviour that returns an empty string rather than an empty MARC::Record)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-31 19:41:49 -05:00
Paul POULAIN
1cd11f4d54 fixes in NoZebra search & indexing
- the quotemeta was wrong (and introduced some bugs with diacritics)
- fixing some bugs that appear only intermittently : the union was done including the weight, which was wrong & resulted in missing results when weightings varied

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-31 05:53:36 -05:00
Paul POULAIN
fa26bcc037 rebuild_unimarc_100 : better handling of unusual cases
If 100$a was repeated, the script failed to handle it correctly

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-24 17:08:56 -05:00
Paul POULAIN
cd8a565a6a temp
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-24 17:08:40 -05:00
Paul POULAIN
837e5c5e94 less verbose
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-24 17:06:36 -05:00
Joshua Ferraro
9d29ce5d58 improvements to zebra configuration files
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-21 19:14:12 -05:00
Paul POULAIN
1ac38782a1 #1474 : Bulkmarcimport croaks when Log is ON
The log setting is set to 0 and restored at the end of the import

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-11 14:53:59 -05:00
Paul POULAIN
057d654a5b skipping invalid XML when rebuilding nozebra indexes
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-09 19:11:47 -05:00
Paul POULAIN
49ef1df969 Adding a new option to rebuild_zebra : noxml
This option uses the iso2709 version of the MARC record instead of the XML one
(biblioitems.marc vs biblioitems.marcxml)
No change if the parameter is not set.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-09 19:07:36 -05:00
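
The difference between the two record sources can be sketched like this; the row layout is assumed, and only the record construction is shown:

    use MARC::Record;
    use MARC::File::XML ( BinaryEncoding => 'utf8' );

    # With -noxml, build the MARC::Record from the iso2709 blob in
    # biblioitems.marc; otherwise parse biblioitems.marcxml.
    sub record_from_row {
        my ($row, $noxml) = @_;    # $row: hashref with marc / marcxml columns
        return $noxml
            ? MARC::Record->new_from_usmarc( $row->{marc} )
            : MARC::Record->new_from_xml( $row->{marcxml}, 'UTF-8' );
    }
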
Joshua Ferraro
827d27111f adding barcode index
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-06 21:46:02 -05:00
Paul POULAIN
375d2f1158 (minor) updating doc & removing warn
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-03 14:57:12 -05:00
Chris Catalfo
502487e2ba Added basic MARC21 index definitions.
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-02 15:38:32 -05:00
Paul POULAIN
6f7efca7e1 BUGFIX for browser and nozebra tables
- adding browser and nozebra table definitions to kohastructure & updatedatabase
- bumping to 3.00.00.005

Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-02 04:35:49 -05:00
Joshua Ferraro
ae34e8f45a changing the name of the zebra password file to passwd
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-01 23:14:47 -05:00
Joshua Ferraro
b87d4924b9 commenting out set_service_options; this also removes the commit op
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-01 17:40:31 -05:00
Ryan Higgins
c44efe7b84 fix bad call to GetMarcFromKohaField in bulkmarcimport, and add -fk param, allowing FK constraints to be disabled during import.
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-09-30 21:16:50 -05:00
Paul POULAIN
0d7a4aafd0 BUGFIX : NoZebra indexing was wrong for accented words
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-09-26 05:28:37 -05:00
Paul POULAIN
623ac80330 BUGFIXES : 3 (marc_biblio, check biblionumber, ModMarcBiblio API)
- use biblio instead of marc_biblio
- better check that the biblionumber is correctly stored
- fix a buggy API call in ModMarcBiblio

Signed-off-by: Chris Cormack <crc@liblime.com>
2007-09-13 17:18:50 -05:00
Paul POULAIN
ec7bd0b2ff (unimarc specific) BUGFIX : if 100$a exists but is not 35 characters long, MARC::File::XML may fail
So, add blanks if needed...

Signed-off-by: Chris Cormack <crc@liblime.com>
2007-09-13 17:17:56 -05:00
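
Padding the fixed-length UNIMARC 100$a to 35 characters is a one-liner with MARC::Field; a hypothetical sketch:

    # Pad UNIMARC 100$a with trailing blanks to 35 characters so
    # MARC::File::XML does not choke on a short fixed-length field.
    if ( my $field = $record->field('100') ) {
        my $data = $field->subfield('a');
        $data = '' unless defined $data;
        $field->update( a => sprintf('%-35.35s', $data) )
            if length($data) != 35;
    }
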
tipaul
1399945a75 eval() on getAuthority & getBiblio to avoid a script failure 2007-08-01 09:20:03 +00:00
toins
5e7b171686 adding an eval so the script doesn't die if an error occurs 2007-07-19 09:48:22 +00:00
tipaul
23427c51b9 some fixes (and only fixes) 2007-06-15 13:44:44 +00:00
toins
6dfb0dca36 next if there is an error getting the biblio. 2007-06-11 15:22:59 +00:00
toins
4728830e34 it's faster to use 'truncate' than 'delete from'... 2007-06-08 09:41:14 +00:00
tipaul
5dd3f0229a bugfixes (various), handling utf-8 without guessencoding (as suggested by joshua), fixing some zebra config files - for french but should be interesting for other languages - 2007-06-06 13:08:35 +00:00
btoumi
68bcf35387 delete space at the beginning of the script so it can be launched 2007-05-25 10:00:54 +00:00
tipaul
0569dccd5f some changes to default zebra config for better searches 2007-05-25 09:34:30 +00:00
tipaul
651b075197 small script to check the XML parser. Remember that the PurePerl parser is buggy and can't handle utf8 correctly 2007-05-25 09:33:58 +00:00
tipaul
5ff7fcffa4 Bugfixes & improvements (various and minor) :
- updating templates to have tmpl_process3.pl running without any errors
- adding a drupal-like css for prog templates (with 3 small images)
- fixing some bugs in circulation & other scripts
- updating french translation
- fixing some typos in templates
2007-05-22 09:13:54 +00:00
tipaul
ca201e36af Koha NoZebra :
- support for authorities
- some bugfixes in ordering and "CCL" parsing
- support for authorities <=> biblios walking

Seems I can do what I want now, so I consider it done, except for bugfixes that will be needed, I'm sure !
2007-05-10 14:45:15 +00:00
tipaul
e1d907c688 various bugfixes on parameters modules + adding default NoZebraIndexes systempreference if it's empty 2007-05-04 16:24:08 +00:00
tipaul
3e85c9e97f NoZebra SQL index management :
* adding 3 subs in Biblio.pm
- GetNoZebraIndexes, which gets the index structure from a new systempreference (added with this commit)
- _DelBiblioNoZebra, which retrieves all index entries for a biblio and removes the biblio reference in a variable
- _AddBiblioNoZebra, which adds index entries for a biblio.
Note that the _Add and _Del subs work only on a hash variable, to speed things up in the case of a modification (ie : delete+add). The effective SQL update is done in the ModZebra sub (which existed before, and dealt with the zebra index).
I think the code has to be more deeply tested, but it works at least partially.
2007-05-02 16:44:31 +00:00
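
Roughly, the in-memory approach described above accumulates index entries in a hash and writes the SQL once later; the sketch below uses a single hard-coded index mapping purely for illustration (the real code walks the GetNoZebraIndexes mapping):

    # Accumulate index => word => biblionumber => occurrence-count entries
    # for one biblio into $result (a hashref shared across records).
    sub sketch_index_entries {
        my ($biblionumber, $record, $result) = @_;
        my %mapping = ( title => '200a' );        # assumed index => tag+subfield
        while ( my ($index, $tagsf) = each %mapping ) {
            my ($tag, $sf) = ( substr($tagsf, 0, 3), substr($tagsf, 3, 1) );
            for my $field ( $record->field($tag) ) {
                my $value = $field->subfield($sf);
                next unless defined $value;
                for my $word ( split /\W+/, lc $value ) {
                    next unless length $word;
                    $result->{$index}{$word}{$biblionumber}++;   # count = ranking
                }
            }
        }
        return $result;
    }
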
tipaul
4213b6ec98 improving NoZebra search :
- changing the nozebra table to hold biblionumber,title-ranking entries (; is the entry separator). Now, if a value appears several times in an index, it is stored only once, with a higher ranking (the ranking is the number of times the word appeared for this index)
- improving search to use the ranking value (default order). The ranking is the sum of the rankings of all terms. The list is ordered by ranking+title, from highest to lowest
2007-05-02 11:57:11 +00:00
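
Assuming entries stored as "biblionumber,ranking;" strings (the real format also carries the title, which this sketch ignores for the tie-break), the default ordering described above could be computed roughly like this:

    # One entry string per matching search term, e.g. "12,3;57,1;".
    # Sum the rankings per biblionumber and sort from highest to lowest.
    sub rank_results {
        my (@entries_per_term) = @_;
        my %score;
        for my $entries (@entries_per_term) {
            for my $pair ( grep { length } split /;/, $entries ) {
                my ($biblionumber, $ranking) = split /,/, $pair;
                $score{$biblionumber} += $ranking if defined $ranking;
            }
        }
        return sort { $score{$b} <=> $score{$a} } keys %score;
    }
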
hdl
097fef712a Removing $dbh from GetMarcFromKohaField (dbh is not used in this function.) 2007-04-27 14:00:48 +00:00
tipaul
b53be9cdaf Koha 3.0 nozebra 1st commit : the script misc/migration_tools/rebuild_nozebra.pl builds the nozebra table, and, if you set NoZebra to Yes, queries will be done through the nozebra table instead of Zebra. TODO :
- add nozebra table management on biblio editing
- the index table content is hardcoded. I still have to add a specific systempref to let the library update it
- manage pagination (next/previous)
- manage facets
WHAT works :
- NZgetRecords : has exactly the same API & return values as the zebra getQuery, except that some parameters are unused
- search & sort work quite well
- the CQL parser is better than what I thought I could do : title="harry and sally" and publicationyear>2000 not itemtype=LIVR should work fine
2007-04-25 16:26:42 +00:00
tipaul
6b201757c1 some bugfixes for this script that automatically builds the zebra DB from default config files 2007-04-17 08:50:33 +00:00
tipaul
eba2552086 Code cleaning of Biblio.pm (continued)
All subs have been cleaned :
- removed useless ones
- merged some
- reordered Biblio.pm completely
- using only the naming conventions

Seems to have broken nothing, but it still has to be heavily tested.
Note that Biblio.pm is now much more efficient than previously & probably more reliable as well.
2007-03-29 16:45:53 +00:00
tipaul
a481fad4b7 Code cleaning :
== Biblio.pm cleaning (useless) ==
* some sub declaration dropped
* removed modbiblio sub
* removed moditem sub
* removed newitems. It was used only in finishrecieve. Replaced by a Koha2Marc+AddItem, that is better.
* removed MARCkoha2marcItem
* removed MARCdelsubfield declaration
* removed MARCkoha2marcBiblio

== Biblio.pm cleaning (naming conventions) ==
* MARCgettagslib renamed to GetMarcStructure
* MARCgetitems renamed to GetMarcItem
* MARCfind_frameworkcode renamed to GetFrameworkCode
* MARCmarc2koha renamed to TransformMarcToKoha
* MARChtml2marc renamed to TransformHtmlToMarc
* MARChtml2xml renamed to TransformHtmlToXml
* zebraop renamed to ModZebra

== MARC=OFF ==
* removing MARC=OFF related scripts (in cataloguing directory)
* removed checkitems (function related to the MARC=OFF feature, which is completely broken in HEAD. If someone wants to reintroduce it, hard work is coming...)
* removed getitemsbybiblioitem (used only by MARC=OFF scripts, which are removed as well)
2007-03-29 13:30:31 +00:00
tipaul
f8e9fb6445 rel_3_0 moved to HEAD (introducing new files) 2007-03-09 15:34:17 +00:00
tipaul
a3999812e6 rel_3_0 moved to HEAD 2007-03-09 14:52:58 +00:00
thd
ad657e71eb For MARC 21, instead of deleting the whole subfield when a character does not
translate properly from MARC8 into UTF-8, only the problem characters are
deleted.
2006-09-01 17:11:53 +00:00
toins
eac83ccd45 Head & rel_2_2 merged 2006-07-04 15:02:42 +00:00
rangi
10b2315eb3 Fixing the problem that all items were getting biblioitem=1 set 2006-04-01 22:10:50 +00:00
kados
44b4d37b54 removed Zconns, no need for them anymore with new Context.pm setup 2006-02-27 01:06:30 +00:00
kados
fafe0896d6 minor bugfix with 'commit' option 2006-02-25 23:40:59 +00:00
kados
77abbe2caf A bulkmarcimport.pl that is based on the new Biblio.pm Zebra routines.
It now responds to:

-n : the number of records to import.
-commit : the number of records to wait before performing a 'commit' operation

ALSO: IMPORTANT: I took out the char_encoding as this should be handled by
MARC::File::XML now, unless I'm mistaken.
2006-02-25 21:53:48 +00:00
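
The two switches above are ordinary Getopt::Long options; a hedged sketch of the handling (variable names, $batch, $dbh and import_record() are illustrative, not the actual script):

    use Getopt::Long;

    # -n      : stop after this many records (0 = no limit)
    # -commit : commit to the database every N records
    my ( $max_records, $commit_every ) = ( 0, 50 );
    GetOptions(
        'n=i'      => \$max_records,
        'commit=i' => \$commit_every,
    ) or die "bad options\n";

    my $count = 0;
    while ( my $record = $batch->next() ) {
        last if $max_records && $count >= $max_records;
        import_record($record);                   # hypothetical import call
        $dbh->commit() if ++$count % $commit_every == 0;
    }
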
tipaul
f74823bf1b OK, this time it seems to work. The last blocking problem was... a space in
recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now that we are aware of it, we can avoid putting the space !

In this commit you have all what is needed to setup a working zebra DB in Unimarc :
* collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory
* pdf.properties is to be copied unmodified in the marc21 directory (can also be put somewhere else)
* rebuild_zebra.pl is a SLOW, but 1-step, reindexing tool, using ZOOM
* rebuild_zebra_idx is a FAST, but 2-step, reindexing tool that does not use ZOOM. Run it and it will create all biblio XML files in the /zebra/biblios directory, then zebraidx update biblios in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect.

What has to be done :
* benchmarking : it seems the zebraidx update is faster than lightning (400 biblios/sec : 10,000 biblios in 25 seconds), while ZOOM indexing is slow (something like 25 biblios/second). More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" currently running. I'll take care of it also.
* modify the search API & tools & screens. I'll leave the ball with someone else (chris ?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
2006-02-09 10:59:34 +00:00
tipaul
369ee65d94 new version of rebuild_zebra. Should work with Perl-ZOOM, but DOES NOT WORK for me.
I get  :
ZOOM error 10002 "Encoding failed" from diag-set 'ZOOM'

help expected from indexdata...
2006-01-10 17:03:32 +00:00
tipaul
d5938493d7 synch'ing head and rel_2_2 (from 2.2.5, including npl templates)
Seems not to break too many things, but I'm probably wrong here.
At least, new features/bugfixes from 2.2.5 are here (tested on some features on my local HEAD copy)

- removing useless directories (koha-html and koha-plucene)
2006-01-06 16:39:37 +00:00
tipaul
dba37f38e7 This script can be used to rebuild the zebra DB. It stores all koha MARC records in iso2709, in the biblios directory. After that, you just have to run "zebraidx update biblios"
I tried it on a 9,900-record DB; here are the results :

[paul@bureau migration_tools]$ ./rebuild_zebra.pl -c
9900
9903 MARC record done in 37.9104120731354 seconds

[paul@bureau zebra]$ zebraidx update biblios
<snip>
18:31:24-11/08 zebraidx(20348) [log] Iterations . . . 144575
18:31:24-11/08 zebraidx(20348) [log] Distinct words .  39891
18:31:24-11/08 zebraidx(20348) [log] Updates. . . . .     46
18:31:24-11/08 zebraidx(20348) [log] Deletions. . . .      2
18:31:24-11/08 zebraidx(20348) [log] Insertions . . .  39843
18:31:24-11/08 zebraidx(20348) [log] zebra_register_close p=0x8104cf8
18:31:25-11/08 zebraidx(20348) [log] Records:    9887 i/u/d 9881/6/0
18:31:25-11/08 zebraidx(20348) [log] user/system: 531/145
18:31:25-11/08 zebraidx(20348) [log] zebra_stop
18:31:25-11/08 zebraidx(20348) [log] zebraidx times: 11.33  5.31  1.45
2005-08-11 16:35:54 +00:00
tipaul
c52e5b61dd synch'ing 2.2 and head 2005-08-04 14:10:52 +00:00
tipaul
64cd740d2b synch'ing 2.2 and head 2005-05-04 08:58:30 +00:00
tipaul
93ff09d081 merging 2.2 branch with head. Sorry for not doing it before, many many commits done here 2005-03-01 13:40:35 +00:00
tipaul
51e204fa23 moving bulkmarcimport script to migration_tools directory 2005-01-03 15:25:50 +00:00
tipaul
cd6f87a689 Auto-build LANG authorized values 2005-01-03 12:59:49 +00:00