It now responds to:
-n : the number of records to import.
-commit : the number of records to process before performing a 'commit' operation.
ALSO, IMPORTANT: I took out char_encoding, as this should be handled by
MARC::File::XML now, unless I'm mistaken.
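For illustration, here is a minimal sketch of how such options could be parsed and applied with Getopt::Long; the defaults and the dummy record loop are assumptions, not the actual import script.

use strict;
use warnings;
use Getopt::Long;

my $number    = 0;    # -n : stop after this many records (0 = no limit; default is an assumption)
my $commitnum = 50;   # -commit : commit every N records (default is an assumption)
GetOptions('n=i' => \$number, 'commit=i' => \$commitnum);

my $count = 0;
for my $record (1 .. 200) {                     # stand-in for the MARC record loop
    # ... import $record here ...
    $count++;
    print "commit after $count records\n" if $count % $commitnum == 0;
    last if $number && $count >= $number;
}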
Do not put a space in recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now that we are aware of it, we can avoid putting the space!
In this commit you have everything needed to set up a working Zebra DB in UNIMARC:
* collection.abs is UNIMARC-specific and must be rewritten for MARC21, in a marc21 directory
* pdf.properties is to be copied unmodified into the marc21 directory (it can also be put somewhere else)
* rebuild_zebra.pl is SLOW, but a 1-step reindexing tool, using ZOOM
* rebuild_zebra_idx is FAST, but a 2-step reindexing tool, and does not use ZOOM. Run it; it will create all the biblio XML files in the /zebra/biblios directory, then run 'zebraidx update biblios' in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results (a minimal ZOOM sketch follows this list). It works for me; just change the query at the beginning to get the answers you expect.
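For reference, a minimal ZOOM query in the spirit of test_cql2rpn.pl could look like the sketch below; the host, port, database name and PQF query are assumptions (the real script converts CQL to RPN first), so adjust them to your zebrasrv setup.

use strict;
use warnings;
use ZOOM;    # from the Net::Z3950::ZOOM distribution

my $conn = ZOOM::Connection->new('localhost:2100/biblios');   # host/port/db are assumptions
my $rs   = $conn->search_pqf('@attr 1=4 "history"');          # example PQF query (Bib-1 attr 4 = title)
printf "%d hit(s)\n", $rs->size();
print $rs->record(0)->render() if $rs->size();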
What has to be done:
* benchmarking: it seems that zebraidx update is faster than lightning (400 biblios/sec: 10,000 biblios in 25 seconds), while ZOOM indexing is slow (something like 25 biblios/second). More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" approach currently in place. I'll take care of that as well.
* modifying the search API & tools & screens. I'll leave the ball to someone else (chris?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
This doesn't seem to break too many things, but I'm probably wrong here.
At least the new features/bugfixes from 2.2.5 are here (I tested some features on my local HEAD copy).
- removing useless directories (koha-html and koha-plucene)
* go to the Koha CVS home directory
* in misc/zebra there is a unimarc directory. I suggest that MARC21 libraries create a marc21 directory
* put your zebra.cfg files there & create your database.
* from the Koha CVS home directory, run 'ln -s misc/zebra/marc21 zebra' (i.e., create a symbolic link to YOUR zebra directory)
* now, every time you add/modify a biblio/item, your Zebra DB is updated correctly.
NOTE:
* this uses a system call in Perl (see the sketch after the config excerpt below). It is CPU-consuming, but we are waiting for Indexdata's Perl/ZOOM.
* deletion still does not work.
* UNIMARC Zebra config files are provided in the misc/zebra/unimarc directory. The most important lines are:
in zebra.cfg:
recordId: (bib1,Local-number)
storeKeys:1
in the .abs file:
elm 090 Local-number -
elm 090/? Local-number -
elm 090/?/9 Local-number !:w
(090$9 being the field mapped to biblio.biblionumber in Koha)
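To illustrate the "zebraidx through a system call" approach mentioned in the NOTE above, a rough sketch follows; the directory layout, file naming and error handling are assumptions, not the actual Biblio.pm code.

use strict;
use warnings;

sub zebra_update_record {
    my ($biblionumber, $marcxml) = @_;
    my $zebradir = 'zebra';   # the symlink created above, relative to the Koha home directory
    open my $fh, '>', "$zebradir/biblios/$biblionumber.xml" or die "cannot write record: $!";
    print {$fh} $marcxml;
    close $fh;
    # CPU-consuming: a zebraidx process is forked on every biblio/item save.
    system("zebraidx -c $zebradir/zebra.cfg update $zebradir/biblios") == 0
        or warn "zebraidx failed: $?";
}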
How it works:
* create the table marc_Tword with the following structure:
CREATE TABLE `marc_Tword` (
`word` varchar(80) NOT NULL default '',
`usedin` text NOT NULL,
`tagsubfield` varchar(4) NOT NULL default '',
PRIMARY KEY (`word`,`tagsubfield`)
) TYPE=MyISAM;
* open a console & export PERL5LIB & KOHA_CONF as usual.
* fill this table with misc/build_marc_Tword.pl. Warning: this script uses a very memory-hungry but very fast method to fill the table: it does everything in memory, then writes everything out at the end (a sketch follows this list). Another method is provided (& commented out), but it's 100 times slower (really!)
* open opac-search.pl and replace 'use C4::SearchMarc;' with 'use C4::SearchMarcTest;'. As the API hasn't changed, it will work immediately.
* go to opac-search (advanced search) & search for whatever you want. It should work fine.
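Here is a rough sketch of the "everything in memory, then write" strategy used by build_marc_Tword.pl: build a big hash keyed on word and tag+subfield while scanning the MARC data, then dump it into marc_Tword in one pass. The source table, column names, credentials and the usedin format (a space-separated list of bibids) are assumptions, not the actual script.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=koha', 'kohaadmin', 'password',
                       { RaiseError => 1 });     # credentials are placeholders

# Pass 1: everything in memory.
my %index;    # $index{$word}{$tagsubfield} = " bibid1 bibid2 ..."
my $sth = $dbh->prepare(
    'SELECT bibid, tag, subfieldcode, subfieldvalue FROM marc_subfield_table');
$sth->execute;
while (my ($bibid, $tag, $code, $value) = $sth->fetchrow_array) {
    next unless defined $value;
    for my $word (grep { length } split /\W+/, lc $value) {
        $index{$word}{"$tag$code"} .= " $bibid";
    }
}

# Pass 2: write everything out in one go.
my $ins = $dbh->prepare(
    'INSERT INTO marc_Tword (word, usedin, tagsubfield) VALUES (?, ?, ?)');
for my $word (keys %index) {
    $ins->execute($word, $index{$word}{$_}, $_) for keys %{ $index{$word} };
}
$dbh->disconnect;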
LIMITS:
* build_marc_Tword has problems with extended chars (mainly accented ones), so don't be afraid if you get SQL errors. They are not a problem for a POC.
* search results are always ordered by title, whatever you choose.
* search only handles 'WORDA and WORDB', not yet 'WORDA or WORDB' or 'WORDA except WORDB'.
Make the generated pot file (i.e., the result of "create") look more "real"
by using msgmerge to reformat the output.
The script failed to create intermediate directories if the directory of the
target did not exist and the parent of that directory did not exist
either. This should fix that.
The biblio & the items are separated.
The bug was that the item was still in the biblio (thus being added twice: as "biblio" and as "item").
Thanks to dana for pointing out the bug.
the user can select SQL data to add to their DB
it can be populated by any SQL
a French UNIMARC sample will be added really soon:
- framework for monographs
- framework for URL
- framework for ...
- Personal UNIMARC authority
...
table with a reduced number of tags (only those tags that should be
searched), allowing for faster and more accurate searching when used
with the SearchMarc routines. Make sure that the MARCaddword routine
in Biblio.pm will index words >= 1 character long; otherwise, searches
like "O'brian, Patrick" will fail, as the search routines will separate
that query into "o", "brian", and "patrick". (If "o" is not in the
database the search will fail.)
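A tiny illustration of the point about one-character tokens (a sketch of the splitting behaviour, not the actual MARCaddword code):

my @tokens = grep { length } split /\W+/, lc "O'brian, Patrick";
print "@tokens\n";    # prints: o brian patrick
# If MARCaddword silently dropped the 1-character token "o" at indexing time,
# a search for "O'brian, Patrick" would look up "o" and find nothing.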
spurious context to be recognized. In particular, the bugs fixed are:
1. Failure to recognize an INPUT element at the end; e.g., if the input has
the form "Item number:%S", then the pattern was recognized as only
"Item number".
2. Failure to remove matching <foo></foo> tags if the pattern contains
INPUT or TMPL_VAR; e.g., if the input has the form "<h1>%s %s</h1>",
the form would not be simplified to "%s %s".
Unfortunately, fixing these 2 bugs will cause about 40 fuzzies to appear.
Amazon and store them in a ratings table in the Koha DB.
ratings.sql is the table definition.
My next task is to extend it to work for CDs and DVDs as well.
fines.pl will levy fines and place them on patrons' accounts; it will then also attempt to notify
the borrowers, either by email, SMS, fax, etc., depending on what the patron has set as their preferred contact.
Email and letters are currently working fine; fax and SMS need to be worked on.
Test with caution in your library: you don't want to be bombarding patrons with emails :)
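A minimal sketch of the dispatch-by-preferred-contact idea (field name, subroutine names and stub bodies are assumptions, not the actual fines.pl code):

use strict;
use warnings;

# Stub notifiers: email and letters work today; SMS and fax still need work.
sub send_email  { print "email to $_[0]{surname}\n" }
sub send_letter { print "letter to $_[0]{surname}\n" }
sub send_sms    { warn "sms not implemented yet\n" }
sub send_fax    { warn "fax not implemented yet\n" }

my @overdue = ( { surname => 'Smith', preferred_contact => 'email' } );   # example data
for my $borrower (@overdue) {
    my $method = $borrower->{preferred_contact} || 'letter';
    if    ($method eq 'email') { send_email($borrower) }
    elsif ($method eq 'sms')   { send_sms($borrower) }
    elsif ($method eq 'fax')   { send_fax($borrower) }
    else                       { send_letter($borrower) }
}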
- If a string is enclosed by a tag, remove that tag from the extracted string
- Generate automatic comments to provide more information for the translator
- A couple of bug fixes
It still seems to be correct, and it no longer complains about syntax
errors when seeing commented-out HTML (esp. with TMPL_* directives).
Don't try to translate stuff between <title>...</title> too; the stuff in
the middle is supposed to be PCDATA.
(Actually, xgettext.pl did not know how to handle them in the files listed
in the list of files.)
If the po file is empty (corrupted), $href->{'""'} will be undefined.
We just blindly dereferenced this null value without checking.
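The kind of guard implied here might look like the sketch below (Locale::PO's load_file_ashash is used for illustration; the surrounding code and file name are assumptions):

use strict;
use warnings;
use Locale::PO;

my $href = Locale::PO->load_file_ashash('fr_FR.po');    # file name is just an example
# Guard against an empty or corrupted po file before touching the header entry.
if (defined $href && defined $href->{'""'} && defined $href->{'""'}->msgstr) {
    print "po header: ", $href->{'""'}->msgstr, "\n";
} else {
    warn "po file looks empty or corrupted; skipping header\n";
}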
Minor comment update
Removed forced backups and the comment about interrupting xgettext.pl
corrupting the po file, now that we seem to be detecting the situation.
when minor whitespace changes occur in the original templates; it also
makes the strings much easier to read (e.g., instead of "foo\n\n\t\t bar",
xgettext.pl will now always generate "foo bar" and tmpl_process3.pl will
understand it to be the same as the original string).
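The normalization described here amounts to something like the following sketch (not the exact xgettext.pl code):

use strict;
use warnings;

sub normalize_whitespace {
    my ($s) = @_;
    $s =~ s/\s+/ /g;     # collapse any run of whitespace into a single space
    $s =~ s/^ | $//g;    # trim a leading or trailing space
    return $s;
}

print normalize_whitespace("foo\n\n\t\t bar"), "\n";    # prints "foo bar"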
Early termination of analysis if we encounter some strings, such as </h1>
or | or ||, in order to avoid extracting strings that are unnecessarily
long and which don't add any meaningful context.
to be combined together into one string. This seems to have the desired
effect (that "<b>foo</b> bar" type strings are now recognized in one piece).
However, "<h1>foo</h1>\nexplanation"-type things may now also be (arguably
wrongly) recognized as one piece.
either iso8859-1 or utf8, msgmerge(1) won't crap out. The code is ugly;
the conversion table is hard-coded, and in some places not very appropriate.
However, this does fix the case where a few strings containing French
characters can't be translated. As a side effect, tmpl_process3 can now
also be used for French or other languages using iso8859-1.
gives better strings. (Always allowing combinations wreaks havoc; we
currently avoid this by allowing combination only if the first and last
tokens are both TEXT.)
word order is too different from the word order of the target language to
yield meaningful translations.
The new scripts use a different translation file format (namely standard
gettext-style PO files).
This seems to work reasonably well (e.g., producing an empty en_GB translation
and then installing it seems not to corrupt the "translated" files), but it
will likely still contain some bugs. There is also little documentation, but try
running perldoc on the .p[lm] files to see what's there. There are also some
spurious warnings (both from bugs in the new scripts and from the buggy third-
party Locale::PO module).