catmandu-pica's People

Contributors

choroba, jorol, julianladisch, vzg-deploy

catmandu-pica's Issues

Test fails on Windows

Test for the XML importer fails on some Windows systems, see cpantesters.

Error message

t/02-importer.t ......... 
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run 
Read more bytes than requested. Do you use an encoding-related PerlIO layer? at C:\STRAWB~1\cpan\build\PICA-Data-0.26-vHTK4a/blib/lib/PICA/Parser/XML.pm line 46.

Catmandu::Importer::PICA with Windows line endings

Importing a plain PICA record with Windows line endings (CR LF) gives me something like:

{
   "_id" : "67890\r",
   "record" : [
      [
         "003@",
         "",
         "0",
         "67890\r"
      ]
   ]
}

with a stray \r. Are PICA records expected to have Unix line endings?
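
A workaround sketch until this is settled: normalize the line endings before the importer sees the data. The file name is an example, and passing a string reference as file is assumed to be supported by the importer.

use Catmandu -all;

# slurp the plain PICA file and drop Windows carriage returns
my $pica = do { local (@ARGV, $/) = ('records.pica'); <> };
$pica =~ s/\r\n/\n/g;

importer('PICA', file => \$pica, type => 'Plain')->each(sub {
    my ($item) = @_;
    # ... process $item ...
});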

Module does not provide support for the normalized CBS pica format with delimiters

Our CBS uses a PICA+ format with the delimiters \x1d, \x1e, \x1f that is still human readable, i.e. with line breaks after each field. Catmandu does not seem to be able to produce this format. We have used this format for many years because it can be loaded directly into CBS but is still human readable, which makes it easier to find errors.

  • plain creates a file without delimiters
  • plus and binary use delimiters but they are not human readable (everything in one line)
  • generic seems not to be implemented in my catmandu version

Digi20-20220118.0.zip
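
As long as no writer produces this variant directly, one possible workaround (a sketch only, not a supported option) is to post-process the plus output: keep the \x1f/\x1e/\x1d delimiters and just insert a line break after each field separator \x1e. File names are examples, and whether CBS accepts the extra newlines would need to be verified.

open my $in,  '<:raw', 'records.plus' or die $!;
open my $out, '>:raw', 'records.cbs'  or die $!;
while (my $line = <$in>) {
    # put every field on its own line, keeping the \x1E delimiter itself
    $line =~ s/\x1E/\x1E\n/g;
    print {$out} $line;
}
close $in;
close $out;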

Add Catmandu::Fix::bind::pica_patch

This should result in a PICA Patch record that modifies 003A$0 and adds 004F$af:

do pica_patch
  pica_set('dc.identifier', '003A$0')
  pica_add('dc.subjects', '004F$af')
end

pica_add() doesn't work if hash key 'record' does not exist

$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a)'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a,record:"pica")'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to YAML --fix 'pica_add(title,021A$a)'
---
title: code4lib
...

Works if key is defined beforehand:

echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'set_array(record);pica_add(title,021A$a)'
021A $acode4lib

Invalid characters in xml

I just came across a problem with characters below 0x20, which are present in the original plain PICA files. Catmandu converts them to PICA XML without complaint, but in doing so it actually generates invalid XML files, which cannot be parsed by XSL processors.

Example file (one record in plain PICA; field 233P contains the character 0x02)
record.txt

Command
catmandu convert -v PICA --type binary to PICA --type XML < record.txt > record.xml

The resulting XML cannot be transformed (e.g. to Solr XML) via xmlstarlet, xsltproc or Saxon HE. For now I delete all characters that are not allowed in XML according to the specification by using

tr -d '\000-\010\013\014\016-\037'

I wonder if Catmandu should handle this problem and only generate valid XML. What do you think?
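
One way a pipeline can deal with it today is to strip the disallowed characters from the subfield values before the record reaches the XML exporter. A sketch over the importer's internal record structure, run from a small Perl script rather than the command line; the file name is taken from the example above:

use Catmandu -all;

# XML 1.0 only allows tab, LF and CR below 0x20
my $invalid = qr/[\x00-\x08\x0B\x0C\x0E-\x1F]/;

importer('PICA', file => 'record.txt', type => 'Plain')->each(sub {
    my ($item) = @_;
    for my $field (@{ $item->{record} }) {
        # a field is [tag, occurrence, code, value, code, value, ...]
        for (my $i = 3; $i < @$field; $i += 2) {
            $field->[$i] =~ s/$invalid//g;
        }
    }
    # ... hand $item to a PICA XML exporter ...
});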

Tests may fail (with older PICA::Data?)

On some of my smokers t/04-exporter.t fails:

#   Failed test at t/04-exporter.t line 86.
#          got: '<?xml version="1.0" encoding="UTF-8"?>
# <collection xlmns="info:srw/schema/5/picaXML-v1.0">
# <record>
#   <datafield tag="003@">
#     <subfield code="0">1041318383</subfield>
#   </datafield>
#   <datafield tag="021A">
#     <subfield code="a">Hello $¥!</subfield>
#   </datafield>
# </record>
# <record>
#   <datafield tag="028C" occurrence="01">
#     <subfield code="d">Emma</subfield>
#     <subfield code="a">Goldman</subfield>
#   </datafield>
# </record>
# </collection>
# '
#     expected: '<?xml version="1.0" encoding="UTF-8"?>
# 
# <collection xmlns="info:srw/schema/5/picaXML-v1.0">
#   <record>
#     <datafield tag="003@">
#       <subfield code="0">1041318383</subfield>
#     </datafield>
#     <datafield tag="021A">
#       <subfield code="a">Hello $¥!</subfield>
#     </datafield>
#   </record>
#   <record>
#     <datafield tag="028C" occurrence="01">
#       <subfield code="d">Emma</subfield>
#       <subfield code="a">Goldman</subfield>
#     </datafield>
#   </record>
# </collection>
# '
# Looks like you failed 1 test of 4.
t/04-exporter.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 

Statistical analysis suggests that it passes only with PICA::Data 0.34, so the minimum prerequisite version should probably be adjusted.

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      0.0000	      0.0000	  10.09
[1='eq_0.34']  	      1.0000	      0.0000	91646440343380832.00

R^2= 1.000, N= 63, K= 2
****************************************************************
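
If the minimum version is raised accordingly, the change is a one-line adjustment in the cpanfile (a sketch, using the version suggested by the analysis above):

requires 'PICA::Data', '0.34';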

Tests fail with PICA::Data 0.36

The t/10-validator.t test started to fail on my smoker systems:

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{repeated} = Does not exist
#     $expected->[0]{repeated} = '1'

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{message} = 'field is not repeatable'
#     $expected->[0]{message} = 'field 021A is not repeatable'

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{message} = 'field is not repeatable'
#     $expected->[0]{message} = 'field 021A is not repeatable'
# Looks like you failed 3 tests of 10.
t/10-validator.t ....... 
Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/10 subtests 

Statistical analysis suggests that a change in PICA::Data 0.36 caused this problem (theta=1 means "bad"):

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      1.0000	      0.0000	136769032566320544.00
[1='eq_0.36']  	     -1.0000	      0.0000	-45909609008771576.00

R^2= 1.000, N= 71, K= 2
****************************************************************
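
If the exact wording of the validator messages is not meant to be part of the contract, the test could compare only the stable parts instead of whole structures; a Test::More sketch, where $error stands for one entry of the validator's result (a placeholder, not the test's actual variable):

use Test::More;

# match the stable part of the message instead of the exact wording
like $error->{message}, qr/not repeatable/, 'field 021A reported as not repeatable';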

Fields with a 3-digit occurrence are skipped

Example file consists of one record in plain PICA format:
chunk.txt

Running
catmandu convert -v PICA --type binary to PICA --type XML < chunk.txt > chunk.txt.xml

results in

WARNING: no valid PICA field structure "203@/100 �09011385967 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/100 �014784343X�z41409 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/100 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/100 �b8112�f1409�a1409/L040/k�rschner�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/100 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/100 �d21�j2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/100 �a21.2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/101 �09011386169 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/101 �0147843677�z40324 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/101 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/101 �b8112�f0324�a0324/Zi 108 Sch K�r(13)�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/101 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/101 �d13�j1980 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/101 �a13.1980,1-3 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/102 �0119169531X ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/102 �0190064153�z40312 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/102 �a14-10-10�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/102 �b8112�f0312�a0312/Aa 28�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/102 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/102 �d10�j1966�0�d11�j1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/102 �a10.1966; 11.1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
converted 1 object done
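
All the skipped fields carry a three-digit occurrence (/100, /101, /102). Assuming the importer checks each field against a tag/occurrence pattern before parsing it, that pattern would have to allow both two- and three-digit occurrences. A standalone Perl sketch of such a check (not the module's actual code):

# PICA tag: three digits plus one of A-Z or @; occurrence: two or three digits
my $field_re = qr{^ \d{3} [A-Z\@] (?: / \d{2,3} )? $}x;

for my $tag ('021A', '045B/02', '203@/100') {
    printf "%-10s %s\n", $tag, $tag =~ $field_re ? 'accepted' : 'skipped';
}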

fix travis.yml

  • delete Perl 5.12 and add 5.22+
  • use cpanm option --skip-installed instead of --skip-satisfied

sequence of local (1...) and item (2...) data gets mixed by occurrence on conversion

If I convert a PICA-XML record with data on all three levels (0..., 1... and 2...), elements of levels 1 and 2 get mixed:

catmandu convert PICA --type XML to PICA --type Plain < xml_testfile.xml
  ...
  <datafield tag="101@">
    <subfield code="a">11</subfield>
  </datafield>
  <datafield tag="101B">
    <subfield code="0">11-09-10</subfield>
    <subfield code="t">16:50:01.000</subfield>
  </datafield>
  <datafield tag="101C">
    <subfield code="0">11-09-96</subfield>
    <subfield code="b">0001/0000</subfield>
  </datafield>
  <datafield tag="101D">
    <subfield code="0">11-09-10</subfield>
    <subfield code="b">touchark</subfield>
    <subfield code="a">0001</subfield>
  </datafield>
  <datafield tag="101U">
    <subfield code="0">utf8</subfield>
  </datafield>
  <datafield tag="145S" occurrence="11">
    <subfield code="a">Nx 6589</subfield>
    <subfield code="b">Nx 06589</subfield>
    <subfield code="9">086974076</subfield>
    <subfield code="a">Nx 6301 - Nx 6769</subfield>
  </datafield>
  <datafield tag="147B" occurrence="05">
    <subfield code="a">Auch als Mikroform vorhanden</subfield>
  </datafield>
  <datafield tag="147B" occurrence="09">
    <subfield code="a">ab</subfield>
  </datafield>
  <datafield tag="147I">
    <subfield code="a">kaz</subfield>
  </datafield>
  <datafield tag="201@" occurrence="01">
    <subfield code="a">Benutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke</subfield>
    <subfield code="b">0</subfield>
    <subfield code="e">365403148</subfield>
    <subfield code="m">mon</subfield>
    <subfield code="f">1</subfield>
    <subfield code="u">Benutzung nur im Lesesaal</subfield>
    <subfield code="v">Standort unbekannt, wenden Sie sich an die Theke</subfield>
  </datafield>
  <datafield tag="201B" occurrence="01">
    <subfield code="0">14-04-15</subfield>
    <subfield code="t">10:19:08.000</subfield>
  </datafield>
  <datafield tag="201D" occurrence="01">
    <subfield code="0">14-04-15</subfield>
    <subfield code="b">cbs_703</subfield>
    <subfield code="a">1999</subfield>
  </datafield>
  <datafield tag="201F" occurrence="01">
    <subfield code="0">0</subfield>
  </datafield>
  <datafield tag="201U" occurrence="01">
    <subfield code="0">utf8</subfield>
  </datafield>
  <datafield tag="203@" occurrence="01">
    <subfield code="0">365403148</subfield>
  </datafield>
  <datafield tag="208@" occurrence="01">
    <subfield code="a">01-01-71</subfield>
    <subfield code="b">k</subfield>
  </datafield>
  <datafield tag="209A" occurrence="01">
    <subfield code="f">1</subfield>
    <subfield code="a">4"@Nx 6589</subfield>
    <subfield code="d">s</subfield>
    <subfield code="x">00</subfield>
  </datafield>
  <datafield tag="209B" occurrence="01">
    <subfield code="a">D</subfield>
    <subfield code="x">69</subfield>
  </datafield>
  <datafield tag="209C" occurrence="01">
    <subfield code="a">D1280-630</subfield>
    <subfield code="x">90</subfield>
  </datafield>
  <datafield tag="220B" occurrence="01">
    <subfield code="a">Autopsie</subfield>
  </datafield>
  <datafield tag="220B" occurrence="01">
    <subfield code="a">Revision 2006 IIC2.1</subfield>
  </datafield>
  <datafield tag="237A" occurrence="01">
    <subfield code="a">Nur für den Lesesaal unter Aufsicht</subfield>
    <subfield code="a">In der Fernleihe nur als Mikrofiche benutzbar</subfield>
    <subfield code="a">Die Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden</subfield>
  </datafield>

vs.

...
101@ $a11
101B $011-09-10$t16:50:01.000
101C $011-09-96$b0001/0000
101D $011-09-10$btouchark$a0001
101U $0utf8
147I $akaz
201@/01 $aBenutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke$b0$e365403148$mmon$f1$uBenutzung nur im Lesesaal$vStandort unbekannt, wenden Sie sich an die Theke
201B/01 $014-04-15$t10:19:08.000
201D/01 $014-04-15$bcbs_703$a1999
201F/01 $00
201U/01 $0utf8
203@/01 $0365403148
208@/01 $a01-01-71$bk
209A/01 $f1$a4"@Nx 6589$ds$x00
209B/01 $aD$x69
209C/01 $aD1280-630$x90
220B/01 $aAutopsie
220B/01 $aRevision 2006 IIC2.1
237A/01 $aNur für den Lesesaal unter Aufsicht$aIn der Fernleihe nur als Mikrofiche benutzbar$aDie Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden
147B/05 $aAuch als Mikroform vorhanden
147B/09 $aab
145S/11 $aNx 6589$bNx 06589$9086974076$aNx 6301 - Nx 6769

Seems to be a problem with sorting by occurrence.
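
A sketch of an ordering that keeps the levels apart, assuming each field is available as a [tag, occurrence, code, value, ...] array (as in the importer's internal format): sort by level (the first digit of the tag), then by occurrence, then by tag. This only illustrates the expected order, it is not the module's current code; $item stands for one imported record.

my @sorted = sort {
       substr($a->[0], 0, 1) <=> substr($b->[0], 0, 1)   # level 0, 1, 2 first
    || ($a->[1] || 0)        <=> ($b->[1] || 0)          # then occurrence
    ||  $a->[0]              cmp  $b->[0]                 # then tag
} @{ $item->{record} };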

Tests have started to fail for 1.09 after the release of PICA-Data 2.06

Sample fail report: http://www.cpantesters.org/cpan/report/8a6fde46-a772-11ed-ac40-0d1af2d2463d

Statistical analysis on my smokers hints that PICA::Data 2.06 is the likely culprit:

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    48942084208964240.00
[1='eq_2.04']         0.0000          0.0000       5.16
[2='eq_2.05']         0.0000          0.0000       5.19
[3='eq_2.06']        -1.0000          0.0000    -22338902943805908.00

R^2= 1.000, N= 63, K= 4
****************************************************************

Support PICA record access methods independent from fixes

Catmandu::Fix::pica_map is useful for Catmandu fixes but processing in raw Perl requires some helper methods, e.g.:

use Catmandu -all;
my %mapping = (
    '010@a' => 'language',
    '003A$0' => 'dc.identifier',
);
importer('PICA', file =>"pica.xml", type=> "XML")->each(sub {
    my $hashref = $_[0];

    my $id = pica_field($hashref->{record}, '003@');
    my $date = pica_value($hashref->{record}, '001A$0');

    my $data = pica_mapping($hashref->{record}, \%mapping);
});

As a start, I refactored parse_pica_path into an independent function.

We could also bless the PICA record structure, e.g. to do:

$hashref->{record}->fields('1***'); # returns a list of arrays
$hashref->{record}->values('003@$0'); # returns a list of scalar values

In particular one should be able to easily filter holding fields, aggregated by level 1.
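
For reference, a value accessor of this kind is only a few lines over the internal structure (each field being a [tag, occurrence, code, value, ...] array); a sketch of what such a helper could look like, not code from the module:

# return the value of subfield $code from the first field matching $tag
sub pica_value_sketch {
    my ($record, $tag, $code) = @_;
    for my $field (@$record) {
        next unless $field->[0] eq $tag;
        my @pairs = @{$field}[ 2 .. $#$field ];
        while (my ($c, $v) = splice @pairs, 0, 2) {
            return $v if $c eq $code;
        }
    }
    return;
}

# e.g. my $id = pica_value_sketch($hashref->{record}, '003@', '0');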

00-load.t doesn't work under Windows

Test results:

D:\Workspace\Dist\Catmandu-PICA>perl -TIlib ./t/00-load.t
Datei *.pm nicht gefunden
# Testing Catmandu::PICA , Perl 5.016001, C:\Programme\StrawBerry\perl\bin\perl.exe
1..0
# No tests run!
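
The German message translates to "File *.pm not found", which suggests that a literal *.pm pattern is not being expanded on Windows. A portable way to enumerate the module files would be File::Find instead of relying on shell globbing (a sketch, not the current test code):

use strict;
use warnings;
use File::Find;

# collect every .pm file below lib/ without shell globbing
my @modules;
find(sub { push @modules, $File::Find::name if /\.pm$/ }, 'lib');
print "$_\n" for sort @modules;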

Install fails on some systems

see http://matrix.cpantesters.org/?dist=Catmandu-PICA+0.16.

Error:


#   Failed test 'use Catmandu::Importer::PICA;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Importer::PICA'.
#     Error:  Can't locate PICA/Parser/XML.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.

#   Failed test 'use Catmandu::Exporter::PICA;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Exporter::PICA'.
#     Error:  Can't locate PICA/Writer/Plus.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.

#   Failed test 'use Catmandu::Fix::pica_map;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Fix::pica_map'.
#     Error:  Can't locate PICA/Path.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.
# Testing Catmandu::PICA 0.16, Perl 5.012004, /home/stro/perl/5.12.4/bin/perl
# Looks like you failed 3 tests of 4.

PICA::Data is not installed although it's defined as a prerequisite in cpanfile and META.json.

Add more fixes to modify PICA+ records

Here path denotes a PICA Path (such as 003@$0), field a field reference (such as dc.creator), value a string value and contents a field content (e.g. $afoo$bbar).

  • pica_update(path, values|contents, [options]) replace or add subfields or fields (deprecates pica_add and pica_set)
  • pica_replace(path, pattern, value) to search and replace in PICA subfield values with regular expression
  • pica_subfields(code,value,code,value...) or pica_subfields(contents) to set subfields of current field(s)
  • pica_annotate([path,] annotation) to set annotation
  • pica_append([path,] value|contents) to add full field(s) or subfield(s) (maybe not needed)
  • pica_sort([path,] schedule) to sort and reduce to known (sub)fields
  • pica_reduce([path,] schedule) to reduce to known (sub)fields (?)

Existing fixes:

  • pica_add(field, path, [options]) : should be deprecated because of the order of arguments
  • pica_set(field, path, [options]) : might be deprecated because of the order of arguments. Alternative name?
  • pica_keep(path [, path...])
  • pica_remove([path])
  • pica_map(path, field)
  • pica_tag(tag)
  • pica_occurrence(occurrence)

t/03-fix.t started to fail (unicode issue?)

On my smoker systems the test suite started to fail:

#   Failed test 'fix with pluck'
#   at t/03-fix.t line 25.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
#          got: '160.45 �36420368139783642036811'
#     expected: '160.45 €36420368139783642036811'

#   Failed test 'fix record'
#   at t/03-fix.t line 27.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
#     Structures begin differing at:
#          $got->{price} = '160.45 �364205076X9783642050763'
#     $expected->{price} = '160.45 €364205076X9783642050763'
# Looks like you failed 2 tests of 6.
t/03-fix.t ................. 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/6 subtests 

(See also http://fast-matrix.cpantesters.org/?dist=Catmandu-PICA%201.02;reports=1#sl=7,1 )

Statistical analysis suggests that this started to fail with PICA::Data 1.08 (@nichtich : FYI):

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      1.0000	      0.0000	196853234403964704.00
[1='eq_1.08']  	     -1.0000	      0.0000	-64900589445389040.00

R^2= 1.000, N= 92, K= 2
****************************************************************

Replacement of angle brackets

When converting from plain PICA to PICA XML, Catmandu replaces < with &amp;lt;; I think it should be &lt; instead.

Example file (one record in plain pica format)
angle-bracket.txt

Command
catmandu convert PICA --type binary to PICA --type XML < angle-bracket.txt > angle-bracket.xml

The resulting XML is not valid according to xmllint; see field 021A subfield a.
xmllint angle-bracket.xml

fixes pica_set and pica_add to create PICA

# Copy from field dc.identifier to PICA field 003A subfield 0, replacing the first existing field
pica_set('dc.identifier','003A0');

# Copy from field dc.language to PICA field 010@ in an additional new subfield a
pica_add('dc.language','010@a');

Double \x1D added when format is "normalized"

Hello,
I'm using your Perl module to convert MARC21 files into PICA+ files. We use the normalized format as it is directly supported by our library system.

#!/usr/bin/perl

use strict;
use utf8;
use warnings;

use PICA::Record;
use PICA::Writer;
use PICA::Field;

use Encode qw(encode decode);

my $writer =  PICA::Writer->new('tests/out.pica', format => 'normalized');
my $field =   new PICA::Field('021A');
my $record = new PICA::Record();

$field->add('a', 'Foo');
$field->add('d', 'Bar');

$record->appendif($field);

$writer->write('', $record);
$writer->write('', $record);
$writer->write('', $record);

$writer->end();

print "Pica file written";
1;

Produces the output:

�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar

These rectangles between the records are \x1D (record separator) chars. CBS (our library system) has problems if there are two \x1D, and we need to remove one of them.
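
Until the writer is fixed, a post-processing sketch that drops the duplicated record separator; it assumes the two \x1D sit next to each other, possibly separated by a line break, and the file names are examples:

open my $in,  '<:raw', 'tests/out.pica'       or die $!;
open my $out, '>:raw', 'tests/out.fixed.pica' or die $!;
my $data = do { local $/; <$in> };
# collapse two consecutive record separators into one
$data =~ s/\x1D(\R?)\x1D\R?/\x1D$1/g;
print {$out} $data;
close $out;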

PICA::Parser::Plus - leader length

Leader length does not seem to be standardized between different data providers, so the check for an optional leader cannot be done via length and missing subfields.

Simplify internal PICA format

From

[$tag, $occurrence, '_', '', @subfields]

to

[$tag, $occurrence, @subfields]

This will break backwards compatibility.
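
For migration, converting a field from the old layout to the new one is mechanical; a sketch of a helper (not part of the module):

# drop the '_', '' placeholder elements from a field array
sub simplify_field {
    my @field = @{ $_[0] };
    splice @field, 2, 2 if @field >= 4 && $field[2] eq '_' && $field[3] eq '';
    return \@field;
}

# simplify_field([ '021A', '', '_', '', 'a', 'Hello' ])
#   yields [ '021A', '', 'a', 'Hello' ]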

Tests fail (with Catmandu 0.9503?)

t/07-pica-each.t fails, probably only if Catmandu 0.9503 is installed:

#   Failed test 'created is_ger tag'
#   at t/07-pica-each.t line 36.
#          got: undef
#     expected: 'true'

#   Failed test 'fields 045. deleted'
#   at t/07-pica-each.t line 37.
# Looks like you failed 2 tests of 5.
t/07-pica-each.t ........ 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests 

pica_map error

Hi

I try to use pica_map, but I get the following error when I execute this command:

catmandu convert JSON --fix 'pica_map(021A, dc_title)' < test.json

Oops! Tried to execute the fix 'pica_map' but can't find Catmandu::Fix::pica_map on your system.
Error: No such fix package: pica_map
Package name: Catmandu::Fix::pica_map
Fix name: pica_map
Source:
	pica_map(021A, dc_title)

When I look at the installed modules via catmandu info, it says:
Catmandu::Fix::pica_map | 1.03 | copy pica values of one field to a new field

Catmandu version:
catmandu (Catmandu::CLI) version 1.2015 (/usr/bin/catmandu)

I also use e.g. pica_add() which works.

Cheers
Michel

Breaking change in PICA Path expression language with PICA::Data 2.0

PICA::Data 2.0 will introduce a breaking change in the PICA Path expression language (see gbv/PICA-Data#109 (comment)). The changes only affect some special cases (but this depends on the use case!). The current unit tests of Catmandu::PICA don't cover these cases, so Catmandu::PICA will still install with PICA::Data 2.0 once the latter has been released, while existing installations will keep working with the old syntax at the same version of Catmandu::PICA. We may create a new release of Catmandu::PICA with PICA::Data 2.0 as a requirement.
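
Whichever way it goes, the decision can be pinned in the cpanfile; a sketch of both options (version ranges as supported by CPAN::Meta::Spec):

# stay with the old PICA Path syntax for now ...
requires 'PICA::Data', '>= 1.0, < 2.0';

# ... or require the new syntax in a new Catmandu::PICA release
# requires 'PICA::Data', '2.0';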

pull request against jorol/Catmandu-PICA

There is a pull request to my repository "jorol/Catmandu-PICA". I tried to "redirect" it to our main repository "gbv/Catmandu-PICA", but this failed (see #21).

What is a proper way to merge these changes in "gbv/Catmandu-PICA"?

What could be done to avoid such situations in the future? I think the commit history of my master branch is broken (I merged the changes of "gbv/Catmandu-PICA" back into my master to keep it up to date) ...
