gbv / catmandu-pica
Catmandu modules for working with PICA+ data
Home Page: https://metacpan.org/release/Catmandu-PICA
License: Other
Tests for the XML importer fail on some Windows systems, see CPAN Testers.
t/02-importer.t .........
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run
Read more bytes than requested. Do you use an encoding-related PerlIO layer? at C:\STRAWB~1\cpan\build\PICA-Data-0.26-vHTK4a/blib/lib/PICA/Parser/XML.pm line 46.
Importing a plain PICA record with Windows line endings (CR LF) gives me something like:
{
"_id" : "67890\r",
"record" : [
[
"003@",
"",
"0",
"67890\r"
]
]
}
with a spare \r. Are PICA records expected to have Unix line endings?
Our CBS uses a PICA+ format with the delimiters \x1d, \x1e, \x1f that is still human readable, i.e. with a line break after each field. Catmandu does not seem to be able to produce this format. We have used this format for many years because it can be loaded directly into CBS while remaining human readable, which makes it easier to spot errors.
... to copy subfields into the result hash in the order of the mapping.
See https://metacpan.org/source/HOCHSTEN/Catmandu-MARC-0.211/lib/Catmandu/Fix/marc_map.pm
This should result in a PICA Patch record that modifies 003A$0 and adds 004F$af:
do pica_patch
pica_set('dc.identifier', '003A$0')
pica_add('dc.subjects', '004F$af')
end
$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a)'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a,record:"pica")'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to YAML --fix 'pica_add(title,021A$a)'
---
title: code4lib
...
Works if the key is defined beforehand:
$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'set_array(record);pica_add(title,021A$a)'
021A $acode4lib
$ dzil smoke --release --author
...
Can't call method "content" on an undefined value at /home/travis/perl5/perlbrew/perls/5.20/lib/site_perl/5.20.0/Dist/Zilla/Plugin/MakeMaker.pm line 291.
I just came across a problem with characters below 0x20, which are present in the original plain PICA files. Catmandu happily converts them to PICA XML, but in doing so it actually generates invalid XML files, which cannot be parsed by XSL processors.
Example file (one record in plain PICA; field 233P contains the Unicode character 0x2):
record.txt
Command
catmandu convert -v PICA --type binary to PICA --type XML < record.txt > record.xml
The resulting XML cannot be transformed (e.g. to Solr XML) via xmlstarlet, xsltproc or Saxon HE. For now I delete all characters that are not allowed in XML according to the specification, using
tr -d '\000-\010\013\014\016-\037'
I wonder if Catmandu should handle this problem and only generate valid XML. What do you think?
To get level 1 or level 2 records.
I would like to filter holding records with this fix:
do pica_each('2...')
....
end
and get this error:
Nested quantifiers in regex; marked by <-- HERE in m/0** <-- HERE */ at C:/Strawberry/perl/site/lib/PICA/Path.pm line 70,
Build error for Perl 5.010001, see https://travis-ci.org/gbv/Catmandu-PICA/jobs/43096363 .
Dependency Dist::Milla v1.0.9 requires Perl 5.012:
! Installing the dependencies failed: Your Perl (5.010001) is not in the range '5.012'
! Bailing out the installation for Dist-Milla-v1.0.9.
To check whether a field or subfield exists without looking at its value.
On some of my smokers t/04-exporter.t fails:
# Failed test at t/04-exporter.t line 86.
# got: '<?xml version="1.0" encoding="UTF-8"?>
# <collection xlmns="info:srw/schema/5/picaXML-v1.0">
# <record>
# <datafield tag="003@">
# <subfield code="0">1041318383</subfield>
# </datafield>
# <datafield tag="021A">
# <subfield code="a">Hello $Â¥!</subfield>
# </datafield>
# </record>
# <record>
# <datafield tag="028C" occurrence="01">
# <subfield code="d">Emma</subfield>
# <subfield code="a">Goldman</subfield>
# </datafield>
# </record>
# </collection>
# '
# expected: '<?xml version="1.0" encoding="UTF-8"?>
#
# <collection xmlns="info:srw/schema/5/picaXML-v1.0">
# <record>
# <datafield tag="003@">
# <subfield code="0">1041318383</subfield>
# </datafield>
# <datafield tag="021A">
# <subfield code="a">Hello $Â¥!</subfield>
# </datafield>
# </record>
# <record>
# <datafield tag="028C" occurrence="01">
# <subfield code="d">Emma</subfield>
# <subfield code="a">Goldman</subfield>
# </datafield>
# </record>
# </collection>
# '
# Looks like you failed 1 test of 4.
t/04-exporter.t ........
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests
Statistical analysis suggests that it passes only with PICA::Data 0.34, so the minimum prerequisite version should probably be adjusted.
****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name Theta StdErr T-stat
[0='const'] 0.0000 0.0000 10.09
[1='eq_0.34'] 1.0000 0.0000 91646440343380832.00
R^2= 1.000, N= 63, K= 2
****************************************************************
The t/10-validator.t test started to fail on my smoker systems:
# Failed test at t/10-validator.t line 15.
# Structures begin differing at:
# $got->[0]{repeated} = Does not exist
# $expected->[0]{repeated} = '1'
# Failed test at t/10-validator.t line 15.
# Structures begin differing at:
# $got->[0]{message} = 'field is not repeatable'
# $expected->[0]{message} = 'field 021A is not repeatable'
# Failed test at t/10-validator.t line 15.
# Structures begin differing at:
# $got->[0]{message} = 'field is not repeatable'
# $expected->[0]{message} = 'field 021A is not repeatable'
# Looks like you failed 3 tests of 10.
t/10-validator.t .......
Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/10 subtests
Statistical analysis suggests that a change in PICA::Data 0.36 caused this problem (theta=1 means "bad"):
****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name Theta StdErr T-stat
[0='const'] 1.0000 0.0000 136769032566320544.00
[1='eq_0.36'] -1.0000 0.0000 -45909609008771576.00
R^2= 1.000, N= 71, K= 2
****************************************************************
The example file consists of one record in plain PICA format:
chunk.txt
Running
catmandu convert -v PICA --type binary to PICA --type XML < chunk.txt > chunk.txt.xml
results in
WARNING: no valid PICA field structure "203@/100 �09011385967 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/100 �014784343X�z41409 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/100 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/100 �b8112�f1409�a1409/L040/k�rschner�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/100 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/100 �d21�j2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/100 �a21.2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/101 �09011386169 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/101 �0147843677�z40324 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/101 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/101 �b8112�f0324�a0324/Zi 108 Sch K�r(13)�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/101 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/101 �d13�j1980 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/101 �a13.1980,1-3 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/102 �0119169531X ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/102 �0190064153�z40312 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/102 �a14-10-10�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/102 �b8112�f0312�a0312/Aa 28�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/102 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/102 �d10�j1966�0�d11�j1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/102 �a10.1966; 11.1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
converted 1 object done
It seems the closing tag </collection> is missing after the conversion.
Example file:
test.txt
catmandu convert PICA --type binary to PICA --type xml < test.txt > result.txt
Result:
result.txt
Some use \N{LINE FEED} and some use \N{INFORMATION SEPARATOR THREE} as the record separator, so this should be configurable.
pica_each(PATH) should not reduce the record to the fields matching PATH.
Documentation:
# Delete all the 041A subject fields
do pica_each()
if pica_match("041A",".*")
reject()
end
end
This deletes the whole record, not only the subject fields.
--skip-installed instead of --skip-satisfied
If I convert a PICA-XML record with data on all three levels (0..., 1... and 2...), elements of level 1 and 2 get mixed:
catmandu convert PICA --type XML to PICA --type Plain < xml_testfile.xml
...
<datafield tag="101@">
<subfield code="a">11</subfield>
</datafield>
<datafield tag="101B">
<subfield code="0">11-09-10</subfield>
<subfield code="t">16:50:01.000</subfield>
</datafield>
<datafield tag="101C">
<subfield code="0">11-09-96</subfield>
<subfield code="b">0001/0000</subfield>
</datafield>
<datafield tag="101D">
<subfield code="0">11-09-10</subfield>
<subfield code="b">touchark</subfield>
<subfield code="a">0001</subfield>
</datafield>
<datafield tag="101U">
<subfield code="0">utf8</subfield>
</datafield>
<datafield tag="145S" occurrence="11">
<subfield code="a">Nx 6589</subfield>
<subfield code="b">Nx 06589</subfield>
<subfield code="9">086974076</subfield>
<subfield code="a">Nx 6301 - Nx 6769</subfield>
</datafield>
<datafield tag="147B" occurrence="05">
<subfield code="a">Auch als Mikroform vorhanden</subfield>
</datafield>
<datafield tag="147B" occurrence="09">
<subfield code="a">ab</subfield>
</datafield>
<datafield tag="147I">
<subfield code="a">kaz</subfield>
</datafield>
<datafield tag="201@" occurrence="01">
<subfield code="a">Benutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke</subfield>
<subfield code="b">0</subfield>
<subfield code="e">365403148</subfield>
<subfield code="m">mon</subfield>
<subfield code="f">1</subfield>
<subfield code="u">Benutzung nur im Lesesaal</subfield>
<subfield code="v">Standort unbekannt, wenden Sie sich an die Theke</subfield>
</datafield>
<datafield tag="201B" occurrence="01">
<subfield code="0">14-04-15</subfield>
<subfield code="t">10:19:08.000</subfield>
</datafield>
<datafield tag="201D" occurrence="01">
<subfield code="0">14-04-15</subfield>
<subfield code="b">cbs_703</subfield>
<subfield code="a">1999</subfield>
</datafield>
<datafield tag="201F" occurrence="01">
<subfield code="0">0</subfield>
</datafield>
<datafield tag="201U" occurrence="01">
<subfield code="0">utf8</subfield>
</datafield>
<datafield tag="203@" occurrence="01">
<subfield code="0">365403148</subfield>
</datafield>
<datafield tag="208@" occurrence="01">
<subfield code="a">01-01-71</subfield>
<subfield code="b">k</subfield>
</datafield>
<datafield tag="209A" occurrence="01">
<subfield code="f">1</subfield>
<subfield code="a">4"@Nx 6589</subfield>
<subfield code="d">s</subfield>
<subfield code="x">00</subfield>
</datafield>
<datafield tag="209B" occurrence="01">
<subfield code="a">D</subfield>
<subfield code="x">69</subfield>
</datafield>
<datafield tag="209C" occurrence="01">
<subfield code="a">D1280-630</subfield>
<subfield code="x">90</subfield>
</datafield>
<datafield tag="220B" occurrence="01">
<subfield code="a">Autopsie</subfield>
</datafield>
<datafield tag="220B" occurrence="01">
<subfield code="a">Revision 2006 IIC2.1</subfield>
</datafield>
<datafield tag="237A" occurrence="01">
<subfield code="a">Nur für den Lesesaal unter Aufsicht</subfield>
<subfield code="a">In der Fernleihe nur als Mikrofiche benutzbar</subfield>
<subfield code="a">Die Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden</subfield>
</datafield>
vs.
...
101@ $a11
101B $011-09-10$t16:50:01.000
101C $011-09-96$b0001/0000
101D $011-09-10$btouchark$a0001
101U $0utf8
147I $akaz
201@/01 $aBenutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke$b0$e365403148$mmon$f1$uBenutzung nur im Lesesaal$vStandort unbekannt, wenden Sie sich an die Theke
201B/01 $014-04-15$t10:19:08.000
201D/01 $014-04-15$bcbs_703$a1999
201F/01 $00
201U/01 $0utf8
203@/01 $0365403148
208@/01 $a01-01-71$bk
209A/01 $f1$a4"@Nx 6589$ds$x00
209B/01 $aD$x69
209C/01 $aD1280-630$x90
220B/01 $aAutopsie
220B/01 $aRevision 2006 IIC2.1
237A/01 $aNur für den Lesesaal unter Aufsicht$aIn der Fernleihe nur als Mikrofiche benutzbar$aDie Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden
147B/05 $aAuch als Mikroform vorhanden
147B/09 $aab
145S/11 $aNx 6589$bNx 06589$9086974076$aNx 6301 - Nx 6769
This seems to be a problem with sorting by occurrence.
Sample fail report: http://www.cpantesters.org/cpan/report/8a6fde46-a772-11ed-ac40-0d1af2d2463d
Statistical analysis on my smokers hints that PICA::Data 2.06 is the likely culprit:
****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name Theta StdErr T-stat
[0='const'] 1.0000 0.0000 48942084208964240.00
[1='eq_2.04'] 0.0000 0.0000 5.16
[2='eq_2.05'] 0.0000 0.0000 5.19
[3='eq_2.06'] -1.0000 0.0000 -22338902943805908.00
R^2= 1.000, N= 63, K= 4
****************************************************************
See https://github.com/gbv/Catmandu-PICA/runs/4364199276 - this likely requires fixing LibreCat/Catmandu#385.
Catmandu::Fix::pica_map is useful for Catmandu fixes, but processing in plain Perl requires some helper methods, e.g.:
use Catmandu -all;

my %mapping = (
    '010@a'  => 'language',
    '003A$0' => 'dc.identifier',
);

importer('PICA', file => "pica.xml", type => "XML")->each(sub {
    my $hashref = $_[0];
    my $id   = pica_field($hashref->{record}, '003@');
    my $date = pica_value($hashref->{record}, '001A$0');
    my $data = pica_mapping($hashref->{record}, \%mapping);
});
I refactored out an independent function parse_pica_path to start with.
We could also bless the PICA record structure, e.g. to do:
$hashref->{record}->fields('1***'); # returns a list of arrays
$hashref->{record}->values('003@$0'); # returns a list of scalar values
In particular one should be able to easily filter holding fields, aggregated by level 1.
Test results:
D:\Workspace\Dist\Catmandu-PICA>perl -TIlib ./t/00-load.t
Datei *.pm nicht gefunden ("File *.pm not found")
# Testing Catmandu::PICA , Perl 5.016001, C:\Programme\StrawBerry\perl\bin\perl.exe
1..0
# No tests run!
Based on PICA::Schema, to be implemented similar to Catmandu::Validator::JSONSchema.
see http://matrix.cpantesters.org/?dist=Catmandu-PICA+0.16.
Error:
# Failed test 'use Catmandu::Importer::PICA;'
# at t/00-load.t line 6.
# Tried to use 'Catmandu::Importer::PICA'.
# Error: Can't locate PICA/Parser/XML.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.
# Failed test 'use Catmandu::Exporter::PICA;'
# at t/00-load.t line 6.
# Tried to use 'Catmandu::Exporter::PICA'.
# Error: Can't locate PICA/Writer/Plus.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.
# Failed test 'use Catmandu::Fix::pica_map;'
# at t/00-load.t line 6.
# Tried to use 'Catmandu::Fix::pica_map'.
# Error: Can't locate PICA/Path.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.
# Testing Catmandu::PICA 0.16, Perl 5.012004, /home/stro/perl/5.12.4/bin/perl
# Looks like you failed 3 tests of 4.
PICA::Data is not installed although it's defined as a prerequisite in cpanfile and META.json.
if (defined $occurrence) {
$perl .= "next if (!defined ${var}->[1] || ${var}->[1] ne '${occurrence}');";
}
Add test.
Here path denotes a PICA Path (such as 003@$0), field a field reference (such as dc.creator), value a string value, and contents a field content (e.g. $afoo$bbar).
pica_update(path, values|contents, [options]): replace or add subfields or fields (deprecates pica_add and pica_set)
pica_replace(path, pattern, value): search and replace in PICA subfield values with a regular expression
pica_subfields(code,value,code,value...) or pica_subfields(contents): set subfields of the current field(s)
pica_annotate([path,] annotation): set an annotation
pica_append([path,] value|contents): add full field(s) or subfield(s) (maybe not needed)
pica_sort([path,] schedule): sort and reduce to known (sub)fields
pica_reduce([path,] schedule): reduce to known (sub)fields (?)
Existing fixes:
pica_add(field, path, [options]): should be deprecated because of the order of arguments
pica_set(field, path, [options]): might be deprecated because of the order of arguments. Alternative name?
pica_keep(path [, path...])
pica_remove([path])
pica_map(path, field)
pica_tag(tag)
On my smoker systems the test suite started to fail:
# Failed test 'fix with pluck'
# at t/03-fix.t line 25.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
# got: '160.45 �36420368139783642036811'
# expected: '160.45 €36420368139783642036811'
# Failed test 'fix record'
# at t/03-fix.t line 27.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
# Structures begin differing at:
# $got->{price} = '160.45 �364205076X9783642050763'
# $expected->{price} = '160.45 €364205076X9783642050763'
# Looks like you failed 2 tests of 6.
t/03-fix.t .................
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/6 subtests
(See also http://fast-matrix.cpantesters.org/?dist=Catmandu-PICA%201.02;reports=1#sl=7,1 )
Statistical analysis suggests that this started to fail with PICA::Data 1.08 (@nichtich: FYI):
****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name Theta StdErr T-stat
[0='const'] 1.0000 0.0000 196853234403964704.00
[1='eq_1.08'] -1.0000 0.0000 -64900589445389040.00
R^2= 1.000, N= 92, K= 2
****************************************************************
Holding records in PICA are aggregated by level 1 fields. There should be a pica_holdings method, similar to https://metacpan.org/pod/PICA::Record#holdings-iln.
Depends on gbv/PICA-Data#1
see issue #12
Add fixes for pica_each and pica_match.
When converting from plain PICA to PICA XML, Catmandu passes < through unescaped; I think it should be &lt; instead.
Example file (one record in plain pica format)
angle-bracket.txt
Command
catmandu convert PICA --type binary to PICA --type XML < angle-bracket.txt > angle-bracket.xml
The resulting xml is not valid according to xmllint, see field 021A subfield a.
xmllint angle-bracket.xml
# Copy from field dc.identifier to PICA field 003A subfield 0, replacing the first existing field
pica_set('dc.identifier','003A0');
# Copy from field dc.language to PICA field 010@ in an additional new subfield a
pica_add('dc.language','010@a');
Hello,
I'm using your Perl module to convert MARC21 files into PICA+ files. We use the normalized format as it is directly supported by our library system.
#!/usr/bin/perl
use strict;
use utf8;
use warnings;
use PICA::Record;
use PICA::Writer;
use PICA::Field;
use Encode qw(encode decode);
my $writer = PICA::Writer->new('tests/out.pica', format => 'normalized');
my $field = PICA::Field->new('021A');
my $record = PICA::Record->new();
$field->add('a', 'Foo');
$field->add('d', 'Bar');
$record->appendif($field);
$writer->write('', $record);
$writer->write('', $record);
$writer->write('', $record);
$writer->end();
print "PICA file written\n";
Produces the output:
�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar
These rectangles between the records are \x1D (record separator) chars. CBS (the library system) has problems if there are two \x1D in a row, so we need to remove one of them.
To remove all current fields
The leader length does not seem to be standardized between different data providers, so a check for an optional leader cannot be done via length and missing subfields.
From
[$tag, $occurrence, '_', '', @subfields]
to
[$tag, $occurrence, @subfields]
This will break backwards compatibility.
t/07-pica-each.t fails, probably only if Catmandu 0.9503 is installed:
# Failed test 'created is_ger tag'
# at t/07-pica-each.t line 36.
# got: undef
# expected: 'true'
# Failed test 'fields 045. deleted'
# at t/07-pica-each.t line 37.
# Looks like you failed 2 tests of 5.
t/07-pica-each.t ........
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests
Hi,
I'm trying to use pica_map, but I get the following error when I execute the following command:
catmandu convert JSON --fix 'pica_map(021A, dc_title)' < test.json
Oops! Tried to execute the fix 'pica_map' but can't find Catmandu::Fix::pica_map on your system.
Error: No such fix package: pica_map
Package name: Catmandu::Fix::pica_map
Fix name: pica_map
Source:
pica_map(021A, dc_title)
When I look at the installed modules via catmandu info, it says:
Catmandu::Fix::pica_map | 1.03 | copy pica values of one field to a new field
Catmandu version:
catmandu (Catmandu::CLI) version 1.2015 (/usr/bin/catmandu)
I also use e.g. pica_add(), which works.
Cheers
Michel
The GBV provides data in a "normalized" variant of the PICA+ plain format; have a look at the documentation at https://verbundwiki.gbv.de/pages/viewpage.action?pageId=40009828 and the attached file test.txt.
It would be nice to do the conversion without any upfront replacing of characters, e.g. by
catmandu convert PICA --type normalized to PICA --type xml < test.txt
PICA::Data 2.0 will introduce a breaking change in the PICA Path expression language (see gbv/PICA-Data#109 (comment)). The changes only affect some special cases (but this depends on the use case!). The current unit tests of Catmandu::PICA don't cover these cases, so Catmandu::PICA will install with PICA::Data 2.0 once the latter has been released, but existing installations will keep working with the old syntax at the same version of Catmandu::PICA. We may create a new release of Catmandu::PICA with PICA::Data 2.0 as a requirement.
do pica_each('1...')
# process level 1 fields
end
There is a pull request against my repository "jorol/Catmandu-PICA". I tried to "redirect" it to our main repository "gbv/Catmandu-PICA", but this failed (see #21).
What is the proper way to merge these changes into "gbv/Catmandu-PICA"?
What could be done to avoid such situations in the future? I think the commit history of my master repository is broken (I merged the changes from "gbv/Catmandu-PICA" back into my master to keep it up to date) ...