catmandu-pica's People

Contributors

choroba, jorol, julianladisch, vzg-deploy

catmandu-pica's Issues

Test fails on Windows

Test for the XML importer fails on some Windows systems, see cpantesters.

Error message

t/02-importer.t ......... 
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run 
Read more bytes than requested. Do you use an encoding-related PerlIO layer? at C:\STRAWB~1\cpan\build\PICA-Data-0.26-vHTK4a/blib/lib/PICA/Parser/XML.pm line 46.

Catmandu::Importer::PICA with Windows line endings

Importing a plain PICA record with Windows line endings (CR LF) gives me something like:

{
   "_id" : "67890\r",
   "record" : [
      [
         "003@",
         "",
         "0",
         "67890\r"
      ]
   ]
}

with a stray \r. Are PICA records expected to have Unix line endings?
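
A workaround sketch until this is settled: normalize the line endings before the importer sees the data. The file name is an example, and passing a string reference as file is assumed to be supported by the importer.

use Catmandu -all;

# slurp the plain PICA file and drop Windows carriage returns
my $pica = do { local (@ARGV, $/) = ('records.pica'); <> };
$pica =~ s/\r\n/\n/g;

importer('PICA', file => \$pica, type => 'Plain')->each(sub {
    my ($item) = @_;
    # ... process $item ...
});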

Module does not provide support for the normalized CBS pica format with delimiters

Our CBS uses a PICA+ format with the delimiters \x1d, \x1e, \x1f that is still human readable, i.e. with line breaks after each field. Catmandu does not seem to be able to produce this format. We have used this format for many years because it can be loaded directly into CBS but is still human readable, which makes it easier to find errors.

  • plain creates a file without delimiters
  • plus and binary use delimiters but they are not human readable (everything in one line)
  • generic seems not to be implemented in my catmandu version

Digi20-20220118.0.zip
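
As long as no writer produces this variant directly, one possible workaround (a sketch only, not a supported option) is to post-process the plus output: keep the \x1f/\x1e/\x1d delimiters and just insert a line break after each field separator \x1e. File names are examples, and whether CBS accepts the extra newlines would need to be verified.

open my $in,  '<:raw', 'records.plus' or die $!;
open my $out, '>:raw', 'records.cbs'  or die $!;
while (my $line = <$in>) {
    # put every field on its own line, keeping the \x1E delimiter itself
    $line =~ s/\x1E/\x1E\n/g;
    print {$out} $line;
}
close $in;
close $out;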

Add Catmandu::Fix::bind::pica_patch

This should result in a PICA Patch record that modifies 003A$0 and adds 004F$af:

do pica_patch
  pica_set('dc.identifier', '003A$0')
  pica_add('dc.subjects', '004F$af')
end

pica_add() doesn't work if hash key 'record' does not exist

$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a)'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'pica_add(title,021A$a,record:"pica")'
PICA record must be array reference
$ echo '{"title":"code4lib"}' | catmandu convert JSON to YAML --fix 'pica_add(title,021A$a)'
---
title: code4lib
...

Works if key is defined beforehand:

echo '{"title":"code4lib"}' | catmandu convert JSON to PICA --type Plain --fix 'set_array(record);pica_add(title,021A$a)'
021A $acode4lib

Invalid characters in xml

I just came across a problem with characters below 0x20, which are present in the original plain PICA files. Catmandu converts them to PICA XML without complaint, but in doing so it actually generates invalid XML files, which cannot be parsed by XSL processors.

Example file (one record in plain PICA; field 233P contains the character 0x02)
record.txt

Command
catmandu convert -v PICA --type binary to PICA --type XML < record.txt > record.xml

The resulting XML cannot be transformed (e.g. to Solr XML) via xmlstarlet, xsltproc or Saxon HE. For now I delete all characters that are not allowed in XML according to the specification by using

tr -d '\000-\010\013\014\016-\037'

I wonder if Catmandu should handle this problem and only generate valid XML. What do you think?
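
One way a pipeline can deal with it today is to strip the disallowed characters from the subfield values before the record reaches the XML exporter. A sketch over the importer's internal record structure, run from a small Perl script rather than the command line; the file name is taken from the example above:

use Catmandu -all;

# XML 1.0 only allows tab, LF and CR below 0x20
my $invalid = qr/[\x00-\x08\x0B\x0C\x0E-\x1F]/;

importer('PICA', file => 'record.txt', type => 'Plain')->each(sub {
    my ($item) = @_;
    for my $field (@{ $item->{record} }) {
        # a field is [tag, occurrence, code, value, code, value, ...]
        for (my $i = 3; $i < @$field; $i += 2) {
            $field->[$i] =~ s/$invalid//g;
        }
    }
    # ... hand $item to a PICA XML exporter ...
});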

Tests may fail (with older PICA::Data?)

On some of my smokers t/04-exporter.t fails:

#   Failed test at t/04-exporter.t line 86.
#          got: '<?xml version="1.0" encoding="UTF-8"?>
# <collection xlmns="info:srw/schema/5/picaXML-v1.0">
# <record>
#   <datafield tag="003@">
#     <subfield code="0">1041318383</subfield>
#   </datafield>
#   <datafield tag="021A">
#     <subfield code="a">Hello $¥!</subfield>
#   </datafield>
# </record>
# <record>
#   <datafield tag="028C" occurrence="01">
#     <subfield code="d">Emma</subfield>
#     <subfield code="a">Goldman</subfield>
#   </datafield>
# </record>
# </collection>
# '
#     expected: '<?xml version="1.0" encoding="UTF-8"?>
# 
# <collection xmlns="info:srw/schema/5/picaXML-v1.0">
#   <record>
#     <datafield tag="003@">
#       <subfield code="0">1041318383</subfield>
#     </datafield>
#     <datafield tag="021A">
#       <subfield code="a">Hello $¥!</subfield>
#     </datafield>
#   </record>
#   <record>
#     <datafield tag="028C" occurrence="01">
#       <subfield code="d">Emma</subfield>
#       <subfield code="a">Goldman</subfield>
#     </datafield>
#   </record>
# </collection>
# '
# Looks like you failed 1 test of 4.
t/04-exporter.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 

Statistical analysis suggests that it passes only with PICA::Data 0.34, so the minimum prerequisite version should probably be adjusted.

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      0.0000	      0.0000	  10.09
[1='eq_0.34']  	      1.0000	      0.0000	91646440343380832.00

R^2= 1.000, N= 63, K= 2
****************************************************************
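
If the minimum version is raised accordingly, the change is a one-line adjustment in the cpanfile (a sketch, using the version suggested by the analysis above):

requires 'PICA::Data', '0.34';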

Tests fail with PICA::Data 0.36

The t/10-validator.t test started to fail on my smoker systems:

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{repeated} = Does not exist
#     $expected->[0]{repeated} = '1'

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{message} = 'field is not repeatable'
#     $expected->[0]{message} = 'field 021A is not repeatable'

#   Failed test at t/10-validator.t line 15.
#     Structures begin differing at:
#          $got->[0]{message} = 'field is not repeatable'
#     $expected->[0]{message} = 'field 021A is not repeatable'
# Looks like you failed 3 tests of 10.
t/10-validator.t ....... 
Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/10 subtests 

Statistical analysis suggests that a change in PICA::Data 0.36 caused this problem (theta=1 means "bad"):

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      1.0000	      0.0000	136769032566320544.00
[1='eq_0.36']  	     -1.0000	      0.0000	-45909609008771576.00

R^2= 1.000, N= 71, K= 2
****************************************************************
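
If the exact wording of the validator messages is not meant to be part of the contract, the test could compare only the stable parts instead of whole structures; a Test::More sketch, where $error stands for one entry of the validator's result (a placeholder, not the test's actual variable):

use Test::More;

# match the stable part of the message instead of the exact wording
like $error->{message}, qr/not repeatable/, 'field 021A reported as not repeatable';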

Fields with a 3-digit occurrence are skipped

Example file consists of one record in plain PICA format:
chunk.txt

Running
catmandu convert -v PICA --type binary to PICA --type XML < chunk.txt > chunk.txt.xml

results in

WARNING: no valid PICA field structure "203@/100 �09011385967 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/100 �014784343X�z41409 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/100 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/100 �b8112�f1409�a1409/L040/k�rschner�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/100 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/100 �d21�j2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/100 �a21.2007 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/101 �09011386169 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/101 �0147843677�z40324 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/101 �a27-01-09�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/101 �b8112�f0324�a0324/Zi 108 Sch K�r(13)�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/101 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/101 �d13�j1980 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/101 �a13.1980,1-3 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "203@/102 �0119169531X ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "206W/102 �0190064153�z40312 A ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "208@/102 �a14-10-10�bbn ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209A/102 �b8112�f0312�a0312/Aa 28�di�x00 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "209B/102 �aCC0�x78 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231@/102 �d10�j1966�0�d11�j1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
WARNING: no valid PICA field structure "231B/102 �a10.1966; 11.1970 ". Skipped field at /usr/local/share/perl/5.22.1/Catmandu/Importer/PICA.pm line 46.
converted 1 object done
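
All the skipped fields carry a three-digit occurrence (/100, /101, /102). Assuming the importer checks each field against a tag/occurrence pattern before parsing it, that pattern would have to allow both two- and three-digit occurrences. A standalone Perl sketch of such a check (not the module's actual code):

# PICA tag: three digits plus one of A-Z or @; occurrence: two or three digits
my $field_re = qr{^ \d{3} [A-Z\@] (?: / \d{2,3} )? $}x;

for my $tag ('021A', '045B/02', '203@/100') {
    printf "%-10s %s\n", $tag, $tag =~ $field_re ? 'accepted' : 'skipped';
}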

fix travis.yml

  • delete Perl 5.12 and add 5.22+
  • use cpanm option --skip-installed instead of --skip-satisfied

sequence of local (1...) and item (2...) data gets mixed by occurrence on conversion

If I convert a PICA-XML record with data on all three levels (0..., 1... and 2...), elements of levels 1 and 2 get mixed:

catmandu convert PICA --type XML to PICA --type Plain < xml_testfile.xml
  ...
  <datafield tag="101@">
    <subfield code="a">11</subfield>
  </datafield>
  <datafield tag="101B">
    <subfield code="0">11-09-10</subfield>
    <subfield code="t">16:50:01.000</subfield>
  </datafield>
  <datafield tag="101C">
    <subfield code="0">11-09-96</subfield>
    <subfield code="b">0001/0000</subfield>
  </datafield>
  <datafield tag="101D">
    <subfield code="0">11-09-10</subfield>
    <subfield code="b">touchark</subfield>
    <subfield code="a">0001</subfield>
  </datafield>
  <datafield tag="101U">
    <subfield code="0">utf8</subfield>
  </datafield>
  <datafield tag="145S" occurrence="11">
    <subfield code="a">Nx 6589</subfield>
    <subfield code="b">Nx 06589</subfield>
    <subfield code="9">086974076</subfield>
    <subfield code="a">Nx 6301 - Nx 6769</subfield>
  </datafield>
  <datafield tag="147B" occurrence="05">
    <subfield code="a">Auch als Mikroform vorhanden</subfield>
  </datafield>
  <datafield tag="147B" occurrence="09">
    <subfield code="a">ab</subfield>
  </datafield>
  <datafield tag="147I">
    <subfield code="a">kaz</subfield>
  </datafield>
  <datafield tag="201@" occurrence="01">
    <subfield code="a">Benutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke</subfield>
    <subfield code="b">0</subfield>
    <subfield code="e">365403148</subfield>
    <subfield code="m">mon</subfield>
    <subfield code="f">1</subfield>
    <subfield code="u">Benutzung nur im Lesesaal</subfield>
    <subfield code="v">Standort unbekannt, wenden Sie sich an die Theke</subfield>
  </datafield>
  <datafield tag="201B" occurrence="01">
    <subfield code="0">14-04-15</subfield>
    <subfield code="t">10:19:08.000</subfield>
  </datafield>
  <datafield tag="201D" occurrence="01">
    <subfield code="0">14-04-15</subfield>
    <subfield code="b">cbs_703</subfield>
    <subfield code="a">1999</subfield>
  </datafield>
  <datafield tag="201F" occurrence="01">
    <subfield code="0">0</subfield>
  </datafield>
  <datafield tag="201U" occurrence="01">
    <subfield code="0">utf8</subfield>
  </datafield>
  <datafield tag="203@" occurrence="01">
    <subfield code="0">365403148</subfield>
  </datafield>
  <datafield tag="208@" occurrence="01">
    <subfield code="a">01-01-71</subfield>
    <subfield code="b">k</subfield>
  </datafield>
  <datafield tag="209A" occurrence="01">
    <subfield code="f">1</subfield>
    <subfield code="a">4"@Nx 6589</subfield>
    <subfield code="d">s</subfield>
    <subfield code="x">00</subfield>
  </datafield>
  <datafield tag="209B" occurrence="01">
    <subfield code="a">D</subfield>
    <subfield code="x">69</subfield>
  </datafield>
  <datafield tag="209C" occurrence="01">
    <subfield code="a">D1280-630</subfield>
    <subfield code="x">90</subfield>
  </datafield>
  <datafield tag="220B" occurrence="01">
    <subfield code="a">Autopsie</subfield>
  </datafield>
  <datafield tag="220B" occurrence="01">
    <subfield code="a">Revision 2006 IIC2.1</subfield>
  </datafield>
  <datafield tag="237A" occurrence="01">
    <subfield code="a">Nur für den Lesesaal unter Aufsicht</subfield>
    <subfield code="a">In der Fernleihe nur als Mikrofiche benutzbar</subfield>
    <subfield code="a">Die Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden</subfield>
  </datafield>

vs.

...
101@ $a11
101B $011-09-10$t16:50:01.000
101C $011-09-96$b0001/0000
101D $011-09-10$btouchark$a0001
101U $0utf8
147I $akaz
201@/01 $aBenutzung nur im Lesesaal|Standort unbekannt, wenden Sie sich an die Theke$b0$e365403148$mmon$f1$uBenutzung nur im Lesesaal$vStandort unbekannt, wenden Sie sich an die Theke
201B/01 $014-04-15$t10:19:08.000
201D/01 $014-04-15$bcbs_703$a1999
201F/01 $00
201U/01 $0utf8
203@/01 $0365403148
208@/01 $a01-01-71$bk
209A/01 $f1$a4"@Nx 6589$ds$x00
209B/01 $aD$x69
209C/01 $aD1280-630$x90
220B/01 $aAutopsie
220B/01 $aRevision 2006 IIC2.1
237A/01 $aNur für den Lesesaal unter Aufsicht$aIn der Fernleihe nur als Mikrofiche benutzbar$aDie Titelbl. für T. 2 sind hinter den Titelbl. von T. 1 eingebunden
147B/05 $aAuch als Mikroform vorhanden
147B/09 $aab
145S/11 $aNx 6589$bNx 06589$9086974076$aNx 6301 - Nx 6769

Seems to be a problem with sorting by occurrence.
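
A sketch of an ordering that keeps the levels apart, assuming each field is available as a [tag, occurrence, code, value, ...] array (as in the importer's internal format): sort by level (the first digit of the tag), then by occurrence, then by tag. This only illustrates the expected order, it is not the module's current code; $item stands for one imported record.

my @sorted = sort {
       substr($a->[0], 0, 1) <=> substr($b->[0], 0, 1)   # level 0, 1, 2 first
    || ($a->[1] || 0)        <=> ($b->[1] || 0)          # then occurrence
    ||  $a->[0]              cmp  $b->[0]                 # then tag
} @{ $item->{record} };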

Tests have started to fail for 1.09 after the release of PICA-Data 2.06

Sample fail report: http://www.cpantesters.org/cpan/report/8a6fde46-a772-11ed-ac40-0d1af2d2463d

Statistical analysis on my smokers hints that PICA::Data 2.06 is the likely culprit:

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    48942084208964240.00
[1='eq_2.04']         0.0000          0.0000       5.16
[2='eq_2.05']         0.0000          0.0000       5.19
[3='eq_2.06']        -1.0000          0.0000    -22338902943805908.00

R^2= 1.000, N= 63, K= 4
****************************************************************

Support PICA record access methods independent from fixes

Catmandu::Fix::pica_map is useful for Catmandu fixes but processing in raw Perl requires some helper methods, e.g.:

use Catmandu -all;
my %mapping = (
    '010@a' => 'language',
    '003A$0' => 'dc.identifier',
);
importer('PICA', file =>"pica.xml", type=> "XML")->each(sub {
    my $hashref = $_[0];

    my $id = pica_field($hashref->{record}, '003@');
    my $date = pica_value($hashref->{record}, '001A$0');

    my $data = pica_mapping($hashref->{record}, \%mapping);
});

As a start, I refactored parse_pica_path into an independent function.

We could also bless the PICA record structure, e.g. to do:

$hashref->{record}->fields('1***'); # returns a list of arrays
$hashref->{record}->values('003@$0'); # returns a list of scalar values

In particular one should be able to easily filter holding fields, aggregated by level 1.
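
For reference, a value accessor of this kind is only a few lines over the internal structure (each field being a [tag, occurrence, code, value, ...] array); a sketch of what such a helper could look like, not code from the module:

# return the value of subfield $code from the first field matching $tag
sub pica_value_sketch {
    my ($record, $tag, $code) = @_;
    for my $field (@$record) {
        next unless $field->[0] eq $tag;
        my @pairs = @{$field}[ 2 .. $#$field ];
        while (my ($c, $v) = splice @pairs, 0, 2) {
            return $v if $c eq $code;
        }
    }
    return;
}

# e.g. my $id = pica_value_sketch($hashref->{record}, '003@', '0');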

00-load.t doesn't work under Windows

Test results:

D:\Workspace\Dist\Catmandu-PICA>perl -TIlib ./t/00-load.t
Datei *.pm nicht gefunden
# Testing Catmandu::PICA , Perl 5.016001, C:\Programme\StrawBerry\perl\bin\perl.exe
1..0
# No tests run!
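
The German message translates to "File *.pm not found", which suggests that a literal *.pm pattern is not being expanded on Windows. A portable way to enumerate the module files would be File::Find instead of relying on shell globbing (a sketch, not the current test code):

use strict;
use warnings;
use File::Find;

# collect every .pm file below lib/ without shell globbing
my @modules;
find(sub { push @modules, $File::Find::name if /\.pm$/ }, 'lib');
print "$_\n" for sort @modules;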

Install fails on some systems

see http://matrix.cpantesters.org/?dist=Catmandu-PICA+0.16.

Error:


#   Failed test 'use Catmandu::Importer::PICA;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Importer::PICA'.
#     Error:  Can't locate PICA/Parser/XML.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Importer/PICA.pm line 6, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.

#   Failed test 'use Catmandu::Exporter::PICA;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Exporter::PICA'.
#     Error:  Can't locate PICA/Writer/Plus.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Exporter/PICA.pm line 4, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.

#   Failed test 'use Catmandu::Fix::pica_map;'
#   at t/00-load.t line 6.
#     Tried to use 'Catmandu::Fix::pica_map'.
#     Error:  Can't locate PICA/Path.pm in @INC (@INC contains: /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/arch /home/stro/perl/5.12.4/lib/site_perl/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/site_perl/5.12.4 /home/stro/perl/5.12.4/lib/5.12.4/x86_64-linux /home/stro/perl/5.12.4/lib/5.12.4 .) at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# BEGIN failed--compilation aborted at /home/stro/cpan/build/5.12.4/build/Catmandu-PICA-0.16-8qUGDg/blib/lib/Catmandu/Fix/pica_map.pm line 9, <DATA> line 4.
# Compilation failed in require at t/00-load.t line 6, <DATA> line 4.
# BEGIN failed--compilation aborted at t/00-load.t line 6, <DATA> line 4.
# Testing Catmandu::PICA 0.16, Perl 5.012004, /home/stro/perl/5.12.4/bin/perl
# Looks like you failed 3 tests of 4.

PICA::Data is not installed although it's defined as a prerequisite in cpanfile and META.json.

Add more fixes to modify PICA+ records

Here path denotes a PICA Path (such as 003@$0), field a field reference (such as dc.creator), value a string value and contents a field content (e.g. $afoo$bbar).

  • pica_update(path, values|contents, [options]) replace or add subfields or fields (deprecates pica_add and pica_set)
  • pica_replace(path, pattern, value) to search and replace in PICA subfield values with regular expression
  • pica_subfields(code,value,code,value...) or pica_subfields(contents) to set subfields of current field(s)
  • pica_annotate([path,] annotation) to set annotation
  • pica_append([path,] value|contents) to add full field(s) or subfield(s) (maybe not needed)
  • pica_sort([path,] schedule) to sort and reduce to known (sub)fields
  • pica_reduce([path,] schedule) to reduce to known (sub)fields (?)

Existing fixes:

  • pica_add(field, path, [options]) : should be deprecated because of the order of arguments
  • pica_set(field, path, [options]) : might be deprecated because of the order of arguments. Alternative name?
  • pica_keep(path [, path...])
  • pica_remove([path])
  • pica_map(path, field)
  • pica_tag(tag)
  • pica_occurrence(occurrence)

t/03-fix.t started to fail (unicode issue?)

On my smoker systems the test suite started to fail:

#   Failed test 'fix with pluck'
#   at t/03-fix.t line 25.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
#          got: '160.45 �36420368139783642036811'
#     expected: '160.45 €36420368139783642036811'

#   Failed test 'fix record'
#   at t/03-fix.t line 27.
Wide character in print at /usr/perl5.31.11p/lib/5.31.11/Test2/Formatter/TAP.pm line 125.
#     Structures begin differing at:
#          $got->{price} = '160.45 �364205076X9783642050763'
#     $expected->{price} = '160.45 €364205076X9783642050763'
# Looks like you failed 2 tests of 6.
t/03-fix.t ................. 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/6 subtests 

(See also http://fast-matrix.cpantesters.org/?dist=Catmandu-PICA%201.02;reports=1#sl=7,1 )

Statistical analysis suggests that this started to fail with PICA::Data 1.08 (@nichtich : FYI):

****************************************************************
Regression 'mod:PICA::Data'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      1.0000	      0.0000	196853234403964704.00
[1='eq_1.08']  	     -1.0000	      0.0000	-64900589445389040.00

R^2= 1.000, N= 92, K= 2
****************************************************************

Replacement of angle brackets

When converting from plain PICA to PICA XML, Catmandu replaces < with &amp;lt;; I think it should be &lt; instead.

Example file (one record in plain pica format)
angle-bracket.txt

Command
catmandu convert PICA --type binary to PICA --type XML < angle-bracket.txt > angle-bracket.xml

The resulting XML is not valid according to xmllint; see field 021A subfield a.
xmllint angle-bracket.xml

fixes pica_set and pica_add to create PICA

# Copy from field dc.identifier to PICA field 003A subfield 0, replacing the first existing field
pica_set('dc.identifier','003A0');

# Copy from field dc.language to PICA field 010@ in an additional new subfield a
pica_add('dc.language','010@a');

Double \x1D added when format is "normalized"

Hello,
I'm using your Perl module to convert MARC21 files into PICA+ files. We use the normalized format as it is directly supported by our library system.

#!/usr/bin/perl

use strict;
use utf8;
use warnings;

use PICA::Record;
use PICA::Writer;
use PICA::Field;

use Encode qw(encode decode);

my $writer =  PICA::Writer->new('tests/out.pica', format => 'normalized');
my $field =   new PICA::Field('021A');
my $record = new PICA::Record();

$field->add('a', 'Foo');
$field->add('d', 'Bar');

$record->appendif($field);

$writer->write('', $record);
$writer->write('', $record);
$writer->write('', $record);

$writer->end();

print "Pica file written";
1;

Produces the output:

�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar
�
�
�021A �aFoo�dBar

These rectangles between the records are \x1D (record separator) chars. CBS (our library system) has problems if there are two \x1D, and we need to remove one of them.
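
Until the writer is fixed, a post-processing sketch that drops the duplicated record separator; it assumes the two \x1D sit next to each other, possibly separated by a line break, and the file names are examples:

open my $in,  '<:raw', 'tests/out.pica'       or die $!;
open my $out, '>:raw', 'tests/out.fixed.pica' or die $!;
my $data = do { local $/; <$in> };
# collapse two consecutive record separators into one
$data =~ s/\x1D(\R?)\x1D\R?/\x1D$1/g;
print {$out} $data;
close $out;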

PICA::Parser::Plus - leader length

Leader length does not seem to be standardized between different data providers, so the check for an optional leader cannot be done via length and missing subfields.

Simplify internal PICA format

From

[$tag, $occurrence, '_', '', @subfields]

to

[$tag, $occurrence, @subfields]

This will break backwards compatibility.
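
For migration, converting a field from the old layout to the new one is mechanical; a sketch of a helper (not part of the module):

# drop the '_', '' placeholder elements from a field array
sub simplify_field {
    my @field = @{ $_[0] };
    splice @field, 2, 2 if @field >= 4 && $field[2] eq '_' && $field[3] eq '';
    return \@field;
}

# simplify_field([ '021A', '', '_', '', 'a', 'Hello' ])
#   yields [ '021A', '', 'a', 'Hello' ]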

Tests fail (with Catmandu 0.9503?)

t/07-pica-each.t fails, probably only if Catmandu 0.9503 is installed:

#   Failed test 'created is_ger tag'
#   at t/07-pica-each.t line 36.
#          got: undef
#     expected: 'true'

#   Failed test 'fields 045. deleted'
#   at t/07-pica-each.t line 37.
# Looks like you failed 2 tests of 5.
t/07-pica-each.t ........ 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests 

pica_map error

Hi

I try to use pica_map, but I get the following error when I execute this command:

catmandu convert JSON --fix 'pica_map(021A, dc_title)' < test.json

Oops! Tried to execute the fix 'pica_map' but can't find Catmandu::Fix::pica_map on your system.
Error: No such fix package: pica_map
Package name: Catmandu::Fix::pica_map
Fix name: pica_map
Source:
	pica_map(021A, dc_title)

When I look at the installed modules via catmandu info, it says:
Catmandu::Fix::pica_map | 1.03 | copy pica values of one field to a new field

Catmandu version:
catmandu (Catmandu::CLI) version 1.2015 (/usr/bin/catmandu)

I also use e.g. pica_add() which works.

Cheers
Michel

Breaking change in PICA Path expression language with PICA::Data 2.0

PICA::Data 2.0 will introduce a breaking change in the PICA Path expression language (see gbv/PICA-Data#109 (comment)). The changes only affect some special cases (but this depends on the use case!). The current unit tests of Catmandu::PICA don't cover these cases, so Catmandu::PICA will still install with PICA::Data 2.0 once the latter has been released, while existing installations will keep working with the old syntax at the same version of Catmandu::PICA. We may create a new release of Catmandu::PICA with PICA::Data 2.0 as a requirement.
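
Whichever way it goes, the decision can be pinned in the cpanfile; a sketch of both options (version ranges as supported by CPAN::Meta::Spec):

# stay with the old PICA Path syntax for now ...
requires 'PICA::Data', '>= 1.0, < 2.0';

# ... or require the new syntax in a new Catmandu::PICA release
# requires 'PICA::Data', '2.0';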

pull request against jorol/Catmandu-PICA

There is a pull request to my repository "jorol/Catmandu-PICA". I tried to "redirect" it to our main repository "gbv/Catmandu-PICA", but this failed (see #21).

What is a proper way to merge these changes in "gbv/Catmandu-PICA"?

What could be done to avoid such situations in the future? I think the commit history of my master branch is broken (I merged the changes of "gbv/Catmandu-PICA" back into my master to keep it up to date) ...
