yob / onix

A convenient mapping between Ruby objects and the ONIX XML specification

License: MIT License

Ruby 96.60% XSLT 3.40%

onix's Introduction

ONIX

UNMAINTAINED

This gem is unmaintained. A fork with active maintainers is available: cacofonix

The ONIX standard is a somewhat verbose XML format that is rapidly becoming the industry standard for electronic data sharing in the book and publishing industries.

This library provides a slim layer over the format and simplifies both reading and writing ONIX files in your Ruby applications.

This replaces the obsolete rbook-onix gem that was spectacular in its crapness. Let us never speak of it again.

Feature Support

This library currently only handles ONIX 2.1 files (all revisions). At some point I'll need to work out what to do about supporting ONIX 3.0 files. I suspect a separate library will be the simplest solution.

ONIX::Reader only handles the reference tag versions of ONIX 2.1. Use ONIX::Normaliser to convert any short tag files to reference tags.

ONIX::Writer only generates reference tag ONIX files.

It baffles me why anyone thought designing two parallel versions of the ONIX spec was a good idea. Use reference tags my friends, and let short tags fade away into irrelevant obscurity.
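Because ONIX::Reader only accepts reference tag ONIX 2.1 input, it can help to check what a file declares before handing it over. The following is a minimal sketch and not part of this gem: onix_flavour and onix_flavour_of_file are hypothetical helpers. The sketch assumes the root element is spelled ONIXMessage in reference tag files and ONIXmessage in short tag files, and that a release attribute, when present, sits on that root element. A nil release only means the attribute was absent.

```ruby
# Hypothetical pre-flight check; not part of the onix gem.
# Returns a hash such as { tags: :short, release: "3.0" },
# or nil when no ONIX root element appears in the given text.
def onix_flavour(head)
  match = head.match(/<(ONIXMessage|ONIXmessage)\b([^>]*)>/)
  return nil unless match

  tags = match[1] == "ONIXMessage" ? :reference : :short
  # Pull the value of a release="..." attribute if the root element has one.
  release = match[2][/release\s*=\s*["']([^"']+)["']/, 1]
  { tags: tags, release: release }
end

# Only the start of the file is read, so large feeds stay cheap to check.
def onix_flavour_of_file(path, bytes = 4096)
  head = File.open(path, "rb") { |f| f.read(bytes) } || ""
  onix_flavour(head)
end
```

A caller could run ONIX::Normaliser when tags is :short, and skip the file entirely when release is "3.0".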

DTD Loading

To correctly handle named entities when reading an ONIX file, this gem attempts to load the DTD describing the ONIX format into memory. By default, this means each file you read will require several hundred KB of data to be downloaded over the net.

This is obviously not desirable in most cases. To avoid it, you need to add copies of the ONIX DTDs to your system XML catalog. On Debian and Ubuntu systems, the quickest way to do that is to build and install the package available at http://github.com/yob/onix-dtd

Installation

gem install onix

Usage

See the files in the examples directory to get started quickly. For further reading, see the comments in the following classes:

ONIX::Reader - For reading ONIX files
ONIX::Writer - For writing ONIX files
ONIX::Normaliser - For normalising ONIX files before reading them. Fixes encoding issues, etc
ONIX::Lists - For building hashes of code lists from the ONIX spec

Licensing

This library is distributed under the terms of the MIT License. See the included file for more detail.

Contributing

All suggestions and patches are welcome, preferably via a git repository I can pull from. To be honest, I'm not really expecting any; this is a niche library.

onix's Issues

Bug in ONIX::Normaliser next_tempfile()

You will want to set the unlink_now option to true when you close the tempfile. The way it is right now, the file can actually be unlinked after you copy the old file to that location.
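The suggestion above can be sketched as follows. This is an illustration only: the gem's actual next_tempfile code is not shown here, so next_tempfile_path is a hypothetical stand-in that assumes the helper's job is to hand back a fresh path for later use. Tempfile#close accepts an unlink_now argument; passing true removes the file right away, so no later cleanup by the Tempfile object can delete whatever is subsequently written to that path.

```ruby
require "tempfile"

# Hypothetical stand-in for a helper like next_tempfile; the real
# implementation in ONIX::Normaliser may differ.
def next_tempfile_path(prefix = "onix")
  tmp = Tempfile.new(prefix)
  path = tmp.path   # capture first: Tempfile#path returns nil once unlinked
  tmp.close(true)   # unlink_now = true closes and removes the file immediately
  # Caveat: the name is free again at this point, so another process could
  # in principle claim it before the caller writes to it.
  path
end
```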

Won't Read Onix Feed

I've got an ONIX feed that is sent to me as a zip file in an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>[email protected]</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>

In spite of the fact that this file has well over 10,000 products in it, the gem won't read any of them.

reader.each do |product|
    puts product.inspect
end

The each loop does nothing; it never fires. It's as if the XML file had zero products in it.

I've spent several days on this. Here's the entire algorithm for reference:

def self.parse_onix(publisher_id, onix_file)
    Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
        xml_file = ""
        dir = "#{Rails.root.to_s}/tmp/onix/"

        zip.each do |entry|
            next if entry.name =~ /__MACOSX/ or \
             entry.name =~ /\.DS_Store/ or !entry.file?
            logger.debug "#{entry.name}"
            puts entry.name
            FileUtils::mkdir_p(dir)
            #this_file = FileUtils.touch(dir + entry.name)
            entry.extract(dir + entry.name)

            p '--->Thing:'+entry.name.last(3)
            if entry.name.last(3) == 'xml'
                xml_file = dir + entry.name
            end
        end

        Work.fix_dtd_path(dir, xml_file)

        reader = ONIX::Reader.new(xml_file)

        puts reader.inspect

        reader.each do |product|
            puts product.inspect
        end
    end
end


def self.fix_dtd_path(dir, xml_file)
    xml = File.read(xml_file)

    # fix the path in the DOCTYPE
    dtd_file = 'ONIX_BookProduct_3.0_short.dtd'
    xml = xml.gsub(dtd_file, dir + dtd_file)
    File.delete(xml_file)
    File.open(xml_file, 'w') do |file|
        file.write(xml)
    end
end
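As a side note on fix_dtd_path: it reads the whole 100+ MB file into memory just to change the DOCTYPE. A minimal streaming sketch is below. The name fix_dtd_path_streaming and the temporary ".tmp" suffix are assumptions for illustration, and the sketch assumes the DTD file name sits on a single line, as in the header quoted above. It addresses memory use only. The quoted header declares ONIX 3.0 short tags, which the README says ONIX::Reader does not handle, so it is not expected to make the each loop yield products.

```ruby
# Hypothetical streaming variant of fix_dtd_path. It rewrites only the first
# occurrence of the DTD file name and copies every other line through unchanged.
def fix_dtd_path_streaming(dir, xml_file, dtd_file = "ONIX_BookProduct_3.0_short.dtd")
  tmp_path = "#{xml_file}.tmp" # assumed scratch name next to the original
  replaced = false

  File.open(tmp_path, "wb") do |out|
    # Binary mode avoids encoding errors on feeds with stray bytes.
    File.foreach(xml_file, mode: "rb") do |line|
      if !replaced && line.include?(dtd_file)
        line = line.sub(dtd_file, File.join(dir, dtd_file))
        replaced = true
      end
      out.write(line)
    end
  end

  File.rename(tmp_path, xml_file) # swap the rewritten copy into place
  replaced                        # true if a DOCTYPE path was rewritten
end
```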

Enable Sourcegraph

I want to use Sourcegraph for onix code search, browsing, and usage examples. Can an admin enable Sourcegraph for this repository? Just go to https://sourcegraph.com/github.com/yob/onix. (It should only take 30 seconds.)

Thank you!

list extraction

Hi,
I noticed your format for list definition and created the following script to extract lists. The first step is to have the List definitions from the Onix CodeList xsd. Then change LIST_TO_RETRIEVE and the code list xsd file name appropriately. It's not too clean but I found that it helped me retrieve large lists quite easily (much better than doing it by hand). I hope it helps:

require 'xmlsimple'
xml = XmlSimple.xml_in('code_list.xsd')

LIST_TO_RETRIEVE = "List31"

xml.each do |key, value|

  if key.to_s == 'simpleType'

    value.each do |v|
      if v['name'].downcase == LIST_TO_RETRIEVE.downcase

        v['restriction'].each do |a|

          list_size = a['enumeration'].size
          a['enumeration'].each do |k|

            list_size -= 1
            val = ''
            #puts k['value'] + ' => "' + k['annotation'].class.to_s
            if k['value'][0..1] == '00' or k['value'].to_i != 0
              val = k['value'].to_i.to_s + ' => "' + k['annotation'][0]['documentation'][0] + '"'
            else
              val = '"' + k['value'] + '" => "' + k['annotation'][0]['documentation'][0] + '"'
            end

            val = val + ',' if list_size > 0
            puts val
          end

        end
      end
    end

  end

end

Thnx for Onix!

-Vivek.
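The key-formatting step in the script above could also build a Hash directly instead of printing text to paste. This is a minimal sketch under assumptions: code_list_hash is a hypothetical helper, the input is a plain array of [code, description] pairs already pulled from the xsd, and the rule is simplified to "all-digit codes become Integer keys", which is slightly stricter than the script's own test.

```ruby
# Hypothetical helper: turn [code, description] pairs into a Hash.
# All-digit codes become Integer keys (so "01" becomes 1);
# any other code is kept as a String key.
def code_list_hash(pairs)
  pairs.each_with_object({}) do |(code, description), list|
    key = code.match?(/\A\d+\z/) ? code.to_i : code
    list[key] = description
  end
end

# Placeholder data for illustration only; not real ONIX code list entries.
sample = code_list_hash([["01", "First entry"], ["AB", "Second entry"]])
puts sample.inspect
```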