
onix's Introduction

ONIX

UNMAINTAINED

This gem is unmaintained. A fork with active maintainers is available: cacofonix

The ONIX standard is a somewhat verbose XML format that is rapidly becoming the industry standard for electronic data sharing in the book and publishing industries.

This library provides a slim layer over the format and simplifies both reading and writing ONIX files in your Ruby applications.

This replaces the obsolete rbook-onix gem that was spectacular in its crapness. Let us never speak of it again.

Feature Support

This library currently only handles ONIX 2.1 files (all revisions). At some point I'll need to work out what to do about supporting ONIX 3.0 files. I suspect a separate library will be the simplest solution.

ONIX::Reader only handles the reference tag versions of ONIX 2.1. Use ONIX::Normaliser to convert any short tag files to reference tags.

ONIX::Writer only generates reference tag ONIX files.

It baffles me why anyone thought designing two parallel versions of the ONIX spec was a good idea. Use reference tags, my friends, and let short tags fade away into irrelevant obscurity.

DTD Loading

To correctly handle named entities when reading an ONIX file, this gem attempts to load the DTD describing the ONIX format into memory. By default, this means each file you read will require several hundred KB of data to be downloaded over the net.

This is obviously not desirable in most cases. To avoid it, you need to add copies of the ONIX DTDs to your system XML catalog. On Debian and Ubuntu systems, the quickest way to do that is to build and install the package available at http://github.com/yob/onix-dtd

Installation

gem install onix

Usage

See the files in the examples directory to get started quickly. For further reading, see the comments in the following classes:

  • ONIX::Reader - For reading ONIX files
  • ONIX::Writer - For writing ONIX files
  • ONIX::Normaliser - For normalising ONIX files before reading them. Fixes encoding issues, etc.
  • ONIX::Lists - For building hashes of code lists from the ONIX spec

Licensing

This library is distributed under the terms of the MIT License. See the included file for more detail.

Contributing

All suggestions and patches are welcome, preferably via a git repository I can pull from. To be honest, I'm not really expecting any; this is a niche library.

Further Reading

onix's People

Contributors

mfvargo, yob


onix's Issues

Bug in ONIX::Normaliser next_tempfile()

You will want to set the unlink_now option to true when you close the tempfile. As it stands, the file can be unlinked (when the Tempfile object is garbage collected) after you have already copied the old file to that location.
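A minimal sketch of the fix described above, assuming next_tempfile only needs to hand back a fresh, unused path. The gem's own next_tempfile isn't shown here, so the helper below is hypothetical; it relies only on Ruby's standard Tempfile#close(unlink_now) and Tempfile#path.

```ruby
require "tempfile"

# Hypothetical helper in the spirit of the issue (not the gem's code):
# reserve a unique temp path, then close the Tempfile with unlink_now = true.
# That deletes the file right away and drops the Tempfile's GC finalizer, so
# nothing can later delete whatever gets copied to this path.
def next_tempfile(basename = "onix")
  file = Tempfile.new(basename)
  path = file.path   # capture first: Tempfile#path returns nil once unlinked
  file.close(true)   # unlink_now = true
  path
end
```

A caller can then do `FileUtils.cp(src, next_tempfile)` without the copy disappearing at the next garbage collection. One trade-off: releasing the name before reusing it leaves a brief window in which another process could claim the same path, which is usually acceptable for scratch files.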

Won't Read ONIX Feed

I've got an ONIX feed that is sent to me as a zip file attached to an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>[email protected]</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>

In spite of the fact that this file has well over 10,000 products in it, the gem won't read any of them.

reader.each do |product|
    puts product.inspect
end

The each loop does nothing; it never fires. It's as if the XML file had zero products in it.

I've spent several days on this. Here's the entire algorithm for reference:

def self.parse_onix(publisher_id, onix_file)
    Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
        xml_file = ""
        dir = "#{Rails.root.to_s}/tmp/onix/"

        zip.each do |entry|
            next if entry.name =~ /__MACOSX/ or \
             entry.name =~ /\.DS_Store/ or !entry.file?
            logger.debug "#{entry.name}"
            puts entry.name
            FileUtils::mkdir_p(dir)
            #this_file = FileUtils.touch(dir + entry.name)
            entry.extract(dir + entry.name)

            p '--->Thing:'+entry.name.last(3)
            if entry.name.last(3) == 'xml'
                xml_file = dir + entry.name
            end
        end

        Work.fix_dtd_path(dir, xml_file)

        reader = ONIX::Reader.new(xml_file)

        puts reader.inspect

        reader.each do |product|
            puts product.inspect
        end
    end
end


def self.fix_dtd_path(dir, xml_file)
    xml = File.read(xml_file)

    # fix the path in the DOCTYPE
    dtd_file = 'ONIX_BookProduct_3.0_short.dtd'
    xml = xml.gsub(dtd_file, dir + dtd_file)
    File.delete(xml_file)
    File.open(xml_file, 'w') do |file|
        file.write(xml)
    end
end
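The README's Feature Support section says ONIX::Reader handles only ONIX 2.1 with reference tags, while the header quoted above declares release="3.0" with short tags (the <ONIXmessage> root, ONIX_BookProduct_3.0_short.dtd, and tags such as x298). That mismatch is a likely reason the each loop yields nothing. A cheap way to catch it before handing a large file to the reader is to peek at the root element. The sketch below is hypothetical (onix_flavour and readable_by_onix_reader? are not part of the gem). It reads only the first 8 KB and assumes an ASCII-compatible encoding such as UTF-8.

```ruby
# Hypothetical pre-flight check (not part of the onix gem): peek at the start
# of an ONIX file and report the tag style and release declared on its root.
# Assumes an ASCII-compatible encoding and that the root start tag appears
# within the first `bytes` bytes.
def onix_flavour(path, bytes = 8192)
  head = File.open(path, "rb") { |f| f.read(bytes) }.to_s

  # "<ONIX" matches a real start tag, not the "<!DOCTYPE ONIXMessage" line.
  root = head.match(/<(ONIX[Mm]essage)\b([^>]*)>/)
  return nil unless root

  {
    # Reference-tag files use <ONIXMessage>; short-tag files use <ONIXmessage>.
    tags: root[1] == "ONIXMessage" ? :reference : :short,
    # The release attribute may be absent, in which case this is nil.
    release: root[2][/release\s*=\s*["']([^"']+)["']/, 1]
  }
end

# Per the README, ONIX::Reader handles ONIX 2.1 with reference tags only.
# A missing release attribute is treated as 2.1 here (an assumption).
def readable_by_onix_reader?(flavour)
  return false unless flavour

  release_ok = flavour[:release].nil? || flavour[:release].start_with?("2.")
  flavour[:tags] == :reference && release_ok
end
```

Run against the header quoted in this issue, onix_flavour reports short tags with release "3.0", so readable_by_onix_reader? returns false. A 2.1 short-tag file would also be flagged; the README suggests converting those with ONIX::Normaliser first.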

list extraction

Hi,
I noticed your format for list definitions and created the following script to extract lists. The first step is to get the list definitions from the ONIX code list XSD, then set LIST_TO_RETRIEVE and the code list XSD file name appropriately. It's not too clean, but I found that it helped me retrieve large lists quite easily (much better than doing it by hand). I hope it helps:

require 'xmlsimple'

xml = XmlSimple.xml_in('code_list.xsd')

LIST_TO_RETRIEVE = "List31"

xml.each do |key, value|
  next unless key.to_s == 'simpleType'

  value.each do |v|
    next unless v['name'].downcase == LIST_TO_RETRIEVE.downcase

    v['restriction'].each do |a|
      list_size = a['enumeration'].size

      a['enumeration'].each do |k|
        list_size -= 1
        label = k['annotation'][0]['documentation'][0]

        # Numeric codes (including zero-padded ones such as "00") become
        # integer keys; anything else stays a quoted string key.
        val = if k['value'][0..1] == '00' || k['value'].to_i != 0
                k['value'].to_i.to_s + ' => "' + label + '"'
              else
                '"' + k['value'] + '" => "' + label + '"'
              end

        val += ',' if list_size > 0 # no trailing comma after the last entry
        puts val
      end
    end
  end
end

Thnx for Onix!

-Vivek.
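For comparison, the same extraction can be sketched with Ruby's REXML library instead of XmlSimple, returning a Hash rather than printing lines. This is not the gem's ONIX::Lists code, and extract_code_list is a hypothetical name. It assumes the code list XSD nests simpleType > restriction > enumeration > annotation > documentation, as the script above expects, and it matches on local element names so an xs: prefix on the schema elements makes no difference.

```ruby
require "rexml/document"

# Hypothetical helper: collect code => description pairs for one simpleType
# (e.g. "List31") from an ONIX code list XSD given as a string.
def extract_code_list(xsd_source, list_name)
  doc = REXML::Document.new(xsd_source)
  codes = {}

  doc.root.each_recursive do |el|
    # Element#name is the local name, so "xs:simpleType" matches "simpleType".
    next unless el.name == "simpleType" && el.attributes["name"].to_s.casecmp?(list_name)

    el.each_recursive do |enum|
      next unless enum.name == "enumeration"

      # First <documentation> element nested under this enumeration, if any.
      doc_el = nil
      enum.each_recursive { |c| doc_el ||= c if c.name == "documentation" }

      value = enum.attributes["value"].to_s
      # Mirror the script above: numeric codes become Integer keys.
      key = value.start_with?("00") || value.to_i != 0 ? value.to_i : value
      codes[key] = doc_el ? doc_el.text.to_s.strip : ""
    end
  end

  codes
end
```

Usage might look like `extract_code_list(File.read("code_list.xsd"), "List31")`. Returning a Hash leaves the output format (Ruby literal, YAML, and so on) to the caller instead of hard-coding the comma handling.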
