ONIX-Data

This solution is a C# library that provides .NET data structures (and an accompanying set of parsers) for the ONIX XML format, the international standard for representing electronic data about books and other media. The format was established by EDITEUR, the international book trade body. The solution contains two collections of classes for serialization/deserialization: one for the legacy format (i.e., 2.1 and earlier) and one for the current format (i.e., 3.0). In addition, two parser classes are included to help populate those collections.

Even though the "sunset date" for the legacy version 2.1 has passed, many (if not most) organizations still use 2.1, and it will likely remain in use for the near future.

Unfortunately, since validation of ONIX files has proven problematic on the .NET platform, there is an accompanying Java project that can serve to validate those files instead.

Requirements

  • Visual Studio 2012 (at least)
  • An unconditional love for an XML tag collection that attempts to cover the ontology of the known universe.

ONIX Editions Handled

  • ONIX 3.0 (short tags)
  • ONIX 3.0 (reference tags)
  • ONIX 2.1.3 and earlier (short tags)
  • ONIX 2.1.3 and earlier (reference tags)

NOTE: Even though this project addresses many tags of both ONIX versions, it does not currently parse all of them, especially in the case of ONIX 3.0 (which appears to aim at supporting the ontology of the known universe). If you find that something you need is unsupported, you can create an issue within this repo, and I will attempt to address it in my free time. (Or you can implement it on your own and then submit a pull request.)

For Large ONIX Files

When parsing larger ONIX files (typically anything greater than 250 MB), it's strongly encouraged to use the OnixLegacyPlusParser class (for ONIX 2.1) and the OnixPlusParser class (for ONIX 3.0). These two classes are used just like the OnixLegacyParser and OnixParser classes, and they will help the user to avoid out-of-memory exceptions.
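
The size-based choice described above can be sketched as a small helper. This is only an illustration under stated assumptions: `OnixParserChoice`, `LargeFileThresholdBytes`, and `NeedsPlusParser` are hypothetical names that are not part of the library, and the 250 MB cutoff is simply the rule of thumb quoted above. The parser call appears only as a comment because it requires the ONIX-Data assembly, and it assumes the "Plus" parsers accept the same constructor arguments as the regular ones.

```csharp
using System;
using System.IO;

public static class OnixParserChoice
{
    // Hypothetical constant: the ~250 MB rule of thumb mentioned in this README.
    public const long LargeFileThresholdBytes = 250L * 1024 * 1024;

    // Returns true when the file exceeds the threshold, i.e. when a "Plus" parser is advisable.
    public static bool NeedsPlusParser(long fileSizeInBytes)
    {
        return fileSizeInBytes > LargeFileThresholdBytes;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: OnixParserChoice <path-to-onix-file>");
            return;
        }

        FileInfo onixFile = new FileInfo(args[0]);
        if (!onixFile.Exists)
        {
            Console.WriteLine("File not found: " + onixFile.FullName);
            return;
        }

        if (NeedsPlusParser(onixFile.Length))
        {
            // Assumption: used "just like" OnixParser, e.g.
            // using (OnixPlusParser parser = new OnixPlusParser(onixFile, true)) { ... }
            Console.WriteLine("Large file: prefer OnixPlusParser or OnixLegacyPlusParser.");
        }
        else
        {
            Console.WriteLine("Smaller file: OnixParser or OnixLegacyParser should suffice.");
        }
    }
}
```

Keeping the threshold in one place makes it easy to lower it on machines with less memory.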

Notes

There is one caveat to know before using any of the parsers: the ONIX-Data project performs non-optional preprocessing on the ONIX file before doing any actual parsing. These changes merely replace ONIX encodings (found in the ONIX DTD) with their real-world equivalents, which produces the same output as parsing with a DTD. These non-optional replacements modify the file itself, and they can take a few minutes to finish (roughly 6-8 minutes per 400 MB), depending on the machine's specs. So, if you value the original copy of your ONIX file (i.e., with non-standard ONIX encodings), be sure to create a backup copy beforehand.
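
Because the preprocessing rewrites the file in place, a backup can be taken with the standard .NET file APIs before any parser is constructed. The following is a minimal sketch, not part of the library: `OnixBackup`, `CreateBackup`, and the `.bak` suffix are hypothetical choices made for illustration.

```csharp
using System;
using System.IO;

public static class OnixBackup
{
    // Hypothetical helper: copies the file next to the original with a ".bak" suffix
    // and returns the path of the backup copy.
    public static string CreateBackup(string onixFilePath)
    {
        string backupPath = onixFilePath + ".bak";

        // overwrite = false: an existing backup is never silently replaced;
        // File.Copy throws an IOException instead.
        File.Copy(onixFilePath, backupPath, false);

        return backupPath;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: OnixBackup <path-to-onix-file>");
            return;
        }

        string backupPath = CreateBackup(args[0]);
        Console.WriteLine("Backup written to " + backupPath);

        // Only now hand the original file to a parser, which will modify it in place.
    }
}
```

Refusing to overwrite an existing backup is a deliberate choice here: a second run on an already preprocessed file would otherwise replace the pristine copy.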

The parsers also have an optional preprocessing step (invoked via the constructor), which performs other friendly edits (such as removing malformed HTML encodings) that clean the file of suspicious characters. Left in place, these characters can cause the Microsoft XML libraries to throw an exception.

If you would like to become better acquainted with the legacy format of the ONIX standard, you can find documentation and relevant files (XSDs, DTDs, etc.) on the archive page of EDITEUR.

If you would like to become better acquainted with the current version of the ONIX standard, you can find documentation and relevant files (XSDs, DTDs, etc.) on the current page of EDITEUR.

Usage Examples

// An example of using the ONIX parser for the contemporary ONIX standard (i.e., 3.0)
// (FileInfo requires "using System.IO;" at the top of the file.)
int nOnixPrdIdx = 0;
string sFilepath = @"YourVer3OnixFilepath.xml";

FileInfo CurrentFileInfo = new FileInfo(sFilepath);
using (OnixParser V3Parser = new OnixParser(CurrentFileInfo, true))
{
    OnixHeader Header = V3Parser.MessageHeader;

    foreach (OnixProduct TmpProduct in V3Parser)
    {
        string tmpISBN = TmpProduct.ISBN;

        var Title       = TmpProduct.Title;
        var Author      = TmpProduct.PrimaryAuthor;
        var Language    = TmpProduct.DescriptiveDetail.LanguageOfText;
        var PubDate     = TmpProduct.PublishingDetail.PublicationDate;
        var SeriesTitle = TmpProduct.SeriesTitle;
        var USDPrice    = TmpProduct.USDRetailPrice;

        var BarCodes = TmpProduct.OnixBarcodeList;

        /*
         * The IsValid method will inform the caller if the XML within the Product tag is invalid due to syntax
         * or due to invalid data types within the tags (e.g., a Price with text).
         *
         * (The functionality to fully validate the product in accordance with the ONIX standard is beyond the scope
         * of this library.)
         *
         * If the product is valid, we can use it; if not, we can record its issue.  In this way, we can proceed 
         * with parsing the file, without being blocked by a problem with one record.
         */
        if (TmpProduct.IsValid())
        {
            System.Console.WriteLine("Product [" + (nOnixPrdIdx++) + "] has EAN(" +
                                     TmpProduct.EAN + ") and USD Retail Price(" + TmpProduct.USDRetailPrice.PriceAmount +
                                     ") - HasUSRights(" + TmpProduct.HasUSRights() + ").");
                                     
            /*
             * For 1-to-many composites, where a product can have more than one subitem (like Contributor), you should
             * use the lists that have a prefix of 'Onix', so that you can avoid having to detect whether or not the
             * reference or short composites have been used.
             */
            if (TmpProduct.DescriptiveDetail.OnixContributorList != null)
            {
                foreach (var TmpContrib in TmpProduct.DescriptiveDetail.OnixContributorList)
                {
                    System.Console.WriteLine("\tAnd has a contributor with key name (" + TmpContrib.KeyNames + ").");
                }
            }                                         
        }
        else
        {
            System.Console.WriteLine(TmpProduct.GetParsingError());
        }
    }
}

// An example of using the ONIX parser for the legacy ONIX standard (i.e., 2.1)
int nLegacyShortIdx = 0;
string sLegacyShortFilepath = @"YourOnixFilepath.xml";
using (OnixLegacyParser onixLegacyShortParser = new OnixLegacyParser(new FileInfo(sLegacyShortFilepath), true))
{
    OnixLegacyHeader Header = onixLegacyShortParser.MessageHeader;

    // Check some values of the header

    foreach (OnixLegacyProduct TmpProduct in onixLegacyShortParser)
    {
        string Ean = TmpProduct.EAN;

        /*
         * The IsValid method will inform the caller if the XML within the Product tag is invalid due to syntax
         * or due to invalid data types within the tags (e.g., a Price with text).
         *
         * (The functionality to fully validate the product in accordance with the ONIX standard is beyond the scope
         * of this library.)
         *
         * If the product is valid, we can use it; if not, we can record its issue.  In this way, we can proceed 
         * with parsing the file, without being blocked by a problem with one record.
         */
        if (TmpProduct.IsValid())
        {
            System.Console.WriteLine("Product [" + (nLegacyShortIdx++) + "] has EAN(" +
                                     TmpProduct.EAN + ") and USD Retail Price(" + TmpProduct.USDRetailPrice.PriceAmount +
                                     ") - HasUSRights(" + TmpProduct.HasUSRights() + ").");
                                     

            /*
             * For 1-to-many composites, where a product can have more than one subitem (like Contributor), you should
             * use the lists that have a prefix of 'Onix', so that you can avoid having to detect whether or not the
             * reference or short composites have been used.
             */
            if (TmpProduct.OnixContributorList != null)
            {
                foreach (OnixLegacyContributor TempContrib in TmpProduct.OnixContributorList)
                {
                    System.Console.WriteLine("\tAnd has a contributor with key name (" + TempContrib.KeyNames + ")."); 
                }
            }
        }
        else
        {
            System.Console.WriteLine(TmpProduct.GetParsingError());
        }
    }
}

Contributors

jaerith, szolkowski, dgil-unedbarbastro
