
VarioML - formatting genetic data in JSON or XML

VarioML is a flexible framework which can be used as a template for developing serialization formats for variation data focusing on the LSDB (Locus Specific Mutation Database) use cases. Several data formats come pre-defined, but VarioML elements function as building blocks that can be restructured to fit different implementations to build new data exchange formats - from patient-centered formats to aggregated variant data formats. Originally an XML/RelaxNG framework, VarioML has been rebuilt using JSON Schema.

VarioML was first developed in the GEN2PHEN program from 2011 to 2013, based on an observation model developed there. A simplified UML model can be found here. VarioML has since been reimplemented in JSON, and we're currently working on improving the documentation with more examples.

VarioML is a collaborative, community-driven specification in active development. If you'd like to collaborate with us, please let us know through admin at varioml.org, or feel free to create an issue.

When using or discussing VarioML, please refer to:

Byrne et al (2012). VarioML framework for comprehensive variation data representation and exchange. BMC Bioinformatics. 2012 Oct 3;13:254. doi: 10.1186/1471-2105-13-254.

Important:

  • All new implementations should use validation to ensure consistency with the core specification. When using Schematron, contact admin at varioml.org for help with generating validation scripts.
  • Use standard ontologies for the content, such as the Sequence Ontology and VariO. Some LSDB-specific terms are defined using SKOS vocabularies.
  • Source attributes in database xrefs (such as gene and ref_seq) as well as ontology terms should use database abbreviations defined in the MIRIAM registry. For example, use hgnc.symbol for gene names, refseq for NCBI reference sequences, and obo.so for Sequence Ontology references.
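
As a rough illustration of the last point, the sketch below builds cross-reference objects whose source uses the MIRIAM prefixes named above. The helper names (`make_xref`, `has_valid_source`), the `accession` key, and the example accessions are assumptions made for this example; they are not taken from the VarioML specification.

```python
# Hypothetical mapping from xref kind to the MIRIAM prefix named in the text.
MIRIAM_PREFIXES = {
    "gene": "hgnc.symbol",       # gene names
    "ref_seq": "refseq",         # NCBI reference sequences
    "ontology_term": "obo.so",   # Sequence Ontology references
}


def make_xref(kind, accession):
    """Build a cross-reference dict with a MIRIAM-style source (assumed layout)."""
    if kind not in MIRIAM_PREFIXES:
        raise ValueError(f"no MIRIAM prefix configured for {kind!r}")
    return {"source": MIRIAM_PREFIXES[kind], "accession": accession}


def has_valid_source(xref):
    """Return True if the xref's source is one of the configured prefixes."""
    return xref.get("source") in set(MIRIAM_PREFIXES.values())


# Illustrative values only.
gene = make_xref("gene", "BRCA1")
ref_seq = make_xref("ref_seq", "NM_007294.3")
print(gene)
print(has_valid_source({"source": "HGNC", "accession": "BRCA1"}))
```

Keeping the prefixes in one table means a non-registry abbreviation such as `HGNC` is rejected early instead of ending up in exchanged data.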

JSON implementation

The JSON implementation is a clean redesign of the VarioML XML format and is our current focus. A JSON Schema is provided to validate your data files, to supply descriptions and examples, and to serve as the template for the documentation.
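
To give a feel for what schema-based validation of a data file involves, here is a minimal sketch using only the Python standard library. It checks just the `type`, `required` and `properties` keywords, so it is not a full JSON Schema validator, and `TOY_SCHEMA` with its field names is a made-up stand-in, not the actual VarioML schema. In practice a complete JSON Schema validator would be run against the published schema.

```python
import json

# Toy schema for illustration only; it is NOT the VarioML JSON Schema.
TOY_SCHEMA = {
    "type": "object",
    "required": ["gene", "name"],
    "properties": {
        "gene": {
            "type": "object",
            "required": ["source", "accession"],
            "properties": {
                "source": {"type": "string"},
                "accession": {"type": "string"},
            },
        },
        "name": {"type": "string"},
    },
}

# Only the JSON types needed by the toy schema are mapped here.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the data passed.

    Only the `type`, `required` and `properties` keywords are checked.
    """
    errors = []
    expected = schema.get("type")
    if expected in JSON_TYPES and not isinstance(instance, JSON_TYPES[expected]):
        return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


good = json.loads(
    '{"gene": {"source": "hgnc.symbol", "accession": "BRCA1"}, "name": "c.123A>G"}'
)
bad = json.loads('{"gene": {"source": "hgnc.symbol"}}')

print(validate(good, TOY_SCHEMA))
for message in validate(bad, TOY_SCHEMA):
    print(message)
```

Returning a list of errors with their paths, instead of stopping at the first failure, makes it easier to fix a submission in one pass.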

VarioML-JSON is implemented in LOVD3.

  • LOVD3 uses VarioML in all its recent APIs (both the submission API first released in 2016 and the new data retrieval API to be released in 2021). The new data retrieval API uses the GA4GH Table Discovery API wrapped around VarioML data objects.
  • See an example implementation written for the LOVD3 submission API.
  • For the basic JSON Schema, see either the LSDB object's JSON Schema example based on the XML implementation, or the clean reimplementation, which is still a work in progress.

XML implementation

Translation to EXI is supported using the EXIficient library. XML validation is supported using Schematron.

VarioML-XML is implemented in Café Variome. See the specification for the implementation.

Apps

View a collection of demo applications. Also see validation tools.

News

  • 2021-03-02: We'll be updating the documentation on the JSON implementation. Please bear with us.
  • 2016-09-10: Libraries updated. NEMDB Examples added.
  • 2012-08-19: Population changed to singleton in frequency. This has no impact on any existing XML implementations (the property is not used).
  • 2012-08-08: Gene has changed to a list of genes. One variant may have impact on more than one gene.
  • 2012-08-08: EXI binary XML support added. EXI can reduce memory usage by roughly 3 to 10 times, even without compression.
  • 2012-02-09: JAXB / JSON (based on Jackson) API implementation is in org.varioml.jaxb folder. The code has not been fully tested. The SimpleXML implementation will be retired (support is possible though).
  • Comments/feedback: admin at varioml.org. Please send us an email if you are using VarioML so that we can accommodate your requirements!

Funding

VarioML has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
