rxiv-types (v0.1.0)

Built with: xsdata-pydantic

Introduction

A complete implementation of the XML/JSON schemas for *Rxiv preprint servers. This covers arXiv, medRxiv, bioRxiv, ChemRxiv, and DOAJ.

This package parses XML/JSON data into Pydantic models. Doing so validates the input XML and provides type hints for working with the complex structures present in *Rxiv data.

Why do I need this?

Parsing XML on its own is challenging. Add the feature-rich data inside each citation, and you can easily spend hours or days navigating the XML structure.
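To make the difficulty concrete, here is a minimal sketch of pulling a title and the authors out of a single OAI-PMH Dublin Core record using only the standard library. The sample XML is hand-written and heavily trimmed for illustration; it is not real server output. Note that every lookup needs an explicit namespace prefix, and the results are plain strings with no validation.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; real responses are much larger.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>First, Author</dc:creator>
          <dc:creator>Second, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# Prefix-to-URI map; without it, none of the lookups below would match.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(SAMPLE)
record = root.find("oai:ListRecords/oai:record", NS)
dc = record.find("oai:metadata/oai_dc:dc", NS)

title = dc.findtext("dc:title", default="", namespaces=NS)
creators = [el.text for el in dc.findall("dc:creator", NS)]

print(title)
print(creators)
```

A misspelled tag or forgotten namespace here silently yields `None` or an empty list, which is the kind of problem the generated models are meant to surface.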

The approach here is to autogenerate Pydantic classes for parsing the XML using the xsdata-pydantic tool. This ensures that every piece of data is parsed properly and that an error is raised if something is missing or incorrect. Unlike dictionaries, Pydantic classes provide type hints and tab completion in IDEs, making it easier to navigate the complex structure of the citation data.
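The difference between dictionary access and typed models can be sketched as follows. Standard-library dataclasses stand in for the generated Pydantic models so that the snippet runs without extra dependencies. `DcRecord` and its fields are hypothetical and far simpler than the real generated classes, and unlike Pydantic, a dataclass does not check field types at runtime.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DcRecord:
    """Hypothetical stand-in for a generated model (not part of rxiv-types)."""

    title: List[str]  # required: construction fails without it
    creator: List[str] = field(default_factory=list)


raw = {"title": ["An example title"], "creator": ["First, Author"]}

# Dictionary access: a misspelled key only fails when that line runs,
# and an IDE cannot suggest the valid keys.
try:
    raw["titel"]
    dict_error = None
except KeyError as exc:
    dict_error = f"KeyError: {exc}"

# Typed model: attributes are known to the IDE, and a missing required
# field is rejected as soon as the object is built.
record = DcRecord(**raw)
try:
    DcRecord(creator=["First, Author"])
    missing_error = None
except TypeError as exc:
    missing_error = str(exc)

print(record.title[0])
print(dict_error)
print(missing_error)
```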

How do I use it?

It is possible to use xsdata-pydantic and the autogenerated classes directly to parse an XML file, but we provide a convenience function to easily open *Rxiv XML citations and PMC open access articles.

Example 1: Parse ChemRxiv Data

from pathlib import Path

import requests

from rxiv_types import chemrxiv_records

chemrxiv_url = "https://chemrxiv.org/engage/chemrxiv/public-api/v1/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2000-01-01"

# 1. Get some ChemRxiv data from the API and save it to disk
response = requests.get(chemrxiv_url, timeout=60)
response.raise_for_status()  # stop on an HTTP error instead of saving an error page
destination = Path("downloads/data/chemrxiv.xml")
destination.parent.mkdir(parents=True, exist_ok=True)
destination.write_bytes(response.content)

# 2. Parse the saved XML into Pydantic models
result = chemrxiv_records(destination)

# 3. Print some information about the first record
dc = result.list_records.record[0].metadata.dc
print("Paper 1:")
print(f"Title: {''.join(dc.title)}")
print(f"Authors: {'; '.join(dc.creator)}")
print(f"Abstract: {''.join(dc.description)}")

Output:

Paper 1:
Title: Excitonics: A universal set of binary gates for molecular exciton processing and signaling
Authors: Nicolas, Sawaya; Dmitrij, Rappoport; Daniel, Tabor; Alan, Aspuru-Guzik
Abstract: The ability to regulate energy transfer pathways through materials is an
 important goal of nanotechnology, as a greater degree of control is 
crucial for developing sensing, solar energy, and bioimaging 
applications. Such control necessitates a toolbox of actuation methods 
that can direct energy transfer based on user input. Here we propose a 
novel molecular exciton gate, analogous to a traditional transistor, for
 controlling exciton migration in chromophoric systems. The gate may be 
activated with an input of light or an input flow of excitons. Unlike 
previous gates and switches that control exciton transfer, our proposal 
does not require isomerization or molecular rearrangement, instead 
relying on excitation migration via the second singlet (S2) state of the
 gate molecule--hence the system is named an "S2 exciton gate." After 
presenting a set of system properties required for proper function of 
the S2 exciton gate, we show how one would overcome the two possible 
challenges: short-lived excited states and suppression of false 
positives. Precision and error rates are studied computationally in a 
model system with respect to excited-state decay rates and variations in
 molecular orientation. Finally, we demonstrate that the S2 exciton gate
 gate can be used to produce binary logical AND, OR, and NOT operations,
 providing a universal excitonic computation platform with a range of 
potential applications, including e.g. in signal processing for 
microscopy.

FAQ

Why are the return structures so complicated?

The return structures directly reflect the XML format defined by OAI, plus any customizations made by the hosting preprint servers. In the future, some utility classes might be added for common components (title, authors, etc.), but for now this is intended to be an unbiased way of parsing the XML.
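Until such utilities exist, a small helper of your own can flatten the fields you care about. The sketch below is hypothetical and not part of rxiv-types. It assumes only the attribute path used in Example 1 (`list_records.record[i].metadata.dc`, with list-valued `title`, `creator`, and `description`), and it uses `SimpleNamespace` objects as stand-ins for a parsed result so that it runs on its own.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Iterator, List


@dataclass
class PaperSummary:
    """Hypothetical flat view of one record (not provided by rxiv-types)."""

    title: str
    authors: List[str]
    abstract: str


def summarize(result: Any) -> Iterator[PaperSummary]:
    """Yield one flat summary per record, following the path from Example 1."""
    for record in result.list_records.record:
        dc = record.metadata.dc
        yield PaperSummary(
            title="".join(dc.title),
            authors=list(dc.creator),
            abstract="".join(dc.description),
        )


# Stand-in for the object returned by chemrxiv_records(); real results are
# generated model instances with many more fields.
fake_dc = SimpleNamespace(
    title=["A title"],
    creator=["A, Author", "B, Author"],
    description=["Text."],
)
fake_record = SimpleNamespace(metadata=SimpleNamespace(dc=fake_dc))
fake_result = SimpleNamespace(list_records=SimpleNamespace(record=[fake_record]))

for paper in summarize(fake_result):
    print(paper.title, "|", "; ".join(paper.authors))
```

With a real parsed file, `summarize(result)` would be called on the value returned in Example 1 instead of `fake_result`.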
