
content-resolver's Introduction

DataCite

DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data. Our goal is to help the research community locate, identify, and cite research data with confidence.

About this repository

This is the generic DataCite repository for bugs, enhancements, and other issues. DataCite users can add their ideas through the DataCite Roadmap.

content-resolver's People

Contributors

katrinleinweber, kjgarza, koelnconcert, yarikoptic

content-resolver's Issues

citation mismatches DataCite Schema Documentation

Documentation:

Creator (PublicationYear): Title. Publisher. Identifier

DataCite Metadata Search Beta:

Creator; (PublicationYear): Title; Publisher. Identifier

Also the documentation describes (optional) display of Version and ResourceType.
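To make the difference between the two layouts concrete, here is a minimal sketch that builds the citation string in the documented form. The function name `format_citation` and the sample values are hypothetical, and the optional Version and ResourceType fields are left out because their position is not stated here.

```python
def format_citation(creator: str, year: int, title: str,
                    publisher: str, identifier: str) -> str:
    """Build a citation in the layout quoted from the documentation:
    Creator (PublicationYear): Title. Publisher. Identifier
    """
    # No semicolons after Creator or Title, unlike the search output.
    return f"{creator} ({year}): {title}. {publisher}. {identifier}"


print(format_citation("Doe, Jane", 2013, "Example Dataset",
                      "Example Publisher", "doi:10.1234/example"))
```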

RIS file format is documented to be encoded in windows-1252 but is generated as UTF-8

The RIS file format is officially documented to be (for historic reasons, unfortunately) in the windows-1252 character set. The content-resolver returns the text data in UTF-8. This makes it unusable for citations with umlauts if you try to import them into EndNote. The same issue exists for Crossref output.

For PANGAEA (where we use the Crossref and DataCite content negotiation to get full citation information), we use Apache Tika's character set auto-detection before parsing the RIS file, but that's not a good idea.

I opened this issue in the hope of getting this fixed (in communication with Crossref).
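As a client-side stopgap, a simpler heuristic than full charset detection is to try strict UTF-8 first and fall back to windows-1252. The sketch below is an assumption about how a consumer could cope; it is not what PANGAEA or the content-resolver does, and the function names are hypothetical.

```python
def decode_ris(raw: bytes) -> str:
    """Decode RIS bytes, preferring UTF-8 and falling back to windows-1252."""
    try:
        # Strict UTF-8 decoding usually fails on windows-1252 text that
        # contains non-ASCII characters, so it works as a first guess.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def to_windows_1252(text: str) -> bytes:
    """Re-encode text in the charset the RIS documentation names."""
    # Characters outside windows-1252 are replaced with '?'.
    return text.encode("cp1252", errors="replace")


utf8_bytes = "AU  - Müller, Anna".encode("utf-8")
print(to_windows_1252(decode_ris(utf8_bytes)))
```

The heuristic can still misfire on byte sequences that are valid in both encodings, so fixing the charset on the server side remains the cleaner solution.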

add nice 404 page

It might be irritating if a DOI is already registered but not (yet) available in search.

better rendering of nameIdentifier and other fields

Currently, e.g. for contributors, the whole text content of the <contributor> element is rendered, so nameIdentifier is just concatenated to contributorName. The nameIdentifier should also be actionable, or at least show schemeURI and/or nameIdentifierScheme.

support rightsURI

New in schema 3.0. Maybe we should link the rights value if a URI is given.

Don't use application/ld+json mime type

Using this content type to return metadata in schema.org/JSON-LD format breaks support for custom application/ld+json media registered with a DOI. See codemeta/codemeta#125 for background.

We should instead use a more specific mime type. This needs more discussion, but for now, we can use application/vnd.schemaorg.ld+json.

BIBTEX output format does not escape characters according to Latex

LaTeX treats many common characters as special, so importing the BibTeX files from content-resolver in most cases leads to LaTeX errors. The rules for escaping characters are very complicated.

At PANGAEA we have an escaper class for BibTeX that correctly escapes most Western characters when they are exported as LaTeX text (used by BibTeX). We can provide this Java code here; it should be available to everyone. It's mainly a POJO with a static method that takes a String and returns it as escaped LaTeX code.
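To illustrate the idea (this is not the PANGAEA class, which is not shown here), a minimal sketch of such an escaper might look as follows. The mapping is an assumed, deliberately incomplete table of LaTeX special characters; accented letters and other non-ASCII characters are out of scope.

```python
# Assumed, incomplete mapping of LaTeX special characters to their escapes.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in a plain-text string."""
    # Work character by character so that backslashes and braces
    # introduced by a replacement are never escaped a second time.
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


print(escape_latex("Temperature & salinity: 100% of data_set #1"))
```

Because BibTeX also uses braces for grouping, a real exporter would apply this only to field values taken from metadata, not to markup it adds itself.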

labels too long

Labels that are too long need to be wrapped so that they do not mask the metadata:

[screenshot: datacite label]

Why are DOIs converted to uppercase?

I see that many DataCite services seem to provide DOIs in ALL CAPS, for example 10.5281/ZENODO.48810 rather than 10.5281/zenodo.48810. I didn't know until now that DOI resolution was case-insensitive.

However, I view it as extremely undesirable to transform DOIs from their registered case. Basically, this defies the uniqueness tenet of the DOI system. In other words, I can no longer use DOIs as a primary key for a resource without converting them to lowercase myself?
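Until the services preserve the registered case, one workaround on the consumer side is to derive a case-folded key for storage and comparison while keeping the original string for display. This is a sketch under the assumption that DOIs differing only in ASCII case refer to the same resource; the helper names are hypothetical.

```python
def doi_key(doi: str) -> str:
    """Return a case-folded form of a DOI for use as a lookup key."""
    return doi.strip().lower()


def same_doi(a: str, b: str) -> bool:
    """Compare two DOIs while ignoring differences in case."""
    return doi_key(a) == doi_key(b)


print(same_doi("10.5281/ZENODO.48810", "10.5281/zenodo.48810"))
```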

correctly format personal names in citeproc JSON

Split given and family name, e.g.

"author": [{
    "family": "Vision",
    "given": "Todd"
}, {
    "family": "Rueda",
    "given": "Laura"
}, {
    "family": "Dasler",
    "given": "Robin"
}, {
    "family": "Haak",
    "given": "Laure"
}, {
    "family": "Cruse",
    "given": "Patricia"
}, {
    "literal": "THOR Consortium"
}]
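A rough sketch of the split, assuming creator names arrive as "Family, Given" strings and that anything without a comma is kept as a literal (for example an organisation). The function name `to_csl_name` is hypothetical.

```python
import json


def to_csl_name(name: str) -> dict:
    """Convert a 'Family, Given' string into a citeproc JSON name object."""
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
        if family and given:
            return {"family": family, "given": given}
    # Without a comma we cannot tell personal from organisational
    # names apart, so the name is passed through unchanged.
    return {"literal": name.strip()}


creators = ["Vision, Todd", "Rueda, Laura", "THOR Consortium"]
print(json.dumps({"author": [to_csl_name(c) for c in creators]}, indent=4))
```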

[Bug-report]: For RIS-citations the charset= is doubled in the HTTP response header:

Dear DataCite,
[Feature request:] Would you consider using UTF-8 as the standard character set in all citation services?
Or at least support Accept-Charset / Accept with a charset parameter, as shown below.

[Bug-report]: For RIS-citations the charset= is doubled in the HTTP response header:

$ curl -v http://data.datacite.org/application/x-research-info-systems/10.21334/npolar.2016.3d72756d -H "Accept: application/x-research-info-systems;charset=utf-8" -H "Accept-Charset:utf-8"

< HTTP/1.1 200 OK
< Content-Type: application/x-research-info-systems; charset=charset=windows-1252
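Until the header is fixed, a client could tolerate the duplicated prefix before parsing the charset. The following is a minimal sketch of such a workaround; `clean_content_type` is a hypothetical helper, not part of any library.

```python
import re


def clean_content_type(value: str) -> str:
    """Collapse a repeated 'charset=' prefix in a Content-Type value."""
    # Matches 'charset=' followed by one or more further 'charset='
    # and replaces the whole run with a single 'charset='.
    return re.sub(r"\bcharset=(?:charset=)+", "charset=", value,
                  flags=re.IGNORECASE)


header = "application/x-research-info-systems; charset=charset=windows-1252"
print(clean_content_type(header))
```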


Text formats returned as UTF-8 without indicating a charset

I found this with both text/x-bibliography and text/turtle.

I'm using python-requests; I had trouble making the problem clear using other tools:

> import requests
> r = requests.get('http://data.datacite.org/10.2312%2FGFZ.syserde.03.01.9',
         headers={'accept': 'text/x-bibliography; style=harvard3'})

No encoding is returned in the response, so under HTTP/1.1 as defined in RFC 2616 a text/* response defaults to ISO-8859-1, and that is how it is (incorrectly) decoded.

> r.headers['content-type']
'text/x-bibliography'
> r.encoding
'ISO-8859-1'
> r.text 
u'Cacace,\xc2\xa0Mauro, Scheck-Wenderoth,\xc2\xa0Magdalena, 
Cherubini,\xc2\xa0Yvonne, and Przybycin,\xc2\xa0Anna Maria 2013,
\xe2\x80\x9cBeckenmodellierung: Temperatur in Sedimentbecken,\xe2\x80\x9d
Deutsches GeoForschungsZentrum GFZ, viewed 
<http://dx.doi.org/10.2312/GFZ.syserde.03.01.9>.\n'

Here, \xc2\xa0, \xe2\x80\x9c and \xe2\x80\x9d are all multi-byte UTF-8 characters (nbsp and left/right smartquotes) that have been decoded as multiple single-byte characters.

The correct output can be generated by explicitly decoding as UTF-8, but this is against spec and would require handling this particular service as a special case:

> r.content
'Cacace,\xc2\xa0Mauro, Scheck-Wenderoth,\xc2\xa0Magdalena, 
Cherubini,\xc2\xa0Yvonne, and Przybycin,\xc2\xa0Anna Maria 2013, 
\xe2\x80\x9cBeckenmodellierung: Temperatur in Sedimentbecken,\xe2\x80\x9d 
Deutsches GeoForschungsZentrum GFZ, viewed 
<http://dx.doi.org/10.2312/GFZ.syserde.03.01.9>.\n'
> r.content.decode('UTF8')
u'Cacace,\xa0Mauro, Scheck-Wenderoth,\xa0Magdalena, 
Cherubini,\xa0Yvonne, and Przybycin,\xa0Anna Maria 2013, 
\u201cBeckenmodellierung: Temperatur in Sedimentbecken,\u201d 
Deutsches GeoForschungsZentrum GFZ, viewed 
<http://dx.doi.org/10.2312/GFZ.syserde.03.01.9>.\n'
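The special-casing described above could be isolated in one small helper that honours a declared charset and only falls back to UTF-8 when none is given. This is a sketch of a client-side workaround, not a fix for the service; `decode_body` and its `default` parameter are hypothetical.

```python
def decode_body(content: bytes, content_type: str,
                default: str = "utf-8") -> str:
    """Decode a response body, using `default` if no charset is declared."""
    charset = None
    # Parameters follow the media type, separated by semicolons.
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    return content.decode(charset or default)


body = b"Cacace,\xc2\xa0Mauro 2013, \xe2\x80\x9cBeckenmodellierung\xe2\x80\x9d"
print(decode_body(body, "text/x-bibliography"))
```

With python-requests this would be called as `decode_body(r.content, r.headers['content-type'])`; setting `r.encoding = 'utf-8'` before reading `r.text` has the same effect for a single response.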
