larsgw / citation.js

Citation.js converts formats like BibTeX, Wikidata JSON and ContentMine JSON to CSL-JSON, which it can then convert to other formats like APA and Vancouver, or back to BibTeX.

Home Page: https://citation.js.org/

License: MIT License

JavaScript 92.55% Roff 6.21% Shell 1.24%

bibtex citation citeproc-js csl-json wikidata

citation.js's Issues

Use prototypes and expose version number

Structure the Cite object better by, among other things:

use prototype methods instead of reassigning functions at each construction
expose or de-expose some props
expose version number
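
A minimal sketch of what this could look like. It is not the actual Citation.js source: the method bodies and the version string are placeholders chosen for illustration.

```javascript
// Hypothetical sketch; not the actual Citation.js source.
function Cite (data) {
  // Allow calling without `new`
  if (!(this instanceof Cite)) {
    return new Cite(data)
  }
  // Per-instance state only; no functions are created here
  this.data = Array.isArray(data) ? data.slice() : [data]
}

// Defined once and shared by all instances
Cite.prototype.add = function (entry) {
  this.data.push(entry)
  return this
}

Cite.prototype.getIds = function () {
  return this.data.map(function (entry) { return entry.id })
}

// Placeholder version string, exposed on the constructor and on instances
Cite.version = '0.0.0-example'
Cite.prototype.version = Cite.version
```

With this layout, `new Cite(a).add === new Cite(b).add` holds, because both instances look the method up on the shared prototype.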

Use citeproc-js

Use citeproc-js as an optional/default output parsing program.

Add support for different output types (i.e. Harvard/author-date format)

Add option to output author-date (Harvard).

options with a custom csl template

Hello,
I am trying to generate citations by providing a CSL template, but it seems to ignore the template and return a default citation format (APA, I think).
Is this feature provided? Is something wrong with the following options?

var opt = { format: 'string', type : 'string', style : 'citation', lang : 'en-US', template : <XML_CSL> };

Thanks!

Move to asynchronous input parsing

Parsing input sometimes requires calls to e.g. the Wikidata API, but with the current setup, this has to use synchronous requests, something that should be avoided. I propose the following syntax:

Cite.parseInput(<data>, <optional: callback>)

function callback (result) {
  result // Parsed input (CSL-JSON)
}

Not providing a callback (when required) will cause a warning and will probably be deprecated in the next version (so either v0.4 or v1.0). Alternatively, I could add both parseInput and parseInputAsync, which avoids the optional callback; parseInput would then simply warn or throw an error when it is used for input that requires requests.

The following is possible too (perhaps both?):

Cite(<data>, <optional: options>, <optional: callback>)

function callback (result) {
  result // Cite object
}

This, however, has at least all the problems the one above has, and probably more.
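
A rough sketch of the first proposal, under some assumptions: `fetchEntity` is a hypothetical stand-in for the real Wikidata request and returns fake data so the example runs offline, and returning a Promise alongside the optional callback is an addition that is not part of the proposal above.

```javascript
// Hypothetical stand-in for a network request (e.g. to the Wikidata API).
// It resolves asynchronously with fake CSL-JSON so the sketch runs offline.
function fetchEntity (id) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ id: id, title: 'Example title' })
    }, 0)
  })
}

// Returns a Promise; also calls `callback(result)` if one is passed
function parseInputAsync (data, callback) {
  // Only bare Wikidata IDs need a request in this simplified sketch
  const needsRequest = typeof data === 'string' && /^Q\d+$/.test(data)
  const promise = needsRequest
    ? fetchEntity(data).then(function (entity) { return [entity] })
    : Promise.resolve([].concat(data))

  if (typeof callback === 'function') {
    promise.then(callback)
  }
  return promise
}
```

Usage would then be `parseInputAsync('Q1', function (result) { /* parsed input */ })`, with input that needs no request going through the same asynchronous path.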

Fix author parsing in Wikidata: Again! (the same, but different, but still the same)

Parsing an entire set of entity data is fine, parsing a single prop (author by string or by Wikidata ID, P2093 and P50 respectively) is not. Also, the code to merge both author types is about half of the code of the method. It should be possible to do this more simply.

This is both a code-enhancement (because the general code looks ugly) and a bug, the latter because the method (not the code) to parse a prop is now exposed to the public and should be able to be used independently of the method to parse a full entity.

Sort based on keys

Currently, sorting is done by BibTeX label. It would be nice if there's an option to sort as you want, for example first by author, then by year, then by title. Example API could be like this:

const data = Cite(...)

data.sort(['author', 'year', 'title'])
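
One way such a sort could work is a comparator built from the list of keys. This is only a sketch: `sortValue` and `compareByKeys` are hypothetical names, and the entries use a flat `year` field and `{ family }` author objects purely for illustration.

```javascript
// Turn a field value into something comparable
function sortValue (entry, key) {
  const value = entry[key]
  if (key === 'author' && Array.isArray(value)) {
    // Assumes authors look like { family, given }
    return value.map(function (name) { return name.family || '' }).join(' ')
  }
  return value === undefined ? '' : value
}

// Build a comparator that checks each key in turn until one differs
function compareByKeys (keys) {
  return function (a, b) {
    for (const key of keys) {
      const x = sortValue(a, key)
      const y = sortValue(b, key)
      if (x < y) return -1
      if (x > y) return 1
    }
    return 0
  }
}

const entries = [
  { author: [{ family: 'Smith' }], year: 2016, title: 'B' },
  { author: [{ family: 'Jones' }], year: 2017, title: 'C' },
  { author: [{ family: 'Smith' }], year: 2015, title: 'A' }
]
const sorted = entries.slice().sort(compareByKeys(['author', 'year', 'title']))
// Titles in sorted order: C (Jones 2017), A (Smith 2015), B (Smith 2016)
```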

Tests exit with a syntax error on Travis CI

Tests exit with a syntax error on Travis CI. I'll have to fix this, as syntax errors make the test exit with 0.

Expose parseInput via Cite

Expose the currently private parseInput method on Cite, and change the way Wikidata ID strings are parsed. This should make writing test cases (#22) and async parsing (#29) easier.

Fix Wikidata ID parsing

Current setup:

1          2           3                4                     5

input ---> array  ---> each element  -> repeat from step 1 -> concat
       \-> string ---> url/single ID -> ID extracted, URL  -> WD JSON
                   \-> Multiple IDs  -> IDs extracted, URL -> WD JSON

Problems:

difficult to check for test cases (parseInput() on ID should return URL, instead of fetching data from API) [Edit: see #34]
parsing strings with multiple IDs is broken [Edit: fixed in cddab7f]
strings with single IDs count and are parsed as URLs
arrays with IDs are taken into another parse loop, which is unnecessarily time-consuming
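
A possible shape for a fix is to separate ID extraction from URL building, so that neither step fetches anything and both can be tested on their own. The helper names below are hypothetical; the URL uses the Wikidata `wbgetentities` API action.

```javascript
// Hypothetical helper: extract Wikidata entity IDs without making a request
function extractWikidataIds (input) {
  if (Array.isArray(input)) {
    // Collect IDs from every element first, so one request can cover them all
    return input.reduce(function (ids, item) {
      return ids.concat(extractWikidataIds(item))
    }, [])
  }
  if (typeof input !== 'string') return []
  // Matches bare IDs, separated lists of IDs, and IDs inside URLs
  return input.match(/\bQ\d+\b/g) || []
}

// Hypothetical helper: build the API URL as a separate, testable step
function wikidataUrl (ids) {
  return 'https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=' + ids.join('|')
}
```

A test case can then assert on the output of `extractWikidataIds()` and `wikidataUrl()` directly, without touching the API.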

Improve the parsing of non-JSON strings

Improve the parsing of non-JSON strings. Current situation:

JSON is parsed with JSON.parse()
Non-JSON JavaScript object data (e.g. any valid Object literal that isn't valid JSON) is converted to JSON using RegEx (bad practice), and then parsed with JSON.parse()
BibTeX is tokenised with Regex (technically bad practice, but in this case relatively fine) and then parsed with a sequence of while loops (both make for ugly though fairly readable code)
Wikidata ID lists and URLs are simple enough that Regex parsing suffices, but it may be worth implementing a non-Regex parser for them anyway.

Libraries possibly interesting:

PEG.js

Use Wikidata module in main module when possible

Add better infrastructure for Wikidata input

Add better infrastructure for Wikidata input in the Node.js module.

Expose ALL lib functions on Cite

This will be a trivial task after changes planned in #31.

JSON RegEx

In part one ((new Cite())._rgx.json[0]) of the RegEx that turns JSON and JavaScript Object strings into JSON, there seems to be a bug where unescaped single quotes in double-quoted strings are turned into unescaped double quotes, breaking the string. I will be working on this issue, since it's quite annoying.

To be clear: Of course, you can easily parse JSON, but I need to parse "relaxed" JavaScript Objects too, which means I have to replace single quotes, when used to build strings, with double quotes, and quote unquoted keys.
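
As an alternative to a RegEx, a small character scanner can keep track of whether it is inside a string, which avoids touching quotes that are part of string content. The function below is a hypothetical sketch, not the library's code, and it only handles simple object literals (no comments, numeric keys or multi-line strings).

```javascript
// Hypothetical scanner: converts a "relaxed" object literal string to JSON text
function relaxedToJson (input) {
  let out = ''
  let i = 0
  while (i < input.length) {
    const ch = input[i]
    if (ch === '"' || ch === "'") {
      // Copy a whole string literal, re-quoting it with double quotes
      let body = ''
      i++
      while (i < input.length && input[i] !== ch) {
        if (input[i] === '\\') {
          // Keep escape sequences, except \' which JSON does not allow
          const next = input[i + 1] || ''
          body += next === "'" ? "'" : '\\' + next
          i += 2
        } else {
          // A bare double quote inside a single-quoted string must be escaped
          body += input[i] === '"' ? '\\"' : input[i]
          i++
        }
      }
      i++ // skip the closing quote
      out += '"' + body + '"'
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Read an identifier; quote it only when it is used as a key
      let name = ''
      while (i < input.length && /[\w$]/.test(input[i])) name += input[i++]
      const isKey = /^\s*:/.test(input.slice(i))
      out += isKey ? '"' + name + '"' : name
    } else {
      out += ch
      i++
    }
  }
  return out
}
```

The result can be passed to `JSON.parse()`. A single quote inside a double-quoted string, as in `"don't"`, is copied unchanged because the scanner knows it is inside a string.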

Move jquery.Citation.js somewhere else

Move jquery.Citation.js somewhere else. I think it's too small for its own repo, but it really messes up version numbering, so I think I'll put it in the demo files somewhere. It's now flexible for wide use anyway, I think.

Low priority right now, probably one of the last things before releasing v0.3.0.

option to add an altmetric icon after the citation

Like this:

Improving BibTeX input & output

Currently most test cases work, but some things don't (see example below). I'll be working on:

Better use of field value delimiters, e.g. {, {{ or "
Move property parsing (BibTeX->CSL) to separate function, just as with Wikidata props
Move reversed prop parsing (CSL->BibTeX) to separate function as well
Improve prop parsing in both directions, if I can find good documentation

If someone has good documentation, that'd be really helpful.

Tasks

BibTeX string parser
BibTeX-JSON property parser
BibTeX-JSON property getter

Examples

The year field isn't working when imported in Mendeley (although it works in CiteULike). https://gist.github.com/larsgw/73d1e8fce01e0239d38ce1c599a81ccb

Make async the norm in testing

Update existing test cases to use async, and make a test case specific for sync parsing instead.

getIds fails test

In the docs I provided when I wrote the function, it clearly says: sorted id list. And the list doesn't get sorted. Oh my.

I'll have to look into how I intended the function, and then either update the function, or the docs (and test cases).

Add more test cases

Currently no edge cases are present, and the normal cases are very sparse anyway. Issues that could have been easily prevented:

Fix dependency browser support

Many of the dependencies have poor browser support, and when Citation.js is browserified, that doesn't change. Use babelify to compile dependencies too.

BibTeX processing does not handle booktitle in inproceedings

BibTeX processing does not support booktitle being specified directly in an @inproceedings entry, which is commonplace in the BibTeX export from many tools (such as Zotero). The test case seems to only use a crossref to @proceedings.

Example that reproduces this issue: citation.txt
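
A sketch of how the lookup could prefer a directly specified booktitle and fall back to the crossref. The entry shape (`properties` holding the BibTeX fields) and the function name are assumptions for illustration; the result would be used for the CSL-JSON `container-title` field.

```javascript
// Hypothetical property lookup for a parsed @inproceedings entry.
// `entry.properties` holds this entry's own fields; `crossrefEntry` is the
// optional @proceedings entry it points to.
function getContainerTitle (entry, crossrefEntry) {
  const own = entry.properties.booktitle
  // A booktitle given directly on the entry wins
  if (own) return own
  if (crossrefEntry) {
    return crossrefEntry.properties.booktitle || crossrefEntry.properties.title
  }
  return undefined
}
```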

Add (better) support for BibJSON

See blogpost below:

Citation.js now supports BibJSON. How I did that without actually updating Citation.js? Well, apparently I supported it all along. I've supported the quickscrape output format since July last year, and that turned out to be BibJSON. How convenient. I'll update the demo and docs to reflect this revelation (currently it just says "quickscrape's JSON scheme"), and, now that I can find actual documentation, some improvements to the parser. It's a good candidate for a new output format too.
Some side notes on updates v0.3.0-0 to v0.3.0-2: these are prerelease updates, making it possible to use code before I have fixed all the issues and added all the features I promised for version 0.3. These updates fixed a lot of file organization problems; next updates will restructure the Cite object and fix tests.
Lars Willighagen
_original

Make a converter for CSL JSON to BibTeX JSON

Add Bib.TXT support

Add Bib.TXT support. I would leave this to v0.4, which focuses on more and better input formats, but it seems like such a simple thing to do, and I don't want to wait for me to finish v0.3.

Start following code conventions and use babel

As I mentioned somewhere on my blog, the current 'code convention' is nonsense. I'll use standardJS and add relevant tests. Moreover, using ES6 (and babel) will make using good practices easier.

Make a new internal data standard

The current internal data standard is not suited for expansions of the program. Perhaps an implementation of Dublin Core. I propose the old method should still be valid input (and output?) as it will probably be easier to read for humans, depending on the design.

`v0.3.0-0` doesn't install globally

v0.3.0-0 doesn't install globally and gives the following error message:

npm ERR! Linux 4.9.0-1-686-pae
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "-g" "citation-js"
npm ERR! node v7.4.0
npm ERR! npm  v4.0.5

npm ERR! bin/man.txt is not a valid name for a man file.  Man files must end with a number, and optionally a .gz suffix if they are compressed.
npm ERR! 
npm ERR! If you need help, you may report this error at:
npm ERR!     <https://github.com/npm/npm/issues>

This is caused by an invalid man entry file name, and will be fixed in the next version.

Move input parsing to a function

Move the input parsing functionality to an actual function, instead of having everything in the constructor scope.

Remove internal functions from Cite object properties

Add option to add own CSL locale and style template

Update docs

Update docs for, among others, the new modules: Node.js and Wikidata Node.js.

Use a better way to make Citation.js a Node module

Use a better way to make Citation.js a Node module. Currently, this is done in a makeshift manner.

Put custom CSL-JSON properties in special subobject

Currently, props like wikiID for Wikidata IDs and label for BibTeX labels are put in the root object. I propose to put these in a _custom object, e.g.:

{
  author: "Lars Willighagen",
  title: "Citation.js",
  year: 2017,
  _custom: {
    wikiID: "Q1",
    label: "Willighagen2017Citation.js"
  }
}
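
For existing data, the move could be done with a small helper like the one below. This is a sketch only: `CUSTOM_PROPS` and `moveCustomProps` are hypothetical names, and the list of custom props is limited to the two mentioned above.

```javascript
// Illustrative list of non-standard properties
const CUSTOM_PROPS = ['wikiID', 'label']

// Returns a copy of `entry` with custom props moved into `_custom`
function moveCustomProps (entry) {
  const result = { _custom: Object.assign({}, entry._custom) }
  Object.keys(entry).forEach(function (key) {
    if (key === '_custom') return
    if (CUSTOM_PROPS.indexOf(key) !== -1) {
      result._custom[key] = entry[key]
    } else {
      result[key] = entry[key]
    }
  })
  return result
}
```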

Refactor code to be more Nodejs-like

Currently, all the source code (minus dependencies) is in one file. This is because, previously, that one file was used in the browser as well. Now that we have a browserify bundle (see #24), this isn't necessary anymore, and code should be split up into different files.

Also, I followed an... ehmm... unusual coding convention when I (re)wrote the source code in v0.2. I'll fix that too.

Some other expected improvements:

Use prototype instead of reassigning methods on each Cite construction [EDIT: done (#33)]
Make version accessible from the Cite object [EDIT: done (#33)]
Deprecate log-related methods
Make Cite#data read-only [EDIT: this can't actually be done]
Probably some more things

Make custom template and locale functioning easier

The current way of using custom templates and locales is confusing, to say the least (see #28). Expected features:

default and custom template and locale registers, with appropriate methods
way to amend templates

Allow input string with multiple bibtex declarations

Package Citation.js as a public npm package

#6 is about making it work better as a local module for node.Citation.js, while this is about publishing all of it as an npm package.

use Wikidata webpage or resource IRI

Lars, currently the code points directly to the JSON:

https://www.wikidata.org/wiki/Special:EntityData/Q23576506.json

That exposes implementation details. It would be better to use either the Wikidata webpage or the resource IRI, so respectively:

https://www.wikidata.org/wiki/Q23576506
http://www.wikidata.org/entity/Q23576506

If I make a patch for that somewhere this weekend, can you give it a test?
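
For illustration, accepting all three URL forms and deriving the JSON URL internally might look like this. It is a sketch, not the proposed patch, and both function names are hypothetical.

```javascript
// Accepts the web page URL, the resource IRI, or the EntityData JSON URL
function wikidataIdFromUrl (url) {
  const pattern = /^https?:\/\/www\.wikidata\.org\/(?:wiki\/(?:Special:EntityData\/)?|entity\/)(Q\d+)(?:\.json)?$/
  const match = pattern.exec(url)
  return match ? match[1] : null
}

// The JSON endpoint stays an internal detail
function entityDataUrl (id) {
  return 'https://www.wikidata.org/wiki/Special:EntityData/' + id + '.json'
}
```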

Import reference by DOI

This is easiest done by just looking it up in Wikidata, but the set of papers there is limited.

Edit: I take this back, by the way. It's just that I hadn't heard of Crossref, Citoid or DOI content negotiation yet, and that I was really impressed by the Wikidata API.

Merge P50 and P2093 when possible using series ordinal

Currently, when parsing Wikidata input, both P50 (author) and P2093 (author string) are evaluated, and the one holding more names is chosen over the other one. It would be nice to be able to merge them, using P1545 (series ordinal) to avoid conflicts. This may prove to be difficult, as the Wikidata simplifying method currently used discards these.
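
Assuming the ordinals can be preserved, the merge itself could be fairly small. In the sketch below the input shape is hypothetical: each author is `{ name, ordinal }`, where `ordinal` comes from the P1545 qualifier and may be missing.

```javascript
// Hypothetical merge of P50 (author) and P2093 (author string) lists
function mergeAuthors (p50, p2093) {
  const all = p50.concat(p2093)
  const withOrdinal = all.filter(function (a) { return a.ordinal !== undefined })
  const without = all.filter(function (a) { return a.ordinal === undefined })

  const seen = {}
  const merged = withOrdinal
    // Ordinals are often strings; subtraction coerces them to numbers
    .sort(function (a, b) { return a.ordinal - b.ordinal })
    // On a conflict, keep the first entry (P50 comes first in `all`,
    // and the sort is stable in current engines)
    .filter(function (a) {
      if (seen[a.ordinal]) return false
      seen[a.ordinal] = true
      return true
    })

  // Authors without an ordinal cannot be placed, so append them at the end
  return merged.concat(without)
}
```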

Node Wikidata parsing breaks in v0.2.12

More async!

Followup on #29 (Move to async parsing): We need to have async .set() and .add() methods too. That's all. Should be fairly easy.

Use browserify (or similar)

Currently, the main code is in an, as I like to call it, environmental wrapper. It's a self-executing function (to make variables non-global) that provides external libraries and browser-only functions to both the browser and the nodejs version. While the node substitutes for the browser-only functions and the minimalistic browser version of external node libraries are pretty ok, the way I detect which to use is not, and it's quite an error-prone piece of code anyway (it probably caused #18).

That's why I'm going to look into alternative (better) ways of making the main code work in both node and browser environments, and perhaps even more. Browserify is probably the best choice, but I'll have a look around. From what I've seen, not every node module can be browserified, but as I said, the substitutes are generally fine.

CSL is not defined

Hi, hope this is an appropriate question, being an utter newb. Good chance I'm just missing something obvious, but I can't figure out what..

Trying to get it run from the browser, I get the following error:
Uncaught ReferenceError: CSL is not defined at citation-0.2.js:2487
Which corresponds to:

  , window: {}
  , wdk: require( 'wikidata-sdk' )
  } : {
    CSL: CSL
  , striptags: function ( html ) {
      var tmp = document.createElement( 'div' )
      tmp.innerHTML = html
      return tmp.textContent || tmp.innerText || ''
    }

The CLI interface works fine, though I had to first pass the package through the wrap-cmd package to get it to create a .cmd file on my Windows machine. (Which could very well be another mistaken understanding on my part.. =S )
I'm trying to incorporate it into my Jekyll page, which already parses a CSL-JSON file into a rudimentary bibliography.

Thank you for your work!
Lem

Requiring does not work in the browser

NPM modules are often used in the frontend now and requiring citation-js in the frontend gives an error. It would be great if it was meant to be exported as a frontend module as well.

it fails for Wikidata entries with "author" fields

E.g. for https://www.wikidata.org/wiki/Q27795847 where it fails with this message:

Uncaught TypeError: str.split is not a function
    at parseName (https://larsgw.github.io/citation.js/src/citation-0.2.js:415:19)
    at https://larsgw.github.io/citation.js/src/citation-0.2.js:1014:54
    at Array.map (native)
    at parseWikidataProp (https://larsgw.github.io/citation.js/src/citation-0.2.js:1014:22)
    at parseWikidataJSON (https://larsgw.github.io/citation.js/src/citation-0.2.js:1179:18)
    at parseInputData (https://larsgw.github.io/citation.js/src/citation-0.2.js:1793:16)
    at parseInput (https://larsgw.github.io/citation.js/src/citation-0.2.js:1835:14)
    at parseInputData (https://larsgw.github.io/citation.js/src/citation-0.2.js:1765:16)
    at parseInput (https://larsgw.github.io/citation.js/src/citation-0.2.js:1835:14)
    at Cite.add (https://larsgw.github.io/citation.js/src/citation-0.2.js:2176:17)

Make a wrapper function for http requests

Make a wrapper function for http requests.

Random Bugs Vol 1

Note: not intended for people who aren't me, this is just to document some bug fixes

Not all fixes are live yet.

Wikidata

Program errors with Q2 as input (9825abf, v0.3.0-8)
Program errors with Q30000000 as input (86c4eca, v0.3.0-10)

DOI

DOI gives journal-article instead of article-journal (8741c7e, v0.3.0-10)
DOI to BibTeX exits on certain props (8741c7e, v0.3.0-10)
DOI to formatted drops certain props (volume, issue, pages) [EDIT: caused by pub type] (8741c7e, v0.3.0-10)
Test cases shouldn't rely on DOI API (breaks for timestamps etc.) too much (relevant commit not live yet) (3e13e6f, v0.3.0-10)

Other

CSL-JSON to HTML is still not really functioning as intended (v0.3.0-8, not sure which commit)
The condition document && ... should be typeof document !== 'undefined' && ... in Cite#get() (0002f53, v0.3.0-8)
Cite#get()ing a DOM object returns undefined (0002f53, v0.3.0-8)
Cite#get()ing a DOM object returns a NodeList instead of a Node (68e8e7e, v0.3.0-9)
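
For reference, the difference between the two conditions can be shown in a few lines. Both function names are made up for this example.

```javascript
// Safe check: `typeof` never throws, even when `document` was never declared
function hasDom () {
  return typeof document !== 'undefined'
}

// Unsafe check: throws a ReferenceError in environments where `document`
// is not declared at all, such as plain Node.js
function hasDomUnsafe () {
  return Boolean(document && document.createElement)
}
```

In a browser both return true; in plain Node.js the first returns false while the second throws before the `&&` is even evaluated.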