acl-org / acl-anthology
Data and software for building the ACL Anthology.
Home Page: https://aclanthology.org
License: Apache License 2.0
S/N: 10
Title: Add stats and permission for supplementary materials
Proposer: Diane Litman
Suggested by Zeerak Waseem @zeerakw , @leondz and @evanmiltenburg @emilybender. See Twitter thread: https://twitter.com/evanmiltenburg/status/950367985889398790
S/N: 22
Priority: Low
Title: OAI-PMH functionality
I still see one problem with the CL articles:
in input files J10.xml through J16.xml, the journal title
"Computational Linguistics" is missing in the
header and therefore in the generated .bib files.
I guess this is a problem with the web scraper?
Fix the import of SIGs to deal with the differently named semitic.yaml.
In the not too distant future we should establish a number of mirrors that ensure that the core information of the anthology is always available. We will have to discuss exactly what these mirrors should do and where to get the resources for them.
On http://aclanthology.info/events/naacl-2016, the NAACL 2016 proceedings URL (http://aclweb.org/anthology/N16-1) is 404.
Slightly related text:
I still see one problem with the CL articles:
in input files J10.xml through J16.xml, the journal title
"Computational Linguistics" is missing in the
header and therefore in the generated .bib files.
I know the INLG 2017 proceedings are in the anthology (pdf, bib), but they're not listed on the SIGGEN page and don't appear in the search results.
The links to the individual pdf files referred to from https://aclanthology.info/volumes/proceedings-of-the-first-international-workshop-on-tree-adjoining-grammar-and-related-frameworks-tag-1 are broken.
The prefix that is used is "https://aclanthology.info/pdf/" (e.g., https://aclanthology.info/pdf/W/W90/W90-0200.pdf) while it should be (at least it's a link that works) "http://www.aclweb.org/anthology/" (e.g., http://www.aclweb.org/anthology/W/W90/W90-0200.pdf, or http://www.aclweb.org/anthology/W90-0200).
Moreover, in the bib files, the prefix that is used is http://aclanthology.coli.uni-saarland.de/pdf (e.g., http://aclanthology.coli.uni-saarland.de/pdf/W/W90/W90-0200.pdf)
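The prefix mismatch described above could be handled with a small rewrite step. What follows is only a minimal sketch in Python, assuming the two broken prefixes quoted in this report are the ones that need rewriting. The function name `rewrite_pdf_url` and the constant names are hypothetical and are not part of the Anthology code base.

```python
# Hypothetical helper: map the broken PDF prefixes quoted in this report
# onto the prefix that is reported to work.
BROKEN_PREFIXES = (
    "https://aclanthology.info/pdf/",
    "http://aclanthology.coli.uni-saarland.de/pdf/",
)
WORKING_PREFIX = "http://www.aclweb.org/anthology/"


def rewrite_pdf_url(url: str) -> str:
    """Return url with a known broken prefix replaced; otherwise unchanged."""
    for prefix in BROKEN_PREFIXES:
        if url.startswith(prefix):
            # Keep the path after the prefix (e.g. "W/W90/W90-0200.pdf") as-is.
            return WORKING_PREFIX + url[len(prefix):]
    return url


print(rewrite_pdf_url("https://aclanthology.info/pdf/W/W90/W90-0200.pdf"))
```

URLs that do not start with one of the listed prefixes pass through untouched, so the step can be applied to both the page links and the bib files.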
S/N: 15
Priority: high
Title: Try to get ACL, etc. indexed again
On http://aclanthology.info/events/lrec-2014, the LREC 2014 proceedings URL (http://aclweb.org/anthology/L14-1) is 404.
On the same page, http://aclweb.org/anthology/L14-1000 is also 404.
S/N: 1
Title: Add support for two-word last names
Proposed by: Benjamin Van Durme
S/N: 20
Date: 2016-09-09
Priority: Low
Title: Get author and paper statistics automated
Reported via email.
The author order seems likely to be a UI issue in the new anthology.
When I click on each of the papers, I see a different (correct) order of authors on the paper details page.
I also compared this to the old anthology. The old anthology appears to have the authors in the correct order too.
My guess is that the metadata has the correct order, but the web rendering is somehow reordering the authors on the page (only in paper list in the new anthology).
See screenshots:
S/N: 17
Priority: Medium
Title: Handle problem with single bibtex encoding
Proposer: Mark Steedman
S/N: 21
Date: 2016-09-09
Priority: Medium
Title: ACL 2017: Bibtex and style files for auto inclusion of DOIs
Hi Min-Yen,
I think you are the maintainer of the ACL Anthology? I found an error in the ACL 2017 BibTeX file which makes it unparseable. There is a missing comma at the end of line 4491 of this file:
http://aclweb.org/anthology/P/P17/P17-1.bib
Sincerely,
S/N: 12
Priority: high
Title: Move anthology PDFs over to aclanthology.info to get coverage by Scholar
Proposer: Darcy Dapra
S/N: 24
Priority: Low
Title: Move ingestion Q to Github Issues
Notes: Assigned to Christian Federmann
The current search does not look at the text of the papers themselves, only at their metadata. As pointed out in issue #39, this is less than ideal.
The search functionality would be greatly enhanced if we could look into the content of the PDF files themselves. This is likely to be quite complicated.
I'm opening this issue as a feature proposal, in order to collect ideas.
S/N: 8
Title: Presentation / Poster Link handling
As seen in bug #51, giving rake:import_xml a command-line argument
that is not exactly 3 characters long deletes and re-imports the entire database from scratch. This is technically correct, but wrong for all practical purposes.
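One way to avoid the accidental full re-import would be to reject malformed arguments before any data is touched. The sketch below is in Python rather than in the Ruby of the actual rake task, and it only illustrates the guard. The name `check_import_argument` is hypothetical, and the pattern (one letter followed by two digits) is an assumption based on prefixes such as "W17" and "J10" that appear in these issues.

```python
import re

# Assumption: a volume prefix is one letter followed by two digits,
# as in the "W17" and "J10" examples seen elsewhere in these issues.
VOLUME_PREFIX = re.compile(r"[A-Za-z][0-9]{2}")


def check_import_argument(arg: str) -> str:
    """Return the normalized prefix, or raise instead of falling back to a full re-import."""
    if not VOLUME_PREFIX.fullmatch(arg):
        raise ValueError(
            f"expected a 3-character volume prefix such as 'W17', got {arg!r}"
        )
    return arg.upper()
```

With a guard like this, an argument such as "W17-74" would stop the task with an error message, and a full rebuild would have to be requested explicitly.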
We now have a trial virtual machine with the computational linguistics group in Saarbrücken. All the volunteers should get accounts for this machine.
S/N: 6
Date: 2016-06-30
Title: Auto create all types of bib files
Proposer: Mark Steedman
The diacritic character ń is causing problems while generating export files. I have just checked my publications list:
http://aclanthology.info/catalog?utf8=%E2%9C%93&search_field=all_fields&q=Agnieszka+Falenska
The three papers which were published under the surname "Faleńska" and not "Falenska" have errors in the BibTeX files (and in all other export files). The author list shows "Fale?ska", with a "?" in place of the ń (it should be \'{n}).
It might cause problems for anybody who would like to cite those papers. And also for any other authors who have ń in their surnames.
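The escape asked for above can be derived from Unicode decomposition. The following Python sketch is not the Anthology's export code. It assumes that splitting a character into a base letter plus one combining mark is enough, and it covers only a handful of accents. The names `ACCENTS` and `latex_escape` are hypothetical.

```python
import unicodedata

# Combining marks mapped to LaTeX accent commands (a small, non-exhaustive table).
ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis
}


def latex_escape(text: str) -> str:
    """Replace accented letters with LaTeX accent commands where the table allows."""
    out = []
    for ch in text:
        # NFD splits e.g. "ń" into "n" followed by the combining acute accent.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            out.append("\\" + ACCENTS[decomposed[1]] + "{" + decomposed[0] + "}")
        else:
            out.append(ch)
    return "".join(out)


print(latex_escape("Faleńska"))  # prints Fale\'{n}ska
```

Characters that are not in the table are passed through unchanged, so the output is never worse than the input; a production export would need a fuller mapping.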
We are at over 97% full on the server.
@villalbamartin, @CTNLP can you requisition more server space for the extra PDF files?
S/N: 18
Priority: Medium
Title: Show DOIs in single BibTex file
Documentation for ingesting new documents from STARTv2 into the Rails app shall be put up on the GitHub Wiki.
S/N: 14
Title: Move issues over to Github issues
Notes: Assigned to Christian Federmann
S/N: 11
Title: Fix old anthology ingestion to handle multi line authors
S/N: 4
Title: DBLP update
S/N: 3
Title: DOI update
Notes: synced at 16 Dec 2015
S/N: 7
Title: Presentation / Poster handling
S/N: 19
Date: 2016-09-09
Title: Ingest Volume, Issue for journal articles
Proposer: Dan Gildea
Notes: Assigned to Dan Gildea
The paper Assessing the Challenge of Fine-Grained Named Entity Recognition and Classification is returning a 403 when accessed through the Anthology interface. This happens because the paper's PDF has been moved from http://www.aclweb.org/anthology/W10-2415.pdf
to http://www.aclweb.org/anthology/W10-2415.old.pdf
The paper can still be accessed through Google, but not through the Anthology. The person who pointed this out to me also mentioned that there's a revised version of it, available here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.357.7217&rep=rep1&type=pdf
Should we replace the PDF link?
I can't seem to pull or push from the production machine in Saarland with the aclanthology user.
Can we debug this problem?
S/N: 16
Priority: High
Title: Handle video links from old anthology
Compare:
Queries for words without numbers in them work well (e.g. "framenet"), and I can select the number of items per page. But I cannot choose the number of items per page for queries like "flickr30k" without the website breaking down:
The page you were looking for doesn't exist.
You may have mistyped the address or the page may have moved.
If you are the application owner check the logs for more information.
links to pdfs give a not found error, for example:
https://aclanthology.info/pdf/C/C98/C98-1001.pdf
I noticed that for papers in workshop proceedings the bib file sometimes uses the entry type "inbook" (which from my point of view doesn't make sense), while for others "inproceedings" is used.
Compare http://aclanthology.info/papers/feelings-from-the-past-adapting-affective-lexicons-for-historical-emotion-analysis.bib and http://aclanthology.info/papers/semeval-2007-task-14-affective-text.bib
S/N: 9
Title: Handle Errata
The dates and the publisher information are not being picked up by Google Scholar. Also, a lot of the references are not indexed with the correct metadata (i.e., they don't get added to the list of citations of the papers they cite).
S/N: 5
Title: ACM update
Reported via email.
The bibtex entries are shown as @misc instead of @InProceedings:
https://aclanthology.info/papers/W13-2206/w13-2206
Did something change with respect to ingestion?
I didn't have this problem with re-ingesting proceedings but now it is happening. This is for W17.xml to change one author in W17-74.
aclanthology@aclanthology:~/acl-anthology$ rake import:xml["W17"] --trace
** Invoke import:xml (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import:xml
PG::FeatureNotSupported: ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "papers_people" references "people".
HINT: Truncate table "papers_people" at the same time, or use TRUNCATE ... CASCADE.
: TRUNCATE TABLE people RESTART IDENTITY;
rake aborted!
ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "papers_people" references "people".
HINT: Truncate table "papers_people" at the same time, or use TRUNCATE ... CASCADE.
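The HINT in the log above names two ways out: truncate the referencing table in the same statement, or use CASCADE. The Python sketch below only builds the corresponding SQL string and executes nothing. The function name `truncate_statement` is hypothetical, and how the import task would actually issue the statement is not shown in this report.

```python
def truncate_statement(tables, cascade=False):
    """Build a single TRUNCATE covering all given tables (nothing is executed)."""
    if not tables:
        raise ValueError("at least one table name is required")
    sql = "TRUNCATE TABLE " + ", ".join(tables) + " RESTART IDENTITY"
    if cascade:
        # CASCADE also truncates every table that references the listed ones.
        sql += " CASCADE"
    return sql + ";"


# Option 1 from the HINT: truncate both tables at the same time.
print(truncate_statement(["papers_people", "people"]))
# Option 2 from the HINT: let PostgreSQL follow the foreign key.
print(truncate_statement(["people"], cascade=True))
```

Listing the tables explicitly is the more conservative choice, since CASCADE can empty tables that were not meant to be cleared.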
I doubt this issue is unknown, but I couldn't find it in the issue list here on GitHub:
When I search for an author by name, quite often I get "The page you were looking for doesn't exist." errors even though there are papers by this author in the anthology. E.g. when looking for Hannah Rashkin (see screenshot) I get this error, but there is a paper by her: http://aclanthology.info/papers/connotation-frames-a-data-driven-investigation.bib
S/N: 23
Priority: Low
Title: Docker Container with minimal install
Proposer: Nitin Madnani
Notes: In progress