aclpub's Introduction

ACLPUB package

This is the official ACLPUB package for *ACL conferences. Its primary role is to package up the PDF files, BibTeX files, and optional extra files into a format that can be ingested by the ACL Anthology. Documentation can be found under docs/ and viewed on the web. The latest version can always be found on GitHub.

Instructions for START

Softconf's STARTv2 system is the main system used for conference management within the ACL community. It includes extensive integration around the ACLPUB package. Information about how to use ACLPUB within START can be found here.

ACLPUB can also be run from the command line, which facilitates use with third-party conference management software.

Instructions for submitting to the Anthology

Instructions for submitting proceedings to the Anthology can be found here. These instructions were simplified in March 2020 to accommodate the Anthology's new ID format. For more complete information on the overall process, please see the Anthology's Information for Submitters.

Branch Convention

The following branches have special import:

  • The master branch is used for main development and contains the official stable release.
  • The start branch reflects the current code being used in START. It is brought in sync with master at regular intervals.

History

  • 2005: The ACLPUB package and documentation were built by Jason Eisner and Philipp Koehn, based in part on scripts by David Yarowsky that had been used for several years previously.
  • 2019: The code underwent substantial modernization and revision by David Chiang and Dan Gildea.
  • 2020: Matt Post revised the code to work with the Anthology's new ID format.

aclpub's People

Contributors

bethard, danielgildea, davidweichiang, mjpost, rrgerber, shimorina, texttheater

aclpub's Issues

Is the ACL 2020 format the same as EMNLP 2022?

My paper draft is written in the ACL 2020 format (apparently identical to the ACL 2022 format) and I want to submit it to EMNLP 2022. Will this be an issue? Both templates look the same, but I am afraid the paper will get a desk rejection because of the wrong format. Please let me know whether this is likely to be the case. Thanks!

use only attachment tags

I'm not sure whether we are still using etc tags. I need to check when I'm at a terminal and ensure we only use attachment.

use geometry package

Many of the templates, e.g., titlepage.tex have the following in their preamble:

\setlength\topmargin{0.2cm} \setlength\oddsidemargin{-0cm}
\setlength\textheight{24.7cm} \setlength\textwidth{16cm}
\setlength\columnsep{0.6cm}  \newlength\titlebox \setlength\titlebox{2.00in}
\setlength\headheight{5pt}   \setlength\headsep{0pt}
\setlength\footskip{1.0cm}
\setlength\leftmargin{0.0in}

Could we replace this with the geometry package? I think it would be simpler to understand that way:

\usepackage[a4paper,margin=2.5cm,columnsep=0.6cm,headheight=5pt,headsep=0pt,footskip=1cm]{geometry}
\newlength\titlebox \setlength\titlebox{5cm}

improvements for START book chair guide

Hello again!

The START guide doesn't mention the files just-program.tex and just-toc.tex; however, compiling proceedings on START is not possible without them. Also, the START interface (Templates tab) has no fields for those two files. So, as I understand it, Step 3 of the guide becomes obligatory, contrary to what is stated there.

I'd issue a PR to fix that, but I'm not sure whether my understanding is correct, or whether any changes are planned to the templates and/or the START interface.

Thank you for all your work!

Ignore duplicate entries in the order file

ACLPUB uses the order file to generate the program order. It checks that all papers are in the file, but if a paper is listed twice, it will generate two copies of it in the proceedings. This should be fixed.
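
One possible shape for the fix, as a minimal sketch in Python rather than ACLPUB's own Perl: keep the first occurrence of each paper and report later repeats. The line format assumed here (a paper line starts with a numeric submission ID, and every other line is passed through unchanged) and the name dedupe_order_lines are assumptions for illustration, not a description of the actual order-file parser.

```python
def dedupe_order_lines(lines):
    """Return (kept_lines, dropped_ids), keeping the first entry per paper.

    Assumption: a paper line starts with a submission ID made of digits;
    any other line (session headers, blank lines) is passed through.
    """
    seen = set()
    kept = []
    dropped = []
    for line in lines:
        tokens = line.split()
        first = tokens[0] if tokens else ""
        if first.isdigit():
            if first in seen:
                dropped.append(first)  # duplicate: skip it, but remember it
                continue
            seen.add(first)
        kept.append(line)
    return kept, dropped


# Hypothetical order-file content, for illustration only.
order = [
    "* Session 1",
    "12 # First paper",
    "7 # Second paper",
    "12 # First paper again",
]
kept, dropped = dedupe_order_lines(order)
print(kept)
print(dropped)
```

Returning the dropped IDs lets the caller warn the book chair instead of silently discarding entries.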

Merge in easy2acl

It would be helpful to merge in the easy2acl repo and documentation (really just one script), so that we have a single access point for people wishing to contribute to the Anthology.

Remove HTML generation code?

Quite a few pieces of ACLPUB are dedicated to generation of HTML, but as far as I know, this code is not actually used any more. Is it?

For example, the proceedings.tgz generated for EMNLP 2018 contains only one HTML file, advertisement.html, which is a list of accepted papers. I believe that @desilinguist did not rely on this HTML file when making the EMNLP 2018 web page (because I think I gave him a Markdown version) and wonder whether anyone else uses it.

Assuming that the HTML generated here is not used anymore, I suggest retiring all relevant code (advertisement.pl authors.pl db-to-html.pl index.pl program-html.pl unified-authors.pl) so that all HTML generation is done from the Anthology (which @mbollmann is working on a modern version of).

Latex template files for authors

This repository is now the canonical place to download the latex and word templates for ARR, if one doesn't want to use Overleaf. And then Github makes it just awkward enough to download the multiple files in templates/latex that it's easier to clone the full repository. However, this repo isn't made for author end-users either:

  • The repository is really bloated if you're looking for just the latex templates - the full repository is 23MB, 13MB of which is in .git/ and 5.5MB is in templates/archive. (I know this is very old-man-yells-at-cloud in this age of terabytes but still - this doesn't need to be replicated across tens of thousands of directories.)
  • Most of the information (e.g. in docs/) is clearly for publication chairs, not authors.

Maybe ARR could offer tar bundles for the templates again? Or this repo could also contain tar bundles of the files in templates/latex, so it would be possible for authors to just download the single file?
Thank you for considering this.

bad bibkeys

ACLPUB seems sometimes to create bad bib keys when there are special characters in the lead author's name:

@InProceedings{b\"{u}ler-etal-2005-using,
  author    = "B\"{u}ler, Dirk and Minker, Wolfgang and Elciyanti, Artha",
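
A key like the one above could be avoided by reducing the surname to plain ASCII before building the key. The following is a rough sketch in Python (not the code ACLPUB uses); the function name ascii_key_part and the set of accent commands it handles are assumptions, and it only covers the simple one-character accent macros.

```python
import re
import unicodedata


def ascii_key_part(name):
    """Reduce a LaTeX-escaped or Unicode surname to lowercase ASCII."""
    # Drop one-character LaTeX accent commands (\" \' \` \^ \~), keep the letter.
    name = re.sub(r"\\[\"'`^~]", "", name)
    # \i is the dotless i used under accents; treat it as a plain i.
    name = re.sub(r"\\i(?![A-Za-z])", "i", name)
    # Decompose Unicode accents and discard the combining marks.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Keep only ASCII letters and hyphens (this also removes braces).
    return re.sub(r"[^A-Za-z-]", "", name).lower()


print(ascii_key_part('B\\"{u}ler'))
print(ascii_key_part("Ramírez-Sánchez"))
```

With this in place, the key for the entry above would start with a plain ASCII name instead of b\"{u}ler.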

make-anthology.sh shouldn't generate bib files

make-anthology.sh currently creates this structure:

anthology/
  P/
    P19/
      P19.xml
      P19-1001.pdf
      P19-1001.bib
      P19-1001.Supplementary.tgz

The *.bib files are no longer needed since we only use the XML at ingestion time.

\textasciitilde

We often see ñ converted to something like \\textasciitilde {n}, e.g., the following from EAMT 2020:

author = "Gema Ram\'{\i}rez-S\'{a}nchez and Jaume Zaragoza-Bernabeu and Marta Ba\\textasciitilde {n}\'{o}n and Sergio Ortiz Rojas",

It's possible this is caused by ACLPUB.
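
Wherever the garbling originates, existing files could be repaired after the fact. Here is a small sketch in Python, assuming the damage always has the shape shown above (one or more backslashes, textasciitilde, optional spaces, then a braced letter); the name repair_tilde is hypothetical and this does not address the root cause.

```python
import re

# One or more backslashes, "textasciitilde", optional spaces, a braced letter.
GARBLED_TILDE = re.compile(r"\\+textasciitilde\s*\{([A-Za-z])\}")


def repair_tilde(text):
    """Rewrite garbled tilde accents as the standard LaTeX form \\~{x}."""
    return GARBLED_TILDE.sub(lambda m: "\\~{" + m.group(1) + "}", text)


garbled = "Marta Ba\\\\textasciitilde {n}\\'{o}n"
print(repair_tilde(garbled))
```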

Merge changes from acl-org/acl-pub

All the relevant changes I see in acl-pub are recent changes from @danielgildea and me. I can put them into a PR here.

A lot of the changes involve anthologize.pl, which is run on the pub chair's local machine to convert the final.tgz generated by START into what the Anthology wants. I think it was never really decided whether that should have its home here or acl-pub.

Add SIG information

SIG information is available (I think?) in the meta file. This should be added as a tag in the XML so that it can be added to the Anthology automatically at ingestion time (saving the manual effort and errors associated with it).

Convert to nested format

The Anthology uses a new nested format. ACLPUB should generate that instead of the old flat format.

add “sig” tag to output

This has been added to the meta file. It should also be added to the XML so that the Anthology can ingest it automatically.

generating new ID format in cdrom/ layout

I am confused about how ACLPUB is run and wonder if anyone can answer some questions here.

The proceedings.tgz files I've received from pubchairs (say for ACL 2019) have a layout like this:

papers/
  proceedings/
    cdrom/
      pdf/
        P19-1001.pdf
        P19-1002.pdf
        ...

i.e., the actual paper IDs. However, I can't see where this is produced. It should be in bin/bib.pl, but that code does not produce the full Anthology ID, but rather something like

papers/
  proceedings/
    cdrom/
      pdf/
        naacl00.pdf
        naacl01.pdf
        naacl02.pdf
        ...

See for example lines 150–152 of bib.pl, where the bib file name is created:

  my $fn_base = sprintf "%0${digits}d", $pn;
  my $fn = "cdrom/bib/$abbrev$fn_base.bib";
  open(FILE,"> $fn") || printf(STDERR "Can't open $fn: $!\n");

Indeed, this is the format I've received recently when people have built this manually. So it seems to be an issue just with START.

Can anyone clarify what START is doing here? How is cdrom/pdf getting populated with the actual Anthology files?
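
For reference, the relationship between the two layouts can be written down as a small function. This is only a sketch of the renaming in Python, not what START actually does (that is the open question here). The function name and parameters are assumptions, and it only covers IDs of the shape shown above: a collection letter, a two-digit year, a one-digit volume, and a three-digit paper number.

```python
def anthology_id(collection_letter, year, volume, paper_number):
    """Build an ID such as P19-1001 from its parts."""
    if not 0 <= volume <= 9:
        raise ValueError("this sketch only handles one-digit volumes")
    if not 0 <= paper_number <= 999:
        raise ValueError("paper number must fit in three digits")
    return f"{collection_letter}{year % 100:02d}-{volume}{paper_number:03d}"


# Paper 1 and paper 2 of volume 1 in the "P" collection for 2019.
print(anthology_id("P", 2019, 1, 1))
print(anthology_id("P", 2019, 1, 2))
```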

Changes to "meta" file in START

I am hoping we can get START to make some changes to the meta file. These apply to all current conferences (e.g., ACL 2020):

Remove:

  • "Type" (type): no longer used
  • "Bibtex URL" (bib_url): no longer used

Add:

  • "Short book title" (key shortbooktitle), with the example "Proceedings of WMT"
  • "Volume name" (key volume), with the default example "TOBEFILLED: volume number or name within collection"

CC: @rrgerber

Documentation

The documentation here could be improved. Currently it is split between the top-level README and the (outwardly more important) anthologize README.

One thought here is to turn this into a Github pages site, update the top-level index.html to contain consolidated, clearly-delineated instructions, and then point people to acl-org.github.io/ACLPUB when they need to follow instructions.

I will give this some more attention in mid December.

Fix templates in START

Currently, START suggests the name format

Matt Post (Johns Hopkins University)

in the chair lines for ACLPUB. This is not parseable. We should ask them to change the format to

Post, Matt

which we can parse quite nicely.
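
To illustrate why the second form is easier to handle, here is a minimal sketch in Python (ACLPUB itself is Perl). The function name parse_chair_name is hypothetical, and the rule that a valid line contains exactly one comma is an assumption.

```python
def parse_chair_name(line):
    """Split 'Last, First' into (first, last); return None if not parseable."""
    if line.count(",") != 1:
        return None
    last, first = (part.strip() for part in line.split(","))
    if not last or not first:
        return None
    return first, last


print(parse_chair_name("Post, Matt"))
print(parse_chair_name("Matt Post (Johns Hopkins University)"))
```

The comma gives an unambiguous split point, whereas the parenthesized form would require guessing where the given name ends and the family name begins.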

copy files instead of using symlinks

One of the build scripts in Step 3 here creates symlinks instead of copying PDFs. This results in a common error where a submitter forgets to add the -h flag to tar, and then when unpacking it, I am left with a bunch of unresolved links. Disk space is cheap these days, and PDFs are small; we should just have it copy the files and avoid this situation.
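
The change amounts to copying the file a link points at, rather than creating the link. A minimal sketch in Python, for illustration only (the build scripts are not written in Python, and copy_resolved is a hypothetical helper):

```python
import os
import shutil
import tempfile


def copy_resolved(src, dst):
    """Copy the file that src points at to dst, so dst is a regular file."""
    # shutil.copy2 follows symlinks by default (follow_symlinks=True).
    shutil.copy2(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    pdf = os.path.join(tmp, "paper.pdf")
    with open(pdf, "wb") as f:
        f.write(b"%PDF-1.4 placeholder")
    link = os.path.join(tmp, "link.pdf")
    os.symlink(pdf, link)  # what the build script effectively does today
    out = os.path.join(tmp, "copy.pdf")
    copy_resolved(link, out)  # what it would do instead
    print(os.path.islink(link), os.path.islink(out))
```

A tarball built from copied files unpacks correctly whether or not the submitter remembers the -h flag.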

Local copies of template no longer say DO NOT DISTRIBUTE

With the shift to stamping the paper number in START, the top line of the template has been removed. This used to read something like "ACL Submission XYZ. DO NOT DISTRIBUTE."

Now that that's been removed, local copies don't have this warning. I think it's useful to include this, though. If you share your own paper with others, e.g. requesting feedback, you don't have to 'opt-in' to telling them not to distribute further. It's automatically done.

The right place to put it, I think, is in the title block, just below "Anonymous ACL Submission".

U+200E in author name causes crash

U+200E is a left-to-right direction marker and was found in an author's last name, which caused ACLPUB to crash when building the EMNLP 2019 proceedings.
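
One defensive option is to strip such invisible direction marks from names before processing. The sketch below is in Python and is not taken from ACLPUB; the name strip_bidi_controls and the choice to remove only the Unicode bidirectional control characters are assumptions.

```python
# Unicode bidirectional control characters: marks, embeddings, overrides, isolates.
BIDI_CONTROLS = {
    "\u200e", "\u200f",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def strip_bidi_controls(text):
    """Remove bidirectional control characters, leaving all other text intact."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


name = "Sm\u200eith"
print(len(name), len(strip_bidi_controls(name)))
```

The list is deliberately narrow: it leaves characters such as the zero-width joiner alone, since those can be meaningful in some scripts.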

Two Questions About Signing the Transfer Copyright PDF

Hello, I'm submitting a paper to ACL 2024 Findings. I have been asked to sign acl-copyright-transfer-2021.pdf, located at templates/copyright/acl-copyright-transfer-2021.pdf. I have the following two questions:

  • Which language should be used for signatures? English or the language the author speaks?

  • Of the three signing options (All authors signing, Work for company, One author signing), which should I tick if I am an M.S. student and also an intern at a company (some of my co-authors are from the company), while my affiliation in the paper is my school? Work for company, or One author signing?

doc/ and docs/

We have two top-level documentation directories. One should be removed (I believe doc/). Filing this to attend to later.
