perrette / papers Goto Github PK

Command-line tool to manage bibliography (pdfs + bibtex)

License: MIT License

Python 100.00%

pdf bibtex doi google-scholar filemanager crossref

papers

Command-line tool to manage bibliography (pdfs + bibtex)

Disclaimer: This tool requires further development and testing, and might never be fully production-ready (contributors welcome).
That said, it is becoming useful :)

Motivation

This project is an attempt to create a light-weight, command-line bibliography management tool. Aims:

maintain a PDF library (with appropriate naming)
maintain one or several bibtex-compatible collections, linked to PDFs
enough PDF-parsing capability to fetch metadata from the internet (i.e. crossref or google-scholar)

Dependencies

python 3.8+
poppler-utils (only:pdftotext): convert PDF to text for parsing
bibtexparser : parse bibtex files
crossrefapi : make polite requests to crossref API
scholarly : interface for google scholar
rapidfuzz : calculate score to sort crossref requests
unidecode : replace unicode with ascii equivalent

Install

pip install papers-cli
install third-party dependencies (Ubuntu: sudo apt install poppler-utils)

Note there is another project registered on pypi as papers, hence papers-cli for command-line-interface.

Getting started

This tool's interface is built like git, with main command papers and a range of subcommands.

Extract PDF metadata and add to library

Start with a PDF of your choice (modern enough to have a DOI, e.g. anything from the Copernicus publications). For the sake of the example, here is one of my own: https://www.earth-syst-dynam.net/4/11/2013/esd-4-11-2013.pdf

extract pdf metadata (doi-based if available, otherwise crossref, or google scholar if so specified)

  $> papers extract esd-4-11-2013.pdf
  @article{Perrette_2013,
  doi = {10.5194/esd-4-11-2013},
  url = {https://doi.org/10.5194%2Fesd-4-11-2013},
  year = 2013,
  month = {jan},
  publisher = {Copernicus {GmbH}},
  volume = {4},
  number = {1},
  pages = {11--29},
  author = {M. Perrette and F. Landerer and R. Riva and K. Frieler and M. Meinshausen},
  title = {A scaling approach to project regional sea level rise and its uncertainties},
  journal = {Earth System Dynamics}
  }

add pdf to papers.bib library, and rename a copy of it in a files directory files.

  $> papers add esd-4-11-2013.pdf --rename --copy --bibtex papers.bib --filesdir files --info
  INFO:papers:found doi:10.5194/esd-4-11-2013
  INFO:papers:new entry: perrette_landerer2013
  INFO:papers:mv /home/perrette/playground/papers/esd-4-11-2013.pdf files/perrette_et_al_2013_a-scaling-approach-to-project-regional-sea-level-rise-and-its-uncertainties.pdf
  INFO:papers:renamed file(s): 1

(the --info argument asks for the above output information to be printed out to the terminal)

That is equivalent to doing:

papers extract esd-4-11-2013.pdf > entry.bib
papers add entry.bib --bibtex papers.bib --attachment esd-4-11-2013.pdf --rename --copy

See Control fields when renaming file for how to specify file naming patterns.

Add library entry from its DOI

If you already know the DOI of a PDF, and don't want to gamble on the full-text search and match, you can indicate it via --doi:

papers add esd-4-11-2013.pdf --doi 10.5194/esd-4-11-2013 --bibtex papers.bib

The add command above also works without any PDF (create a bibtex entry without file attachment).

papers add --doi 10.5194/esd-4-11-2013 --bibtex papers.bib

List entries (and edit etc...)

Pretty listing (-1 or -l for one-liner listing, otherwise plain bibtex):

$> papers list -1
Perrette2013: A scaling approach to project regional sea level rise and it... (doi:10.5194/esd-4-11-2013, file:1)

Search with any number of keywords:

$> papers list perrette scaling approach sea level -1
... (short list)
$> papers list perrette scaling approach sea level --any -1
... (long list)
$> papers list --key perrette2013 --author perrette --year 2013 --title scaling approach sea level -1
... (precise list)

Add tags to view papers by topic:

$> papers list perrette2013 --add-tag sea-level projections -1
...
$> papers list --tag sea-level projections -1
Perrette2013: A scaling approach to project regional sea level rise and it... (doi:10.5194/esd-4-11-2013, file:1, sea-level | projections )

papers list is a powerful command, inspired by Unix's find and grep.

It lets you search in your bibtex in a typical manner (including a number of special flags such as --duplicates, --review-required, --broken-file...), then output the result in a number of formats (one-liner, raw bibtex, keys-only, selected fields) or let you perform actions on it (currently --edit, --delete, --add-tag, --fetch). For instance, it is possible to manually merge the duplicates with:

$> papers list --duplicates --edit

Control fields when renaming file

    $> papers add --rename --info --name-template "{AuthorX}{year}-{Title}" --name-title-sep '' --name-author-sep '' esd-4-11-2013.pdf
    INFO:papers:found doi:10.5194/esd-4-11-2013
    INFO:papers:new entry: perrette2013scaling
    INFO:papers:create directory: files/2013
    INFO:papers:mv /home/perrette/playground/papers/esd-4-11-2013.pdf files/PerretteEtAl2013-AScalingApproachToProjectRegionalSeaLevelRiseAndItsUncertainties.pdf
    INFO:papers:renamed file(s): 1

where '--name-template' is a python template (it will be formatted via the .format() method), with valid fields being any field available in the bibtex. Fields not in the bibtex will remain untouched.

To rename esd-4-11-2013.pdf as perrette_2013.pdf, the template should be --name-template {author}_{year} --name-author-num 1. If that happens to be the entry ID, ID also works.

To rename esd-4-11-2013.pdf as 2013/Perrette2013-AScalingApproachToProjectRegionalSeaLevelRiseAndItsUncertainties.pdf, name-template should be --name-template {year}/{Author}{year}-{Title} --name-title-sep '' (note the case).

Entries are case-sensitive, and a few more fields are added, so that:

'author' generates 'perrette'
'Author' generates 'Perrette'
'AUTHOR' generates 'PERRETTE'
'authorX' generates 'perrette', 'perrette_and_landerer' or 'perrette_et_al' depending on the number of authors
'AuthorX' same as authorX but capitalized
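
As a rough illustration of the case variants listed above, here is a minimal sketch that builds these fields from a list of author surnames and fills a template. The names `author_fields`, `render_name` and `_KeepMissing` are hypothetical helpers, not part of papers, and the per-word capitalization of AuthorX is an assumption inferred from the PerretteEtAl example above.

```python
class _KeepMissing(dict):
    """Leave unknown {fields} untouched instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def author_fields(surnames, sep="_"):
    """Build the author variants for a list of surnames (hypothetical helper)."""
    first = surnames[0].lower()
    if len(surnames) == 1:
        words = [first]
    elif len(surnames) == 2:
        words = [first, "and", surnames[1].lower()]
    else:
        words = [first, "et", "al"]
    return {
        "author": first,
        "Author": first.capitalize(),
        "AUTHOR": first.upper(),
        "authorX": sep.join(words),
        # Assumption: every word is capitalized, as in 'PerretteEtAl'.
        "AuthorX": sep.join(w.capitalize() for w in words),
    }


def render_name(template, fields):
    """Fill the template; fields missing from the entry are left as-is."""
    return template.format_map(_KeepMissing(fields))


fields = dict(author_fields(["Perrette", "Landerer", "Riva"], sep=""), year="2013")
print(render_name("{author}_{year}", fields))
print(render_name("{AuthorX}{year}", fields))
```

Using format_map with a dict subclass is one simple way to get the "untouched" behaviour for unknown fields; a plain .format() call would raise KeyError instead.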

The modifiers are:

--name-title-sep : separator for title words
--name-title-length : max title length
--name-title-word-size : min size to be considered a word
--name-title-word-num : max number of title words

and similarly:

--name-author-sep : separator for authors
--name-author-num : number of authors to include (not relevant for {authorX})
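
To make the title modifiers more concrete, here is a minimal sketch of how they might combine. The function `format_title`, its parameter names and its default values are assumptions made for illustration; they mirror the flags above but are not taken from the papers source code.

```python
import re


def format_title(title, sep="-", length=80, word_size=1, word_num=100,
                 capitalize=False):
    """Shorten a title for use in a file name (hypothetical helper).

    sep        -> --name-title-sep
    length     -> --name-title-length
    word_size  -> --name-title-word-size
    word_num   -> --name-title-word-num
    """
    # Keep alphanumeric words only, so punctuation never reaches the file name.
    words = re.findall(r"[A-Za-z0-9]+", title)
    # Drop words shorter than the minimum size, then cap the number of words.
    words = [w for w in words if len(w) >= word_size][:word_num]
    if capitalize:
        words = [w.capitalize() for w in words]   # e.g. for {Title}
    else:
        words = [w.lower() for w in words]        # e.g. for {title}
    # Finally truncate to the maximum length.
    return sep.join(words)[:length]


title = "A scaling approach to project regional sea level rise and its uncertainties"
print(format_title(title))
print(format_title(title, sep="", capitalize=True))
print(format_title(title, word_size=4, word_num=3))
```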

The same template and modifiers system applies to the bibtex key generation by replacing the prefix --name- with --key-, e.g. --key-template

In the common case where the bibtex (--bibtex), files directory (--filesdir), and name and key formats (e.g. --name-template) do not change, it is convenient to [install](#install-make-bibtex-and-files-directory-persistent) papers.

install: make bibtex and files directory persistent

$> papers install --bibtex papers.bib --filesdir files
papers configuration
* configuration file: /home/perrette/.config/papersconfig.json
* cache directory:    /home/perrette/.cache/papers
* absolute paths:     True
* files directory:    files (1 files, 5.8 MB)
* bibtex:            papers.bib (1 entries)

The configuration file is global (unless --local is specified), so from now on, any papers command will know about these settings: no need to specify bibtex file or files directory.

Type papers status -v to check your configuration.

You will also notice a cache directory. All internet requests such as crossref requests are saved in the cache directory. This happens regardless of whether papers is installed or not.

local install

Sometimes it is desirable to have separate configurations. In that case a local install is the way to go:

$> papers install --local
Bibtex file name [default to existing: papers.bib] [Enter/Yes/No]:
Files folder [default to new: papers] [Enter/Yes/No]: pdfs
papers configuration
* configuration file: .papers/config.json
* cache directory:    /home/perrette/.cache/papers
* absolute paths:     True
* git-tracked:        False
* files directory:    pdfs (90 files, 337.4 MB)
* bibtex:             papers.bib (82 entries)

This creates a local configuration file in a hidden .papers folder. By default, it expects an existing papers.bib bibliography and papers files folder in the local directory, or creates new ones, though papers will ask first unless they are explicitly provided. Note that every call from a subfolder will also detect that configuration file (it has priority over the global install).
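
The lookup from a subfolder can be pictured as an upward search, much like git locating its .git directory. The following is only a sketch under that assumption: `find_local_config` is a hypothetical name, and the relative path is taken from the install output shown above.

```python
from pathlib import Path


def find_local_config(start, relpath=".papers/config.json"):
    """Return the closest local config at or above `start`, or None.

    Hypothetical helper: walks from `start` up to the filesystem root
    and returns the first matching configuration file.
    """
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        candidate = folder / relpath
        if candidate.is_file():
            return candidate
    return None  # the caller would then fall back to the global configuration


print(find_local_config("."))
```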

By default, the local install is meant to be portable with bibtex and files, so the file paths are encoded relative to the bibtex file. If instead absolute paths make more sense (example use case: local bibtex file but central PDF folder), then simply specify the --absolute-paths option:

`papers install --local --absolute-paths --filesdir /path/to/central/pdfs`

uninstall

Getting confused with papers config files scattered in subfolders? Check the config with

papers status -v

and remove the configuration file by hand (rm ...). Or use the papers uninstall command:

papers uninstall

You may repeat papers status -v and cleaning until a satisfying state is reached, or remove all config files recursively up to (and including) global install:

papers uninstall --recursive

Relative versus Absolute path

By default, the file paths in the bibtex are stored as absolute paths (starting with /), except for local installs. It is possible to change this behaviour explicitly during install, or on a case-by-case basis with the --relative-paths or --absolute-paths options, with or without install.

Move library to a new location

Moving a library can be tricky. Simple cases are:

files are stored in a central repository, and the bibtex contains absolute paths. Then moving the bibtex by hand is fine.
files are stored alongside the bibtex, and the bibtex contains relative paths. Just move around the folder containing bibtex and files.

In any other case, you risk breaking the file links.

Papers tries to be as unopinionated as possible about how files are organized, and so it relies on your own judgement and use case. When loading a bibtex, it always interprets relative file links as being relative to the bibtex file. When saving a bibtex, it will save file links according to the default path setting (usually absolute, unless local install or unless you specify otherwise).
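
The load and save rules described above can be sketched as follows. Both function names are hypothetical, the paths are made up for the example, and POSIX-style paths are assumed.

```python
import os


def load_file_link(link, bibtex_path):
    """Resolve a file link read from a bibtex file to an absolute path.

    Relative links are interpreted relative to the folder of the bibtex
    file, not relative to the current working directory.
    """
    if os.path.isabs(link):
        return os.path.normpath(link)
    bibdir = os.path.dirname(os.path.abspath(bibtex_path))
    return os.path.normpath(os.path.join(bibdir, link))


def save_file_link(path, bibtex_path, relative=False):
    """Encode a file path for writing, as absolute (default) or relative."""
    if relative:
        bibdir = os.path.dirname(os.path.abspath(bibtex_path))
        return os.path.relpath(path, start=bibdir)
    return os.path.abspath(path)


bib = "/home/user/library/papers.bib"  # made-up location
print(load_file_link("files/paper.pdf", bib))
print(save_file_link("/home/user/library/files/paper.pdf", bib, relative=True))
```

With this convention, a folder holding both the bibtex and its files can be moved as a whole without touching the links, which is the second simple case listed above.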

In any case, the following set of commands will always work provided the initial file links are valid (optional parameters in brackets):

touch new.bib
papers add /path/to/old.bib --bib new.bib [ --rename ] [ --relative-paths ] [ --filesdir newfilesdir ]
rm -f /path/to/old.bib

check

It's easy to end up with duplicates in your bibtex. After adding PDFs, or every once in a while, do:

papers check --duplicates

filecheck

Check for broken links, rename files etc. Example:

papers filecheck --rename

The command can be used to move around the file directory:

papers filecheck --rename --filesdir newfilesdir

That command is also convenient to check what is actually tracked and what is not. Example workflow:

papers filecheck --rename --filesdir tmp
# check what's left over in your initial files directory, e.g.
# papers extract files/leftover1.pdf
# papers add files/leftover1.pdf
# ...
papers filecheck --rename --filesdir files

Setup git-tracked library (optional)

Install comes with the option to git-track any change to the bibtex file (`--git` option).

$> papers install --bibtex papers.bib --filesdir files --git  [ --git-lfs ]

From now on, every change to the library will result in an automatic git commit, and the papers git ... command will work just like git ... executed from the bibtex directory. E.g. papers git remote add origin *REMOTE URL*; papers git lfs track files; papers git add files; papers git push

If --git-lfs is passed, the files will be backed-up along with the bibtex. Under the hood, bibtex and files (if applicable) are copied (hard-linked) to a back-up directory. Details are described in issue 51.

In local installs, backup occurs in .papers/. In global installs, defaults to ~/.local/.share/papers. Type papers status -v to find out.

For local installs that are already git-tracked, the feature remains useful as it is the basis for papers undo and papers redo. You might want to add .papers to your .gitignore to avoid interfering with your larger project.

This feature is experimental.

undo / redo

Did a papers add and are unhappy with the result?

papers undo

will revert to the previous version. If repeated, it will jump back and forth between the latest and the before-latest versions, unless papers is installed with the --git option, in which case papers undo and papers redo have essentially infinite memory (undoing and then making a new commit risks losing history, unless you keep track of the commit).
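
The back-and-forth behaviour without git can be pictured with a single backup copy that is swapped with the current file. This is only a toy model of the idea, not how papers implements it; `save_with_backup` and `undo` are hypothetical names.

```python
import os
import shutil
import tempfile


def save_with_backup(bibtex, backup, new_text):
    """Keep the current version as the single backup, then write the new one."""
    if os.path.exists(bibtex):
        shutil.copyfile(bibtex, backup)
    with open(bibtex, "w") as f:
        f.write(new_text)


def undo(bibtex, backup):
    """Swap the bibtex file with its backup; calling it twice is a no-op."""
    swap = bibtex + ".swap"
    os.replace(bibtex, swap)
    os.replace(backup, bibtex)
    os.replace(swap, backup)


with tempfile.TemporaryDirectory() as tmp:
    bib = os.path.join(tmp, "papers.bib")
    bak = os.path.join(tmp, "papers.bib.backup")
    save_with_backup(bib, bak, "version 1")
    save_with_backup(bib, bak, "version 2")
    undo(bib, bak)   # back to version 1
    undo(bib, bak)   # forth to version 2 again
    print(open(bib).read())
```

With only one backup there is nothing older to return to, hence the toggling; a git history removes that limit.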

Consult inline help for more detailed documentation!

Current features

parse PDF to extract DOI
fetch bibtex entry from DOI (using crossref API)
fetch bibtex entry by fulltext search (using crossref API or google scholar)
create and maintain bibtex file
add entry as PDF (papers add ...)
add entry as bibtex (papers add ...)
scan directory for PDFs (papers add ...)
rename PDFs according to bibtex key and year (papers filecheck --rename [--copy])
some support for attachment
merging (papers check --duplicates ...)
fix entries (papers check --format-name --encoding unicode --fix-doi --fix-key ...)
configuration file with default bibtex and files directory (papers install --bibtex BIB --filesdir DIR ...)
integration with git
undo/redo command (papers undo / redo)
display / search / list entries : format as bibtex or key or whatever (papers list ... [--key-only, -l])
list + edit or remove entry by key or else (papers list ... [--edit, --delete])
fix broken PDF links (papers filecheck ...):
- remove duplicate file names (always) or file copies (--hash-check)
- remove missing link (--delete-missing)
- fix files name after a Mendeley export (--fix-mendeley):
  - leading '/' missing is added
  - latex characters, e.g. {\_} or {\'{e}} replaced with unicode

Tests

Test coverage is improving (now 80%)

Currently covers:

papers extract (test on a handful of PDFs)
- parse pdf DOI
- fetch bibtex on crossref based on DOI
- fetch bibtex on crossref based fulltext search
- fetch bibtex on google-scholar based fulltext search
papers add
- add entry and manage conflict
- add pdf file, bibtex, directory
- add one pdf file with attachment (beta, API will change)
- conflict resolution
papers install
internals:
- duplicate test with levels EXACT, GOOD, FAIR (the default), PARTIAL
papers list
papers undo / redo (partial)
papers filecheck --rename (superficial)
papers check --duplicate (fix DOI etc.) (superficial)

Why not JabRef, Zotero or Mendeley (or...) ?

JabRef (2.10) is nice, light-weight, but is not so good at managing PDFs.
Zotero (5.0) features excellent PDF import capability, but it needs to be done manually, one by one, and is a little slow. Not very flexible.
Mendeley (1.17) is perfect at automatically extracting metadata from downloaded PDFs and managing your PDF library, but it is not open source, and many issues remain (own experience, Ubuntu 14.04, Desktop Version 1.17):
- very unstable
- PDF automatic naming is too verbose, and sometimes the behaviour is unexpected (some PDFs remain in an obscure Downloaded folder, instead of in the main collection)
- somewhat heavy (it offers functions of online syncing, etc)
- poor search capability (related to the point above)

The above-mentioned issues will no doubt be improved in future releases, but they are a starting point for this project. Anyway, a command-line tool is per se a good idea for faster development, as noted here, but so far I could only find zotero clients for their online API (like pyzotero or zotero-cli). Please contact me if you know another interesting project.

papers's Issues

FILE field

Currently the assumed format for the file field is FILE:TYPE[;FILE:TYPE]... where TYPE is always pdf (so far) and FILE indicates the full file path.

This is understood by JabRef (at least for one file), but according to this discussion, each individual file should be instead BASENAME:FILE:TYPE or :FILE:TYPE or even :FILE:

Since BASENAME is redundant (it can be obtained from FILE), I kind of like :FILE:TYPE.
In any case, this should be accounted for when parsing the file field of a bibtex entry.
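
A tolerant parser for the variants mentioned above could look like the sketch below. `parse_file_field` is a hypothetical name, and the sketch assumes that file paths contain neither ':' nor ';' (so Windows drive letters and escaped separators are not handled).

```python
def parse_file_field(field):
    """Parse a bibtex file field into a list of (path, type) tuples.

    Accepts FILE:TYPE, :FILE:TYPE, BASENAME:FILE:TYPE and :FILE:,
    with several files separated by ';'.
    """
    files = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        parts = item.split(":")
        if len(parts) == 1:
            path, ftype = parts[0], ""      # bare FILE
        elif len(parts) == 2:
            path, ftype = parts             # FILE:TYPE
        else:
            # [BASENAME]:FILE:[TYPE]: the leading element is redundant.
            path, ftype = ":".join(parts[1:-1]), parts[-1]
        files.append((path, ftype))
    return files


print(parse_file_field("/a/b.pdf:pdf;/c/d.pdf:pdf"))
print(parse_file_field(":/a/b.pdf:pdf"))
```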

Respect crossref's etiquette

Respect crossref's etiquette:

cache request results
Currently results are cached using a local .crossref-bibtex.json file. This should be saved in a centralized configuration file (see related Planned Features in readme).
Specify a User-Agent header that properly identifies your script or tool and that provides a means of contacting you via email using "mailto:". For example: GroovyBib/1.1 (https://example.org/GroovyBib/; mailto:[email protected]) BasedOnFunkyLib/1.4.
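
Both points can be sketched in a few lines: a JSON file used as a request cache, and a helper that builds a User-Agent string in the format quoted above. Everything here is illustrative: `cached_request`, `user_agent`, the cache file name and the example values are assumptions, and `fake_fetch` stands in for the actual network call.

```python
import json
import os
import tempfile


def user_agent(tool, version, url, email):
    """Build a User-Agent string that identifies the tool and a contact."""
    return "{}/{} ({}; mailto:{})".format(tool, version, url, email)


def cached_request(key, fetch, cache_file):
    """Return the cached result for `key`, calling `fetch(key)` only once."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)
    if key not in cache:
        cache[key] = fetch(key)          # the only place the network is hit
        with open(cache_file, "w") as f:
            json.dump(cache, f)
    return cache[key]


calls = []

def fake_fetch(doi):
    """Stand-in for a crossref request; records how often it is called."""
    calls.append(doi)
    return "@article{...}"

with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "crossref-cache.json")  # hypothetical name
    cached_request("10.5194/esd-4-11-2013", fake_fetch, cache_file)
    cached_request("10.5194/esd-4-11-2013", fake_fetch, cache_file)
    print(len(calls))  # the second call is served from the cache

print(user_agent("papers", "1.0", "https://example.org/papers", "someone@example.org"))
```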

attribute loads is disappearing in module bibtexparser

Hi.

I tend to like living on git HEAD. I therefore installed bibtexparser from source, i.e. version 2.0.0b3. Launching the papers cli then yields:

AttributeError: module 'bibtexparser' has no attribute 'loads'

Just warning you that it will come to bite you when you upgrade bibtexparser to the planned version.

Automatically installing dependencies

pip install git+https://github.com/perrette/papers.git#egg=papers

doesn't automatically get the dependencies. Looks like setup.py is using distutils.core, which I don't know very well. Perhaps there's a way to do this in distutils as well, but in setuptools, setuptools.setup(install_requires=) can automatically get dependencies when installing.

Docs here:
https://packaging.python.org/guides/distributing-packages-using-setuptools/#install-requires

Controlling key and filename fields

Hey! Thanks for sharing this very nice software!

I'm trying to generate entries with key format like perrette2013scaling instead of Perrette_2013 and to rename the file as Perrette2013_AScalingApproachToProjectRegionalSeaLevelRiseAndItsUncertainties.pdf. Is such controlling implemented yet? If not, I could play with this and try to implement --key-fields and --filename-fields arguments to some subcommands, where the default would be --key-fields author,_,year :) Would you accept pull requests?

Cheers,
Malfatti

Interactive papers add ?

Hi.

I would like to know if it could be possible (or if it is kind of in the roadmap -- very well possibly could not be) to make papers add run interactively and provide a mechanism to populate papers.bib interactively.

I'd like, for instance, to cite Théorie de l'addition des variables aléatoires, Paul Lévy, 1937, and it would be sensible, I guess, to enter that entry manually.

If it could be done manually, interactively, via the papers cli, that would be great. I do not see command line options for that, however.

Backup, Sync and git tracking

Originally, git tracking feature was added in order to add safety to handling a global papers install.
Implementation details are now jeopardized with local install. Local installs are often git-tracked themselves, and nested git repos do not play well together. Worse, papers git install might trigger commits to a directory where it is not expected to (fortunately it is off by default, so it still requires explicit user action to be enabled). In the original implementation, the git directory could also be separate from the bibtex file. If that was the case, the bibtex would be copied to the git directory upon saving, and a commit would be done. That works, but using git commands to revert or reset to a previous commit would then only affect the git repo, and not the original bibtex, making the overall behavior unintuitive. Clearly, some overhaul is needed.

It is not entirely clear to me yet how that feature should evolve, but the basic idea of using git to safeguard the bibtex, and undo unwanted changes, is still relevant IMO. Here are a few options:

use git as an internal tool in papers, without explicitly asking about it. papers undo (and a new command papers redo) could be used to navigate git history. The git repo would be saved in a central papers dir, using different branches to handle different bibtex locations (using a slug of the full bibtex path as branch name, for instance). That could work even without a proper installation. Maybe. Issue: bibtex rename would break the flow by creating a new branch. We could live with that.
propose hooks upon bibtex save. Here a whole workflow could be fine-tuned by users. Could be used as internal to implement higher-level feature.
add options to track files, sync with a remote server etc.
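
For the first option, deriving a branch name from the bibtex location could be as simple as the sketch below. `branch_name` is a hypothetical helper and the exact slug rules (lowercase, non-alphanumeric runs replaced by a dash) are an assumption.

```python
import os
import re


def branch_name(bibtex_path):
    """Turn the full bibtex path into a slug usable as a git branch name."""
    full = os.path.abspath(bibtex_path)
    # Collapse every run of non-alphanumeric characters into a single dash.
    return re.sub(r"[^A-Za-z0-9]+", "-", full).strip("-").lower()


print(branch_name("/home/user/library/papers.bib"))
```

As noted in the option itself, renaming or moving the bibtex yields a different slug and therefore a new branch.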

For now I'll just leave that issue open to collect ideas. Current simplistic implementation works OK.

install dir explicit path?

Might it make sense to change the install dir from ~/.local/config/ to ~/.local/config/papers, with the git keeping and all that? I also note that I'm not sure how well this plays with Windows, but can't test since I don't have a windows dev env up.

Other APIs

The https://github.com/ArcasProject/Arcas project by @Nikoleta-v3 has a couple of APIs (and ArcasProject/Arcas#4 a bunch more) that might potentially be useful for searching and matching ...

"list index out of range" issue stops multi-file adds

Absolutely loving this project. I have several folders of many pdfs I'm trying to add to papers, using

papers add -rc *.pdf

papers is hitting an error, with output

Traceback (most recent call last):
  File "/usr/local/bin/papers", line 5, in <module>
    papers.bib.main()
  File "/usr/local/lib/python3.4/dist-packages/papers/bib.py", line 1343, in main
    check_install() and addcmd(o)
  File "/usr/local/lib/python3.4/dist-packages/papers/bib.py", line 969, in addcmd
    **kw)
  File "/usr/local/lib/python3.4/dist-packages/papers/bib.py", line 390, in add_pdf
    entry = bib.entries[0]
IndexError: list index out of range

And when it does this, none of the references have been added to the .bib file.

This happens if I include --ignore-errors as an argument, or don't use the -rc options. I've used setup to set up a config, and entering papers by hand works, as has several folders of just a few papers.

Suggestions?

Thanks!

Tags?

Is it possible to attach tags to each paper and use them to make searches?

FileNotFoundError: [WinError 2] The system cannot find the file specified

When I call python scripts\papers extract esd-4-11-2013.pdf in the anaconda prompt, also as an administrator, I get this error:

(PaperDownload) (base) C:\phd_scripts\paper-download\papers-master>python scripts\papers extract esd-4-11-2013.pdf
Traceback (most recent call last):
  File "scripts\papers", line 5, in <module>
    papers.bib.main()
  File "C:\ProgramData\Anaconda3\lib\site-packages\papers\bib.py", line 1359, in main
    extractcmd(o)
  File "C:\ProgramData\Anaconda3\lib\site-packages\papers\bib.py", line 1284, in extractcmd
    print(extract_pdf_metadata(o.pdf, search_doi=not o.fulltext, search_fulltext=True, scholar=o.scholar, minwords=o.word_count, max_query_words=o.word_count))
  File "C:\ProgramData\Anaconda3\lib\site-packages\papers\extract.py", line 201, in extract_pdf_metadata
    txt = pdfhead(pdf, maxpages, minwords, image=image)
  File "C:\ProgramData\Anaconda3\lib\site-packages\papers\extract.py", line 138, in pdfhead
    txt += readpdf(pdf, first=i, last=i)
  File "C:\ProgramData\Anaconda3\lib\site-packages\papers\extract.py", line 37, in readpdf
    sp.check_call(cmd)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 342, in check_call
    retcode = call(*popenargs, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 323, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

I'm on Windows 10, 64bit

Is there any way to get the new full path of a moved and renamed PDF file?

I use zathura for PDF viewing and added some bindings for using papers inside zathura. When I add a paper, its location changes and I have to re-open it manually. Is there any way to get the new location after adding a paper, so that re-opening can be automated?

P.S. my bind is:
map <C-b> feedkeys ":exec papers add -r $FILE<Return>"

IndexError: string index out of range (when reproducing your example)

Ubuntu 18.04, Python 2.7 (although also tried with 3.5), poppler and pip-dependencies installed.

A regular papers install, no problems
papers install --bibtex papers.bib --filesdir files --git --gitdir ./
Download your papers and try to extract from it, get error message
papers add --rename --copy --bibtex papers.bib --filesdir files esd-4-11-2013.pdf --info

INFO:papers:bibtex: papers.bib
INFO:papers:filesdir: files
Traceback (most recent call last):
  File "/usr/local/bin/papers", line 5, in <module>
    papers.bib.main()
  File "/usr/local/lib/python2.7/dist-packages/papers/bib.py", line 1343, in main
    check_install() and addcmd(o)
  File "/usr/local/lib/python2.7/dist-packages/papers/bib.py", line 942, in addcmd
    my = Biblio.load(o.bibtex, o.filesdir)
  File "/usr/local/lib/python2.7/dist-packages/papers/bib.py", line 258, in load
    return cls(bibtexparser.loads(bibtexs), filesdir)
  File "/usr/local/lib/python2.7/dist-packages/papers/encoding.py", line 12, in <lambda>
    bibtexparser.loads = lambda s: _bloads(s.decode('utf-8') if type(s) is str else s)
  File "/usr/local/lib/python2.7/dist-packages/bibtexparser/__init__.py", line 48, in loads
    return parser.parse(bibtex_str)
  File "/usr/local/lib/python2.7/dist-packages/bibtexparser/bparser.py", line 145, in parse
    bibtex_file_obj = self._bibtex_file_obj(bibtex_str)
  File "/usr/local/lib/python2.7/dist-packages/bibtexparser/bparser.py", line 213, in _bibtex_file_obj
    if bibtex_str[0] == byte:
IndexError: string index out of range

Any clues? And thanks for what looks like such a great way to manage a bibliography!

Crossref bibtex encoding is unreliable

See issue CrossRef/rest-api-doc#581

Tests fail when ``.papers`` does not exist.

This is kinda nit-picky and it's not even clear if it's worth a "fix", but when the .papers dir does not exist, a test fails:

__________________________________________ TestAdd2.test_add ___________________________________________
[gw2] linux -- Python 3.11.3 /home/boyan/boyanshouse/miniconda3/envs/python311/bin/python

self = <test_papers.TestAdd2 testMethod=test_add>

    def test_add(self):
        self.assertTrue(os.path.exists(self.mybib))
        print("bibtex", self.mybib, 'exists?', os.path.exists(self.mybib))
>       sp.check_call('papers add --bibtex {} {}'.format(
            self.mybib, self.pdf), shell=True)

tests/test_papers.py:177: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = ('papers add --bibtex /tmp/papers.biba8iolx_d /home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/esd-4-11-2013.pdf',)
kwargs = {'shell': True}, retcode = 1
cmd = 'papers add --bibtex /tmp/papers.biba8iolx_d /home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/esd-4-11-2013.pdf'

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.
    
        The arguments are the same as for the call function.  Example:
    
        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command 'papers add --bibtex /tmp/papers.biba8iolx_d /home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/esd-4-11-2013.pdf' returned non-zero exit status 1.

../../../miniconda3/envs/python311/lib/python3.11/subprocess.py:413: CalledProcessError
----------------------------------------- Captured stdout call ------------------------------------------
bibtex /tmp/papers.biba8iolx_d exists? True
----------------------------------------- Captured stderr call ------------------------------------------
Traceback (most recent call last):
  File "/home/boyan/boyanshouse/miniconda3/envs/python311/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/boyanshouse/Vazhno/Work/papers/papers/bib.py", line 1577, in main
    check_install() and addcmd(o)
    ^^^^^^^^^^^^^^^
  File "/home/boyan/boyanshouse/Vazhno/Work/papers/papers/bib.py", line 1568, in check_install
    logger.info('filesdir: '+config.filesdir)
                ~~~~~~~~~~~~^~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
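
The last frame shows a string being concatenated with a filesdir that is None. One possible guard, sketched here with a hypothetical helper `describe_filesdir` (not the actual fix applied in papers), is to format the value instead of concatenating it:

```python
import logging

logger = logging.getLogger("papers")


def describe_filesdir(filesdir):
    """Build the log message without assuming filesdir is a string."""
    return "filesdir: {}".format(filesdir if filesdir is not None else "(not set)")


# In check_install this would replace: logger.info('filesdir: '+config.filesdir)
logger.info(describe_filesdir(None))
print(describe_filesdir(None))
print(describe_filesdir("files"))
```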

Test coverage issues

Test coverage is pretty poor, as I was reminded during the last few days.

Originally posted by @perrette in #42 (comment)

Have a more local --local install

Right now the papers install --local writes .papersconfig locally, but keeps the same global defaults for filesdir and bibtex.
While this may be OK and can be overwritten manually, the use case for --local is to have parallel, independent, medium-size projects. To me, this is one of the main uses of papers.

It would be good to adjust the defaults so that:

papers install --local defaults to local bibliography and filesdir
default to an existing bibliography if any is found

To make it more useful, it might also be good to have a recursive, upward search like git does.

tests fail on master?

(python311) → master Work/papers pytest -vv .                                                                               18:37:55
======================================================== test session starts ========================================================
platform linux -- Python 3.11.2, pytest-7.3.1, pluggy-1.0.0 -- /home/boyan/boyanshouse/miniconda3/envs/python311/bin/python3.11
cachedir: .pytest_cache
rootdir: /home/boyan/boyanshouse/Vazhno/Work/papers
plugins: anyio-3.6.2
collected 95 items                                                                                                                  

tests/test_papers.py::TestBibtexFileEntry::test_format_file PASSED                                                            [  1%]
tests/test_papers.py::TestBibtexFileEntry::test_format_files PASSED                                                           [  2%]
tests/test_papers.py::TestBibtexFileEntry::test_parse_file PASSED                                                             [  3%]
tests/test_papers.py::TestBibtexFileEntry::test_parse_files PASSED                                                            [  4%]
tests/test_papers.py::TestSimple::test_doi PASSED                                                                             [  5%]
tests/test_papers.py::TestSimple::test_fetch PASSED                                                                           [  6%]
tests/test_papers.py::TestSimple::test_fetch_scholar PASSED                                                                   [  7%]
tests/test_papers.py::TestInstall::test_local_install PASSED                                                                  [  8%]
tests/test_papers.py::TestAdd::test_add PASSED                                                                                [  9%]
tests/test_papers.py::TestAdd::test_add_rename PASSED                                                                         [ 10%]
tests/test_papers.py::TestAdd::test_add_rename_copy PASSED                                                                    [ 11%]
tests/test_papers.py::TestAdd::test_fails_without_install PASSED                                                              [ 12%]
tests/test_papers.py::TestAdd2::test_add PASSED                                                                               [ 13%]
tests/test_papers.py::TestAdd2::test_add_attachment PASSED                                                                    [ 14%]
tests/test_papers.py::TestAdd2::test_add_rename PASSED                                                                        [ 15%]
tests/test_papers.py::TestAdd2::test_add_rename_copy PASSED                                                                   [ 16%]
tests/test_papers.py::TestAdd2::test_fails_without_install PASSED                                                             [ 17%]
tests/test_papers.py::TestAddBib::test_addbib PASSED                                                                          [ 18%]
tests/test_papers.py::TestAddDir::test_adddir_pdf FAILED                                                                      [ 20%]
tests/test_papers.py::TestAddDir::test_adddir_pdf_cmd FAILED                                                                  [ 21%]
tests/test_papers.py::TestDuplicates::test_anotherkey PASSED                                                                  [ 22%]
tests/test_papers.py::TestDuplicates::test_conflictauthor PASSED                                                              [ 23%]
tests/test_papers.py::TestDuplicates::test_conflictdoi PASSED                                                                 [ 24%]
tests/test_papers.py::TestDuplicates::test_conflictyear PASSED                                                                [ 25%]
tests/test_papers.py::TestDuplicates::test_exactsame PASSED                                                                   [ 26%]
tests/test_papers.py::TestDuplicates::test_missingdoi PASSED                                                                  [ 27%]
tests/test_papers.py::TestDuplicates::test_missingfield PASSED                                                                [ 28%]
tests/test_papers.py::TestDuplicates::test_missingtitauthor PASSED                                                            [ 29%]
tests/test_papers.py::TestDuplicatesExact::test_anotherkey PASSED                                                             [ 30%]
tests/test_papers.py::TestDuplicatesExact::test_conflictauthor PASSED                                                         [ 31%]
tests/test_papers.py::TestDuplicatesExact::test_conflictdoi PASSED                                                            [ 32%]
tests/test_papers.py::TestDuplicatesExact::test_conflictyear PASSED                                                           [ 33%]
tests/test_papers.py::TestDuplicatesExact::test_exactsame PASSED                                                              [ 34%]
tests/test_papers.py::TestDuplicatesExact::test_missingdoi PASSED                                                             [ 35%]
tests/test_papers.py::TestDuplicatesExact::test_missingfield PASSED                                                           [ 36%]
tests/test_papers.py::TestDuplicatesExact::test_missingtitauthor PASSED                                                       [ 37%]
tests/test_papers.py::TestDuplicatesGood::test_anotherkey PASSED                                                              [ 38%]
tests/test_papers.py::TestDuplicatesGood::test_conflictauthor PASSED                                                          [ 40%]
tests/test_papers.py::TestDuplicatesGood::test_conflictdoi PASSED                                                             [ 41%]
tests/test_papers.py::TestDuplicatesGood::test_conflictyear PASSED                                                            [ 42%]
tests/test_papers.py::TestDuplicatesGood::test_exactsame PASSED                                                               [ 43%]
tests/test_papers.py::TestDuplicatesGood::test_missingdoi PASSED                                                              [ 44%]
tests/test_papers.py::TestDuplicatesGood::test_missingfield PASSED                                                            [ 45%]
tests/test_papers.py::TestDuplicatesGood::test_missingtitauthor PASSED                                                        [ 46%]
tests/test_papers.py::TestDuplicatesPartial::test_anotherkey PASSED                                                           [ 47%]
tests/test_papers.py::TestDuplicatesPartial::test_conflictauthor PASSED                                                       [ 48%]
tests/test_papers.py::TestDuplicatesPartial::test_conflictdoi PASSED                                                          [ 49%]
tests/test_papers.py::TestDuplicatesPartial::test_conflictyear PASSED                                                         [ 50%]
tests/test_papers.py::TestDuplicatesPartial::test_exactsame PASSED                                                            [ 51%]
tests/test_papers.py::TestDuplicatesPartial::test_missingdoi PASSED                                                           [ 52%]
tests/test_papers.py::TestDuplicatesPartial::test_missingfield PASSED                                                         [ 53%]
tests/test_papers.py::TestDuplicatesPartial::test_missingtitauthor PASSED                                                     [ 54%]
tests/test_papers.py::TestDuplicatesAdd::test_anotherkey SKIPPED (skip cause does not make sense with add)                    [ 55%]
tests/test_papers.py::TestDuplicatesAdd::test_conflictauthor PASSED                                                           [ 56%]
tests/test_papers.py::TestDuplicatesAdd::test_conflictdoi PASSED                                                              [ 57%]
tests/test_papers.py::TestDuplicatesAdd::test_conflictyear PASSED                                                             [ 58%]
tests/test_papers.py::TestDuplicatesAdd::test_exactsame SKIPPED (skip cause does not make sense with add)                     [ 60%]
tests/test_papers.py::TestDuplicatesAdd::test_missingdoi PASSED                                                               [ 61%]
tests/test_papers.py::TestDuplicatesAdd::test_missingfield PASSED                                                             [ 62%]
tests/test_papers.py::TestDuplicatesAdd::test_missingtitauthor PASSED                                                         [ 63%]
tests/test_papers.py::TestAddResolveDuplicate::test_append PASSED                                                             [ 64%]
tests/test_papers.py::TestAddResolveDuplicate::test_conflict_updated_from_original PASSED                                     [ 65%]
tests/test_papers.py::TestAddResolveDuplicate::test_conflict_updated_from_original_but_originalkey PASSED                     [ 66%]
tests/test_papers.py::TestAddResolveDuplicate::test_original_updated_from_conflict PASSED                                     [ 67%]
tests/test_papers.py::TestAddResolveDuplicate::test_overwrite PASSED                                                          [ 68%]
tests/test_papers.py::TestAddResolveDuplicate::test_raises PASSED                                                             [ 69%]
tests/test_papers.py::TestAddResolveDuplicate::test_skip PASSED                                                               [ 70%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_append PASSED                                                      [ 71%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_conflict_updated_from_original PASSED                              [ 72%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_conflict_updated_from_original_but_originalkey PASSED              [ 73%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_original_updated_from_conflict PASSED                              [ 74%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_overwrite PASSED                                                   [ 75%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_raises PASSED                                                      [ 76%]
tests/test_papers.py::TestAddResolveDuplicateCommand::test_skip PASSED                                                        [ 77%]
tests/test_papers.py::TestCheckResolveDuplicate::test_merge PASSED                                                            [ 78%]
tests/test_papers.py::TestCheckResolveDuplicate::test_not_a_duplicate PASSED                                                  [ 80%]
tests/test_papers.py::TestCheckResolveDuplicate::test_pick_conflict_1 PASSED                                                  [ 81%]
tests/test_papers.py::TestCheckResolveDuplicate::test_pick_reference_2 PASSED                                                 [ 82%]
tests/test_papers.py::TestCheckResolveDuplicate::test_raises PASSED                                                           [ 83%]
tests/test_papers.py::TestCheckResolveDuplicate::test_skip_check PASSED                                                       [ 84%]
tests/test_papers.py::TestAddConflict::test_add_conflict_key_check_raises PASSED                                              [ 85%]
tests/test_papers.py::TestAddConflict::test_add_conflict_key_nocheck_raises PASSED                                            [ 86%]
tests/test_papers.py::TestAddConflict::test_add_conflict_key_update PASSED                                                    [ 87%]
tests/test_papers.py::TestAddConflict::test_add_miss_doi_merge PASSED                                                         [ 88%]
tests/test_papers.py::TestAddConflict::test_add_miss_field_fails PASSED                                                       [ 89%]
tests/test_papers.py::TestAddConflict::test_add_miss_merge PASSED                                                             [ 90%]
tests/test_papers.py::TestAddConflict::test_add_miss_titauthor_merge PASSED                                                   [ 91%]
tests/test_papers.py::TestAddConflict::test_add_same PASSED                                                                   [ 92%]
tests/test_papers.py::TestAddConflict::test_add_same_but_file PASSED                                                          [ 93%]
tests/test_papers.py::TestAddConflict::test_add_same_but_key_fails PASSED                                                     [ 94%]
tests/test_papers.py::TestAddConflict::test_add_same_but_key_interactive PASSED                                               [ 95%]
tests/test_papers.py::TestAddConflict::test_add_same_but_key_update PASSED                                                    [ 96%]
tests/test_papers.py::TestAddConflict::test_add_same_doi_fails PASSED                                                         [ 97%]
tests/test_papers.py::TestAddConflict::test_add_same_doi_unchecked PASSED                                                     [ 98%]
tests/test_papers.py::TestAddConflict::test_add_same_doi_update_key PASSED                                                    [100%]

============================================================= FAILURES ==============================================================
____________________________________________________ TestAddDir.test_adddir_pdf _____________________________________________________

self = <test_papers.TestAddDir testMethod=test_adddir_pdf>

    def setUp(self):
        self.pdf1, self.doi, self.key1, self.newkey1, self.year, self.bibtex1, self.file_rename1 = prepare_paper()
        self.pdf2, self.si, self.doi, self.key2, self.newkey2, self.year, self.bibtex2, self.file_rename2 = prepare_paper2()
        self.somedir = tempfile.mktemp(prefix='papers.somedir')
        self.subdir = os.path.join(self.somedir, 'subdir')
        os.makedirs(self.somedir)
        os.makedirs(self.subdir)
        shutil.copy(self.pdf1, self.somedir)
        shutil.copy(self.pdf2, self.subdir)
        self.mybib = tempfile.mktemp(prefix='papers.bib')
>       sp.check_call('papers install --local --no-prompt --bibtex {}'.format(self.mybib), shell=True)

tests/test_papers.py:302: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = ('papers install --local --no-prompt --bibtex /tmp/papers.bib8eszhc3e',), kwargs = {'shell': True}, retcode = 2
cmd = 'papers install --local --no-prompt --bibtex /tmp/papers.bib8eszhc3e'

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.
    
        The arguments are the same as for the call function.  Example:
    
        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command 'papers install --local --no-prompt --bibtex /tmp/papers.bib8eszhc3e' returned non-zero exit status 2.

../../../miniconda3/envs/python311/lib/python3.11/subprocess.py:413: CalledProcessError
------------------------------------------------------- Captured stderr call --------------------------------------------------------
usage: papers [-h]
              {status,install,add,check,filecheck,list,doi,fetch,extract,undo,git}
              ...
papers: error: unrecognized arguments: --no-prompt
__________________________________________________ TestAddDir.test_adddir_pdf_cmd ___________________________________________________

self = <test_papers.TestAddDir testMethod=test_adddir_pdf_cmd>

    def setUp(self):
        self.pdf1, self.doi, self.key1, self.newkey1, self.year, self.bibtex1, self.file_rename1 = prepare_paper()
        self.pdf2, self.si, self.doi, self.key2, self.newkey2, self.year, self.bibtex2, self.file_rename2 = prepare_paper2()
        self.somedir = tempfile.mktemp(prefix='papers.somedir')
        self.subdir = os.path.join(self.somedir, 'subdir')
        os.makedirs(self.somedir)
        os.makedirs(self.subdir)
        shutil.copy(self.pdf1, self.somedir)
        shutil.copy(self.pdf2, self.subdir)
        self.mybib = tempfile.mktemp(prefix='papers.bib')
>       sp.check_call('papers install --local --no-prompt --bibtex {}'.format(self.mybib), shell=True)

tests/test_papers.py:302: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = ('papers install --local --no-prompt --bibtex /tmp/papers.bib_o_zoa39',), kwargs = {'shell': True}, retcode = 2
cmd = 'papers install --local --no-prompt --bibtex /tmp/papers.bib_o_zoa39'

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.
    
        The arguments are the same as for the call function.  Example:
    
        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command 'papers install --local --no-prompt --bibtex /tmp/papers.bib_o_zoa39' returned non-zero exit status 2.

../../../miniconda3/envs/python311/lib/python3.11/subprocess.py:413: CalledProcessError
------------------------------------------------------- Captured stderr call --------------------------------------------------------
usage: papers [-h]
              {status,install,add,check,filecheck,list,doi,fetch,extract,undo,git}
              ...
papers: error: unrecognized arguments: --no-prompt
====================================================== short test summary info ======================================================
FAILED tests/test_papers.py::TestAddDir::test_adddir_pdf - subprocess.CalledProcessError: Command 'papers install --local --no-prompt --bibtex /tmp/papers.bib8eszhc3e' returned non-zero e...
FAILED tests/test_papers.py::TestAddDir::test_adddir_pdf_cmd - subprocess.CalledProcessError: Command 'papers install --local --no-prompt --bibtex /tmp/papers.bib_o_zoa39' returned non-zero e...
============================================= 2 failed, 91 passed, 2 skipped in 19.02s ==============================================
(python311) → master Work/papers                                                                                            18:38:

This is on Python 3.11, on master. Is this a blocker for anything?

add sponsor button

papers add fails if in a subdir?

I see

papers add 2013_AdvCIS_Modeling\ and\ simulation\ of\ electrostatically\ gated\ nanochannels.pdf --rename --copy --info         
INFO:papers:bibtex: '/home/boyan/Vazhno/Work/Literature/library.bib'
INFO:papers:filesdir: '/home/boyan/Vazhno/Work/Literature/papers_organized'
INFO:papers:8036 entry files were updated
INFO:papers:pdftotext -f 1 -l 1 2013_AdvCIS_Modeling and simulation of electrostatically gated nanochannels.pdf /tmp/tmppsa0k9ff.txt
INFO:papers:found doi:10.1016/j.cis.2013.06.006
INFO:papers:duplicate :: update key to match existing entry: 2013/2013_pardon_van-der-wijngaart_modeling-and-simulation-of-electrostatically-gated-nanochannels => Pardon2013
Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 1071, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 452, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 432, in add_pdf
    self.insert_entry(entry, update_key=True, **kw)
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 288, in insert_entry
    self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 345, in insert_entry_check
    file = merge_files([candidate, entry], relative_to=self.relative_to)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/duplicate.py", line 290, in merge_files
    check = checksum(f) if os.path.exists(f) else None
            ^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/utils.py", line 81, in checksum
    return hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256())
                                               ^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/home/boyan/Vazhno/Work/Literature'
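
The last frames show `checksum(f)` being guarded by `os.path.exists(f)`, which is also true for directories, so a directory path reaches `open(fname, 'rb')`. A minimal sketch of a stricter guard follows. The names `checksum` and `safe_checksum` are written from scratch for illustration and are not the code in `papers/utils.py`. Whether skipping directories is the right fix, or whether the directory path points to an earlier problem in how the file field is resolved, cannot be told from the traceback alone.

```python
import hashlib
import os


def checksum(fname, blocksize=65536):
    """Return the SHA-256 hex digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(fname, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def safe_checksum(path):
    """Checksum regular files only; directories and missing paths give None."""
    # os.path.exists() is True for directories, os.path.isfile() is not
    return checksum(path) if os.path.isfile(path) else None
```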

The pdf itself is OK, I think -- this is the right PDF metadata after the add.

papers extract 2013_AdvCIS_Modeling\ and\ simulation\ of\ electrostatically\ gated\ nanochannels.pdf                    
@article{Pardon_2013, title={Modeling and simulation of electrostatically gated nanochannels}, volume={199–200}, ISSN={0001-8686}, url={http://dx.doi.org/10.1016/j.cis.2013.06.006}, DOI={10.1016/j.cis.2013.06.006}, journal={Advances in Colloid and Interface Science}, publisher={Elsevier BV}, author={Pardon, G. and van der Wijngaart, W.}, year={2013}, month=nov, pages={78–94} }

My papers is installed, and the config is:

(python) → working Literature/Stage cat ~/.local/share/config.json                                                                                                   
{
  "absolute_paths": true,
  "backup_files": false,
  "bibtex": "/home/boyan/Vazhno/Work/Literature/library.bib",
  "editor": null,
  "filesdir": "/home/boyan/Vazhno/Work/Literature/papers_organized",
  "git": true,
  "gitdir": "/home/boyan/.local/share",
  "gitlfs": true,
  "keyformat": {
    "author_num": 2,
    "author_sep": "_",
    "template": "{year}/{year}_{author}_{title}",
    "title_length": 100,
    "title_sep": "-",
    "title_word_num": 100,
    "title_word_size": 1
  },
  "local": false,
  "nameformat": {
    "author_num": 2,
    "author_sep": "_",
    "template": "{authorX}_{year}_{title}",
    "title_length": 100,
    "title_sep": "-",
    "title_word_num": 100,
    "title_word_size": 1
  }
}

Note that if I switch to the {journal} tag in the config (by setting "template": "{journal}/{year}{author}{title}"), which should be supported since {journal} is a valid BibTeX field, I get:

INFO:papers:bibtex: '/home/boyan/Vazhno/Work/Literature/library.bib'
INFO:papers:filesdir: '/home/boyan/Vazhno/Work/Literature/papers_organized'
INFO:papers:8036 entry files were updated
INFO:papers:pdftotext -f 1 -l 1 2013_AdvCIS_Modeling and simulation of electrostatically gated nanochannels.pdf /tmp/tmp1if0wmzn.txt
INFO:papers:found doi:10.1016/j.cis.2013.06.006
Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 1071, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 452, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 427, in add_pdf
    entry['ID'] = self.generate_key(entry)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 367, in generate_key
    key = self.keyformat(entry)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 108, in __call__
    return self.render(**entry)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 105, in render
    return stringify_entry(entry, **vars(self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 68, in stringify_entry
    res = template.format(**fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'journal'

I can refile this as two issues, but am I calling "add" correctly? The behavior I expect is to have the PDF renamed and moved, and the entry added to the end of library.bib.
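
The `KeyError: 'journal'` is raised by `template.format(**fields)`, which suggests that the `fields` mapping handed to the template has no `journal` key at that point. A minimal sketch of how a template could be checked against the available fields before rendering is given below. The helpers `missing_fields` and `render`, and the `"unknown"` placeholder, are hypothetical and not part of papers:

```python
from string import Formatter


def missing_fields(template, fields):
    """List the placeholder names in `template` that `fields` does not provide."""
    names = {name for _, name, _, _ in Formatter().parse(template) if name}
    return sorted(names - set(fields))


def render(template, fields, default="unknown"):
    """Render `template`, substituting `default` for any missing field."""
    filled = dict(fields)
    for name in missing_fields(template, fields):
        filled[name] = default
    return template.format(**filled)


if __name__ == "__main__":
    fields = {"year": "2013", "author": "pardon", "title": "modeling"}
    template = "{journal}/{year}{author}{title}"
    print(missing_fields(template, fields))
    print(render(template, fields))
```

Reporting the missing names up front would at least turn the bare `KeyError` into a message that says which template tags are unsupported.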

Drop 2.7 support

We need to release a new version without six and similar Python 2 compatibility shims.
That would make papers simpler to maintain and extend.

tests: $HOME not set?

After #53 -- which is a good idea! -- I see:

(python311) → master Work/papers tox -e py311                                                    8:07:09
GLOB sdist-make: /home/boyan/boyanshouse/Vazhno/Work/papers/setup.py
py311 create: /home/boyan/boyanshouse/Vazhno/Work/papers/.tox/py311
py311 installdeps: bibtexparser, scholarly, crossrefapi, rapidfuzz, unidecode, normality, pytest, pytest-cov
py311 inst: /home/boyan/boyanshouse/Vazhno/Work/papers/.tox/.tmp/package/1/papers-cli-2.3.dev130+gfd9291b.zip
py311 installed: alabaster==0.7.13,anyio==3.6.2,arrow==1.2.3,async-generator==1.10,attrs==23.1.0,Babel==2.12.1,banal==1.0.6,beautifulsoup4==4.12.2,bibtexparser==1.4.0,certifi==2022.12.7,chardet==5.1.0,charset-normalizer==3.1.0,coverage==7.2.4,crossrefapi==1.5.0,Deprecated==1.2.13,docutils==0.18.1,exceptiongroup==1.1.1,fake-useragent==1.1.3,free-proxy==1.1.1,h11==0.14.0,httpcore==0.17.0,httpx==0.24.0,idna==3.4,imagesize==1.4.1,iniconfig==2.0.0,Jinja2==3.1.2,lxml==4.9.2,MarkupSafe==2.1.2,normality==2.4.0,outcome==1.2.0,packaging==23.1,papers-cli @ file:///home/boyan/boyanshouse/Vazhno/Work/papers/.tox/.tmp/package/1/papers-cli-2.3.dev130%2Bgfd9291b.zip,pluggy==1.0.0,Pygments==2.15.1,pyparsing==3.0.9,PySocks==1.7.1,pytest==7.3.1,pytest-cov==4.0.0,python-dateutil==2.8.2,python-dotenv==1.0.0,rapidfuzz==3.0.0,requests==2.29.0,scholarly==1.7.11,selenium==4.9.0,six==1.16.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.4.1,Sphinx==6.2.1,sphinx-rtd-theme==1.2.0,sphinxcontrib-applehelp==1.0.4,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.1,sphinxcontrib-jquery==4.1,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,text-unidecode==1.3,trio==0.22.0,trio-websocket==0.10.2,typing_extensions==4.5.0,Unidecode==1.3.6,urllib3==1.26.15,wrapt==1.15.0,wsproto==1.2.0
py311 run-test-pre: PYTHONHASHSEED='2774689394'
py311 run-test: commands[0] | pytest --cov=papers --cov-append --cov-report=term-missing -xv
========================================== test session starts ==========================================
platform linux -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0 -- /home/boyan/boyanshouse/Vazhno/Work/papers/.tox/py311/bin/python
cachedir: .tox/py311/.pytest_cache
rootdir: /home/boyan/boyanshouse/Vazhno/Work/papers
plugins: cov-4.0.0, anyio-3.6.2
collected 0 items / 1 error                                                                             

================================================ ERRORS =================================================
__________________________________ ERROR collecting tests/test_add.py ___________________________________
../../../miniconda3/envs/python311/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1128: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:940: in exec_module
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
tests/__init__.py:3: in <module>
    sp.check_call('git config --list | grep user.name || git config --global user.name "Papers Tests"', shell=True)
../../../miniconda3/envs/python311/lib/python3.11/subprocess.py:413: in check_call
    raise CalledProcessError(retcode, cmd)
E   subprocess.CalledProcessError: Command 'git config --list | grep user.name || git config --global user.name "Papers Tests"' returned non-zero exit status 128.
-------------------------------------------- Captured stderr --------------------------------------------
fatal: $HOME not set
======================================== short test summary info ========================================
ERROR tests/test_add.py - subprocess.CalledProcessError: Command 'git config --list | grep user.name || git config --global us...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=========================================== 1 error in 0.22s ============================================
ERROR: InvocationError for command /home/boyan/boyanshouse/Vazhno/Work/papers/.tox/py311/bin/pytest --cov=papers --cov-append --cov-report=term-missing -xv (exited with code 1)
________________________________________________ summary ________________________________________________
ERROR:   py311: commands failed

Not sure what the problem is here...
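
The captured stderr shows git refusing to run `git config --global` because `$HOME` is not set in the environment the tests run in. One possible workaround, sketched under the assumption that the test setup only needs some writable home directory for git's global config: build the subprocess environment explicitly and fall back to a temporary directory. The helpers `git_env` and `ensure_git_user` are hypothetical and not the project's code.

```python
import os
import subprocess
import tempfile


def git_env(environ=None):
    """Return a copy of the environment in which HOME is always set."""
    env = dict(os.environ if environ is None else environ)
    if not env.get("HOME"):
        # Assumed fallback: a throwaway directory to hold git's global config
        env["HOME"] = tempfile.mkdtemp(prefix="papers-tests-home-")
    return env


def ensure_git_user(name="Papers Tests"):
    """Set a global git user.name if none is configured yet."""
    env = git_env()
    found = subprocess.run(
        ["git", "config", "--get", "user.name"],
        env=env, capture_output=True, text=True,
    )
    if found.returncode != 0:
        subprocess.check_call(
            ["git", "config", "--global", "user.name", name], env=env
        )
```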

scholarly.scholarly not found?

Hello folks -- @perrette, first off, I'm very glad to see this fantastic project, and I'm seriously considering replacing my workflow (which relies on one of the particularly outdated proprietary packages you list on your front page) entirely with this! Thanks kindly! The thing is, I have a local library of 80 GB of PDFs that makes a good set of test cases here...

When I try papers extract yanofsky_qc.pdf --scholar, which should work, I get:

ModuleNotFoundError: No module named 'scholarly.scholarly'

This is with pip install papers-cli which may be out of date...

Anybody else seeing this?
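
A quick way to narrow this down is to check which version of the dependency is installed and whether the module path from the error message can be resolved at all. The sketch below uses only the standard library. The helper names are hypothetical, and the distribution name `scholarly` is taken from the dependency list.

```python
import importlib.util
from importlib import metadata


def module_available(name):
    """True if `name` (possibly dotted) can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or is not a package
        return False


def installed_version(distribution):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("scholarly version:", installed_version("scholarly"))
    print("scholarly importable:", module_available("scholarly"))
    print("scholarly.scholarly importable:", module_available("scholarly.scholarly"))
```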

'papers extract' results in a call with nonsensical arguments to pdftotext

Hi,

I've been using 'papers' for quite a while now and this is the first time I've seen this issue. I am trying to extract the bibliographic info of this article* from its pdf. The program throws this exception:

Command Line Error: Wrong page range given: the first page (2) can not be after the last page (1).
Traceback (most recent call last):
  File "/usr/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/papers/__main__.py", line 1091, in main
    extractcmd(subp, o)
  File "/usr/lib/python3.11/site-packages/papers/__main__.py", line 546, in extractcmd
    print(extract_pdf_metadata(o.pdf, search_doi=not o.fulltext, search_fulltext=True, scholar=o.scholar, minwords=o.word_count, max_query_words=o.word_count, image=o.image))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/papers/extract.py", line 208, in extract_pdf_metadata
    txt = pdfhead(pdf, maxpages, minwords, image=image)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/papers/extract.py", line 134, in pdfhead
    txt += readpdf(pdf, first=i, last=i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/papers/extract.py", line 41, in readpdf
    sp.check_call(cmd)
  File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['pdftotext', '-f', '2', '-l', '2', 'paper.pdf', '/tmp/tmpaq14gv_5.txt']' returned non-zero exit status 99.

Apparently, 'papers' is calling 'pdftotext' with a page range that makes no sense. What is making 'papers' choose those arguments?

(Have I mentioned how much I like this program? Cheers!)

*https://www.nature.com/articles/s41567-020-0990-x
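
The traceback shows `pdfhead` calling `readpdf(pdf, first=i, last=i)` for page 2 while pdftotext reports that the last page is 1. One possible guard, sketched under the assumption that `pdfhead` reads page by page until it has collected `minwords` words (its actual implementation is not shown here), is to cap the loop at the document's page count. The names `head_text`, `read_page`, and `num_pages` are hypothetical. The page count could come, for example, from poppler's `pdfinfo`.

```python
def head_text(read_page, num_pages, maxpages=10, minwords=200):
    """Join text page by page until `minwords` words have been collected.

    `read_page(n)` returns the text of page n (1-based). The loop never
    asks for a page beyond `num_pages`, so a short document simply
    yields whatever text it has.
    """
    txt = ""
    last = min(maxpages, num_pages)
    for page in range(1, last + 1):
        txt += read_page(page)
        if len(txt.split()) >= minwords:
            break
    return txt
```

With a one-page PDF this stops after page 1 even if fewer than `minwords` words were found, and the caller can then decide whether that is enough text to build a query from.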

question: is there a way to share the PDFs and metadata on a common server?

Hi,

I'm looking for a command-line tool like "papers", but I also need the PDFs and bibtex data to be shared on a common server for the whole team.

What would you recommend?

Improve test coverage

We now have about 60% test coverage (test coverage issues were fixed in #44).

Name                  Stmts   Miss  Cover
-----------------------------------------
papers/__init__.py        4      0   100%
papers/__main__.py      520    206    60%
papers/_version.py        2      0   100%
papers/bib.py           492    192    61%
papers/config.py        250     86    66%
papers/duplicate.py     384    136    65%
papers/encoding.py       84     22    74%
papers/extract.py       236    151    36%
papers/filename.py       51      4    92%
papers/latexenc.py       57     36    37%
-----------------------------------------
TOTAL                  2080    833    60%

As a first objective, we should aim for 100% test coverage of __main__.py, i.e. every sub-command and every if/else branch is executed at least once. That way we can be sure that namespace, import and other syntax errors are caught. From there, we can start thinking about semantics, i.e. simple and intricate cases where we expect a meaningful result.

The coverage reporting includes a missing lines section that is more informative:

Name                  Stmts   Miss  Cover   Missing
---------------------------------------------------
papers/__init__.py        4      0   100%
papers/__main__.py      520    206    60%   26, 30, 79-87, 103-119, 123-133, 147-149, 152, 171, 177-178, 185-186, 190-204, 215, 219-220, 239, 245-250, 253-260, 271, 306, 318-330, 333-339, 343-347, 356, 360-370, 374, 377, 381-386, 391-496, 499, 510-511, 815, 820-821, 827, 829, 831, 833, 838-842
papers/_version.py        2      0   100%
papers/bib.py           492    192    61%   45-47, 67, 140-143, 157-158, 175-190, 214, 233-234, 237, 262, 273, 332, 371, 381, 388, 401-402, 408, 440-446, 450, 455-459, 469, 505, 513-514, 523, 541, 562-565, 575-580, 590-594, 598-613, 616-629, 633-660, 664-668, 671, 674-682, 686, 692-703, 709-781
papers/config.py        250     86    66%   32-39, 66-67, 94-98, 105, 110, 124, 157-159, 164-165, 171, 174-186, 189-205, 213, 219, 229, 232, 234, 241, 246, 248, 254, 257-262, 299, 306-309, 317-319, 322-326, 333, 344-345, 349-359
papers/duplicate.py     384    136    65%   72-74, 88-89, 95-96, 100-101, 150-157, 163, 174, 176, 178, 195, 200-235, 242, 246, 272, 294, 304, 312, 314-318, 340, 349-371, 401-404, 407, 410, 417, 420, 427, 430-433, 436-451, 477-484, 487, 490, 493, 496-497, 500, 511-525, 538-542, 560-561, 619, 622
papers/encoding.py       84     22    74%   35, 63-64, 106-120, 125-128
papers/extract.py       236    151    36%   32, 51-78, 97, 105, 109, 117-118, 132, 144-159, 179-189, 193-199, 202, 215-221, 226-229, 234-239, 244-245, 250-268, 272, 277-286, 292-332, 337-367, 372-386
papers/filename.py       51      4    92%   18-19, 45-46
papers/latexenc.py       57     36    37%   23-31, 41-90, 100-101
---------------------------------------------------
TOTAL                  2080    833    60%

Parsing braces when generating citation key

If the BibTeX returned by the DOI query includes braces in the author list, then it seems the code which generates the citation key based on the last name of the first author fails.

Example: The article http://dx.doi.org/10.4169/amer.math.monthly.118.05.450 returns via papers extract the following "entry"

@article{Alan D. Sokal_2011,
 author = {{Alan D. Sokal}, },
 doi = {10.4169/amer.math.monthly.118.05.450},
 journal = {The American Mathematical Monthly},
 number = {5},
 pages = {450},
 publisher = {Informa UK Limited},
 title = {A Really Simple Elementary Proof of the Uniform Boundedness Theorem},
 url = {http://dx.doi.org/10.4169/amer.math.monthly.118.05.450},
 volume = {118},
 year = {2011}
}

Note that the citation key contains spaces and is invalid BibTeX.
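A minimal sketch of how the key could be built so that braces and spaces never reach it. The helpers `first_author_last_name` and `make_key` are hypothetical names for illustration, not functions from papers, and the `Lastname_Year` format is only inferred from the example above. It also assumes that a braced name such as `{Alan D. Sokal}` is a person's name in "First Last" order; a braced corporate author would need different handling.

```python
import re
import unicodedata


def first_author_last_name(author_field):
    """Return the last name of the first author in a BibTeX author field."""
    # BibTeX separates authors with the word "and"
    first = re.split(r"\s+and\s+", author_field.strip())[0]
    # Drop protective braces and stray commas: "{Alan D. Sokal}, " -> "Alan D. Sokal"
    first = first.replace("{", "").replace("}", "").strip(" ,")
    if "," in first:
        # "Last, First" form
        return first.split(",")[0].strip()
    # "First Middle Last" form: take the final word
    return first.split()[-1] if first else ""


def make_key(author_field, year):
    """Build a key like Sokal_2011 containing only ASCII letters, digits and '_'."""
    last = first_author_last_name(author_field)
    # Strip accents, then remove anything that is not a letter or digit
    ascii_last = unicodedata.normalize("NFKD", last).encode("ascii", "ignore").decode("ascii")
    ascii_last = re.sub(r"[^A-Za-z0-9]", "", ascii_last)
    return f"{ascii_last}_{year}"


print(make_key("{Alan D. Sokal}, ", 2011))
```

With the author field from the entry above, this yields a key without spaces or braces.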

Strange behavior with pdf-folder and bibtex folder

my config:

{
  "bibtex": "/home/user/Library/Bibtex/lib.bib",
  "filesdir": "/home/user/Library/PDFs",
  "git": true,
  "gitdir": "/home/user/Library"
}

If I try to run papers add file.pdf, I get this error:

Traceback (most recent call last):
  File "/usr/bin/papers", line 5, in <module>
    papers.bib.main()
  File "/usr/lib/python3.9/site-packages/papers/bib.py", line 1350, in main
    check_install() and addcmd(o)
  File "/usr/lib/python3.9/site-packages/papers/bib.py", line 987, in addcmd
    savebib(my, o)
  File "/usr/lib/python3.9/site-packages/papers/bib.py", line 898, in savebib
    config.gitcommit()
  File "/usr/lib/python3.9/site-packages/papers/config.py", line 115, in gitcommit
    if not os.path.samefile(self.bibtex, target):
  File "/usr/lib/python3.9/genericpath.py", line 101, in samefile
    s2 = os.stat(f2)
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Library/lib.bib'

But the bib file is located at /home/user/Library/Bibtex/lib.bib.
If I add a symlink from /home/user/Library/Bibtex/lib.bib -> /home/user/Library/lib.bib,
it works fine.
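The traceback shows `os.path.samefile` raising because one of the two paths does not exist: the function stats both arguments. A minimal sketch of a guarded comparison follows; `same_existing_file` is a hypothetical helper, not part of papers, and how `gitcommit` should then treat a bibtex file stored in a subdirectory of gitdir is left open.

```python
import os


def same_existing_file(path_a, path_b):
    """Return True only if both paths exist and refer to the same file."""
    # os.path.samefile raises FileNotFoundError when either path is missing,
    # so check for existence first and report "not the same" in that case.
    if not (os.path.exists(path_a) and os.path.exists(path_b)):
        return False
    return os.path.samefile(path_a, path_b)
```

With such a guard, a missing /home/user/Library/lib.bib would be reported as "not the same file" instead of aborting the command.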

DOI parsing fails in a few cases

The current method to retrieve the DOI consists of searching for regular expressions over the first two pages and keeping the first match.

Accepted prefixes are (lower or upper case):

'doi:', 'doi: ', 'doi ', 'dx\.doi\.org/', 'doi/'

DOI itself is searched as:

r"10\.\d\d\d\d/.*?"

And is expected to finish with:

r"[, \n]"

The method fails in a few cases:

when DOI spreads over two lines (e.g. here)
when other DOIs appear before the actual paper's DOI, for example here

These could be solved by more permissive parsing of the DOI, but let's keep it conservative for now until a good solution is found.

Nevertheless, existing edits / fixes currently include:

An underscore sometimes gets converted into a space by pdftotext, so we also detect an ending with any space followed by a digit. This solves at least one case.
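The search described above can be sketched as follows. This only assembles the prefixes, DOI pattern and ending quoted in this issue into one expression; the names `DOI_RE` and `find_doi` are made up for the example, and the actual code in papers may combine the pieces differently.

```python
import re

# Accepted prefixes, matched case-insensitively
PREFIXES = ["doi:", "doi: ", "doi ", r"dx\.doi\.org/", "doi/"]

DOI_RE = re.compile(
    r"(?:" + "|".join(PREFIXES) + r")"  # one of the prefixes
    r"(10\.\d\d\d\d/.*?)"               # the DOI itself, as short as possible
    r"[, \n]",                          # expected terminator
    re.IGNORECASE,
)


def find_doi(text):
    """Return the first DOI found in text, or None if there is no match."""
    match = DOI_RE.search(text)
    return match.group(1) if match else None


print(find_doi("Earth Syst. Dynam., 4, 11-29, 2013\ndoi:10.5194/esd-4-11-2013\n"))
# A DOI broken across two lines is cut at the line break:
print(find_doi("doi:10.5194/esd-4-\n11-2013\n"))
```

The second call illustrates the first failure case listed above: because a newline is an accepted terminator, only the part before the line break is returned.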

hardlinking kinda fails, sometimes..

OK, this is like some inside baseball here, but it is worth noting that I have occasionally started to see the following test failure on one machine:

self = <tests.test_add.TestAdd testMethod=test_add_rename_copy>

    def test_add_rename_copy(self):
    
>       paperscmd(f'add -rc --bibtex {self.mybib} --filesdir {self.filesdir} {self.pdf}')

tests/test_add.py:76: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/common.py:59: in paperscmd
    return speedy_paperscmd(cmd, *args, **kwargs)
tests/common.py:53: in speedy_paperscmd
    return call(main, args, check=check, check_output=check_output)
tests/common.py:34: in call
    return f(*args, **kwargs)
papers/__main__.py:1071: in main
    check_install(subp, o, config) and addcmd(subp, o, config)
papers/__main__.py:452: in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
papers/bib.py:432: in add_pdf
    self.insert_entry(entry, update_key=True, **kw)
papers/bib.py:288: in insert_entry
    self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
papers/bib.py:320: in insert_entry_check
    self.insert_entry(entry, update_key, rename=rename, copy=copy)
papers/bib.py:311: in insert_entry
    if rename: self.rename_entry_files(entry, copy=copy)
papers/bib.py:529: in rename_entry_files
    self.move(file, newfile, copy)
papers/bib.py:223: in move
    return _move(file, newfile, copy=copy, dryrun=papers.config.DRYRUN)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

f1 = '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf'
f2 = '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf', copy = True
interactive = True, dryrun = False, hardlink = True

    def move(f1, f2, copy=False, interactive=True, dryrun=False, hardlink=True):
        maybe = 'dry-run:: ' if dryrun else ''
        dirname = os.path.dirname(f2)
        if dirname and not os.path.exists(dirname):
            logger.info(f'{maybe}create directory: {dirname}')
            if not dryrun: os.makedirs(dirname)
        if f1 == f2:
            logger.info('dest is identical to src: '+f1)
            return
    
        if os.path.exists(f2):
            # if identical file, pretend nothing happened, skip copying
            if os.path.samefile(f1, f2) or checksum(f2) == checksum(f1):
                if not copy and not dryrun:
                    logger.info(f'{maybe}rm {f1}')
                    os.remove(f1)
                return
    
            elif interactive:
                ans = input(f'dest file already exists: {f2}. Replace? (y/n) ')
                if ans.lower() != 'y':
                    return
            else:
                logger.info(f'{maybe}rm {f2}')
                if not dryrun:
                    os.remove(f2)
    
        if copy:
            # If we can do a hard-link instead of copy-ing, let's do:
            if hardlink:
                cmd = f'{maybe}ln {f1} {f2}'
                logger.info(cmd)
                if not dryrun:
>                   os.link(f1, f2)
E                   OSError: [Errno 18] Invalid cross-device link: '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf' -> '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf'

papers/utils.py:119: OSError

This is not reproducible on two other machines -- and I think the filesystems are all flat! -- but one of the solutions seems to be here: higlass/higlass-manage#3 (tldr: replace os.link with shutil.copy2).
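Errno 18 (EXDEV) means the source and destination are on different filesystems, which can happen when /tmp is a separate mount. A gentler variant of the suggestion above would be to keep the hard link when it works and fall back to `shutil.copy2` otherwise. This is only a sketch: `link_or_copy` is a hypothetical helper, not the `move` function from papers/utils.py.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst; copy instead if they are on different filesystems."""
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        # Only handle the cross-device case; re-raise anything else
        # (missing source, existing destination, permissions, ...).
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # copy2 also preserves timestamps where possible
        return "copy"
```

The return value tells the caller which path was taken, which could be logged in the same way as the existing `ln` message.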

There's some back and forth of increasing dubiousness about hardlinks being bad and somewhat dangerous, and, reconsidering my past behavior, I tend to agree: I usually just copy the files and let the filesystem dedupe them if need be. One option here may be simply not to support the --hardlink option...

But again, this only rears its head on one machine (and when I run tests manually, not through tox, cf #54).