joostkremers / parsebib

Elisp library for reading .bib files

License: BSD 3-Clause "New" or "Revised" License

Emacs Lisp 100.00%

parsebib's Issues

Unexpected escape characters

When I read an entry that contains a backslash in its title (e.g., $\phi$ ), parsebib-read-entry returns a title in which the backslash is duplicated ( $\\phi$ ). Is this intentional or a bug? In my case, it causes problems because I have to reconstruct what the original string was. (Context: tmalsburg/helm-bibtex#83)
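One possible explanation, offered as an assumption rather than a confirmed diagnosis: the Lisp printer escapes backslashes when it displays a string, so a value holding a single backslash is shown with two. The sketch below inspects the contents of a string rather than its printed form; `my/title` is a hypothetical stand-in for the value a parser might return.

```elisp
;; The Lisp source "$\\phi$" denotes a string containing ONE backslash.
(defvar my/title "$\\phi$"
  "Hypothetical title value, standing in for what a parser might return.")

;; Six characters: $ \ p h i $
(length my/title)

;; The second character really is a single backslash.
(eq (aref my/title 1) ?\\)

;; The printed representation escapes that backslash, so it looks doubled.
(prin1-to-string my/title)
```

If the returned string has length 6 here, the duplication is only in how the value is displayed, and inserting the string into a buffer would show a single backslash.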

invalid-read-syntax error

The error I see occurs in other libraries' use of parsebib, but eval-buffer of parsebib.el fails as well.

It seems to boil down to these strings:

(setq val "\N{DOUBLE DAGGER}") 
; (invalid-read-syntax "\\N{DOUBLE DAGGER}" 4 30)

(setq val "\n{DOUBLE DAGGER}")
"
{DOUBLE DAGGER}"

(setq val "\\N{DOUBLE DAGGER}")
"\\N{DOUBLE DAGGER}"

(emacs-version)
"GNU Emacs 29.0.50 (build 1, x86_64-w64-mingw32)"
(pkg-info-version-info 'parsebib)
"20230228.1530"

Thanks!
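For comparison, here are two ways of writing the same character (U+2021 DOUBLE DAGGER) that do not go through the reader's `\N{NAME}` lookup. This is only a sketch: whether either form avoids the error on the affected build is untested, and `my/double-dagger` is a hypothetical name.

```elisp
;; Build the character from its code point instead of its Unicode name.
(defvar my/double-dagger (string #x2021)
  "One-character string holding U+2021 DOUBLE DAGGER.")

;; The \uXXXX escape names the same code point directly in the string syntax.
(string= my/double-dagger "\u2021")
```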

Add support for org-bibtex

Just thought I'd add this here, based on this comment/conversation.

emacs-citar/citar#397 (comment)

And suggestions from @andras-simonyi:

Yes, ol-bibtex.el (née org-bibtex.el) also contains functionality to read BibTeX entries from org-bibtex files, and the most important function in this regard is org-bibtex-headline which reads a single BibTeX entry from the Org headline (properties) at point. A slightly modified form of that function (citeproc-bt-from-org-headline) is used in the citeproc-el org-bibtex reader -- the "parser" basically calls that function for all headlines in the Org file via org-map-entries. It'd probably be trivial to do something very similar in parsebib, although I'm not sure about the performance implications.

emacs-citar/citar#397 (comment)

parsebib uses cl-loop without requiring cl-macs

When helm-bibtex uses a byte-compiled version of parsebib, the function parsebib-read-entry signals an error because cl-loop is not defined. As a result, calling helm-bibtex produces an empty list.

Inserting

(eval-when-compile
  (require 'cl-macs))

after the 'require' forms at the beginning of parsebib.el solved the problem.
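An alternative to the fix above is to require `cl-lib`, the documented entry point for the `cl-` macros, which makes `cl-loop` available both at compile time and at run time. The function below is a hypothetical example used only to exercise `cl-loop`; it is not part of parsebib, and whether parsebib adopted this form is not stated here.

```elisp
(require 'cl-lib)

(defun my/collect-squares (numbers)
  "Return a list with the square of each element of NUMBERS."
  ;; `cl-loop' is expanded at compile time, so the macro must be known then.
  (cl-loop for n in numbers
           collect (* n n)))
```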

What can we expect if we round-trip a bibliography file?

The reason I ask is for interaction with revision control systems like git and subversion. If bib databases are dumped into, for example, lexicographically-sorted .bib files, then RCSes will be able to accurately identify diffs. But if they are dumped in an arbitrary order, then the bib files will effectively be binary files from the standpoint of RCSes (or, indeed, anything that relies on diff).
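A stable dump order can be obtained on the caller's side by sorting the entry keys before writing. The sketch below assumes the entries live in a hash table keyed by entry key (as in the parsebib-parse output shown elsewhere on this page); `my/sorted-entry-keys` is a hypothetical helper, not a parsebib function.

```elisp
(defun my/sorted-entry-keys (entries)
  "Return the keys of hash table ENTRIES, sorted with `string<'."
  (let (keys)
    ;; Hash tables do not promise a useful iteration order, so collect first.
    (maphash (lambda (key _value) (push key keys)) entries)
    (sort keys #'string<)))
```

Writing entries in the order this returns keeps diffs limited to the entries that actually changed.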

Believe I have found an off-by-one error in parsebib--match-paren-forward

Here's the current definition:

(defun parsebib--match-paren-forward ()
  "Move forward to the closing paren matching the opening paren at point.
This function handles parentheses () and braces {}.  Return t if a
matching parenthesis was found.  Note that this function puts
point right before the closing delimiter (unlike e.g.,
`forward-sexp', which puts it right after.)"
  (cond
   ((eq (char-after) ?\{)
    (parsebib--match-brace-forward))
   ((eq (char-after) ?\()
    ;; This is really a hack. We want to allow unbalanced parentheses in
    ;; field values (BibTeX does), so we cannot use forward-sexp
    ;; here. For the same reason, looking for the matching paren by hand
    ;; is pretty complicated. However, balanced parentheses can only be
    ;; used to enclose entire entries (or @STRINGs or @PREAMBLEs) so we
    ;; can be pretty sure we'll find it right before the next @ at the
    ;; start of a line, or right before the end of the file.
    (let ((beg (point)))
      (re-search-forward parsebib--entry-start nil 0)
      (skip-chars-backward "@ \n\t\f")
      (if (eq (char-after) ?\))
          ;; if we've found a closing paren, return t
          t
        ;; otherwise put the cursor back and signal an error
        (goto-char beg)
        (signal 'scan-error (list "Unbalanced parentheses" beg (point-max))))))))

This fails to successfully parse this entry in one of my bib files:

@ARTICLE(Stix:LivingRoom,
	AUTHOR = {Gary Stix},
	TITLE = "Domesticating Cyberspace",
	JOURNAL = SciAm,
	YEAR = {1993},
	VOLUME = {269},
	NUMBER = {2},
	PAGES = {100--110},
	MONTH = aug
)

@ARTICLE(Bal:DistPLs,
...

What happens when I step through this is that we take the branch for matching parens, then we jump forward to the second @ (for Bal:DistPLs). The next step is to skip-chars-backward, which puts the cursor right after the closing paren that we are looking for. But the code assumes that the cursor will be right before that closing paren -- it checks (char-after) when, AFAICT, it should check (char-before).
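The cursor movement described above can be reproduced in a temporary buffer without loading parsebib. In this sketch the regexp `"^[ \t]*@"` stands in for `parsebib--entry-start` (an assumption about its value), and `my/chars-around-point-after-skip` is a hypothetical helper.

```elisp
(defun my/chars-around-point-after-skip ()
  "Return (CHAR-BEFORE CHAR-AFTER) at the spot where the paren check runs."
  (with-temp-buffer
    (insert "@ARTICLE(Stix:LivingRoom,\n\tMONTH = aug\n)\n\n@ARTICLE(Bal:DistPLs,\n")
    (goto-char (point-min))
    (search-forward "(")
    (backward-char)                     ; point is now on the opening paren
    ;; Stand-in for `parsebib--entry-start' (assumed value).
    (re-search-forward "^[ \t]*@" nil 0)
    ;; Same skip set as in the function quoted above.
    (skip-chars-backward "@ \n\t\f")
    (list (char-before) (char-after))))
```

In this buffer the closing paren ends up before point and a newline after it, which matches the report that the check should look at (char-before).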

Compilation failing on emacs 26.3

Hello,

I believe that after the f41befa commit, the package is failing to compile on emacs 26.3.

I caught this when I looked into why the ox-hugo test suite started failing just on emacs 26.3. ox-hugo uses citeproc for some of its tests and that package depends on parsebib.

Error

Eager macro-expansion failure: (error "rx ‘not’ syntax error: ]")
...
  mapconcat(#f(compiled-function (x) #<bytecode 0x13ec1e1>) ((: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)) "\\|")
  rx-or((or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))
  rx-form((or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)) *)
  rx-kleene((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))))
  rx-form((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))) :)
  #f(compiled-function (x) #<bytecode 0x13ebe99>)((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))))
  mapconcat(#f(compiled-function (x) #<bytecode 0x13ebe99>) ((* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))) nil)
  rx-and((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
  rx-form((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))) :)
  #f(compiled-function (x) #<bytecode 0x13ebe99>)((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
  mapconcat(#f(compiled-function (x) #<bytecode 0x13ebe99>) ("\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))) nil)
  rx-and((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
  rx-form((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
  rx-to-string((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))) t)
  #f(compiled-function (&rest regexps) #<bytecode 0x13ed2bd>)("\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
  (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
  (cons (quote parsebib--replace-command-or-accent) (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
  (list (cons (quote parsebib--replace-command-or-accent) (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* ...) "{" (group-n 2 ...) (opt "}")) (group-n 3 letter)))))) (cons (quote parsebib--replace-literal) (rx-to-string (\` (or (\,@ (mapcar (function car) parsebib-TeX-literal-replacement-alist)) (1+ blank))))))
  (\` ((parsebib--replace-command-or-accent \, (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: ... "{" ... ...) (group-n 3 letter)))))) (parsebib--replace-literal \, (rx-to-string (\` (or (\,@ (mapcar ... parsebib-TeX-literal-replacement-alist)) (1+ blank)))))))
  (defvar parsebib-TeX-markup-replacement-alist (\` ((parsebib--replace-command-or-accent \, (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or ... ...))))) (parsebib--replace-literal \, (rx-to-string (\` (or (\,@ ...) (1+ blank))))))) "Alist of replacements and strings for TeX markup.\nThis is used in `parsebib-clean-TeX-markup' to make TeX markup more\nsuitable for display.  Each item in the list consists of a replacement\nand a regexp.  The replacement can be a string (which will\nsimply replace the match) or a function (the match will be\nreplaced by the result of calling the function on the match\nstring).  Earlier elements are evaluated before later ones, so if\none string is a subpattern of another, the second must appear\nlater (e.g. \"''\" is before \"'\").\n\nFor the common cases of replacing a LaTeX command or a literal\nit is faster to use `parsebib-TeX-command-replacement-alist'\nand `parsebib-TeX-literal-replacement-alist' respectively.")
  eval-buffer(#<buffer  *load*-955859> nil "/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" nil t)  ; Reading at buffer position 18963
  load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" "/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" nil t)
  require(parsebib)
  eval-buffer(#<buffer  *load*-705935> nil "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" nil t)  ; Reading at buffer position 1376
  load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" nil t)
  require(citeproc-itemgetters)
  eval-buffer(#<buffer  *load*-991405> nil "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" nil t)  ; Reading at buffer position 1679
  load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" t t)
  require(citeproc nil t)
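The error message points at the `(not "]")` forms. Older `rx` implementations accept only a character class such as `(any ...)` inside `not`, so spelling the negation as `(not (any "]"))` should be accepted by both old and new versions. The sketch below shows that spelling for the bracketed-argument part only; `my/bracketed-arg-regexp` is a hypothetical name and this is not the actual fix applied to parsebib.

```elisp
(require 'rx)

(defvar my/bracketed-arg-regexp
  ;; (not (any "]")) instead of (not "]") for compatibility with older rx.
  (rx "[" (* (not (any "]"))) "]")
  "Regexp matching a bracketed optional argument such as \"[opt]\".")

;; Returns the index of the opening bracket when there is a match.
(string-match-p my/bracketed-arg-regexp "\\cmd[opt]{arg}")
```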

Unmatched full-width braces lead to "Unbalanced parentheses" errors

(I am using Spacemacs and have tons of customization, but I assume this is irrelevant? If the following is not enough to reproduce the issue, I will try again in a fresh install / vanilla emacs.)

Steps to reproduce

% Failing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（} % <-- Culprit
}

% Passing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（）} % <-- No error
}

Evaluation results:

(parsebib-parse "/tmp/Passing.bib")
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("Title" (("abstract" . "（）<–Noerror") ("journal" . "Journal") ("year" . "1970") ("author" . "Author") ("title" . "Title") ("=type=" . "article") ("=key=" . "Title"))))

(parsebib-parse "/tmp/Failing.bib")
Debugger entered--Lisp error: (scan-error "Unbalanced parentheses" 23 145)
  scan-sexps(23 1)
  forward-sexp(1)
  parsebib--match-brace-forward()
  parsebib--match-paren-forward()
  parsebib-read-entry("article" nil #<hash-table equal 0/65 0x15659cd2c74d> nil t)
  parsebib-parse-bib-buffer(:entries #<hash-table equal 0/65 0x15659cd2c72d> :strings #<hash-table equal 0/65 0x15659cd2c74d> :expand-strings t :inheritance t :fields nil :replace-TeX t)
  #f(compiled-function (file) #<bytecode -0x424eb6921cb642f>)("/tmp/Failing.bib")
  parsebib-parse("/tmp/Failing.bib")
  (progn (parsebib-parse "/tmp/Failing.bib"))
  elisp--eval-last-sexp(t)
  #<subr eval-last-sexp>(t)
  #f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>)()
  eval-sexp-fu-flash-doit-simple(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  eval-sexp-fu-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  esf-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>) #f(compiled-function (&rest args2) #<bytecode 0xa28255960f219d0>))
  ad-Advice-eval-last-sexp(#<subr eval-last-sexp> t)
  apply(ad-Advice-eval-last-sexp #<subr eval-last-sexp> t)
  eval-last-sexp(t)
  eval-print-last-sexp(nil)
  funcall-interactively(eval-print-last-sexp nil)
  command-execute(eval-print-last-sexp)

Expected behavior

The parser should treat full-width characters as normal text instead of syntactic elements.

P.S. Both Failing.bib and Passing.bib pass validation by biber (via biber --tool -V Failing.bib / biber --tool -V Passing.bib).
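The backtrace shows the error coming from forward-sexp, which consults the buffer's syntax table; presumably the full-width parenthesis has paren syntax there (an assumption, not verified here). One way to get the expected behavior is to count only ASCII braces by hand. `my/match-brace-forward` below is a hypothetical sketch of that idea, not parsebib's implementation, and it deliberately ignores escaped braces such as `\{`.

```elisp
(defun my/match-brace-forward ()
  "Move point to the `}' matching the `{' at point, counting only ASCII braces.
Point must be on an opening brace.  Return t on success and leave
point right before the closing brace; otherwise restore point and
signal `scan-error'."
  (let ((beg (point))
        (depth 0)
        (found nil))
    (while (and (not found)
                (re-search-forward "[{}]" nil t))
      ;; Every other character, full-width parens included, is plain text.
      (setq depth (if (eq (char-before) ?\{) (1+ depth) (1- depth)))
      (when (zerop depth)
        (setq found t)))
    (if found
        (progn (backward-char) t)
      (goto-char beg)
      (signal 'scan-error (list "Unbalanced braces" beg (point-max))))))
```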

`point-at-eol` is obsolete

I just noticed this with my Emacs 29 install:

⛔ Warning (comp): parsebib.el:634:21: Warning: ‘point-at-eol’ is an obsolete function (as of 29.1); use ‘line-end-position’ or ‘pos-eol’ instead.

From a quick look, the second alternative (pos-eol) is new, but the first (line-end-position) is not.
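Since line-end-position predates Emacs 29, swapping it in should silence the warning without dropping support for older versions. A minimal sketch, with `my/end-of-current-line` as a hypothetical wrapper:

```elisp
(defun my/end-of-current-line ()
  "Return the buffer position at the end of the current line."
  ;; `line-end-position' is available in old and new Emacs versions alike.
  (line-end-position))
```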

Normalize key and type fields?

My understanding is that BibTeX keys and entry types are both case-insensitive. Given that, would it make sense to normalize them in parsebib to have only lowercase letters?
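Until something like this exists in the library, the normalization can be sketched on the caller's side. The code below assumes an entry is an alist with string field names, including "=key=" and "=type=" as in the parse output shown elsewhere on this page; `my/normalize-entry` is a hypothetical helper.

```elisp
(defun my/normalize-entry (entry)
  "Return a copy of ENTRY with its \"=key=\" and \"=type=\" values downcased.
ENTRY is an alist of (FIELD . VALUE) pairs with string fields."
  (mapcar (lambda (field)
            (if (member (car field) '("=key=" "=type="))
                (cons (car field) (downcase (cdr field)))
              field))
          entry))
```

Note that downcasing keys changes what has to be typed in citations if the consuming tool does treat keys as case-sensitive, so this is only safe under the assumption stated in the question.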

Move `ebib-clean-TeX-markup` to parsebib

Follow-up to:

emacs-citar/citar#535

The function doesn't reference any other ebib functions, but it does rely on ebib-TeX-markup-replace-alist, so I assume that would need to be moved as well.

But it seems a straightforward move-and-rename.

Alas, I'm not familiar enough with this codebase to know how best to then integrate it here.

Should probably also, at some point, add a parallel function that does the same for CSL JSON markup, though the use of markup there isn't really standardized ATM.

New Release Tag?

Hello,

As mentioned in my comment on the relevant commit, the internal version number for this library has been bumped to 4.1, but there hasn't been a release tag created since 2021-12-08 with 3.1. Also, with the recent changes that merged some functionality from Ebib, the latest version of Ebib now relies on code from 4.1, even though it has not yet been 'released'.

Is there a way we could have a new release tag created, to communicate to others that there has been a major update?

String definitions are always returned braced, even when they are unbraced in the file

I have the following string definitions:

@String{up = {University Press}}
@String{cup = "Cambridge " # up}
@String{oup = "Oxford " # up}

BibTeX and biblatex both understand such recursive strings. Ebib does not (yet), so I was working on a PR to fix this. Importantly, given the above configuration cup will render as Cambridge University Press, but given this config it won't:

@String{up = {University Press}}
@String{cup = {"Cambridge " # up}}

In this case, Bib(La)TeX recognises that the value is braced and therefore 'protected'. I found I couldn't distinguish the cases in ebib, because for both configurations it returns the abbrev's value/definition as braced (that is, in both cases it returns the string {"Cambridge " # up}). I had to unbrace this to get the expansion to work, but that misses the situations where there are intentional braces to protect the values.

Tracing the code, ebib just returns whatever is in its database, and that is determined by parsebib. Honestly the code is a bit dense for me, but I'm fairly sure the problem is in parsebib-read-string.

(As a side note, looking at parsebib's docstrings, it seems that it can expand more complex strings in field values like "Some " # up. I was writing a PR which modified ebib-get-string to split such strings and process them recursively. Is this worth it, or is there an easier way with parsebib?)
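The distinction described above, where a braced value is protected and an unbraced one is a concatenation to expand, can be sketched as follows. This is a hypothetical helper, not parsebib's or Ebib's code: it assumes the raw definitions are kept in an alist with their original delimiters, and it splits naively on `#`, so a `#` inside a quoted part would break it.

```elisp
(require 'subr-x)

(defun my/expand-string (value strings)
  "Expand the raw @String definition VALUE using STRINGS.
STRINGS is an alist mapping abbreviations to raw definitions,
delimiters included.  A braced VALUE is returned without its
braces and without expansion."
  (if (string-match "\\`{\\(.*\\)}\\'" value)
      ;; Braced: protected, keep the content verbatim.
      (match-string 1 value)
    ;; Unbraced: concatenate the parts separated by #.
    (mapconcat
     (lambda (part)
       (setq part (string-trim part))
       (cond
        ;; A quoted literal: strip the quotes.
        ((string-match "\\`\"\\(.*\\)\"\\'" part)
         (match-string 1 part))
        ;; A known abbreviation: expand its definition recursively.
        ((assoc part strings)
         (my/expand-string (cdr (assoc part strings)) strings))
        ;; Anything else (e.g. a number) is kept as-is.
        (t part)))
     (split-string value "#")
     "")))
```

With `up` defined as `{University Press}`, the unbraced definition of `cup` expands to the full name, while the braced variant comes back unexpanded.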

Aligning key/type field names in json and bib/latex

A follow-up to #12.

The changes I needed to make to use parsebib-parse for the initial bibtex-actions candidate list, and therefore also bypass all that bibtex-completion code, were trivial.

Two high-level conclusions:

1. The mapping approach as in #12 (comment) mostly works (see screenshot).
2. Except, there seems to be a minor problem with the different ways types and keys are represented in bib/latex and in csl json, so bibtex-completion functions which depend on looking that data up directly (like edit-entry it seems; actually, not sure on this) don't work with the json representation.

For parsebib, the obvious solution for both is what we've previously discussed: parsebib-get-value to also address 1, and for our packages to use that.

On the bibtex-completion end, if we could get this addressed, it should allow you to simultaneously remove a lot of code, @tmalsburg, and secondarily get json support. I don't see a lot of places in the current code that look explicitly and directly for =key=, for example; most are using the bibtex-completion-get-value wrapper, which could be adjusted to look for both =key= and id, or to call a possible parsebib alternative.
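A wrapper that looks for both names could be as small as the sketch below. It assumes entries are alists and that the JSON side stores the identifier under `id`, either as a symbol or as a string (which of the two applies is an assumption here); `my/entry-id` is a hypothetical name, not an existing parsebib or bibtex-completion function.

```elisp
(defun my/entry-id (entry)
  "Return the citation key of ENTRY, whichever representation it uses.
Checks the BibTeX-style \"=key=\" field first, then `id' as a
symbol and as a string."
  (or (cdr (assoc "=key=" entry))
      (cdr (assoc 'id entry))
      (cdr (assoc "id" entry))))
```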

direct parsing of files, csl-json feedback

Why is parsing restricted to the buffer?

Is it impractical to allow direct file parsing?
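For a buffer-based parser, file parsing can be layered on top with a temporary buffer. The sketch below shows only that wrapping step; `my/call-with-file-contents` is a hypothetical helper, and FN stands for whatever buffer-based parsing function the caller wants to run.

```elisp
(defun my/call-with-file-contents (file fn)
  "Insert the contents of FILE into a temporary buffer and call FN there.
Return whatever FN returns.  FN is called with no arguments and
with point at the beginning of the buffer."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (funcall fn)))
```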

Bug in `parsebib--convert-tex-italics`

parsebib--convert-tex-italics introduced in 83a77ea errors when the submatch has a backslash followed by a character that signals an error in replace-match. An example is the title "Fexprs as the Basis of {{Lisp}} Function Application, or, {\\emph{\\$vau}}: {{The}} Ultimate Abstraction", which causes replace-match to signal the error "Invalid use of `\\' in replacement text". parsebib--convert-tex-bold and parsebib--convert-tex-small-caps are also subject to this bug.
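The error comes from the replacement text being interpreted for backslash sequences. Passing a non-nil LITERAL argument to replace-match (or replace-regexp-in-string) inserts the replacement verbatim instead. The sketch below demonstrates this with replace-regexp-in-string; `my/replace-literally` is a hypothetical helper, and this is not necessarily how the bug was fixed in parsebib.

```elisp
(defun my/replace-literally (regexp replacement string)
  "Replace matches of REGEXP in STRING with REPLACEMENT, taken literally.
Backslashes in REPLACEMENT are not interpreted."
  ;; Arguments four and five are FIXEDCASE and LITERAL.
  (replace-regexp-in-string regexp replacement string t t))

;; A replacement containing \$ is inserted as-is.
(my/replace-literally "X" "\\$vau" "aXb")
```

Without the LITERAL argument, the same call signals the "Invalid use of `\\' in replacement text" error quoted above.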

Entries with `}` as final character in abstract appear to fail.

M-x parsebib-find-next-item does not show up

Versions
emacs: 24.3.1
.emacs

; bare bones
(require 'package)
(package-initialize)
(require 'parsebib)
;(load "parsebib.el")
;(load "/home/chaitanya/.emacs.d/elpa/parsebib-20150205.1305/parsebib.el")

C-h f parse-bib-next-item shows help about the function.

But, M-x parsebib-find-next-item does not show up. Says, No match.

Tried (load "parsebib.el") and also (load "absolute-path-to-parsebib.el"); both give same result as above.

When loading, the .el files are not byte-compiled to .elc files, although I am not sure if this should be the case.

Since parsebib is not working for me, I can't use helm-bibtex and therefore org-ref.

Might be related to tmalsburg/helm-bibtex#25

Excluding fields with parsebib-parse?

I know I can explicitly itemize the fields I want returned, but is there an easy way to exclude one or more fields?

I want, for example, to grab all the data, except abstracts.

If not, I have an easy-enough workaround, but just thought I'd ask.

(parsebib-parse
  bibtex-actions-bibliography :fields (-flatten bibtex-actions-field-map))
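A workaround of the kind mentioned above could also filter after parsing. The sketch assumes the parse result is a hash table whose values are alists with string field names, as in the output shown elsewhere on this page; `my/drop-fields` is a hypothetical helper, not a parsebib option.

```elisp
(require 'cl-lib)

(defun my/drop-fields (entries fields)
  "Remove every field named in FIELDS from each entry in ENTRIES.
ENTRIES is a hash table mapping keys to alists of (FIELD . VALUE).
The table is modified in place and returned."
  (maphash (lambda (key entry)
             ;; Replacing the value of the key being visited is allowed.
             (puthash key
                      (cl-remove-if (lambda (field)
                                      (member (car field) fields))
                                    entry)
                      entries))
           entries)
  entries)
```

Calling it with `'("abstract")` keeps everything except abstracts, at the cost of parsing those fields first.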

missing from melpa

hi @joostkremers !
trying to package-install ebib or parsebib causes this error:
package-install-from-archive: http://melpa.org/packages/parsebib-20210108.1525.el: Not found.

BTW, thanks for your great work!

Please add a git tag for version 2.3

Thanks.

support string replacements

Hi there, over at tmalsburg/helm-bibtex#161 (which uses parsebib) we are discussing how to deal with bibtex @string. @tmalsburg points out that replacing strings in bibtex entries should be handled in parsebib, which to me seems right. My PR has some code for attempting this but at the helm-bibtex level. Would code like this be a good addition to parsebib?