joostkremers / parsebib
Elisp library for reading .bib files
License: BSD 3-Clause "New" or "Revised" License
When I read an entry that contains a backslash in its title (e.g., $\phi$), parsebib-read-entry returns a title in which the backslash is duplicated ($\\phi$). Is this intentional or a bug? In my case, it causes problems because I have to reconstruct what the original string was. (Context: tmalsburg/helm-bibtex#83)
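One thing worth ruling out when a backslash looks duplicated is the difference between a string's contents and its printed representation. The sketch below is not taken from parsebib and does not claim to explain this report; `example-backslash-report` is a hypothetical helper that only shows how Emacs prints a single backslash as two.

```elisp
;; Hypothetical helper, not part of parsebib: compare the contents of a
;; string with the way Emacs prints it.
(defun example-backslash-report (string)
  "Return a list (LENGTH PRINTED) for STRING."
  (list (length string) (prin1-to-string string)))

;; "$\\phi$" is how the six characters $\phi$ are written in Lisp source.
;; The printed form escapes the backslash again, the contents do not.
(example-backslash-report "$\\phi$")
```

If the length matches the original text, the string itself holds a single backslash and only its printed form is doubled.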
I hit this error through other libraries that use parsebib, but eval-buffer on parsebib.el fails as well.
It seems to boil down to these strings:
(setq val "\N{DOUBLE DAGGER}")
; (invalid-read-syntax "\\N{DOUBLE DAGGER}" 4 30)
(setq val "\n{DOUBLE DAGGER}")
"
{DOUBLE DAGGER}"
(setq val "\\N{DOUBLE DAGGER}")
"\\N{DOUBLE DAGGER}"
(emacs-version)
"GNU Emacs 29.0.50 (build 1, x86_64-w64-mingw32)"
(pkg-info-version-info 'parsebib)
"20230228.1530"
Thanks!
Just thought I'd add this here, based on this comment/conversation.
emacs-citar/citar#397 (comment)
And suggestions from @andras-simonyi:
Yes, ol-bibtex.el (née org-bibtex.el) also contains functionality to read BibTeX entries from org-bibtex files, and the most important function in this regard is org-bibtex-headline, which reads a single BibTeX entry from the Org headline (properties) at point. A slightly modified form of that function (citeproc-bt-from-org-headline) is used in the citeproc-el org-bibtex reader -- the "parser" basically calls that function for all headlines in the Org file via org-map-entries. It'd probably be trivial to do something very similar in parsebib, although I'm not sure about the performance implications.
When helm-bibtex uses a byte-compiled version of parsebib, parsebib-read-entry signals an error because cl-loop is not defined. This results in an empty list when calling helm-bibtex.
Inserting
(eval-when-compile
(require 'cl-macs))
after 'require' at beginning of parsebib.el solved the problem.
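For comparison, a minimal sketch of a library that uses cl-loop and loads cl-lib itself, so the macro is known both when the file is byte-compiled and when it is loaded. It uses a plain `(require 'cl-lib)` rather than the `cl-macs` form quoted above; `example-collect-keys` is a hypothetical function, not parsebib code.

```elisp
;; Loading cl-lib at the top of the file makes `cl-loop' available to the
;; byte compiler and at run time.
(require 'cl-lib)

;; Hypothetical function, for illustration only.
(defun example-collect-keys (entries)
  "Return the car of every element of the alist ENTRIES."
  (cl-loop for entry in entries
           collect (car entry)))
```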
The reason I ask is for interaction with revision control systems like git and subversion. If bib databases are dumped into, for example, lexicographically-sorted .bib files, then RCSes will be able to accurately identify diffs. But if they are dumped in an arbitrary order, then the bib files will effectively be binary files from the standpoint of RCSes (or, indeed, anything that relies on diff).
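A minimal sketch of how a caller could get a stable order on its own side, assuming the entries live in a hash table keyed by entry key (as in the parsebib-parse output shown elsewhere on this page). `example-sorted-keys` is a hypothetical helper, not a parsebib function.

```elisp
;; Hypothetical helper: collect the keys of a hash table and sort them
;; lexicographically, so that a dump written in this order diffs cleanly.
(defun example-sorted-keys (table)
  "Return the string keys of hash table TABLE in `string<' order."
  (let (keys)
    (maphash (lambda (key _value) (push key keys)) table)
    (sort keys #'string<)))
```

Writing entries in the order returned by such a helper keeps unrelated entries from moving between dumps.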
Here's the current definition:
(defun parsebib--match-paren-forward ()
  "Move forward to the closing paren matching the opening paren at point.
This function handles parentheses () and braces {}. Return t if a
matching parenthesis was found. Note that this function puts
point right before the closing delimiter (unlike e.g.,
`forward-sexp', which puts it right after.)"
  (cond
   ((eq (char-after) ?\{)
    (parsebib--match-brace-forward))
   ((eq (char-after) ?\()
    ;; This is really a hack. We want to allow unbalanced parentheses in
    ;; field values (BibTeX does), so we cannot use forward-sexp
    ;; here. For the same reason, looking for the matching paren by hand
    ;; is pretty complicated. However, balanced parentheses can only be
    ;; used to enclose entire entries (or @STRINGs or @PREAMBLEs) so we
    ;; can be pretty sure we'll find it right before the next @ at the
    ;; start of a line, or right before the end of the file.
    (let ((beg (point)))
      (re-search-forward parsebib--entry-start nil 0)
      (skip-chars-backward "@ \n\t\f")
      (if (eq (char-after) ?\))
          ;; if we've found a closing paren, return t
          t
        ;; otherwise put the cursor back and signal an error
        (goto-char beg)
        (signal 'scan-error (list "Unbalanced parentheses" beg (point-max))))))))
This fails to successfully parse this entry in one of my bib files:
@ARTICLE(Stix:LivingRoom,
AUTHOR = {Gary Stix},
TITLE = "Domesticating Cyberspace",
JOURNAL = SciAm,
YEAR = {1993},
VOLUME = {269},
NUMBER = {2},
PAGES = {100--110},
MONTH = aug
)
@ARTICLE(Bal:DistPLs,
...
What happens when I step through this is that we take the branch for matching parens, then we jump forward to the second @ (for Bal:DistPLs). The next step is to skip-chars-backward, which puts the cursor right after the closing paren that we are looking for. But the code assumes that the cursor will be right before that closing paren -- it checks (char-after) when, AFAICT, it should check (char-before).
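The cursor position described above can be checked in a scratch buffer. This is a sketch under assumptions: the regexp "^[ \t]*@" stands in for parsebib--entry-start, whose real value is not shown here, and `example-probe-closing-paren` is a hypothetical helper that repeats the two movement steps on a shortened copy of the entry.

```elisp
;; Hypothetical probe, not parsebib code.  The regexp "^[ \t]*@" is an
;; assumed stand-in for `parsebib--entry-start'.
(defun example-probe-closing-paren ()
  "Return (CHAR-AFTER . CHAR-BEFORE) after the two movement steps."
  (with-temp-buffer
    (insert "@ARTICLE(Stix:LivingRoom,\n  MONTH = aug\n)\n@ARTICLE(Bal:DistPLs,\n")
    (goto-char (point-min))
    ;; Put point on the opening paren of the first entry.
    (search-forward "(")
    (backward-char)
    ;; Jump to just after the @ of the next entry ...
    (re-search-forward "^[ \t]*@" nil 0)
    ;; ... then skip back over the @ and the whitespace before it.
    (skip-chars-backward "@ \n\t\f")
    (cons (char-after) (char-before))))
```

In this sketch the character before point is the closing paren and the character after point is the newline that follows it, which matches the description in the report.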
Hello,
I believe that after the f41befa commit, the package is failing to compile on Emacs 26.3.
I caught this when I looked into why the ox-hugo test suite started failing just on Emacs 26.3. ox-hugo uses citeproc for some of its tests and that package depends on parsebib.
Eager macro-expansion failure: (error "rx ‘not’ syntax error: ]")
...
mapconcat(#f(compiled-function (x) #<bytecode 0x13ec1e1>) ((: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)) "\\|")
rx-or((or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))
rx-form((or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)) *)
rx-kleene((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))))
rx-form((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))) :)
#f(compiled-function (x) #<bytecode 0x13ebe99>)((opt (or (: (* (: "[" (* (not "]")) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter))))
mapconcat(#f(compiled-function (x) #<bytecode 0x13ebe99>) ((* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))) nil)
rx-and((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
rx-form((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))) :)
#f(compiled-function (x) #<bytecode 0x13ebe99>)((: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
mapconcat(#f(compiled-function (x) #<bytecode 0x13ebe99>) ("\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))) nil)
rx-and((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
rx-form((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
rx-to-string((and "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))) t)
#f(compiled-function (&rest regexps) #<bytecode 0x13ed2bd>)("\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
(rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" (* ...) "]")) "{" (group-n 2 (0+ (not "}"))) (opt "}")) (group-n 3 letter)))))
(cons (quote parsebib--replace-command-or-accent) (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* (: "[" ... "]")) "{" (group-n 2 (0+ ...)) (opt "}")) (group-n 3 letter))))))
(list (cons (quote parsebib--replace-command-or-accent) (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: (* ...) "{" (group-n 2 ...) (opt "}")) (group-n 3 letter)))))) (cons (quote parsebib--replace-literal) (rx-to-string (\` (or (\,@ (mapcar (function car) parsebib-TeX-literal-replacement-alist)) (1+ blank))))))
(\` ((parsebib--replace-command-or-accent \, (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or (: ... "{" ... ...) (group-n 3 letter)))))) (parsebib--replace-literal \, (rx-to-string (\` (or (\,@ (mapcar ... parsebib-TeX-literal-replacement-alist)) (1+ blank)))))))
(defvar parsebib-TeX-markup-replacement-alist (\` ((parsebib--replace-command-or-accent \, (rx "\\" (group-n 1 (or (1+ letter) nonl)) (: (* blank) (opt (or ... ...))))) (parsebib--replace-literal \, (rx-to-string (\` (or (\,@ ...) (1+ blank))))))) "Alist of replacements and strings for TeX markup.\nThis is used in `parsebib-clean-TeX-markup' to make TeX markup more\nsuitable for display. Each item in the list consists of a replacement\nand a regexp. The replacement can be a string (which will\nsimply replace the match) or a function (the match will be\nreplaced by the result of calling the function on the match\nstring). Earlier elements are evaluated before later ones, so if\none string is a subpattern of another, the second must appear\nlater (e.g. \"''\" is before \"'\").\n\nFor the common cases of replacing a LaTeX command or a literal\nit is faster to use `parsebib-TeX-command-replacement-alist'\nand `parsebib-TeX-literal-replacement-alist' respectively.")
eval-buffer(#<buffer *load*-955859> nil "/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" nil t) ; Reading at buffer position 18963
load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" "/tmp/runner/ox-hugo-dev/elpa_26/parsebib-20220916.2236/parsebib.el" nil t)
require(parsebib)
eval-buffer(#<buffer *load*-705935> nil "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" nil t) ; Reading at buffer position 1376
load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc-itemgetters.el" nil t)
require(citeproc-itemgetters)
eval-buffer(#<buffer *load*-991405> nil "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" nil t) ; Reading at buffer position 1679
load-with-code-conversion("/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" "/tmp/runner/ox-hugo-dev/elpa_26/citeproc-20220921.1924/citeproc.el" t t)
require(citeproc nil t)
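Judging from the error message, the rx in this Emacs does not accept a bare string as the argument of `not`. A minimal sketch of the longer spelling, `(not (any ...))`, which older rx versions also understand; `example-bracket-arg-regexp` is a hypothetical name and only covers the bracketed-argument part of the pattern in the backtrace.

```elisp
(require 'rx)

;; Hypothetical constant, for illustration only: the bracketed-argument
;; part of the pattern, written with (not (any "]")) instead of (not "]").
(defconst example-bracket-arg-regexp
  (rx "[" (* (not (any "]"))) "]")
  "Regexp matching a bracketed optional argument such as [opt].")
```

The same substitution would apply to the `(not "}")` form that appears in the backtrace.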
(I am using Spacemacs and have tons of customization, but I assume this is irrelevant? If the following is not enough to reproduce the issue, I will try again in a fresh install / vanilla emacs.)
% Failing.bib
@article{Title,
title = {{Title}},
author = {Author},
year = {1970},
journal = {Journal},
abstract = {(} % <-- Culprit
}
% Passing.bib
@article{Title,
title = {{Title}},
author = {Author},
year = {1970},
journal = {Journal},
abstract = {()} % <-- No error
}
Evaluation results:
(parsebib-parse "/tmp/Passing.bib")
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("Title" (("abstract" . "()<–Noerror") ("journal" . "Journal") ("year" . "1970") ("author" . "Author") ("title" . "Title") ("=type=" . "article") ("=key=" . "Title"))))
(parsebib-parse "/tmp/Failing.bib")
Debugger entered--Lisp error: (scan-error "Unbalanced parentheses" 23 145)
scan-sexps(23 1)
forward-sexp(1)
parsebib--match-brace-forward()
parsebib--match-paren-forward()
parsebib-read-entry("article" nil #<hash-table equal 0/65 0x15659cd2c74d> nil t)
parsebib-parse-bib-buffer(:entries #<hash-table equal 0/65 0x15659cd2c72d> :strings #<hash-table equal 0/65 0x15659cd2c74d> :expand-strings t :inheritance t :fields nil :replace-TeX t)
#f(compiled-function (file) #<bytecode -0x424eb6921cb642f>)("/tmp/Failing.bib")
parsebib-parse("/tmp/Failing.bib")
(progn (parsebib-parse "/tmp/Failing.bib"))
elisp--eval-last-sexp(t)
#<subr eval-last-sexp>(t)
#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>)()
eval-sexp-fu-flash-doit-simple(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
eval-sexp-fu-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
esf-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>) #f(compiled-function (&rest args2) #<bytecode 0xa28255960f219d0>))
ad-Advice-eval-last-sexp(#<subr eval-last-sexp> t)
apply(ad-Advice-eval-last-sexp #<subr eval-last-sexp> t)
eval-last-sexp(t)
eval-print-last-sexp(nil)
funcall-interactively(eval-print-last-sexp nil)
command-execute(eval-print-last-sexp)
The parser should treat full-width characters as normal text instead of syntactic elements.
P.S. Both Failing.bib and Passing.bib pass validation by biber (via biber --tool -V Failing.bib / biber --tool -V Passing.bib).
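The backtrace shows the brace matcher going through forward-sexp, which also balances parentheses. As a sketch of the alternative, the hypothetical function below finds the closing brace by counting only { and }, so a lone ( inside the value does not matter. It is not parsebib's implementation, works on a string rather than a buffer, and deliberately ignores backslash-escaped braces.

```elisp
;; Hypothetical helper, not parsebib code: match braces by counting,
;; treating every other character (including parentheses) as plain text.
(defun example-brace-end (string start)
  "Return the index of the } matching the { at START in STRING, or nil."
  (let ((depth 0)
        (i start)
        (len (length string))
        result)
    (while (and (< i len) (not result))
      (let ((c (aref string i)))
        (cond
         ((eq c ?\{) (setq depth (1+ depth)))
         ((eq c ?\})
          (setq depth (1- depth))
          (when (zerop depth)
            (setq result i)))))
      (setq i (1+ i)))
    result))
```

With this approach the field values {(} and {()} are handled the same way.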
I just noticed this with my Emacs 29 install:
⛔ Warning (comp): parsebib.el:634:21: Warning: ‘point-at-eol’ is an obsolete function (as of 29.1); use ‘line-end-position’ or ‘pos-eol’ instead.
Looking a bit, that second alternative is new, but the other is not.
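A minimal sketch of the older of the two suggested replacements, line-end-position, in use. `example-rest-of-line` is a hypothetical helper and is not the code at the line the warning points to.

```elisp
;; Hypothetical helper, for illustration only: uses `line-end-position'
;; where older code would have called `point-at-eol'.
(defun example-rest-of-line ()
  "Return the text from point to the end of the current line."
  (buffer-substring-no-properties (point) (line-end-position)))
```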
My understanding is that BibTeX keys and entry types are both case-insensitive. Given that, would it make sense to normalize them in parsebib to have only lowercase letters?
Follow-up to:
The function doesn't reference any other ebib functions, but it does rely on ebib-TeX-markup-replace-alist, so I assume that would need to be moved as well.
But it seems a straightforward move-and-rename.
Alas, I'm not familiar enough with this codebase to know how best to then integrate it here.
At some point it should probably also be extended with a parallel function that does the same for CSL JSON markup, though the use of markup there isn't really standardized ATM.
Hello,
As mentioned in my comment on the relevant commit, the internal version number for this library has been bumped to 4.1, but there hasn't been a release tag created since 2021-12-08 with 3.1. Also, with the recent changes that merged some functionality from Ebib, the latest version of Ebib now relies on code from 4.1, even though it has not yet been 'released'.
Is there a way we could have a new release tag created, to communicate to others that there has been a major update?
I have the following string definitions:
@String{up = {University Press}}
@String{cup = "Cambridge " # up}
@String{oup = "Oxford " # up}
Bibtex and biblatex both understand such recursive strings. Ebib does not (yet), so I was working on a PR to fix this. Importantly, given the above configuration cup will render as Cambridge University Press, but given this config it won't:
@String{up = {University Press}}
@String{cup = {"Cambridge " # up}}
In this case, Bib(La)TeX recognises that the value is braced and therefore 'protected'. I found I couldn't distinguish the cases in ebib, because for both configurations it returns the abbrev's value/definition as braced (that is, in both cases it returns the string {"Cambridge " # up}). I had to unbrace this to get the expansion to work, but that misses the situations where there are intentional braces to protect the values.
Tracing the code, ebib just returns whatever is in its database, and that is determined by parsebib. Honestly the code is a bit dense for me, but I'm fairly sure the problem is in parsebib-read-string.
(As a side note, looking at parsebib's docstrings, it seems that it can expand more complex strings in field values like "Some " # up. I was writing a PR which modified ebib-get-string to split such strings and process them recursively. Is this worth it, or is there an easier way with parsebib?)
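The distinction between the two configurations can be sketched as follows, assuming the raw definition text is available with its outer delimiters intact. `example-expand-string` and the alist layout are hypothetical and are not how parsebib or ebib store strings. The sketch splits naively on #, so a # inside a quoted part would break it, and abbreviation names are matched case-sensitively.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical helper, not parsebib or ebib code.  ABBREVS is an alist of
;; (NAME . RAW-DEFINITION) pairs, raw definitions keeping their delimiters.
(defun example-expand-string (value abbrevs)
  "Expand the raw @String definition VALUE using ABBREVS."
  (if (string-prefix-p "{" value)
      ;; Braced value: protected, so only the outer braces are dropped.
      (substring value 1 -1)
    (mapconcat
     (lambda (part)
       (let ((part (string-trim part)))
         (cond
          ;; Quoted literal: drop the quotes.
          ((string-prefix-p "\"" part) (substring part 1 -1))
          ;; Known abbreviation: expand it recursively.
          ((assoc part abbrevs)
           (example-expand-string (cdr (assoc part abbrevs)) abbrevs))
          ;; Anything else (e.g. a number) is kept as is.
          (t part))))
     (split-string value "#")
     "")))
```

With up defined as {University Press}, the unbraced definition "Cambridge " # up expands to Cambridge University Press, while the braced one comes back unexpanded.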
A follow-up to #12.
The changes I needed to make to use parsebib-parse for the initial bibtex-actions candidate list, and therefore also bypass all that bibtex-completion code, were trivial.
Two high-level conclusions:
edit-entry it seems; actually, not sure on this) don't work with the json representation.
For parsebib, the obvious solution for both is what we've previously discussed: parsebib-get-value to also address 1, and for our packages to use that.
On the bibtex-completion end, if we could get this addressed, it should allow you to simultaneously remove a lot of code, @tmalsburg, and secondarily get json support. I don't see a lot of places in the current code that look explicitly and directly for =key=, for example; most are using the bibtex-completion-get-value wrapper, which could be adjusted to look for both =key= and id, or to call a possible parsebib alternative.
Why is parsing restricted to the buffer?
Is it impractical to allow direct file parsing?
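A common way to get file parsing out of a buffer-based parser is to load the file into a temporary buffer first. The sketch below shows only that wrapper; `example-call-on-file` is a hypothetical helper, and PARSER stands for whatever buffer-based function the caller wants to run.

```elisp
;; Hypothetical wrapper, not a parsebib function: run a buffer-based
;; parser on the contents of FILE without visiting the file.
(defun example-call-on-file (file parser)
  "Insert FILE into a temporary buffer and call PARSER there.
PARSER is called with no arguments, with point at the start of the
buffer; its return value is returned."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (funcall parser)))
```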
parsebib--convert-tex-italics, introduced in 83a77ea, errors when the submatch has a backslash followed by a character that signals an error in replace-match. An example is the title "Fexprs as the Basis of {{Lisp}} Function Application, or, {\\emph{\\$vau}}: {{The}} Ultimate Abstraction", which causes replace-match to signal the error "Invalid use of `\\' in replacement text". parsebib--convert-tex-bold and parsebib--convert-tex-small-caps are also subject to this bug.
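One way to avoid this class of error is to pass a non-nil LITERAL argument to replace-match, so that backslashes in the replacement text are not interpreted. The sketch below is not the parsebib function; `example-strip-emph` is a hypothetical helper with a deliberately simple regexp, used only to show the LITERAL argument.

```elisp
;; Hypothetical helper, not parsebib code: replace the first \emph{...}
;; in STRING by its argument.  The third argument to `replace-match'
;; (LITERAL) is t, so a backslash in the submatch is inserted verbatim.
(defun example-strip-emph (string)
  "Return STRING with its first \\emph{...} replaced by the argument."
  (if (string-match "\\\\emph{\\([^}]*\\)}" string)
      (replace-match (match-string 1 string) t t string)
    string))

;; The submatch here is \$vau, which is not valid non-literal
;; replacement text.
(example-strip-emph "{\\emph{\\$vau}}")
```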
See also tmalsburg/helm-bibtex#436
; bare bones
(require 'package)
(package-initialize)
(require 'parsebib)
;(load "parsebib.el")
;(load "/home/chaitanya/.emacs.d/elpa/parsebib-20150205.1305/parsebib.el")
C-h f parse-bib-next-item shows help about the function.
But M-x parsebib-find-next-item does not show up; it says "No match".
I tried (load "parsebib.el") and also (load "absolute-path-to-parsebib.el"); both give the same result as above.
Loading the .el file does not byte-compile it to an .elc file, although I am not sure whether it should.
Since parsebib is not working for me, I can't use helm-bibtex and therefore org-ref.
Might be related to tmalsburg/helm-bibtex#25
I know I can explicitly itemize the fields I want returned, but is there an easy way to exclude one or more fields?
I want, for example, to grab all the data, except abstracts.
If not, I have an easy-enough workaround, but just thought I'd ask.
(parsebib-parse
bibtex-actions-bibliography :fields (-flatten bibtex-actions-field-map)))
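A minimal sketch of the kind of workaround mentioned above: filtering the unwanted fields out after parsing. It assumes an entry is an alist of ("field" . "value") pairs, as in the parsebib-parse output shown elsewhere on this page; `example-drop-fields` is a hypothetical helper, not a parsebib option.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of parsebib: remove unwanted fields from
;; an already parsed entry.
(defun example-drop-fields (entry fields)
  "Return a copy of alist ENTRY without the fields named in FIELDS."
  (cl-remove-if (lambda (cell) (member (car cell) fields))
                entry))
```

This does not save the parsing work for the dropped fields; it only keeps them out of the returned data.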
hi @joostkremers!
trying to package-install ebib or parsebib causes this error:
package-install-from-archive: http://melpa.org/packages/parsebib-20210108.1525.el: Not found.
BTW, thanks for your great work!
Thanks.
Hi there, over at tmalsburg/helm-bibtex#161 (which uses parsebib) we are discussing how to deal with bibtex @string. @tmalsburg points out that replacing strings in bibtex entries should be handled in parsebib, which to me seems right. My PR has some code for attempting this but at the helm-bibtex level. Would code like this be a good addition to parsebib?