Comments (16)
Line 263 of what file? Link?
from asciidoc-py.
Yeah, we need to detect the mode passed to read (and write) and then decode/encode appropriately within the function, so that callers only deal with strings going in and coming out.
Do you have a file and arguments you've used to run this to test?
Possibly the read should return bytes, not strings, if the encoding is not known, and the bytes are then decoded to a Unicode string. (In Python 3, decode() moved to bytes objects.)
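As a small illustration of the decode() point, here is a sketch of the bytes/str split in Python 3. The sample byte string is made up for the example and is not taken from asciidoc-py.

```python
# Hypothetical sample data: "café" encoded as UTF-8 bytes,
# i.e. what a file opened in 'rb' mode would hand back.
raw = b"caf\xc3\xa9"

# bytes -> str goes through bytes.decode(); str -> bytes through str.encode().
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")

# In Python 3, str no longer has a decode() method at all.
str_has_decode = hasattr(text, "decode")
```

So a function that wants to choose the encoding itself has to start from bytes, since a str has already been decoded by the time it is returned.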
I ran into the error when building a package for newsboat on Fedora. Its documentation is generated using a2x. The current development version of Fedora uses this package instead of the original Python 2 version and was failing the doc builds. For a quick repro, from the newsboat repository run `a2x -f xhtml doc/faq.txt`. If you would like a specific tag, I'm using `r2.11.1`.
@lfkeitel Fedora is maybe a bit early off the mark, this is still developmental, see #15 and the repository message.
I'm well aware of that. It wasn't my decision. Fedora 29 is going with Python 3 as the default and they're trying their best to prepare. I just have to deal with it. I've already put in a comment with them about it so they may make a patch to the pre-release package until it's fixed upstream. I just thought you would like to know that I ran into the bug.
@lfkeitel no problem, thanks for reporting. Certainly Fedora should be used to making patches to early release packages and updating them regularly; it's a fairly bleeding-edge distro after all :)
So looking at this further, it seems that we can either remove the whole encoding check and just assume UTF-8 always, or rewrite read_file slightly so that it loads the file as a byte string and reads the first line or two to check whether an encoding is declared. If there is one, we decode the whole file as that encoding; otherwise we fall back to standard 'utf-8'. Let me know which route you'd like to follow @elextr.
For now, I've modified #5 to just always assume UTF-8 and that should fix @lfkeitel's problem at least.
Thanks. This is how asciidoc3 handles it: https://github.com/asciidoc3/asciidoc3/blob/master/a2x3.py#L302. If it helps at all. They just call encode with the detected encoding.
Well, except that you're still opening the file in your default locale (which might be UTF-8 or might not, docker alpine defaults to ASCII) so you're already doing a transformation on file load and then just encoding it later. If you're going to care about the encoding, it should be done at the file level when you're reading it in. Of course, that also just always encodes it in utf-8 if the file specifies an encoding which is...weird.
Also, that has a bug in that it does a `str()` around a byte object, which results in a string that starts with `b'` and ends with `'`, though that's a separate issue.
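To make the `str()` pitfall concrete, here is a minimal sketch. The payload value is a made-up stand-in for whatever bytes the encode call produces, not the actual data from a2x3.py.

```python
# Hypothetical payload standing in for the bytes returned by encode().
payload = "faq".encode("utf-8")

# str() on a bytes object (with no encoding argument) returns its repr,
# so the b'...' wrapper becomes part of the resulting string.
wrapped = str(payload)

# Decoding is what actually recovers the original text.
decoded = payload.decode("utf-8")
```

Here `wrapped` is the six-character string `b'faq'`, while `decoded` is just `faq`.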
I think what you'd probably want to do is something like:
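    import re

    with open(filename, 'rb') as open_file:
        contents = open_file.read()
    # Look for an encoding in the XML declaration; fall back to UTF-8.
    mo = re.search(rb'\A<\?xml.* encoding="(.*?)"', contents)
    # The matched group is bytes, but decode() needs the codec name as a str.
    contents = contents.decode(mo.group(1).decode('ascii') if mo else 'utf-8')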
to more properly read the file and get it as a proper unicode string without losing any characters, though I'm not sure how that would affect things in case it writes this stuff back out to a file (that's probably expecting utf-8).
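On the write-back question, a rough end-to-end sketch may help. It assumes the output side is meant to be UTF-8, as guessed above. The helper name `read_decoded`, the sample document and the file names are all hypothetical and do not come from asciidoc-py.

```python
import os
import re
import tempfile


def read_decoded(filename):
    """Hypothetical helper: read bytes, decode with the declared encoding."""
    with open(filename, "rb") as open_file:
        contents = open_file.read()
    mo = re.search(rb'\A<\?xml.* encoding="(.*?)"', contents)
    # The regex group is bytes, but decode() wants the codec name as str.
    encoding = mo.group(1).decode("ascii") if mo else "utf-8"
    return contents.decode(encoding), encoding


# Made-up sample: a Latin-1 document that declares its own encoding.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>café</p>\n'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.xml")
    dst = os.path.join(tmp, "out.xml")
    with open(src, "wb") as f:
        f.write(sample.encode("iso-8859-1"))

    text, detected = read_decoded(src)

    # An explicit encoding on write avoids depending on the locale default.
    # newline="" keeps line endings untouched on every platform.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    with open(dst, "rb") as f:
        written = f.read()
```

No characters are lost in this round trip, but the declaration inside the written file still says ISO-8859-1 while the bytes are now UTF-8. Any code that writes the text back out would presumably need to either rewrite the declaration or write using the detected encoding.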