
Comments (16)

elextr avatar elextr commented on June 8, 2024

Line 263 of what file? Link?

from asciidoc-py.

lfkeitel avatar lfkeitel commented on June 8, 2024


elextr avatar elextr commented on June 8, 2024

here


lfkeitel avatar lfkeitel commented on June 8, 2024


lfkeitel avatar lfkeitel commented on June 8, 2024


MasterOdin avatar MasterOdin commented on June 8, 2024

Yeah, we need to detect the mode passed to read (and write) and then decode/encode appropriately within the function, so that callers can just expect to deal with strings going in and coming out.

Do you have a file and arguments you've used to run this to test?


elextr avatar elextr commented on June 8, 2024

Possibly the read should return bytes, not strings, if the encoding is not known, and the result should then be decoded to a Unicode string. (In Python 3, decode() moved to bytes objects.)


lfkeitel avatar lfkeitel commented on June 8, 2024


lfkeitel avatar lfkeitel commented on June 8, 2024

I ran into the error when building a package for newsboat on Fedora. Its documentation is generated using a2x. The current development version of Fedora uses this package instead of the original Python 2 version, and the doc builds were failing. For a quick repro, from the newsboat repository run a2x -f xhtml doc/faq.txt. If you would like a specific tag, I'm using r2.11.1.


elextr avatar elextr commented on June 8, 2024

@lfkeitel Fedora is maybe a bit early off the mark; this is still in development. See #15 and the repository message.


lfkeitel avatar lfkeitel commented on June 8, 2024

I'm well aware of that. It wasn't my decision. Fedora 29 is going with Python 3 as the default and they're trying their best to prepare. I just have to deal with it. I've already put in a comment with them about it so they may make a patch to the pre-release package until it's fixed upstream. I just thought you would like to know that I ran into the bug.


elextr avatar elextr commented on June 8, 2024

@lfkeitel no problem, thanks for reporting. Fedora should certainly be used to making patches to early release packages and updating them regularly; it's a fairly bleeding-edge distro after all :)


MasterOdin avatar MasterOdin commented on June 8, 2024

So looking at this further, it seems that we can either remove the whole encoding check and just assume UTF-8 always, or rewrite read_file slightly so that it loads the file as a byte string and reads the first line or two to check whether there's an encoding. If there is, we decode the whole file with that encoding; otherwise we fall back to the standard 'utf-8'. Let me know which route you'd like to follow @elextr.

For now, I've modified #5 to just always assume UTF-8 and that should fix @lfkeitel's problem at least.


lfkeitel avatar lfkeitel commented on June 8, 2024

Thanks. In case it helps, this is how asciidoc3 handles it: https://github.com/asciidoc3/asciidoc3/blob/master/a2x3.py#L302. They just call encode with the detected encoding.


MasterOdin avatar MasterOdin commented on June 8, 2024

Well, except that you're still opening the file in your default locale (which might be UTF-8 or might not, docker alpine defaults to ASCII) so you're already doing a transformation on file load and then just encoding it later. If you're going to care about the encoding, it should be done at the file level when you're reading it in. Of course, that also just always encodes it in utf-8 if the file specifies an encoding which is...weird.
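
As a rough illustration of the locale point (a sketch only; the temporary file and its contents are made up for the example), opening a file without an encoding argument defers to whatever the locale reports, whereas passing the encoding explicitly, or reading bytes and decoding them yourself, gives the same result everywhere:

```python
import locale
import os
import tempfile

# Hypothetical sample file, written as raw UTF-8 bytes for illustration.
raw = "naïve café\n".encode("utf-8")
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# The encoding that open() falls back to when none is given
# (may differ between machines and containers).
print(locale.getpreferredencoding(False))

# Explicit encoding: the result does not depend on the locale.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Reading bytes and decoding afterwards defers the encoding decision.
with open(path, "rb") as f:
    same = f.read().decode("utf-8")

os.remove(path)
```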

Also, that has a bug in that it wraps a bytes object in str(), which results in a string that starts with b' and ends with ', though that's a separate issue.
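
To show what that str() behaviour looks like in Python 3 (a minimal sketch; the sample bytes are made up and are not taken from asciidoc3):

```python
# Made-up sample: UTF-8 bytes for a short non-ASCII word.
data = "café".encode("utf-8")

# str() on bytes gives the repr, including the b'...' wrapper.
wrong = str(data)

# decode() gives the actual text.
right = data.decode("utf-8")

print(wrong)
print(right)
```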


MasterOdin avatar MasterOdin commented on June 8, 2024

I think what you'd probably want to do is something like:

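import re

with open(filename, 'rb') as open_file:
    contents = open_file.read()
# Look for an encoding in the XML declaration; fall back to UTF-8.
mo = re.search(rb'\A<\?xml.* encoding="(.*?)"', contents)
encoding = mo.group(1).decode('ascii') if mo else 'utf-8'
contents = contents.decode(encoding)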

to more properly read the file and get it as a proper unicode string without losing any characters, though I'm not sure how that would affect things in case it writes this stuff back out to a file (that's probably expecting utf-8).
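
On the write-back question, one possible approach (a sketch under assumptions, not what a2x currently does) is to remember the detected encoding and encode with it again when writing, so the XML declaration stays truthful. The helper name decode_xml_bytes and the sample document are hypothetical:

```python
import re


def decode_xml_bytes(data):
    """Decode XML bytes using the declared encoding, defaulting to UTF-8.

    Hypothetical helper for illustration; returns (text, encoding).
    """
    mo = re.search(rb'\A<\?xml[^>]* encoding="(.*?)"', data)
    # The captured group is bytes; bytes.decode() needs a str codec name.
    encoding = mo.group(1).decode("ascii") if mo else "utf-8"
    return data.decode(encoding), encoding


# Made-up sample document declared (and stored) as ISO-8859-1.
sample = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>café</p>\n'
).encode("iso-8859-1")

text, encoding = decode_xml_bytes(sample)

# Encoding with the same codec reproduces the original bytes.
round_tripped = text.encode(encoding)
```

If the output is instead always written as UTF-8, the declaration in the text would need to be rewritten to match.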
