
Comments (6)

WesleyAC avatar WesleyAC commented on May 24, 2024

I just ran into this myself. It would be great if there were some option to fix this! Perhaps an option to print the error message in the cell (in red, so it's clear that it's not the literal text)?

I'm working with the Firefox history database, so sadly removing the malformed data is not an option :(

from litecli.

amjith avatar amjith commented on May 24, 2024

Do you have an example value that I could use to reproduce this?


WesleyAC avatar WesleyAC commented on May 24, 2024

I uploaded an example database file here: https://hack.wesleyac.com/test.sqlite

It uses the invalid UTF-8 byte sequence \xc3\x28. Let me know if that's sufficient for you :)


amjith avatar amjith commented on May 24, 2024

Thank you @WesleyAC. I was able to reproduce the issue. The fix is now in a PR (pending review from other core devs).

Long form description of what is going on:

It turns out that the sqlite3 library for Python decodes text as UTF-8 by default, which normally works fine because SQLite stores text as UTF-8 by default. But as you pointed out, invalid byte sequences can sneak in. Thankfully, the Python library allows the decoder to be overridden, so I catch the exception and fall back to latin-1 decoding. Unfortunately, this is a batch process: if a single value contains an invalid byte, the whole result set has to use the fallback encoding of latin-1.

It seems to work well for now, but I can't use it to highlight the invalid value in red.
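For readers following along, here is a minimal sketch of what such a batch fallback could look like. It is not the code from the PR. It assumes that `Connection.text_factory` is the decoder hook being referred to, the helper name `fetch_with_fallback` is made up for illustration, and it catches both `UnicodeDecodeError` and `sqlite3.OperationalError` because the exception type that surfaces may depend on the Python version.

```python
import sqlite3


def fetch_with_fallback(conn, sql):
    """Run sql; if any TEXT value fails to decode, redo the whole query as latin-1.

    Hypothetical helper for illustration, not litecli's implementation.
    """
    try:
        return conn.execute(sql).fetchall()
    except (UnicodeDecodeError, sqlite3.OperationalError):
        # Assumption: a decode failure surfaces as one of these two exceptions.
        # latin-1 maps every byte to a code point, so the retry cannot fail
        # on decoding.
        conn.text_factory = lambda raw: raw.decode("latin-1")
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.text_factory = str  # restore the default UTF-8 decoder


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (v TEXT)")
conn.execute("INSERT INTO demo VALUES ('é')")  # valid UTF-8
# CAST(blob AS TEXT) stores the bytes \xc3\x28 as TEXT without validating them.
conn.execute("INSERT INTO demo VALUES (CAST(X'C328' AS TEXT))")

rows = [r[0] for r in fetch_with_fallback(conn, "SELECT v FROM demo ORDER BY rowid")]
print(rows)
```

The sketch also shows the cost of the batch approach: because the whole result set is re-read as latin-1, the valid value 'é' comes back as 'Ã©' alongside the invalid one.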


zzl0 avatar zzl0 commented on May 24, 2024

> Unfortunately this is a batch process which means, if a single value has an invalid byte value, the whole set has to use the fallback encoding of latin-1.

Seems we can use decode('utf-8', 'backslashreplace') to avoid this issue:

>>> b'\xf0\x9f\x98\x8a\x80abc'.decode('utf-8', 'backslashreplace')
'😊\\x80abc'
>>> b'\xf0\x9f\x98\x8a\x80abc'.decode('latin-1')
'ð\x9f\x98\x8a\x80abc'
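A rough sketch of how this could be wired into a connection, assuming `Connection.text_factory` is the hook for overriding the decoder (the table and column names are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# text_factory is called with the raw bytes of each TEXT value, so the
# error handler applies per value rather than per result set.
conn.text_factory = lambda raw: raw.decode("utf-8", "backslashreplace")

conn.execute("CREATE TABLE demo (v TEXT)")
conn.execute("INSERT INTO demo VALUES ('é')")  # valid UTF-8
# CAST(blob AS TEXT) stores the bytes \xc3\x28 as TEXT without validating them.
conn.execute("INSERT INTO demo VALUES (CAST(X'C328' AS TEXT))")

values = [r[0] for r in conn.execute("SELECT v FROM demo ORDER BY rowid")]
print(values)
```

With this decoder, valid values are left untouched and only the offending byte is shown as an escape, so there is no need to re-run the query with a different encoding.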


zzl0 avatar zzl0 commented on May 24, 2024

I dug into this issue a little. The root cause is:

  1. SQLite uses a dynamic type system (a column's declared type is a recommendation, not a requirement). Although UTF-8 is the default encoding for the TEXT type, SQLite does not check that a value is valid UTF-8 when it is inserted.
  2. Python's sqlite3 library decodes TEXT columns as UTF-8 by default. When it encounters an invalid UTF-8 byte sequence, it raises an error: UnicodeDecodeError: 'utf-8' codec can't decode byte ...
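The first point can be checked with a small sketch like the one below. It assumes an in-memory database with the default UTF-8 encoding and a made-up table name, and it uses `CAST(... AS TEXT)` to get the invalid bytes \xc3\x28 from the example database into a TEXT value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (v TEXT)")
# SQLite accepts these bytes as TEXT even though they are not valid UTF-8.
conn.execute("INSERT INTO demo VALUES (CAST(X'C328' AS TEXT))")

# typeof() and hex() return plain ASCII, so reading them never triggers
# decoding of the invalid value itself.
stored_type, stored_hex = conn.execute(
    "SELECT typeof(v), hex(v) FROM demo"
).fetchone()
print(stored_type, stored_hex)

# The same two bytes are rejected by Python's strict UTF-8 decoder.
try:
    b"\xc3\x28".decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```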

@amjith's CR fixed this issue by catching the UnicodeDecodeError and then falling back to decoding as latin-1.
