Comments (7)
Using the work of everyone here (thank you, everyone!), I've tried to combine the change sets into one clean set of commits and put a shiny new wrapper on things, which is also available on PyPI as pycld3.
https://github.com/bsolomon1124/pycld3
Reviews appreciated. Again, I've made my best effort to make sure the incremental changes across different forks are picked up and put together.
from cld3.
@ipla I've fixed these memory leaks in my fork of CLD3. Basically, the elizafox version creates a new model object on each call to get_language and, on top of that, doesn't clean it up. My fork has both the original functions (which now clean up their objects) and a class called LanguageIdentifier, which permits reuse of the model for faster performance.
The fork is iamthebot/cld3
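The difference between the two approaches can be sketched in plain Python. This is only an illustration of the pattern, not the fork's actual code: `FakeModel`, `get_language_per_call`, and the stand-in `LanguageIdentifier` below are hypothetical, and the "model" merely counts how many times it gets constructed.

```python
class FakeModel:
    """Hypothetical stand-in for an expensive native model object."""

    instances_created = 0

    def __init__(self):
        FakeModel.instances_created += 1

    def predict(self, text):
        # Placeholder logic only; a real model would run inference here.
        return "und" if not text.strip() else "en"


def get_language_per_call(text):
    """Builds a fresh model on every call (the costly pattern)."""
    return FakeModel().predict(text)


class LanguageIdentifier:
    """Builds the model once and reuses it across calls."""

    def __init__(self):
        self._model = FakeModel()

    def get_language(self, text):
        return self._model.predict(text)


texts = ["hello there!", "good morning", "   "]

per_call_results = [get_language_per_call(t) for t in texts]
created_per_call = FakeModel.instances_created  # one model per text

FakeModel.instances_created = 0
ident = LanguageIdentifier()
reused_results = [ident.get_language(t) for t in texts]
created_reused = FakeModel.instances_created  # a single shared model
```

In pure Python the per-call objects are garbage-collected automatically. In a C++/Cython wrapper, each object created per call has to be freed explicitly, so skipping that cleanup makes memory grow with every call.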
I believe there's still a small error in your fork.
You use the comparison:
str(res.language) != ident.kUnknown:
This is not doing what you think it is.
Originally, res.language is a C++ string, while ident.kUnknown is a const char array (with value "und"). However, str(res.language) does not do the correct coercion, in the same way that str(b"hello") does not decode the string; it just makes a str representation of that bytes object.
>>> str(b"hello")
"b'hello'"
>>> str(b"hello") == "hello" # No!
False
What is needed here is:
if <bytes> res.language != <bytes> ident.kUnknown:
You can prove this for yourself by throwing this into get_language():
cdef string tst = b"und"
print(tst)
print(str(tst) == ident.kUnknown)
print(tst.decode("utf-8") == ident.kUnknown)
Then run:
python3 setup.py build_ext --inplace --quiet && python3 -c 'import cld3; cld3.get_language("hello there!")'
This will produce False, False.
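The same behaviour can be reproduced in plain Python 3 without building the extension. This is a minimal sketch that assumes the C++ string and the const char array both reach Python as bytes objects; the variable names are illustrative only.

```python
# Assumption: both values surface in Python as bytes.
raw = b"und"            # stands in for res.language
unknown_bytes = b"und"  # stands in for ident.kUnknown
unknown_text = "und"

as_repr = str(raw)             # the repr of the bytes object, not its text
as_text = raw.decode("utf-8")  # the decoded text

# str() of bytes never equals the plain text or the raw bytes.
repr_vs_text = as_repr == unknown_text
repr_vs_bytes = as_repr == unknown_bytes

# Decoding helps only if the other side is also str.
text_vs_bytes = as_text == unknown_bytes
text_vs_text = as_text == unknown_text

# Comparing bytes with bytes works directly.
bytes_vs_bytes = raw == unknown_bytes

print(repr(as_repr))
print(repr_vs_text, repr_vs_bytes, text_vs_bytes)
print(text_vs_text, bytes_vs_bytes)
```

The two comparisons that succeed are the ones where both sides have the same type, which is what the `<bytes>` casts in the suggested fix ensure.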
Hi @jasonriesa and @akihiroota87: do the maintainers of google/cld3 have any interest in incorporating Python bindings within this repo, by reviewing and combining the various forks mentioned above?
As a tangentially related change, as a part of those forks, the Chromium dependency was removed. If that wasn't the case, the logical solution might be a git submodule, but since the C source itself has changed in the forks, that becomes difficult.
Thanks @bsolomon1124! I actually just copied that part from the elizafox cld3 fork so I guess many of us had been using this in its broken form for a while lol. The new wrapper looks great and we'll switch to using it soon.
I have been testing the Elizafox/cld3 Python binding and I had severe memory issues. The more sentences I detect, the more memory is used. I don't know if this is an issue in cld3 or in the Python binding specifically.
And given that I cannot open an issue in any of the Python binding forks, I thought I'd report it here.
gcld3 - a Python binding for CLD3 from Google
PyPI: https://pypi.org/project/gcld3/
GitHub: https://github.com/google/cld3/tree/master/gcld3