Comments (4)
Hi, could you provide a better description of what exactly you need Tongrams to do?
In this way, I can help you better with this problem.
from tongrams.
Currently I plan to use it as an independent compressed storage format for bigrams, and maybe to try adding support for it to wordninja and wordsegment (they are written in Python and internally look up a probability by n-gram, so these are low-hanging fruit).
Support in instant-segment cannot be added both quickly and properly: since this lib is header-only, one has to build a binary of this lib first and then use the FFI of the target language. So for now I plan just to convert from tongrams to their format.
I am not very familiar with this lib yet, and for now I expect only the following to be useful for my use case:
- lookup
- something to iterate over all the n-grams with their probabilities
- something to create a storage from a list of n-grams (unigrams and bigrams are usually stored separately in the datasets accompanying word-splitting libraries)
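As a rough sketch of the three operations listed above, here is the kind of interface I have in mind. Everything in it (`NGramStore`, `DictStore`, `from_ngrams`) is a hypothetical name made up for illustration and is not part of the tongrams API; the dict-backed class only stands in for a real compressed backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, Tuple

NGram = Tuple[str, ...]


class NGramStore(ABC):
    """Hypothetical interface; NOT the tongrams API."""

    @classmethod
    @abstractmethod
    def from_ngrams(cls, items: Iterable[Tuple[NGram, float]]) -> "NGramStore":
        """Create a storage from (n-gram, value) pairs."""

    @abstractmethod
    def lookup(self, ngram: NGram) -> float:
        """Return the value stored for one n-gram."""

    @abstractmethod
    def __iter__(self) -> Iterator[Tuple[NGram, float]]:
        """Iterate over all (n-gram, value) pairs."""


class DictStore(NGramStore):
    """Plain in-memory stand-in for a compressed backend."""

    def __init__(self, table: Dict[NGram, float]) -> None:
        self._table = table

    @classmethod
    def from_ngrams(cls, items):
        return cls(dict(items))

    def lookup(self, ngram):
        # Raises KeyError if the n-gram is not stored.
        return self._table[ngram]

    def __iter__(self):
        return iter(self._table.items())


store = DictStore.from_ngrams([(("new", "york"), 0.5), (("york", "city"), 0.25)])
```

With something like this, unigrams and bigrams can simply live in two separate stores, matching how the datasets are usually shipped.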
from tongrams.
If the libraries you want to use store their output in a (rather standard)
Google-like format like the one illustrated in the README, then you can easily
index their content using Tongrams.
The operations are the ones you mentioned, plus perplexity scoring when
probabilities are associated with n-grams.
from tongrams.
> If the libraries you want to use store their output in a (rather standard) Google-like format like the one illustrated in the README,
Most of them use a format similar to the one in the README, but with slightly different whitespace.
> then you can easily index their content using Tongrams.
Do you mean using the CLI tools? I meant using the Python API, without any subprocess calls. For my use case it is possible to pre-serialize the dataset and then consume it (this is likely to be the default use case), though it is not the best way to deal with it. I think of the lib as middleware: I have abstract classes for storing n-grams in some "internal" format, and backends have methods to convert models between their own formats and the abstraction layer's "internal" format.
I also meant adding n-grams one by one via the API, without first serializing them as text into a file only to read that file back with the tongrams lib.
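To make the middleware idea concrete, here is a minimal sketch of the abstraction layer described above. All names (`Backend`, `JoinedKeyBackend`, `TsvLinesBackend`, `convert`) are hypothetical, and the two model layouts are assumptions chosen for illustration; neither is claimed to be the exact format of tongrams or of any of the libraries mentioned.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Tuple, TypeVar

NGram = Tuple[str, ...]
Internal = Dict[NGram, float]  # the abstraction layer's "internal" format
M = TypeVar("M")               # a backend's own model type


class Backend(ABC, Generic[M]):
    """Hypothetical backend interface; names are illustrative only."""

    @abstractmethod
    def to_internal(self, model: M) -> Internal:
        """Convert the backend's model into the internal format."""

    @abstractmethod
    def from_internal(self, data: Internal) -> M:
        """Build the backend's model from the internal format."""


class JoinedKeyBackend(Backend[Dict[str, float]]):
    """Model: dict keyed by space-joined n-grams, e.g. {"new york": 3.0}."""

    def to_internal(self, model):
        return {tuple(key.split()): value for key, value in model.items()}

    def from_internal(self, data):
        return {" ".join(ngram): value for ngram, value in data.items()}


class TsvLinesBackend(Backend[List[str]]):
    """Model: list of "gram<TAB>value" lines (an assumed text layout)."""

    def to_internal(self, model):
        result = {}
        for line in model:
            gram, value = line.rsplit("\t", 1)
            result[tuple(gram.split())] = float(value)
        return result

    def from_internal(self, data):
        return [f"{' '.join(ngram)}\t{value}" for ngram, value in data.items()]


def convert(model, source: Backend, target: Backend):
    """Convert a model between two backends via the internal format."""
    return target.from_internal(source.to_internal(model))


lines = convert({"new york": 3.0, "york city": 1.5},
                JoinedKeyBackend(), TsvLinesBackend())
```

A tongrams backend would be one more `Backend` subclass; being able to add n-grams one by one through the API is what would let its `from_internal` avoid the temporary text file.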
from tongrams.