It is not difficult, but the library does not support it yet.
If you don't care about efficiency, one option is to scan: retrieve all children of the given prefix and select the top-k among them.
For most queries this will probably be efficient as well, because only a few children need to be examined.
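A minimal sketch of the scan approach, assuming the children of the prefix have already been collected into a vector of (word ID, count) pairs. `Child` and `top_k_by_scan` are hypothetical names for illustration, not part of the tongrams API. `std::partial_sort` only fully orders the first k positions, so the cost is roughly O(m log k) for m children.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for one child of a prefix: the ID of the word that
// follows the prefix and the count stored for the extended n-gram.
struct Child {
    uint32_t word_id;
    uint64_t count;
};

// Returns the (at most) k children with the largest counts, in decreasing
// order of count. Ties are broken by smaller word ID so the result is
// deterministic. The input is taken by value because it gets reordered.
std::vector<Child> top_k_by_scan(std::vector<Child> children, size_t k) {
    k = std::min(k, children.size());
    auto better = [](const Child& a, const Child& b) {
        if (a.count != b.count) return a.count > b.count;
        return a.word_id < b.word_id;
    };
    // Only the first k positions end up sorted; the rest are left unordered.
    std::partial_sort(children.begin(), children.begin() + k, children.end(), better);
    children.resize(k);
    return children;
}
```

For example, with children {(1, 5), (2, 9), (3, 1), (4, 9), (5, 7)} and k = 3 this returns the children with word IDs 2, 4 and 5.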
A more sophisticated solution could use additional RMQ (range minimum query) data structures.
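One way such an RMQ-based solution might look, assuming the counts of all children of a prefix sit contiguously in one array, so each prefix maps to a range [lo, hi). This is the classic technique of answering top-k over a range with range-maximum queries plus a priority queue: report the maximum of a subrange, then split the subrange around it. The sparse table used here is just one simple RMQ structure (O(n log n) space, O(1) query), and `SparseTableMax` and `top_k_by_rmq` are hypothetical names; this is an illustration, not how tongrams works.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Sparse table for range-maximum queries: argmax(lo, hi) returns the position
// of the largest value in [lo, hi) (the leftmost one on ties) in O(1) time,
// after O(n log n) preprocessing.
class SparseTableMax {
public:
    explicit SparseTableMax(std::vector<uint64_t> values) : v_(std::move(values)) {
        const size_t n = v_.size();
        log_.assign(n + 1, 0);
        for (size_t i = 2; i <= n; ++i) log_[i] = log_[i / 2] + 1;
        table_.emplace_back(n);
        for (size_t i = 0; i < n; ++i) table_[0][i] = i;
        for (size_t j = 1; (size_t(1) << j) <= n; ++j) {
            const size_t len = size_t(1) << j;
            std::vector<size_t> row(n - len + 1);
            for (size_t i = 0; i + len <= n; ++i)
                row[i] = pick(table_[j - 1][i], table_[j - 1][i + len / 2]);
            table_.push_back(std::move(row));
        }
    }
    uint64_t value(size_t i) const { return v_[i]; }
    // Requires lo < hi <= number of values.
    size_t argmax(size_t lo, size_t hi) const {
        const size_t j = log_[hi - lo];
        return pick(table_[j][lo], table_[j][hi - (size_t(1) << j)]);
    }

private:
    // Keeps a (the left candidate) unless b holds a strictly larger value.
    size_t pick(size_t a, size_t b) const { return v_[b] > v_[a] ? b : a; }
    std::vector<uint64_t> v_;
    std::vector<size_t> log_;
    std::vector<std::vector<size_t>> table_;
};

// Returns the positions of the k largest values in [lo, hi), largest first.
// Each step costs one RMQ and O(log k) heap work, independent of hi - lo.
std::vector<size_t> top_k_by_rmq(const SparseTableMax& rmq, size_t lo, size_t hi, size_t k) {
    struct Candidate { uint64_t value; size_t pos, lo, hi; };
    auto worse = [](const Candidate& a, const Candidate& b) {
        if (a.value != b.value) return a.value < b.value;  // max-heap on value
        return a.pos > b.pos;                               // ties: smaller position first
    };
    std::priority_queue<Candidate, std::vector<Candidate>, decltype(worse)> heap(worse);
    auto push = [&](size_t l, size_t h) {
        if (l < h) {
            const size_t p = rmq.argmax(l, h);
            heap.push(Candidate{rmq.value(p), p, l, h});
        }
    };
    std::vector<size_t> out;
    push(lo, hi);
    while (!heap.empty() && out.size() < k) {
        const Candidate c = heap.top();
        heap.pop();
        out.push_back(c.pos);
        push(c.lo, c.pos);      // part of the subrange left of the maximum
        push(c.pos + 1, c.hi);  // part of the subrange right of the maximum
    }
    return out;
}
```

The appeal over the plain scan is that the work depends on k rather than on the number of children, at the price of the extra space for the table.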
Hi.
The data structures that store the weights as counts support the lookup operation, which, given an n-gram, returns its associated weight.
The data structures that store the weights as probabilities/backoffs support the score operation instead.
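To illustrate the difference between the two operations, here is a toy sketch that uses `std::unordered_map` keyed by space-separated n-gram strings as a hypothetical stand-in for the compressed tongrams structures; `CountModel`, `BackoffModel` and the unknown-word penalty are illustrative assumptions. Count lookup is an exact-match query, while scoring follows the standard ARPA-style backoff rule: if the full n-gram is stored, return its log10 probability; otherwise add the backoff weight of the context (0 in log space if absent) to the score of the n-gram with its first word dropped.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy count model: lookup() is an exact-match query on the whole n-gram.
struct CountModel {
    std::unordered_map<std::string, uint64_t> counts;

    // Returns the stored count, or 0 if the n-gram is absent (stored counts are >= 1).
    uint64_t lookup(const std::string& ngram) const {
        auto it = counts.find(ngram);
        return it == counts.end() ? 0 : it->second;
    }
};

// Toy backoff model: every stored n-gram has a log10 probability; a missing
// backoff weight is treated as 0 (a factor of 1), as in ARPA files.
struct BackoffModel {
    std::unordered_map<std::string, double> log_prob;
    std::unordered_map<std::string, double> log_backoff;
    double unk_log_prob = -100.0;  // hypothetical penalty for unknown words

    // Scores the last word of `ngram` given the words before it.
    double score(const std::string& ngram) const {
        auto p = log_prob.find(ngram);
        if (p != log_prob.end()) return p->second;  // full n-gram is stored
        const size_t first_space = ngram.find(' ');
        if (first_space == std::string::npos) return unk_log_prob;  // unknown unigram
        // Back off: add the context's backoff weight and drop the first word.
        const std::string context = ngram.substr(0, ngram.rfind(' '));
        auto b = log_backoff.find(context);
        const double bo = (b == log_backoff.end()) ? 0.0 : b->second;
        return bo + score(ngram.substr(first_space + 1));
    }
};
```

With log_prob {a: -1.0, b: -2.0, "a b": -0.5} and log_backoff {a: -0.3, b: -0.4}, scoring "b a" falls back to backoff("b") + score("a") = -1.4.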
See the details in the README (Benchmark section) and in the related paper.
Currently, top-k queries are not supported and there is no Python binding.
Thanks @jermp, how difficult would it be to modify things to support top-k queries?
The k doesn't have to be unlimited. For example, given an n-gram prefix, finding the best 5 or 10 words that immediately follow it would be very useful.
Thanks @jermp, I see the lookup() operation in trie_count_lm.h, but what's the right way to scan all children of a given prefix?
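A minimal sketch of how children can be enumerated, assuming a sorted-array trie layout in which each level stores the word IDs (sorted within each group of siblings), their counts, and a pointer array such that the children of node i occupy positions [pointers[i], pointers[i+1]) on the next level. This layout is an assumption here; `Level` and `children_of` are hypothetical, using plain vectors rather than the compressed sequences in trie_count_lm.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical uncompressed trie level. pointers has word_ids.size() + 1
// entries on every level except the last, where it is empty.
struct Level {
    std::vector<uint32_t> word_ids;
    std::vector<uint64_t> counts;
    std::vector<uint64_t> pointers;
};

// Returns the (word_id, count) pairs of all children of `prefix`, or an empty
// vector if the prefix is not stored or is already of maximum order.
std::vector<std::pair<uint32_t, uint64_t>>
children_of(const std::vector<Level>& levels, const std::vector<uint32_t>& prefix) {
    std::vector<std::pair<uint32_t, uint64_t>> out;
    // Children of an n-gram live on level n, so the prefix must be shorter
    // than the number of levels.
    if (prefix.empty() || prefix.size() >= levels.size()) return out;
    uint64_t lo = 0, hi = levels[0].word_ids.size();
    for (size_t l = 0; l < prefix.size(); ++l) {
        const std::vector<uint32_t>& ids = levels[l].word_ids;
        auto first = ids.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = ids.begin() + static_cast<std::ptrdiff_t>(hi);
        // Binary search for the next word among the current siblings.
        auto it = std::lower_bound(first, last, prefix[l]);
        if (it == last || *it != prefix[l]) return out;  // prefix not stored
        const uint64_t node = static_cast<uint64_t>(it - ids.begin());
        lo = levels[l].pointers[node];
        hi = levels[l].pointers[node + 1];
    }
    // [lo, hi) on the next level holds exactly the children of the prefix.
    const Level& next = levels[prefix.size()];
    for (uint64_t i = lo; i < hi; ++i) out.emplace_back(next.word_ids[i], next.counts[i]);
    return out;
}
```

Selecting the top-k from the returned pairs then gives the scan-based approach described at the start of the thread.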
Currently, there is no method for this, but I can implement one.
That would be fantastic.