Comments (4)
Hi,
for example, with the command
./sort_arpa 2 ../test_data/arpa ../test_data/1-grams.sorted.gz arpa_sorted_2grams
you sort the 2-grams of the test ARPA file test_data/arpa in suffix order.
The output is a file (called arpa_sorted_2grams in the example above) containing all the sorted 2-grams.
You must supply the vocabulary as a list of unigrams (test_data/1-grams.sorted.gz in the example).
So you sort each order separately, then concatenate the sorted outputs together (adding the ARPA header too).
Let me know if everything is clear now.
PS. There was a minor bug that I have now fixed, so pull the new version of the code before trying again.
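For readers following along, the concatenation step could look roughly like the Python sketch below. It assumes that each sorted per-order file holds one ARPA entry per line (this thread does not confirm the exact output format of sort_arpa) and uses the standard ARPA layout: a \data\ header with one "ngram N=count" line per order, one \N-grams: section per order, and a closing \end\ marker. The function name assemble_arpa is hypothetical and not part of tongrams.

```python
import io


def assemble_arpa(sections):
    """Build ARPA-style text from already-sorted entries.

    `sections` maps an n-gram order (int) to a list of entry lines.
    Assumption: each line is already a valid ARPA entry for that order.
    """
    out = io.StringIO()
    orders = sorted(sections)

    # Header: one count line per order.
    out.write("\\data\\\n")
    for order in orders:
        out.write(f"ngram {order}={len(sections[order])}\n")
    out.write("\n")

    # One section per order, in increasing order.
    for order in orders:
        out.write(f"\\{order}-grams:\n")
        for line in sections[order]:
            out.write(line.rstrip("\n") + "\n")
        out.write("\n")

    out.write("\\end\\\n")
    return out.getvalue()


if __name__ == "__main__":
    # Toy entries, made up for illustration only.
    demo = {
        1: ["-1.0\ta\t-0.5", "-1.2\tb\t-0.3"],
        2: ["-0.4\ta b"],
    }
    print(assemble_arpa(demo), end="")
```

In practice you would read each list from the corresponding sorted output file (arpa_sorted_2grams and so on) instead of the toy entries used here.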
from tongrams.
Hi,
are you sure your ARPA file is sorted correctly?
It should be sorted in suffix order.
If it is not sorted, you can use the sorting utility src/sort_arpa.cpp, available here: https://github.com/jermp/tongrams/blob/master/src/sort_arpa.cpp
Let me know.
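To make "suffix order" concrete, here is a small Python sketch of a sortedness check. It assumes that suffix order means comparing n-grams from the last word backwards, and that words are ranked by their position in the supplied vocabulary list. Both are assumptions for illustration and may not match exactly how tongrams compares n-grams. The names suffix_key and is_suffix_sorted are hypothetical.

```python
def suffix_key(ngram, word_id):
    """Sort key: id of the last word first, then the one before it, and so on."""
    return tuple(word_id[w] for w in reversed(ngram.split()))


def is_suffix_sorted(ngrams, vocabulary):
    """Return True if `ngrams` is in non-decreasing suffix order.

    Assumption: a word's rank is its index in `vocabulary`.
    """
    word_id = {w: i for i, w in enumerate(vocabulary)}
    keys = [suffix_key(g, word_id) for g in ngrams]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    print(is_suffix_sorted(["a a", "b a", "a b", "c b"], vocab))
    print(is_suffix_sorted(["a b", "b a"], vocab))
```

Under these assumptions the first list is sorted (all 2-grams ending in "a" come before those ending in "b"), while the second is not.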
Hi, thanks for the quick response.
I have tried sort_arpa, but the output of that command isn't in ARPA format, so I couldn't use it with build_trie. I'm not sure whether I am misunderstanding something.
Thanks, it is working now.