Comments (2)
I have been thinking about the NMT prediction score, but I have not found a way to get it out of OpenNMT. There must be a way, but the library is mostly documented for running it from a terminal; there is not much documentation for using it as a library. If you are interested in figuring out how to get the prediction score out, that would be great.
Anyway, I think the best way to determine the right normalization candidate would be to do it contextually. Currently, Natas only normalizes one word at a time. You could use a language model to rank the candidate normalizations within a sentence and pick the ones that together form the sentence that makes the most sense.
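A minimal sketch of that contextual ranking idea, assuming a toy add-one-smoothed bigram model stands in for a real language model. The corpus, the per-word candidate lists, and all function names here are hypothetical illustrations; none of them are part of Natas or OpenNMT.

```python
import itertools
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def sentence_logprob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def rank_normalizations(candidates_per_word, unigrams, bigrams):
    """Score every combination of per-word candidates, best first."""
    scored = [
        (sentence_logprob(list(combo), unigrams, bigrams), list(combo))
        for combo in itertools.product(*candidates_per_word)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy training corpus (hypothetical); a real setup would use a large corpus
# or a pretrained language model instead.
corpus = [
    "i love the sea".split(),
    "we love the sun".split(),
    "the sea is calm".split(),
]
unigrams, bigrams = train_bigram_lm(corpus)

# Hypothetical candidate normalizations for each word of one sentence,
# e.g. as a word-level normalizer might propose them.
candidates = [["i"], ["love", "loaf"], ["the", "thee"], ["sea", "see"]]

ranking = rank_normalizations(candidates, unigrams, bigrams)
best_score, best_sentence = ranking[0]
print(best_sentence)  # ['i', 'love', 'the', 'sea']
```

Scoring every combination grows exponentially with sentence length, so a longer sentence would need beam search or a similar pruning strategy rather than the exhaustive product used here.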
from natas.
Thanks for the response! After some hunting around, it looks like OpenNMT models will indeed output a prediction score, which we can capture (you're right about the library's documentation being sparse). I'll open a PR for my attempt at doing so. I'm not very familiar with this particular translation model, however, so if you catch any problems, let me know and I'm happy to tweak things or consult further.
I like your idea of using another model to test for normalization validity. It seems like there would be interesting work to be done to determine whether a large model trained on contemporary language (like base BERT) would work well in that scenario, or whether you would instead need to train something from scratch (on all of EEBO-TCP, for example). I suspect it would depend on what you use for test sentences and whether you prioritize corpus coherence or matching contemporary orthography.