Comments (3)
Hi @mathias3! Help would be very welcome. Although I don't have much time for this repository, it's still a good thing to have and fills a niche that cannot be filled by more recent advances in the NLP world.
I've created version 0.1.17 and fixed the most glaring issues with the repository, mainly related to incompatibilities with Gensim and Python.
There is also still the develop branch, which contains many fixes and new features that I originally planned to implement or that are partially implemented. For example, the code for the following changes is fully or partially there:
- Added Hierarchical (Convolutional) Embeddings for all Models
- Added MaxPooling
- Added Features to Sentencevectors
- Added further unittests
- Workaround for Numpy memmap issue (numpy/numpy#13172)
- SVD ram subsampling for SIF / uSIF (customizable, standard is 1 GB of RAM)
- Minor fixes for nan-handling
- Minor fixes for sentencevectors class
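To make the pooling items in the list above concrete, here is a minimal pure-Python sketch. It assumes that "hierarchical" means averaging over a sliding window of words and then max-pooling the window averages; that reading, the function names, and the `window` parameter are illustrative assumptions, not the code on the develop branch.

```python
from typing import List, Sequence

Vector = List[float]


def average_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized word vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of equally sized word vectors."""
    return [max(column) for column in zip(*vectors)]


def hierarchical_pool(vectors: Sequence[Vector], window: int = 3) -> Vector:
    """Average each sliding window of `window` words, then max-pool the averages.

    Assumption: sentences no longer than the window fall back to a plain average.
    """
    if len(vectors) <= window:
        return average_pool(vectors)
    window_means = [
        average_pool(vectors[start:start + window])
        for start in range(len(vectors) - window + 1)
    ]
    return max_pool(window_means)


# Four toy 2-dimensional "word vectors" standing in for one sentence.
words = [[1.0, 0.0], [3.0, 2.0], [5.0, -2.0], [7.0, 4.0]]
sentence_avg = average_pool(words)
sentence_max = max_pool(words)
sentence_hier = hierarchical_pool(words, window=2)
```

Unlike a plain average, the windowed variant keeps some word-order information, because each window mean depends on which words are adjacent.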
There are a few things which might make sense to add to the roadmap:
- Newer models (I don't know, not up to date in this regard)
- Working the hierarchical op into the main averaging cython routine
- Support for a user-definable embedding class (i.e. an fse version of BaseKeyedVectors, to get away from the Gensim dependency)
- Different CI (the Travis free tier is no longer available)
- Add pre_inference and post_inference (I think I forgot this one)
- Refactoring the horribly complicated Input class
- Reworking the threading (at least in my last experience, the input thread is the bottleneck, not the actual computation)
- Untangling the bad design decision to actually store the BaseKeyedVectors from Gensim internally. If users want mmap, they can just load that and pass it.
- Edit: Approximate nearest neighbor search (e.g. via Annoy support)?
- Return vectors only above a certain threshold #34
- Fix zero division error #47
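As a rough illustration of the zero division item, one plausible trigger is averaging a sentence in which no token is in the vocabulary, so the divisor is zero. The actual cause in #47 is not stated here, so treat this as an assumption. The helper name `safe_average`, the dict-based vocabulary, and the choice to return a zero vector are all hypothetical.

```python
from typing import Dict, List, Sequence


def safe_average(
    tokens: Sequence[str], vocab: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of in-vocabulary tokens.

    Returns a zero vector of length `dim` when no token is known,
    instead of dividing by zero (assumed fallback, not fse's behavior).
    """
    known = [vocab[token] for token in tokens if token in vocab]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy vocabulary with 2-dimensional vectors.
toy_vocab = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
mixed = safe_average(["a", "b", "zzz"], toy_vocab, dim=2)
unknown_only = safe_average(["zzz"], toy_vocab, dim=2)
```

Whether an all-unknown sentence should map to a zero vector, raise an error, or be skipped is a design decision; the zero vector is only the simplest option to show.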
Happy to work on some of the issues as well; I should have more time next year.
Who might be interested to help?
@mathias3 @grantmwilliams @AlexMRuch
from fast_sentence_embeddings.
@mathias3: There is also a new version on PyPI: 0.1.17
Fixed / added in 0.2.0:
- Offering pretrained models and making them accessible
- Fix zero division error
- Bugfixes for Python 3.8 builds
- Code refactoring to Black style