Comments (4)
Also, something to think about - should we move some dependencies into "optional"?
https://setuptools.pypa.io/en/latest/userguide/dependency_management.html
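As a rough illustration of what moving dependencies into "optional" could look like with setuptools extras, here is a minimal sketch. The grouping of packages into `dense` and `ltr` extras is purely an assumption for illustration and is not Pyserini's actual packaging; only the package names themselves come from this thread.

```python
# Hypothetical split of dependencies into core and optional groups.
# The group names and their contents are assumptions for illustration only.
INSTALL_REQUIRES = ["numpy", "pandas", "pyjnius"]

EXTRAS_REQUIRE = {
    "dense": ["torch", "transformers", "sentencepiece"],
    "ltr": ["lightgbm", "scikit-learn", "spacy"],
}

# Convenience extra that pulls in every optional group.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)

# In setup.py these would be passed to setuptools as:
#   setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

Under this sketch, a user who only needs sparse retrieval would run a plain `pip install`, while something like `pip install "pyserini[dense]"` (hypothetical extra name) would add the heavier packages.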
from pyserini.
On Ubuntu 18.04.6, I did the following:
conda env list
conda create -n pyserini-pypi-test python=3.9
conda activate pyserini-pypi-test
conda install -c pytorch pytorch faiss-cpu
pip install pyserini
The important package versions:
$ pip list | egrep '(numpy|pyjnius|transformers|torch|sentencepiece|faiss|scikit-learn|lightgbm|spacy|pandas)\s'
faiss 1.7.4
lightgbm 4.0.0
numpy 1.25.2
pandas 2.1.0
pyjnius 1.5.0
scikit-learn 1.3.0
sentencepiece 0.1.99
spacy 3.6.1
torch 2.0.1
transformers 4.32.1
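The same version report can be produced from inside Python, which may be handy when attaching environment details to an issue. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up here, and the distribution names are taken from the `pip list` output above.

```python
from importlib import metadata

# Distribution names as they appear in the pip list output above.
PACKAGES = [
    "numpy", "pyjnius", "transformers", "torch", "sentencepiece",
    "faiss", "scikit-learn", "lightgbm", "spacy", "pandas",
]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(name, version if version is not None else "(not installed)")
```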
With the above configuration, importing faiss fails with this error:
ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
Issue described here: facebookresearch/faiss#2890
The fix is to install mkl separately:
conda install mkl=2021
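To confirm that the fix took effect, one option is to try the import and surface the error message if it still fails. This is only a sketch; `check_import` is a hypothetical helper, not part of Pyserini or faiss.

```python
import importlib

def check_import(module_name="faiss"):
    """Try to import a module; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing shared library such as libmkl_intel_lp64.so.1
        # is reported as an ImportError, as in the traceback above.
        return False, str(exc)
    return True, ""

if __name__ == "__main__":
    ok, message = check_import("faiss")
    print("faiss import OK" if ok else "faiss import failed: " + message)
```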
@Andrwyl in the student cs env, did you have any issues?
For the student env, there were no issues from beginning to end. Following the instructions exactly should work all the way through onboarding. The only issue you run into is a CPU time limit on every student account, which causes you to be kicked off while doing dense retrieval. Running ulimit -t unlimited gives you unlimited CPU time.
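The limit can also be inspected from Python before starting a long dense retrieval run. The sketch below uses the standard `resource` module (Unix only); the helper names are made up for illustration. Note that it affects only the current Python process and its children, whereas `ulimit` changes the shell session, and an unprivileged process can raise its soft limit only as far as the hard limit.

```python
import resource

def _as_optional(value):
    """Translate RLIM_INFINITY into None so that None means 'unlimited'."""
    return None if value == resource.RLIM_INFINITY else value

def cpu_time_limit():
    """Return the (soft, hard) CPU-time limits in seconds; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    return _as_optional(soft), _as_optional(hard)

def raise_soft_limit_to_hard():
    """Raise the soft CPU-time limit to the hard limit for this process."""
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, (hard, hard))
    return cpu_time_limit()

if __name__ == "__main__":
    print("before:", cpu_time_limit())
    print("after: ", raise_soft_limit_to_hard())
```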
https://uwaterloo.ca/computer-science-computing-facility/teaching-hosts shows the specs of the student computers. The resources are shared, so it's possible that on high-usage days a run may fail (this has not happened to me, though).
To access, any uwaterloo student can just do ssh [email protected], but I'm pretty sure nearly everyone has used this for a 2nd-year CS class.
Added reference to student teaching hosts in onboarding guide: https://github.com/castorini/onboarding/blob/master/ura.md#initial-screening