Comments (5)
hi @jacklin64 @justram can you take a look at this please? thanks!
from pyserini.
Hello @timbmg,
Thank you for bringing this to our attention.
I understand the concern you raised regarding the usage of torch.autocast in code that should also run on CPU.
You are correct; however, torch.autocast currently only supports bfloat16 when device='cpu', and our prebuilt index is based on the CUDA + float16 setting.
As a result, encoding the document corpus with device='cpu' and bfloat16 would deviate from our original setting and might take significantly longer to process.
To address this issue and enable CPU-based encoding, we recommend considering the ONNX setting instead.
If you have any further questions or suggestions, please feel free to share them with us.
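For readers who want to see what a device-aware guard could look like, here is a minimal sketch in Python. It is not the Pyserini implementation: the helper names `autocast_kwargs` and `autocast_context` and the `allow_cpu_bf16` flag are hypothetical, and the dtype rules simply follow the description above (float16 on CUDA, bfloat16 as the only CPU option); newer PyTorch releases may behave differently.

```python
from contextlib import nullcontext


def autocast_kwargs(device: str, allow_cpu_bf16: bool = False):
    """Hypothetical helper: choose torch.autocast arguments for a device string.

    Returns None when encoding should run in full precision.
    """
    device_type = device.split(":")[0]  # "cuda:0" -> "cuda"
    if device_type == "cuda":
        # Matches the CUDA + float16 setting mentioned for the prebuilt index.
        return {"device_type": "cuda", "dtype": "float16"}
    if device_type == "cpu" and allow_cpu_bf16:
        # Opt-in only: bfloat16 on CPU deviates from the float16 setting.
        return {"device_type": "cpu", "dtype": "bfloat16"}
    return None


def autocast_context(device: str, allow_cpu_bf16: bool = False):
    """Return a context manager: torch.autocast, or a no-op for full precision."""
    import torch  # imported lazily so the selection logic works without PyTorch

    kwargs = autocast_kwargs(device, allow_cpu_bf16)
    if kwargs is None:
        return nullcontext()
    return torch.autocast(
        device_type=kwargs["device_type"],
        dtype=getattr(torch, kwargs["dtype"]),
    )
```

An encoder's forward pass could then be wrapped as `with autocast_context(device): ...`, so that CPU runs fall back to full precision unless bfloat16 is explicitly requested.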
The same issue might apply to pyserini/encode/_unicoil.py.
Is there any documentation available for encoding (and possibly subsequently indexing) using the ONNX setting to enable CPU support?
We have CPU support in Anserini via ONNX, e.g., https://github.com/castorini/anserini/blob/master/docs/regressions/regressions-msmarco-passage-splade-pp-ed-onnx.md
@timbmg Seeing no further updates, closing. Please re-open if needed.