Comments (6)
I think the submodule was originally there to mirror the Java structure: pyserini.search corresponds to java/io/anserini/search and would contain every module that we want to bridge from there. pysearch.py was intended to mirror SimpleSearcher.java, which is the only module we have bridged to pyserini for now.
If we don't plan to extend anything else over from java/io/anserini/search in the near future, we can probably get rid of the submodule? Otherwise I'd second @zeynepakkalyoncu's suggestion to perhaps rename pysearch.py to something like simple_searcher.py.
from pyserini.
I actually think the current setup is fine. As long as we want to keep a main "source" folder (pyserini) and individual modules (collection, index, etc.), which we should, the "extra" nested layer is necessary in Python. Would it help to rename pysearch.py to directly reflect the class it contains (simple_searcher.py instead of pysearch.py)?
I don't know Python very well, so please correct me - but here's my understanding: we have pyserini/search/pysearch.py:
- pyserini is the package
- pyserini.search is the sub-package
- pysearch is the name of the module
- SimpleSearcher is the name of a class in the module
I guess what I'm suggesting is that we don't need the sub-package? So:
- pyserini is the package
- search is the name of the module
- SimpleSearcher is the name of a class in the module
I think this means renaming pyserini/search/pysearch.py to pyserini/search.py and that's it?
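To make the two layouts concrete, here is a minimal sketch that builds both in a temporary directory and imports the class from each. The package names nested_demo and flat_demo are hypothetical stand-ins (so the two trees don't collide on one path), and the SimpleSearcher body is an empty placeholder, not the real class.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder class source; the real SimpleSearcher is not reproduced here.
CLASS_SOURCE = "class SimpleSearcher:\n    pass\n"

root = Path(tempfile.mkdtemp())

# Layout A (current): package / sub-package / module / class.
nested = root / "nested_demo" / "search"
nested.mkdir(parents=True)
(root / "nested_demo" / "__init__.py").write_text("")
(nested / "__init__.py").write_text("")
(nested / "pysearch.py").write_text(CLASS_SOURCE)

# Layout B (proposed): package / module / class.
flat = root / "flat_demo"
flat.mkdir()
(flat / "__init__.py").write_text("")
(flat / "search.py").write_text(CLASS_SOURCE)

# Make both hypothetical packages importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

nested_cls = importlib.import_module("nested_demo.search.pysearch").SimpleSearcher
flat_cls = importlib.import_module("flat_demo.search").SimpleSearcher

print(nested_cls.__module__)  # nested_demo.search.pysearch
print(flat_cls.__module__)    # flat_demo.search
```

The only user-visible difference is the import path: three dotted components before the class name in the nested layout, two in the flat one.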
> If we don't plan to extend anything else over from java/io/anserini/search in the near future, we can probably get rid of the submodule?

Do you actually mean sub-package instead of sub-module? Or do I have the Python terminology wrong?
I think the main difference here is that in Java, every class needs to be in its own file, whereas in Python, a file can have multiple classes... so I think this means Python can have one level shallower nesting? We can have pyserini/search.py and, for other classes in Anserini we want to bridge, just throw them into search.py also?

> rename pysearch.py to something like simple_searcher.py

I don't think this would work, because pyserini/collection/pycollection.py has multiple classes in it. So under my proposal, pyserini/collection/pycollection.py would be renamed pyserini/collection.py (with all the same classes inside).
> Do you actually mean sub-package instead of sub-module? Or do I have the Python terminology wrong?

My bad, you are correct! I meant to say the sub-package 🤦

> I think the main difference here is that in Java, every class needs to be in its own file, whereas in Python, a file can have multiple classes... so I think this means Python can have one level shallower nesting? We can have pyserini/search.py and, for other classes in Anserini we want to bridge, just throw them into search.py also?

Sure, that works also! When I first set this up I was unsure of whether we'd want to bridge, say, the search/topicreader classes into pyserini/search/topicreader.py one day and keep it under the search sub-package, but with the few classes pyserini has for now it might look a bit excessive.
However, if we collapse pyserini.search into search.py, does that mean also collapsing other sub-packages as well, like pyserini.collection into collection.py and so on? Perhaps the sub-packages can just help the repo look cleaner, in case the modules eventually grow into longer files?
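One way to keep the sub-package for future growth while still offering a short import path is to re-export classes from the sub-package's `__init__.py`. The sketch below is only an illustration of that general Python pattern, not a description of how pyserini is actually set up: the package name reexport_demo and the TopicReader class are hypothetical, and both class bodies are placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "reexport_demo" / "search"
pkg.mkdir(parents=True)
(root / "reexport_demo" / "__init__.py").write_text("")

# Implementation modules live inside the sub-package, so each file
# can stay small and more modules can be added later.
(pkg / "pysearch.py").write_text("class SimpleSearcher:\n    pass\n")
(pkg / "topicreader.py").write_text("class TopicReader:\n    pass\n")

# The sub-package's __init__.py re-exports the public classes, so callers
# import from "reexport_demo.search" without naming the inner module.
(pkg / "__init__.py").write_text(
    "from .pysearch import SimpleSearcher\n"
    "from .topicreader import TopicReader\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

search = importlib.import_module("reexport_demo.search")
print(search.SimpleSearcher.__module__)  # reexport_demo.search.pysearch
print(search.TopicReader.__module__)     # reexport_demo.search.topicreader
```

With this arrangement, callers see the same two-level path the flat search.py proposal would give them, while the files on disk remain split by sub-package.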
@emmileaf re: future proofing when modules grow too big. Agreed.
You've convinced me - let's keep the sub-packages. You were right all along! :)
Closing issue.