Comments (6)

emmileaf avatar emmileaf commented on July 28, 2024 1

I think the submodule was originally there to mirror the java structure:

  • pyserini.search corresponds to java/io/anserini/search and would contain every module that we want to bridge from there.
  • pysearch.py was intended to mirror SimpleSearcher.java, which is the only module we have bridged to pyserini for now

If we don’t plan to extend anything else over from java/io/anserini/search in the near future, we can probably get rid of the submodule? Otherwise I'd second @zeynepakkalyoncu’s suggestion to perhaps rename pysearch.py to something like simple_searcher.py.

from pyserini.

zeynepakkalyoncu avatar zeynepakkalyoncu commented on July 28, 2024

I actually think the current setup is fine. As long as we want to keep a main "source" folder (pyserini) and individual modules (collection, index, etc.), which we should, the "extra" nested layer is necessary in Python. Would it help to rename pysearch.py to directly reflect the class it contains (simple_searcher.py instead of pysearch.py)?

lintool avatar lintool commented on July 28, 2024

I don't know Python very well, so please correct me - but here's my understanding: we have pyserini/search/pysearch.py:

  • pyserini is the package
  • pyserini.search is the sub-package
  • pysearch is the name of the module
  • SimpleSearcher is the name of a class in the module.

I guess what I'm suggesting is that we don't need the sub-package? So

  • pyserini is the package
  • search is the name of the module
  • SimpleSearcher is the name of a class in the module.

I think this means renaming pyserini/search/pysearch.py to pyserini/search.py and that's it?
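To make the two layouts concrete, here is a minimal sketch that builds both in temporary directories and compares the resulting import paths. The package name `demo_pyserini` and the empty `SimpleSearcher` body are assumptions made for illustration only; the real pyserini package is not touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, chosen so it cannot clash with an installed pyserini.
PKG = "demo_pyserini"
CLASS_SRC = "class SimpleSearcher:\n    pass\n"  # placeholder class body


def build(root: Path, module_relpath: str) -> None:
    """Create PKG under root with a single module file defining SimpleSearcher."""
    target = root / PKG / module_relpath
    target.parent.mkdir(parents=True)
    # Mark every directory from the module up to PKG as a regular package.
    directory = target.parent
    while directory != root:
        (directory / "__init__.py").touch()
        directory = directory.parent
    target.write_text(CLASS_SRC)


def import_from(root: Path, dotted: str):
    """Import a dotted module path with root temporarily on sys.path."""
    # Forget any earlier copy of the demo package so the new layout is used.
    for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path.insert(0, str(root))
    try:
        return importlib.import_module(dotted)
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    build(Path(a), "search/pysearch.py")  # current layout: sub-package + module
    build(Path(b), "search.py")           # proposed layout: module only
    nested = import_from(Path(a), f"{PKG}.search.pysearch")
    flat = import_from(Path(b), f"{PKG}.search")
    nested_path = f"{nested.__name__}.SimpleSearcher"
    flat_path = f"{flat.__name__}.SimpleSearcher"

print(nested_path)
print(flat_path)
```

The only difference a caller sees is one extra component in the dotted path: `search.pysearch` in the nested layout versus `search` in the flat one.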

lintool avatar lintool commented on July 28, 2024

@emmileaf

If we don’t plan to extend anything else over from java/io/anserini/search in the near future, we can probably get rid of the submodule?

Do you actually mean sub-package instead of sub-module? Or do I have the Python terminology wrong?

I think the main difference here is that in Java, every class needs to be in its own file, whereas in Python, a file can have multiple classes... so I think this means Python can have one level shallower nesting? We can have pyserini/search.py and for other classes in Anserini we want to bridge, just throw them in search.py also?

rename pysearch.py to something like simple_searcher.py

I don't think this would work, because pyserini/collection/pycollection.py has multiple classes in it. So under my proposal, pyserini/collection/pycollection.py would be renamed pyserini/collection.py (with all the same classes inside).
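As a small illustration of the point that one Python module can hold several classes, the sketch below writes a single file with two classes and imports both from it. The module name `demo_collection` and the class names are placeholders for illustration, not a claim about what pycollection.py actually contains.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

MODULE = "demo_collection"  # hypothetical module name
# Two placeholder classes living side by side in one file.
SOURCE = "class Collection:\n    pass\n\n\nclass FileSegment:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / f"{MODULE}.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module(MODULE)
    finally:
        sys.path.remove(tmp)

# Keep only classes defined in this module, not ones it might import.
class_names = sorted(
    name
    for name, obj in inspect.getmembers(mod, inspect.isclass)
    if obj.__module__ == MODULE
)
print(class_names)
```

This is why naming a file after a single class stops fitting once the file holds more than one.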

emmileaf avatar emmileaf commented on July 28, 2024

Do you actually mean sub-package instead of sub-module? Or do I have the Python terminology wrong?

My bad, you are correct! I meant to say the sub-package 🤦‍♀️

I think the main difference here is that in Java, every class needs to be in its own file, whereas in Python, a file can have multiple classes... so I think this means Python can have one level shallower nesting? We can have pyserini/search.py and for other classes in Anserini we want to bridge, just throw them in search.py also?

Sure, that works also! When I first set this up I was unsure of whether we'd want to bridge, say, the search/topicreader classes into pyserini/search/topicreader.py one day and keep it under the search sub-package, but with the few classes pyserini has for now it might look a bit excessive.

However, if we collapse pyserini.search into search.py, does that mean collapsing the other sub-packages as well, like pyserini.collection into collection.py and so on? Perhaps the sub-packages can just help the repo look cleaner, in case the modules eventually grow into longer files?
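One general Python option worth noting here, not something proposed in this thread: a sub-package's `__init__.py` can re-export a class, so the sub-package stays on disk while callers get the shorter import path. The sketch below assumes a hypothetical package named `demo_reexport` with a placeholder `SimpleSearcher`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_reexport"  # hypothetical package name

with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / PKG / "search"
    sub.mkdir(parents=True)
    (Path(tmp) / PKG / "__init__.py").touch()
    (sub / "pysearch.py").write_text("class SimpleSearcher:\n    pass\n")
    # Re-export the class from the sub-package's __init__.py.
    (sub / "__init__.py").write_text("from .pysearch import SimpleSearcher\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        short = importlib.import_module(f"{PKG}.search").SimpleSearcher
        full = importlib.import_module(f"{PKG}.search.pysearch").SimpleSearcher
    finally:
        sys.path.remove(tmp)

# Both import paths resolve to the very same class object.
same_class = short is full
print(same_class, short.__module__)
```

With this arrangement the files can still be split further as modules grow, without changing the import line that users write.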

lintool avatar lintool commented on July 28, 2024

@emmileaf re: future proofing when modules grow too big. Agreed.
You've convinced me - let's keep the sub-packages. You were right all along! :)

Closing issue.
