
pymeilisearch's People

Contributors

dependabot[bot], jorgepiloto, pipkat, revathyvenugopal162, robpasmue

pymeilisearch's Issues

Implement some minor features before public release

Let us target the following minor features before going public:

  • Create an AUTHORS.md file and a CONTRIBUTORS.md file
  • Update all pre-commit hooks
  • Make PyMeilisearch use itself together with multi-version indexing

Failing to build the index for all documents

Posting the index to Meilisearch fails. Adding the logs: https://github.com/ansys/meilisearch-scraper/actions/runs/4571456865/jobs/8069772029

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/meilisearch-scraper/meilisearch-scraper/py/meilisearch_tools.py", line 7, in <module>
    all_doc.add_all_public_doc(selected_keys=["ansys", "pyansys"])
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/site-packages/ansys/tools/meilisearch/all_doc_indexer.py", line 1[31](https://github.com/ansys/meilisearch-scraper/actions/runs/4571456865/jobs/8069772029#step:8:32), in add_all_public_doc
    self.add_documents_to_temp_index(index_uid)
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/site-packages/ansys/tools/meilisearch/all_doc_indexer.py", line 111, in add_documents_to_temp_index
    self._wait_task(response.json()["taskUid"])
                    ^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Error: Process completed with exit code 1.
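
The failure happens when `response.json()` is called on a reply that is not JSON (for example, an empty body or an HTML error page). A minimal sketch of a more defensive call, assuming hypothetical `base_url`, `api_key`, and `index_uid` values rather than the actual pymeilisearch internals:

```python
# Sketch only: surface the raw body in the CI log instead of a bare JSONDecodeError.
import requests

def add_documents(base_url, api_key, index_uid, documents):
    response = requests.post(
        f"{base_url}/indexes/{index_uid}/documents",
        headers={"Authorization": f"Bearer {api_key}"},
        json=documents,
        timeout=30,
    )
    try:
        payload = response.json()
    except requests.exceptions.JSONDecodeError:
        # Include status code and raw text so the log shows what the server actually sent.
        raise RuntimeError(
            f"Non-JSON response ({response.status_code}) from Meilisearch: {response.text!r}"
        )
    return payload["taskUid"]
```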

Modify User guide and getting started

Description of the modifications


Lastly, I feel that the "User guide" should include more information about the two available templates. I'm not sure what they do or when I might pick one template over another (or whether I could or should create additional templates). I doubt we need to add a lot of content, but more information on this topic would be helpful.
  • Improve the tabs and commands in getting-started/installing-pymeilisearch.rst
  • Add additional content about the templates in user-guide

Originally posted by @PipKat in #52

Useful links and references

No response

Add metadata information close to the results

When you search for a property, for instance length, the results should mention which class it belongs to.
Otherwise, all the primitive results are returned without any indication of which class they belong to.

It is then impossible to know which result to click on.
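
One possible direction, sketched with the official Meilisearch Python client and assuming the index follows the docs-scraper record schema (`hierarchy_lvl0` through `hierarchy_lvl6`); the index name and connection details below are hypothetical:

```python
import meilisearch  # official Meilisearch Python client

client = meilisearch.Client("http://localhost:7700", "masterKey")  # assumed local instance
index = client.index("pyansys-docs")  # hypothetical index name

results = index.search("length", {"limit": 5})
for hit in results["hits"]:
    # docs-scraper records keep the page hierarchy; printing the upper levels next
    # to each hit tells the user which class the property belongs to.
    parent = hit.get("hierarchy_lvl1") or hit.get("hierarchy_lvl0")
    print(f'{parent}: {hit.get("hierarchy_lvl2") or hit.get("content")}')
```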

pymeilisearch is not working for the PyFluent dev documentation

More details: ansys/pyfluent#2093

See the PyFluent dev doc search, which doesn't work, compared to the stable version search, which does.

The nightly dev doc build is failing in the "Scrap the document and deploy it to pymeilisearch" step, with the following traceback:

Serving directory /home/runner/work/pyfluent/pyfluent/HTML-Documentation-tag-v23.2.0 at http://localhost:8000
Traceback (most recent call last):
  File "/home/runner/.local/bin/pymeilisearch", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/cli.py", line 92, in upload
    local_host_scraping(index, template, location, port, stop_urls)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/server.py", line 120, in local_host_scraping
    scrape_website(index_uid, templates, directory, port, stop_urls)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/server.py", line 92, in scrape_website
    scrap_web_page(index_uid, urls, templates, stop_urls)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/create_indexes.py", line 137, in scrap_web_page
    web_scraper.scrape_url(url, index_uid, templates, stop_urls)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/scrapper.py", line 159, in scrape_url
    temp_config_file = self._load_and_render_template(url, template, index_uid, stop_urls)
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/scrapper.py", line 60, in _load_and_render_template
    render_template(
  File "/home/runner/.local/lib/python3.10/site-packages/ansys/tools/meilisearch/templates/__init__.py", line 79, in render_template
    if "localhost" in urls[0]:
IndexError: list index out of range
Error: The operation was canceled.

Any help figuring out what is going wrong would be appreciated.
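
The `IndexError` indicates that `render_template` received an empty `urls` list. A minimal illustration of the kind of guard that would fail with a clearer message (not the actual pymeilisearch fix):

```python
def render_template(template_path, urls, index_uid, stop_urls=None):
    """Illustrative guard only: fail loudly instead of raising IndexError on urls[0]."""
    if not urls:
        raise ValueError(
            f"No URLs were collected for index '{index_uid}'; "
            "check that the served directory actually contains HTML files."
        )
    is_localhost = "localhost" in urls[0]
    return {"start_urls": urls, "stop_urls": stop_urls or [], "is_localhost": is_localhost}
```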

Implement a local HTML scraper

The implementation provides a convenient solution for local HTML scraping: a specified directory is served on a given port using an HTTP server, so the website can be hosted locally and its content easily accessed and extracted for scraping.
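
A minimal sketch of that approach using the standard library; the `serve_directory` helper and the paths shown are illustrative, not pymeilisearch's actual server module:

```python
# Serve a local HTML directory over HTTP so a scraper can crawl it.
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve_directory(directory, port=8000):
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("localhost", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    print(f"Serving directory {directory} at http://localhost:{port}")
    return server  # call server.shutdown() once scraping has finished

# Example usage (paths are hypothetical):
# server = serve_directory("_build/html", port=8000)
# ... run the scraper against http://localhost:8000 ...
# server.shutdown()
```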

Convert this project into a fully fledged CLI

💡 Idea
The goal of this project is to become the entry point for documentation indexing by providing additional utilities on top of meilisearch.

๐Ÿ“ Tasks
To achieve the previous goal, the following tasks need to be implemented:

  • Implement everything as a CLI
    Users should interface with this tool through the command line, which also makes it possible to run the tool in any CI/CD workflow (see the sketch after this list).

  • #20
    This requires modifying docs-scraper to work with a local host, because that tool assumes a server is presenting all the content.

  • #17
    This should be feasible with the tools currently provided in this repository.

  • #21
    This should be feasible with the current tools too.
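
A minimal sketch of such a CLI entry point built with click (which the tracebacks above show the project already uses); the command and option names here are illustrative assumptions, not the final interface:

```python
import click

@click.group()
def main():
    """pymeilisearch command-line interface."""

@main.command()
@click.option("--index", required=True, help="Unique identifier of the index.")
@click.option("--template", default="default", help="Scraping template to use.")
@click.option("--location", required=True, help="URL or local directory to scrape.")
@click.option("--port", default=8000, type=int, help="Port for the local HTTP server.")
def upload(index, template, location, port):
    """Scrape the documentation and upload it to the Meilisearch instance."""
    click.echo(f"Uploading {location} to index {index!r} using template {template!r}")
    # ... call into the scraping and indexing utilities ...

if __name__ == "__main__":
    main()
```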

Enable docker launch from pytest fixture

Looking at your tests, I can see that you are "assuming" that the user has a local Meilisearch instance deployed on localhost at port 7700. I'd consider including this in the documentation as a must-do, so that anybody who wants to contribute knows how to do it.

Even for this last point... I'd consider adding docker as a dependency and launching the container through a fixture, provided the port is free. It might be a bit of overkill, but it's worth considering. Once the tests finish, you can close the container. That would make the whole process very "atomic".

Originally posted by @RobPasMue in #53 (comment)
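
A rough sketch of what such a fixture could look like, assuming the `docker` Python SDK and the `getmeili/meilisearch` image; this is not part of the current test suite:

```python
import time

import docker
import pytest
import requests

@pytest.fixture(scope="session")
def meilisearch_container():
    """Start a throwaway Meilisearch container for the test session, then clean it up."""
    client = docker.from_env()
    container = client.containers.run(
        "getmeili/meilisearch:latest",
        detach=True,
        ports={"7700/tcp": 7700},
        environment={"MEILI_MASTER_KEY": "masterKey"},
    )
    # Wait until the health endpoint answers before handing control to the tests.
    for _ in range(30):
        try:
            if requests.get("http://localhost:7700/health", timeout=1).ok:
                break
        except requests.ConnectionError:
            pass
        time.sleep(1)
    yield "http://localhost:7700"
    container.stop()
    container.remove()
```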
