bco-rag's Issues

Figure out evaluations.json file output

Thinking about creating separate scripts: one solely for identifying potentially erroneous eval saves, and another for building the evaluations file from "high quality reviews" only (will have to determine the criteria for that).

Update logging

Handle the logging directly in the objects rather than in the main.py entrypoint.
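
A minimal sketch of what per-object logging could look like, using only the standard library. The class name `DomainGenerator` and its `generate` method are hypothetical and not taken from the repository; the point is only that each object owns a logger while the entrypoint configures handlers once.

```python
import logging


class DomainGenerator:
    """Hypothetical class that owns its logger instead of relying on main.py."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One logger per class, namespaced by module and class name.
        self.logger = logging.getLogger(f"{__name__}.{type(self).__name__}")

    def generate(self) -> str:
        # The object reports its own progress; callers do not log for it.
        self.logger.info("Generating domain %s", self.name)
        return f"generated:{self.name}"


if __name__ == "__main__":
    # The entrypoint only sets the level and handlers, once.
    logging.basicConfig(level=logging.INFO)
    DomainGenerator("description").generate()
```

With this layout main.py stays free of per-step log calls, and log records can be filtered by logger name.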

Separate query

The standardized queries contain two parts:

  • The query part, which describes what is in the domain and what the domain represents.
  • The schema part, which contains the output formatting for the returned response.

The first part of the query is obviously important for the semantic retrieval process, but I have a hunch that including the second part in the semantic retrieval is polluting the results. Going to try splitting the query and re-injecting the data schema part before the data is sent to the LLM.
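
A rough sketch of the split, assuming the retriever can be treated as a plain callable that takes a string and returns text chunks. `StandardizedQuery`, `build_llm_prompt`, and the prompt layout are illustrative names and choices, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StandardizedQuery:
    retrieval_part: str  # what is in the domain and what it represents
    schema_part: str     # output formatting for the returned response


def build_llm_prompt(
    query: StandardizedQuery,
    retrieve: Callable[[str], List[str]],
) -> str:
    # Only the descriptive part is used for semantic retrieval.
    chunks = retrieve(query.retrieval_part)
    context = "\n".join(chunks)
    # The schema part is re-injected just before the prompt goes to the LLM.
    return (
        f"Context:\n{context}\n\n"
        f"Question:\n{query.retrieval_part}\n\n"
        f"{query.schema_part}"
    )
```

Keeping the two parts as separate fields makes it easy to compare retrieval quality with and without the schema text.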

Enforce domain generation ordering for dependent domains

Some domains are dependent on other domains. The big one is that the parametric domain has to be generated after the description domain, as the step numbers have to match up.

Will have to hold the description domain in memory and pass it in as part of the query for the parametric domain.
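
One possible way to enforce the ordering is a topological sort over a dependency map, sketched here with the standard library's `graphlib` (Python 3.9+). The `DEPENDENCIES` mapping and the `generate` callback signature are assumptions for illustration; only the description/parametric relationship comes from the issue.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical dependency map: domain -> domains it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "description": set(),
    "parametric": {"description"},
}


def generate_in_order(
    dependencies: Dict[str, Set[str]],
    generate: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Generate every domain after the domains it depends on."""
    results: Dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors.
    for domain in TopologicalSorter(dependencies).static_order():
        # Already generated upstream domains are held in memory and passed in.
        upstream = {dep: results[dep] for dep in dependencies.get(domain, ())}
        results[domain] = generate(domain, upstream)
    return results
```

`TopologicalSorter` raises `graphlib.CycleError` if two domains depend on each other, which surfaces a bad configuration early.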

Semantic Chunking Chunk Size Bug

LlamaIndex's SemanticSplitterNodeParser can sometimes produce chunks that are too large for the embedding model. Unfortunately, there is no max length option for the semantic chunking to avoid this issue.

Will eventually have to subclass the SemanticSplitterNodeParser and create a two-level safety net that naively splits large chunks into sub-chunks in order to stay under the embedding model's input token limit.

Reference:
run-llama/llama_index#12270
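
The naive splitting step of such a safety net might look like the sketch below. It deliberately works on plain strings rather than subclassing SemanticSplitterNodeParser, and it counts whitespace-separated words as a stand-in for tokens; `enforce_max_tokens` and its parameters are hypothetical, and a real version would pass in the embedding model's tokenizer as `count_tokens`.

```python
from typing import Callable, List


def enforce_max_tokens(
    chunks: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[str]:
    """Split any chunk over max_tokens into word-bounded sub-chunks."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    safe: List[str] = []
    for chunk in chunks:
        if count_tokens(chunk) <= max_tokens:
            safe.append(chunk)  # already small enough, keep untouched
            continue
        current: List[str] = []
        for word in chunk.split():
            candidate = " ".join(current + [word])
            if current and count_tokens(candidate) > max_tokens:
                # Adding this word would overflow: close the sub-chunk.
                safe.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        if current:
            safe.append(" ".join(current))
    return safe
```

Two caveats: whitespace inside an oversized chunk is normalized to single spaces, and with a real tokenizer a single word longer than the limit would still be emitted as its own oversized sub-chunk.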

Re-work default eval check

Right now the default score check prevents erroneously saving default evaluations. Should re-work the logic to exclude some fields from the check, such as:

  • Score (being implemented here: #9)
  • JSON format error (which I want to set automatically when falling back to the raw txt file)
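
A sketch of a default check that ignores selected fields, assuming the evaluation is a dataclass. The `Evaluation` class and its field names (`score`, `json_format_error`, `relevance`, `notes`) are placeholders and do not reflect the project's real schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Evaluation:
    """Hypothetical evaluation record; field names are placeholders."""
    score: float = 0.0
    json_format_error: bool = False
    relevance: int = 0
    notes: str = ""


# Fields that may differ from their defaults without counting as user input.
EXCLUDED_FROM_DEFAULT_CHECK = {"score", "json_format_error"}


def is_default(evaluation: Evaluation) -> bool:
    """Return True if every non-excluded field still has its default value."""
    default = Evaluation()
    return all(
        getattr(evaluation, f.name) == getattr(default, f.name)
        for f in fields(evaluation)
        if f.name not in EXCLUDED_FROM_DEFAULT_CHECK
    )
```

With this check, an evaluation whose only changes are an attached score or an automatically set format-error flag is still treated as a default and not saved.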

Update documentation

Need to update some documentation for the migration and new features that have been added.

MongoDB Backend?

Should probably remove the local evaluation data. That was meant for the proof of concept, and a true backend should probably be set up, potentially in the form of MongoDB and a simple Flask API. The domain generations (for now) can probably be kept in the repository.

Add score to ScoreEval

Should store the score along with the score eval. Because the scoring will iterate/change, need to keep track of which score the score eval was submitted against.
