vanna's People

Contributors

0xcha05, andreped, anugyas, archit0, arslanhashmi, arthurmor4is, ashishsingal1, aus1st, danielcorin, dbtzy, hassan-elseoudy, kun321, livenson, molrn, relic-yuexi, tal7aouy, theharshpat, vikyw89, zainhoda

vanna's Issues

Plotly chart arising from `vn.ask()`

The chart generated by vn.ask() cannot be replotted if it is not to the user's liking, but users should be able to do so. Otherwise, they might have to reshape the resulting df and do the plotting themselves, which diminishes the value Vanna provides.

Flow diagrams for SQL

Can we create mermaid charts that do flow diagrams for how a SQL statement gets executed and the different entities involved?

vn.get_training_data

vn.get_training_data() should return:

  • id
  • type (question-sql, ddl, documentation)
  • data

Confidence score for generated SQL

Is there a way to get a confidence score in terms of how likely the SQL is to be correct? Or whether there are enough similar queries / etc to give Vanna enough context to generate the SQL?

Maybe this could be calculated via embedding distances?
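
One way to picture the embedding-distance idea is the sketch below. It assumes each question has already been turned into a plain list of floats by some embedding model; the function names and the choice of averaging the top few similarities are illustrative assumptions, not anything Vanna is known to implement.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence_score(question_embedding, training_embeddings, top_k=3):
    """Hypothetical score: mean similarity of the top_k closest training questions.

    The result is clamped to [0, 1]. With no training data the score is 0.0.
    """
    if not training_embeddings:
        return 0.0
    similarities = sorted(
        (cosine_similarity(question_embedding, e) for e in training_embeddings),
        reverse=True,
    )
    top = similarities[:top_k]
    return max(0.0, min(1.0, sum(top) / len(top)))
```

A score like this only says how close the new question is to questions seen in training; it does not check that the generated SQL is correct.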

Rename dataset to model

Model is a lot easier for people to understand.

  • create_model
  • train
  • get_training_data
  • delete_model
  • update_model_visibility
  • etc

print key dataset stats

Can we print out some key stats for a dataset, such as name, description, training questions, asked questions, successful questions, visibility, users, admins, firstquestiondate, lastquestiondate, etc.?

Use Vanna with CSV files

Should we make a vn.use_df function that loads data into sqlite and connects to it so that you can run Vanna on dataframes that you might have brought in via CSV or some other method?
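
As a rough sketch of the loading step, the snippet below reads CSV text into an in-memory sqlite database using only the standard library. The function name load_csv_into_sqlite is hypothetical, every column is stored as untyped text, and table and column names are assumed to contain no double quotes. For a pandas DataFrame, DataFrame.to_sql with a sqlite3 connection could play the same role.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, table_name, conn=None):
    """Create table_name from CSV text (first row is the header) and return the connection."""
    if conn is None:
        conn = sqlite3.connect(":memory:")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so headers with spaces still work.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    # Values are bound as parameters, so cell contents are never interpolated into SQL.
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn


conn = load_csv_into_sqlite("name,amount\nalice,10\nbob,20\n", "sales")
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Once the data is in sqlite, the returned connection can be queried like any other database connection.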

More informative results after running vn.train()

Running vn.train() returns True regardless of whether the SQL that is trained is correct or not. It also does not show whether the question that is being trained already exists. If it does, then what does vn.train() do? This raises the following issues:

  1. Returning True is not informative. It gives the impression that the SQL trained is correct but it might not be. I was able to train on erroneous SQL queries and it returned True as well.
  2. In what scenarios will False be returned?
  3. Are existing questions and their SQL code overwritten when vn.train() is run with an existing question?

bootstrap - automated one line training + results

Can we implement a bootstrapping one line "agent"? For example -

conn = snowflake connection
vn.set_dataset('dataset')
vn.bootstrap()

where bootstrap does the following -

  1. gets the DDL and stores it
  2. gets historical queries and stores them along with generated questions
  3. generates 10 questions
  4. generates SQL for those 10 questions
  5. runs the SQL, prints results, charts, etc.

add user to database returns False

Can't add a user to a dataset; it returns an error saying you can only add a user to your own organisation. This is for an organisation that I just created, so I must have ownership rights to it.

Make vn.train generic

vn.train should take in

question: str or None
sql: str or None
ddl: str or None
documentation: str or None
json_file: str or None
sql_file: str or None

  • If just question, throw an error and print out example usage
  • If just sql, do vn.generate_question to generate the question and then vn.add_sql
  • If just DDL, do vn.add_ddl
  • If just documentation, do vn.store_documentation
  • If just a json_file, read the json file using pd.read_json and then iterate through rows to get question and sql columns to do vn.add_sql
  • If just a sql_file, use sqlparse to separate the SQL statements. Any CREATE TABLE statement should go into vn.add_ddl; other statements should go through vn.generate_question and then vn.add_sql

All parameter defaults should be None, and if the user passes in any invalid combination of parameters, it should raise an exception.
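
The dispatch rules above could be sketched roughly as follows. The function name plan_training_action and the returned action labels are hypothetical; the sketch only decides which path to take and does not call any Vanna function. Treating question plus sql together as a direct vn.add_sql call is an assumption, since the list only covers the single-parameter cases.

```python
def plan_training_action(question=None, sql=None, ddl=None,
                         documentation=None, json_file=None, sql_file=None):
    """Return a label for the training path implied by the arguments, or raise ValueError."""
    arguments = {
        "question": question, "sql": sql, "ddl": ddl,
        "documentation": documentation, "json_file": json_file, "sql_file": sql_file,
    }
    provided = {name for name, value in arguments.items() if value is not None}

    if provided == {"question", "sql"}:
        return "add_sql"  # assumption: an explicit pair is stored as-is
    if provided == {"question"}:
        raise ValueError(
            "A question needs SQL to train on, e.g. "
            "train(question='How many users?', sql='SELECT COUNT(*) FROM users')"
        )
    single_parameter_actions = {
        "sql": "generate_question_then_add_sql",
        "ddl": "add_ddl",
        "documentation": "store_documentation",
        "json_file": "train_from_json_file",
        "sql_file": "train_from_sql_file",
    }
    if len(provided) == 1:
        return single_parameter_actions[next(iter(provided))]
    raise ValueError(f"Invalid combination of parameters: {sorted(provided)}")
```

Collecting the non-None parameter names into a set keeps the invalid-combination check in one place instead of spreading it across nested conditionals.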

Don't return True/False

Instead of returning True/False, output nothing when status.success is true; otherwise, throw an exception with status.message.
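
A minimal sketch of that behaviour, assuming a status object with the success and message fields named in this issue. The Status dataclass, the TrainingError exception, and the raise_for_status helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical stand-in for the status object returned by the API."""
    success: bool
    message: str


class TrainingError(Exception):
    """Hypothetical exception type carrying status.message."""


def raise_for_status(status):
    """Return None silently on success; raise with the server message otherwise."""
    if not status.success:
        raise TrainingError(status.message)


raise_for_status(Status(success=True, message="ok"))  # no output, no return value

try:
    raise_for_status(Status(success=False, message="SQL could not be parsed"))
except TrainingError as error:
    failure_text = str(error)
```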

Training multiple queries at once (bulk training)

The ability to send in

  1. a JSON of question / SQL pairs, or
  2. a SQL file full of semicolon delimited SQL queries

and have Vanna automatically train against a dataset. For option 2, the questions would also need to be auto-generated.
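
The two input formats could be parsed along these lines. This is a sketch under assumptions: the JSON is taken to be a list of objects with "question" and "sql" keys, and the SQL file is split naively on semicolons, which breaks if a semicolon appears inside a string literal or comment (a real parser such as sqlparse would be more robust). Both function names are hypothetical.

```python
import json


def parse_question_sql_pairs(json_text):
    """Return (question, sql) tuples from a JSON list of {"question": ..., "sql": ...} objects."""
    records = json.loads(json_text)
    return [(record["question"], record["sql"]) for record in records]


def split_sql_statements(sql_text):
    """Naively split semicolon-delimited SQL into trimmed, non-empty statements."""
    return [part.strip() for part in sql_text.split(";") if part.strip()]


pairs = parse_question_sql_pairs(
    '[{"question": "How many users are there?", "sql": "SELECT COUNT(*) FROM users"}]'
)
statements = split_sql_statements("SELECT 1;\n\nSELECT name FROM users;\n")
```

Each pair or statement would then be passed to the training call one at a time, with a generated question attached to statements that arrive without one.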

vn.generate_meta_description

This function will take in a question and use training data as a reference to answer questions about the data instead of returning SQL

Restrict the characters that can be in a dataset name

In order to avoid confusion and also to make the dataset name url safe, on input of the dataset name we should:

  • make it lowercase
  • replace spaces with a hyphen -
  • replace special characters with a hyphen or remove them altogether

There should be a deterministic mapping of the input dataset string to the actual dataset name so that users can do vn.set_dataset('my WEirD dataset name!') and it will still work
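
A deterministic mapping of this kind could look like the sketch below. The function name is hypothetical, and two choices go beyond what the issue states: runs of spaces and special characters collapse into a single hyphen, and leading or trailing hyphens are dropped.

```python
import re


def normalize_dataset_name(raw_name):
    """Lowercase the name and replace every run of non-alphanumeric characters with one hyphen."""
    lowered = raw_name.strip().lower()
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    return hyphenated.strip("-")


normalized = normalize_dataset_name("my WEirD dataset name!")
```

Because the function is idempotent, applying it on every vn.set_dataset call would map both the raw input and the already-normalized name to the same dataset.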

Vanna generated documentation

Can we have Vanna generate documentation for tables and columns automatically? For example:

vn.generate_docs(entity='table', name='<tablename>') would generate a docstring for a particular table, and
vn.generate_docs(entity='column', name='<columnname>') would generate a docstring for a particular column

and perhaps there could be a flag on the table call to also generate docs for cols within that table automatically?

Add GH action for NB example runs

Add GH action that:

  1. Runs on every push to the PR
  2. Runs a notebook: https://github.com/marketplace/actions/run-notebook
  3. Converts the notebook into docs using nbconvert

ENV variables required, should be fed from GH secrets to the action's context (find their values in Slack).

VANNA_API_KEY=xxx
VANNA_MODEL=xxx
SNOWFLAKE_ACCOUNT=xxx
SNOWFLAKE_USERNAME=xxx
SNOWFLAKE_PASSWORD=xxx
SNOWFLAKE_DATABASE=xxx

automatically get DDL using DB connection

Instead of manually adding the DDL using store_ddl(), can Vanna automatically get the DDL directly from the database if provided the connection string, and store each table separately? At least for Snowflake, BigQuery and Postgres?
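
To illustrate the idea of pulling one DDL statement per table from a live connection, here is a sketch that uses sqlite as a stand-in, because it ships with Python and keeps each table's CREATE statement in its sqlite_master catalog. The databases named in the issue expose DDL through their own catalogs or functions, which are not shown here. The function name get_table_ddl is hypothetical.

```python
import sqlite3


def get_table_ddl(conn):
    """Return {table_name: CREATE TABLE statement} for every user table in a sqlite database."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return {name: sql for name, sql in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
ddl_by_table = get_table_ddl(conn)
```

Each value in the resulting dictionary could then be stored as a separate DDL training entry.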

TypeError when running `vn.ask()`

Encountered the following error when saving the outputs of vn.ask() to four objects (query, df, plot, qns):
[image: screenshot of the TypeError]

This error does not appear if the parameter print_results = False is passed.

Cache should only be for trained SQL

When we do vn.generate_sql, we pull from the cache if the SQL already exists. However, if the SQL is flagged or otherwise not in the training set, we should bypass the cache
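
The intended lookup rule can be sketched as below. Everything here is a hypothetical illustration: the cache is a plain dict, trained_questions is a set of questions known to be in the training set, and generate is whatever callable produces fresh SQL.

```python
def get_sql(question, cache, trained_questions, generate):
    """Serve cached SQL only for questions in the training set; otherwise generate fresh SQL."""
    if question in trained_questions and question in cache:
        return cache[question]
    # Flagged or untrained questions bypass the cache entirely.
    return generate(question)


cache = {
    "How many users?": "SELECT COUNT(*) FROM users",
    "Top customers?": "SELECT * FROM custmers",  # cached but flagged, so not in the training set
}
trained_questions = {"How many users?"}


def fake_generate(question):
    """Placeholder generator used only for this example."""
    return f"-- regenerated for: {question}"


trained_result = get_sql("How many users?", cache, trained_questions, fake_generate)
flagged_result = get_sql("Top customers?", cache, trained_questions, fake_generate)
```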
