vanna-ai / vanna
Chat with your SQL database. Accurate Text-to-SQL Generation via LLMs using RAG.
Home Page: https://vanna.ai/docs/
License: MIT License
The chart generated by vn.ask() cannot be replotted if it is not to the user's liking, but it should be possible. Otherwise, users have to reshape the resulting df and do the plotting themselves, which diminishes the value Vanna provides in this instance.
Can we create mermaid charts that do flow diagrams for how a SQL statement gets executed and the different entities involved?
vn.get_training_data() should return:
- id
- type (question-sql, ddl, documentation)
- data
Is there a way to get a confidence score for how likely the generated SQL is to be correct, or whether there are enough similar queries to give Vanna sufficient context to generate the SQL? Maybe this could be calculated via embedding distances.
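One way to sketch the embedding-distance idea: score a new question by its similarity to the nearest training questions. This is a hypothetical heuristic, not an existing Vanna API; confidence_score and cosine_similarity are illustrative names.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def confidence_score(question_embedding, training_embeddings, k=3):
    """Rough confidence: mean similarity to the k nearest training questions.

    Near 1.0 means Vanna has seen very similar questions; near 0 means the
    question is far from anything in the training set.
    """
    sims = sorted(
        (cosine_similarity(question_embedding, t) for t in training_embeddings),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top)
```

A calibration step (mapping raw similarity to an actual probability of correctness) would still be needed before showing this to users.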
"Model" is a lot easier for people to understand.
Can we print out some key stats for a dataset, like name, description, training questions, asked questions, successful questions, visibility, users, admins, first question date, last question date, etc.?
We should include who has permission to perform each action (e.g., admin or anyone), especially for the dataset/write functions.
Add a parameter to vn.generate_plotly_code so that the user can specify whether they want a line chart vs. a bar chart, etc.
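A minimal sketch of how that parameter could be threaded into the prompt that produces the Plotly code. This is hypothetical, not the current signature; in the real function the returned prompt would be sent to the LLM.

```python
def generate_plotly_code(question, df_metadata, chart_type=None):
    """Hypothetical sketch: let the user request a specific chart type.

    chart_type: e.g. "line", "bar", "scatter"; None keeps today's behavior
    of letting the model pick.
    """
    prompt = (
        f"Generate Plotly code to visualize the answer to: {question}\n"
        f"Available columns: {df_metadata}"
    )
    if chart_type is not None:
        # Explicit instruction overrides the model's default chart choice.
        prompt += f"\nThe user explicitly wants a {chart_type} chart."
    return prompt
```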
We need documentation for contributors on how to develop, test, etc. Please use flake8 for linting.
Could we have a delete_dataset function?
Right now vn.ask() prints the table in markdown format when you run it in a notebook. We should get it to display using the native display widget.
Should we make a vn.use_df function that loads data into sqlite and connects to it so that you can run Vanna on dataframes that you might have brought in via CSV or some other method?
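A minimal sketch of what vn.use_df could do, assuming pandas: load the DataFrame into an in-memory SQLite database and hand back a query function. The names here are illustrative, not an existing API.

```python
import sqlite3

import pandas as pd

def use_df(df, table_name="df"):
    """Load a DataFrame into in-memory SQLite and return a run_sql function,
    so Vanna can query CSV/DataFrame data like any other database."""
    conn = sqlite3.connect(":memory:")
    df.to_sql(table_name, conn, index=False, if_exists="replace")

    def run_sql(sql):
        # Results come back as a DataFrame, matching the other connectors.
        return pd.read_sql_query(sql, conn)

    return run_sql
```

One design question: whether the SQLite connection should be kept open for the session or recreated per call; the sketch keeps it open via the closure.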
Running vn.train() returns True regardless of whether the SQL that is trained is correct or not. It also does not show whether the question being trained already exists; if it does, what does vn.train() do? This raises the following issues:
- True is not informative: it gives the impression that the trained SQL is correct, but it might not be. I was able to train on erroneous SQL queries and it returned True as well. Should False be returned in that case?
- What happens when vn.train() is run with an existing question?
Can Vanna automatically get the last X historical SQL queries from data warehouses that support this functionality, like Snowflake and BigQuery, if provided the connection?
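For Snowflake, the query history is exposed through the ACCOUNT_USAGE.QUERY_HISTORY view (reading it requires an appropriately privileged role, and rows can lag by up to a few hours). A sketch of building the fetch query; the helper name is hypothetical:

```python
def query_history_sql(limit=100):
    """Build the SQL to pull recent successful SELECT statements from
    Snowflake's ACCOUNT_USAGE.QUERY_HISTORY view. The resulting query_text
    rows could then be fed into training."""
    return (
        "SELECT query_text\n"
        "FROM snowflake.account_usage.query_history\n"
        "WHERE execution_status = 'SUCCESS'\n"
        "  AND query_type = 'SELECT'\n"
        "ORDER BY start_time DESC\n"
        f"LIMIT {int(limit)}"
    )
```

BigQuery has a comparable INFORMATION_SCHEMA jobs view, so the same pattern could apply there with a different query.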
Right now vn.generate_questions only references DDL.
In order to get the Postgres connector to work on a Mac, I had to do:
pip install psycopg2-binary
I think we should consider pg8000 to avoid compatibility issues.
Can we implement a bootstrapping one-line "agent"? For example:
conn = snowflake connection
vn.set_dataset('dataset')
vn.bootstrap()
where bootstrap does the following:
On the Code Reference site, it should be vn.get_models() and vn.get_model(), as shown in the screengrab.
vn.train should take in:
- question: str or None
- sql: str or None
- ddl: str or None
- documentation: str or None
- json_file: str or None
- sql_file: str or None
All parameter defaults should be None, and if the user passes in any invalid combination of parameters it should raise an exception.
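A sketch of that signature with combination checking. The set of valid combinations below is an assumption for illustration (question+sql as the classic training pair; the other inputs standing alone), not a confirmed spec:

```python
def train(question=None, sql=None, ddl=None, documentation=None,
          json_file=None, sql_file=None):
    """Hypothetical sketch: all parameters default to None, and invalid
    combinations raise instead of returning True/False."""
    params = {"question": question, "sql": sql, "ddl": ddl,
              "documentation": documentation, "json_file": json_file,
              "sql_file": sql_file}
    provided = {k for k, v in params.items() if v is not None}
    # Assumed valid combinations; a bare question (no sql) is ambiguous.
    valid_combos = [
        {"question", "sql"}, {"sql"}, {"ddl"},
        {"documentation"}, {"json_file"}, {"sql_file"},
    ]
    if provided not in valid_combos:
        raise ValueError(f"Invalid parameter combination: {sorted(provided)}")
    # The real function would store the training data; return it here
    # so the sketch is observable.
    return {k: params[k] for k in provided}
```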
Instead of returning True/False, output nothing when status.success is true; otherwise raise an exception with status.message.
The ability to send in
and have Vanna automatically train against a dataset. For 2, we would need to auto-generate the questions as well.
This function will take in a question and use the training data as a reference to answer questions about the data, instead of returning SQL.
In order to avoid confusion and also to make the dataset name URL-safe, on input of the dataset name we should:
-
There should be a deterministic mapping of the input dataset string to the actual dataset name so that users can do vn.set_dataset('my WEirD dataset name!') and it will still work
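A minimal sketch of such a deterministic mapping (the normalization rules here, lowercase plus hyphen-collapsing, are an assumption; the function name is illustrative):

```python
import re

def normalize_dataset_name(name):
    """Deterministic, URL-safe mapping: lowercase the input, collapse runs
    of non-alphanumeric characters into single hyphens, trim edge hyphens.

    Because the mapping is a pure function of the input, calling
    set_dataset('my WEirD dataset name!') always resolves to the same
    stored name.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
```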
Can we have Vanna generate documentation for tables and columns automatically? For example:
vn.generate_docs(entity='table', name='<tablename>')
would generate a docstring for a particular table, and
vn.generate_docs(entity='column', name='<columnname>')
would generate a docstring for a particular column.
Perhaps there could also be a flag on the table call to generate docs for the columns within that table automatically?
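A sketch of the prompt-building side of such a function. Everything here is hypothetical (the signature, the columns flag behavior); the real implementation would send the prompt to the LLM and store the result as documentation training data:

```python
def generate_docs(entity, name, columns=None):
    """Hypothetical sketch: build the LLM prompt that would produce a
    docstring for a table or column."""
    if entity not in ("table", "column"):
        raise ValueError("entity must be 'table' or 'column'")
    prompt = f"Write concise documentation for the {entity} named {name}."
    if entity == "table" and columns:
        # Flag-style behavior: also document each column in the table.
        prompt += " Also document these columns: " + ", ".join(columns)
    return prompt
```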
Add GH action that:
nbconvert
The required ENV variables should be fed from GH secrets into the action's context (find their values in Slack):
VANNA_API_KEY=xxx
VANNA_MODEL=xxx
SNOWFLAKE_ACCOUNT=xxx
SNOWFLAKE_USERNAME=xxx
SNOWFLAKE_PASSWORD=xxx
SNOWFLAKE_DATABASE=xxx
We'll likely have to begin using setuptools for this
Instead of manually adding the DDL using store_ddl(), can Vanna automatically get the DDL directly from the database if provided the connection string, and add each table separately? At least for Snowflake, BigQuery, and Postgres?
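For Postgres, one way to sketch this is reconstructing a minimal CREATE TABLE from information_schema.columns (Snowflake has GET_DDL() built in, which would be simpler there). The helper names are hypothetical, and real DDL has constraints, defaults, and keys that this ignores:

```python
# Query to pull column metadata for one table (Postgres information_schema;
# %s is the psycopg2/pg8000 parameter placeholder).
COLUMNS_SQL = (
    "SELECT column_name, data_type FROM information_schema.columns "
    "WHERE table_name = %s ORDER BY ordinal_position"
)

def build_ddl(table, columns):
    """columns: list of (name, data_type) rows as returned by COLUMNS_SQL.

    Returns a minimal CREATE TABLE statement suitable for training, one
    table at a time.
    """
    body = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({body})"
```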
If one query has an error, the subsequent queries to postgres will fail. We either need to open a new connection for each query or we need to do a rollback on error.
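The rollback option could look like this. sqlite3 is used as a stand-in so the sketch is self-contained; with psycopg2 the conn.rollback() pattern is the same (Postgres aborts the whole transaction after an error until a rollback clears it):

```python
def run_sql_safely(conn, sql):
    """Run a query; on error, roll back so the connection stays usable for
    subsequent queries instead of staying in an aborted transaction."""
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    except Exception:
        conn.rollback()  # clear the failed transaction state
        raise
```

Opening a fresh connection per query would also work but costs a round trip each time, so rollback-on-error is likely the cheaper fix.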
When we do vn.generate_sql, we pull from the cache if the SQL already exists. However, if the SQL is flagged or otherwise not in the training set, we should bypass the cache.
Our tests are on the server repo -- they need to be migrated to this repo