
Defog Python


Defog converts your natural-language queries into SQL and other machine-readable code. This library lets you easily integrate Defog into your Python application, and includes a CLI to help you get started.


Installation

For a generic installation with Postgres or Redshift binaries, use pip install --upgrade defog

For a Snowflake installation, use pip install --upgrade 'defog[snowflake]'

For a MySQL installation, use pip install --upgrade 'defog[mysql]'

For a BigQuery installation, use pip install --upgrade 'defog[bigquery]'

For a Databricks installation, use pip install --upgrade 'defog[databricks]'

API Key

You can get your API key by creating an account at https://defog.ai/signup. If you have trouble verifying your email, contact us at support(at)defog.ai

Connection Setup

You can either use our command-line interface (CLI), which takes you through the setup step-by-step, or pass your credentials explicitly to the Defog class in Python. The CLI uses the Python APIs under the hood; it is an interactive wrapper that performs some extra validation on your behalf.

To get started, run the following CLI command, which will prompt you for your Defog API key, your database type, and the corresponding database credentials:

defog init

If this is your first run, we will write this information to a JSON config file stored in ~/.defog/connection.json. If we detect an existing file, we will ask whether you intend to re-initialize it; you can always delete the file and run defog init again. Note that your credentials are never sent to Defog's servers.

Once the connection is set up, we will ask you for the names of the tables you would like to register (space-separated), generate the schema for each of them, upload the schema to Defog, and print the filename of a CSV containing your metadata. If you do not wish to provide table names at this point, you can exit the prompt with Ctrl+C.

Generating your schema

To include tables in Defog's index, run the following to generate column descriptions for your tables and columns:

defog gen <table1> <table2> ...

This will generate a CSV file that is stored locally on your disk.

Updating Schema

If you would like to edit the auto-generated column descriptions, just edit the CSV and run the following to update the schema with Defog:

defog update <csv_filename>

Querying

You can now run queries directly:

defog query "<your query>"

Happy querying!

Glossary

You might notice that our model sometimes fails to take into account prior context from your domain, e.g. converting certain fields into different types, joining certain tables, or how to perform string matching. To give the model a standard set of instructions attached to each query, you can pass us a glossary: a string of up to 1000 characters that gives the model more specific instructions. You can manage your glossary using the following commands:

defog glossary update <path/to/glossary.txt>  # Update your glossary
defog glossary get                             # Get your current glossary
defog glossary delete                          # Delete your glossary
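Since the glossary is just a plain-text blob with a 1000-character limit, it can help to sanity-check the file before uploading it. A minimal sketch (the filename and the instruction text are illustrative, not from Defog):

```python
# Write a glossary file and verify it fits the 1000-character limit
# before uploading it with `defog glossary update glossary.txt`.
# The instructions below are made-up examples for a hypothetical schema.
glossary = (
    "Always cast revenue columns to NUMERIC before aggregating. "
    "Join orders to customers on customer_id. "
    "Use ILIKE for case-insensitive string matching."
)
assert len(glossary) <= 1000, "glossary exceeds the 1000-character limit"

with open("glossary.txt", "w") as f:
    f.write(glossary)
```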

Golden Queries

When your desired queries follow complex patterns, you can provide example question-query pairs ("golden queries") to help the model generate queries matching those patterns. You can manage your golden queries using the following commands:

defog golden get <json|csv>                    # Get your golden queries in JSON or CSV format
defog golden add <path/to/golden_queries.json> # Add golden queries from a JSON or CSV file
defog golden delete <path/to/golden_queries.json|all> # Delete specific golden queries or all of them

Note that when adding golden queries, the JSON/CSV file provided must contain the following keys/columns:

  • prev_question (optional): the existing question in the database if we're replacing a golden question-query pair
  • prev_sql (optional): the existing SQL in the database if we're replacing a golden question-query pair
  • question: the new question
  • sql: the new SQL
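As a sketch, a minimal golden-queries JSON file with the required keys could be assembled in Python like this (the question and SQL are illustrative; prev_question/prev_sql are omitted since nothing is being replaced):

```python
import json

# Build a golden-queries file with the required keys documented above.
# prev_question/prev_sql are only needed when replacing an existing pair.
golden_queries = [
    {
        "question": "How many customers signed up last month?",
        "sql": (
            "SELECT COUNT(*) FROM customers "
            "WHERE signup_date >= date_trunc('month', CURRENT_DATE - INTERVAL '1 month') "
            "AND signup_date < date_trunc('month', CURRENT_DATE);"
        ),
    }
]

with open("golden_queries.json", "w") as f:
    json.dump(golden_queries, f, indent=2)
```

You could then register the file with defog golden add golden_queries.json.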

Deploying

You can deploy a defog server as a cloud function using the following command:

defog deploy <gcp|aws> [function_name]         # Deploy to GCP or AWS, optionally specifying the function name

Quota

You can check your quota usage per month by running:

defog quota

Free-tier users have 1000 queries per month, while premium users have unlimited queries.

Python Client

You can use the API from within Python as follows:

from defog import Defog
# your credentials are never sent to our server, and always run locally
defog = Defog() # your credentials will automatically be loaded if you have initialized defog already
question = "question asked by a user"
# run chat version of query
results = defog.run_query(
  question=question,
)
print(results)

Testing

For developers who want to test or add tests for this client, you can run:

pytest tests

Note that we move any existing .defog/connection.json file to /tmp while the tests run (if present), and restore it afterwards to avoid clobbering your original config. If you submit a PR, please lint your code with the black formatter. You can add it as a git hook to your repo by running the commands below:

echo -e '#!/bin/sh\n#\n# Run linter before commit\nblack $(git rev-parse --show-toplevel)' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

defog-python's People

Contributors

medhabasu, nirantk, rishsriv, wongjingping


defog-python's Issues

Generating Postgres Schema fails

When I use the generate_postgres_schema method as explained in the README I receive the following error from the API:

Traceback (most recent call last):
  File "/home/rishabhsriv/fsd-research/similar-vector-app/./app.py", line 204, in get_postgres_schema_gsheets
    temp_df = pd.read_csv(StringIO(csv))
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 7, saw 5

I simply used the function, without doing anything else to the schema objects sent to the API.

Some particularities in my database:

  • I exported 16 tables all linked with foreign keys among them
  • I have some user defined columns
  • I am using a lot of different postgres types such as: JSON, JSONB, ARRAY, DATE, etc.

P.S.: It's the same schema I sent over email yesterday

BigQuery defog gen error: too many values to unpack (expected 2)

defog gen failed with the following error when trying to generate the schema for BigQuery.
The schema consisted of only 2 tables, with 12 and 2 columns respectively.

(base) c:\Users\aburnazyan\AppData\Roaming\Python\Python39\Scripts>defog gen miabanutyun-84e26.fb_dataset.fb_data miabanutyun-84e26.fb_dataset.members
Connection details found. Reading connection details from file...
Connection details read from C:\Users\aburnazyan\.defog\connection.json.
Getting the schema for each table in your database...
Sending the schema to Defog servers and generating a Google Sheet. This might take up to 2 minutes...
{'status': 'error', 'message': 'too many values to unpack (expected 2)'}
Traceback (most recent call last):
  File "C:\Users\aburnazyan\AppData\Roaming\Python\Python39\site-packages\defog\__init__.py", line 618, in generate_bigquery_schema
    gsheet_url = resp["sheet_url"]
KeyError: 'sheet_url'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\aburnazyan\AppData\Roaming\Python\Python39\Scripts\defog.exe\__main__.py", line 7, in <module>
  File "C:\Users\aburnazyan\AppData\Roaming\Python\Python39\site-packages\defog\cli.py", line 41, in main
    gen()
  File "C:\Users\aburnazyan\AppData\Roaming\Python\Python39\site-packages\defog\cli.py", line 258, in gen
    gsheets_url = df.generate_db_schema(table_name_list)
  File "C:\Users\aburnazyan\AppData\Roaming\Python\Python39\site-packages\defog\__init__.py", line 634, in generate_db_schema
    return self.generate_bigquery_schema(tables)
  File "C:\Users\aburnazyan\AppData\Roaming\Python\Python39\site-packages\defog\__init__.py", line 622, in generate_bigquery_schema
    raise resp["message"]
TypeError: exceptions must derive from BaseException

Unassociated database usage

Hi,
How can I use the API without associating my database? Just like the sqlcoder-demo, I want to post my requests with only a question and a table schema.

How to provide golden queries programmatically

  • Is there any way to provide a few sample queries through a script or file when I am working in Python (rather than the command line)?

  • Also, how can I modify the prompt you are using?

Hive Support

I need Defog to be able to scan a Hive metastore. How do I accomplish this? Can I write my own custom scanner? defog init presents me with limited DB choices.

[`defog`] Postgres table names should be passed along with their respective schemas

I was trying to generate a Postgres schema:

from defog import Defog

defog = Defog()

schema = defog.generate_postgres_schema(tables=['demos.restaurants','demos.restaurant_menu'],  upload=False)

But I was getting an empty schema:

Connection details found. Reading connection details from file...
Connection details saved to C:\Users\admin6996\.defog\connection.json.
Retrieved the following tables:
        demos.restaurants
        demos.restaurant_menu
Getting schema for each table in your database...
Getting foreign keys for each table in your database...
Getting indexes for each table in your database...
Sending the schema to the defog servers and generating a Google Sheet. This might take up to 2 minutes...
{'demos.restaurants': [], 'demos.restaurant_menu': []}

And if I pass the tables without the schema prefix, I get the error below:

from defog import Defog

defog = Defog()

print(defog.generate_postgres_schema(tables=['restaurants','restaurant_menu'],  upload=False))
(venv) PS D:\> & "d:/work/venv/Scripts/python.exe" "d:/work/degog_poc.py"
Connection details found. Reading connection details from file...
Connection details saved to C:\Users\admin6996\.defog\connection.json.
Retrieved the following tables:
        restaurants
        restaurant_menu
Getting schema for each table in your database...
Getting foreign keys for each table in your database...
Traceback (most recent call last):
  File "d:\work\degog_poc.py", line 23, in <module>
    print(defog.generate_postgres_schema(tables=['restaurants','restaurant_menu'],  upload=False))
  File "D:\work\venv\lib\site-packages\defog\__init__.py", line 260, in generate_postgres_schema
    cur.execute(query)
psycopg2.errors.UndefinedTable: relation "restaurants" does not exist
LINE 6:                 AND conrelid::regclass IN ('restaurants'::re...

defog gen tablename1,tablename2 - error

When I use a single table it works, but running defog gen with multiple comma-separated tables gives the error below. Also, if metadata.csv is created for one table and I then run another table, it is not appended; only one table is included.
FYI, the database is Databricks.

We are now uploading this auto-generated schema to Defog.
Traceback (most recent call last):
  File "/opt/homebrew/bin/defog", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/defog/cli.py", line 41, in main
    gen()
  File "/opt/homebrew/lib/python3.11/site-packages/defog/cli.py", line 312, in gen
    df.update_db_schema(filename)
  File "/opt/homebrew/lib/python3.11/site-packages/defog/admin_methods.py", line 10, in update_db_schema
    schema_df = pd.read_csv(path_to_csv).fillna("")
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/common.py", line 713, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/io/common.py", line 451, in _get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>

defog update - Out of range float values are not JSON compliant

Hello,

When following the getting-started instructions (https://docs.defog.ai/getting-started) on a simple two-table setup.

The defog_metadata.csv file looks fine, but an error occurred:

$ defog update defog_metadata.csv
Connection details found. Reading connection details from file...
Connection details read from /home/lukasz/.defog/connection.json.
Traceback (most recent call last):
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/models.py", line 511, in prepare_body
    body = complexjson.dumps(json, allow_nan=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
ValueError: Out of range float values are not JSON compliant

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lukasz/.local/bin/defog", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/defog/cli.py", line 45, in main
    update()
  File "/home/lukasz/.local/lib/python3.11/site-packages/defog/cli.py", line 289, in update
    resp = df.update_db_schema_csv(filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/defog/__init__.py", line 1016, in update_db_schema_csv
    r = requests.post(
        ^^^^^^^^^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/sessions.py", line 575, in request
    prep = self.prepare_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/sessions.py", line 486, in prepare_request
    p.prepare(
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/models.py", line 371, in prepare
    self.prepare_body(data, files, json)
  File "/home/lukasz/.local/lib/python3.11/site-packages/requests/models.py", line 513, in prepare_body
    raise InvalidJSONError(ve, request=self)
requests.exceptions.InvalidJSONError: Out of range float values are not JSON compliant

failed

{'ran_successfully': False, 'error_message': None}

Does defog ai support the database SQL server?

According to the guidelines at https://docs.defog.ai/getting-started, I tried to execute the command defog init. When it comes to the question "What database are you using?", the available options are postgres, redshift, mysql, bigquery, snowflake, and databricks. Does that mean SQL Server is currently not supported? But according to the official Defog AI website (https://defog.ai/supported-dbs/), it appears to be supported.

So, does Defog AI support SQL Server? If it does, which option should I choose? Besides, I have tried choosing mysql, but got the error below:

Traceback (most recent call last):
  File "C:\Users\89290\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\89290\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\MyLLMs\Project_DefogAI_API\venv\Scripts\defog.exe\__main__.py", line 7, in <module>
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\defog\cli.py", line 43, in main
    gen()
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\defog\cli.py", line 308, in gen
    filename = df.generate_db_schema(table_name_list, scan=scan)
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\defog\generate_schema.py", line 518, in generate_db_schema
    return self.generate_mysql_schema(
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\defog\generate_schema.py", line 220, in generate_mysql_schema
    conn = mysql.connector.connect(**self.db_creds)
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\mysql\connector\pooling.py", line 322, in connect
    return CMySQLConnection(*args, **kwargs)
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\mysql\connector\connection_cext.py", line 144, in __init__
    self.connect(**kwargs)
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\mysql\connector\abstracts.py", line 1360, in connect
    self._open_connection()
  File "E:\MyLLMs\Project_DefogAI_API\venv\lib\site-packages\mysql\connector\connection_cext.py", line 332, in _open_connection
    raise get_mysql_exception(
mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0

I think this may be because the program attempted to connect to a MySQL database rather than a SQL Server database, which triggered a timeout error.

Unable to generate query successfully

Hi, I am using your Python library to convert a query to SQL and run it on a PostgreSQL DB. However, when I run the script, it tells me that it has failed to run and that there is no error message at all. Why is this occurring?

My DB has only 1 table, 130K rows, and 14 columns. I am also using a free defog account.

Thank you!


Any way to deploy it in Azure ?

Trying to figure out a way to deploy it on an Azure VM / Kubeflow; not sure which VM series and steps to follow. Appreciate your help.
