ibis-project / ibis-birdbrain
portable Python ML-powered data bot
Home Page: https://ibis-project.github.io/ibis-birdbrain/
License: Apache License 2.0
I'm not going to do this anytime soon, #goodfirstissue
add the ability to visualize tables through plotly
this is hard
this is purely for demo purposes, but needed: fill out the bot.execute_last_sql() method, taking in a connection and blindly running con.sql() with the last SQL attachment generated by the bot
this would be useful motivation for better methods for retrieving attachments
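A minimal sketch of the idea, written as a free function over a list of messages rather than as a method on the bot. SQLAttachment, Message, and last_sql_attachment are hypothetical stand-ins, not Ibis Birdbrain's actual classes; the only assumption about the connection is that it has a sql() method, as Ibis SQL backends do.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


class SQLConnection(Protocol):
    """Anything with a .sql(query) method, such as an Ibis SQL backend."""

    def sql(self, query: str) -> Any: ...


@dataclass
class SQLAttachment:
    """Hypothetical stand-in for the bot's SQL attachment type."""

    sql: str


@dataclass
class Message:
    """Hypothetical stand-in for a bot message that carries attachments."""

    text: str
    attachments: list[Any] = field(default_factory=list)


def last_sql_attachment(messages: list[Message]) -> SQLAttachment | None:
    """Return the most recent SQL attachment, searching newest messages first."""
    for message in reversed(messages):
        for attachment in reversed(message.attachments):
            if isinstance(attachment, SQLAttachment):
                return attachment
    return None


def execute_last_sql(messages: list[Message], con: SQLConnection) -> Any:
    """Blindly run the last generated SQL on `con` (demo use only, no validation)."""
    attachment = last_sql_attachment(messages)
    if attachment is None:
        raise ValueError("no SQL attachment found in the conversation")
    return con.sql(attachment.sql)
```

The last_sql_attachment helper is one example of the "get the last SQL attachment" style of retrieval method this issue motivates.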
Some options include:
probably several others. These should be evaluated as options to simplify Ibis Birdbrain, allowing for a thin wrapper that provides data platform access via Ibis.
add the ability to search the internet, using duckduckgo-search by default
Currently, the bot can only really handle a single Ibis connection. You could already hack around this, but we should figure out the best ergonomics for handling multiple connections.
Right now, you can specify the Ibis connection URI in the eda section of config.toml, and that connection will be used. Do we just make this a list? Do we have a bot per connection? How do we handle large numbers of tables?
end up w/ something like this: content="The function 'query_tables' encountered an error: cannot access local variable 'res' where it is not associated with a value\n\nThe payload you provided was: {'question': 'Join title ratings and title basics tables on tconst column'}\n\nYou can try to fix the error and call the function again.",
experienced by myself and @ianmcook
in this case, was trying a join -- should fix the retry logic in general, and should consider techniques for making joins more accurate (separate function, constrain to two tables?)
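The error text above is the message Python 3.11+ attaches to UnboundLocalError, which usually means a variable like res was assigned only inside a try block or branch that never completed. Without seeing the actual query_tables code, here is a hedged sketch of a retry helper that cannot hit that failure mode: it returns as soon as a call succeeds and otherwise re-raises the last error. call_with_retries is a hypothetical name.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3) -> T:
    """Call fn until it succeeds, re-raising the last error if every attempt fails.

    The result is only ever read from a call that returned, so there is no
    unbound `res` to access after a failed attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # a real version would catch narrower errors
            last_error = exc
    assert last_error is not None
    raise last_error
```

In the bot, fn would wrap one attempt at generating and running the query, so each retry can include the previous error in its prompt.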
start namespacing by bot (roughly persona/demo/jtbd) consistently
Currently, only a single table can be queried. Various places need to be updated to work with lists of table names instead of a single table name. We could consider dedicated single-table SELECTING and multiple-table JOINING functionality, or try a single function. I'm not sure which would work best.
Related, once multiple connections can be handled by the bot we probably need limits or best practices or something.
auto-create a GH issue on the repo w/ the bot messages for debugging
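A hedged sketch that builds (but does not send) a request to GitHub's REST endpoint for creating an issue, POST /repos/{owner}/{repo}/issues, with the conversation rendered as the issue body. Representing messages as (role, text) pairs and the function name are assumptions; actually sending it, e.g. with urllib.request.urlopen(request), requires a token allowed to create issues on the repo.

```python
from __future__ import annotations

import json
import urllib.request


def build_issue_request(
    owner: str,
    repo: str,
    token: str,
    title: str,
    messages: list[tuple[str, str]],
    labels: list[str] | None = None,
) -> urllib.request.Request:
    """Build a POST request that opens an issue containing the bot transcript."""
    # Render the conversation as a numbered Markdown transcript.
    body_lines = ["## Conversation", ""]
    for i, (role, text) in enumerate(messages, start=1):
        body_lines += [f"**{i}. {role}**", "", text, ""]
    payload: dict = {"title": title, "body": "\n".join(body_lines).rstrip() + "\n"}
    if labels:
        payload["labels"] = list(labels)
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```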
search over GitHub issues, discussions, etc. that are relevant for a given project
perhaps also GitHub search over code (vs clone and search locally)
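A minimal sketch that builds GitHub REST search URLs scoped to one repository, using the /search/issues and /search/code endpoints with repo: and is: qualifiers. These REST endpoints don't cover discussions; as far as I know, discussions are only searchable through the GraphQL API, so that part is left out here. The helper name is hypothetical, and code search needs an authenticated request.

```python
from urllib.parse import urlencode

SEARCH_KINDS = ("issues", "code")


def github_search_url(kind: str, terms: str, repo: str, qualifiers: tuple[str, ...] = ()) -> str:
    """Build a GitHub REST search URL scoped to one repository."""
    if kind not in SEARCH_KINDS:
        raise ValueError(f"kind must be one of {SEARCH_KINDS}, got {kind!r}")
    query = " ".join([terms, f"repo:{repo}", *qualifiers])
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return f"https://api.github.com/search/{kind}?" + urlencode({"q": query})
```

Searching code through the API this way avoids the clone-and-search-locally step, at the cost of GitHub's rate limits.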
implement filesystem operations as Flow(s) and Tasks to:
this would be helpful for working with code in the filesystem (.sql or .python), helping write content (blogs), etc.
need to be careful giving a bot access to a filesystem: perhaps it can only write to <filename>.birdbrain.<extension>? perhaps it always confirms before writing? UX TBD
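Both safeguards floated above can be combined in a few lines. This is a hedged sketch, not a settled UX: every write is redirected to the <filename>.birdbrain.<extension> sibling, and a caller-supplied confirm callback must approve the target first. birdbrain_path and safe_write are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def birdbrain_path(path: Path) -> Path:
    """Map notes.sql -> notes.birdbrain.sql (and README -> README.birdbrain)."""
    if ".birdbrain" in path.suffixes:
        return path  # already a birdbrain-owned file
    return path.with_name(f"{path.stem}.birdbrain{path.suffix}")


def safe_write(path: str | Path, content: str, confirm: Callable[[Path], bool]) -> Path | None:
    """Write only to the .birdbrain sibling of `path`, and only if confirm() agrees."""
    target = birdbrain_path(Path(path))
    if not confirm(target):
        return None
    target.write_text(content, encoding="utf-8")
    return target
```

In an interactive session, confirm could be something like lambda p: input(f"write {p}? [y/N] ").strip().lower() == "y"; the original file is never touched either way.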
need an efficient process for testing LLM + data stuff...
going to skip implementing this for now; 1:1 keeps it much simpler. this should be easy to extend, but I also have arguments for why this is a bad idea
refer to ibis/ibisml
right now, a bot will always generate new SQL even if it (in the current or previous session) has seen the same text and correctly generated SQL
primarily for demo purposes, we should enable attaching a cache database w/:
we would need some way of manually populating this database (perhaps outside the scope of Ibis Birdbrain) and adding to it as new queries are seen, w/ the user somehow able to give an accuracy score/delete incorrect generations
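A minimal sketch of such a cache using the standard library's sqlite3. It keys on an exact match of the question after lowercasing and collapsing whitespace (an assumption; a fuzzier or semantic match would need more work) and supports the accuracy score and deletion mentioned above. SQLCache and cached_text_to_sql are hypothetical names, and generate stands in for the bot's LLM text-to-SQL step.

```python
from __future__ import annotations

import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS sql_cache (
    question TEXT PRIMARY KEY,  -- normalized question text
    generated_sql TEXT NOT NULL,
    score REAL                  -- optional user-supplied accuracy score
)
"""


def normalize(question: str) -> str:
    """Case- and whitespace-insensitive key, so differences in case or spacing still hit."""
    return " ".join(question.lower().split())


class SQLCache:
    def __init__(self, path: str = ":memory:") -> None:
        self.con = sqlite3.connect(path)
        self.con.execute(SCHEMA)

    def get(self, question: str) -> str | None:
        row = self.con.execute(
            "SELECT generated_sql FROM sql_cache WHERE question = ?", (normalize(question),)
        ).fetchone()
        return row[0] if row else None

    def put(self, question: str, sql: str, score: float | None = None) -> None:
        with self.con:  # commits on success
            self.con.execute(
                "INSERT OR REPLACE INTO sql_cache (question, generated_sql, score) VALUES (?, ?, ?)",
                (normalize(question), sql, score),
            )

    def rate(self, question: str, score: float) -> None:
        with self.con:
            self.con.execute(
                "UPDATE sql_cache SET score = ? WHERE question = ?", (score, normalize(question))
            )

    def delete(self, question: str) -> None:
        with self.con:
            self.con.execute("DELETE FROM sql_cache WHERE question = ?", (normalize(question),))


def cached_text_to_sql(cache: SQLCache, question: str, generate: Callable[[str], str]) -> str:
    """Return cached SQL if present; otherwise generate it once and store it."""
    sql = cache.get(question)
    if sql is None:
        sql = generate(question)
        cache.put(question, sql)
    return sql
```

Manual population is then just cache.put(question, known_good_sql) calls, which could live in a separate seeding script outside the bot.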
currently, response messages are "hard coded". there is an lm_response attribute on the bot that, if True, should have a language model write a response
Need to add reference docs. Refer to IbisML and Ibis for examples.
fill out the encode() methods, saving messages/attachments to a (configurable) database
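A hedged sketch of what encode() could look like, with "configurable" interpreted as "any sqlite3 connection" (an Ibis backend would be another option). The Message and Attachment fields are illustrative, not Ibis Birdbrain's real classes; storing each encoded message as a single JSON payload column is one design choice, and a normalized attachments table would be another.

```python
from __future__ import annotations

import json
import sqlite3
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Attachment:
    """Hypothetical attachment: a kind tag plus its serialized content."""

    kind: str  # e.g. "sql" or "text" (illustrative values)
    content: str


@dataclass
class Message:
    """Hypothetical message type; field names are illustrative."""

    role: str
    text: str
    attachments: list[Attachment] = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def encode(self) -> dict:
        """Return a JSON-serializable dict; asdict recurses into attachments."""
        return asdict(self)


def save_message(con: sqlite3.Connection, message: Message) -> None:
    """Persist one encoded message to any sqlite3 connection (file or :memory:)."""
    with con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS messages (id TEXT PRIMARY KEY, created_at TEXT, payload TEXT)"
        )
        con.execute(
            "INSERT INTO messages (id, created_at, payload) VALUES (?, ?, ?)",
            (message.id, message.created_at, json.dumps(message.encode())),
        )


def load_messages(con: sqlite3.Connection) -> list[dict]:
    """Read encoded messages back in insertion order."""
    rows = con.execute("SELECT payload FROM messages ORDER BY rowid").fetchall()
    return [json.loads(payload) for (payload,) in rows]
```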
Currently, the bot can search the internet. However, we should establish a more efficient method of ingesting/querying open-source documentation and enabling chat-over-documentation. We may consider:
We should get this working on Ibis Birdbrain, then Ibis, then extend to non-Ibis open-source projects that the bot can leverage. We may consider bot "flavors" or tools for loading/unloading documentation. The UX needs more thought for handling many projects.
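As a baseline to compare fancier approaches against, here is a minimal sketch of chat-over-documentation retrieval: split Markdown docs into sections at headings, then rank sections by plain term overlap with the question. This is a swapped-in, deliberately naive keyword-matching technique, not embeddings; the top-ranked sections would then be passed to the language model as context. The function names are hypothetical.

```python
import re
from collections import Counter


def chunk_markdown(text: str) -> list[str]:
    """Split before each heading line (# to ######), keeping each heading with its section."""
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [part.strip() for part in parts if part.strip()]


def term_counts(text: str) -> Counter:
    """Lowercased word counts; identifiers like search_internet stay whole."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def search_docs(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Return up to k chunks sharing the most terms with the query (ties keep doc order)."""
    query_terms = term_counts(query)
    scored = []
    for chunk in chunks:
        chunk_terms = term_counts(chunk)
        score = sum(min(count, chunk_terms[term]) for term, count in query_terms.items())
        if score:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [chunk for _, chunk in scored[:k]]
```

Splitting at headings keeps chunks aligned with how documentation is already organized, which also makes it easy to cite the section a response came from.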
refer to ibis/ibisml
re-implement conversational mode for the bot
there is currently a conversational: bool input to the bot that defaults to False. this work would, when that is set to True:
right now, if I ask a bot w/ access to the IMDB database:
bot("what are the top rated movies?")
I get a response w/ a bunch of irrelevant movies
as a follow-up, if I ask:
bot("with at least 100k ratings")
the bot has no context of the previous message and results. ideally, it would "know" that I'm talking about appending instructions to the previous message, re-use the CodeAttachment from that previous message, etc.
could naively pass in all messages to the Flows. would need some way to pick out relevant messages (see Marvin classification, extraction methods)
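A minimal sketch of the naive approach described above: prepend a fixed window of recent messages plus the most recent CodeAttachment to the follow-up, so "with at least 100k ratings" arrives alongside the query it refines. Picking relevant messages via classification or extraction would replace the fixed window. The Message type, the CodeAttachment stand-in, and the prompt format are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeAttachment:
    """Stand-in for the bot's CodeAttachment; only the code text matters here."""

    code: str


@dataclass
class Message:
    """Hypothetical message with free-form attachments."""

    text: str
    attachments: list[object] = field(default_factory=list)


def last_code(history: list[Message]) -> str | None:
    """Most recent code attachment in the conversation, if any."""
    for message in reversed(history):
        for attachment in reversed(message.attachments):
            if isinstance(attachment, CodeAttachment):
                return attachment.code
    return None


def conversational_prompt(history: list[Message], new_text: str, window: int = 4) -> str:
    """Naively prepend the last `window` messages and the latest code to a follow-up."""
    recent = history[-window:] if window > 0 else []
    lines = [f"previous message: {m.text}" for m in recent]
    code = last_code(history)
    if code is not None:
        lines.append(f"previous code:\n{code}")
    lines.append(f"current message: {new_text}")
    return "\n".join(lines)
```

The code attachment is searched over the full history rather than just the window, so an older query can still be refined after some unrelated chatter.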
allow a user to give feedback on birdbrain while using birdbrain
bot("you're a terrible bot")
-> opens an issue w/ the conversation details and additional info prompted from the user
This is a test issue
depends on #19
setup w/ imdb data? clickhouse playground?
Currently, only the combination of Marvin 1.3.0 and OpenAI 0.28.1 seems to work. If you try to upgrade OpenAI to >1 on this version of Marvin, you'll get various errors
If you try to upgrade Marvin to 1.5.6, you'll get errors with trying to use Azure OpenAI. regular OpenAI seems fine
Marvin has been working on supporting >1.0.0 OpenAI, but it's not ready yet. The timeline for this is unclear. The main issue this will cause is dependency conflicts with other libraries, as >v1 of OpenAI is expected to be the primary supported version for some time
one option is to vendor the Marvin 1.3.0 code and maintain it within Ibis Birdbrain going forward
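Until this is settled, the working combination can be pinned (marvin==1.3.0, openai==0.28.1). Here is a hedged sketch of a runtime check that reports when the installed versions differ, using the standard library's importlib.metadata. The function name and message wording are assumptions.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from importlib.metadata import PackageNotFoundError, version

# The only combination reported to work in this issue.
KNOWN_GOOD: Mapping[str, str] = {"marvin": "1.3.0", "openai": "0.28.1"}


def check_pins(
    get_version: Callable[[str], str] = version,
    pins: Mapping[str, str] = KNOWN_GOOD,
) -> list[str]:
    """Return human-readable problems; an empty list means the pins match."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(
                f"{package}=={installed} is installed, but only {expected} is known to work"
            )
    return problems
```

At import time, the bot could do for problem in check_pins(): warnings.warn(problem), which surfaces the conflict early instead of as an Azure OpenAI error later.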
we need sufficient docs for someone landing on the project. in scope for now:
API reference can possibly be dropped out of scope
in the future, we would also want:
can largely be copied from Ibis
implement the bot.transpile_sql method (currently called translate_sql; recommend renaming), likely just using SQLGlot directly
it takes a SQL string as input. along w/ #53, this is good motivation for better attachment methods (get the last SQL attachment, get a SQL attachment by some input text, etc.)
Currently, it's not intuitive to get access to Ibis objects after conversing with the bot. We should think about the ideal UX here and implement something.
post-preview probably
Currently, Internet search goes through the free duckduckgo-search package. This is fine, but I've noticed the results aren't great. We can add plugins/tools for other search engines, enabling several at once if we want to provide the best results. This would consist of adjusting the search_internet tool.
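A hedged sketch of what a pluggable search_internet could look like. Each backend is a callable taking (query, max_results) and returning result dicts with at least a "url" key. Results are interleaved round-robin across engines, duplicate URLs are dropped, and a failing engine is skipped rather than failing the whole search. The backend signature and result shape are assumptions; a DuckDuckGo backend would wrap the existing duckduckgo-search call.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from itertools import zip_longest

# Hypothetical backend shape: (query, max_results) -> [{"url": ..., "title": ...}, ...]
SearchBackend = Callable[[str, int], list[dict]]


def search_internet(
    query: str, backends: Mapping[str, SearchBackend], max_results: int = 5
) -> list[dict]:
    """Query every backend, interleave their results, and drop duplicate URLs."""
    per_backend = []
    for name, backend in backends.items():
        try:
            results = backend(query, max_results)
        except Exception:
            continue  # one broken engine shouldn't sink the whole search
        per_backend.append([{**result, "source": name} for result in results])

    merged, seen = [], set()
    for row in zip_longest(*per_backend):  # round-robin: 1st of each, then 2nd, ...
        for result in row:
            if result is None:
                continue  # this backend ran out of results
            url = result.get("url")
            if not url or url in seen:
                continue  # skip malformed results and duplicates
            seen.add(url)
            merged.append(result)
    return merged[:max_results]
```

Round-robin interleaving keeps the first registered engine from crowding out the others, and the "source" field records which engine produced each result.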
use ibis.decompile() on the expression to get Ibis code
in the text-to-SQL task, we should use SQLGlot to:
issues w/ 1.4; try 1.5 when out, diagnose issues, etc