ibis-birdbrain's People

Contributors

cpcloud, lostmygithubaccount

ibis-birdbrain's Issues

rerun SQL on a different backend

this is purely for demo purposes, but needed -- fill out the bot.execute_last_sql() method, taking in a connection and blindly running con.sql() with the last SQL attachment generated by the bot
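A minimal sketch of what this could look like, not the project's actual implementation. `SQLAttachment`, `Bot.attachments`, and `last_sql` are hypothetical stand-ins; the only thing assumed about `con` is that it exposes a `.sql(str)` method, as the issue describes.

```python
from dataclasses import dataclass, field


@dataclass
class SQLAttachment:
    """Hypothetical stand-in for the bot's SQL attachment type."""

    sql: str


@dataclass
class Bot:
    """Hypothetical, stripped-down bot that only tracks attachments."""

    attachments: list = field(default_factory=list)

    def last_sql(self) -> str:
        # Walk the attachments newest-first and return the first SQL one.
        for attachment in reversed(self.attachments):
            if isinstance(attachment, SQLAttachment):
                return attachment.sql
        raise LookupError("no SQL attachment has been generated yet")

    def execute_last_sql(self, con):
        # Blindly rerun the most recent SQL on whatever connection is passed in.
        return con.sql(self.last_sql())
```

Pulling the lookup into its own `last_sql` method is one way to start on the "better attachment methods" mentioned in this issue.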

this would be useful motivation for better methods of retrieving attachments

internet search

add the ability to search the internet, using duckduckgo-search by default

feat: work with multiple Ibis connections

Currently, the bot can only really handle a single Ibis connection. You could already hack around this, but we should figure out the best ergonomics for handling multiple connections.

Right now, you can use the config.toml to specify the Ibis connection URI for the eda section, which will be used. Do we just make this a list? Do we have a bot per connection? How do we handle large numbers of tables?
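If the "make this a list" option is chosen, one backwards-compatible approach is to accept both forms and normalize them. This is only a sketch: the key name `con` inside the `eda` section is an assumption for illustration, and `config` is assumed to be the already-parsed config.toml as a dict.

```python
def connection_uris(config: dict) -> list:
    """Return the connection URIs from the eda section as a list.

    Accepts the current single-string form as well as a list form.
    The key name "con" is hypothetical.
    """
    value = config.get("eda", {}).get("con", [])
    if isinstance(value, str):
        # Existing configs with a single URI keep working unchanged.
        return [value]
    return list(value)
```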

bug: retry SQL query doesn't work

end up w/ something like this: content="The function 'query_tables' encountered an error: cannot access local variable 'res' where it is not associated with a value\n\nThe payload you provided was: {'question': 'Join title ratings and title basics tables on tconst column'}\n\nYou can try to fix the error and call the function again.",

experienced by myself and @ianmcook

in this case, was trying a join -- should fix the retry logic in general, and should consider techniques for making joins more accurate (separate function, constrain to two tables?)
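The message in the payload is what Python raises (an UnboundLocalError) when a local such as `res` is read on a code path where it was never assigned, typically because the assignment sits inside a `try` that failed. The actual code of `query_tables` is not shown here, so the following is only a sketch of a retry loop that avoids that pattern; `run_with_retry` and the `run_query(question, last_error)` signature are hypothetical.

```python
def run_with_retry(run_query, question, max_attempts=3):
    """Call run_query(question, last_error) until it succeeds or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            # Assign and return inside the try: no later line can read
            # `res` on a path where the call failed.
            res = run_query(question, last_error)
            return res
        except Exception as exc:
            # Keep the error so the next attempt can use it as feedback.
            last_error = exc
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error
```

Passing the previous error back into the next attempt gives the language model something to correct against, which is what the "You can try to fix the error and call the function again" message already hints at.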

feat: join tables and work with multiple tables

Currently, only a single table can be queried. Various places need to be updated to work with lists of tables (names) instead of a single table (name). We could consider dedicated single-table SELECTING and multiple-table JOINING functionality, or try a single function. I'm not sure which would work best.

Relatedly, once the bot can handle multiple connections, we will probably need limits or documented best practices.

feat: GitHub search and whatnot

search over GitHub issues, discussions, etc. that are relevant for a given project

perhaps also GitHub search over code (vs clone and search locally)

re-implement RAG database

support RAG. similar to #46, but may be arbitrarily enabled on Flows and/or Tasks, e.g.:

  • have a Flow that searches Ibis docs, using a database w/ Ibis docs
  • have a Flow that writes/executes SQL code, as in #46

could use embeddings and/or regular text search

filesystem operations

implement filesystem operations as Flow(s) and Tasks to:

  • read files into the context
  • modify files
  • write files

this would be helpful for working with code in the filesystem (.sql or .python), helping write content (blogs), etc.

need to be careful giving a bot access to a filesystem -- perhaps it can only write to <filename>.birdbrain.<extension>? perhaps it always confirms before writing? UX TBD
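Both safeguards floated above (writing only to `<filename>.birdbrain.<extension>` and confirming first) can be combined. A sketch, with `birdbrain_path` and `safe_write` as hypothetical names and the UX still TBD:

```python
from pathlib import Path


def birdbrain_path(path) -> Path:
    """Map notes.sql to notes.birdbrain.sql so the original is never touched."""
    path = Path(path)
    return path.with_name(f"{path.stem}.birdbrain{path.suffix}")


def safe_write(path, content, confirm=input):
    """Write content next to path, but only after the user confirms.

    Returns the path written to, or None if the user declined.
    """
    target = birdbrain_path(path)
    answer = confirm(f"write {len(content)} characters to {target}? [y/N] ")
    if answer.strip().lower() != "y":
        return None
    target.write_text(content)
    return target
```

`confirm` defaults to `input` for interactive use and can be swapped for any callable that takes the prompt and returns the answer.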

testing

need an efficient process for testing LLM + data stuff...

handle multiple tasks and planning?

going to skip implementing this for now, 1:1 keeps it much simpler. this should be easy to extend but I also have arguments for why this is a bad idea

implement text-to-sql cache

right now, a bot will always generate new SQL even if it (in the current or previous session) has seen the same text and correctly generated SQL

primarily for demo purposes, we should enable attaching a cache database w/:

  • text
  • SQL
  • accuracy_score (?)

we would need some way of manually populating this database (perhaps outside the scope of Ibis Birdbrain) and adding to it as new queries are seen, w/ the user somehow able to give an accuracy score/delete incorrect generations
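A rough sketch of such a cache, using SQLite from the standard library as a stand-in for whatever database ends up being attached. The class name, the exact-match lookup on text, and the handling of unscored rows are all assumptions.

```python
import sqlite3


class TextToSQLCache:
    """Minimal cache keyed on the exact question text."""

    def __init__(self, path=":memory:"):
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "text TEXT PRIMARY KEY, sql TEXT NOT NULL, accuracy_score REAL)"
        )

    def get(self, text, min_score=0.0):
        # Rows without a score (NULL) are treated as usable; an assumption.
        row = self.con.execute(
            "SELECT sql FROM cache WHERE text = ? "
            "AND (accuracy_score IS NULL OR accuracy_score >= ?)",
            (text, min_score),
        ).fetchone()
        return row[0] if row else None

    def put(self, text, sql, accuracy_score=None):
        # Re-inserting the same text overwrites the earlier generation.
        self.con.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (text, sql, accuracy_score),
        )
        self.con.commit()

    def delete(self, text):
        # Lets a user drop an incorrect generation.
        self.con.execute("DELETE FROM cache WHERE text = ?", (text,))
        self.con.commit()
```

The bot would call `get` before generating SQL and `put` afterwards; exact-match keys are the simplest option and would miss rephrasings of the same question.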

language model response

currently, response messages are "hard coded". there is a lm_response attribute on the bot that, if True, should have a language model write a response

feat: chat with documentation

Currently, the bot can search the internet. However, we should establish a more efficient method of ingesting/querying open-source documentation and enabling chat-over-documentation. We may consider:

  • a custom solution
  • integrating w/ an existing solution
  • custom chunking/ingestion of documentation

We should get this working on Ibis Birdbrain, then Ibis, then extend to non-Ibis open-source projects that the bot can leverage. We may consider bot "flavors" or tools for loading/unloading documentation. The UX needs more thought for handling many projects.

conversational mode

re-implement conversational mode for the bot

there is currently a conversational: bool input to the bot that defaults to False. this work would, when setting that to True:

  • use all (or the relevant subset) of messages and attachments as input to the bot's flows
  • may or may not include smart methods to avoid context length issues, improve accuracy, decrease time to response, etc.

why?

right now, if I ask a bot w/ access to the IMDB database:

bot("what are the top rated movies?")

I get a response w/ a bunch of irrelevant movies

as a follow-up, if I ask:

bot("with at least 100k ratings")

the bot has no context of the previous message and results. ideally, it would "know" that I'm appending instructions to the previous message, re-use the CodeAttachment from that previous message, etc.

how?

could naively pass in all messages to the Flows. would need some way to pick out relevant messages (see Marvin classification, extraction methods)
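A step between "pass everything" and real relevance selection is to pass only the most recent messages that fit a size budget. This sketch picks by recency, not relevance, and treats messages as plain strings; `recent_context` and the character budget are hypothetical.

```python
def recent_context(messages, max_chars=4000):
    """Return the newest messages, in original order, that fit in max_chars."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first
    # one that would exceed the budget.
    for message in reversed(messages):
        if used + len(message) > max_chars:
            break
        kept.append(message)
        used += len(message)
    kept.reverse()
    return kept
```

For the IMDB example, this would keep "what are the top rated movies?" in scope when "with at least 100k ratings" arrives, as long as both fit in the budget.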

feedback flow

allow a user to give feedback on birdbrain while using birdbrain

bot("you're a terrible bot") -> opens an issue w/ the conversation details and additional info prompted from the user

evaluate upgrading marvin/openai Python packages

Currently, only the combination of 1.3.0 for Marvin and 0.28.1 for OpenAI seems to work. If you try to upgrade OpenAI on this version of Marvin to >1, you'll get various errors

If you try to upgrade Marvin to 1.5.6, you'll get errors with trying to use Azure OpenAI. regular OpenAI seems fine

Marvin has been working on supporting >1.0.0 OpenAI, but it's not ready yet. The timeline for this is unclear. The main issue this will cause is dependency conflicts with other libraries, as >v1 of OpenAI is expected to be the primary supported version for some time

one option is to vendor the Marvin code from 1.3.0 and maintain it within Ibis Birdbrain going forward

docs: finish docs for launch

we need sufficient docs for someone landing on the project. in scope for now:

  • landing page
  • getting started tutorials
  • concepts
  • API reference

API reference can possibly be dropped out of scope

in the future, we would also want:

  • how-to guides
  • contributing guides
  • (possibly) separate posts from Ibis; use Ibis posts for now

transpile sql between dialects

implement the bot.transpile_sql method (currently called translate_sql, recommend renaming), likely just using SQLGlot directly

takes a SQL string as input. along w/ #53, good motivation for better attachment methods (get last SQL Attachment, get a SQL attachment by some input text, etc.)

feat: better handoff to Ibis

Currently, it's not intuitive to get access to Ibis objects after conversing with the bot. We should think about the ideal UX here and implement something.

feat: better internet search options

Currently, Internet search is through a free DuckDuckGo thing. This is fine, but I've noticed the results aren't great. We can add plugins/tools for other search engines, enabling multiple at once if we want to provide the best results. This would consist of adjusting the search_internet tool

ibis code

use ibis.decompile() on the expression to get Ibis code

upgrade marvin

issues w/ 1.4; try 1.5 when out, diagnose issues, etc
