
aipl's Introduction

Welcome to Saulville

I live in the terminal, at the intersection of data and fun:

If you find this kind of stuff interesting, please reach out. Email me at [email protected], or come chat with us on IRC (libera.chat/#visidata) or on Discord. I'd love to talk with you!

aipl's People

Contributors

akamgm, anjakefala, cthulahoops, dovinmu, saulpw

aipl's Issues

`python-expr` one-liner seems to be a no-op

The goal here is to extract out the string value of the first key in each item in the examples list.

!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/strategyqa/task.json
!read

!json-parse examples=examples
!!python-expr>key
list(examples[0]['target_scores'].keys())[0]
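
For comparison, here is the intended extraction in plain Python, outside AIPL. This is only a sketch: the `examples` list below is a made-up stand-in with the same shape as the task JSON, and it takes the first key for every item rather than just `examples[0]`.

```python
# Made-up stand-in for the parsed task JSON; only the shape matters here.
examples = [
    {"input": "first question", "target_scores": {"Yes": 1, "No": 0}},
    {"input": "second question", "target_scores": {"No": 1, "Yes": 0}},
]

# First key of target_scores for every example, not just examples[0].
first_keys = [next(iter(ex["target_scores"])) for ex in examples]
print(first_keys)
```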

Rankout of 0.5 that goes directly to `call_cmd` will return a dict

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0.5)
def testtest(aipl, t:Table) -> str:
    return dict(theanswer='42')

!test
!format
{feigenbaum}
# AIPL Error (line 11 !format): 'feigenbaum'
!print

Will result in a table that looks like this:

[Screenshot from 2023-06-11 22-48-12]

This is because we went directly into call_cmd without going through any of the recursion steps in eval_op.

This means that we never convert the dict returned by !test into a LazyRow with columns. The structure of the table gets interpreted as a LazyRow that is holding a dict that also contains a __parent key, and !format won't know to search through it.

Originally posted by @anjakefala in #23 (comment)

`!python`

!python no longer seems to be working on latest develop:

!python
print('hi')

Adding a debug line in python.py:

def inner_exec(obj, *args, **kwargs):
    print('this', obj)
...

Just prints "this"

AIPL reads env vars at time of import, not instantiation

I have a simple Flask server that serves results from running AIPL scripts over HTTP. I have to call load_dotenv() so that it has access to API keys such as OpenAI's. But if I do this at server start, AIPL doesn't have access to them:

from dotenv import load_dotenv
from aipl import AIPL
load_dotenv()
aipl = AIPL()

Instead, I have to do this:

from dotenv import load_dotenv
load_dotenv()
from aipl import AIPL
aipl = AIPL()
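
To illustrate the difference being described, here is a minimal sketch in plain Python. It is not AIPL's code: `EagerClient`, `LazyClient`, and `EXAMPLE_API_KEY` are hypothetical names, and setting `os.environ` directly stands in for `load_dotenv()`.

```python
import os

# Hypothetical variable name, used only for this illustration.
KEY_NAME = "EXAMPLE_API_KEY"
os.environ.pop(KEY_NAME, None)  # start from a clean environment

# Captured once, when this line runs (the analogue of import time).
_KEY_AT_IMPORT = os.environ.get(KEY_NAME)


class EagerClient:
    """Uses the value captured when the module was loaded."""

    def __init__(self):
        self.api_key = _KEY_AT_IMPORT


class LazyClient:
    """Looks the value up each time an instance is created."""

    def __init__(self):
        self.api_key = os.environ.get(KEY_NAME)


# Stand-in for load_dotenv(): the variable only appears after "import".
os.environ[KEY_NAME] = "secret"

eager = EagerClient()
lazy = LazyClient()
print(eager.api_key)  # None
print(lazy.api_key)   # secret
```

Reading the environment inside `__init__` makes the order of `load_dotenv()` and the import irrelevant, as long as the variables are set before the object is created.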

can't load multiple scalar values from json

Two issues here:

  • name is treated as an iterable, but it's a string, so I just want it as a single scalar
  • description is ignored

!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/strategyqa/task.json
!read

!json-parse name=name description=description
!columns name description
!json 2

How to provide API token?

Neat project! This may be obvious, or I may have missed it in the README, but how do you provide the ChatGPT API token? I don't see an argument, env var, or anything similar.

When is !format unable to find a reference?

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test
!format
{feigenbaum}
# AIPL Error (line 11 !format): 'feigenbaum'
!print

Hypothesis: rankout 0 and 0.5 do not set the __parent. Only 1 and 1.5 do. Since !test in this case also replaces the source table, feigenbaum is lost.
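
A toy model of that hypothesis, in plain Python. This is not AIPL's LazyRow; the `Row` class and its `parent` attribute are hypothetical, and the sketch only shows why a name becomes unreachable once the link to the enclosing row is dropped.

```python
class Row:
    """Toy row: a dict of values plus an optional link to an enclosing row."""

    def __init__(self, values, parent=None):
        self.values = values
        self.parent = parent

    def lookup(self, name):
        # Walk outward through parents until the name is found.
        row = self
        while row is not None:
            if name in row.values:
                return row.values[name]
            row = row.parent
        raise KeyError(name)


outer = Row({"feigenbaum": "4.66920"})

# An op result that keeps the link to the row it was derived from.
linked = Row({"_": "42"}, parent=outer)
# An op result that replaces the table and drops the link.
unlinked = Row({"_": "42"})

print(linked.lookup("feigenbaum"))
try:
    unlinked.lookup("feigenbaum")
except KeyError as e:
    print("lookup failed:", e)
```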

should python fns / vars be in scope for AIPL ops?

!!python
@defop('test', 0, 0)
def myfun(aipl, v:str) -> str:
    return 'hi'

!format>myfun
hello

!format
{myfun}
!print

Result:
<function myfun at 0x105d08540>

I think probably yes, though maybe myfun should be overwritten by the !format.

Add CI for AIPL

  • install and run pytest .
  • add banner to README with link to discord and test results

!cross ordering

I want to run a series of items through a number of models, but I want to load each model exactly once and then do all the inferences on it before moving to the next one. I have this example script:

# get list of models we're testing
!read
data/simple-models.txt
!split>model sep=\n
!global models

# get our task. prompt and examples are strings, data is a list of objects with key "datum"
!read
data/simple-task.json
!json-parse
!ref data
!cross <models

!format
{model}: {datum}
!print

But it prints out the following:

model1: datum1
model2: datum1
model3: datum1
model1: datum2
model2: datum2
model3: datum2
model1: datum3
model2: datum3
model3: datum3

Here is simple-task.json

{
    "prompt": "prompt",
    "example_datum": "example_datum",
    "example_response": "example_response",
    "data": [
        {
          "datum": "datum1"
        },
        {
          "datum": "datum2"
        },
        {
          "datum": "datum3"
        }
    ]
}

and simple-models.txt:

model1
model2
model3
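
The two orderings can be compared in plain Python with `itertools.product`. This is only a sketch of the desired result, not of how `!cross` is implemented: whichever sequence is passed first becomes the outer loop.

```python
from itertools import product

models = ["model1", "model2", "model3"]
data = ["datum1", "datum2", "datum3"]

# Data outermost: the model changes on every line (the output reported above).
data_major = [f"{m}: {d}" for d, m in product(data, models)]

# Models outermost: all data for one model before moving to the next.
model_major = [f"{m}: {d}" for m, d in product(models, data)]

print(model_major[:4])
```

With models as the outer loop, each model would only need to be loaded once before all of its inferences run.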

tracking tokenization and cost for different LLMs

Estimating the cost of each LLM call is handy, but if we start supporting non-OpenAI models we might want to do it in a more general way. Specifically, GooseAI (being added in #30) doesn't return a count of tokens used, so we would need to count the tokens ourselves from the prompt and completion, which means knowing the tokenizer that the model used. The pricing math also differs from OpenAI's, so if we want to improve or expand this feature it might make sense to have provider-specific functions that calculate the cost. Alternatively, we could use a heuristic (usually just dividing the character count by a constant) and make sure it stays within an acceptable error tolerance.

For now we'll probably just only have cost estimation for OpenAI.
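
A rough sketch of the heuristic approach, for discussion. Everything numeric here is an assumption: the 4-characters-per-token ratio and the price table are placeholders, not real tokenizer statistics or real prices, and `estimate_tokens` / `estimate_cost` are hypothetical names.

```python
import math

# Assumed values for illustration only; not real tokenizer ratios or prices.
CHARS_PER_TOKEN = 4.0
PRICE_PER_1K_TOKENS = {"example-provider": 0.002}


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate a token count by dividing the character count by a constant."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(provider: str, prompt: str, completion: str) -> float:
    """Estimate the cost of one call from the estimated token counts."""
    tokens = estimate_tokens(prompt) + estimate_tokens(completion)
    return tokens / 1000 * PRICE_PER_1K_TOKENS[provider]


print(estimate_tokens("a" * 40))
```

A provider-specific version could swap `estimate_tokens` for the provider's real tokenizer where one is available and keep the heuristic as a fallback.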

when is this literal in scope?

!literal
gpt-3.5-turbo
gpt-neo-20b
!split>model sep=\n
<snip>
!llm>classification model={model} max_tokens=1
# 'model' is in scope
!format
{model} {classification} ({target_scores_Yes}): {statement}
# model is out of scope! "AIPL Error (line 31 !format): 'model'"

!!python
<snip>
!compute-precision>precision classification target_scores_Yes
!format
{model:15} {task:15} {precision}
!print
# model is in scope again

make all combinations of elements in multiple columns

It might be helpful to have the cartesian product of two or more columns. The use case here is iterating over the combination of the set of model IDs and tasks for BIG bench. Here's a toy example that I got working. I used the csv-load op in #30 on the following CSV:

agent,problem
pikachu,needing to recharge
poseidon,being too wet

!!python
from aipl import defop
from aipl.table import Table

@defop('cartesian-product', 1.5, 1.5)
def _cartesian(aipl, t:Table, col1, col2) -> dict:
    column1 = []
    column2 = []
    for row in t:
        column1.append(row[col1])
        column2.append(row[col2])
    ret = []
    for el1 in column1:
        for el2 in column2:
            ret.append({col1: el1, col2: el2})
    return ret

!csv-parse test.csv
!cartesian-product agent problem

!format
Please respond with a succinct suggestion for {agent} struggling with {problem}.
!llm
!format
Q: Please respond with a succinct suggestion for {agent} struggling with {problem}.
A: {_}
---
!print

Ideally we wouldn't need to specify the column names.
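
One possible shape for a version that doesn't need the column names, sketched in plain Python on a list of dicts rather than an AIPL Table (`cartesian_rows` is a hypothetical helper, not an existing op). It takes the columns from the first row and works for any number of them.

```python
from itertools import product


def cartesian_rows(rows):
    """Return every combination of the values found in each column.

    `rows` is a list of dicts that all share the same keys.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    # One list of values per column, in row order.
    values = [[row[col] for row in rows] for col in columns]
    return [dict(zip(columns, combo)) for combo in product(*values)]


rows = [
    {"agent": "pikachu", "problem": "needing to recharge"},
    {"agent": "poseidon", "problem": "being too wet"},
]
for combo in cartesian_rows(rows):
    print(combo)
```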

useful Python errors suppressed

Writing Python code in a !python op that errors (e.g. from not importing something that you're using) leads to this confusing error:

(global) task = result of 
Traceback (most recent call last):
  File "/Users/rowan/Documents/aipl/venv/bin/aipl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/main.py", line 79, in main
    aipl.run(open(fn).read(), *inputs)
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 105, in run
    cmds = self.parse(script)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 95, in parse
    result = self.eval_op(command, Table(), contexts=[self.globals])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 169, in eval_op
    assert not ret  # ignore return value (no rankout)
    ^^^^^^^^^^^^^^
AssertionError

This is the op that caused this:

!!python
@defop('json-parse-custom', 0, 1.5)
def op_json_parse(aipl, v:str, **kwargs) -> Table:
    pass

AIPL line counts are off by one

Easiest report case: create an aipl file with a single line and command, !abort. Running it will yield:

<Table [0 no cols] > -> python (line 1)
<Table [1 _0] <LazyRow row={'_': '...(35 bytes)> -> abort (line 2)
aborted 

I'm fairly certain that this is because !!python\n is prepended to the script input in the interpreter. There's the hacky fix of subtracting 1 from the line count, but it would be a lot nicer to update the parser to obviate the need for manually inserting the !!python.
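
For reference, the offset-based workaround could look something like the sketch below. This is a toy parser, not AIPL's: `PRELUDE` and `parse_with_user_line_numbers` are hypothetical names, and it only recognises lines that start with `!`. Deriving the offset from the prelude itself at least avoids hard-coding the 1.

```python
PRELUDE = "!!python\n"  # text prepended before parsing, as described above


def parse_with_user_line_numbers(script: str):
    """Yield (user_line_number, first_word) for each line starting with '!'.

    Toy parser for illustration; it is not AIPL's parser.
    """
    offset = PRELUDE.count("\n")  # lines contributed by the prelude
    full = PRELUDE + script
    for lineno, line in enumerate(full.splitlines(), start=1):
        if lineno <= offset:
            continue  # skip the synthetic prelude lines
        if line.startswith("!"):
            parts = line.lstrip("!").split()
            if parts:
                yield lineno - offset, parts[0]


print(list(parse_with_user_line_numbers("!abort\n")))
```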

How does the structure of a table get affected by an op?

  • How do you add a column?
  • How do you add a child table?
  • How do you replace the entire table?

For example, this will replace the table (rankin=1.5, rankout=0):

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test

For example, this will add a column (rankin=0, rankout=0):

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 0, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test

multiple reductions on data

I'm working on a binary classification script, and at the end I want to compute multiple stats to see how well the model(s) did. But if I compute e.g. precision as a 1.5=>0 transform, then I can't use the same data to compute recall. Here's how I'm getting around that now:

import numpy as np
from aipl import defop
from aipl.table import Table

def recall(t:Table, predictions:np.ndarray, true_values:np.ndarray) -> float:
    # recall = TP / (TP + FN): the share of actual positives predicted as positive
    TP = ((predictions == 1) & (true_values == 1)).sum()
    FN = ((predictions == 0) & (true_values == 1)).sum()
    return TP / (TP+FN)

def precision(t:Table, predictions:np.ndarray, true_values:np.ndarray) -> float:
    # precision = TP / (TP + FP): the share of predicted positives that are correct
    TP = ((predictions == 1) & (true_values == 1)).sum()
    FP = ((predictions == 1) & (true_values == 0)).sum()
    return TP / (TP+FP)

@defop('compute-accuracy', 1.5, 0.5)
def compute(aipl, t:Table, predictions_colname, true_values_colname) -> dict:
    # to_np_int_array is a helper from my script (not shown here)
    true_values = to_np_int_array(t, true_values_colname)
    predictions = to_np_int_array(t, predictions_colname)
    # compute each statistic once and reuse it in the returned dict
    r = recall(t, predictions, true_values)
    print(r)
    p = precision(t, predictions, true_values)
    print(p)
    return {
        'recall': r,
        'precision': p
    }

But it would be awesome to figure out a better and more general way of supporting these operations.
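
One more general shape this could take, sketched in plain Python on lists of 0/1 labels rather than an AIPL Table. All names here (`confusion_counts`, `REDUCTIONS`, `reduce_all`) are hypothetical: the idea is to compute the shared counts once and then apply any number of named reductions to them.

```python
def confusion_counts(predictions, true_values):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(predictions, true_values))
    return {
        "tp": sum(1 for p, t in pairs if p == 1 and t == 1),
        "fp": sum(1 for p, t in pairs if p == 1 and t == 0),
        "fn": sum(1 for p, t in pairs if p == 0 and t == 1),
        "tn": sum(1 for p, t in pairs if p == 0 and t == 0),
    }


# Each reduction is a function of the shared counts.
REDUCTIONS = {
    "precision": lambda c: c["tp"] / (c["tp"] + c["fp"]),
    "recall": lambda c: c["tp"] / (c["tp"] + c["fn"]),
    "accuracy": lambda c: (c["tp"] + c["tn"]) / sum(c.values()),
}


def reduce_all(predictions, true_values, reductions=REDUCTIONS):
    """Apply every reduction to the same data and return one dict of results."""
    counts = confusion_counts(predictions, true_values)
    return {name: fn(counts) for name, fn in reductions.items()}


stats = reduce_all([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(stats)
```

The sketch does not guard against empty denominators (for example, no predicted positives), which a real op would need to handle.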

Reduce duplication between pyproject.toml and setup.py

Since it seems like poetry cannot get what it needs from a standard setup.py, I will try to adapt and see if I can de-duplicate and put as much as we need into pyproject.toml.

(As an example of something I need, I really like installing in-development projects in an editable state, and it seems like to do that we need both pyproject.toml and setup.py? https://stackoverflow.com/questions/62983756/what-is-pyproject-toml-file-for.)

Originally posted by @anjakefala in #4 (comment)

`!llm` and `IndexError`

I'm running into a weird AIPL-breaking bug. I don't get it in my main aipl dev folder; it only happens when I clone the repo and install it in a new environment:

rowan@seshat test % python3 -V
Python 3.11.3
rowan@seshat test % git clone https://github.com/saulpw/aipl
<snip>
rowan@seshat test % python3 -m venv venv-aipl-test
rowan@seshat test % source venv-aipl-test/bin/activate
(venv-aipl-test) rowan@seshat test % python3 -m pip install ./aipl
<snip>
(venv-aipl-test) rowan@seshat test % aipl -i
> !format
hi!

<Table [1 _0] <LazyRow row={'_': '...(35 bytes)> -> format (line 1)
┏━━━━━┓
┃ _1  ┃
┡━━━━━┩
│ hi! │
└─────┘
> !llm
<Table [1 _1] <LazyRow row={'_': '...(38 bytes)> -> llm (line 1)
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/repl.py", line 46, in repl
    rich.print(inputs[-1])
  File "/opt/homebrew/lib/python3.11/site-packages/rich/__init__.py", line 74, in print
    return write_console.print(*objects, sep=sep, end=end)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/rich/console.py", line 1699, in print
    extend(render(renderable, render_options))
  File "/opt/homebrew/lib/python3.11/site-packages/rich/console.py", line 1311, in render
    render_iterable = renderable.__rich_console__(self, _options)  # type: ignore[union-attr]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/ops/debug.py", line 72, in _rich_table
    cell = str(cell)
           ^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/__init__.py", line 12, in __str__
    return f'AIPL Error (line {self.linenum} !{self.opname}): {self.exception}'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/__init__.py", line 25, in __str__
    return f'{self.args[1]}: {self.args[0]}'
              ~~~~~~~~~^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/aipl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/main.py", line 90, in main
    repl(aipl, inputs)
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/repl.py", line 50, in repl
    traceback.print_exc()
    ^^^^^^^^^
NameError: name 'traceback' is not defined
(venv-aipl-test) rowan@seshat test % 

I get the same thing with those commands in a file, although I have to add a command after !llm that tries to do something with what it outputs.
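
The two failures in the traceback (indexing `self.args[1]` when fewer than two arguments were given, and using `traceback` without importing it) can be shown in isolation. The sketch below is a guess at a defensive version, not AIPL's code: `ExampleError` and `describe` are hypothetical names.

```python
import traceback  # needed by the except block below


class ExampleError(Exception):
    """Hypothetical error type whose message tolerates missing arguments."""

    def __str__(self):
        if len(self.args) >= 2:
            return f"{self.args[1]}: {self.args[0]}"
        if len(self.args) == 1:
            return str(self.args[0])
        return self.__class__.__name__


def describe(exc):
    """Format an exception, falling back to a traceback if str() itself fails."""
    try:
        return str(exc)
    except Exception:
        return traceback.format_exc()


print(describe(ExampleError("bad request", "HTTP 400")))
print(describe(ExampleError()))
```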
