saulpw / aipl

Array-Inspired Pipeline Language

License: MIT License

Python 98.30% Vim Script 1.70%

aipl's Introduction

Welcome to Saulville

I live in the terminal, at the intersection of data and fun:

I created VisiData, a magical little utility for slicing and dicing data.
The devottys (of which I am a founding member) are a small group of terminal fans who are constantly on the lookout for collaborators on small art projects and games.
You can sponsor my work on Patreon or via Github Sponsors.
Fun fact: In 2016, I uncovered a crossword scandal while cleaning the largest collection of crossword puzzles on the planet into my own text crossword format.

If you find this kind of stuff interesting, please reach out. Email me at [email protected], or come chat with us on IRC (libera.chat/#visidata) or on Discord. I'd love to talk with you!

aipl's People

Contributors

Stargazers

Watchers

Forkers

anjakefala hbcbh1999 cthulahoops standardgalactic ishaan-jaff akamgm 5l1v3r1

aipl's Issues

`python-expr` one-liner seems to be a no-op

The goal here is to extract out the string value of the first key in each item in the examples list.

!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/strategyqa/task.json
!read

!json-parse examples=examples
!!python-expr>key
list(examples[0]['target_scores'].keys())[0]
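
For comparison, the intended result can be sketched in plain Python, outside AIPL. The sample `examples` data below is made up to mirror the shape implied above (each item carries a `target_scores` dict); it is not copied from the real task.json, and `first_target_key` is a hypothetical helper name.

```python
# Hypothetical sample shaped like the `examples` list described above.
examples = [
    {"input": "q1", "target_scores": {"Yes": 1, "No": 0}},
    {"input": "q2", "target_scores": {"No": 1, "Yes": 0}},
]


def first_target_key(example: dict) -> str:
    """Return the first key of an example's target_scores dict."""
    return next(iter(example["target_scores"]))


# One key per example, rather than only the key of examples[0].
keys = [first_target_key(ex) for ex in examples]
print(keys)
```

Note that the expression in the script only looks at `examples[0]`, whereas the stated goal is one key per item.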

Rankout of 0.5 that goes directly to `call_cmd` will return a dict

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0.5)
def testtest(aipl, t:Table) -> str:
    return dict(theanswer='42')

!test
!format
{feigenbaum}
# AIPL Error (line 11 !format): 'feigenbaum'
!print

Will result in a table that looks like this:

This is because we went directly into call_cmd without going through any of the recursion steps in eval_op.

This means that we never convert the dict returned by !test into a LazyRow with columns. The structure of the table gets interpreted as a LazyRow that is holding a dict that also contains a __parent key, and !format won't know to search through it.

Originally posted by @anjakefala in #23 (comment)

`!python`

!python no longer seems to be working on latest develop:

!python
print('hi')

Adding a debug line in python.py:

def inner_exec(obj, *args, **kwargs):
    print('this', obj)
...

This just prints "this".

AIPL reads env vars at time of import, not instantiation

I have a simple Flask server that serves results from running AIPL scripts over HTTP. I have to call load_dotenv() so that it has access to API keys such as OpenAI's. But if I do this at server start, AIPL doesn't have access to them:

from aipl import AIPL
load_dotenv()
aipl = AIPL()

Instead, I have to do this:

load_dotenv()
from aipl import AIPL
aipl = AIPL()
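
The difference between the two orderings can be illustrated with a self-contained sketch. It does not use AIPL or python-dotenv; `DEMO_API_KEY`, `EagerClient`, and `LazyClient` are hypothetical names, and setting `os.environ` directly stands in for `load_dotenv()`.

```python
import os

# Make the demo deterministic: start with the variable unset.
os.environ.pop("DEMO_API_KEY", None)

# Read once, when this module is first executed (i.e. at import time).
IMPORT_TIME_KEY = os.environ.get("DEMO_API_KEY")


class EagerClient:
    """Uses the value captured at import time."""

    def __init__(self):
        self.api_key = IMPORT_TIME_KEY


class LazyClient:
    """Reads the environment when the object is created."""

    def __init__(self):
        self.api_key = os.environ.get("DEMO_API_KEY")


# Stand-in for load_dotenv() running after the import.
os.environ["DEMO_API_KEY"] = "secret"

print(EagerClient().api_key)  # value captured before the variable was set
print(LazyClient().api_key)   # value read at instantiation
```

Reading the environment in the constructor, as `LazyClient` does, would make the import order irrelevant.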

can't load multiple scalar values from json

Two issues here:

name is treated as an iterable, but it's a string, so I want it as a single scalar
description is ignored entirely

!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/strategyqa/task.json
!read

!json-parse name=name description=description
!columns name description
!json 2

How to provide API token?

Neat project! This may be obvious, or I may have missed it in the README, but how do you provide the ChatGPT API token? I don't see an argument, env var, or anything similar.

When is !format unable to find a reference?

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test
!format
{feigenbaum}
# AIPL Error (line 11 !format): 'feigenbaum'
!print

Hypothesis: rankout 0 and 0.5 do not set the __parent. Only 1 and 1.5 do. Since !test in this case also replaces the source table, feigenbaum is lost.
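
The hypothesis can be pictured with a toy model in plain Python. This is not AIPL's actual `LazyRow` or `Table` code; the `Row` class, its `parent` attribute, and `lookup` are invented here purely to show how a name becomes unreachable when a replacement row is created without a link back to its source.

```python
class Row(dict):
    """Toy row: a dict with an optional link to an enclosing row."""

    def __init__(self, values, parent=None):
        super().__init__(values)
        self.parent = parent

    def lookup(self, name):
        """Search this row, then each ancestor, for a column name."""
        row = self
        while row is not None:
            if name in row:
                return row[name]
            row = row.parent
        raise KeyError(name)


source = Row({"feigenbaum": "4.66920"})
linked = Row({"_": "42"}, parent=source)  # parent link kept
unlinked = Row({"_": "42"})               # parent link never set

print(linked.lookup("feigenbaum"))
try:
    unlinked.lookup("feigenbaum")
except KeyError as err:
    print("lost:", err)
```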

should python fns / vars be in scope for AIPL ops?

!!python
@defop('test', 0, 0)
def myfun(aipl, v:str) -> str:
    return 'hi'

!format>myfun
hello

!format
{myfun}
!print

Result:
<function myfun at 0x105d08540>

I'd say probably yes, though maybe myfun should be overwritten by the !format.

Add CI for AIPL

install and run pytest .
add banner to README with link to discord and test results

!cross ordering

I want to run a series of items through a number of models, but I want to load each model exactly once and then do all the inferences on it before moving to the next one. I have this example script:

# get list of models we're testing
!read
data/simple-models.txt
!split>model sep=\n
!global models

# get our task. prompt and examples are strings, data is a list of objects with key "datum"
!read
data/simple-task.json
!json-parse
!ref data
!cross <models

!format
{model}: {datum}
!print

But it prints out the following:

model1: datum1
model2: datum1
model3: datum1
model1: datum2
model2: datum2
model3: datum2
model1: datum3
model2: datum3
model3: datum3

Here is simple-task.json

{
    "prompt": "prompt",
    "example_datum": "example_datum",
    "example_response": "example_response",
    "data": [
        {
          "datum": "datum1"
        },
        {
          "datum": "datum2"
        },
        {
          "datum": "datum3"
        }
    ]
}

and simple-models.txt:

model1
model2
model3
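
The two orderings can be compared in plain Python with `itertools.product`. This is only an illustration of the iteration order, not how `!cross` is implemented; the variable names are hypothetical.

```python
from itertools import product

models = ["model1", "model2", "model3"]
data = ["datum1", "datum2", "datum3"]

# Observed order: data is the outer loop, models the inner loop.
datum_major = [f"{m}: {d}" for d, m in product(data, models)]

# Desired order: models is the outer loop, so every datum is run
# through one model before moving on to the next model.
model_major = [f"{m}: {d}" for m, d in product(models, data)]

for line in model_major:
    print(line)
```

In these terms, the request amounts to choosing which side of the cross is the outer loop.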

tracking tokenization and cost for different LLMs

Estimating the cost of each LLM call is handy, but if we start supporting non-OpenAI models we might want to do it in a more general way. Specifically, GooseAI (being added in #30) doesn't return a count of tokens used, so we would need to count the tokens ourselves from the prompt and completion (and thus we'd need to know which tokenizer the model used). The pricing math also differs from OpenAI's, so if we wanted to improve or expand this feature it might make sense to have provider-specific functions that calculate the cost. We could also use a heuristic (usually just dividing the character count by a constant) and make sure it's within error tolerance.

For now we'll probably only have cost estimation for OpenAI.
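
A minimal sketch of the character-count heuristic mentioned above. The default of 4 characters per token is an assumption for illustration, not a measured value for any particular tokenizer, and the function names and the price argument are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count: character count divided by a constant."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(prompt: str, completion: str, usd_per_1k_tokens: float) -> float:
    """Estimated cost in USD for one call, using the heuristic above."""
    tokens = estimate_tokens(prompt) + estimate_tokens(completion)
    return tokens * usd_per_1k_tokens / 1000
```

The constant would need to be checked against real token counts for each provider before relying on the estimate.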

Add tests/.aipl tests to CI

update bigbench-binary-classifier to use !cross

This depends on being able to load two txt files into separate arrays; for now I'm going to merge #30 with the python hack

when is this literal in scope?

!literal
gpt-3.5-turbo
gpt-neo-20b
!split>model sep=\n
<snip>
!llm>classification model={model} max_tokens=1
# 'model' is in scope
!format
{model} {classification} ({target_scores_Yes}): {statement}
# model is out of scope! "AIPL Error (line 31 !format): 'model'"

!!python
<snip>
!compute-precision>precision classification target_scores_Yes
!format
{model:15} {task:15} {precision}
!print
# model is in scope again

make all combinations of elements in multiple columns

It might be helpful to have the cartesian product of two or more columns. The use case here is iterating over the combination of the set of model IDs and tasks for BIG bench. Here's a toy example that I got working. I used the csv-load op in #30 on the following CSV:

agent,problem
pikachu,needing to recharge
poseidon,being too wet

!!python
from aipl.table import Table

@defop('cartesian-product', 1.5, 1.5)
def _cartesian(aipl, t:Table, col1, col2) -> dict:
    column1 = []
    column2 = []
    for row in t:
        column1.append(row[col1])
        column2.append(row[col2])
    ret = []
    for el1 in column1:
        for el2 in column2:
            ret.append({col1: el1, col2: el2})
    return ret

!csv-parse test.csv
!cartesian-product agent problem

!format
Please respond with a succinct suggestion for {agent} struggling with {problem}.
!llm
!format
Q: Please respond with a succinct suggestion for {agent} struggling with {problem}.
A: {_}
---
!print

Ideally we wouldn't need to specify the column names.
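
One way to avoid naming the columns is to take them from the rows themselves. The sketch below works on a plain list of dicts rather than an AIPL `Table`, and `cartesian_rows` is a hypothetical helper, so it shows the idea only.

```python
from itertools import product


def cartesian_rows(rows: list[dict]) -> list[dict]:
    """Cross every column's values with every other column's values."""
    if not rows:
        return []
    # Column names are discovered from the first row, not passed in.
    colnames = list(rows[0].keys())
    columns = [[row[name] for row in rows] for name in colnames]
    return [dict(zip(colnames, combo)) for combo in product(*columns)]


rows = [
    {"agent": "pikachu", "problem": "needing to recharge"},
    {"agent": "poseidon", "problem": "being too wet"},
]
combos = cartesian_rows(rows)
for combo in combos:
    print(combo)
```

Because `product` accepts any number of iterables, the same function covers more than two columns.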

Have a !test-input operator

It would inject a literal if there is no input, so that we can run tests that use require-input.

Support a REPL for .aipl

useful Python errors suppressed

Writing Python code in a !python op that errors (e.g. from not importing something that you're using) leads to this confusing error:

(global) task = result of 
Traceback (most recent call last):
  File "/Users/rowan/Documents/aipl/venv/bin/aipl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/main.py", line 79, in main
    aipl.run(open(fn).read(), *inputs)
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 105, in run
    cmds = self.parse(script)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 95, in parse
    result = self.eval_op(command, Table(), contexts=[self.globals])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rowan/Documents/aipl/aipl/interpreter.py", line 169, in eval_op
    assert not ret  # ignore return value (no rankout)
    ^^^^^^^^^^^^^^
AssertionError

This is the op that caused this:

!!python
@defop('json-parse-custom', 0, 1.5)
def op_json_parse(aipl, v:str, **kwargs) -> Table:
    pass

AIPL line counts are off by one

Easiest report case: create an aipl file with a single line and command, !abort. Running it will yield:

<Table [0 no cols] > -> python (line 1)
<Table [1 _0] <LazyRow row={'_': '...(35 bytes)> -> abort (line 2)
aborted

I'm fairly certain that this is because !!python\n is prepended to the script input in the interpreter. There's the hacky fix of subtracting 1 from the line count, but it would be a lot nicer to update the parser to obviate the need for manually inserting the !!python.
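
The effect of a prepended line on line numbers, and the offset-subtracting workaround, can be shown with a small standalone sketch. It assumes, as suggested above, that the prelude is `!!python\n`; `PRELUDE` and `numbered_commands` are hypothetical names and this is not the interpreter's real parser.

```python
PRELUDE = "!!python\n"  # assumed stand-in for the text the interpreter prepends


def numbered_commands(script: str, prelude: str = PRELUDE):
    """Yield (user_line_number, line) for each line starting with '!'."""
    offset = prelude.count("\n")  # number of injected lines
    full = prelude + script
    for internal_no, line in enumerate(full.splitlines(), start=1):
        if internal_no <= offset:
            continue  # skip the injected prelude lines
        if line.startswith("!"):
            # Report the line number as the user sees it in their file.
            yield internal_no - offset, line


result = list(numbered_commands("!abort\n"))
print(result)
```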

How does the structure of a table get affected by an op?

How do you add a column?
How do you add a child table?
How do you replace the entire table?

e.g. this will replace the table (rankin=1.5, rankout=0)

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test

e.g. this will add a column (rankin=0, rankout=0)

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 0, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test

multiple reductions on data

I'm working on a binary classification script and I want to compute multiple stats at the end to see how well the model(s) did. But if I compute e.g. precision as a 1.5=>0 transform, then I can't use the same data to compute recall. Here's how I'm getting around that now:

def recall(t:Table, predictions:str, true_values:str) -> float:
    N = true_values.shape[0]
    return (true_values == predictions).sum() / N

def precision(t:Table, predictions:str, true_values:str) -> float:
    TP = ((predictions == 1) & (true_values == 1)).sum()
    FP = ((predictions == 1) & (true_values == 0)).sum()
    return TP / (TP+FP)

@defop('compute-accuracy', 1.5, 0.5)
def compute(aipl, t:Table, predictions_colname, true_values_colname) -> dict:
    true_values = to_np_int_array(t, true_values_colname)
    predictions = to_np_int_array(t, predictions_colname)
    r = recall(t, predictions, true_values)
    print(r)
    p = precision(t, predictions, true_values)
    print(p)
    return {
        'recall': recall(t, predictions, true_values),
        'precision': precision(t, predictions, true_values)
    }

But it would be awesome to figure out a better and more general way of supporting these operations.
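
As a point of reference, several statistics can be derived from one pass over the data by counting the confusion-matrix cells once. The sketch below uses plain lists and the standard definitions (recall = TP / (TP + FN)); note that the `recall` helper quoted above computes the fraction of matching labels, which is accuracy. The function names are hypothetical and nothing here is AIPL-specific.

```python
def confusion_counts(predictions, true_values):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for pred, true in zip(predictions, true_values):
        if pred == 1 and true == 1:
            tp += 1
        elif pred == 1 and true == 0:
            fp += 1
        elif pred == 0 and true == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_stats(predictions, true_values) -> dict:
    """Compute several reductions from a single set of counts."""
    tp, fp, fn, tn = confusion_counts(predictions, true_values)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


stats = binary_stats([1, 1, 1, 0], [1, 0, 0, 0])
print(stats)
```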

Update parser for globals -> tables

Reference: 98ff2cf

Rankout 0 and 0.5 that replace the top-level table lose their `__parent`

Reductions replace the top-level table.

Example code:

!literal>feigenbaum
4.66920
!!python
from aipl import defop
from aipl.table import Table
@defop('test', 1.5, 0)
def testtest(aipl, t:Table) -> str:
    return '42'

!test
!format
{feigenbaum}
# AIPL Error (line 11 !format): 'feigenbaum'
!nop

Originally posted by @anjakefala in #23 (comment)

Reduce duplication between pyproject.toml and setup.py

Since it seems like poetry cannot get what it needs from a standard setup.py, I will try to adapt and see if I can de-duplicate and put as much as we need into pyproject.toml.

(As an example of something I need, I really like installing in-development projects in an editable state, and it seems like to do that we need both pyproject.toml and setup.py? https://stackoverflow.com/questions/62983756/what-is-pyproject-toml-file-for.)

Originally posted by @anjakefala in #4 (comment)

`!llm` and `IndexError`

I'm running into a weird AIPL-breaking bug that doesn't occur in my main aipl dev folder; it only happens when I clone the repo and install it in a new environment:

rowan@seshat test % python3 -V
Python 3.11.3
rowan@seshat test % git clone https://github.com/saulpw/aipl
<snip>
rowan@seshat test % python3 -m venv venv-aipl-test
rowan@seshat test % source venv-aipl-test/bin/activate
(venv-aipl-test) rowan@seshat test % python3 -m pip install ./aipl
<snip>
(venv-aipl-test) rowan@seshat test % aipl -i
> !format
hi!

<Table [1 _0] <LazyRow row={'_': '...(35 bytes)> -> format (line 1)
┏━━━━━┓
┃ _1  ┃
┡━━━━━┩
│ hi! │
└─────┘
> !llm
<Table [1 _1] <LazyRow row={'_': '...(38 bytes)> -> llm (line 1)
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/repl.py", line 46, in repl
    rich.print(inputs[-1])
  File "/opt/homebrew/lib/python3.11/site-packages/rich/__init__.py", line 74, in print
    return write_console.print(*objects, sep=sep, end=end)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/rich/console.py", line 1699, in print
    extend(render(renderable, render_options))
  File "/opt/homebrew/lib/python3.11/site-packages/rich/console.py", line 1311, in render
    render_iterable = renderable.__rich_console__(self, _options)  # type: ignore[union-attr]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/ops/debug.py", line 72, in _rich_table
    cell = str(cell)
           ^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/__init__.py", line 12, in __str__
    return f'AIPL Error (line {self.linenum} !{self.opname}): {self.exception}'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/__init__.py", line 25, in __str__
    return f'{self.args[1]}: {self.args[0]}'
              ~~~~~~~~~^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/aipl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/main.py", line 90, in main
    repl(aipl, inputs)
  File "/opt/homebrew/lib/python3.11/site-packages/aipl/repl.py", line 50, in repl
    traceback.print_exc()
    ^^^^^^^^^
NameError: name 'traceback' is not defined
(venv-aipl-test) rowan@seshat test %

I get the same thing with those commands in a file, although I have to add a command after !llm that tries to do something with what it outputs.
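
The traceback shows two separate failures: a `__str__` that indexes `self.args[1]` when the exception was created with fewer than two arguments, and a call to `traceback.print_exc()` without `traceback` having been imported. A standalone sketch of both, with hypothetical class and function names rather than AIPL's own:

```python
import traceback  # must be imported before traceback.format_exc() is used


class TwoArgError(Exception):
    """Message format that assumes two positional args were given."""

    def __str__(self):
        return f"{self.args[1]}: {self.args[0]}"


class SafeError(Exception):
    """Same format, but falls back when fewer args were supplied."""

    def __str__(self):
        if len(self.args) >= 2:
            return f"{self.args[1]}: {self.args[0]}"
        return " ".join(str(arg) for arg in self.args)


def describe(exc: Exception) -> str:
    """Stringify an exception, reporting the traceback if that fails."""
    try:
        return str(exc)
    except Exception:
        return traceback.format_exc()


print(describe(SafeError("boom")))
print(describe(TwoArgError("boom")))  # str() raises IndexError internally
```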

Cost computation for GPT-4 is wrong

The cost of gpt-4 is:

(0.03 * prompt_tokens / 1000 + 0.06 * response_tokens / 1000)
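
Written out as a function, using only the two rates quoted above (the function and constant names are hypothetical):

```python
GPT4_PROMPT_USD_PER_1K = 0.03    # rate from the formula above
GPT4_RESPONSE_USD_PER_1K = 0.06  # rate from the formula above


def gpt4_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Cost in USD of one GPT-4 call, per the formula above."""
    return (GPT4_PROMPT_USD_PER_1K * prompt_tokens / 1000
            + GPT4_RESPONSE_USD_PER_1K * response_tokens / 1000)


# 1500 prompt tokens and 500 response tokens: 0.045 + 0.030 USD.
print(round(gpt4_cost(1500, 500), 4))
```

The key point is that prompt and response tokens are billed at different rates, so a single per-token price applied to the total would give the wrong figure.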