Scripts accompanying my rant on how maybe we shouldn't give LLMs standardized tests.
sat_evals.py
contains the code for the two evals (text and vision). The three .cache
files contain the sampling results.
asy-eval.txt
is an unscientific evaluation of how well LLMs understand the Asymptote Graphics Language, which I ran by copying and pasting the prompts in that file into my GPT-4 CLI tool :)