jkomoros / prompt-garden
A framework for gardening LLM prompts
License: Apache License 2.0
Useful once #23 is done
seeds/default.json
and just load them directlycountTokens
results for strings (since for some providers they go to the server)changeme
API key)getAPIKey
were changed to not throw by default but return '')
The CLI should take a pointer to a profile directory (defaulting to .profile) and store in it the memory stores, the logs of runs, etc.
FilesystemProfile
inherits from it.profiles/${NAME}/debug.log
.$NAME
so as long as other seeds don't start with a '$' then it's fine (no, the convention should be everyone prefixes a unique identifier in front of their variable)metadata.json
which mainly just stores a version. When loading up the profile, we check to see if the profile version is one we understand. For now if it doesn't match, just barf, but in the future we can upgrade the profile.
When it's fully federated, there's no way to discover good seeds, or see how highly rated / how successfully evaled they are.
complete
(and rename input
to prompt
?)--
in the command--all
is passedlist
command that prints out all seedsinquirer
live menu to select a seed to printdescription
in list.--mock
command to set the environment to mockanalyze
command that prints out var/lets (showing ones that are mismatched), store key values and namespaces, memory namespaces, whether there's a dynamic reference, and any remote seeds it references--packet
to pass multiple packets. Everywhere it's used (e.g. garden.diagram), have the treatment of an empty array
Ideally there should be a way to seed it ... but then not have every single call just return the seeded value.
Maybe a random and a seed-random, and only when seed-random is called is the random number generator for sub-seeds re-randomized?
Environments get a random number generator. cloneWithSeed(newSeed) changes the seed. clone() uses the same generator as the top level.
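A minimal sketch of the clone() / cloneWithSeed(newSeed) idea, assuming a small seedable generator (mulberry32 is swapped in here purely for illustration). The class name SketchEnvironment and the withSeed and random helpers are hypothetical and not part of the library.

```typescript
// Hypothetical sketch: an environment that carries a seedable random number generator.
type Rng = () => number;

// mulberry32: a tiny deterministic generator returning floats in [0, 1).
function mulberry32(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

class SketchEnvironment {
  private readonly rng: Rng;

  private constructor(rng: Rng) {
    this.rng = rng;
  }

  static withSeed(seed: number): SketchEnvironment {
    return new SketchEnvironment(mulberry32(seed));
  }

  // clone() shares the parent's generator, so sub-seeds advance the same stream.
  clone(): SketchEnvironment {
    return new SketchEnvironment(this.rng);
  }

  // cloneWithSeed() starts a fresh stream from the new seed.
  cloneWithSeed(newSeed: number): SketchEnvironment {
    return new SketchEnvironment(mulberry32(newSeed));
  }

  random(): number {
    return this.rng();
  }
}
```

Because clone() shares the stream, seeding once at the top level makes a whole run reproducible without every call returning the same value.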
shuffle
seed typeFor string manipulation
Introduced in #2.
Problem described in makeNestedSeedData.
https://stackoverflow.com/questions/76069429/defining-mutually-recursive-zod-schemas
Currently seed.grow() returns the final result, and there's no good way (other than turning on verbose and looking at logs) to see what the intermediate results are.
Also, seeds, when grown with the same environment, should reuse the same result unless force is passed.
.grow() should return a Plant, which is a particular instantiated version of a seed with a given environment. seed.plant(env).grow(force). Seeds have a WeakMap memoization of env -> plant.
Garden should gain a plant(seedID, env) convenience wrapper and a grow(seedID, env) convenience wrapper.
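The proposal above could look roughly like this. It is a synchronous sketch under stated assumptions: SketchSeed, SketchPlant, and the compute callback are hypothetical names, and a real grow() would likely be asynchronous.

```typescript
// Hypothetical sketch of seed.plant(env).grow(force) with WeakMap memoization.
type Env = Readonly<Record<string, unknown>>;

class SketchPlant {
  private readonly seed: SketchSeed;
  private readonly env: Env;
  // Wrapped in an object so that a cached `undefined` result is still a cache hit.
  private cached: { value: unknown } | undefined;

  constructor(seed: SketchSeed, env: Env) {
    this.seed = seed;
    this.env = env;
  }

  // Reuses the previous result unless force is passed.
  grow(force = false): unknown {
    if (force || this.cached === undefined) {
      this.cached = { value: this.seed.compute(this.env) };
    }
    return this.cached.value;
  }
}

class SketchSeed {
  private readonly plants = new WeakMap<Env, SketchPlant>();
  private readonly computeFn: (env: Env) => unknown;

  constructor(computeFn: (env: Env) => unknown) {
    this.computeFn = computeFn;
  }

  compute(env: Env): unknown {
    return this.computeFn(env);
  }

  // One plant per (seed, environment object) pair.
  plant(env: Env): SketchPlant {
    let plant = this.plants.get(env);
    if (plant === undefined) {
      plant = new SketchPlant(this, env);
      this.plants.set(env, plant);
    }
    return plant;
  }
}
```

Keying the WeakMap on the environment object means the memoization only helps if environments are truly read-only and reused by identity, which is why the readonly-types note matters.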
(Ensure that environments truly are readonly with typescript types)
See also a few comments in #3.
object
was added, since now every sub-key could just be a generic object, annoyingly...example-utility.json
doesn't work.... but it does for e.g. example-import.json
?strictUnions
(https://github.com/StefanTerdell/zod-to-json-schema and passing definitions for common ones)type
, the autocomplete is only to a subset. It's only later when you type in the type that it tells you which parameters to docomment
, description
, private
for autocomplete show up later, right now it's hard to tell which fields are actually the main ones for a seed_typelet-multi
with a sub-object schema checking doesn't workdefinitions
types (clearer error messages if nothing else)
When making a complex seed packet, there is often a complex set of nested seeds. Some of them are entrypoints that should be exposed to a user, and some are just internal procedures that are factored out for convenience, but expect to be run in a context where certain variables are already set, for example, and will fail if they aren't.
Right now it's not possible to distinguish between them.
Add a packet-level optional property, entrypoints:
{
  "version": 0,
  "entrypoints": {
    "main" : "This is a main seed"
  },
  "seeds": {
    "main": ...,
  }
}
The keys of entrypoints must all be seed IDs in this packet. The value is a user-facing description of what the seed does and its usage. In the future there might be other properties attached to an entrypoint, at which point the value for each entrypoint can be a sub-object with keys, and the case of just being a string will be sugar for a config object of shape {"description": ${STRING}}.
If an entrypoints map is missing, it means that the entrypoints are implicitly all seeds, and the usage notes are the description property of each seed, or '' if none is provided.
The CLI should use this information to, for example, allow listing entrypoints in a seed packet and choosing one to run. (Although it should still be possible to list all seeds in a packet with a CLI option). See #1.
If an entrypoints map is provided, at packet load time we should validate that each key is a known seed ID. See #39
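A sketch of how the entrypoints rules above might be resolved at packet load time. The type and function names (SketchPacket, EntrypointConfig, resolveEntrypoints) are assumptions for illustration, not the library's actual API.

```typescript
// Hypothetical sketch: normalize and validate a packet's entrypoints map.
type EntrypointConfig = { description: string };

type SketchPacket = {
  version: number;
  entrypoints?: Record<string, string | EntrypointConfig>;
  seeds: Record<string, { description?: string }>;
};

function resolveEntrypoints(packet: SketchPacket): Record<string, EntrypointConfig> {
  const result: Record<string, EntrypointConfig> = {};
  if (packet.entrypoints === undefined) {
    // No map: every seed is implicitly an entrypoint, described by its own description.
    for (const [id, seed] of Object.entries(packet.seeds)) {
      result[id] = { description: seed.description ?? '' };
    }
    return result;
  }
  for (const [id, value] of Object.entries(packet.entrypoints)) {
    if (!Object.prototype.hasOwnProperty.call(packet.seeds, id)) {
      throw new Error(`Entrypoint "${id}" is not a seed ID in this packet`);
    }
    // A bare string is sugar for {"description": STRING}.
    result[id] = typeof value === 'string' ? { description: value } : value;
  }
  return result;
}
```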
Originally tracked in #36.
Another way to do this is to add more properties, like hidden or entrypoint:true, to the Seed definition, and handle this entirely automatically and by convention, without top-level properties in the packet. For example, how often will you NOT want to include a top-level seed as an entrypoint?
Yeah, maybe the approach is a SeedDataBase.hidden? : boolean field. Sub-seeds that are unrolled get a hidden:true unless they already have an explicit hidden:false. And you can provide a hidden:true to top-level seeds for implementation details you want to hide. And then the CLI just knows to not print hidden seeds unless -a is passed.
No, it should be private. A seed marked private will not show up in the CLI (unless -a is passed), and also may not be included from another seed packet (add a garden.publicSeed(ref) that will fail if the seed is private, and all sub-seed fetches use publicSeed).
-a
is passedexample-simple.json
example-complex.json
utility.json
seed packet of convenience methods for things like prompt-var
See also #11.
Test
If an array, its length. If it is a string, its length. If an object, its number of properties.
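The three cases can be sketched as a single helper. The name lengthOf is hypothetical.

```typescript
// Hypothetical helper covering the three cases described above.
function lengthOf(value: unknown[] | string | Record<string, unknown>): number {
  if (Array.isArray(value) || typeof value === 'string') {
    return value.length;
  }
  // For plain objects, count own enumerable properties.
  return Object.keys(value).length;
}
```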
Necessary for multi-user
The simplest one simply waits a configurable number of milliseconds.
But if there were some way to wait until a signal comes from another system, you could create agents that are complex graphs of seeds.
Currently memories are always local to a profile. But ideally you should be able to 'dial' a remote memory, and they should possibly require an access key to read and/or write.
At that point it's basically a polymath instance (although with the ability to write to the memory sometimes)
An extension of #23.
Needs a way to allow read and write privileges on remote memories. (And also to make sense of read/write privileges on local memories)
This would allow calculating the ID to include
Type: dynamic
.
A reference property that expects an object, and will interpret it like a SeedReference. (Wait, won't the engine interpret a reference as a reference to execute?). Maybe reference should be a packedSeedReference. (That has the benefit that it's not interpreted by the rest of the machinery as a reference)
Make sure the use case of "pass in a seed graph root to execute" works.
Update documentation for things like seed.references() to make it clear that it does not include dynamic seed references.
dynamic should have an allow_remote property. Unless it's set to true, a reference that goes to a remote packet won't load.
reference
, which takes seed_id
and packet
, and auto-merges the current location of the seed and returns a packedReference.dynamic
, which takes a reference (test to make sure it rejects remote callsdynamic
in greenallow_remote
optional boolean to dynamic.disallow_remote
that seeds can't read back, but sub-seeds below that point in environment will throw if a remote is tried to access even if allow_remote
. And that's true for seed references that are direct
disallow_fetch
)Will help with debugging control flows
diagram
property that outputs a mermaid diagram definition
--packet
It's best practice to attach a prefix (e.g. komoroske.com) to variable names in var/let, to store IDs, and to memory IDs. This helps ensure that different seeds from different authors don't stomp on each other's state.
But it's super tedious and annoying to add it everywhere within a file.
Add a namespace environment variable.
The behavior is:
When computing the variable name for var/let, the memory ID for memory, or the store ID for store, if the prefix property is not empty, and the var name is not a known environment name (skip this for storeID and memoryID), and the name does not already include a prefix before a :, replace the variable name / memory ID / store ID with prefix:varname.
Another approach is to only do this for varNames etc that start with a dot (.).
When used in combination with packet-level environment (issue #38), the convention is to set the prefix at the top-level packet environment and never think about it again.
This expansion is done at the time of the seed being executed, not before. The pro is that we can late-bind the prefix (so, for example, you don't need to prepend it in the packet-level environment if you're also specifying a model, because it won't be prefixed until being used). The con is that static analysis tools that look at the seed graph will need to know about this behavior to appropriately figure out the variable names (and also perform lets/vars in that graph traversal).
Document in README.md how namespace works. Describe the inner workings, but then summarize: "Include a namespace in the environment of each packet. Whenever you need to reference a var, memory, or fact from another author, include author-prefix.com: in front, and otherwise don't think about it."
validateSeedPacket should verify that any namespace that is set does not include a : character.
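The namespacing rule described in this issue could be sketched as follows. This is an illustration under assumptions: applyNamespace, validateNamespace, and the contents of KNOWN_ENVIRONMENT_NAMES are hypothetical, and isNamespaced is modeled here simply as "contains a colon".

```typescript
// Hypothetical sketch of late-bound namespacing for var names, store IDs, and memory IDs.
const NAMESPACE_DELIMITER = ':';

// Assumed example set; the real list of known environment names is not given here.
const KNOWN_ENVIRONMENT_NAMES = new Set<string>(['namespace', 'memory', 'store', 'openai_api_key']);

function isNamespaced(id: string): boolean {
  return id.includes(NAMESPACE_DELIMITER);
}

function applyNamespace(name: string, namespace: string, kind: 'var' | 'store' | 'memory'): string {
  if (namespace === '') return name;
  // Known environment names are only exempt for var/let, not for store or memory IDs.
  if (kind === 'var' && KNOWN_ENVIRONMENT_NAMES.has(name)) return name;
  // Names that already carry a namespace are left alone.
  if (isNamespaced(name)) return name;
  return namespace + NAMESPACE_DELIMITER + name;
}

// The check validateSeedPacket would need: a namespace may not contain the delimiter.
function validateNamespace(namespace: string): void {
  if (namespace.includes(NAMESPACE_DELIMITER)) {
    throw new Error(`Namespace "${namespace}" must not include "${NAMESPACE_DELIMITER}"`);
  }
}
```

Because the function is applied when a seed executes, a static analysis tool would need to call the same logic while walking the graph to recover the effective names.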
How to handle _default_memory et al? In some cases you really do want to overlap in the commons. If you set a namespace, how do you say "don't namespace this name" for one of memory, or store, or whatever? Maybe the answer is that all of those memory/store IDs are namespaced, just with :_default_store or whatever. And that way if you want to use a memory name that's not in the namespace you're using, you can just manually configure :_default_store. (_default_store should have its name changed if we expect people to type it every so often...)
Originally tracked in #36.
:
isNamespaced(input : StoreID | MemoryID | VarID) : boolean
name-limerick
break? (broke in cb74ab6, it is an object auto-expansion that's not working)
For doing a sub-seed while letting each value in the map.
It would be possible to do this entirely in userland if there were +
and other arithmetic
fromEnd
bool to index
includes
, starts_with
, and ends_with
functionobject
seed type that constructs an object of values.property
seed type that plucks a given value from an objectlet
seed type that stores an object in an over-written environment (grow will need the environment passed to it explicitly)var
seed type that retrieves a named object from the environment (except for explicitly enumerated secret keys like openai_api_key).
Allows fetching from a remote origin. This is the primary way to get things in and out. It's also dangerous!
Fetches by seeds from local seed packets should be allowed, but the first time a remote seed fetches from a given URL, the user should get a chance to confirm (with an 'allow from now on' option). The profile should store which seed packet locations the user has allowed, so it remembers which ones still need a confirmation.
The fetch should allow configuring a subset of fetch parameters.
Having an ability to store a secret store for a seed packet that only it can reference would be useful for API keys for fetch that the library isn't aware of for its own use.
property
seed type should work for dotted gets so it's possible to traverse into them
json
(default), string)disallow_fetch
protected property
It should take an array of sub-statements to execute, and return an array of each one's values.
Perhaps it should be one type with a boolean?
Kind of similar to type object, and will also likely have some funky type issues like that one did
Should just be array, and then later have a parallel optional bool
Actually wait, doesn't this work already with property?
{
  "type": "template",
  "string": "This is a {{name}} that is {{age}}.",
  "values": {
    "name": "Bob",
    "age": 13,
    "ignoredKey": "This key is ignored because it's not used in the template"
  }
}
Will require a way for values to be an object. The test in getProperty will have to be different
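A minimal sketch of the rendering that the template seed example implies, assuming plain {{name}} placeholders with no modifiers. The function name renderTemplate and the choice to throw on a missing value are assumptions.

```typescript
// Hypothetical sketch: substitute {{name}} placeholders from a values object.
function renderTemplate(template: string, values: Record<string, unknown>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`No value provided for template variable "${name}"`);
    }
    return String(values[name]);
  });
}

const rendered = renderTemplate('This is a {{name}} that is {{age}}.', {
  name: 'Bob',
  age: 13,
  ignoredKey: "This key is ignored because it's not used in the template",
});
// rendered is "This is a Bob that is 13."
```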
extract type
Should be a mirror of template. Should be hand-rolled and use {{ }}. Name of variable, then pipeline type coercion. Allow fuzzy types for Boolean (“yes”, “y”, etc.).
Allow explicit regexes. Allow loops (vars inside will be an array). Allow optional (OK for it to not match).
Should it just be golang template syntax?
Loops are handled like this:
{{ loop items|modifier }}
{{ index|int|optional }}) {{item}}
{{ end }}
For the content
{
items: [
{index: 1, item: 'foo'},
{index: 2, item: 'bar'}
]
}
Would generate:
1) foo
2) bar
The content after the loop command, and before the first newline, is removed. The text before the loop end modifier that comes after a \n is also ignored.
Loop implies that there is an object with the given name in the results. Each item in the array is an object whose properties are used within the loop context for the inner items.
Loops may nest.
{{
then for each sub segment checks to verify configuration is valid and puts out a template part.{{
, }}
and |
inside of string values in arguments to patterns"
wrapping defaultdefault:'val'
loop
sub type to TemplatePart@loop:name
? Like, the argument to the loop is the name? That's less weird than the loop name being a random pieceoptional
working for loopspattern
has a quantifier without a non-greedy version (which won't work with loops), throw or silently fix it._
then don't extract anything for itwhitespace
modifier that matches whitespace. Typically used with _
.else
modifier for loop (what shows up if no items to loop)json
modifier type, to extract json (or render it via JSON.stringify())`eslint-disable-next-line
in the json rendering
testdotted property extract in loop works
ideally would be 342
but is 3
because of non-greedy matching. Really we want greedy matching to happen as long as the characters don't match a {
I think>?@
to %
{{-
and -}}
modifiers to strip whitespace like Golang doesdescribe
file://localhost/foo
file for relative
A SeedPacket where the seeds have, instead of a SeedRef, a SeedData, which is expanded to be a topline seed.
id
field to seeds (which is validated to verify it matches the local ID) since some seeds will otherwise have implied IDs. If it's not filled in then it will be filled in with a generic and deterministic ID so you don't HAVE to create one
Type checking helps find a number of errors in seed packet definition at parse time. But there are other possible errors that it would be good to catch earlier.
For example, a SeedReference to an invalid seed in the same packet. In the future there will likely be other problems (and possibly other warnings, a lint). Once there is a verifyPacket() for an unrolled packet, I'm sure we'll figure out other things to verify.
Once this is done, a convention for let/var is to have a named seed that wraps it:
{
  "get-user-name": {
    "type": "var",
    "name": {
      "id": "get-user-name-name"
    }
  },
  "get-user-name-name": {
    "type": "noop",
    "value": "user-name"
  }
}
That way you can fail at packet parse if you have the wrong variable name.
If you want this to be internal, make sure it's marked private:true. But if it's private:false then it's able to be used by other seed packets.
Originally tracked in #36.
var
, which is an unnecessary let. Return an error (not thrown) from verifySeedPacket. (Don't throw it if there's an out-of-packet seed reference that we haven't loaded)
noop
seed type (like log but just literally returns value.--warn
is set
If you want to have an object or array with computed sub-properties, you have to wrap it in either a type:object or type:array. This is annoying, easy to get wrong, and adds an unnecessary level of extra indirection during authoring.
As a pre-processing step before the seed packet has nested seeds unrolled, process the object. Iterate through its sub-objects, and if we find any with a SeedReference or SeedData property, then replace the parent with a type:object wrapper (and do the same for arrays, too). This should be a pretty simple transformation and handle most cases fine.
(Once we make it so seed references have a seed, not id, property, it will make this behavior even more resilient... as long as your sub-objects don't have a type or seed property then it will work as you think.)
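One way the pre-processing pass might look. Everything here is a sketch under assumptions: a value is treated as seed-like if it has a string type (SeedData) or a string seed (SeedReference, per the proposed rename), and the wrapper property names properties and items are hypothetical.

```typescript
// Hypothetical sketch: auto-wrap plain objects/arrays that contain seeds.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Assumption: SeedData has a string "type"; a SeedReference has a string "seed".
function isSeedLike(value: Json): boolean {
  return isPlainObject(value) && (typeof value['type'] === 'string' || typeof value['seed'] === 'string');
}

function expandComputed(value: Json): Json {
  if (Array.isArray(value)) {
    const items = value.map((item) => expandComputed(item));
    // Wrap the array only if some element is (or became) a seed.
    return items.some((item) => isSeedLike(item)) ? { type: 'array', items } : items;
  }
  if (isPlainObject(value)) {
    const processed: { [key: string]: Json } = {};
    for (const [key, sub] of Object.entries(value)) {
      processed[key] = expandComputed(sub);
    }
    // Existing seeds keep their shape; only their arguments are processed.
    if (isSeedLike(value)) return processed;
    const hasSeed = Object.values(processed).some((sub) => isSeedLike(sub));
    return hasSeed ? { type: 'object', properties: processed } : processed;
  }
  return value;
}
```

Since a wrapped child is itself seed-like, wrapping propagates upward through nested plain objects without extra bookkeeping.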
Document comprehensively how to use the library
A permanent version of let/var.
Should store in a simple key/value store in .profiles like AssociativeMemory.
Ensure it's inputValue (that is, only of types that are possible to persist in JSON)
The convention, like with environment, should be to prefix a domain you control to the front of the key: "komoroske.com:var"
Currently the whole library assumes that seeds are a single layer of depth and call out to other seeds with references. There's a lot of processing that goes on to take a possibly nested input and unroll it, plus things like private, manual manipulation in a diagram (#19), and fretting about the edge cases of "what if this private seed is called by another context"?
What if instead seeds were fundamentally able to be nested, and that was just always true?
We'd get rid of all of the duplicative machinery of nested vs unnested seeds, and be able to rip out a lot of unrolling machinery.
For the purposes of caching results, we'd use dotted name syntax to store intermediate products in the cases where you need to show intermediate results.
This would have implications throughout the stack but might end up being easier overall.
Effectively, a way to do directory-like setting and value enumeration.
This might not require much other than allowing dotted names to be set and being aware of dots in let/var , store/retrieve
Dotted gets
A local associative memory plugged into the library and matching a typespec.
value
for memorize is an array of text, process them in parallel (combining into a single request) instead of sequentially (which adds a network round trip time * items.length)
_default
) to something more distinctive, so it's clear that it's being used in a memory context, not a profile context (similar to ids having c_whatever
testing recall seed
by actually using real (cached) embeddings for those values. This is somewhat important to verify the sorting is actually correct and not backwards...
.profile/memory/MEMORY-NAME/${normalized_embedding_model_name}/hsnw.db
. This requires the embedding_model_name to not have any illegal path characters
recall.k
seed argument to be omitted (needs new machinery possibly to allow optional. And then maybe allow memory
to be provided as optional argument (falling back to env.memory) on recall and memorize using same machinery))
Maybe meta-nodes should be seed packets, with a defined '' entrypoint
This can be covered (mostly) by having a let with a seed-reference to the sub-seed, and then the sub-seed uses var within. But that's confusing about ways to use it and which variables it expects to be set.
Perhaps some kind of special parameter on a seed that describes the environment it expects to be set? And then there can be some tooling to complain if those aren't set. (Or it could just be convention, a utility seed of expect which takes an object, and then verifies that getting the var for each is not undefined.)
keys
let
throw
(to throw an error)call
function
function
has defaultscall.function
be typed to only be allowed to be a seed functioncall.function
throw if the seed reference is not to a functionarg:
namespace (because var
will have to have it so it's confusing)call
fail if the sub-object is not a function
?)
The convention is to pass parameters with an arg: prefix.
Ideally there'd be more than just convention, so there could be tooling around it.
E.g. a call seed, which is basically a let-multi with a nested seed reference for block. And a function seed type that is basically an expect node but with a defined set of parameters and defaults. These don't do anything semantically except make it very clear to tooling what the intention is, allowing listing them as entrypoints, detecting missing parameters, etc.
function should have an array of arguments, prefixed with arg:, called args. And defaults, which are a subset of args. And any that don't have defaults are required and will throw if not provided.
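The args/defaults rule could be checked roughly like this. FunctionSeedSketch and resolveArgs are hypothetical names, and this only models argument resolution, not how call actually binds values into the environment.

```typescript
// Hypothetical sketch: resolve arguments for a function seed, applying defaults.
type FunctionSeedSketch = {
  args: string[]; // each expected to start with "arg:"
  defaults: Record<string, unknown>; // keys must be a subset of args
};

const ARG_PREFIX = 'arg:';

function hasOwn(obj: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function resolveArgs(fn: FunctionSeedSketch, provided: Record<string, unknown>): Record<string, unknown> {
  for (const key of Object.keys(fn.defaults)) {
    if (!fn.args.includes(key)) throw new Error(`Default "${key}" is not a declared arg`);
  }
  const resolved: Record<string, unknown> = {};
  for (const arg of fn.args) {
    if (!arg.startsWith(ARG_PREFIX)) throw new Error(`Arg "${arg}" must start with "${ARG_PREFIX}"`);
    if (hasOwn(provided, arg)) {
      resolved[arg] = provided[arg];
    } else if (hasOwn(fn.defaults, arg)) {
      resolved[arg] = fn.defaults[arg];
    } else {
      // No default means the argument is required.
      throw new Error(`Missing required argument "${arg}"`);
    }
  }
  return resolved;
}
```

Having the declaration as data is what lets tooling list entrypoints and report missing parameters before a seed is ever grown.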
Should call require the arg: namespace or not?
A tool in the framework to save a new version of a seed and add it to a seed packet, referencing older versions of the seed.
An experience where there's a directory of your seeds with a file for each seed, and each time you rerun package it adds the updated seeds to the package.
To allow doing a search for results
Allow passing . in names to mean create and override sub-objects.
Do this for both var/let/let-multi and store/retrieve, as well as property
Note that this makes environment overlaying more complex and needs a special helper.
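A sketch of what dotted gets and sets might look like, including the kind of helper needed so that a dotted set overlays a sub-object without mutating the parent. The names getDotted and setDotted are hypothetical.

```typescript
// Hypothetical sketch: dotted-name get and non-mutating set on nested objects.
type Obj = { [key: string]: unknown };

function isObj(value: unknown): value is Obj {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function getDotted(obj: Obj, path: string): unknown {
  let current: unknown = obj;
  for (const part of path.split('.')) {
    if (!isObj(current)) return undefined;
    current = current[part];
  }
  return current;
}

// Returns a new object; sub-objects along the path are created or shallow-copied.
function setDotted(obj: Obj, path: string, value: unknown): Obj {
  const dot = path.indexOf('.');
  if (dot === -1) return { ...obj, [path]: value };
  const head = path.slice(0, dot);
  const rest = path.slice(dot + 1);
  const child = obj[head];
  return { ...obj, [head]: setDotted(isObj(child) ? child : {}, rest, value) };
}
```

Copying along the path keeps the parent environment untouched, which matters if environments are meant to be read-only.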
Once this is done, perhaps have secrets be the place secrets are stored?
The simplest place to do this is property, where it doesn't even have to be that hard, and then it can be used for #45
property
property
should also have an else
parameterlet/let-multi
store/retrieve
+
*
\
floor
ceiling
exp
This requires plantSeedPacket() having a uri to start, which it can prefix each ID with, fixing up any references inside of itself.
The best practice is to set a particular memory and store (and, in the future, prefix) that uses a distinctive prefix (e.g. komoroske.com) at the root of each seedGraph to make sure that all of the references within it use the same model/store/prefix.
However, this quickly gets extremely repetitive in packets that contain lots of entrypoints, and for ones that are intermediate entrypoints, requires duplicative environment let-multi statements that were already done by seeds that called them.
We should add a property, environment, in a packet. It's an optional property, and should specify an EnvironmentData overlay. The starter environment for each seed in the packet will have that environment overlaid on whatever the previous environment was. It's as though each root-level seed is wrapped in a (hidden) let-multi with those properties.
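The overlay semantics can be sketched in a few lines. The EnvironmentData alias below is a simplified stand-in for the real type, and PacketSketch and starterEnvironment are hypothetical names.

```typescript
// Hypothetical sketch: compute the starter environment for seeds in a packet.
type EnvironmentData = Record<string, unknown>; // simplified stand-in for the real type

type PacketSketch = {
  environment?: EnvironmentData; // optional packet-level overlay
  seeds: Record<string, unknown>;
};

// Equivalent to wrapping each root-level seed in a hidden let-multi:
// packet-level values win over whatever the previous environment had.
function starterEnvironment(previous: EnvironmentData, packet: PacketSketch): EnvironmentData {
  return { ...previous, ...(packet.environment ?? {}) };
}
```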
What should the property be named? environment is a bit misleading because it implies it's the entire environment when really it's an overlay. let is a bit wrong because semantically it's a let-multi. vars is wrong because we use var to mean retrieve from the environment. env is wrong because that would be the only place in user-facing semantics that calls it that.
When combined with #37, the convention will be to set a prefix in the packet-level environment every time.
A typical use case for this will be to set a namespace, but also set things like the model. The latter requires the former. Perhaps have the value be an object or an array of EnvironmentData. If it's an array then it should create the environment by overlaying each time. Actually, this might not be required, because namespace is late bound, so you can set it in the same block as another thing.
Originally tracked in #36.