The prompt-garden from jkomoros

Add a seed_type embed

Blocks #23

Support other embedding models
Support batching of multiple bits of text
If provided multiple bits of text, batch it together in a single remote API call (I think this is also tracked in #23)

Testing

Remove the test seeds from seeds/default.json and just load them directly

Support other LLM providers

Create example seeds to try the same sub-graphs with different providers and see which ones are better
Document
countTokens should accept multiple strings at once, so that providers whose counts are not computed locally need only one round-trip to the production API (assuming those APIs allow passing multiple bits of text)
Cache countTokens results for strings (since for some providers they go to the server)
Move growPrompt switch statement to being in providers/index.ts
computePrompt et al, if verbose is true, should print the model being used to the console.
Pop out the code to create a mocked embedding and use environment.random()
getProvidersWithAPIKeys() should have a defined priority order (update documentation in the Selecting which model to use section)
Suppress node warning when running using Google (fetch warning)
modelName should be a verified set of literals and each provider should verify it matches what they expect
google tokenCount should return a mocked count if environment says mock
if recall embedding is '' then just use a random one
Move countTokens switch statement to being in providers/index.ts
Move providers/index.ts to llms.ts?
Factor out Google request library and fetches into one google provider (and maybe put all of openai provider into one file too)
if you recall from a memory that was set with a different embedding size it fails (maybe not?)
have a way to switch LLM providers all at once, (completion_model and embedding_model at once) (if provider is set and embedding_model or completion_model is not, use the default model of type for provider. And then set provider based on the first provider that has a non-changeme API key)
Some way for models to detect which providers have API keys so they can automatically iterate through models to test sub-graphs (this might just be a meta-node, if getAPIKey were changed to return '' by default instead of throwing)
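A rough sketch of the "switch providers all at once" rule from the list above. Every name here (resolveModels, providersWithAPIKeys as a free function, DEFAULTS, PRIORITY, the placeholder model IDs, and the CHANGEME comparison) is an assumption for illustration, not the library's actual API.

```typescript
// Hypothetical sketch: pick a provider, then fall back to its default models.
type Provider = 'openai' | 'google';

interface ProviderDefaults {
  completionModel: string;
  embeddingModel: string;
}

// Placeholder model IDs, not real model names.
const DEFAULTS: Record<Provider, ProviderDefaults> = {
  openai: { completionModel: 'openai:default-completion', embeddingModel: 'openai:default-embedding' },
  google: { completionModel: 'google:default-completion', embeddingModel: 'google:default-embedding' },
};

// Assumed priority order; the real order is still to be defined.
const PRIORITY: Provider[] = ['openai', 'google'];

interface Settings {
  provider?: Provider;
  completionModel?: string;
  embeddingModel?: string;
  apiKeys: Partial<Record<Provider, string>>;
}

// Providers whose key is set and is not the "changeme" placeholder, in priority order.
function providersWithAPIKeys(keys: Partial<Record<Provider, string>>): Provider[] {
  return PRIORITY.filter((p) => {
    const key = keys[p];
    return key !== undefined && key !== '' && key.toLowerCase() !== 'changeme';
  });
}

function resolveModels(s: Settings): { provider: Provider } & ProviderDefaults {
  // An explicit provider wins; otherwise use the first provider with a usable key.
  const provider = s.provider ?? providersWithAPIKeys(s.apiKeys)[0];
  if (provider === undefined) throw new Error('No provider has an API key set');
  const defaults = DEFAULTS[provider];
  return {
    provider,
    completionModel: s.completionModel ?? defaults.completionModel,
    embeddingModel: s.embeddingModel ?? defaults.embeddingModel,
  };
}
```

Explicitly set models always win over the provider defaults, so existing configurations would keep working.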

Local profiles

The CLI should take a pointer to a profile directory (default to .profile) and in it store the memory stores, the logs of runs, etc.

Some way to have discovery/rating of seeds

When it's fully federated, there's no way to discover good seeds, or see how highly rated / how successfully evaled they are.

Seed_Type `prompt`

Support non-chat-based prompt (e.g. text-davinci-003)
Allow supporting other parameters (#5 )
Rename to complete (and rename input to prompt?)
Should allow setting model as an optional completionModelID parameter, defaulting to env.completion_model (all LLM-hitting methods should)

Seed_type `random`

Ideally there should be a way to seed it ... but then not have every single call just return the seeded value.

Maybe a random and a seed-random, and only when seed-random is called is the random number generator for sub-seeds re-randomized?

Environments get a random number generator. cloneWithSeed(newSeed) changes the seed. clone() uses the same generator as the top level.
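A minimal sketch of how that could look, assuming a numeric seed and using the small public-domain mulberry32 generator as a stand-in. The class shape and the choice of PRNG are assumptions; only the names random(), clone() and cloneWithSeed() come from the notes above.

```typescript
type RandomGenerator = () => number;

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): RandomGenerator {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical, stripped-down Environment that only models the random generator.
class Environment {
  private readonly rnd: RandomGenerator;

  constructor(rnd: RandomGenerator = Math.random) {
    this.rnd = rnd;
  }

  random(): number {
    return this.rnd();
  }

  // Shares the parent's generator, so draws continue the same sequence.
  clone(): Environment {
    return new Environment(this.rnd);
  }

  // Starts a fresh deterministic sequence for this environment and its clones.
  cloneWithSeed(newSeed: number): Environment {
    return new Environment(mulberry32(newSeed));
  }
}
```

Because clone() shares the generator, sub-seeds under a seeded environment get different values on each call while the overall run stays reproducible.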

Fix the nested SeedData type definition problem

Introduced in #2.

Problem described in makeNestedSeedData.

https://stackoverflow.com/questions/76069429/defining-mutually-recursive-zod-schemas

Add caching and intermediate results

Currently seed.grow() returns the final result, and there's no good way (other than turning on verbose and looking at logs) to see what the intermediate results are.

Also, seeds, when grown with the same environment, should reuse the same result unless force is passed.

.grow() should return a Plant, which is a particular instantiated version of a seed with a given environment. seed.plant(env).grow(force). Seeds have a WeakMap memoization of env -> plant.

Garden should gain a plant(seedId, env) convenience wrapper and grow(seedID, env) convenience wrapper.

(Ensure that environments truly are readonly with typescript types)
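The seed.plant(env).grow(force) idea could be sketched roughly as below. This is a simplified, synchronous illustration (the real grow is presumably async), and the Env, Compute, Plant and Seed shapes are assumptions made for the example.

```typescript
// Read-only environment stand-in; the real Environment type is richer.
type Env = Readonly<Record<string, unknown>>;
type Compute<T> = (env: Env) => T;

// A Plant is one seed instantiated with one environment; it caches its result.
class Plant<T> {
  private readonly compute: Compute<T>;
  private readonly env: Env;
  private cached: { value: T } | undefined = undefined;

  constructor(compute: Compute<T>, env: Env) {
    this.compute = compute;
    this.env = env;
  }

  // Reuses the cached result unless force is passed.
  grow(force = false): T {
    if (force || this.cached === undefined) {
      this.cached = { value: this.compute(this.env) };
    }
    return this.cached.value;
  }
}

class Seed<T> {
  private readonly compute: Compute<T>;
  // WeakMap so plants are collectable once their environment is dropped.
  private readonly plants = new WeakMap<Env, Plant<T>>();

  constructor(compute: Compute<T>) {
    this.compute = compute;
  }

  plant(env: Env): Plant<T> {
    let plant = this.plants.get(env);
    if (plant === undefined) {
      plant = new Plant(this.compute, env);
      this.plants.set(env, plant);
    }
    return plant;
  }
}
```

Note the memoization is keyed on environment identity, which is why environments need to be truly read-only for this to be safe.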

Schema checking for seed packet files in VSCode

A way to mark some seeds as private

When making a complex seed packet, there are often a complex set of nested seeds. Some of them are entrypoints that should be exposed to a user, and some are just internal procedures that are factored out for convenience, but expect to be run in a context where certain variables are already set, for example, and will fail if they aren't.

Right now it's not possible to distinguish between them.

Add a packet-level optional property, entrypoints:

{
  "version": 0,
  "entrypoints": {
    "main" : "This is a main seed"
  },
  "seeds": {
    "main": ...,
    
  }
}

The keys of entrypoints must all be seed IDs in this packet. The value is a user-facing description of what the seed does and its usage. In the future there might be other properties attached to an entrypoint, at which point the value for each entrypoint can be a sub-object with keys, and the case of just being a string will be sugar for a config object of shape {"description": ${STRING}}.

If an entrypoint map is missing, it means that the entrypoints are implicitly all seeds, and the usage notes are the description property of each seed, or '' if none is provided.

The CLI should use this information to, for example, allow listing entrypoints in a seed packet and choosing one to run. (Although it still should be possible to list all seeds in a packet with a CLI option). See #1.

If an entrypoints map is provided, at packet load time we should validate that each seed is a known ID. See #39
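As a sketch of the rules above (string sugar, implicit entrypoints when the map is missing, and load-time validation), assuming hypothetical type and function names that are not in the library:

```typescript
interface EntrypointConfig {
  description: string;
}

// Only the fields this sketch needs; real seed data has many more.
interface SeedDataSketch {
  description?: string;
}

interface SeedPacketSketch {
  version: number;
  entrypoints?: Record<string, string | EntrypointConfig>;
  seeds: Record<string, SeedDataSketch>;
}

function normalizedEntrypoints(packet: SeedPacketSketch): Record<string, EntrypointConfig> {
  const result: Record<string, EntrypointConfig> = {};
  if (packet.entrypoints === undefined) {
    // No map: every seed is implicitly an entrypoint, described by its own description.
    for (const [id, seed] of Object.entries(packet.seeds)) {
      result[id] = { description: seed.description ?? '' };
    }
    return result;
  }
  for (const [id, value] of Object.entries(packet.entrypoints)) {
    // Validate at load time that the entrypoint names a seed in this packet.
    if (!Object.prototype.hasOwnProperty.call(packet.seeds, id)) {
      throw new Error(`Entrypoint "${id}" is not a seed ID in this packet`);
    }
    // A bare string is sugar for {description: string}.
    result[id] = typeof value === 'string' ? { description: value } : value;
  }
  return result;
}
```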

Originally tracked in #36.

Another way to do this is to add more properties, like hidden or entrypoint:true to Seed definition, and handle this entirely automatically and with convention without top-level properties in the packet. For example, how often will you NOT want to include a top-level seed as an entrypoint?

Yeah, maybe the approach is a SeedDataBase.hidden? : boolean field. Sub-seeds that are unrolled get a hidden:true unless they already have an explicit hidden:false. And you can provide a hidden:true to top-level seeds for implementation details you want to hide. And then the CLI just knows to not print hidden seeds unless -a is passed.

No, it should be private. A seed marked private will not show up in the CLI (unless -a is passed), and also may not be included from another seed packet (add a garden.publicSeed(ref) that will fail if the seed is private, and all sub-seed fetches use publicSeed).

CLI shouldn't show private seeds unless -a is passed

Complex examples

Add a `length` seed_type

If an array, its length. If it is a string, its length. If an object, its number of properties.
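A minimal sketch of that behavior. The lengthOf name, the InputValue shape used here, and throwing for other value types are assumptions.

```typescript
// Assumed JSON-like value type for the sketch.
type InputValue = null | boolean | number | string | InputValue[] | { [key: string]: InputValue };

function lengthOf(value: InputValue): number {
  // Arrays and strings both expose .length directly.
  if (typeof value === 'string' || Array.isArray(value)) return value.length;
  // For plain objects, count own enumerable properties.
  if (value !== null && typeof value === 'object') return Object.keys(value).length;
  // Assumption: numbers, booleans and null are an error rather than 0.
  throw new Error('length expects an array, string, or object');
}
```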

Add debugging tools

Access control for seed packets

Necessary for multi-user

Add a `wait` seed_type

The simplest one simply waits a configurable number of milliseconds.

But if there were some way to wait until a signal comes from another system, you could create agents that are complex graphs of seeds.

Allow remote memories

Currently memories are always local to a profile. But ideally you should be able to 'dial' a remote memory, and they should possibly require an access key to read and/or write.

At that point it's basically a polymath instance (although with the ability to write to the memory sometimes)

An extension of #23.

Needs a way to allow read and write privileges on remote memories. (And also to make sense of read/write privileges on local memories)

Add a `dynamic_reference` seedtype

This would allow calculating the ID to include

Type: dynamic.

A reference property that expects an object, and will interpret it like a SeedReference. (Wait, won't the engine interpret a reference as a reference to execute?). Maybe reference should be a packedSeedReference. (That has the benefit that it's not interpreted by the rest of the machinery as a reference)

Make sure the use case of "pass in a seed graph root to execute" works.

Update documentation for things like seed.references() to make it clear that it does not include dynamic seed references.

dynamic should have an allow_remote property. Unless it's set to true, a reference that goes to a remote packet won't load.

Add reference, which takes seed_id and packet, and auto-merges the current location of the seed and returns a packedReference.
Add dynamic, which takes a reference (test to make sure it rejects remote calls)
Mermaid diagram should render dynamic in green
Add an allow_remote optional boolean to dynamic.
Don't fail remote access if the remote packet is already loaded.
A way to set a secret key of disallow_remote that seeds can't read back, such that sub-seeds below that point in the environment will throw if a remote access is attempted, even if allow_remote is set. The same should be true for seed references that are direct
- It's not so much secret as much as a key that can never be set to false. It starts as false but then can only be set high.
- (Same thing for #45, just with disallow_fetch )
Add an example to example-complex.json

Create a seed graph visualizer

Will help with debugging control flows

A `namespace` environment variable

It's best practice to attach a prefix (e.g. komoroske.com) to variable names in var/let, to store IDs, and memoryIDs. This helps ensure that different seeds from different authors don't stomp on each other's state.

But it's super tedious and annoying to add it everywhere within a file.

Add a namespace environment variable.

The behavior is:

When computing the variable name for var/let, the memory ID for memory, or the store ID for store, if the prefix property is not empty, and the var name is not a known environment name (skip this for storeID and memoryID), and the var does not already include something before a prefix, replace the variable name / memory ID / store ID with prefix:varname.

Another approach is to only do this for varNames etc that start with a . character.

When used in combination with packet-level environment (issue #38), the convention is to set the prefix at the top-level packet environment and never think about it again.

This expansion is done at the time of the seed being executed, not before. The pro is that we can late-bind the prefix (so, for example, you don't need to prepend it in the packet-level environment if you're also specifying a model, because it won't be prefixed until being used). The con is that static analysis tools that look at the seed graph will need to know about this behavior to appropriately figure out the variable names (and also perform lets/vars in that graph traversal).

Document in README.md how namespace works. Describe the inner workings, but then summarize: "Include a namespace in the environment of each packet. Whenever you need to reference a var, memory, or fact from another author, include author-prefix.com: in front, and otherwise don't think about it."

validateSeedPacket should verify that any namespace that is set does not include :.
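The expansion rule could be sketched as follows. The function names, the NameKind union, and the particular set of known environment keys are assumptions for illustration; only the overall rule comes from the description above.

```typescript
const NAMESPACE_DELIMITER = ':';

// Illustrative subset; the real list of protected environment keys lives in the library.
const KNOWN_ENVIRONMENT_KEYS = new Set(['namespace', 'completion_model', 'embedding_model', 'verbose']);

type NameKind = 'var' | 'memory' | 'store';

// A namespace itself may never contain the delimiter.
function validateNamespace(namespace: string): void {
  if (namespace.includes(NAMESPACE_DELIMITER)) {
    throw new Error(`Namespace "${namespace}" must not include "${NAMESPACE_DELIMITER}"`);
  }
}

function isNamespaced(name: string): boolean {
  return name.includes(NAMESPACE_DELIMITER);
}

// Late-bound expansion: called when the seed executes, not when the packet loads.
function resolveName(name: string, kind: NameKind, namespace: string): string {
  if (namespace === '') return name;
  validateNamespace(namespace);
  // Known environment names are only protected for var/let, not for store or memory IDs.
  if (kind === 'var' && KNOWN_ENVIRONMENT_KEYS.has(name)) return name;
  // Names that already carry a namespace are left alone.
  if (isNamespaced(name)) return name;
  return namespace + NAMESPACE_DELIMITER + name;
}
```

With this shape, referencing another author's state is just a matter of writing the full other-author.com:name form, which the function passes through unchanged.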

How to handle _default_memory et al? In some cases you really do want to overlap in the commons. If you set a namespace, how do you say don't namespace this name for one of memory, or store, or whatever? Maybe the answer is that all of those memory/store ids are namespaced, just with :_default_store or whatever. And that way if you want to do a memory name that's not the same as the namespace you're using you can just manually configure :_default_store. (_default_store should have its name changed if we expect people to every so often type it...)

Originally tracked in #36.

Update the types for anything that takes a namespaced name to explicitly expect only up to a single :
Add a isNamespaced(input : StoreID | MemoryID | VarID) : boolean
Shouldn't the type of letMulti be a Record<varName, inputValue>?
Make sure that empty store and memory ids work (and just make _default_store etc be '')
Update documentation, including base namespace
Update the examples to use this where appropriate
Document the explicit shared namespace as a commons
When did name-limerick break? (broke in cb74ab6, it is an object auto-expansion that's not working)

Add a `map` and `filter` seed_type

For doing a sub-seed while letting each value in the map.

It would be possible to do this entirely in userland if there were + and other arithmetic

push
spread (takes a and b, a and b can be objects or arrays (but must match). a or b can also be a non-array item, which will be automatically wrapped in arrays)
index (takes a container and a search. Container can be a string, an array, or an object)
Add a fromEnd bool to index
an includes, starts_with, and ends_with function
a slice() which can work on an array or string

Support seed_type `input`

Add a `fetch` seed_type

Allows fetching from a remote origin. This is the primary way to get things in and out. It's also dangerous!

Fetches from seeds from local seed packets should be allowed, but the first time a remote seed fetches from a given URL, the user should get a chance to confirm (with a 'allow from now on'). The profile should store which seed packet locations the user has allowed to remember which ones need a confirmation.

The fetch should allow configuring a subset of fetch parameters.

Having an ability to store a secret store for a seed packet that only it can reference would be useful for API keys for fetch that the library isn't aware of for its own use.

property seed type should work for dotted gets so it's possible to traverse into them

Add an `array` seed_type

It should take an array of sub-statements to execute, and return an array of each one's values.

Perhaps it should be one type with a boolean?

Kind of similar to type object, and will also likely have some funky type issues like that one did

Should just be array, and then later have a parallel optional bool

Seed_type `choice`

Actually wait, doesn't this work already with property?

Support seed_type `template`

{
  "type": "template",
  "string": "This is a {{name}} that is {{age}}.",
  "values": {
    "name": "Bob",
    "age": 13,
    "ignoredKey": "This key is ignored because it's not used in the template"
  }
}

Will require a way for values to be an object. The test in getProperty will have to be different
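A minimal sketch of the substitution the example above implies. The renderTemplate name and the choice to throw on a missing value are assumptions; keys in values that the template never mentions are simply ignored.

```typescript
type TemplateValues = Record<string, string | number | boolean>;

function renderTemplate(template: string, values: TemplateValues): string {
  // Replace each {{name}} placeholder with the matching value.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      // Assumption: a missing value is an error rather than an empty string.
      throw new Error(`Missing template value: ${name}`);
    }
    return String(values[name]);
  });
}
```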

extract type

Should be a mirror of template. Should be hand-rolled and use {{ }}: the name of the variable, then a pipeline of type coercions. Allow fuzzy types for Boolean (“yes”, “y”, etc.).

Allow explicit regexes. Allow loops (vars inside will be an array). Allow optional (OK for it to not match).

Should it just be golang template syntax?
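To make the "mirror of template" idea concrete, here is a deliberately small sketch that handles only {{name}} and {{name|type}} with a single modifier. Loops, optional, and explicit regexes are left out, and the function names and the exact fuzzy boolean vocabulary are assumptions.

```typescript
type Extracted = Record<string, string | number | boolean>;

// Assumed fuzzy vocabularies for boolean coercion.
const TRUTHY = new Set(['true', 'yes', 'y', '1']);
const FALSY = new Set(['false', 'no', 'n', '0']);

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function coerce(raw: string, type: string): string | number | boolean {
  if (type === 'int') {
    const n = parseInt(raw, 10);
    if (Number.isNaN(n)) throw new Error(`"${raw}" is not an int`);
    return n;
  }
  if (type === 'boolean') {
    const v = raw.trim().toLowerCase();
    if (TRUTHY.has(v)) return true;
    if (FALSY.has(v)) return false;
    throw new Error(`"${raw}" is not a boolean`);
  }
  return raw;
}

function extract(template: string, input: string): Extracted {
  // split() with capture groups yields: literal, name, type, literal, name, type, ..., literal
  const parts = template.split(/\{\{\s*(\w+)(?:\s*\|\s*(\w+))?\s*\}\}/);
  const vars: { name: string; type: string }[] = [];
  let pattern = '^';
  for (let i = 0; i < parts.length; i += 3) {
    pattern += escapeRegExp(parts[i] ?? '');
    const name = parts[i + 1];
    if (name !== undefined) {
      vars.push({ name, type: parts[i + 2] ?? 'string' });
      pattern += '([\\s\\S]+?)'; // lazy capture for this variable
    }
  }
  pattern += '$';
  const match = new RegExp(pattern).exec(input);
  if (match === null) throw new Error('Input does not match template');
  const result: Extracted = {};
  for (let i = 0; i < vars.length; i++) {
    const v = vars[i];
    if (v !== undefined) result[v.name] = coerce(match[i + 1] ?? '', v.type);
  }
  return result;
}
```

Compiling the template to a regular expression is just one way to hand-roll this; it makes optional and explicit-regex modifiers relatively easy to add later, but loops would likely need a real parser.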

Loops are handled like this:

{{ loop items|modifier }}
  {{ index|int|optional }}) {{item}}
{{ end }}

For the content

{
  items: [
    {index: 1, item: 'foo'},
    {index: 2, item: 'bar'}
  ]
}

Would generate:

1) foo
2) bar

The content after the loop command, and before the first newline, is removed. The text before the loop end modifier that comes after a \n is also ignored.

Loop implies that there is an object with the given name in the results. Each item in the array is an object whose properties are used within the loop context for the inner items.

Loops may nest.

VS Code schema checking doesn't work in non-expanded arrays and objects

With #42 fix, VS Code schema checking for objects within an array or object is turned off, because it just says 🤷 those are all InputValue.

For example, 6f7f735 was caused because the schema checking in the array was not turned on.

Allow remote seed packets

Allow seeds to reference seeds in other packets based on relative location (normalize nested references on packet planting based on the packet location)
Allow https absolute paths and fetching
Document regex types with describe
Seed should immediately return Plant() which has a value() that you await. This will be the way to expose intermediate results of sub-seeds.
Actually support mocked garden.fetchRemoteSeedPacket and test
A way to fetch all sub-seed referenced locations up front and validate the references before starting a computation (check for loops in graphs)
Add a test for loading a seed in another packet that was already loaded, and then also one that must be loaded due to that reference
Use new URL() with a file://localhost/foo file for relative
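A sketch of the new URL() trick from the last item: resolve everything against a file://localhost/ base so relative local paths and absolute https locations go through the same code path. The function name and the choice to return local locations as plain relative paths are assumptions.

```typescript
// Base used only so that relative local paths can be parsed by the URL class.
const LOCAL_BASE = 'file://localhost/';

// Resolves a packet reference relative to the packet that contains it.
// Local results come back as paths; remote results as absolute URLs.
function resolvePacketLocation(reference: string, currentPacket: string): string {
  const base = new URL(currentPacket, LOCAL_BASE);
  const resolved = new URL(reference, base);
  return resolved.protocol === 'file:' ? resolved.pathname.slice(1) : resolved.href;
}
```

The sketch reads pathname rather than href for local files because URL parsers normalize the localhost host of a file URL away.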

Basic seed evaluation

Allow nested seed packets

A SeedPacket where the seeds have instead of SeedRef, a SeedData, which is expanded to be a topline seed.

Add an id field to seeds (which is validated to verify it matches the local ID) since some seeds will otherwise have implied IDs. If it's not filled in then it will be filled in with a generic and deterministic ID so you don't HAVE to create one
expandSeedData should actually unroll seeds
Add tests for nested data (and all ID auto-generating behavior)
Clean up naming (e.g. Nested for nested types, versus the current of empty for input and Expanded (confusingly) for ones that aren't nested)
Document

Do packet-level verification

Type checking helps find a number of errors in seed packet definition at parse time. But there are other possible errors that it would be good to catch earlier.

For example, SeedReference to an invalid seed in the same packet. In the future there will likely be other problems (and possibly other warnings, a lint). Once there is a verifyPacket() for an unrolled packet, I'm sure we'll figure out other things to verify.

Once this is done, a convention for let/var is to have a named seed that wraps it:

{
  "get-user-name": {
    "type": "var",
    "name": {
      "id": "get-user-name-name"
    }
  },
  "get-user-name-name": {
    "type": "noop",
    "value": "user-name"
  }
}

That way you can fail at packet parse if you have the wrong variable name.

If you want this to be internal, make sure it's marked private:true. But if it's private:false then it's able to be used by other seed packets.

Originally tracked in #36.

Verify that every in-packet reference in a packet points to a seedID that exists.
Warn if there is a let on a private seed that has only one associated var, which is an unnecessary let. Return an error (not thrown) from verifySeedPacket. (don't throw it if there's an out-of-packet seed reference that we haven't loaded)
A noop seed type (like log but just literally returns value).
Warn if a namespace is not set
The CLI should print out warnings if --warn is set
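The first item in that list (every in-packet reference points to a seed that exists) could look roughly like this. It assumes an unrolled packet, treats any property value shaped like {id: string} with no packet field as a local reference, and returns errors instead of throwing. The type names are hypothetical.

```typescript
type SeedDataSketch = { type: string; [property: string]: unknown };
type PacketSeeds = Record<string, SeedDataSketch>;

// Assumption: a local reference is an object with a string id and no packet field.
function isLocalReference(value: unknown): value is { id: string } {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { id?: unknown }).id === 'string' &&
    !('packet' in value)
  );
}

// Returns problems rather than throwing, so the CLI can print them as warnings.
function verifySeedPacket(seeds: PacketSeeds): Error[] {
  const errors: Error[] = [];
  for (const [seedID, data] of Object.entries(seeds)) {
    for (const [property, value] of Object.entries(data)) {
      if (!isLocalReference(value)) continue;
      if (!Object.prototype.hasOwnProperty.call(seeds, value.id)) {
        errors.push(new Error(`${seedID}.${property} references unknown seed "${value.id}"`));
      }
    }
  }
  return errors;
}
```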

Create a web app version

Single user GUI
Multi-user hosted version (possibly proprietary)

Allow computed objects and arrays without using `object` or `array`

If you want to have an object or array with computed sub-properties, you have to wrap it in either a type:object or type:array. This is annoying, easy to get wrong, and adds an unnecessary level of extra indirection during authoring.

As a pre-processing step before the seed packet has nested seeds unrolled, process the object. Iterate through them, and if we find any with a SeedReference or SeedData property, then replace the parent with a type:object wrapper (and do the same for arrays, too). This should be a pretty simple transformation and handle most cases fine.

(Once we make it so seed references have a seed, not id property, it will make this behavior even more resilient... as long as your sub-objects don't have a type or seed property then it will work as you think.)

Make it work for objects
Make it work for arrays
Document
Update type hierarchy to allow objects and arrays that have seeds in them...
Update complex examples to use it

Documentation

Document comprehensively how to use the library

Document how to create your own cards
Show a meta-prompt question in example.json

Add `persist` and `retrieve` seed_types

A permanent version of let/var.

Should store in a simple key/value store in .profiles like AssociativeMemory.

Ensure it's inputValue (that is, only of types that are possible to persist in JSON)

The convention, like with environment, should be to prefix a domain you control to the front of the key: "komoroske.com:var"

Manually test store/retrieve in filesystem mode
Delete should return true if the key existed, false otherwise
Return null if value doesn't exist
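The return-value conventions in the last two items can be sketched with an in-memory Map standing in for the file under .profiles. The class and method names are assumptions.

```typescript
// Only JSON-persistable values are allowed.
type StoreValue = null | boolean | number | string | StoreValue[] | { [key: string]: StoreValue };

// In-memory stand-in; the real store would read and write a file in the profile.
class KeyValueStore {
  private readonly data = new Map<string, StoreValue>();

  store(key: string, value: StoreValue): StoreValue {
    this.data.set(key, value);
    return value;
  }

  // Returns null if the key was never stored.
  retrieve(key: string): StoreValue {
    return this.data.get(key) ?? null;
  }

  // Returns true if the key existed, false otherwise.
  delete(key: string): boolean {
    return this.data.delete(key);
  }
}
```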

Consider having actual nested-seeds

Currently the whole library assumes that seeds are a single layer of depth and call out to other seeds with references. There's a lot of processing that goes on to take a possibly nested input and unroll it, and things like private, and manual manipulation in a diagram (#19) and fretting about the edge cases of "what if this private seed is called by another context"?

What if instead seeds were fundamentally able to be nested, and that was just always true?

We'd get rid of all of the duplicative machinery of nested vs unnested seeds, and be able to rip out a lot of unrolling machinery.

For the purposes of caching results, we'd use dotted name syntax to store intermediate products in the cases where you need to show intermediate results.

This would have implications throughout the stack but might end up being easier overall.

Allow versioned seeds

Some way to do nested env or store value setting and listing

Effectively, a way to do directory-like setting and value enumeration.

This might not require much other than allowing dotted names to be set and being aware of dots in let/var , store/retrieve

Dotted gets

seed_type `memorize` and `recall`

A local associative memory plugged into the library and matching a typespec.

A `token_count` seed_type

Make the estimate be accurate for the model type (update documentation)

Support meta-nodes

Maybe meta-nodes should be seed packets, with a defined '' entrypoint

This can be covered (mostly) by having a let with a seed-reference to the sub-seed, and then the sub-seed uses var within. But that's confusing about ways to use it and which variables it expects to be set.

Perhaps some kind of special parameter on a seed that describes the environment it expects to be set? And then there can be some tooling to complain if those aren't set. (or it could just be convention, a utility seed of expect which takes an object, and then verifies that getting the var for each is not undefined.)

the convention is to pass parameters with an arg: prefix.

Ideally there'd be more than just convention, so there could be tooling around it.

E.g. a call seed, which is basically a let-multi with a nested seed reference for block. And a function seed type that is basically an expect node but with a defined set of parameters and defaults. These don't do anything semantically except make it very clear to tooling what the intention is, allowing listing them as entrypoints, detecting missing parameters, etc.

function should have an array of arguments, prefixed with arg:, called args. And defaults, which are a subset of args. Any that don't have defaults are required and will throw.
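A sketch of how a function seed might check its arguments, assuming the arg: prefix is required and that missing required arguments throw. The FunctionSeedSketch shape and bindArguments name are hypothetical.

```typescript
const ARG_PREFIX = 'arg:';

interface FunctionSeedSketch {
  args: string[];
  // Must be a subset of args; anything without a default is required.
  defaults: Record<string, unknown>;
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Merges provided values over defaults and fails on missing required args.
function bindArguments(fn: FunctionSeedSketch, provided: Record<string, unknown>): Record<string, unknown> {
  for (const key of Object.keys(fn.defaults)) {
    if (!fn.args.includes(key)) throw new Error(`Default "${key}" is not a declared arg`);
  }
  const bound: Record<string, unknown> = {};
  for (const arg of fn.args) {
    if (!arg.startsWith(ARG_PREFIX)) throw new Error(`Argument "${arg}" must start with ${ARG_PREFIX}`);
    if (hasOwn(provided, arg)) {
      bound[arg] = provided[arg];
    } else if (hasOwn(fn.defaults, arg)) {
      bound[arg] = fn.defaults[arg];
    } else {
      throw new Error(`Missing required argument "${arg}"`);
    }
  }
  return bound;
}
```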

Should call require the arg: namespace or not?

A live editing experience for seeds

A tool in the framework to save a new version of a seed and add it to a seed packet, referencing older versions of the seed.

An experience where there's a directory of your seeds with a file for each seed, and each time you rerun package it adds the updated seeds to the package.

Add a `search` seed_type

To allow doing a search for results

Allow `dotted name` nested var and store retrieval

Allow passing . in names to mean create and override sub-objects.

Do this for both var/let/let-multi and store/retrieve, as well as property

Note that this makes environment overlaying more complex and need a special helper.
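A sketch of the kind of helper this implies: a dotted get, and a non-mutating dotted set that creates or overrides sub-objects along the path. The names getDotted and setDotted are assumptions, and returning a copy (rather than mutating) is a choice made here to fit read-only environments.

```typescript
type Nested = { [key: string]: unknown };

function isNested(value: unknown): value is Nested {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Walks the path; returns undefined as soon as a segment is missing.
function getDotted(obj: Nested, name: string): unknown {
  let current: unknown = obj;
  for (const part of name.split('.')) {
    if (!isNested(current) || !Object.prototype.hasOwnProperty.call(current, part)) return undefined;
    current = current[part];
  }
  return current;
}

// Returns a copy with the value set, creating or overriding sub-objects as needed.
function setDotted(obj: Nested, name: string, value: unknown): Nested {
  const dot = name.indexOf('.');
  if (dot < 0) return { ...obj, [name]: value };
  const head = name.slice(0, dot);
  const rest = name.slice(dot + 1);
  const child = obj[head];
  // A non-object at this position is overridden by a fresh sub-object.
  const base: Nested = isNested(child) ? child : {};
  return { ...obj, [head]: setDotted(base, rest, value) };
}
```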

Once this is done, perhaps have secrets be the place secrets are stored?

The simplest place to do this is property, where it doesn't even have to be that hard, and then it can be used for #45

Support dotted property names in property
property should also have an else parameter
Support dotted property names in let/let-multi
Support dotted property names in store/retrieve
Support dotted names as template names
Fix error introduced in f9f4023

Allow arithmetic seed_types

Support multiple seed packets in seeds/

This requires plantSeedPacket() having a uri to start, which it can prefix each ID with, fixing up any references inside of itself.

Allow packet-level environment overlay

The best practice is to set a particular memory and store (and, in the future, prefix) that uses a distinctive prefix (e.g. komoroske.com) at the root of each seedGraph to make sure that all of the references within it use the same model/store/prefix.

However, this quickly gets extremely repetitive in packets that contain lots of entrypoints, and for ones that are intermediate entrypoints, requires duplicative environment let-multi statements that were already done by seeds that called them.

We should add a property, environment, in a packet. It's an optional property, and should specify an EnvironmentData overlay. The starter environment for each seed in the packet will have that environment overlaid on whatever the previous environment was. It's as though each root-level seed is wrapped in a (hidden) let-multi with those properties.

What should the property be named? environment is a bit misleading because it implies it's the entire environment when really it's an overlay. let is a bit wrong because semantically it's a let-multi. vars is wrong because we use var to mean retrieve the environment. env is wrong because that's the only place in user-facing semantics that call it that.

When combined with #37, the convention will be to set a prefix in the packet-level environment every time.

A typical use case for this will be to set a namespace, but also set things like the model. The latter requires the former. Perhaps have the value be an object or an array of EnvironmentData. If it's an array then it should create the environment by overlaying each time. Actually, this might not be required, because namespace is late bound, so you can set it in the same block as another thing.

Originally tracked in #36.

Fix the sub-seed overlay problem documented below.
Update complex examples to not use top-level let but instead use environment.
Fix the edge case for nested seeds getting the environment re-set for them and possibly overriding intermediate lets.
