prompt-garden's People

Contributors

jkomoros

prompt-garden's Issues

Add a seed_type embed

Blocks #23

  • Support other embedding models
  • Support batching of multiple bits of text
  • If provided multiple bits of text, batch it together in a single remote API call (I think this is also tracked in #23)

Testing

  • Remove the test seeds from seeds/default.json and just load them directly

Support other LLM providers

  • Create example seeds to try the same sub-graphs with different providers and see which ones are better
  • Document
  • countTokens should be able to take multiple things at once for one round-trip to production API for the ones that are not locally computed. (assuming those apis allow passing multiple bits of text)
  • Cache countTokens results for strings (since for some providers they go to the server)
  • Move growPrompt switch statement to being in providers/index.ts
  • computePrompt et al, if verbose is true, should print the model being used to the console.
  • Pop out the code to create a mocked embedding and use environment.random()
  • getProvidersWithAPIKeys() should have a defined priority order (update documentation in the "Selecting which model to use" section)
  • Suppress node warning when running using Google (fetch warning)
  • modelName should be a verified set of literals and each provider should verify it matches what they expect
  • google tokenCount should return a mocked count if environment says mock
  • if recall embedding is '' then just use a random one
  • Move countTokens switch statement to being in providers/index.ts
  • Move providers/index.ts to llms.ts?
  • Factor out Google request library and fetches into one google provider (and maybe put all of openai provider into one file too)
  • if you recall from a memory that was set with a different embedding size it fails (maybe not?)
  • have a way to switch LLM providers all at once, (completion_model and embedding_model at once) (if provider is set and embedding_model or completion_model is not, use the default model of type for provider. And then set provider based on the first provider that has a non-changeme API key)
  • Some way for models to detect which providers have API keys so they can automatically iterate through models to test sub-graphs (this might just be a meta-node, if getAPIKey were changed to by default not throw but return '')
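
The countTokens caching item in the list above could look roughly like the sketch below. It is an illustration only: TokenCounter and withTokenCountCache are hypothetical names, and the real countTokens signature may differ. The idea is simply that a repeated (model, text) pair should not trigger a second round trip to the provider.

```typescript
// Hypothetical shape of a provider-backed token counter (an assumption, not the library's API).
type TokenCounter = (model: string, text: string) => Promise<number>;

// Wraps a counter so each distinct (model, text) pair is only sent to the provider once.
const withTokenCountCache = (inner: TokenCounter): TokenCounter => {
  const cache = new Map<string, Promise<number>>();
  return (model, text) => {
    // JSON.stringify of the pair gives an unambiguous cache key.
    const key = JSON.stringify([model, text]);
    let result = cache.get(key);
    if (!result) {
      result = inner(model, text);
      cache.set(key, result);
    }
    return result;
  };
};
```

Caching the promise rather than the resolved number means two concurrent requests for the same string also share one call.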

Local profiles

The CLI should take a pointer to a profile directory (default to .profile) and in it store the memory stores, the logs of runs, etc.

  • Create Profile, an in-memory version of profile. FilesystemProfile inherits from it
  • Add the fetchLocal and the other local getter to profile
  • Add a logger and log everything through it
  • Have all logging happen to profile, .info() and have it decide if it should store or not based on verbose
  • FilesystemProfile should log to .profiles/${NAME}/debug.log.
  • CLI should allow setting a profile to use
  • Better ProfileFilesystem.log() format and resolve TODOs
  • Rename known environment variables to $NAME so as long as other seeds don't start with a '$' then it's fine (no, the convention should be everyone prefixes a unique identifier in front of their variable)
  • CLI shouldn't create a new profile unless --create is passed (to avoid creating a new profile when you made a typo)
  • In the root of the profile should be metadata.json which mainly just stores a version. When loading up the profile, we check to see if the profile version is one we understand. For now if it doesn't match, just barf, but in the future we can upgrade the profile.
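
A minimal sketch of the metadata.json version check described in the last item, assuming the file holds a JSON object with a numeric version field. PROFILE_VERSION, ProfileMetadata and checkProfileMetadata are hypothetical names chosen for illustration.

```typescript
// The only profile version this sketch understands (an assumed value).
const PROFILE_VERSION = 0;

type ProfileMetadata = { version: number };

// Parses the contents of metadata.json and throws if the version is unknown.
const checkProfileMetadata = (raw: string): ProfileMetadata => {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('metadata.json must contain an object');
  }
  const version = (parsed as { version?: unknown }).version;
  if (typeof version !== 'number') {
    throw new Error('metadata.json must contain a numeric version');
  }
  if (version !== PROFILE_VERSION) {
    // For now just fail; a later version could upgrade the profile here instead.
    throw new Error(`Unsupported profile version ${version}`);
  }
  return { version };
};
```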

Seed_Type `prompt`

  • Support non-chat-based prompt (e.g. text-davinci-003)
  • Allow supporting other parameters (#5 )
  • Rename to complete (and rename input to prompt?)
  • Should allow setting model as an optional completionModelID parameter, defaulting to env.completion_model (all LLM-hitting methods should)

Add a command line tool

  • Allow selecting a different seed to execute as a parameter
  • Command should print help
  • Don't require -- in the command
  • A way to load a seed from a specific file
  • For ids that are ambiguous (in multiple packets), ask using a choice UI in CLI first.
  • Private seeds should not be shown unless --all is passed
  • Add a list command that prints out all seeds
  • Use inquirer live menu to select a seed to print
  • Allow passing override key/value pairs to set in environment
  • diagram includePrivate
  • include description in list.
  • A tool to print all expanded seeds (to diagnose unrolling issues)
  • a --mock command to set the environment to mock
  • Add a command to list all named let/let-multi properties in a packet (or rooted from a seed). This will help find typos.
    • Generally an analyze command that prints out var/lets (showing ones that are mismatched), store key values and namespaces, memory namespaces, whether there's a dynamic reference, and any remote seeds it references
  • Allow --packet to pass multiple packets. Everywhere it's used (e.g. garden.diagram), have the treatment of an empty array

Seed_type `random`

Ideally there should be a way to seed it ... but then not have every single call just return the seeded value.

Maybe a random and a seed-random, and only when seed-random is called is the random number generator for sub-seeds re-randomized?

Environments get a random number generator. cloneWithSeed(newSeed) changes the seed. clone() uses the same generator as the top level.

  • random seed type
  • Add random.min
  • Add random.max
  • Add random.round = 'none', 'floor', 'ceiling', 'round'
  • Add random.choice
  • random-seed seed type (default to now())
  • Add a shuffle seed type
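
The clone() / cloneWithSeed() behavior described above could be sketched as follows. This is a sketch under assumptions, not the library's implementation: the generator is mulberry32 (a small, widely used seedable PRNG swapped in for illustration), and everything other than the random(), clone() and cloneWithSeed() names is hypothetical.

```typescript
type RandomGenerator = () => number;

// mulberry32: a tiny seedable PRNG that returns floats in [0, 1).
const mulberry32 = (seed: number): RandomGenerator => {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
};

class Environment {
  private readonly rnd: RandomGenerator;

  constructor(rnd: RandomGenerator = Math.random) {
    this.rnd = rnd;
  }

  // Every seed that needs randomness asks the environment for it.
  random(): number {
    return this.rnd();
  }

  // clone() shares the parent's generator, so sub-seeds continue the same sequence.
  clone(): Environment {
    return new Environment(this.rnd);
  }

  // cloneWithSeed() starts a fresh, reproducible sequence for everything below it.
  cloneWithSeed(seed: number): Environment {
    return new Environment(mulberry32(seed));
  }
}
```

Because clone() shares the generator, seeding once near the root makes a whole run reproducible without every call returning the same value.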

Add caching and intermediate results

Currently seed.grow() returns the final result, and there's no good way (other than turning on verbose and looking at logs) to see what the intermediate results are.

Also, seeds, when grown with the same environment, should reuse the same result unless force is passed.

.grow() should return a Plant, which is a particular instantiated version of a seed with a given environment. seed.plant(env).grow(force). Seeds have a WeakMap memoization of env -> plant.

Garden should gain a plant(seedId, env) convenience wrapper and grow(seedID, env) convenience wrapper.
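
One way the seed.plant(env).grow(force) shape might look, as a rough sketch only. The Value and Environment types and the compute callback are assumptions made to keep the example self-contained; only the plant(), grow(force) and WeakMap ideas come from the description above.

```typescript
type Value = string | number | boolean;
// Environments are treated as read-only objects so they can serve as WeakMap keys.
type Environment = Readonly<Record<string, unknown>>;
type Compute = (env: Environment) => Promise<Value>;

// A Plant is one seed instantiated with one environment; it remembers its result.
class Plant {
  private readonly compute: Compute;
  private readonly env: Environment;
  private cached?: Promise<Value>;

  constructor(compute: Compute, env: Environment) {
    this.compute = compute;
    this.env = env;
  }

  // Reuses the earlier result unless force is passed.
  grow(force = false): Promise<Value> {
    if (force || !this.cached) this.cached = this.compute(this.env);
    return this.cached;
  }
}

class Seed {
  private readonly compute: Compute;
  // Memoizes env -> plant without keeping discarded environments alive.
  private readonly plants = new WeakMap<Environment, Plant>();

  constructor(compute: Compute) {
    this.compute = compute;
  }

  plant(env: Environment): Plant {
    let plant = this.plants.get(env);
    if (!plant) {
      plant = new Plant(this.compute, env);
      this.plants.set(env, plant);
    }
    return plant;
  }
}
```

Note that the WeakMap is keyed on object identity, so two structurally equal environments would still get separate plants in this sketch.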

(Ensure that environments truly are readonly with typescript types)

See also a few comments in #3.

Schema checking for seed packet files in VSCode

  • When you fill in nested seeds the schema checking stops working. I think this happened when seed_type object was added, since now every sub-key could just be a generic object, annoyingly...
  • It actually works OK for things that are not value or nonObjectValue (e.g. recall.k).
  • schema checking on example-utility.json doesn't work.... but it does for e.g. example-import.json?
  • Consider using strictUnions (https://github.com/StefanTerdell/zod-to-json-schema and passing definitions for common ones)
  • When typing in a new seed and doing type, the autocomplete is only to a subset. It's only later when you type in the type that it tells you which parameters to do
  • Make it so the fields like comment, description, private for autocomplete show up later, right now it's hard to tell which fields are actually the main ones for a seed_type
  • When you do a let-multi with a sub-object schema checking doesn't work
  • Include more definitions types (clearer error messages if nothing else)
  • Lock down the types that are expected in the SeedData shape to be the type they actually expect (this will help with schema checking)
  • A bug in let seeds: you can't put a non-object. (Also in #18 )

A way to mark some seeds as private

When making a complex seed packet, there are often a complex set of nested seeds. Some of them are entrypoints that should be exposed to a user, and some are just internal procedures that are factored out for convenience, but expect to be run in a context where certain variables are already set, for example, and will fail if they aren't.

Right now it's not possible to distinguish either of them.

Add a packet-level optional property, entrypoints:

{
  "version": 0,
  "entrypoints": {
    "main" : "This is a main seed"
  },
  "seeds": {
    "main": ...,
    
  }
}

The keys of entrypoints must all be seed IDs in this packet. The value is a user-facing description of what the seed does and its usage. In the future there might be other properties attached to an entrypoint, at which point the value for each entrypoint can be a sub-object with keys, and the case of just being a string will be sugar for a config object of shape {"description": ${STRING}}.

If an entrypoint map is missing, it means that the entrypoints are implicitly all seeds, and the usage notes are the description property of each seed, or '' if none is provided.

The CLI should use this information to, for example, allow listing entrypoints in a seed packet and choosing one to run. (Although it still should be possible to list all seeds in a packet with a CLI option). See #1.

If an entrypoints map is provided, at packet load time we should validate that each seed is a known ID. See #39

Originally tracked in #36.

Another way to do this is to add more properties, like hidden or entrypoint:true to Seed definition, and handle this entirely automatically and with convention without top-level properties in the packet. For example, how often will you NOT want to include a top-level seed as an entrypoint?

Yeah, maybe the approach is a SeedDataBase.hidden? : boolean field. Sub-seeds that are unrolled get a hidden:true unless they already have an explicit hidden:false. And you can provide a hidden:true to top-level seeds for implementation details you want to hide. And then the CLI just knows to not print hidden seeds unless -a is passed.

No, it should be private. A seed marked private will not show up in the CLI (unless -a is passed), and also may not be included from another seed packet (add a garden.publicSeed(ref) that will fail if the seed is private, and all sub-seed fetches use publicSeed).

  • CLI shouldn't show private seeds unless -a is passed
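
A small sketch of how the private flag could drive both behaviors, under simplifying assumptions: a packet is reduced to a map of seed IDs to seed data, and publicSeed takes the packet and an ID rather than a full reference. The type names and listSeedIDs are hypothetical.

```typescript
type SeedData = { type: string; private?: boolean; description?: string };
type Packet = { version: number; seeds: Record<string, SeedData> };

// What a CLI list command might print; private seeds appear only when all is true (-a).
const listSeedIDs = (packet: Packet, all = false): string[] =>
  Object.entries(packet.seeds)
    .filter(([, seed]) => all || !seed.private)
    .map(([id]) => id);

// Lookup used for references coming from another packet; private seeds are refused.
const publicSeed = (packet: Packet, id: string): SeedData => {
  if (!Object.prototype.hasOwnProperty.call(packet.seeds, id)) {
    throw new Error(`No seed with ID ${id}`);
  }
  const seed = packet.seeds[id];
  if (seed.private) throw new Error(`Seed ${id} is private`);
  return seed;
};
```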

Complex examples

  • Move current examples to example-simple.json
  • Create example-complex.json
  • An example that prompts for name and then returns it.
  • An example where it prompts for favorite things until a user gives '' (unless there are already some things) and then writes a limerick about the favorite things.
  • Favorite things limerick should allow a user to put in more things even if they already have a few.
  • Favorite things limerick should also include the user's name.
  • Remove unnecessary lets in the example-complex.json seeds
  • Generalize prompt-name into a meta-seed, where it takes the string to show to the user in the prompt, the location to store the value in.
  • Have a method to return all embedding models that have keys, all completion models, and all providers that have keys set
  • Have a seed that takes another seed and runs it on different llm providers and keeps track of answers
  • Have a seed that is told to look at a prompt and figure out inputs that might break the prompt
  • Add utility.json seed packet of convenience methods for things like prompt-var

See also #11.

Add a `wait` seed_type

The simplest one simply waits a configurable number of milliseconds.

But if there were some way to wait until a signal comes from another system, you could create agents that are complex graphs of seeds.

Allow remote memories

Currently memories are always local to a profile. But ideally you should be able to 'dial' a remote memory, and they should possibly require an access key to read and/or write.

At that point it's basically a polymath instance (although with the ability to write to the memory sometimes)

An extension of #23.

Needs a way to allow read and write privileges on remote memories. (And also to make sense of read/write privileges on local memories)

Add a `dynamic_reference` seedtype

This would allow calculating the ID to include

Type: dynamic.

A reference property that expects an object, and will interpret it like a SeedReference. (Wait, won't the engine interpret a reference as a reference to execute?). Maybe reference should be a packedSeedReference. (That has the benefit that it's not interpreted by the rest of the machinery as a reference)

Make sure the use case of "pass in a seed graph root to execute" works.

Update documentation for things like seed.references() to make it clear that it does not include dynamic seed references.

dynamic should have an allow_remote property. Unless it's set to true, then a reference that goes to a remote packet won't load.

  • Add reference, which takes seed_id and packet, and auto-merges the current location of the seed and returns a packedReference.
  • Add dynamic, which takes a reference (test to make sure it rejects remote calls)
  • Mermaid diagram should render dynamic in green
  • Add an allow_remote optional boolean to dynamic.
  • Don't fail remote access if the remote packet is already loaded.
  • A way to set a secret key, disallow_remote, that seeds can't read back; sub-seeds below that point in the environment will throw if remote access is attempted, even if allow_remote is set. The same is true for direct seed references
    • It's not so much secret as much as a key that can never be set to false. It starts as false but then can only be set high.
    • (Same thing for #45, just with disallow_fetch )
  • Add an example to example-complex.json
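
The "can only be set high" behavior of disallow_remote (and disallow_fetch) might be enforced along these lines. This is a sketch under assumptions: the Environment class shown here, STICKY_KEYS, and remoteAllowed() are hypothetical, and the real environment has many more responsibilities.

```typescript
// Keys that start unset and, once true, can never be lowered again.
const STICKY_KEYS = ['disallow_remote', 'disallow_fetch'];

type EnvironmentData = Readonly<Record<string, unknown>>;

class Environment {
  private readonly data: EnvironmentData;

  constructor(data: EnvironmentData = {}) {
    this.data = data;
  }

  // Seeds read values through get(), which refuses to reveal sticky keys.
  get(key: string): unknown {
    if (STICKY_KEYS.includes(key)) throw new Error(`${key} may not be read by seeds`);
    return this.data[key];
  }

  // The engine (not seeds) consults this before loading a remote packet.
  remoteAllowed(allowRemote: boolean): boolean {
    return allowRemote && this.data['disallow_remote'] !== true;
  }

  // Child environments may raise a sticky key but never lower it.
  clone(overrides: EnvironmentData): Environment {
    for (const key of STICKY_KEYS) {
      const lowering = key in overrides && overrides[key] !== true;
      if (this.data[key] === true && lowering) {
        throw new Error(`${key} can only be raised, never lowered`);
      }
    }
    return new Environment({ ...this.data, ...overrides });
  }
}
```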

Create a seed graph visualizer

Will help with debugging control flows

  • Seed.references() should return the direct nodes it references
  • A Packet() which has a diagram property that outputs a mermaid diagram definition
  • A tool to actually generate an SVG of the mermaid diagram and open a preview (https://www.npmjs.com/package/@mermaid-js/mermaid-cli)
  • Figure out why some references aren't showing up in the links
  • Add a diagram output in README
  • if includePrivate is false, the output is incorrect -- private sub-seeds of public seeds are rendered (but with an ugly name), and links from a public one to a private one then back out to a public one, instead of showing a link from the public one to the public one, shows no link between them
  • If includePrivate is false, and a public seed links to private ones, which then link out to a public one, show a link from the public one to the public one.
  • Include tooltips with more information on each one, including the values they have set
  • Links that contain numbers aren't handled correctly, they don't match (if it ends with a number, add an underscore)
  • Elide text that is a prefix of their parent (for private ones) for shorter graph names
  • Private seeds should be rendered a bit darker
  • Render remote subgraph in a different color
  • Allow specifying to diagram() a location or list of locations to filter to. (If not provided, then use Object.keys(this._seeds)). Then any external ref whose absolute location isn't in the set of locations we're using will get the remote treatment that we currently do for https seeds.
    • Allow specifying this subset via CLI argument (same argument we use to say list a packet, --packet)
  • A mode to pretend like all private seeds are a part of their parent and just squash them in
  • Use sub-graphs for different packets (https://mermaid.js.org/syntax/flowchart.html?id=flowcharts-basic-syntax#subgraphs)
  • Add a simple CLI to print the diagram of the packet to stdout and exit
  • Allow includePrivate
  • fully remote references should show up
  • Ensure references between seed packets show up correctly

A `namespace` environment variable

It's best practice to attach a prefix (e.g. komoroske.com) to variable names in var/let, to store IDs, and memoryIDs. This helps ensure that different seeds from different authors don't stomp on each other's state.

But it's super tedious and annoying to add it everywhere within a file.

Add a namespace environment variable.

The behavior is:

When computing the variable name for var/let, the memory ID for memory, or the store ID for store, if the prefix property is not empty, and the var name is not a known environment name (skip this for storeID and memoryID), and the var does not already include something before a prefix, replace the variable name / memory ID / store ID with prefix:varname.
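
The rule in the paragraph above can be sketched as a single function. This is an illustration under assumptions: applyNamespace and NameKind are hypothetical names, and the set of known environment names is a placeholder list rather than the library's actual one.

```typescript
const NAMESPACE_DELIMITER = ':';

// Placeholder list of names the environment itself understands (an assumption).
const KNOWN_ENVIRONMENT_NAMES = new Set(['namespace', 'completion_model', 'embedding_model', 'memory', 'verbose', 'mock']);

type NameKind = 'var' | 'memory' | 'store';

// Returns the name to actually use for a var, memory ID, or store ID.
const applyNamespace = (name: string, namespace: string, kind: NameKind): string => {
  // No namespace configured: leave everything untouched.
  if (!namespace) return name;
  // Known environment names are never prefixed (this check is skipped for memory and store IDs).
  if (kind === 'var' && KNOWN_ENVIRONMENT_NAMES.has(name)) return name;
  // A name that already contains the delimiter carries its own namespace.
  if (name.includes(NAMESPACE_DELIMITER)) return name;
  return namespace + NAMESPACE_DELIMITER + name;
};
```

For example, with namespace komoroske.com a var named user-name would resolve to komoroske.com:user-name, while other.com:user-name would be left alone.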

Another approach is to only do this for varNames etc that start with ..

When used in combination with packet-level environment (issue #38), the convention is to set the prefix at the top-level packet environment and never think about it again.

This expansion is done at the time of the seed being executed, not before. The pro is that we can late-bind the prefix (so, for example, you don't need to prepend it in the packet-level environment if you're also specifying a model, because it won't be prefixed until being used). The con is that static analysis tools that look at the seed graph will need to know about this behavior to appropriately figure out the variable names (and also perform lets/vars in that graph traversal).

Document in README.md how namespace works. Describe the inner workings, but then summarize: "Include a namespace in the environment of each packet. Whenever you need to reference a var, memory, or fact from another author, include author-prefix.com: in front, and otherwise don't think about it."

validateSeedPacket should verify that any namespace that is set does not include :.

How to handle _default_memory et al? In some cases you really do want to overlap in the commons. If you set a namespace, how do you say don't namespace this name for one of memory, or store, or whatever? Maybe the answer is that all of those memory/store ids are namespaced, just with :_default_store or whatever. And that way if you want to do a memory name that's not the same as the namespace you're using you can just manually configure :_default_store. (_default_store should have its name changed if we expect people to every so often type it...)

Originally tracked in #36.

  • Update the types for anything that takes a namespaced name to explicitly expect only up to a single :
  • Add a isNamespaced(input : StoreID | MemoryID | VarID) : boolean
  • Shouldn't type of letMulti be a record<varName, inputValue>?
  • Make sure that empty store and memory ids work (and just make _default_store etc be '')
  • Update documentation, including base namespace
  • Update the examples to use this where appropriate
  • Document the explicit shared namespace as a commons
  • When did name-limerick break? (broke in cb74ab6, it is a object auto-expansion that's not working)

Add a `map` and `filter` seed_type

For doing a sub-seed while letting each value in the map.

It would be possible to do this entirely in userland if there were + and other arithmetic

  • push
  • spread (takes a and b, a and b can be objects or arrays (but must match). a or b can also be a non-array item, which will be automatically wrapped in arrays)
  • index (takes a container and a search. Container can be a string, an array, or an object)
  • Add a fromEnd bool to index
  • an includes, starts_with, and ends_with function
  • a slice() which can work on an array or string

Support seed_type `input`

  • Work in browser context
  • Work in CLI context
  • Add an example seed that contains an input that is passed to a prompt.
  • Add an object seed type that constructs an object of values.
  • Add a property seed type that plucks a given value from an object
  • Add a let seed type that stores an object in an over-written environment (grow will need the environment passed to it explicitly)
  • Add a var seed type that retrieves a named object from the environment (except for explicitly enumerated secret keys like openai_api_key).
  • Let should barf if you try to set a secret key
  • Let seed schema won't let you do non-objects for values or block
  • Allow choices, not just freeform text
  • Allow an enumerate for memory, stores (with an optional prefix)
  • Rename environment -> context

Add a `fetch` seed_type

Allows fetching from a remote origin. This is the primary way to get things in and out. It's also dangerous!

Fetches from seeds from local seed packets should be allowed, but the first time a remote seed fetches from a given URL, the user should get a chance to confirm (with a 'allow from now on'). The profile should store which seed packet locations the user has allowed to remember which ones need a confirmation.

The fetch should allow configuring a subset of fetch parameters.

Having an ability to store a secret store for a seed packet that only it can reference would be useful for API keys for fetch that the library isn't aware of for its own use.

property seed type should work for dotted gets so it's possible to traverse into them

  • Add a fetch for remote URLs
  • Headers
  • Body
  • (other parameters)
  • format (json (default), string)
  • Add a fetch for local / relative URLs (if it needs to be supported differently)
  • Add disallow_fetch protected property
  • An option to not throw for http errors and instead pass back a string
  • When a seed from a remote packet initiates a fetch, ask the first time.
  • Filesystem profile should persist locations

Add an `array` seed_type

It should take an array of sub-statements to execute, and return an array of each one's values.

Perhaps it should be one type with a boolean?

Kind of similar to type object, and will also likely have some funky type issues like that one did

Should just be array, and then later have a parallel optional bool
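
A rough sketch of an array seed with an optional parallel flag, assuming each sub-statement can be represented as a function returning a promise. Grower and growArray are hypothetical names used only for this example.

```typescript
// Stand-in for "a sub-statement that can be executed" (an assumption).
type Grower<T> = () => Promise<T>;

// Executes each item and returns their values in order.
const growArray = async <T>(items: Grower<T>[], parallel = false): Promise<T[]> => {
  if (parallel) {
    // Start everything at once; Promise.all preserves the input order of results.
    return Promise.all(items.map((item) => item()));
  }
  // Default: run one at a time so side effects happen in a predictable order.
  const results: T[] = [];
  for (const item of items) {
    results.push(await item());
  }
  return results;
};
```

Keeping sequential execution as the default matters for seeds with side effects (for example input prompts), where start order is visible to the user.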

Support seed_type `template`

{
  "type": "template",
  "string": "This is a {{name}} that is {{age}}.",
  "values": {
    "name": "Bob",
    "age": 13,
    "ignoredKey": "This key is ignored because it's not used in the template"
  }
}

Will require a way for values to be an object. The test in getProperty will have to be different
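
For illustration, rendering the template above could be as small as the following hand-rolled sketch. renderTemplate is a hypothetical helper, not the library's function; it handles only plain {{name}} placeholders, ignores unused keys, and throws on missing ones.

```typescript
// Replaces each {{name}} placeholder with the matching entry from values.
const renderTemplate = (template: string, values: Record<string, unknown>): string =>
  template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (_match, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`Missing value for ${name}`);
    }
    return String(values[name]);
  });

const rendered = renderTemplate('This is a {{name}} that is {{age}}.', {
  name: 'Bob',
  age: 13,
  ignoredKey: "This key is ignored because it's not used in the template",
});
// rendered is 'This is a Bob that is 13.'
```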

extract type

Should be a mirror of template. Should be hand rolled and be {{ }}. Name of variable then pipeline type coercion. Allow fuzzy types for Boolean “yes” “y”, etc.

Allow explicit regexs. Allow loops (vars inside will be an array. ) allow optional (ok for it to not match)
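
The fuzzy Boolean coercion mentioned above might look like this sketch. The accepted literals are an assumption picked for illustration; parseFuzzyBoolean is a hypothetical name.

```typescript
// Literals accepted as true / false (an assumed list, compared case-insensitively).
const TRUE_LITERALS = new Set(['true', 't', 'yes', 'y', '1']);
const FALSE_LITERALS = new Set(['false', 'f', 'no', 'n', '0']);

// Coerces extracted text such as "Yes" or "n" into a boolean, or throws if unrecognized.
const parseFuzzyBoolean = (input: string): boolean => {
  const normalized = input.trim().toLowerCase();
  if (TRUE_LITERALS.has(normalized)) return true;
  if (FALSE_LITERALS.has(normalized)) return false;
  throw new Error(`Not a recognizable boolean: ${input}`);
};
```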

Should it just be golang template syntax?

Loops are handled like this:

{{ loop items|modifier }}
  {{ index|int|optional }}) {{item}}
{{ end }} 

For the content

{
  items: [
    {index: 1, item: 'foo'},
    {index: 2, item: 'bar'}
  ]
}

Would generate:

1) foo
2) bar

The content after the loop command, and before the first newline, is removed. The text before the loop end modifier that comes after a \n is also ignored.

Loop implies that there is an object with the given name in the results. Each item in the array is an object whose properties are used within the loop context for the inner items.

Loops may nest.

  • Create own template that splits on {{ then for each sub segment checks to verify configuration is valid and puts out a template part.
  • Remove pupa dependency
  • Handle {{, }} and | inside of string values in arguments to patterns
  • Handle " wrapping default
  • Handle quoted strings within argument
  • Handle escaped quotes within the argument string
  • Support template.extract()
  • Make sure template.extract() handles special characters in the string literal portions
  • Support type:boolean
  • extract should be able to either error or return defaults if no match
  • Support type:number
  • Support arrays and fors
  • Support optional
  • Support pattern:'regex'
  • Support default:'val'
  • Add a loop sub type to TemplatePart
  • Extract the loop expression
  • (Ensure loop expression works for nested loops)
  • Get render working for loops
  • Test
  • Should the syntax be @loop:name? Like, the argument to the loop is the name? That's less weird than the loop name being a random piece
  • Remove whitespace past a loop start up to and including the newline, and before the {{end}} up to the newline. (The intention of this is to make writing extraction loops with multi-lines and whitespace easier, but it might just be weird)
  • Allow a type of variable that takes multiple choices, and is a concatenation of choices. Have it be a choice modifier, that can show up multiple times and each time accumulates one more literal choice. The matching regexp should be a union of the literals.
  • Document loops
  • Get optional working for loops
  • cache regexps in loops
  • if pattern has a quantifier without a non-greedy version (which won't work with loops), throw or silently fix it.
  • Support optional modifier on loop
  • If the name is _ then don't extract anything for it
  • Is there a bug where loops currently throw away anything that doesn't match between loop iterations (it just returns matchAll and is totally fine if there's arbitrary text between loop iterations)? That's useful for whitespace but presumably the behavior is weird for non whitespace.
  • Add a whitespace modifier that matches whitespace. Typically used with _.
  • Add an else modifier for loop (what shows up if no items to loop)
  • Add a json modifier type, to extract json (or render it via JSON.stringify())
  • make template.render() take a type that allows JSON objects (and remove the eslint-disable-next-line in the json rendering test)
  • Test that dotted property extract in a loop works. Ideally the result would be 342, but it is 3 because of non-greedy matching. Really we want greedy matching to happen as long as the characters don't match a {, I think?
  • Switch control character from @ to %
  • Add {{- and -}} modifiers to strip whitespace like Golang does
  • Ensure that boolean extraction is case insensitive
  • Get extract working for loops and test
  • Document extract

Allow remote seed packets

  • Allow seeds to reference seeds in other packets based on relative location (normalize nested references on packet planting based on the packet location)
  • Allow https absolute paths and fetching
  • Document regex types with describe
  • Seed should immediately return Plant() which has a value() that you await. This will be the way to expose intermediate results of sub-seeds.
  • Actually support mocked garden.fetchRemoteSeedPacket and test
  • A way to fetch all sub-seed referenced locations up front and validate the references before starting a computation (check for loops in graphs)
  • Add a test for loading a seed in another packet that was already loaded, and then also one that must be loaded due to that reference
  • Use new URL() with a file://localhost/foo file for relative
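
The relative-location normalization in the list above could be sketched with the standard URL class, following the file://localhost base idea from the last item. resolveSeedLocation and LOCAL_BASE are hypothetical names; the assumption is that local packet locations are plain relative paths and remote ones are http(s) URLs.

```typescript
// Dummy base that lets the URL class resolve local relative paths.
const LOCAL_BASE = 'file://localhost/';

// Resolves a reference found inside a packet against that packet's own location.
const resolveSeedLocation = (packetLocation: string, reference: string): string => {
  const isRemote = /^https?:\/\//.test(packetLocation);
  const base = isRemote ? packetLocation : LOCAL_BASE + packetLocation.replace(/^\/+/, '');
  const resolved = new URL(reference, base);
  // Anything that ends up remote keeps its full URL.
  if (resolved.protocol !== 'file:') return resolved.href;
  // Local results go back to a plain relative path.
  return resolved.pathname.replace(/^\/+/, '');
};
```

Using the URL class for both cases means `./` and `../` segments are normalized the same way whether the packet was loaded from disk or over https.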

Allow nested seed packets

A SeedPacket where the seeds have instead of SeedRef, a SeedData, which is expanded to be a topline seed.

  • Add an id field to seeds (which is validated to verify it matches the local ID) since some seeds will otherwise have implied IDs. If it's not filled in then it will be filled in with a generic and deterministic ID so you don't HAVE to create one
  • expandSeedData should actually unroll seeds
  • Add tests for nested data (and all ID auto-generating behavior)
  • Clean up naming (e.g. Nested for nested types, versus the current of empty for input and Expanded (confusingly) for ones that aren't nested)
  • Document

Do packet-level verification

Type checking helps find a number of errors in seed packet definition at parse time. But there are other possible errors that it would be good to catch earlier.

For example, SeedReference to an invalid seed in the same packet. In the future there will likely be other problems (and possibly other warnings, a lint). Once there is a verifyPacket() for an unrolled packet, I'm sure we'll figure out other things to verify.

Once this is done, a convention for let/var is to have a named seed that wraps it:

{
  "get-user-name": {
    "type": "var",
    "name": {
      "id": "get-user-name-name"
    }
  },
  "get-user-name-name": {
    "type": "noop",
    "value": "user-name"
  }
}

That way you can fail at packet parse if you have the wrong variable name.

If you want this to be internal, make sure it's marked private:true. But if it's private:false then it's able to be used by other seed packets.

Originally tracked in #36.

  • Verify that every in-packet reference in a packet points to a seedID that exists.
  • Warn if there is a let on a private seed that has only one associated var, which is an unnecessary let. Return an error (not thrown) from verifySeedPacket. (don't throw it if there's an out-of-packet seed reference that we haven't loaded)
  • A noop seed type (like log but just literally returns value).
  • Warn if a namespace is not set
  • The CLI should print out warnings if --warn is set
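
The first check in the list (every in-packet reference points to a seed that exists) might be sketched like this. It assumes an already unrolled packet, treats any property value of the form {"id": ...} with no type and no packet as a local reference, and returns errors instead of throwing. The type names and verifyPacket signature are illustrative assumptions.

```typescript
type SeedReference = { id: string; packet?: string };
type SeedData = { type: string; [property: string]: unknown };
type Packet = { seeds: Record<string, SeedData> };

// A local reference is an object with a string id, no packet, and no type of its own.
const isLocalReference = (value: unknown): value is SeedReference => {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as { id?: unknown; packet?: unknown; type?: unknown };
  return typeof candidate.id === 'string' && candidate.packet === undefined && candidate.type === undefined;
};

// Collects one error per in-packet reference that points at a missing seed.
const verifyPacket = (packet: Packet): Error[] => {
  const errors: Error[] = [];
  for (const [seedID, seed] of Object.entries(packet.seeds)) {
    for (const [property, value] of Object.entries(seed)) {
      if (!isLocalReference(value)) continue;
      if (!Object.prototype.hasOwnProperty.call(packet.seeds, value.id)) {
        errors.push(new Error(`Seed ${seedID} property ${property} references unknown seed ${value.id}`));
      }
    }
  }
  return errors;
};
```

With the get-user-name example above, a typo in "get-user-name-name" would surface as a returned error when the packet is parsed rather than as a failure while a seed is running.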

Allow computed objects and arrays without using `object` or `array`

If you want to have an object or array with computed sub-properties, you have to wrap it in either a type:object or type:array. This is annoying, easy to get wrong, and adds an unnecessary level of extra indirection during authoring.

As a pre-processing step before the seed packet has nested seeds unrolled, process the object. Iterate through them, and if we find any with a SeedReference or SeedData property, then replace the parent with a type:object wrapper (and do the same for arrays, too). This should be a pretty simple transformation and handle most cases fine.

(Once we make it so seed references have a seed, not id property, it will make this behavior even more resilient... as long as your sub-objects don't have a type or seed property then it will work as you think.)

  • Make it work for objects
  • Make it work for arrays
  • Document
  • Update type hierarchy to allow objects and arrays that have seeds in them...
  • Update complex examples to use it
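
The pre-processing step described above might be sketched as a recursive transform. Several details are assumptions made for the example: a value counts as seed-like if it is an object with a type or id key, and the wrappers use properties and items as their payload keys. wrapComputed is a hypothetical name.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

const isPlainObject = (value: Json): value is JsonObject =>
  typeof value === 'object' && value !== null && !Array.isArray(value);

// Assumption: anything with a type (SeedData) or id (SeedReference) key is seed-like.
const isSeedLike = (value: Json): boolean =>
  isPlainObject(value) && ('type' in value || 'id' in value);

// Wraps plain objects/arrays that contain seeds in type:object / type:array seeds.
const wrapComputed = (value: Json): Json => {
  if (Array.isArray(value)) {
    const items = value.map(wrapComputed);
    return items.some(isSeedLike) ? { type: 'array', items } : items;
  }
  if (!isPlainObject(value)) return value;
  const properties: JsonObject = {};
  for (const [key, child] of Object.entries(value)) {
    properties[key] = wrapComputed(child);
  }
  // Seeds and references keep their own shape; only their children are processed.
  if (isSeedLike(value)) return properties;
  return Object.values(properties).some(isSeedLike) ? { type: 'object', properties } : properties;
};
```

Because children are processed first, a wrapper created deep inside a structure makes each enclosing plain object or array get wrapped as well.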

Documentation

Document comprehensively how to use the library

  • Document how to create your own cards
  • Show a meta-prompt question in example.json

Add `persist` and `retrieve` seed_types

A permanent version of let/var.

Should store in a simple key/value store in .profiles like AssociativeMemory.

Ensure it's inputValue (that is, only of types that are possible to persist in JSON)

The convention, like with environment, should be to prefix a domain you control to the front of the key: "komoroske.com:var"

  • Manually test store/retrieve in filesystem mode
  • Delete should return true if the key existed, false otherwise
  • Return null if value doesn't exist
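
An in-memory sketch of the key/value store behavior listed above: values restricted to JSON-compatible types, delete reporting whether the key existed, and null for missing keys. The Store class and its method names are assumptions; a filesystem-backed profile would persist the map under .profiles instead.

```typescript
// Only values that survive a JSON round trip may be stored.
type InputValue = string | number | boolean | null | InputValue[] | { [key: string]: InputValue };

class Store {
  private readonly data = new Map<string, InputValue>();

  // Stores a deep copy so later mutation by the caller cannot change persisted state.
  store(key: string, value: InputValue): InputValue {
    this.data.set(key, JSON.parse(JSON.stringify(value)) as InputValue);
    return value;
  }

  // Returns null if the key has never been stored.
  retrieve(key: string): InputValue {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }

  // Returns true if the key existed, false otherwise.
  delete(key: string): boolean {
    return this.data.delete(key);
  }
}
```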

Consider having actual nested-seeds

Currently the whole library assumes that seeds are a single layer of depth and call out to other seeds with references. There's a lot of processing that goes on to take a possibly nested input and unroll it, and things like private, and manual manipulation in a diagram (#19) and fretting about the edge cases of "what if this private seed is called by another context"?

What if instead seeds were fundamentally able to be nested, and that was just always true?

We'd get rid of all of the duplicative machinery of nested vs unnested seeds, and be able to rip out a lot of unrolling machinery.

For the purposes of caching results, we'd use dotted name syntax to store intermediate products in the cases where you need to show intermediate results.

This would have implications throughout the stack but might end up being easier overall.

seed_type `memorize` and `recall`

A local associative memory plugged into the library and matching a typespec.

  • Add a memory-mode remember
  • If value for memorize is an array of text, process them in parallel (combining into a single request) instead of sequentially (which adds a network round trip time * items.length)
  • Consider renaming default_memory (currently _default) to something more distinctive to show that it's being used in a memory context, not a profile context (similar to ids having c_whatever)
  • Memoize hsnw readers when vending them out
  • Everywhere that a seed expects a string, have extractString() and accept an embedding too (document this)
  • Only save hsnw every so often (and on process exit)
  • If query is not provided, create a random embedding and fetch
  • Persist the text to memory (just flat json, then later duckdb). Get the vector from hsnw.getPoint()
  • Store metadata in duckdb (this is more efficient for larger memories but is less easy to debug)
  • embedding.text should not be optional
  • Profile.recall / memorize shouldn't have defaults, that should be on the caller to provide
  • Recreate embeddings of the proper type and constructor from persisted data. (Can I use query.constructor?)
  • Set maxElements intelligently and handle resizing into a larger store
  • Add a hnswlib version in ProfileFilesystem. See https://github.com/polymath-ai/polymath-ai/tree/main/core/db
  • Add a read_only ability for memory, where writing can only be done if a secret boolean key that allows all writes is set in env (or maybe it should need an access key to allow writing? That would allow even remote memory writing; see #28)
  • Clean up the testing recall seed by actually using real (cached) embeddings for those values. This is somewhat important to verify the sorting is actually correct and not backwards...
  • Store the memories in .profile/memory/MEMORY-NAME/${normalized_embedding_model_name}/hsnw.db. This requires the embedding_model_name to not have any illegal path characters
  • Allow the recall.k seed argument to be omitted (possibly needs new machinery to allow optional arguments). Then maybe allow memory to be provided as an optional argument (falling back to env.memory) on recall and memorize using the same machinery.
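
The storage-path item above requires the embedding model name to be free of illegal path characters. One way that could look, as a sketch under assumptions: `normalizeModelName` and `memoryPath` are hypothetical helpers, the safe character set is a conservative guess, and the example model name is only illustrative.

```typescript
// Hypothetical helper: lowercase the name and collapse every run of
// characters outside [a-z0-9_-] into a single underscore.
function normalizeModelName(modelName: string): string {
  return modelName.toLowerCase().replace(/[^a-z0-9_-]+/g, "_");
}

// Builds the per-memory, per-model path described in the checklist.
// Only the model name is normalized here; the memory name is used as given.
function memoryPath(memoryName: string, embeddingModelName: string): string {
  const model = normalizeModelName(embeddingModelName);
  return `.profile/memory/${memoryName}/${model}/hsnw.db`;
}

// Illustrative model name; not taken from the library.
const examplePath = memoryPath("_default", "openai.com:text-embedding-ada-002");
```

Note that this normalization is lossy: two model names that differ only in punctuation would map to the same directory, so a real version might need a stricter naming rule or a hash suffix.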

Support meta-nodes

Maybe meta-nodes should be seed packets, with a defined '' entrypoint

This can be covered (mostly) by having a let with a seed-reference to the sub-seed, and then the sub-seed uses var within. But that's confusing about ways to use it and which variables it expects to be set.

Perhaps some kind of special parameter on a seed that describes the environment it expects to be set? And then there can be some tooling to complain if those aren't set. (or it could just be convention, a utility seed of expect which takes an object, and then verifies that getting the var for each is not undefined.)

  • Needs keys
  • Needs let
  • Needs throw (to throw an error)
  • call
  • function
  • Test of call+ function
  • function has defaults
  • Make call.function be typed to only be allowed to be a seed function
  • Make call.function throw if the seed reference is not to a function
  • It should be that the arg list is explicitly expected to have arg: namespace (because var will have to have it so it's confusing)
  • Test function fails if args is not set
  • (Should call fail if the sub-object is not a function?)
  • example-utility#memorize-items should take this
  • example-utility#remove-numbers, remove-item-number should also take this
  • CLI should render functions differently
  • CLI should have a mode to print the parameters of a function
  • memorize-items should have memory be default

The convention is to pass parameters with an arg: prefix.

Ideally there'd be more than just convention, so there could be tooling around it.

E.g. a call seed, which is basically a let-multi with a nested seed reference for block. And a function seed type that is basically an expect node but with a defined set of parameters and defaults. These don't do anything semantically except make it very clear to tooling what the intention is, allowing listing them as entrypoints, detecting missing parameters, etc.

function should have an array of arguments, each prefixed with arg:, called args. And defaults, which cover a subset of args. Any args that don't have defaults are required and will throw if missing.
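
The args/defaults rule could be sketched roughly as follows. This is an illustration under assumptions, not the library's code: `FunctionSpec`, `resolveArgs`, and the `Value` type are hypothetical names, and the example parameters (`arg:items`, `arg:memory`) are made up for the demo.

```typescript
type Value = null | boolean | number | string;

// Hypothetical shape for a function seed's declared parameters.
interface FunctionSpec {
  // Every parameter name, each carrying the "arg:" prefix.
  args: string[];
  // Defaults for a subset of args; args without a default are required.
  defaults: Record<string, Value>;
}

// Resolves the values a call provides against the function's declaration.
function resolveArgs(
  spec: FunctionSpec,
  provided: Record<string, Value>
): Record<string, Value> {
  const result: Record<string, Value> = {};
  for (const name of spec.args) {
    if (!name.startsWith("arg:")) {
      throw new Error(`Argument ${name} must start with "arg:"`);
    }
    const given = provided[name];
    const fallback = spec.defaults[name];
    if (given !== undefined) {
      result[name] = given;
    } else if (fallback !== undefined) {
      result[name] = fallback;
    } else {
      throw new Error(`Missing required argument ${name}`);
    }
  }
  return result;
}

const spec: FunctionSpec = {
  args: ["arg:items", "arg:memory"],
  defaults: { "arg:memory": "_default" },
};
const resolved = resolveArgs(spec, { "arg:items": "apple" });
```

Because the declaration is plain data, tooling such as the CLI could list a function's parameters or flag missing ones without running the seed.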

Should call require the arg: namespace or not?

A live editing experience for seeds

A tool in the framework to save a new version of a seed and add it to a seed packet, referencing older versions of the seed.

An experience where there's a directory of your seeds with a file for each seed, and each time you rerun package it adds the updated seeds to the package.

Allow `dotted name` nested var and store retrieval

Allow passing . in names to mean create and override sub-objects.

Do this for both var/let/let-multi and store/retrieve, as well as property

Note that this makes environment overlaying more complex and need a special helper.

Once this is done, perhaps have secrets be the place secrets are stored?

The simplest place to do this is property, where it doesn't even have to be that hard, and then it can be used for #45

  • Support dotted property names in property
  • property should also have an else parameter
  • Support dotted property names in let/let-multi
  • Support dotted property names in store/retrieve
  • Support dotted names as template names
  • Fix error introduced in f9f4023
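
A rough sketch of what "create and override sub-objects" could mean for dotted names, including an else-style fallback on reads. `setDotted`, `getDotted`, and the `Nested` type are hypothetical names for illustration, and the `secrets.api_key` key is a made-up example.

```typescript
type Leaf = string | number | boolean | null;
type Nested = { [key: string]: Nested | Leaf };

function isNested(value: Nested | Leaf | undefined): value is Nested {
  return typeof value === "object" && value !== null;
}

// Returns a copy of obj with the value set at the dotted name, creating
// intermediate sub-objects and overriding non-object values along the way.
function setDotted(obj: Nested, name: string, value: Nested | Leaf): Nested {
  const dot = name.indexOf(".");
  if (dot < 0) return { ...obj, [name]: value };
  const head = name.slice(0, dot);
  const rest = name.slice(dot + 1);
  const child = obj[head];
  return { ...obj, [head]: setDotted(isNested(child) ? child : {}, rest, value) };
}

// Reads the value at the dotted name, or returns `otherwise` when any
// part of the path is missing (similar in spirit to an else parameter).
function getDotted(obj: Nested, name: string, otherwise: Nested | Leaf = null): Nested | Leaf {
  let current: Nested | Leaf | undefined = obj;
  for (const part of name.split(".")) {
    if (!isNested(current) || !Object.prototype.hasOwnProperty.call(current, part)) {
      return otherwise;
    }
    current = current[part];
  }
  return current === undefined ? otherwise : current;
}

const updated = setDotted({ secrets: "old" }, "secrets.api_key", "abc");
```

Here `setDotted` copies rather than mutates, which is one way to keep environment overlaying predictable; a mutating version would be shorter but harder to reason about when environments are shared.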

Allow packet-level environment overlay

The best practice is to set a particular memory and store (and, in the future, prefix) that uses a distinctive prefix (e.g. komoroske.com) at the root of each seedGraph to make sure that all of the references within it use the same model/store/prefix.

However, this quickly gets extremely repetitive in packets that contain lots of entrypoints, and for ones that are intermediate entrypoints, requires duplicative environment let-multi statements that were already done by seeds that called them.

We should add a property, environment, in a packet. It's an optional property, and should specify an EnvironmentData overlay. The starter environment for each seed in the packet will have that environment overlaid on whatever the previous environment was. It's as though each root-level seed is wrapped in a (hidden) let-multi with those properties.
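
The overlay semantics might look roughly like this. It is a sketch under assumptions: the `SeedPacket` shape, the flat `EnvironmentData` type, and `starterEnvironment` are simplified stand-ins rather than the library's actual types, and the key names and values are illustrative.

```typescript
// Simplified stand-in for the library's EnvironmentData (assumed flat here).
type EnvironmentData = { [key: string]: string | number | boolean | null };

// Hypothetical packet shape with the proposed optional overlay property.
interface SeedPacket {
  environment?: EnvironmentData;
  seeds: Record<string, unknown>;
}

// The starter environment for each seed in the packet: the packet's overlay
// wins over whatever the previous environment contained, as though the
// root-level seed were wrapped in a hidden let-multi.
function starterEnvironment(previous: EnvironmentData, packet: SeedPacket): EnvironmentData {
  return { ...previous, ...(packet.environment ?? {}) };
}

const packet: SeedPacket = {
  environment: { memory: "komoroske.com:notes", store: "komoroske.com:notes" },
  seeds: {},
};
const env = starterEnvironment({ memory: "_default", verbose: true }, packet);
```

Keys that the packet does not mention (like `verbose` here) pass through unchanged, which is what makes this an overlay rather than a replacement.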

What should the property be named? environment is a bit misleading because it implies it's the entire environment when really it's an overlay. let is a bit wrong because semantically it's a let-multi. vars is wrong because we use var to mean retrieve from the environment. env is wrong because this would be the only place in user-facing semantics that calls it that.

When combined with #37, the convention will be to set a prefix in the packet-level environment every time.

A typical use case for this will be to set a namespace, but also set things like the model. The latter requires the former. Perhaps have the value be an object or an array of EnvironmentData. If it's an array then it should create the environment by overlaying each time. Actually, this might not be required, because namespace is late bound, so you can set it in the same block as another thing.

Originally tracked in #36.

  • Fix the sub-seed overlay problem documented below.
  • Update complex examples to not use top-level let but instead use environment.
  • Fix the edge case for nested seeds getting the environment re-set for them and possibly overriding intermediate lets.
