aichaos / rivescript-wd

The RiveScript Working Draft describes the language specification for RiveScript.

Home Page: https://www.rivescript.com/

rivescript-wd's Issues

Foreign Macro Interface

Something that could benefit all versions of RiveScript is to design a "Foreign Macro Interface".

This would allow bot authors to write object macros in any programming language, as long as a "host script" has been written for that language. The host script's responsibility would be to read JSON input over STDIN and write the result as JSON over STDOUT. This is very similar to how the Perl support for RiveScript-Java already works (com.rivescript.lang.Perl and its Perl host script), and to the perl-objects example for RiveScript-Python.

Each implementation of RiveScript would have a generic "Foreign Macro Handler" that can work with any programming language. It might be defined like this (Python example):

from rivescript import RiveScript
from rivescript.lang.foreign import ForeignMacroHandler

bot = RiveScript()
bot.set_handler("ruby", ForeignMacroHandler(bot,
    host="/path/to/rubyhost",
))

# and proceed as normal...

The things that would be needed for this to work:

5x Foreign Macro Handlers, one for each implementation of RiveScript. These would be general purpose handlers that can work with all programming languages. They could support both interpreted languages like Ruby as well as compiled languages (using options to control the compile pipeline, such as a gcc command to build C code).
1x Host Script per programming language. This would be a program that speaks the Foreign Macro API (reading and writing JSON over standard I/O), and would only need to be written once for each language and would be equally usable by all versions of RiveScript.

Full Example: Ruby

For example, you could have RiveScript source that defines a Ruby object macro (at the time of writing, there is no native RiveScript implementation available for Ruby):

> object reverse ruby
    message = @args.join(" ")
    return message.reverse!
< object

+ reverse *
- <call>reverse <star></call>

When the RiveScript interpreter (say, the Python one) reads the source file, it would know that Ruby code has a handler (the ForeignMacroHandler), which would simply store the Ruby source code as a string until it's actually <call>ed on in RiveScript code.

When the <call> tag is processed, the Python RiveScript bot would shell out to the Ruby Host Script and send it a JSON blob along the lines of:

// The STDIN sent to the Ruby Host Script
{
  "username": "localuser",
  "message": "reverse hello world",
  "vars": {  // user variables
    "topic": "random",
    "name": "Noah"
  },
  // the Ruby source of the macro
  "source": "message = @args.join(\" \")\nreturn message.reverse!"
}

The Ruby Host Script would read and parse the JSON, evaluate and run the Ruby source, and return a similar JSON blob over STDOUT:

{
  "status": "ok",
  "vars": { "topic": "random", "name": "Noah" },
  "reply": "dlrow olleh"
}
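The round trip above could be driven from the Python side roughly as follows. This is a minimal sketch and not an existing rivescript API: the ForeignMacroHandler class, its method signatures, and the "args" request field are assumptions made for illustration (the sample request above does not show how the macro's arguments would be passed).

```python
import json
import subprocess


class ForeignMacroHandler:
    """Hypothetical handler that delegates object macros to a host script."""

    def __init__(self, host_cmd):
        # Command line that starts the host script,
        # e.g. ["ruby", "./rsfmh/ruby.rb"] (assumed layout).
        self.host_cmd = list(host_cmd)
        self.sources = {}

    def load(self, name, code_lines):
        # Keep the macro source as a string until it is <call>ed.
        self.sources[name] = "\n".join(code_lines)

    def call(self, name, username, message, user_vars, args):
        request = {
            "username": username,
            "message": message,
            "vars": user_vars,
            "args": args,  # assumed field, not part of the sample request
            "source": self.sources[name],
        }
        # Shell out to the host script: JSON in over STDIN, JSON out over STDOUT.
        proc = subprocess.run(
            self.host_cmd,
            input=json.dumps(request),
            capture_output=True,
            text=True,
            check=True,
        )
        response = json.loads(proc.stdout)
        if response.get("status") != "ok":
            return "[ERR: foreign macro failed]"
        # Copy back any user variables the macro changed.
        user_vars.update(response.get("vars", {}))
        return response.get("reply", "")
```

Starting one process per <call> keeps the handler simple but pays the interpreter start-up cost every time; a long-lived host process would be a possible refinement.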

RiveScript API Inside Foreign Macros

The big obvious drawback is that object macros typically receive (rs, args) parameters, where the rs is the same instance of the parent bot, and the code can call functions like current_user() and set_uservar().

To support this for foreign macros, each Host Script can define a "shim" class for the RiveScript API. All they would need to implement are the user variable functions, like setUservars() and currentUser(). They could also implement the bot and global functions if those are useful.

What the Host Script could do is just keep a dictionary of user vars, initially populated using the Input JSON, and provide shim user var functions that update its dictionaries. And when writing the Output JSON it can just serialize those user variable dicts.
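As a rough illustration of that idea, here is what the core of a Python Host Script might look like. Everything in it is a sketch under assumptions: the RiveScriptShim class, the handle_request function, the "args" field, and the "error" status value are hypothetical names, and the macro body is wrapped in a function so that it can use return.

```python
import json


class RiveScriptShim:
    """Stand-in for the parent bot, backed by a plain dict of user vars."""

    def __init__(self, username, user_vars):
        self._username = username
        self._vars = dict(user_vars)  # initially populated from the Input JSON

    def current_user(self):
        return self._username

    def get_uservar(self, user, name):
        return self._vars.get(name, "undefined")

    def set_uservar(self, user, name, value):
        self._vars[name] = value

    def dump_vars(self):
        return dict(self._vars)


def handle_request(raw_json):
    """Run one macro and return the Output JSON as a string."""
    request = json.loads(raw_json)
    rs = RiveScriptShim(request["username"], request.get("vars", {}))
    # Indent the macro source so it becomes the body of a function.
    body = "\n".join("    " + line for line in request["source"].splitlines())
    namespace = {}
    try:
        exec("def macro(rs, args):\n" + body, namespace)
        reply = namespace["macro"](rs, request.get("args", []))
        status = "ok"
    except Exception as exc:
        reply, status = str(exc), "error"
    return json.dumps({
        "status": status,
        "vars": rs.dump_vars(),  # serialize the (possibly updated) user vars
        "reply": "" if reply is None else str(reply),
    })
```

A real host script would call handle_request(sys.stdin.read()) and print the result to STDOUT.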

For the case when the Foreign Macro already exists as a Native Macro in one implementation or another (e.g. Python), the RiveScript shim API should match the conventions of the native version, i.e. using current_user() rather than currentUser() as the naming convention for functions. In the case that one programming language has many Native Macros (e.g. JavaScript being usable in Go, JS and Java), the "most pure" version's API should be used (e.g. the JavaScript Foreign Macro should resemble the API of rivescript-js, not the Go-style naming convention from Go, or anything the Java port does).

So for example, for Python object macros, if you're programming your bot in Python, you would just use the default built-in Python support because this would be the most efficient: the code can be evaluated and cached by the program rather than on demand. But if you're programming your bot in Go, and you want to use Python objects, you could use the Foreign Macro Interface and have the exact same RiveScript API available to use.

Code Layout

Each implementation of RiveScript would keep its ForeignMacroHandler in its own git tree, probably under the lang/ namespace.

Host Scripts would be best bundled together as one large package, all in a common git repo. Possibly named something like rivescript-host-scripts or an acronym like rsfmh (RiveScript Foreign Macro Host).

The Host Scripts repo would include all possible available host scripts (Ruby, Bash, Go, C++, whatever the community is up to the task to write...) and would be easy to install somehow, so that as a mere mortal chatbot developer, your setup steps might be like:

$ pip install rivescript
$ git clone https://github.com/aichaos/rsfmh

bot = RiveScript()
bot.set_handler("ruby", ForeignMacroHandler(bot,
    host="./rsfmh/ruby.rb",
))

Add map and array support

Add support for hash maps and arrays in RiveScript.

Maps

Maps would be global variables (like bot variables). Their purpose is to hold simple key/value style data to simplify the reply structure for holding knowledge (examples would be: capitals of the states, etc.)

Syntax example:

// Define a map name and add key/value pairs one at a time
! map capitals CA = Sacramento
! map capitals MI = Lansing

// Define a map, provide the mappings immediately
! map capitals =
^ CA = Sacramento
^ MI = Lansing

// This would also have worked, but isn't as readable and won't be
// the preferred syntax:
! map capitals = CA = Sacramento
^ MI = Lansing

// If you "re-define" a map using the above syntax (provide the mappings
// immediately) on a map that already exists, your new key/value pairs
// will be merged with the existing ones. So you can add key/value pairs
// in bulk multiple times throughout your code with no problems. Map keys
// are always unique, though, so if you use the same key twice the value
// will be overwritten.

// Example of maps in use:
+ what is the capital of _
* <exists capitals <uppercase>> == true => The capital of <uppercase> is:\s
  ^ <map capitals <uppercase>>.
- I don't know what the capital of <uppercase> is.

New tags added to support maps:

<exists MAP_NAME KEY_NAME> - returns "true" if KEY_NAME exists in MAP_NAME
<map MAP_NAME> - returns the full contents of the map, in JSON format (mostly useful for debugging)
<map MAP_NAME KEY_NAME> - get the value of that key, or undefined if it doesn't exist
<map MAP_NAME KEY_NAME=VALUE> - dynamically reassign the value of a key in the map (the key doesn't need to exist in advance).
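To make the intended semantics concrete, here is a small Python sketch of the storage behind these tags. The MapStore class and its method names are hypothetical, and returning "false" for a missing key is an assumption (the list above only specifies the "true" case).

```python
import json


class MapStore:
    """Global maps, keyed by map name."""

    def __init__(self):
        self._maps = {}

    def define(self, name, pairs):
        # Re-defining an existing map merges the new pairs into it;
        # a repeated key overwrites the earlier value.
        self._maps.setdefault(name, {}).update(pairs)

    def exists(self, name, key):        # <exists MAP_NAME KEY_NAME>
        return "true" if key in self._maps.get(name, {}) else "false"

    def get(self, name, key=None):      # <map MAP_NAME> / <map MAP_NAME KEY_NAME>
        mapping = self._maps.get(name, {})
        if key is None:
            return json.dumps(mapping)  # full contents, for debugging
        return mapping.get(key, "undefined")

    def set(self, name, key, value):    # <map MAP_NAME KEY_NAME=VALUE>
        # The key does not need to exist in advance.
        self._maps.setdefault(name, {})[key] = value
```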

User Maps?

Some syntax ideas to support maps as user variables.

+ i have a (@colors) *
- <set %colors:<star2>=<star1>>Okay.

+ do you know about my *
* <get %colors:<star>> != undefined => Yes, your\s
  ^ <star> was colored <get %colors:<star>> right?
- You didn't tell me what color it is.

Human> I have a blue car
Bot> Okay.
Human> Do you know about my car?
Bot> Yes, your car was colored blue right?

The existing <get> and <set> tags would be repurposed for map variables. A map variable always has its name prefixed with a % symbol (like in Perl). To refer to a specific key, use a colon symbol ":".

<get %name> - would dump the map as JSON, for debugging
<get %name:key> - get the named key from the map, or "undefined"
<set %name:key=value> - set a key
<set %name:key=undefined> - delete a key by setting it to undefined (consistent with all other variable-setting tags). Maybe add a <delete> tag instead that works on all variable types (user vars, anyway)?
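A possible way to interpret the tag bodies, sketched in Python. The function names are hypothetical, and the sketch assumes keys contain neither ":" nor "=".

```python
import json


def set_user_map(user_vars, expr):
    """Apply the body of <set %name:key=value> to one user's variables."""
    target, _, value = expr.partition("=")
    name, _, key = target.lstrip("%").partition(":")
    mapping = user_vars.setdefault(name, {})
    if value == "undefined":
        mapping.pop(key, None)  # setting a key to undefined deletes it
    else:
        mapping[key] = value


def get_user_map(user_vars, expr):
    """Resolve the body of <get %name> or <get %name:key>."""
    name, sep, key = expr.lstrip("%").partition(":")
    mapping = user_vars.get(name, {})
    if not sep:
        return json.dumps(mapping)  # whole map as JSON, for debugging
    return mapping.get(key, "undefined")
```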

Sets

Sets would be arrays for user variables, for holding a list of things in one variable.

Syntax examples:

+ i like the color *
- <add @color=<star>>I'll remember that you like <star>.

+ what colors do i like
- You like: <get @color>.
// would output e.g.: "You like: red, green, blue."

Existing variable-setting tags would be re-used, but array names are prefixed with an @ symbol.

Sets would work like the data type of the same name in Python: their contents would be de-duplicated. If you add the same item to a set twice, it only goes in once.

Tags:

<get @name> - returns a comma separated list of values
<get @name:length> - get the number of elements in the set
<get @name[0]> - get an item from the list by index
<add @name=value> - add a value to the set
<delete @name=value> - remove a value from the set
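Since the tags need both de-duplication and index access, the backing structure would have to remember insertion order, unlike Python's built-in set. A minimal sketch, with a hypothetical UserSet class and the assumption that an out-of-range index yields "undefined":

```python
class UserSet:
    """Ordered, de-duplicated collection backing an @name variable."""

    def __init__(self):
        # dict keys are unique and keep insertion order (Python 3.7+).
        self._items = {}

    def add(self, value):        # <add @name=value>
        self._items[value] = True

    def delete(self, value):     # <delete @name=value>
        self._items.pop(value, None)

    def length(self):            # <get @name:length>
        return len(self._items)

    def item(self, index):       # <get @name[0]>
        items = list(self._items)
        if -len(items) <= index < len(items):
            return items[index]
        return "undefined"       # assumed behavior for a bad index

    def joined(self):            # <get @name>
        return ", ".join(self._items)
```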

Iteration?

Some ideas to support iterating over set elements without making the syntax too ugly?

+ what are my favorite colors
* <get @colors:length> >= 2 => They are:\s
  ^ {iter @colors[:-1]}<item>,\s{/iter}and <get @colors[-1]>.
* <get @colors:length> == 1 => There is only <get @colors[0]>.
- You didn't tell me any.

Human> What are my favorite colors?
(if they have 2 or more)
Bot> They are: red, blue, yellow, and green.
(if they have 1)
Bot> There is only red.
(if none)
Bot> You didn't tell me any.

Tags:

{iter LIST}...{/iter}
- Creates an iterator over the list-like object, LIST. The code between the opening and closing tag will be run for each item in LIST, and the new <item> tag would hold the current item.
<item> - holds the current item in an iterator; only available inside an {iter}...{/iter} block.

Array index slices would work like in Python, so colors[:-1] means "all colors except for the last one", for example:

>>> colors = ["red", "blue", "yellow", "green"]
>>> colors[:-1]
['red', 'blue', 'yellow']
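For comparison, the whole "favorite colors" reply can be written as ordinary Python, which shows what the {iter} block and the two slices are expected to produce. The function name is made up for this sketch.

```python
def favorite_colors_reply(colors):
    """Build the reply that the trigger's conditions are meant to produce."""
    if len(colors) >= 2:
        # Emit "<item>, " for every element except the last one,
        # then finish with "and <last item>".
        head = "".join(item + ", " for item in colors[:-1])
        return "They are: " + head + "and " + colors[-1] + "."
    if len(colors) == 1:
        return "There is only " + colors[0] + "."
    return "You didn't tell me any."
```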

Support raw regular expression triggers

Per aichaos/rivescript-js#147 and aichaos/rivescript-python#78, it may be time for RiveScript to regain the ~Regexp command from its ancestor Chatbot::Alpha.

I'm increasingly becoming aware that Unicode is hard and that regular expression engines are not all created equal. Each programming language has its own little quirks with respect to how meta expressions like the \b word-boundary sequence behave when matching Unicode symbols.

The RiveScript spec should be amended (and the primary implementations updated) to support a ~ command for writing a raw regular expression. This will enable users to help themselves when they run into regexp matching bugs that +Triggers can't handle, and can't be modified to handle (either because it would break backward compatibility or because the +Trigger already reserves too many regexp special characters for its own use case).

The use of the ~Regexp should be generally discouraged in all documentation, and it should be stated that its purpose is only to help with advanced use cases where the +Trigger system is inadequate. You could compare it to the way that database ORMs still allow you to write a raw SQL query by hand, but strongly encourage you to use the ORM's object model as intended.

Implementation Notes

The ~Regexp should be treated the same as the +Trigger in RiveScript source files (when either command is seen, it becomes the new "root" of the reply data and any following *Condition, -Reply, @Redirect and so on would apply to the most recently seen +Trigger or ~Regexp). In the case that a ~Regexp was used, the functions like triggerRegexp() do not get called and the raw regexp is used as-is. This means of course that you can't use tags like <bot name> inside a ~Regexp.

Captured groups from the regexp that would go into $1..$n will be made available in the same order as <star1>..<starN>.

Syntax Examples

Here are a couple of examples of how some common triggers would be represented by raw regular expressions:

+Trigger Version	~Regexp Equivalent
`+ my name is *`	`~ my name is (.+?)`
`+ i am # years old`	`~ i am (\d+?) years old`
`+ [*] (hello|hi) [*]`	(missing in source)
`+ @hello`	N/A
`+ i am <bot name>`	N/A
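To show how captured groups would feed the star tags, here is a minimal Python sketch. The match_raw_regexp function is hypothetical, and anchoring the pattern to the whole message (via re.fullmatch) is an assumption about how a ~Regexp would be applied.

```python
import re


def match_raw_regexp(pattern, message):
    """Return the values for <star1>..<starN>, or None if there is no match."""
    found = re.fullmatch(pattern, message)
    if found is None:
        return None
    # Group 1 becomes <star1>, group 2 becomes <star2>, and so on.
    return list(found.groups())
```

Because the pattern is used as-is, its behavior depends on the host language's regexp engine, which is exactly the escape hatch this command is meant to provide.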

Support ranged word wildcard matching

Support an extended syntax for wildcard characters in triggers so that they can match a ranged number of words, for example "between 2 and 4 words" or "no more than 3 words".

Syntax example:

// match 2 words (first and last name)
+ my name is *2
- Nice to meet you <formal>.

Types of ranged words to support:

* = match one or more words (current behavior; no change) -- regexp equivalent (.+?)
*5 = match exactly five words -- regexp <word>{5} where <word> is like (?:\b\w+\b[\s\r\n]*)
*~5 = match one to five words -- regexp <word>{1,5}
*2~5 = match two to five words -- regexp <word>{2,5}
*2~ = match at least two words -- regexp <word>{2,}
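The translation from ranged wildcards to regular expressions could look something like this in Python. It is only a sketch: the function names are made up, the rest of the trigger is assumed to be plain words, and each ranged wildcard is wrapped in a capture group so that it can still fill a star tag.

```python
import re

WORD = r"(?:\b\w+\b[\s\r\n]*)"            # the <word> pattern from the list above
RANGED = re.compile(r"\*(\d*)(~?)(\d*)")   # matches *, *5, *~5, *2~5 and *2~


def _expand(match):
    low, tilde, high = match.groups()
    if not (low or tilde or high):
        return "(.+?)"                     # plain * keeps its current meaning
    if not tilde:
        quantifier = "{%s}" % low          # *5 -> exactly five words
    else:
        # *~5 -> {1,5}, *2~5 -> {2,5}, *2~ -> {2,}
        quantifier = "{%s,%s}" % (low or "1", high)
    return "(" + WORD + quantifier + ")"


def trigger_to_regexp(trigger):
    """Replace every wildcard in a trigger with its regexp equivalent."""
    return RANGED.sub(_expand, trigger)
```

For example, trigger_to_regexp("my name is *2") yields a pattern that matches "my name is john smith" but not "my name is john".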

Add file-scoped parsing options

Add the ability to give options to the parser during the parse phase. The options should be scoped to just the current file (or streamed data) that the option appears in. The options would be read and processed as the file is being parsed, and would apply to all parts of the file that follow the line. As such, options can be defined and re-defined in multiple places. The option ends when the file ends, so that you don't accidentally mess up other unrelated files by leaving options on.

Syntax example:

// change the concatenation mode for ^continue commands
! local concat = space

// with concat=space, the reply is "Hello human"; by default the ^continue
// doesn't add anything so it would be "Hellohuman" with no space in between
+ hello bot
- Hello
^ human.

// change the concat mode again in the same file
! local concat = newline

// now this one will join the lines with a line break (\n)
// Human> How are you?
// Bot> I am good.
// You?
+ how are you
- I am good.
^ You?

Ideas for supported parser options:

concat = space | newline | none - control how the parser concatenates lines of code (e.g. when using the ^Continue command). The default would be none, which means the lines are concatenated with nothing added in between; this is the current behavior of RiveScript. space would concatenate with spaces (no need to explicitly write \s in your code) and newline would concatenate with line breaks (\n).
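A stripped-down Python sketch of how a parser might honor this option. It is not taken from any implementation: parse_replies is a hypothetical function that only collects -Reply texts and their ^Continue lines, and it silently ignores unknown concat values.

```python
CONCAT_MODES = {"none": "", "space": " ", "newline": "\n"}


def parse_replies(source):
    """Collect -Reply texts from one file, honoring `! local concat`."""
    concat = "none"  # every file starts from the default mode
    replies = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        command, rest = line[0], line[1:].strip()
        if command == "!":
            kind, _, setting = rest.partition(" ")
            name, _, value = setting.partition("=")
            if kind == "local" and name.strip() == "concat":
                if value.strip() in CONCAT_MODES:
                    concat = value.strip()  # applies to the lines that follow
        elif command == "-":
            replies.append(rest)
        elif command == "^" and replies:
            # Join the continuation using the mode active at this point.
            replies[-1] += CONCAT_MODES[concat] + rest
    return replies
```

Because concat is a local variable of the function, the option cannot leak into the next file that gets parsed.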
