Comments (10)

nimf commented on June 11, 2024

I just read the updated spec.md. It looks really good!
So, here is what I think we will need:

  1. Add parsing of "%" inside the probability operator.
  2. Allow alias definitions to have entity arguments.
  3. Implement the defaultDistribution CLI argument.
  4. Update the weight calculation to take into account the distribution entity argument (if set) and the defaultDistribution configuration.
  5. Expose defaultDistribution to the web editor.

I feel like I can do 3 and 4. But I'm open to any suggestions.

from chatito.

rodrigopivi commented on June 11, 2024

Hi @nimf,

Thanks for the feedback. This tool is meant to help the people who use it, so all improvement ideas should be considered.

Being able to switch sentence generation from the default "regular frequency" distribution to an "even" distribution is a great idea. This setting could be declared in the CLI params or the IDE config before generation (e.g. --defaultDistribution=even or --defaultDistribution=regular), at the DSL entity-argument level (e.g. %[intent]("distribution": "even")), or at both levels, CLI and DSL, to have full control over each entity.

Regarding the probability operator: if the limit of 100 on the sum of all probabilities is removed and float values are accepted, then the weighted chances would behave as documented in the ChanceJs lib (https://chancejs.com/miscellaneous/weighted.html). I think that would behave as you described.
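
To illustrate the idea of relative weights that need not sum to 100 and may be floats, here is a minimal Python sketch. It is not Chatito's implementation and does not use ChanceJs; the function names `normalize_weights` and `pick_weighted` are hypothetical and exist only for this example.

```python
import random

def normalize_weights(weights):
    """Turn relative weights (ints or floats, any positive sum) into probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]

def pick_weighted(items, weights, rng=random):
    """Pick one item with a chance proportional to its relative weight."""
    return rng.choices(items, weights=weights, k=1)[0]

# Weights 2 and 1 give the first sentence two thirds of the picks.
probabilities = normalize_weights([2, 1])
sentence = pick_weighted(["sentence with alias1", "sentence with alias2"], [2, 1])
```

Because only the ratio between weights matters, `[2, 1]`, `[66.6, 33.3]` and `[0.5, 0.25]` all describe the same distribution.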

Yes, these changes would be valuable. You are most welcome to open a PR with these ideas implemented.

nimf commented on June 11, 2024

--defaultDistribution looks really good!

Regarding the probability operator, yeah, that would be exactly as described. My only concern: should we keep the percentage probability for the regular distribution? Or should we also provide some argument to control that?

// As weights with even distribution
%[intent]("distribution": "even")   // Weight    Resulting percentage
    *[2] ~[alias1]                  // 2         66.67%
    ~[alias2] ~[alias3]             // 1         33.33%

// As percents with regular distribution
%[intent2]("distribution": "regular")  // Resulting percentage
    *[66] ~[alias1]                    // 66%
    ~[alias2] ~[alias3]                // 34%

// As weights with regular distribution
%[intent2]("distribution": "regular")  // Max Count  Resulting Weight  Resulting percentage
    *[2] ~[alias1]                     // 100        200               28.57%
    ~[alias2] ~[alias3]                // 500        500               71.43%

rodrigopivi commented on June 11, 2024

Good catch, relative weights and percentage probabilities are different things. So changing the name to 'chance operator' might be better than 'probability operator', since the idea is to control either the relative weights or the percentage probability.

What do you think of treating the value as a relative weight if there is no '%' symbol, and as a percentage probability if it comes with '%'?

Following that idea, then regular distribution would behave like:

%[intent]("distribution": "regular")   // Max Count  | Weight |  Prob
    ~[alias1]                          //     100        100      10%
    ~[alias2] ~[alias3]                //     500        500      50%
    ~[alias4]                          //     400        400      40%
// NOTE: with '%', the operator defines the actual probability
%[intent]("distribution": "regular")    // Max Count  | Weight/Prob
    *[20%] ~[alias1]                    //   100            20%
    ~[alias2] ~[alias3]                 //   500            44.4444% // (500*80/900)
    ~[alias4]                           //   400            35.5556% // (400*80/900)
// NOTE: without '%', the operator just multiplies the max count to give the weight
%[intent]("distribution": "regular")  // Max Count  |  Weight  |  Prob
    *[2] ~[alias1]                    //     100         200       18.1818%
    ~[alias2] ~[alias3]               //     500         500       45.4545%
    ~[alias4]                         //     400         400       36.3636%

And for even:

%[intent]("distribution": "even")       // Max Count  | Weight |  Prob
    ~[alias1]                           //   100           1       33.3333%
    ~[alias2] ~[alias3]                 //   500           1       33.3333%
    ~[alias4]                           //   400           1       33.3333%
%[intent]("distribution": "even")     // Max Count  | Weight | Prob
    *[2] ~[alias1]                    //   100          2       50%
    ~[alias2] ~[alias3]               //   500          1       25%
    ~[alias4]                         //   400          1       25%
%[intent2]("distribution": "even")     // Max Count  | Weight/Prob
    *[20%] ~[alias1]                   //   100              20%
    ~[alias2] ~[alias3]                //   500              40%
    ~[alias4]                          //   400              40%

Let me know your thoughts on this. Also, for consistency, maybe consider raising an input error if an entity defines one sentence with '%' and another sentence without '%'.
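
The tables above can be reproduced with a small calculation. The following Python sketch is only an illustration of the proposed rules, not the actual Chatito code: the function name `sentence_probabilities` and the `(max_count, chance)` input shape are hypothetical. It also assumes the input error applies when '%' values and bare weight values are mixed in one entity, while sentences with no operator at all may sit next to either kind, as they do in the examples.

```python
def sentence_probabilities(sentences, distribution="regular"):
    """Return the probability (in percent) of each sentence of one entity.

    sentences: list of (max_count, chance) pairs, where chance is None (no
    operator), a number (relative weight) or a string such as "20%".
    """
    if distribution not in ("regular", "even"):
        raise ValueError(f"unknown distribution: {distribution}")
    chances = [chance for _, chance in sentences]
    has_percent = any(isinstance(c, str) for c in chances)
    has_weight = any(isinstance(c, (int, float)) for c in chances)
    if has_percent and has_weight:
        # Assumed reading of the proposed consistency check.
        raise ValueError("cannot mix '%' and weight operators in one entity")

    # Base weight: the max count for "regular", 1 for "even".
    base = [count if distribution == "regular" else 1 for count, _ in sentences]

    if not has_percent:
        # Bare values multiply the base weight; a missing operator counts as 1.
        weights = [b * (c if c is not None else 1) for b, c in zip(base, chances)]
        total = sum(weights)
        return [100 * w / total for w in weights]

    # '%' values are fixed; the rest share what is left, by base weight.
    fixed = [float(c.rstrip("%")) if c is not None else None for c in chances]
    remaining = 100 - sum(p for p in fixed if p is not None)
    if remaining < 0:
        raise ValueError("percentages add up to more than 100")
    free_total = sum(b for b, p in zip(base, fixed) if p is None)
    return [p if p is not None else b * remaining / free_total
            for b, p in zip(base, fixed)]

# Regular distribution with *[20%] on the first sentence.
regular_percent = sentence_probabilities([(100, "20%"), (500, None), (400, None)], "regular")
# Even distribution with *[2] on the first sentence.
even_weight = sentence_probabilities([(100, 2), (500, None), (400, None)], "even")
```

With these inputs, `regular_percent` matches the 20% / 44.4444% / 35.5556% table and `even_weight` matches the 50% / 25% / 25% table.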

rodrigopivi commented on June 11, 2024

Also, consider that this may add complexity to the DSL that is not that useful. Providing only the even distribution and a weighted operator instead of percentages might produce better datasets overall and cover the same needs. Maybe the only benefit of the current regular frequency distribution implementation is that it may be faster, because it won't produce as many duplicates.

nimf commented on June 11, 2024

> What do you think of considering the value as a relative weight if there is no '%' symbol, and percentage probability if it comes with %.

This is awesome! When I was reading the documentation for the probability operator I thought "oh, maybe a percent sign at the end would make it clearer".

> Let me know your thoughts on this.

I really like this.

I think the regular distribution is helpful in many cases, so we should be able to set it via the distribution argument even when --defaultDistribution=even.
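
A minimal Python sketch of that precedence, where the entity-level "distribution" argument wins over the global default. The function name `resolve_distribution` and the dictionary shape of the entity arguments are assumptions made for this example, not part of Chatito.

```python
VALID_DISTRIBUTIONS = ("regular", "even")

def resolve_distribution(entity_args, default_distribution="regular"):
    """Return the distribution for one entity.

    entity_args: dict parsed from the entity arguments, e.g.
    {"distribution": "even"}, or None when the entity has no arguments.
    default_distribution: value of the global default (hypothetically the
    value passed as --defaultDistribution).
    """
    chosen = (entity_args or {}).get("distribution", default_distribution)
    if chosen not in VALID_DISTRIBUTIONS:
        raise ValueError(f"unknown distribution: {chosen}")
    return chosen

# The entity asks for "regular" even though the global default is "even".
overridden = resolve_distribution({"distribution": "regular"}, "even")
# An entity without the argument falls back to the global default.
inherited = resolve_distribution({}, "even")
```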

Regarding dropping support for the percentage probability operator:
Personally I like weighted probability more, but I can clearly imagine someone wanting "this sentence to fill 30% of all examples, and I don't care about the other 10 sentences".

rodrigopivi commented on June 11, 2024

Agreed, keeping both strategies then. I just created a dev branch, hoping to continue this implementation there. I've updated the spec on that branch to reflect these new features. Please let me know your thoughts on this so we can coordinate the implementation, as I'm hoping to help with it too. Thanks @nimf.

rodrigopivi commented on June 11, 2024

Hi @nimf,

1 and 2 are done on the dev branch. I hope you can rebase your PR onto the new changes and continue with 3 and 4. Thanks for your help and collaboration.

nimf commented on June 11, 2024

Awesome!
I'll do a rebase and continue to work on 3 and 4 in that branch.

rodrigopivi commented on June 11, 2024

Published 2.3.0. It was great sharing the work on this, Yuri. Thanks.
