Weighted probability about chatito HOT 10 CLOSED

nimf commented on June 11, 2024

Weighted probability

from chatito.

Comments (10)

nimf commented on June 11, 2024

I just read the updated spec.md. It looks really good!
So, here is what I think we will need:

1. Add parsing of "%" inside the probability operator.
2. Allow alias definitions to have entity arguments.
3. Implement the defaultDistribution CLI argument.
4. Update the calculation of the weights considering the distribution entity argument (if set) and the defaultDistribution configuration.
5. Expose defaultDistribution to the web editor.

I feel like I can do 3 and 4. But I'm open to any suggestions.

from chatito.

rodrigopivi commented on June 11, 2024

Hi @nimf,

Thanks for the feedback. This tool is meant to help the people who use it, and all improvement ideas should be considered.

Being able to switch sentence generation from the default "regular frequency" distribution to an "even" distribution is a great idea. The setting could be declared in the CLI params or the IDE config before generation (e.g. --defaultDistribution=even or --defaultDistribution=regular), at the DSL entity-argument level (e.g. %[intent]("distribution": "even")), or at both levels, CLI and DSL, to have full control over each entity.
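
A minimal sketch of how the two levels could combine, assuming the entity-level "distribution" argument wins over the CLI/IDE default. That precedence is an assumption for illustration, and `resolve_distribution` is a hypothetical helper, not part of Chatito:

```python
from typing import Optional

VALID_DISTRIBUTIONS = ("regular", "even")


def resolve_distribution(entity_args: Optional[dict],
                         default_distribution: str = "regular") -> str:
    """Pick the distribution for one entity (hypothetical helper, not Chatito's API)."""
    # Assumption: a per-entity "distribution" argument overrides the global default.
    chosen = (entity_args or {}).get("distribution", default_distribution)
    if chosen not in VALID_DISTRIBUTIONS:
        raise ValueError(f"unknown distribution: {chosen!r}")
    return chosen
```

With this shape, an entity declared with ("distribution": "regular") would stay regular even when the default is set to even.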

Regarding the probability operator: if the limit of 100 on the sum of all probabilities is removed and float values are accepted, then the weighted chances would behave as documented in the ChanceJS lib (https://chancejs.com/miscellaneous/weighted.html). I think that would behave as you described.
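
To illustrate what "relative weights with no 100 limit" means, here is a rough Python analogue built on the standard library's `random.choices`. It is only a sketch of the idea, not the ChanceJS or Chatito code, and the function names are made up for the example:

```python
import random


def pick_weighted(items, weights, rng=random):
    """Pick one item with chance proportional to its weight.

    Weights are relative: they may be floats and need not sum to 100.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return rng.choices(items, weights=weights, k=1)[0]


def implied_probabilities(weights):
    """Normalize relative weights into probabilities that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]
```

For example, weights of 2 and 1 imply probabilities of 2/3 and 1/3, and weights of 0.5 and 0.25 imply exactly the same split.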

Yes, these changes would be valuable. You are most welcome to open a PR with these ideas implemented.

from chatito.

nimf commented on June 11, 2024

--defaultDistribution looks really good!

Regarding the probability operator, yeah, that would be exactly as described. My only concern: should we keep the percentage probability for regular distribution? Or should we also provide some argument to control that?

// As weights with even distribution
%[intent]("distribution": "even")   // Weight    Resulting percentage
    *[2] ~[alias1]                  // 2         66.67%
    ~[alias2] ~[alias3]             // 1         33.33%

// As percents with regular distribution
%[intent2]("distribution": "regular")  // Resulting percentage
    *[66] ~[alias1]                    // 66%
    ~[alias2] ~[alias3]                // 34%

// As weights with regular distribution
%[intent2]("distribution": "regular")  // Max Count  Resulting Weight  Resulting percentage
    *[2] ~[alias1]                     // 100        200               28.57%
    ~[alias2] ~[alias3]                // 500        500               71.43%
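
The two "as weights" cases above can be checked with a few lines of arithmetic. This is only a sketch: `to_percentages` is a hypothetical helper, and the max counts (100 and 500) are the ones assumed in the example.

```python
def to_percentages(weights):
    """Turn relative weights into percentages, rounded to two decimals."""
    total = sum(weights)
    return [round(100 * w / total, 2) for w in weights]


# Even distribution: the operator value is the weight, other sentences weigh 1.
even = to_percentages([2, 1])

# Regular distribution: the operator value multiplies the sentence's max count.
regular = to_percentages([2 * 100, 1 * 500])

print(even)     # [66.67, 33.33]
print(regular)  # [28.57, 71.43]
```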

from chatito.

rodrigopivi commented on June 11, 2024

Good catch, relative weights and percentage probabilities are different things. So maybe changing the name to 'chance operator' might be better than 'probability operator' since the idea is to control the relative weights or the percentage probability.

What do you think of considering the value as a relative weight if there is no '%' symbol, and as a percentage probability if it comes with '%'?

Following that idea, then regular distribution would behave like:

%[intent]("distribution": "regular")   // Max Count  | Weight |  Prob
    ~[alias1]                          //     100        100      10%
    ~[alias2] ~[alias3]                //     500        500      50%
    ~[alias4]                          //     400        400      40%

// NOTE: operator with '%' defines the actual probability
%[intent]("distribution": "regular")    // Max Count  | Weight/Prob
    *[20%] ~[alias1]                    //   100            20%
    ~[alias2] ~[alias3]                 //   500            44.4444% // (500*80/900)
    ~[alias4]                           //   400            35.5556% // (400*80/900)

// NOTE: an operator without '%' just multiplies the max count to get the weight
%[intent]("distribution": "regular")  // Max Count  |  Weight  |  Prob
    *[2] ~[alias1]                    //     100         200       18.1818%
    ~[alias2] ~[alias3]               //     500         500       45.4545%
    ~[alias4]                         //     400         400       36.3636%

And for even:

%[intent]("distribution": "even")       // Max Count  | Weight |  Prob
    ~[alias1]                           //   100           1       33.3333%
    ~[alias2] ~[alias3]                 //   500           1       33.3333%
    ~[alias4]                           //   400           1       33.3333%

%[intent]("distribution": "even")     // Max Count  | Weight | Prob
    *[2] ~[alias1]                    //   100          2       50%
    ~[alias2] ~[alias3]               //   500          1       25%
    ~[alias4]                         //   400          1       25%

%[intent2]("distribution": "even")     // Max Count  | Weight/Prob
    *[20%] ~[alias1]                   //   100              20%
    ~[alias2] ~[alias3]                //   500              40%
    ~[alias4]                          //   400              40%
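
The tables above can be reproduced with a short calculation. The sketch below assumes the rules as read from the examples: the base weight is the max count for "regular" and 1 for "even", a plain operator value multiplies the base weight, and a '%' value is taken as is, with the remaining percentage shared among the other sentences in proportion to their weights. The tuple layout and the function name are hypothetical, and the sketch does not check for mixed operator styles.

```python
from typing import List, Optional, Tuple

# One sentence: (max_count, weight_multiplier, percent); the last two are optional.
Sentence = Tuple[int, Optional[float], Optional[float]]


def sentence_probabilities(sentences: List[Sentence],
                           distribution: str = "regular") -> List[float]:
    """Return each sentence's probability in percent."""
    # Base weight: the max count for "regular", 1 for "even".
    base = [float(count) if distribution == "regular" else 1.0
            for count, _, _ in sentences]
    # A plain operator value multiplies the base weight.
    weights = [b * (mult if mult is not None else 1.0)
               for b, (_, mult, _) in zip(base, sentences)]
    fixed = sum(pct for _, _, pct in sentences if pct is not None)
    if fixed > 100:
        raise ValueError("percentages add up to more than 100")
    # Weight total of the sentences that share the remaining percentage.
    free_total = sum(w for w, (_, _, pct) in zip(weights, sentences) if pct is None)
    remaining = 100.0 - fixed
    result = []
    for w, (_, _, pct) in zip(weights, sentences):
        if pct is not None:
            result.append(float(pct))
        else:
            result.append(remaining * w / free_total)
    return result
```

For instance, `sentence_probabilities([(100, None, 20), (500, None, None), (400, None, None)], "regular")` gives 20, 500*80/900 and 400*80/900, matching the second regular table.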

Let me know your thoughts on this. Also, for consistency, maybe it should be an input error if an entity defines one sentence with a '%' value and another sentence with a value without '%'.
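
A rough sketch of how such a check could look, assuming the operator syntax shown in the examples (`*[2]`, `*[20%]`) and assuming that sentences with no operator at all are still allowed next to either style. The regular expression and both function names are illustrative only, not Chatito's parser:

```python
import re
from typing import List, Optional, Tuple

# Matches a leading operator such as *[2], *[0.5] or *[20%] (syntax assumed from the examples).
_OPERATOR = re.compile(r"^\*\[\s*(\d+(?:\.\d+)?)\s*(%?)\s*\]")


def parse_operator(sentence: str) -> Optional[Tuple[float, bool]]:
    """Return (value, is_percent), or None when the sentence has no operator."""
    match = _OPERATOR.match(sentence.strip())
    if match is None:
        return None
    return float(match.group(1)), match.group(2) == "%"


def check_consistent(sentences: List[str]) -> None:
    """Raise ValueError if one entity mixes '%' values with plain weights."""
    kinds = {op[1] for op in map(parse_operator, sentences) if op is not None}
    if len(kinds) > 1:
        raise ValueError("entity mixes percentage and weight operators")
```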

from chatito.

rodrigopivi commented on June 11, 2024

Also, consider that this may add complexity to the DSL that is not that useful. Providing only the even distribution and a weight operator instead of percentages might give better datasets overall and cover the same needs. Maybe the only benefit of the current regular frequency distribution implementation is that it may be faster, because it won't produce as many duplicates.

from chatito.

nimf commented on June 11, 2024

> What do you think of considering the value as a relative weight if there is no '%' symbol, and as a percentage probability if it comes with '%'?

This is awesome! When I was reading the documentation for the probability operator I thought, "oh, maybe a percent sign at the end would make it clearer."

> Let me know your thoughts on this.

I really like this.

I think regular distribution is helpful in many cases, so we can set it via the distribution argument even when --defaultDistribution=even

Regarding dropping support for the percentage probability operator:
Personally, I like weighted probability more, but I can clearly imagine someone wanting "this sentence to fill 30% of all examples, and I don't care about the other 10 sentences."

from chatito.

rodrigopivi commented on June 11, 2024

Agreed, keeping both strategies then. I just created a dev branch, hoping to continue this implementation there, and I've updated the spec on that branch to reflect these new features. Please let me know your thoughts, so we can coordinate the implementation, as I'm hoping to help with it too. Thanks @nimf.

from chatito.

rodrigopivi commented on June 11, 2024

Hi @nimf,

1 and 2 are done on the dev branch. I hope you can rebase your PR onto the new changes and continue with 3 and 4. Thanks for your help and collaboration.

from chatito.

nimf commented on June 11, 2024

Awesome!
I'll do a rebase and continue to work on 3 and 4 in that branch.

from chatito.

rodrigopivi commented on June 11, 2024

Published 2.3.0. It was great sharing the work on this Yuri, thanks.

from chatito.
