Comments (4)

danpere commented on May 12, 2024

If no program could be learned from the examples given, more examples will usually not help. Normally, all programs expressible in the DSL that satisfy the examples are learned, so learning no programs means that no program in the DSL satisfies all of the examples, and adding more examples would only further constrain the learning problem. (I say "normally" because, using the escape hatches of the learning procedure, you could write your own non-monotonic learning sub-procedure... but that's generally a bad idea because of the confusion you bring up.) As you say, this means that the grammar would likely have to be extended to express the desired operation.
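
To make the monotonicity argument concrete, here is a minimal sketch using a made-up four-program DSL. It is not the PROSE grammar or API; the `DSL` table and `consistent_programs` helper are hypothetical names chosen for illustration. It only shows that each added example can shrink the set of consistent programs but never grow it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy DSL: each "program" is a named string -> string function.
# This is an illustration only, not the PROSE grammar.
DSL: Dict[str, Callable[[str], str]] = {
    "first_word": lambda s: s.split()[0],
    "last_word": lambda s: s.split()[-1],
    "first_4_chars": lambda s: s[:4],
    "upper": lambda s: s.upper(),
}

Example = Tuple[str, str]  # (input, expected output)


def consistent_programs(examples: List[Example]) -> List[str]:
    """Return the names of all toy programs that satisfy every example."""
    return sorted(
        name
        for name, prog in DSL.items()
        if all(prog(inp) == out for inp, out in examples)
    )


one = [("2024 report", "2024")]
two = one + [("annual summary", "annual")]
three = two + [("final draft v2", "v2")]

print(consistent_programs(one))    # two programs still fit
print(consistent_programs(two))    # one program fits
print(consistent_programs(three))  # nothing fits; more examples cannot fix this
```

Once the result is empty, every superset of those examples also yields an empty result, which is why the fix has to come from the grammar rather than from more examples.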

We know the error reporting is poor and it's an issue we intend to address.

If you are comfortable sharing your data, it would be helpful to see your inputs, both to determine if it is in fact not expressible and, if so, help us know how we might want to extend the language to cover your scenario. You can e-mail me at [email protected] if you don't want to share it publicly.

from prose.

TimLovellSmith commented on May 12, 2024

@danpere Thanks for the feedback. I am not sure, but I think one of the problems I might be running into is that there are (at least) two date formats mingled in the documents, yyyymmdd and mm/dd/yyyy, either of which could be the accepted 'output', and the grammar may be failing to generalize across them.


danpere commented on May 12, 2024

For clarification, you are using the Extraction.Text language? (That's the sample that has that exact text as the error message.)

The differing formats might be the issue. Extraction.Text usually ends up being able to use context when the formats are different, but that might not apply in this case. Also, there is a regular expression internally for matching "dates" which is fairly flexible, but it can't cover everything. Extraction.Text does not currently support conditionals, but one way to work around that is to make multiple fields for the different date formats/contexts. @vuminhle may be able to give more tips on getting it to work on difficult scenarios.
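
As a rough illustration of the "multiple fields" workaround, the sketch below uses two plain regular expressions as stand-ins for two separately learned fields, one per date format. The field names and patterns are assumptions made for this example; they are not the Extraction.Text API or its internal date regex.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins for two separately learned fields, one per format.
FIELD_PATTERNS = {
    "date_compact": re.compile(r"\b\d{8}\b"),              # yyyymmdd
    "date_slashed": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),  # mm/dd/yyyy
}


def extract_fields(text: str) -> Dict[str, List[str]]:
    """Run every field extractor independently over the same text."""
    return {name: pat.findall(text) for name, pat in FIELD_PATTERNS.items()}


def merged_dates(text: str) -> List[str]:
    """Combine the per-format fields into one list after extraction."""
    fields = extract_fields(text)
    return fields["date_compact"] + fields["date_slashed"]


doc = "Invoice 20240512 issued; due 06/11/2024."
print(extract_fields(doc))
print(merged_dates(doc))
```

The point of the design is that each field only has to generalize within one format, and the merge happens outside the learned programs.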


vuminhle commented on May 12, 2024

@danpere has covered all the main points.
If there is no program, it means that your task cannot be expressed in the current grammar. We could have given you back the problematic examples (or a maximal subset of working examples), but I'm not sure that information would be useful. Furthermore, there may be more than one such set.
We do give more indicative messages if your examples conflict or are duplicated.

As you rightly observed, we can solve this by extending the grammar to support the task. @danpere mentioned learning conditionals, which basically partitions your inputs into different clusters (each of which shares the same format) and learns a program for each of them. This is ongoing work.
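
A toy sketch of that partition-then-learn idea follows. It assumes a crude "format signature" (digits collapsed to `d`) as the clustering key and a two-program candidate set; both are hypothetical choices for illustration and say nothing about how PROSE clusters inputs or what its grammar contains.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (input, expected output)


def compact_to_iso(s: str) -> str:
    """yyyymmdd -> yyyy-mm-dd (purely positional, no validation)."""
    return f"{s[:4]}-{s[4:6]}-{s[6:8]}"


def slashed_to_iso(s: str) -> str:
    """mm/dd/yyyy -> yyyy-mm-dd; returns "" if the shape does not match."""
    parts = s.split("/")
    if len(parts) != 3:
        return ""
    month, day, year = parts
    return f"{year}-{month}-{day}"


# Hypothetical candidate programs standing in for a DSL.
DSL: Dict[str, Callable[[str], str]] = {
    "compact_to_iso": compact_to_iso,
    "slashed_to_iso": slashed_to_iso,
}


def signature(s: str) -> str:
    """Crude format key: every digit becomes 'd', other characters are kept."""
    return "".join("d" if c.isdigit() else c for c in s)


def learn(examples: List[Example]) -> Optional[str]:
    """Return one program name consistent with all examples, or None."""
    for name, prog in DSL.items():
        if all(prog(inp) == out for inp, out in examples):
            return name
    return None


def learn_conditional(examples: List[Example]) -> Dict[str, Optional[str]]:
    """Partition examples by signature and learn one program per cluster."""
    clusters: Dict[str, List[Example]] = defaultdict(list)
    for inp, out in examples:
        clusters[signature(inp)].append((inp, out))
    return {sig: learn(exs) for sig, exs in clusters.items()}


def run(branches: Dict[str, Optional[str]], s: str) -> Optional[str]:
    """Dispatch on the input's signature; None if no branch covers it."""
    name = branches.get(signature(s))
    return DSL[name](s) if name else None


examples = [
    ("20240512", "2024-05-12"),
    ("05/12/2024", "2024-05-12"),
    ("20231101", "2023-11-01"),
]
print(learn(examples))              # no single program covers both formats
branches = learn_conditional(examples)
print(branches)
print(run(branches, "12/31/2022"))
```

No single candidate satisfies the mixed examples, yet each cluster on its own is easy, which is the situation described for the two date formats in this thread.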

Which API did you use? Did you extract a substring out of a string, or a sequence of substrings out of a string?
It would be great if you can share one or two lines of your (anonymized) data, together with the fields you are extracting, so that we can analyze what is going on.
