
weighted-cyk-probabilistic-context-free-grammar's Introduction

Xian Chen


Implementation of Weighted CYK / Probabilistic Context-Free Grammar:

Alterations/Additions:

1. Dictionary
To implement weighted CYK, I changed the dictionary, called grammar, from the previous homework assignment into a dictionary of dictionaries. The new grammar maps each tag (key) to an inner dictionary whose keys are either right-hand-side tuples of tags or terminal words, and whose values are the corresponding rule probabilities.
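A minimal sketch of this structure might look as follows. The tags mirror those in the output later in this document, but the grammar below is a hypothetical toy fragment, not the author's actual grammar file:

```python
# Hypothetical toy grammar in the dictionary-of-dictionaries shape described
# above: tuple keys are binary rules, string keys are lexical (word) rules.
# Probabilities are illustrative.
grammar = {
    "S":    {("NPS", "NVS"): 0.3},        # S -> NPS NVS with probability 0.3
    "NPS":  {("DetS", "NS"): 0.7},
    "NVS":  {("VS", "NPS"): 1.0},
    "DetS": {"the": 0.36, "a": 0.36},     # lexical rules: word -> probability
    "NS":   {"dog": 1 / 3, "ball": 1 / 3},
    "VS":   {"sees": 0.25},
}

# Looking up a rule probability:
p_rule = grammar["NPS"][("DetS", "NS")]   # probability of NPS -> DetS NS
p_word = grammar["DetS"]["the"]           # probability of DetS -> 'the'
```

Keeping binary rules and lexical rules in the same inner dictionary lets the parser consult a single structure for both the diagonal (word) entries and the combination step.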

2. Parse Method
table2: “table2” is the table of probabilities that parallels “table”, the table of tags.
trace: “trace” is declared as an empty list. It is updated to hold the backpointers along the highest-probability path through the sentence.

The Parse method takes in a sentence and fills “table” and “table2.” While the method fills the diagonal of “table” with tags, it fills the corresponding cells of “table2” with their probabilities.

Weighted CYK is implemented on lines 113-143 of the code. As we loop over the spans of the sentence, we keep track of the path the algorithm takes to derive each constituent. Along the way, we compute the probability of each path by multiplying the probability of the current rule by the probabilities of its sub-constituents. We then check “table2” to see whether the tag tuple already exists. If it does, and the probability stored in the table is less than the current probability of the tag tuple, the entry in the table is replaced with the current one, and the “trace” list is updated with the higher-probability entry. If the tag tuple is not yet in the table, it is simply added to both the table and the “trace” list.
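This loop can be sketched as below. This is not the author's lines 113-143, but a minimal self-contained version of the same idea, assuming the dictionary-of-dictionaries grammar shape described earlier (tuple keys for binary rules, word keys for lexical rules):

```python
from collections import defaultdict

def parse(sentence, grammar):
    """Minimal weighted-CYK sketch (illustrative, not the author's code).

    table[(i, j)]        -- set of tags spanning words i..j
    table2[(i, j, tag)]  -- best probability found for that entry
    trace[(i, j, tag)]   -- backpointer: a word, or (split, left_tag, right_tag)
    """
    n = len(sentence)
    table = defaultdict(set)
    table2 = {}
    trace = {}

    # Fill the diagonal with lexical rules and their probabilities.
    for i, word in enumerate(sentence):
        for tag, rules in grammar.items():
            if word in rules:
                table[(i, i + 1)].add(tag)
                table2[(i, i + 1, tag)] = rules[word]
                trace[(i, i + 1, tag)] = word

    # Combine adjacent spans with binary rules, keeping the best probability.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for tag, rules in grammar.items():
                    for rhs, rule_p in rules.items():
                        if not isinstance(rhs, tuple):
                            continue          # skip lexical entries
                        left, right = rhs
                        if left in table[(i, k)] and right in table[(k, j)]:
                            p = rule_p * table2[(i, k, left)] * table2[(k, j, right)]
                            # Replace an existing entry only if this path
                            # has higher probability.
                            if p > table2.get((i, j, tag), 0.0):
                                table[(i, j)].add(tag)
                                table2[(i, j, tag)] = p
                                trace[(i, j, tag)] = (k, left, right)
    return table, table2, trace
```

The `p > table2.get(...)` comparison is the weighted part of weighted CYK: where plain CYK only records that a tag is derivable over a span, this version keeps the single most probable derivation per `(span, tag)` entry.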

We then check “table2” to see whether a sentence ‘S’ exists spanning the whole input. If it does, the parse tree of the most probable derivation is printed by calling the get_tree method that was created (lines 157-164).
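One way such a get_tree recursion might look (a hypothetical sketch, not the author's lines 157-164), assuming the trace structure maps each `(i, j, tag)` entry to either a word or a `(split, left_tag, right_tag)` triple:

```python
def get_tree(trace, i, j, tag):
    """Recursively rebuild the best parse tree from backpointers.

    Hypothetical helper: assumes trace[(i, j, tag)] is either a word
    (lexical entry) or a (split, left_tag, right_tag) triple (binary entry).
    """
    entry = trace[(i, j, tag)]
    if isinstance(entry, str):                   # lexical entry: a word
        return [tag, entry]
    k, left, right = entry                       # binary entry: split + children
    return [tag, get_tree(trace, i, k, left), get_tree(trace, k, j, right)]
```

Called as `get_tree(trace, 0, len(sentence), 'S')`, this produces nested lists of the same shape as the parse trees printed below.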

Output: parse('the dog sees the ball'.split())
The Tags Table
     ['the', 'dog', 'sees', 'the', 'ball']
[[], ['DetP', 'DetS'], ['NPS'], [], [], ['S']]
[[], [], ['NS'], [], [], []]
[[], [], [], ['VS'], [], ['NVS']]
[[], [], [], [], ['DetP', 'DetS'], ['NPS']]
[[], [], [], [], [], ['NS']]

Probability Table
{(0, 1, 'DetS'): 0.36, (0, 2, 'NPS'): 0.08399999999999999, (3, 4, 'DetS'): 0.36, (1, 2, 'NS'): 0.3333333333333333, (4, 5, 'NS'): 0.3333333333333333, (0, 5, 'S'): 0.0005291999999999999, (3, 5, 'NPS'): 0.08399999999999999, (0, 1, 'DetP'): 1, (2, 5, 'NVS'): 0.020999999999999998, (2, 3, 'VS'): 0.25, (3, 4, 'DetP'): 1}
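The chart entries multiply out consistently. For example, assuming the rule probabilities P(NPS → DetS NS) = 0.7 and P(S → NPS NVS) = 0.3 (these are inferred from the table above, not stated explicitly in the writeup):

```python
import math

# Check that the probability-table entries are products of their parts.
# The two rule probabilities below are inferred assumptions, not given values.
p_dets_the = 0.36          # (0, 1, 'DetS')
p_ns_dog   = 1 / 3         # (1, 2, 'NS')
p_nps_rule = 0.7           # assumed P(NPS -> DetS NS)

p_nps = p_nps_rule * p_dets_the * p_ns_dog
assert math.isclose(p_nps, 0.084)      # matches (0, 2, 'NPS')

p_s_rule = 0.3             # assumed P(S -> NPS NVS)
p_s = p_s_rule * 0.084 * 0.021         # (0, 2, 'NPS') * (2, 5, 'NVS')
assert math.isclose(p_s, 0.0005292)    # matches (0, 5, 'S')
```

The trailing `…999` digits in the printed table are ordinary floating-point rounding, not an error in the algorithm.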

The Parse Tree
['S', ['NPS', ['DetS', 'the'], ['NS', 'dog']], ['NVS', ['VS', 'sees'], ['NPS', ['DetS', 'the'], ['NS', 'ball']]]]

As we can observe from the above parse tree, the program produced a correct parse for the sentence. NPS = noun phrase singular; NVS = verb phrase singular. An NPS is composed of a determiner and a noun, and an NVS is composed of a verb followed by an NPS.

Now let's observe the following sentence: “a big dog pickles a dog”
To the average reader, the sentence is strange. Moreover, “pickles” can be either a plural noun (NPL) or a singular verb (VS). When running the sentence through the program, we receive the following parse tree:

The Parse Tree
['S', ['NPS', ['DetS', 'a'], ['ADJNS', ['J', 'big'], ['NS', 'dog']]], ['NVS', ['VS', 'pickles'], ['NPS', ['DetS', 'a'], ['NS', 'dog']]]]

Not only does the program correctly select “pickles” as a verb, it also correctly determines that this unusual string is, in fact, a sentence.



Multiple Possible Parse Trees/Ambiguous Parts-of-speech:
Now observing an even stranger sentence: “the big pickles liked the pickles with the red pickles”, the program outputs:

The Parse Tree
['S', ['NPPL', ['DetP', 'the'], ['ADJNPL', ['J', 'big'], ['NPL', 'pickles']]], ['VPPPL', ['NVPL', ['VPL', 'liked'], ['NPPL', ['DetP', 'the'], ['NPL', 'pickles']]], ['PPPL', ['P', 'with'], ['NPPL', ['DetP', 'the'], ['ADJNPL', ['J', 'red'], ['NPL', 'pickles']]]]]]

Again, a valid parse tree is produced. Notice that the sentence can be read as:
1. (NP (the big pickles)) (VP (liked (NP (the pickles with the red pickles)))), or
2. (NP (the big pickles)) (VP (VP (liked the pickles)) (PP (with (the red pickles))))

The first reading says: the pickles with the red pickles were liked by the big pickles.
The second reading says: the big pickles liked the pickles, and the liking took place in the company of the red pickles.

Notice that the program chose parse 2 as the most probable, where the verb phrase is broken down into a verb phrase and a prepositional phrase. The program could have chosen the parse where the verb phrase is broken down into a verb and a noun phrase. If we inspect the dictionary of dictionaries (“grammar”) and work out the probabilities, we can see that the probability of a verb phrase expanding to a verb phrase plus a prepositional phrase is set higher than that of a verb phrase expanding to a verb plus a noun phrase.
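The disambiguation boils down to a max comparison between the two competing derivations. A sketch with hypothetical numbers (the actual probabilities come from the grammar file, which is not reproduced here):

```python
# Hypothetical probabilities for the two competing analyses of
# "liked the pickles with the red pickles"; numbers are illustrative.
candidates = {
    "VP -> VP PP": 0.6 * 0.02 * 0.05,   # rule prob * left child * right child
    "VP -> V  NP": 0.4 * 0.25 * 0.004,  # here the NP itself contains the PP
}

# Weighted CYK keeps only the higher-probability derivation per cell.
best = max(candidates, key=candidates.get)
# With these illustrative numbers, "VP -> VP PP" wins (0.0006 > 0.0004).
```

Whichever product is larger ends up in the chart cell, so the attachment decision falls directly out of the grammar's rule probabilities.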


In a similar case, the sentence “the dogs liked the red pickles with a ball” has multiple parse trees. 

(the dogs) (liked (the red pickles with a ball))
(the dogs) ((liked (the red pickles)) (with a ball))

The first reading says: the red pickles with a ball are liked by the dogs.
The second reading says: the dogs liked the red pickles, and the liking involved a ball.

The Parse Tree
['S', ['NPPL', ['DetP', 'the'], ['NPL', 'dogs']], ['VPPPL', ['NVPL', ['VPL', 'liked'], ['NPPL', ['DetP', 'the'], ['ADJNPL', ['J', 'red'], ['NPL', 'pickles']]]], ['PPS', ['P', 'with'], ['NPS', ['DetS', 'a'], ['NS', 'ball']]]]]

Again, since the probability of a verb phrase expanding to a verb phrase plus a prepositional phrase is set higher than that of a verb phrase expanding to a verb plus a noun phrase, the program chose the second parse tree.


Again, another example: “the pickles light the dogs with the dog”
(the pickles) (light (the dogs with the dog))
(the pickles) ((light the dogs) (with the dog))

The first reading says: the dogs with the dog are lit by the pickles.
The second reading says: the pickles light the dogs, and the lighting takes place with the dog.

The Parse Tree
['S', ['NPPL', ['DetP', 'the'], ['NPL', 'pickles']], ['VPPPL', ['NVPL', ['VPL', 'light'], ['NPPL', ['DetP', 'the'], ['NPL', 'dogs']]], ['PPS', ['P', 'with'], ['NPS', ['DetS', 'the'], ['NS', 'dog']]]]]


