savinrazvan / parser

License: MIT License

Parser

Overview

The Parser project implements a context-free grammar (CFG) parser that analyzes English sentences and extracts their noun phrases. The goal is a tool that builds parse trees for a sentence and reports the grammatical components, in particular the noun phrases, that those trees contain.

Features

  • Context-Free Grammar Parsing: Utilizes CFG to decompose sentences into their grammatical elements.
  • Noun Phrase Extraction: Identifies and extracts noun phrases based on CFG rules.
  • Tokenization and Normalization: Processes sentences into tokens and normalizes them for consistent analysis.
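The tokenization and normalization step can be sketched with the standard library alone. This is a hypothetical stand-in, not the project's code: the real preprocess is assumed to rely on NLTK's tokenizer (which is why the punkt download below is needed), whereas this sketch swaps in a simple regular expression. It also assumes that "non-alphabetic words" means tokens containing no alphabetic character at all.

```python
import re
from typing import List


def preprocess(sentence: str) -> List[str]:
    """Hypothetical sketch: tokenize, lowercase, and drop tokens with no letters.

    A regex tokenizer stands in for nltk.word_tokenize (assumption).
    """
    # Split into runs of word characters and single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    # Normalize case and keep only tokens containing at least one letter
    return [t.lower() for t in tokens if any(c.isalpha() for c in t)]


if __name__ == "__main__":
    print(preprocess("Holmes sat in the red armchair."))
```

Punctuation such as the final period and pure numbers such as "2" are dropped, so only word tokens reach the grammar.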

Requirements

  • Python 3
  • nltk library

Setup

  1. Install NLTK:
    pip install nltk
  2. Download Required NLTK Resources:
    import nltk
    nltk.download('punkt')
    nltk.download('punkt_tab')  # recent NLTK releases load the word tokenizer from 'punkt_tab'

Project Structure

  • parser.py: Main script implementing the CFG parser and noun phrase extraction.
  • sentences/[sentence_file].txt: Text files containing sample sentences for parser testing.

Usage

  1. Run the Parser:

    python parser.py sentences/1.txt
    • If a filename is specified, the script reads the sentence from that file.
    • If no filename is provided, the script will prompt for an input sentence.
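The file-or-prompt behavior described above can be sketched as a small helper. The name read_sentence is hypothetical and not taken from parser.py. It assumes that exactly one command-line argument means "read this file" and that anything else falls back to an interactive prompt.

```python
from typing import List


def read_sentence(argv: List[str]) -> str:
    """Hypothetical helper: read the sentence from a file if a path is given, else prompt."""
    if len(argv) == 2:
        # A filename was passed, e.g. `python parser.py sentences/1.txt`
        with open(argv[1], encoding="utf-8") as f:
            return f.read()
    # No filename: ask the user to type a sentence
    return input("Sentence: ")
```

In a script, the call would be read_sentence(sys.argv), and the returned text would then go to preprocess.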
  2. Example Code:

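    # Assumes parser.py exposes its nltk.ChartParser instance as `parser`
    from parser import parser, preprocess, np_chunk
    
    sentence = "The quick brown fox jumps over the lazy dog."
    tokens = preprocess(sentence)
    try:
        trees = list(parser.parse(tokens))
    except ValueError as e:
        # Raised when a word in the sentence is not listed in the grammar's TERMINALS
        print(e)
        trees = []
    for tree in trees:
        tree.pretty_print()
        for np in np_chunk(tree):
            print(" ".join(np.flatten()))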

Code Details

  • Grammar Definitions: The CFG rules are split between two strings. TERMINALS maps parts of speech to words, and NONTERMINALS defines how those parts of speech combine into phrases and sentences.
  • Preprocessing: The preprocess function tokenizes the input sentence, converts it to lowercase, and filters out non-alphabetic words.
  • Noun Phrase Chunking: The np_chunk function returns the noun-phrase chunks of a parse tree, that is, the NP subtrees that do not themselves contain another NP subtree.
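Grammar definition and noun-phrase chunking can be sketched together with NLTK. The grammar below is a hypothetical toy, not the project's actual TERMINALS and NONTERMINALS. np_chunk follows the rule described above: keep an NP subtree only if it contains no other NP. nltk.CFG.fromstring treats the left-hand side of the first rule as the start symbol, so NONTERMINALS, which begins with S, is placed first.

```python
import nltk

# Hypothetical toy grammar: the project's real rules are not reproduced here
TERMINALS = """
Adj -> "red" | "little"
Det -> "the" | "a"
N -> "holmes" | "armchair" | "pipe"
P -> "in" | "with"
V -> "sat" | "lit"
"""

NONTERMINALS = """
S -> NP VP
NP -> N | Det N | Det AP N | NP PP
AP -> Adj | Adj AP
PP -> P NP
VP -> V | V NP | V PP | VP PP
"""

# The first rule (S -> NP VP) makes S the start symbol
grammar = nltk.CFG.fromstring(NONTERMINALS + TERMINALS)
parser = nltk.ChartParser(grammar)


def np_chunk(tree):
    """Return the NP subtrees that contain no other NP subtree."""
    chunks = []
    for subtree in tree.subtrees(lambda t: t.label() == "NP"):
        # subtrees() yields the subtree itself first, so exclude it
        inner = [t for t in subtree.subtrees(lambda t: t.label() == "NP")
                 if t is not subtree]
        if not inner:
            chunks.append(subtree)
    return chunks


if __name__ == "__main__":
    tokens = "holmes lit a pipe in the armchair".split()
    for tree in parser.parse(tokens):
        tree.pretty_print()
        print([" ".join(chunk.leaves()) for chunk in np_chunk(tree)])
```

In "a pipe in the armchair", the outer NP wraps two smaller NPs, so only "a pipe" and "the armchair" are reported. Any token missing from TERMINALS makes parser.parse raise ValueError.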

Example Outputs

For each sample sentence, the script prints every parse tree as an ASCII diagram, followed by the noun phrases extracted from that tree.

For more information, visit the Parser Project.