pNarrative

A Python module for extracting sentiment and sentiment-based plot arcs from text. Inspired by the American author Kurt Vonnegut's rejected thesis (see a lecture here) and Matthew Jockers' Syuzhet R package, but with a different method for estimating the "macro" shape of narratives: the probabilistic framework of Gaussian processes.

The GP implementation in this module follows the pseudocode in C. E. Rasmussen and C. K. I. Williams's Gaussian Processes for Machine Learning (Algorithm 2.1, p. 19).
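
As a rough illustration of what Algorithm 2.1 computes, the following NumPy sketch performs GP regression through a Cholesky factorisation. It is not the module's own code: the function name `gp_predict`, the `noise_var` parameter, the inline kernel, and the toy data are all assumptions made for this example.

```python
import numpy as np

def gp_predict(x, y, x_star, kernel, noise_var=1.0):
    """GP regression via Cholesky factorisation (after Algorithm 2.1).

    Hypothetical helper for illustration; not part of pNarrative.
    """
    n = len(x)
    K = kernel(x, x)                      # covariance between training inputs
    K_star = kernel(x, x_star)            # covariance between training and test inputs
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    # alpha = (K + noise_var * I)^-1 y, obtained from two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star.T @ alpha               # predictive mean
    v = np.linalg.solve(L, K_star)
    var = np.diag(kernel(x_star, x_star)) - np.sum(v ** 2, axis=0)  # predictive variance
    log_ml = (-0.5 * y @ alpha
              - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2.0 * np.pi))  # log marginal likelihood
    return mean, var, log_ml

# Toy data standing in for per-segment sentiment scores (made up for this sketch)
rng = np.random.default_rng(0)
x = np.arange(200, dtype=float)
y = np.sin(x / 30.0) + rng.normal(scale=0.5, size=x.size)

# Squared-exponential kernel with length scale 20 and unit signal variance
kernel = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / 20.0) ** 2)

mean, var, log_ml = gp_predict(x, y, x, kernel)
```

The predictive mean plays the role of the smoothed arc, and the predictive variance is what error bands would be drawn from.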

NOTE: THE MODULE IS STILL UNDER DEVELOPMENT AND MAY CONTAIN A FEW BUGS

Reference

The module currently contains the following sentiment lexicons:

AFINN:
By Finn Årup Nielsen as the AFINN WORD DATABASE. Copyright protected and distributed under the Open Database License (ODbL) v1.0.

BING:
By Minqing Hu and Bing Liu as the OPINION LEXICON.

Installation

First install the requirements/dependencies listed in the requirements.txt file

pip install -r requirements.txt

and then install the module by

python setup.py install

Demonstration/Example

Overview

The workflow of the module can be summarized as follows:

  1. Initialize an object (read text)
    from pNarrative import Narrative
    book = Narrative.Narrative(book = book_text)
    
  2. Split text into segments
    book.segment_text(mode = "sentence", lower = True)
    
  3. Get segment-sentiment scores
    from pNarrative.parser.sentiment_scorer import get_sentiment_lexicon
    sentiment_lexicon = get_sentiment_lexicon("afinn","sv")
    book.get_sentiment_score(lexicon = sentiment_lexicon)
    
  4. Estimate Narrative Arc/Plot
    from pNarrative.kernels.rbf import rbf
    book.get_narrative_estimation(kernel= rbf, kernel_parameters= {"el":20, "sigma":1})
    
  5. Plot Narrative Arc/Plot
    book.plot_narrative(type = "gp", plot_errors = True)
    

For this demonstration we will use the Swedish-language book "Bannlyst" by the late author Selma Lagerlöf, accessed through the Project Gutenberg website.

from pNarrative import Narrative
import requests
from pNarrative.kernels.rbf import rbf
from pNarrative.parser.sentiment_scorer import get_sentiment_lexicon

Step 1: Init a "Narrative"-object

example_URL = "http://www.gutenberg.org/cache/epub/39147/pg39147.txt"
r = requests.get(example_URL)
book = Narrative.Narrative(book=r.text,id="Bannlyst - Selma Lagerlöf") 

Note: The "id" argument will be used as the header when plotting the narrative in the last step.

Step 2: Segment text

In this example, we'll segment the text into sentences by setting the segmentation mode to "sentence". However, you could also split the text on any other segment definition by setting the mode to "custom" and supplying a regex pattern to the "pattern" argument.

# Custom segmentation: split on a regex pattern (here, every period)
book.segment_text(mode = "custom", pattern = r'\.')
# Sentence segmentation, used for the rest of this example
book.segment_text(mode = "sentence")
print("Number of sentences: {}\n\n".format(book.nrSegments))

print("Examples of sentences:")
print("_"*80)
for i, sent in enumerate(book.segments[200:205]):
    print("\t{}. {:<200}".format(i+1, sent))
Number of sentences: 5158


Examples of sentences:
________________________________________________________________________________
	1. På måndagen var det också fester och tillställningar, men
sen på en gång var det stopp.                                                                                                                
	2. Det hade kommit ut onda rykten om
nordpolsfararna.                                                                                                                                                     
	3. Hustruns ansikte stelnade till.                                                                                                                                                                         
	4. Ska jag nu få höra, att han har gjort något orätt?                                                                                                                                                      
	5. mumlade hon mellan
hårt sammanbitna tänder.                                                                                                                                                            
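
To make the difference between the two segmentation modes concrete, here is a minimal standard-library sketch of regex-based splitting. It does not reproduce how `segment_text` works internally: the helper `split_segments` and its default sentence pattern are assumptions made for this example.

```python
import re

def split_segments(text, pattern=r"(?<=[.!?])\s+", lower=False):
    """Split text on a regex pattern and drop empty segments.

    Hypothetical helper; the default pattern splits on whitespace that
    follows sentence-ending punctuation.
    """
    if lower:
        text = text.lower()
    parts = re.split(pattern, text)
    return [p.strip() for p in parts if p.strip()]

sample = ("Hustruns ansikte stelnade till. "
          "Ska jag nu få höra, att han har gjort något orätt? mumlade hon.")

sentences = split_segments(sample)               # split after ., ! or ?
custom = split_segments(sample, pattern=r"\.")   # split on every period only
```

With this sample, the sentence-style pattern yields three segments while the period-only pattern yields two, because the question mark no longer ends a segment.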

Step 3: Get sentiment Scores

You can use any custom sentiment lexicon to extract the sentence sentiments by using the "create_lexicon" function, which takes a .txt file and converts it to a dictionary-like Python object. However, this module also includes a number of lexicons that can be accessed with the "get_sentiment_lexicon" function.

In this case we will use the AFINN-SV-165 sentiment lexicon.

lexicon_sv = get_sentiment_lexicon(lexicon = "afinn",lang="sv")
book.get_sentiment_score(lexicon=lexicon_sv)
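
The idea behind lexicon-based scoring can be sketched in a few lines. This is only an illustration under stated assumptions: the helper `score_segment`, the choice to sum word scores, and the toy lexicon entries (which are made up and not taken from AFINN) are not from the module, whose own scoring rule may differ.

```python
import re

def score_segment(segment, lexicon):
    """Sum the lexicon scores of the words in one segment.

    Hypothetical helper; words missing from the lexicon count as 0.
    """
    words = re.findall(r"\w+", segment.lower())
    return sum(lexicon.get(w, 0) for w in words)

# Made-up word scores for illustration only
toy_lexicon = {"glad": 3, "fest": 2, "ond": -3, "orätt": -2}

segments = [
    "Hon var glad på fest.",
    "Han har gjort något orätt.",
    "Det regnade.",
]
scores = [score_segment(s, toy_lexicon) for s in segments]
```

The result is one score per segment, which is the sequence that the narrative estimation in the next step smooths.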

Step 4: Estimate Narrative Arc/Plot

Then we simply run the get_narrative_estimation method to get the "macro" shape of the narrative. For this particular case, we'll use the rbf (radial basis function) kernel, also known as the squared exponential kernel, with the parameters $\sigma = 1$ and $\ell = 20$.

%%timeit
book.get_narrative_estimation(kernel= rbf, kernel_parameters= {"el":20, "sigma":1})
576 ms ± 29.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
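
For intuition about what the two kernel parameters do, here is a sketch of the squared exponential kernel in its common textbook form, $k(x, x') = \sigma^2 \exp(-(x - x')^2 / (2\ell^2))$. The function name `rbf_sketch` is hypothetical, and whether the module's own rbf uses exactly this parametrisation is an assumption.

```python
import numpy as np

def rbf_sketch(x1, x2, el=20.0, sigma=1.0):
    """Squared exponential covariance between two 1-D arrays of segment indices.

    Hypothetical stand-in for illustration; not the module's rbf function.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return sigma ** 2 * np.exp(-sq_dist / (2.0 * el ** 2))

# Covariance between segment positions 0, 20, 40, ..., 100
idx = np.arange(0, 101, 20)
K = rbf_sketch(idx, idx)
print(np.round(K[0], 3))  # covariance of segment 0 with each position
```

Under this form, with $\ell = 20$ the covariance between two segments 20 positions apart is $e^{-1/2} \approx 0.61$, and it falls off quickly beyond that. A larger $\ell$ therefore gives a smoother arc.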

Step 5: Plotting the Narrative

To plot the estimated narrative, use the plot_narrative method.

The method currently supports the following plot types:

  1. "gp"
  2. "rolling_mean"
  3. "merged" - Using both the gp-method and rolling mean

Without Scaling:

# 1. gp
book.plot_narrative(type = "gp", plot_errors=True, scale_narrative=False)

# 2. rolling mean

# The wdw_size specifies the window size of the rolling mean. Default: 10 percent of the length of the vector
book.plot_narrative(type = "rolling_mean",scale_narrative=False)
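
The rolling mean itself is straightforward to sketch with NumPy. The helper below is an assumption-laden illustration rather than the module's implementation: the name `rolling_mean`, the use of `np.convolve`, and the "valid" edge handling are choices made for this example. Only the default window of 10 percent of the vector length comes from the comment above.

```python
import numpy as np

def rolling_mean(scores, wdw_size=None):
    """Moving average over a 1-D score vector.

    Hypothetical helper; the default window is 10 percent of the vector length.
    """
    scores = np.asarray(scores, dtype=float)
    if wdw_size is None:
        wdw_size = max(1, len(scores) // 10)
    window = np.ones(wdw_size) / wdw_size
    # "valid" keeps only positions where the window fits entirely
    return np.convolve(scores, window, mode="valid")

# Made-up segment scores for illustration
scores = np.array([1, 0, -2, 3, 0, 0, 1, -1, 2, 0,
                   0, -3, 1, 1, 0, 2, -1, 0, 0, 1], dtype=float)

smoothed = rolling_mean(scores)              # default window: 20 // 10 = 2
wide = rolling_mean(scores, wdw_size=5)      # wider window, smoother curve
```

A wider window smooths more aggressively but shortens the output, since positions near the edges are dropped in this variant.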

With Scaling:

# 1. gp
book.plot_narrative(type = "gp", plot_errors=True, scale_narrative=True)

# 2. rolling mean
book.plot_narrative(type = "rolling_mean",scale_narrative=True)

# 3. Merged

# When using the "merged" type, the narratives are automatically scaled 
book.plot_narrative(type = "merged",scale_narrative=True, plot_errors = True)
