Svevo Letters Analysis

News

Description

The purpose of this research project is to analyze the epistolary corpus of Italo Svevo, one of the great Italian novelists of the twentieth century and a pioneer of the psychological novel in Italy. The analyses were performed on the corpus created by Cristina Fenu as her final project for the Master's in Digital Humanities of Università Ca' Foscari of Venice during the 2015-2016 academic year.

Results of the first analysis were published in the proceedings of the 2017 AIUCD Conference and are available on the website of the Svevian Museum of Trieste.

The analysis was structured in two parts:

  • Topic modeling of the Italian corpus using latent Dirichlet allocation (LDA) to extract the main topics contained in the corpus and estimate their association with interlocutors over time.

  • Sentiment analysis of the whole corpus using the Word-Emotion Association Lexicon (EmoLex) by Mohammad et al. to highlight relations between emotional states, topics and interlocutors over time.
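
To make the lexicon-based approach concrete, here is a minimal Python sketch of how a word-emotion lexicon can be turned into per-letter emotion percentages. It is an illustration only, not the code used in the notebooks: the tiny `TOY_LEXICON`, the function name `emotion_percentages`, and the definition of "percentage" as the share of each emotion among all lexicon hits are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical toy lexicon in the spirit of EmoLex: word -> associated emotions.
# The real lexicon is much larger and is not reproduced here.
TOY_LEXICON = {
    "amore": {"joy", "trust"},
    "paura": {"fear"},
    "morte": {"sadness", "fear"},
    "felice": {"joy"},
}


def emotion_percentages(text, lexicon=TOY_LEXICON):
    """Return the share (in percent) of each emotion among all lexicon hits.

    Assumption: a word that maps to several emotions counts once for each.
    Returns an empty dict when no word of the text is in the lexicon.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: 100.0 * n / total for emotion, n in counts.items()}


if __name__ == "__main__":
    letter = "Cara Livia, sono felice ma ho paura della morte."
    print(emotion_percentages(letter))
```

Normalizing by the number of lexicon hits rather than by letter length is only one possible design choice; it makes letters of different lengths comparable but hides how emotionally dense each letter is.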

This repository is structured as follows:

  • The datasets folder contains the original letter corpus, and is the location where all subsequent datasets used for our purposes are saved. New: Added a positive/negative sentiment Italian word list for the recurrent words connotation analysis.

  • The results folder contains plots, in svg and png format, that describe our findings and evaluate the performance of our LDA model.

  • The topic_modeling notebook contains all the code used to perform the topic modeling analysis. It produces a svevo_with_topics.csv file containing the topics assigned to each letter. Only the 500 Italian letters with the most clearly separated topics are taken into account.

  • The sentiment_analysis_extraction notebook generates a sentiment.csv file containing the sentiment intensity percentage for all the letters in the original corpus.

  • The sentiment_analysis_evaluation notebook creates many additional datasets used to evaluate and plot our results.

  • New: The recurrent_words_connotation_analysis notebook is used to inspect which words account for most of the positive/negative sentiment over Svevo's lifespan.

  • The future_perspectives notebook contains approaches that were tested for the analysis and eventually discarded because of their complexity or their results, but that deserve a second look for future use.
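
As an illustration of what keeping only the letters "with most separated topics" could look like, here is a small Python sketch. The criterion is an assumption, not necessarily the one implemented in the topic_modeling notebook: a letter is treated as well separated when the gap between its two most probable topics is large. The names `topic_margin` and `most_separated` and the toy data are hypothetical.

```python
def topic_margin(probabilities):
    """Gap between the two highest topic probabilities of one letter."""
    ranked = sorted(probabilities, reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def most_separated(letters, n):
    """Return the ids of the n letters with the largest topic margin.

    letters: dict mapping a letter id to its list of topic probabilities.
    """
    ranked = sorted(letters, key=lambda key: topic_margin(letters[key]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Toy topic distributions, invented for this example.
    toy = {
        "letter_a": [0.90, 0.05, 0.05],
        "letter_b": [0.40, 0.35, 0.25],
        "letter_c": [0.70, 0.20, 0.10],
    }
    print(most_separated(toy, 2))
```

In the repository's setting, `n` would be 500 and the distributions would come from the fitted LDA model.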

Requirements

In order for all the notebooks to work properly, the following requirements should be met.

Warning: The last part of the future_perspectives notebook will not work out of the box. See the additional requirements below and the notebook itself for more information.

Python packages

  • numpy
  • pandas
  • gensim
  • spacy
  • scikit-learn (imported as sklearn)
  • pyLDAvis
  • matplotlib
  • seaborn
  • tqdm

Simply run pip install -r requirements.txt inside this folder to automatically install all dependencies.

For the spacy package, the language models should be installed as follows:

python -m spacy download en

python -m spacy download fr

python -m spacy download it

python -m spacy download de

R packages

  • syuzhet
  • dplyr
  • pander

Run install.packages(c("syuzhet", "dplyr", "pander")) inside an R shell.

Additional requirements for Future Perspectives notebook

The set of Italian embeddings needed to test the Word2Vec approach in future_perspectives is available on the Italian NLP Lab website.

Results

A short report of the research project has been updated and is available! The last section contains a textual description of our findings. For a visual understanding, please refer to the results folder.
