by Filippo Chiarello
Re-adapted from: Julia Silge's workshop at rstudio::conf 2020
October 2020
Text data is increasingly important for strategic and competitive intelligence. The reasons are many, but the main one is that most public information about companies is now in text format. Tidy data principles and tidy tools can make text mining easier, and will let you focus on the most important part of a business or technology-related analysis: the questions you want to answer.
In these lessons, you will learn how to manipulate, summarize, and visualize the characteristics of text using tidy data methods and R packages from the tidy tool ecosystem. These tools are highly effective for many analytical questions and allow analysts to integrate natural language processing into effective workflows already in wide use. You will explore how to implement approaches such as sentiment analysis of texts, measuring tf-idf, network analysis of words, and building both supervised and unsupervised text models.
By the end of the lessons, students will be able to:
- Perform exploratory data analyses of text datasets, including summarization and data visualization
- Understand and implement both tf-idf and sentiment analysis
- Build classification models for text using tidy data principles
During these lessons, we'll share code and slides via a GitHub repo and code interactively together using an RStudio Cloud project. You can log in to RStudio Cloud with Google credentials, GitHub credentials, or email. Go ahead and log in with your method of choice before we meet so you can see what the platform looks like.
Filippo Chiarello is a data scientist and researcher at the University of Pisa. His research focuses on the use of Natural Language Processing systems for understanding technological innovations and their impact on the workforce. He is co-founder of the company Texty, a research consultant for Errequadro, and part of the research lab B4DS.
This work is licensed under a Creative Commons Attribution 4.0 International License.