GSPbtm

GSPbtm provides an easy way to apply the BTM package, which implements the biterm topic model described in the paper.

GSPbtm works with any short-text dataset and supports three model types, each defined by the parts of speech it keeps: NAV (noun, adjective, verb), NPN (noun, proper noun), and ADJ (adjective only). A single wrapper function, complete_btm(), runs the full model. The user specifies the minimum and maximum number of topics to estimate and the type of model to run, and the function selects the best model based on a calculated coherence score. Because we use the background topic supplied by BTM, the topics are generally well defined and informative, and coherence scores are not overwhelmed by common stop words or uninformative terms.
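The selection step can be illustrated with a minimal base R sketch. It assumes a UMass-style coherence score and hand-written candidate topics; the README does not say which coherence measure GSPbtm calculates, and the names umass_coherence, docs, and candidates are hypothetical, not part of the package.

```r
# Toy tokenized corpus: one character vector per short document.
docs <- list(
  c("coffee", "strong", "morning"),
  c("coffee", "morning", "cup"),
  c("phone", "battery", "charge"),
  c("phone", "battery", "screen"),
  c("coffee", "cup", "strong")
)

# UMass-style coherence for one topic (an assumed measure, see lead-in):
# sum over term pairs of log((co-document count + 1) / document count).
umass_coherence <- function(top_terms, docs) {
  # Logical matrix: rows are documents, columns are terms.
  in_doc <- sapply(top_terms, function(w) {
    vapply(docs, function(d) w %in% d, logical(1))
  })
  score <- 0
  for (i in seq_along(top_terms)[-1]) {
    for (j in seq_len(i - 1)) {
      co <- sum(in_doc[, i] & in_doc[, j])
      score <- score + log((co + 1) / sum(in_doc[, j]))
    }
  }
  score
}

# Hypothetical top terms for two candidate models (k = 2 and k = 3 topics).
candidates <- list(
  `2` = list(c("coffee", "morning", "cup"),
             c("phone", "battery", "screen")),
  `3` = list(c("coffee", "phone", "cup"),
             c("battery", "morning", "screen"),
             c("strong", "charge", "coffee"))
)

# Average coherence per candidate, then keep the highest-scoring one.
mean_coherence <- vapply(candidates, function(topics) {
  mean(vapply(topics, umass_coherence, numeric(1), docs = docs))
}, numeric(1))
best_k <- as.integer(names(which.max(mean_coherence)))
```

In this toy corpus the two-topic candidate keeps co-occurring words together, so it scores higher than the three-topic candidate that mixes coffee and phone terms.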

The column containing the text to be annotated and modeled must be named “text”; the user will be prompted if no column has that name. Annotation relies on the udpipe package for part-of-speech (POS) tagging and for determining biterm co-occurrence.
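As a rough illustration of how POS filtering leads to biterms, the sketch below uses a hand-built data frame that stands in for udpipe output (udpipe annotations include doc_id, lemma, and upos columns). The helper make_biterms and the keep_upos mapping are hypothetical, and pairing every kept term within a document is an assumption; GSPbtm may restrict pairs to a window or use different settings.

```r
# Stand-in for udpipe annotation output; a real run would produce these
# columns (among others) from the "text" column.
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d1", "d2", "d2", "d2"),
  lemma  = c("the", "coffee", "be", "strong", "I", "love", "coffee"),
  upos   = c("DET", "NOUN", "AUX", "ADJ", "PRON", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Assumed mapping from model type to Universal POS tags.
keep_upos <- list(
  NAV = c("NOUN", "ADJ", "VERB"),
  NPN = c("NOUN", "PROPN"),
  ADJ = "ADJ"
)

# Hypothetical helper: keep the tagged terms, then pair them within each document.
make_biterms <- function(tokens, model = "NAV") {
  kept <- tokens[tokens$upos %in% keep_upos[[model]], ]
  by_doc <- split(kept$lemma, kept$doc_id)
  pairs <- Map(function(id, terms) {
    if (length(terms) < 2) return(NULL)  # a biterm needs two terms
    p <- t(combn(terms, 2))              # every unordered pair
    data.frame(doc_id = id, term1 = p[, 1], term2 = p[, 2],
               stringsAsFactors = FALSE)
  }, names(by_doc), by_doc)
  do.call(rbind, unname(pairs))
}

biterms <- make_biterms(tokens, model = "NAV")
```

Here each toy document keeps two terms under the NAV model, so each contributes exactly one biterm.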

Once the best model is selected, we borrow the Jensen-Shannon divergence and PCA steps from the LDAvis package to extract X,Y coordinates for each topic. The top 12 terms for each topic are selected using the relevance and frequency selection criteria from the LDAvis package and returned as a separate output (note that the relevance/frequency split is hardcoded at 0.6).
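Both steps can be sketched in base R on a toy topic-term matrix. This is an approximation of the LDAvis approach, not GSPbtm's code: the matrix phi, the topic shares, and the term names are made up, and treating the 0.6 split as the LDAvis relevance weight lambda is an assumption.

```r
# Toy topic-term probabilities: rows are topics, columns are terms, rows sum to 1.
phi <- matrix(
  c(0.60, 0.20, 0.10, 0.05, 0.05,
    0.05, 0.60, 0.20, 0.10, 0.05,
    0.05, 0.05, 0.60, 0.20, 0.10,
    0.10, 0.05, 0.05, 0.60, 0.20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("topic", 1:4),
                  c("price", "taste", "store", "sweet", "brand"))
)
topic_share <- rep(0.25, 4)  # assumed equal topic proportions

# Jensen-Shannon divergence between two probability vectors.
js_divergence <- function(p, q) {
  m <- 0.5 * (p + q)
  kl <- function(a, b) sum(ifelse(a == 0, 0, a * (log(a) - log(b))))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Pairwise divergences between topics.
n_topics <- nrow(phi)
jsd <- matrix(0, n_topics, n_topics,
              dimnames = list(rownames(phi), rownames(phi)))
for (i in seq_len(n_topics)) {
  for (j in seq_len(n_topics)) {
    jsd[i, j] <- js_divergence(phi[i, ], phi[j, ])
  }
}

# Classical multidimensional scaling gives two coordinates per topic.
coords <- stats::cmdscale(stats::as.dist(jsd), k = 2)
colnames(coords) <- c("x", "y")

# Relevance: lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)).
lambda <- 0.6
p_w <- as.vector(topic_share %*% phi)  # marginal term probabilities
lift <- sweep(phi, 2, p_w, "/")
relevance <- lambda * log(phi) + (1 - lambda) * log(lift)

# Top 3 terms per topic for this toy vocabulary (the package returns 12).
top_terms <- apply(relevance, 1, function(r) {
  names(sort(r, decreasing = TRUE))[1:3]
})
```

The coords matrix has one row per topic and can be plotted directly; top_terms has one column per topic with terms ordered by relevance.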

Installation

You can install the development version of GSPbtm from GitHub with:

# install.packages("devtools")
devtools::install_github("taylorgrant/GSPbtm")

Example

Assuming data is a data frame with a column named “text”:

library(GSPbtm)
tm_out <- complete_btm(data, min_topics = 5, max_topics = 20, model = "NAV")

The function returns a list with three elements: tm_out$df is the original data frame with a topic number joined to each row; tm_out$fulldata contains the top n terms with their topic numbers and PCA coordinates; tm_out$range gives the minimum and maximum values to use as axis limits when plotting.
