kwartler / text_mining

This repo contains data from Ted Kwartler's "Text Mining in Practice With R" book.

R 100.00%

text_mining's Issues

glmnet intercept = FALSE

Ted, in your book you set intercept = F in your glmnet models. You discuss all the other parameters, but you do not justify this one. Why did you choose it in the headline clickbait case study?

Problem with glove function. Page 176

Problem on page 176:

fit.glove <- glove(tcm = tcm, word_vectors_size = 50, x_max = 10, learning_rate = 0.2, num_iters = 15)

RStudio says:

Error in .subset2(public_bind_env, "initialize")(...) :
unused argument (grain_size = 100000)
In addition: Warning message:
'glove' is deprecated.
Use 'GloVe' instead.
See help("Deprecated")

vocab_vectorizer(p.173)

On page 173,

> vectorizer <- vocab_vectorizer(vocab,
+                                grow_dtm = FALSE,
+                                skip_grams_window = 5)

I got the following error message:

Error in vocab_vectorizer(vocab, grow_dtm = FALSE,
skip_grams_window = 5) :
unused arguments (grow_dtm = FALSE, skip_grams_window = 5)

Thanks in advance,

Chang-Kyo Suh

LDA and tf-idf document term matrix

Dear Ted

Question: Can a tf-idf document-term matrix be used as input to Latent Dirichlet Allocation (LDA)? If yes, how?

It does not work in my case: the LDA function requires a term-frequency document-term matrix.

Thank you
(I have kept the question as concise as possible. If you need more details, I can add them.)

##########################################################################
                           TF-IDF Document matrix construction
##########################################################################    

> DTM_tfidf <- DocumentTermMatrix(corpora, control = list(weighting =
+   function(x) weightTfIdf(x, normalize = FALSE)))
> str(DTM_tfidf)
List of 6
$ i       : int [1:4466] 1 1 1 1 1 1 1 1 1 1 ...
$ j       : int [1:4466] 6 10 22 26 28 36 39 41 47 48 ...
$ v       : num [1:4466] 6 2.09 1.05 3.19 2.19 ...
$ nrow    : int 64
$ ncol    : int 297
$ dimnames:List of 2
  ..$ Docs : chr [1:64] "1" "2" "3" "4" ...
  ..$ Terms: chr [1:297] "accommod" "account" "achiev" "act" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency - inverse document frequency" "tf-idf"

##########################################################################
                           LDA section
##########################################################################

> LDA_results <- LDA(DTM_tfidf, k, method = "Gibbs", control = list(nstart = nstart,
+                    seed = seed, best = best,
+                    burnin = burnin, iter = iter, thin = thin))

##########################################################################
                           Error messages
##########################################################################
  Error in LDA(DTM_tfidf, k, method = "Gibbs", control = list(nstart = 
  nstart,  : 
  The DocumentTermMatrix needs to have a term frequency weighting
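
As the error message says, LDA expects raw term counts rather than tf-idf weights. One possible workaround, offered here as an assumption and not as the author's answer, is to use tf-idf only to decide which terms to keep and then fit LDA on the unweighted counts of those terms. The sketch below illustrates the idea in base R on a toy count matrix; the names counts, mean_tfidf, and lda_input are hypothetical, and the tf-idf formula (row-normalized tf, log2 idf) is one common variant.

```r
# Toy document-term count matrix (rows = documents, columns = terms).
counts <- matrix(c(2, 0, 1,
                   2, 3, 0,
                   2, 0, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("d1", "d2", "d3"),
                                 c("the", "model", "topic")))

# Term frequency: each count divided by its document's length.
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(number of docs / docs containing the term).
idf <- log2(nrow(counts) / colSums(counts > 0))

# Mean tf-idf per term across all documents.
mean_tfidf <- colMeans(sweep(tf, 2, idf, `*`))

# Keep only terms whose mean tf-idf exceeds a threshold (0 is an arbitrary
# choice for this toy data), but keep their raw counts for LDA.
keep <- mean_tfidf > 0
lda_input <- counts[, keep, drop = FALSE]
print(lda_input)
```

With a real corpus, the filtered matrix would still be a term-frequency matrix, so it could be passed to LDA. Documents left with no terms after filtering would need to be dropped first.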

Error message for the Map function (p. 255)

When I try to run the following code from p. 255:

all.ner<-Map(function(tex,fea,id) cbind(fea,
entity=substring(tex, fea$start,fea$end),file=id),
all.emails,all.ner,temp)

I get this error message:

Error in substring(tex, fea$start, fea$end) : invalid substring arguments

I can't figure out what the problem is. Any idea?

thanks,

Marton
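
One possible cause, stated here as an assumption because the data behind the error is not shown, is that some emails have no annotations, so fea$start and fea$end are empty when substring() is called. The sketch below guards against that case. The helper name extract_entities and the toy inputs are hypothetical stand-ins for all.emails, all.ner, and temp.

```r
# Hypothetical helper: extract entity text only when annotations exist.
extract_entities <- function(tex, fea, id) {
  # Skip documents with no annotations, since substring() cannot work
  # with zero-length start/end vectors.
  if (is.null(fea) || nrow(fea) == 0) {
    return(NULL)
  }
  cbind(fea,
        entity = substring(tex, as.integer(fea$start), as.integer(fea$end)),
        file = id)
}

# Toy inputs: one email with three entities, one with none.
emails <- list("Bob met Alice in Boston.", "No names here.")
ner <- list(
  data.frame(start = c(1, 9, 18), end = c(3, 13, 23)),
  data.frame(start = integer(0), end = integer(0))
)
ids <- c("email1.txt", "email2.txt")

result <- Map(extract_entities, emails, ner, ids)
print(result[[1]])
```

Checking the class and length of each element's start and end columns before the Map call is another way to find the email that triggers the error.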

Full Code not appearing in section 6.2.3, pg 196 | Corrected

In section 6.2.3, page 196, the last line of the code is not readable (it was highlighted in yellow in the image attached to this issue).

For anyone affected by this, the full code is as follows:

clean.test <- headline.clean(test.headlines$headline)
test.dtm <- match.matrix(clean.test, weighting = tm::weightTfIdf, original.matrix = train.dtm)

Error function name headline.cleanv (page 188)

On page 188,

clean.train<-headline.cleanv(train.headlines$headline)

must be

clean.train<-headline.clean(train.headlines$headline)

A bug in the code from page 45 to find phone number reference?

To answer how often agents refer to phone numbers, which have the pattern xxx-xxx-xxxx in the USA, the code on page 45 is

sum(grepl('[0-9]{3})|[0-9]{4}', text.df$text))/
nrow(text.df)

There is a ) after {3}. What is this closing bracket for? Without it, the relative frequency of a phone number being referenced is 0.1445171.

Thank you for your advice.
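
For comparison, a stricter pattern that matches only the full xxx-xxx-xxxx form can be sketched as follows. This is an assumption about the intent, not the book's code; the texts vector and the names phone_pattern, has_phone, and share are made up for illustration.

```r
# Toy messages standing in for text.df$text.
texts <- c("Call us at 800-555-0199 for help.",
           "Your order 12345 has shipped.",
           "Reach me at 555 0100.")

# Three digits, a hyphen, three digits, a hyphen, four digits.
phone_pattern <- "[0-9]{3}-[0-9]{3}-[0-9]{4}"

# TRUE for each message that contains a match.
has_phone <- grepl(phone_pattern, texts)

# Share of messages that mention a phone number in this format.
share <- sum(has_phone) / length(texts)
print(share)
```

Unlike the alternation '[0-9]{3}|[0-9]{4}', which matches any run of three digits, this pattern does not count order numbers or partial numbers as phone references.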

R-Code

I bought the printed version of this interesting book. The author evidently offers the data online, but not the R code for the individual chapters. Despite some effort, I couldn't find the source code. Yes, one can copy the code out of the book, but why this imposition?

match.matrix(p.188)

p.188(last sentence).

The textbook says "You do not need to specify the third parameter, original.matrix, since train.dtm is the original matrix." But when I try the following code:

train.dtm <- match.matrix(clean.train, weighting = tm::weightTfIdf)

I got the following error message:

Error in match.matrix(clean.train, weighting = tm::weightTfIdf) : object 'original.martix' not found
In addition: Warning message:
In weighting(x) : empty document(s): 409

Could you please try the code and advise me how to fix it? I have attached the code for match.matrix as follows:

match.matrix.txt

#------------ End of an issue

Thanks in advance

Chang-Kyo Suh
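
The error suggests the function body refers to the reference matrix even when the argument is not supplied. A common R idiom for an optional argument is a NULL default checked with is.null(). The sketch below shows that idiom on a simplified, hypothetical stand-in named align_terms, which works on plain count matrices. It is not the book's match.matrix and makes no claim about how that function is written.

```r
# Hypothetical stand-in: align a count matrix to the term columns of an
# optional reference matrix.
align_terms <- function(new_counts, original = NULL) {
  # With no reference matrix, the input itself defines the vocabulary.
  if (is.null(original)) {
    return(new_counts)
  }
  terms <- colnames(original)
  # Start from zeros so terms missing from the new data stay at 0.
  aligned <- matrix(0, nrow = nrow(new_counts), ncol = length(terms),
                    dimnames = list(rownames(new_counts), terms))
  # Copy over only the terms both matrices share; unseen terms are dropped.
  shared <- intersect(terms, colnames(new_counts))
  aligned[, shared] <- new_counts[, shared, drop = FALSE]
  aligned
}

train <- matrix(c(1, 0, 2, 1, 0, 3), nrow = 2,
                dimnames = list(c("d1", "d2"), c("cat", "dog", "fish")))
test <- matrix(c(4, 5), nrow = 1,
               dimnames = list("t1", c("dog", "bird")))

train_out <- align_terms(train)                  # no reference needed
test_out <- align_terms(test, original = train)  # aligned to training terms
print(test_out)
```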

match.matrix function page 187-188

In the match.matrix function, there are three closing curly brackets "}" but only two opening curly brackets "{", so the function does not work.

Since there is an if-statement without an opening {, I tried putting the missing bracket there, to no avail:

if (attr(original.matrix, "weighting")[2] == "tfidf") {
...
matrix <- fixed
}

Where does the missing bracket have to go?
