
text_mining's People

Contributors

kwartler


text_mining's Issues

glmnet intercept = FALSE

Ted, in your book you apply intercept = F within your glmnet models. You discuss all the other parameters, but you give no justification for this one. Why did you choose it in the headline clickbait case study?
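For readers unsure what the parameter does, here is a minimal sketch on made-up data (the data and object names are hypothetical, not from the book). It only shows the mechanical effect of `intercept = FALSE` in glmnet, which pins the intercept at zero; it does not explain why the book chose it.

```r
library(glmnet)

set.seed(1)
# Hypothetical toy data: 100 rows, 5 numeric predictors, binary outcome.
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
y <- rbinom(100, size = 1, prob = plogis(1 + x[, 1]))

# Same model twice, differing only in the intercept argument.
fit_with    <- glmnet(x, y, family = "binomial", intercept = TRUE)
fit_without <- glmnet(x, y, family = "binomial", intercept = FALSE)

# The first row of coef() is the intercept; s selects the penalty (lambda).
b0_with    <- coef(fit_with,    s = 0.01)["(Intercept)", 1]
b0_without <- coef(fit_without, s = 0.01)["(Intercept)", 1]

b0_with     # estimated from the data
b0_without  # fixed at zero
```

Without an intercept, the predicted probability for a row with all-zero predictors is forced to 0.5, so whether the setting is appropriate depends on the data.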

Problem with glove function. Page 176

Problem on page 176:

fit.glove <- glove(tcm = tcm, word_vectors_size = 50, x_max = 10, learning_rate = 0.2, num_iters = 15)

RStudio says:

Error in .subset2(public_bind_env, "initialize")(...) :
unused argument (grain_size = 100000)
In addition: Warning message:
'glove' is deprecated.
Use 'GloVe' instead.
See help("Deprecated")

vocab_vectorizer(p.173)

On page 173,

> vectorizer <- vocab_vectorizer(vocab,
+                                grow_dtm = FALSE,
+                                skip_grams_window = 5)

I got the following error message:

Error in vocab_vectorizer(vocab, grow_dtm = FALSE,
skip_grams_window = 5) :
unused arguments (grow_dtm = FALSE, skip_grams_window = 5)

Thanks in advance,

Chang-Kyo Suh
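Both text2vec errors above (pages 173 and 176) look like API changes between package releases. The following is a minimal sketch, assuming text2vec 0.6 or later; the toy corpus and object names are hypothetical. In those releases `vocab_vectorizer()` takes only the vocabulary, the skip-gram window goes to `create_tcm()`, and GloVe is an R6 class rather than the `glove()` function. Argument names have changed between versions, so check `?GlobalVectors` for the installed release before copying this.

```r
library(text2vec)

# Hypothetical toy corpus standing in for the book's data.
texts <- c("the quick brown fox jumps over the lazy dog",
           "the lazy dog sleeps while the quick fox runs",
           "a quick brown dog and a lazy fox")

it <- itoken(texts, tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it)

# The vectorizer is built from the vocabulary alone...
vectorizer <- vocab_vectorizer(vocab)
# ...and the skip-gram window is passed to create_tcm() instead.
tcm <- create_tcm(it, vectorizer, skip_grams_window = 5L)

# GloVe as an R6 class: construct the model, then fit it on the TCM.
glove_model <- GlobalVectors$new(rank = 5, x_max = 10)
word_vectors <- glove_model$fit_transform(tcm, n_iter = 15)

dim(word_vectors)  # one row per vocabulary term, one column per dimension
```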

LDA and tf-idf document term matrix

Dear Ted,

Question: Can we input a tf-idf document-term matrix into Latent Dirichlet Allocation (LDA)? If yes, how?

It does not work in my case; the LDA function requires a term-frequency document-term matrix.

Thank you
(I have kept the question as concise as possible. If you need more details, I can add them.)

##########################################################################
                           TF-IDF Document matrix construction
##########################################################################    

> DTM_tfidf <- DocumentTermMatrix(corpora, control = list(weighting =
+   function(x) weightTfIdf(x, normalize = FALSE)))
> str(DTM_tfidf)
List of 6
$ i       : int [1:4466] 1 1 1 1 1 1 1 1 1 1 ...
$ j       : int [1:4466] 6 10 22 26 28 36 39 41 47 48 ...
$ v       : num [1:4466] 6 2.09 1.05 3.19 2.19 ...
$ nrow    : int 64
$ ncol    : int 297
$ dimnames:List of 2
  ..$ Docs : chr [1:64] "1" "2" "3" "4" ...
  ..$ Terms: chr [1:297] "accommod" "account" "achiev" "act" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency - inverse document 
frequency" "tf-idf"

##########################################################################
                           LDA section
##########################################################################

> LDA_results <-LDA(DTM_tfidf,k, method="Gibbs", control=list(nstart=nstart,
  +                                seed = seed, best=best, 
  +                                burnin = burnin, iter = iter, thin=thin))

##########################################################################
                           Error messages
##########################################################################
  Error in LDA(DTM_tfidf, k, method = "Gibbs", control = list(nstart = 
  nstart,  : 
  The DocumentTermMatrix needs to have a term frequency weighting
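As the error says, `LDA()` from topicmodels expects raw term counts. One possible workaround, sketched here on a hypothetical toy corpus, is to fit LDA on the term-frequency matrix and use tf-idf only to decide which terms to keep. The filtering threshold is an illustrative assumption, not a recommendation from the book.

```r
library(tm)
library(topicmodels)

# Hypothetical toy corpus standing in for 'corpora'.
docs <- c("stocks fell as markets reacted to interest rates",
          "the team won the final match after extra time",
          "investors watch rates and bond markets closely",
          "the coach praised the team after the match")
corpora <- VCorpus(VectorSource(docs))

# Default weighting is term frequency, which is what LDA() requires.
DTM_tf <- DocumentTermMatrix(corpora)

# Use tf-idf only as a filter: drop terms whose total tf-idf is zero.
DTM_tfidf <- weightTfIdf(DTM_tf, normalize = FALSE)
keep_terms <- slam::col_sums(DTM_tfidf) > 0
DTM_lda <- DTM_tf[, keep_terms]

# LDA() also rejects documents that ended up with no terms.
DTM_lda <- DTM_lda[slam::row_sums(DTM_lda) > 0, ]

LDA_results <- LDA(DTM_lda, k = 2, method = "Gibbs",
                   control = list(seed = 42, burnin = 50, iter = 100))
topics(LDA_results)  # most likely topic for each document
```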

Error message for the Map function (p. 255)

When I try to run the following code from p. 255:

all.ner<-Map(function(tex,fea,id) cbind(fea,
entity=substring(tex, fea$start,fea$end),file=id),
all.emails,all.ner,temp)

I get this error message:

Error in substring(tex, fea$start, fea$end) : invalid substring arguments

I can't figure out what the problem is. Any idea?

thanks,

Marton
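One possible cause, offered as a guess because the data is not shown: `substring()` raises this error when the start or end vector is empty, which happens if an email produced no entities (a zero-row annotation table or missing columns). The sketch below uses hypothetical stand-in data and a hypothetical helper, `safe_bind`, to skip those cases.

```r
# Hypothetical stand-ins for all.emails, all.ner and temp.
texts <- list("Ted lives in Boston", "no entities here")
feats <- list(
  data.frame(start = c(1, 14), end = c(3, 19)),
  data.frame(start = integer(0), end = integer(0))  # nothing found
)
ids <- c("a.txt", "b.txt")

# Return NULL instead of calling substring() with empty positions.
safe_bind <- function(tex, fea, id) {
  if (is.null(fea) || nrow(fea) == 0) return(NULL)
  cbind(fea, entity = substring(tex, fea$start, fea$end), file = id)
}

res <- Map(safe_bind, texts, feats, ids)
res[[1]]$entity  # "Ted" "Boston"
res[[2]]         # NULL
```

The NULL entries disappear if the results are later combined with `do.call(rbind, res)`.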

Full Code not appearing in section 6.2.3, pg 196 | Corrected

In section 6.2.3, page 196, the last line of the code is not readable (the yellow-highlighted section in the original screenshot).

For anyone affected by this, the full code is as follows:

clean.test <- headline.clean(test.headlines$headline)
test.dtm <- match.matrix(clean.test, weighting = tm::weightTfIdf, original.matrix = train.dtm)

A bug in the code from page 45 to find phone number reference?

To answer how often agents refer to phone numbers, which have the pattern xxx-xxx-xxxx in the USA, the code on page 45 is:

sum(grepl('[0-9]{3})|[0-9]{4}', text.df$text))/
nrow(text.df)

There is a ) after {3}. What is this closing bracket for? Without it, the relative frequency of a phone number being referenced is 0.1445171.

Thank you for your advice.
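For comparison, here is a minimal sketch on made-up rows (the sample `text.df` is hypothetical). Without the stray parenthesis, the pattern matches any run of three digits or any run of four digits, so it also counts order numbers and the like. A pattern that spells out xxx-xxx-xxxx is stricter. Whether the stricter form is what the book intended is an assumption.

```r
# Hypothetical sample data standing in for text.df from the book.
text.df <- data.frame(
  text = c("Call us at 800-555-0199 for help",
           "Your order 1234 has shipped",
           "Thanks for reaching out!",
           "Reach me on 212-555-0147"),
  stringsAsFactors = FALSE
)

# Loose pattern: any run of 3 digits OR any run of 4 digits.
loose <- grepl("[0-9]{3}|[0-9]{4}", text.df$text)

# Stricter pattern: three digits, dash, three digits, dash, four digits.
strict <- grepl("[0-9]{3}-[0-9]{3}-[0-9]{4}", text.df$text)

mean(loose)   # 0.75: the order number also matches
mean(strict)  # 0.5: only the two phone numbers match
```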

R-Code

I bought the printed version of this interesting book. The author offers the data online, but apparently not the R code for the different chapters. Despite some effort, I couldn't find the source code. Yes, one can copy the code out of the book, but why impose that on readers?

match.matrix(p.188)

p.188(last sentence).

The textbook says "You do not need to specify the third parameter, original.matrix, since train.dtm is the original matrix." But when I try the following code:

train.dtm <- match.matrix(clean.train, weighting = tm::weightTfIdf)

I got the following error message:

Error in match.matrix(clean.train, weighting = tm::weightTfIdf) : object 'original.martix' not found
In addition: Warning message:
In weighting(x) : empty document(s): 409

Would you please try the code and advise me how to fix it? I have attached the code for match.matrix as follows:

match.matrix.txt

#------------ End of an issue

Thanks in advance

Chang-Kyo Suh

match.matrix function page 187-188

In the match.matrix function, there are three closing curly brackets "}" but only two opening curly brackets "{", so the function does not work.

Since there is an if-statement that does not have an open {, I tried putting the missing bracket there; to no avail:

if (attr(original.matrix, "weighting")[2] == "tfidf") {
...
matrix <- fixed
}

Where does the missing bracket have to go?
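One thing worth checking: in R, an `if` without braces is valid and governs only the single expression that follows it. So the brace-less `if` may be intentional, and the missing "{" may belong to an enclosing statement instead. That cannot be confirmed without the full function. The sketch below uses a hypothetical function, `weight_for`, with made-up values to show the behaviour.

```r
# Hypothetical function illustrating a brace-less if.
weight_for <- function(weighting) {
  weight <- 0
  if (weighting == "tfidf")
    weight <- 1   # only this one statement is conditional
  weight          # always evaluated, regardless of the condition
}

weight_for("tfidf")  # 1
weight_for("tf")     # 0
```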
