bhargavvader / personal

Contains Jupyter Notebooks of stuff I am working on.

Jupyter Notebook 100.00%

personal's Introduction

Personal Material

Contains personal material that I want to share: Google Summer of Code proposals, Jupyter notebooks I have contributed to in the past, and new notebooks and code snippets I have been working on.

personal's Issues

PyDays - Scripts missing python interpreter directive

My script couldn't find matplotlib until I figured out that it was missing the interpreter directive:
#!/usr/bin/python
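A minimal sketch of a script that carries the directive and reports which interpreter actually runs it, which helps confirm whether that interpreter is the one with matplotlib installed. The `#!/usr/bin/env python` form used here is an assumption, not what the issue proposed: it picks the first `python` on the PATH (for example an active virtualenv) instead of the fixed `/usr/bin/python`. The function name `interpreter_info` is hypothetical.

```python
#!/usr/bin/env python
"""Report which Python interpreter is executing this script."""
import sys


def interpreter_info():
    # sys.executable is the path of the running interpreter;
    # sys.version_info[:2] is its (major, minor) version.
    return sys.executable, sys.version_info[:2]


if __name__ == "__main__":
    exe, version = interpreter_info()
    print("Running under %s (Python %d.%d)" % (exe, version[0], version[1]))
```

If the path printed is not the environment where matplotlib was installed, the directive (or the PATH) is pointing at the wrong interpreter.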

Pydata: Stopwords not being removed, even after adding to my_stop_words

Following up on pydata Amsterdam, where we noticed that stopwords like "the" were not removed from the corpus. This seems to happen in the notebook text_analysis_tutorial_unrun as well as ..._run.

Also, the word '-PRON-' appears in the clusters, and it is unclear where it comes from.
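For context, '-PRON-' is the placeholder lemma that older spaCy releases (2.x, for example) assign to pronouns, so it shows up whenever lemmas are collected without filtering it out. One possible cause of "the" surviving, not confirmed in this thread, is a case-sensitive comparison that lets "The" through. The sketch below is plain Python and does not use the notebook's code: the `(text, lemma)` pairs, the contents of `my_stop_words`, and the helper `keep_token` are all hypothetical.

```python
def keep_token(text, lemma, stop_words):
    """Return True if the token should stay in the cleaned corpus."""
    # Drop the pronoun placeholder that older spaCy versions use as a lemma.
    if lemma == "-PRON-":
        return False
    # Compare in lower case so that "The" is treated the same as "the".
    return text.lower() not in stop_words and lemma.lower() not in stop_words


# Hypothetical custom stop words and (text, lemma) pairs for illustration.
my_stop_words = {"the", "say", "be"}
tokens = [
    ("The", "the"),
    ("judges", "judge"),
    ("said", "say"),
    ("they", "-PRON-"),
    ("agreed", "agree"),
]

cleaned = [lemma for text, lemma in tokens if keep_token(text, lemma, my_stop_words)]
print(cleaned)  # ['judge', 'agree']
```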

PyDays - use unidecode for python2.7

This suggestion was made by someone in the room; I am posting it here so it doesn't get lost.

https://pypi.python.org/pypi/Unidecode

So instead of
doc = nlp(clean(text))
you can use
doc = nlp(unidecode(text))

This should preserve the original text as closely as possible.
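A small sketch of the transliteration step on its own, without the spaCy call. It uses `unidecode` from the Unidecode package linked above when it is installed; the fallback branch is an assumption added here so the snippet runs without the package, and it only strips accents from Latin characters, which is much less than Unidecode does. The wrapper name `to_ascii` is hypothetical.

```python
import unicodedata

try:
    from unidecode import unidecode
except ImportError:
    def unidecode(text):
        """Rough stand-in: decompose accented characters and drop the accents."""
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def to_ascii(text):
    # Transliterate to plain ASCII so the text can be passed on safely
    # to code that struggles with non-ASCII input.
    return unidecode(text)


print(to_ascii(u"caf\u00e9"))  # cafe
```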

How to do Bigram and Trigram topic modeling using gensim ?

Hi Bhargav,

That was an informative notebook about topic modeling and spaCy.

I have a question about how to do bigram and trigram topic modeling.

texts = metadata['cleandata']
bigram = gensim.models.Phrases(texts)

For example, this gives LDA output of: India, car, license, india, visit, visa

I want output like: India car license, Visit visa, indian hotel

This code gives bigrams using TF-IDF:

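def display_topics(model, feature_names, no_top_words):
    # Print the highest-weighted terms of each topic.
    for topic_idx, topic in enumerate(model.components_):
        print("Topic:", topic_idx)
        # argsort() is ascending, so walk it backwards to get the top terms.
        print(" ".join(feature_names[i]
                       for i in topic.argsort()[:-no_top_words - 1:-1]))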


from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def tfidf_vectorizer(documents, total_features):
    # TF-IDF vectorizer restricted to bigrams; ngram_range is documented as a tuple.
    vectorizer = TfidfVectorizer(max_features=total_features, stop_words='english', ngram_range=(2, 2))
    tfidf = vectorizer.fit_transform(documents)
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
    tfidf_feature_names = vectorizer.get_feature_names_out()
    return vectorizer, tfidf, tfidf_feature_names


def count_vectorizer(documents, total_features):
    # Plain term-count vectorizer (unigrams only).
    vectorizer = CountVectorizer(max_features=total_features, stop_words='english')
    tf = vectorizer.fit_transform(documents)
    tf_feature_names = vectorizer.get_feature_names_out()
    return vectorizer, tf, tf_feature_names

My question is: how do I build bigrams and trigrams in gensim?

Thanks in advance

PyDays - spaCy outputting rubbish under python 3.5

Solution: download the English dictionary:
python -m spacy.en.download all

topic modeling tutorial not showing notebook beyond output of Ln {22}

Topic Modelling with scikit-learn
Let us now use NMF and LDA, which are available in sklearn, to see how these topics work.

In [20]:
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF, LatentDirichletAllocation
In [21]:
dataset = fetch_20newsgroups(shuffle=True, random_state=1, remove=('headers', 'footers', 'quotes'))
documents = dataset.data
In [22]:
documents
Out[22]:
[u"Well i'm not sure about the story nad it did seem biased. What\nI disagr
