
wordcloudsimpleexample

Installing libraries

!pip install wordcloud
Requirement already satisfied: wordcloud in c:\programdata\anaconda3\lib\site-packages (1.8.2.2)
Requirement already satisfied: matplotlib in c:\programdata\anaconda3\lib\site-packages (from wordcloud) (3.5.1)
Requirement already satisfied: numpy>=1.6.1 in c:\programdata\anaconda3\lib\site-packages (from wordcloud) (1.21.5)
Requirement already satisfied: pillow in c:\programdata\anaconda3\lib\site-packages (from wordcloud) (9.0.1)
Requirement already satisfied: packaging>=20.0 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (21.3)
Requirement already satisfied: python-dateutil>=2.7 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (1.3.2)
Requirement already satisfied: pyparsing>=2.2.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (3.0.4)
Requirement already satisfied: fonttools>=4.22.0 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (4.25.0)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib->wordcloud) (0.11.0)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.7->matplotlib->wordcloud) (1.16.0)

Importing libraries

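import numpy as np   # numerical arrays (used below for the image mask)
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import nltk
nltk.download('punkt')      # tokenizer models used by nltk.word_tokenize (recent NLTK releases may also need 'punkt_tab')
from nltk import wordpunct_tokenize
nltk.download('stopwords')  # stopword lists, including English
from nltk.corpus import stopwords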
[nltk_data] Downloading package punkt to C:\Users\Jonathan
[nltk_data]     Oliva\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Jonathan
[nltk_data]     Oliva\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
import os
print(os.listdir())
['.git', '.ipynb_checkpoints', 'dataset', 'LICENSE', 'README.md', 'wordcloud1.jpg', 'wordcloud_demostration.ipynb']
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image

Loading data

mydata = "dataset"
print(os.listdir(mydata))
['dnd-spells.csv', 'dnd.png']
# Load the spells table; os.path.join builds the path portably
df_dataset = pd.read_csv(os.path.join(mydata, "dnd-spells.csv"))
df_dataset.head()  # first 5 rows
  | name | classes | level | school | cast_time | range | duration | verbal | somatic | material | material_cost | description
0 | Acid Splash | Artificer, Sorcerer, Wizard | 0 | Conjuration | 1 Action | 60 Feet | Instantaneous | 1 | 1 | 0 | NaN | You hurl a bubble of acid. Choose one creature...
1 | Blade Ward | Bard, Sorcerer, Warlock, Wizard | 0 | Abjuration | 1 Action | Self | 1 round | 1 | 1 | 0 | NaN | You extend your hand and trace a sigil of ward...
2 | Booming Blade | Artificer, Sorcerer, Warlock, Wizard | 0 | Evocation | 1 Action | Self (5-foot radius) | 1 round | 0 | 1 | 1 | a melee weapon worth at least 1 sp | You brandish the weapon used in the spell’s ca...
3 | Chill Touch | Sorcerer, Warlock, Wizard | 0 | Necromancy | 1 Action | 120 Feet | 1 round | 1 | 1 | 0 | NaN | You create a ghostly, skeletal hand in the spa...
4 | Control Flames | Druid, Sorcerer, Wizard | 0 | Transmutation | 1 Action | 60 Feet | Instantaneous or 1 hour | 0 | 1 | 0 | NaN | You choose nonmagical flame that you can see w...

Preprocessing Data

Check whether any column contains null values (rows with a null description would have to be eliminated)

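def calcData(column):
    # Count the True entries (i.e. the nulls) in one column of the isnull() mask
    return sum(1 for element in column if element)

# apply() calls calcData once per column; df_dataset.isnull().sum() gives the same result
df_dataset.isnull().apply(calcData)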
name               0
classes            0
level              0
school             0
cast_time          0
range              0
duration           0
verbal             0
somatic            0
material           0
material_cost    264
description        0
dtype: int64
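Only material_cost has missing values. In the preview above, its NaN entries sit on spells whose material flag is 0, so they look expected rather than erroneous, and description has no nulls to remove. If description did contain nulls, a minimal sketch of eliminating those rows with pandas' dropna might look like the following. The toy DataFrame and its values are hypothetical stand-ins for df_dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for df_dataset
df = pd.DataFrame({
    "name": ["Spell A", "Spell B", "Spell C"],
    "description": ["You hurl a bubble of acid.", None, "You extend your hand."],
})

# Drop only the rows whose description is missing; nulls in other columns are kept
df_clean = df.dropna(subset=["description"]).reset_index(drop=True)
print(df_clean["name"].tolist())  # ['Spell A', 'Spell C']
```

Restricting dropna to the description column matters here: dropping on all columns would also discard every spell with a null material_cost.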

Check for duplicate descriptions (any duplicates would have to be eliminated)

sum(df_dataset.duplicated('description'))
0
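The duplicate count is 0, so nothing needs to be dropped for this dataset. Where the count is non-zero, one way to remove repeated descriptions is pandas' drop_duplicates. This sketch uses a hypothetical toy DataFrame rather than the spells data, and uses the pandas-native .sum() in place of the built-in sum() shown above.

```python
import pandas as pd

# Hypothetical toy data: two rows share the same description text
df = pd.DataFrame({
    "name": ["Spell A", "Spell A (copy)", "Spell B"],
    "description": ["Same text.", "Same text.", "Other text."],
})

print(df.duplicated("description").sum())  # 1

# Keep the first occurrence of each description and drop later repeats
df_unique = df.drop_duplicates(subset="description", keep="first").reset_index(drop=True)
print(df_unique["name"].tolist())  # ['Spell A', 'Spell B']
```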

Creating the unigram dataset

stopwords_english = set(stopwords.words('english'))
len(stopwords_english)
179
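def createCloudUnigrams(x):
    # Split the description into word and punctuation tokens
    y = nltk.word_tokenize(x)
    # Keep alphanumeric tokens that are not English stopwords (the comparison is case-sensitive)
    return [word for word in y if word not in stopwords_english and word.isalnum()]

# Build one unigram list per spell description
df_dataset['unigrams'] = df_dataset['description'].apply(createCloudUnigrams)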
df_dataset['unigrams'][:5]
0    [You, hurl, bubble, acid, Choose, one, creatur...
1    [You, extend, hand, trace, sigil, warding, air...
2    [You, brandish, weapon, used, spell, casting, ...
3    [You, create, ghostly, skeletal, hand, space, ...
4    [You, choose, nonmagical, flame, see, within, ...
Name: unigrams, dtype: object
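Because the stopword test is case-sensitive and NLTK's English list is lowercase, capitalised tokens such as 'You' survive the filter, and 'Choose' and 'choose' end up as separate terms in the frequency counts. A minimal sketch that lowercases tokens before filtering follows. To keep it self-contained, it assumes a regular-expression split as a rough stand-in for nltk.word_tokenize and a small hypothetical stopword set instead of stopwords.words('english'). The function name tokenize_description is likewise hypothetical.

```python
import re

# Small hypothetical stopword set standing in for stopwords.words('english')
STOPWORDS = {"you", "a", "of", "the", "one", "within"}

def tokenize_description(text, lowercase=True):
    # Rough stand-in for nltk.word_tokenize: runs of word characters or single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if lowercase:
        # Normalise case so 'You'/'you' and 'Choose'/'choose' collapse together
        tokens = [token.lower() for token in tokens]
    # Same filter as createCloudUnigrams: drop stopwords and non-alphanumeric tokens
    return [token for token in tokens if token not in STOPWORDS and token.isalnum()]

print(tokenize_description("You hurl a bubble of acid.", lowercase=False))  # ['You', 'hurl', 'bubble', 'acid']
print(tokenize_description("You hurl a bubble of acid."))                   # ['hurl', 'bubble', 'acid']
```

In the notebook itself, the equivalent change would be to lowercase each token inside createCloudUnigrams before the stopword check.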
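unigrams = df_dataset['unigrams']
cloud_set = {}

# Count how often each term appears across all descriptions
for terms in unigrams:
    for term in terms:
        cloud_set[term] = cloud_set.get(term, 0) + 1
# Show the first 10 entries (insertion order, not the most frequent)
dict(list(cloud_set.items())[:10])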
{'You': 660,
 'hurl': 9,
 'bubble': 1,
 'acid': 37,
 'Choose': 56,
 'one': 379,
 'creature': 1298,
 'within': 603,
 'range': 359,
 'choose': 187}
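The counting loop is a hand-written frequency table. An equivalent sketch using the standard library's collections.Counter follows. Because Counter is a dict subclass, the result can be passed wherever a dictionary of term frequencies is expected. Slicing cloud_set.items() returns the first ten terms in insertion order, whereas Counter.most_common returns the most frequent ones. The token lists below are hypothetical, and term_frequencies is an illustrative helper name.

```python
from collections import Counter
from itertools import chain

def term_frequencies(token_lists):
    # Flatten the per-description token lists and count every term
    return Counter(chain.from_iterable(token_lists))

# Hypothetical token lists standing in for df_dataset['unigrams']
freqs = term_frequencies([["creature", "acid"], ["creature", "range"], ["creature"]])
print(freqs["creature"])     # 3
print(freqs.most_common(1))  # [('creature', 3)]
```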

Generating Wordcloud

# Load the mask image; WordCloud draws words only where the mask is not pure white
dnd_mask = np.array(Image.open(os.path.join(mydata, "dnd.png")))
wc = WordCloud(background_color="white", mask=dnd_mask, contour_width=1)
# Size each word by its count in cloud_set
wc.generate_from_frequencies(cloud_set)
plt.figure(figsize=(20, 20))
plt.imshow(wc)
plt.axis("off")
plt.show()

[Figure: the generated word cloud]


Contributors

jonaoliv
