
sharmaroshan / text-clustering



License: GNU General Public License v3.0

Jupyter Notebook 100.00%
data-analysis data-visualization data-mining cluster-analysis text-clustering pattern-recognition kmeans kmeans-clustering zipf elbow


Text-Clustering

It is a rather different task, as here I am going to cluster 200 texts related to games and sports into two or more clusters. We can also use a Zipf plot to help determine how many useful clusters can be formed.
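A Zipf plot puts each word's frequency rank on the x-axis and its frequency on the y-axis, both on log scales; for natural-language text the points fall roughly on a straight line. The sketch below builds that rank-frequency table with the Python standard library, using three made-up sentences in place of the repository's 200 texts. Plotting `log_rank` against `log_freq` with any plotting library gives the plot itself. How the notebook reads a cluster count off the plot is not described here, so this covers only the data step, and `zipf_table` is a hypothetical helper name.

```python
import math
import re
from collections import Counter


def zipf_table(texts):
    """Return (rank, word, freq, log_rank, log_freq) rows, most frequent word first."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters as tokens.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, math.log10(rank), math.log10(freq)))
    return rows


# Made-up stand-ins for the games/sports texts.
texts = [
    "the match was played in the rain and the team won the match",
    "the new game has a great story and the game play is smooth",
    "the team trained hard before the final match",
]

for rank, word, freq, log_rank, log_freq in zipf_table(texts)[:5]:
    print(f"{rank:>2} {word:<6} {freq:>2}  log_rank={log_rank:.2f}  log_freq={log_freq:.2f}")
```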

What is Text Clustering?

Automatic document organization, topic extraction, information retrieval and filtering all have one thing in common: they rely on text clustering (sometimes also known as document clustering) being done quickly and accurately.

If you’ve never heard of text clustering, this post will explain what it is, what it does, and how it’s currently being used to help businesses. We’ll also briefly discuss how a business could employ text clustering itself!

First, let’s define text clustering. Text clustering is the application of cluster analysis to text-based documents. It uses machine learning and natural language processing (NLP) to understand and categorize unstructured, textual data.

How Does It Work?

Typically, descriptors (sets of words that describe the topic matter) are extracted from the document first. Then they are analyzed for how frequently they appear in the document compared with other terms. After that, clusters of descriptors can be identified and then auto-tagged.
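The four steps above can be sketched end to end in plain Python: extract descriptors, turn each document into a normalized term-frequency vector, group the vectors with k-means (one of the algorithms listed in this repository's topics), and auto-tag each cluster with its most frequent descriptors. This is a minimal sketch under stated assumptions, not the notebook's implementation: the documents, the stop-word list, `k=2`, the farthest-first initialization, and all helper names are hypothetical, and a real project would more likely use a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "with", "for", "to", "is"}


def descriptors(text):
    """Step 1: extract descriptors (lowercased words that are not stop words)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def frequency_vectors(token_lists):
    """Step 2: one term-frequency vector per document, scaled to unit length."""
    vocab = sorted({w for tokens in token_lists for w in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [counts[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Step 3: plain k-means with deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def auto_tag(token_lists, labels, k, top_n=2):
    """Step 4: tag each cluster with its most frequent descriptors."""
    tags = {}
    for j in range(k):
        counts = Counter(w for tokens, lab in zip(token_lists, labels) if lab == j for w in tokens)
        tags[j] = [w for w, _ in counts.most_common(top_n)]
    return tags


# Made-up sports and gaming texts.
docs = [
    "Football match with a late goal",
    "The team scored a goal in the football final",
    "A video game console and controller",
    "A great video game with a wireless gamepad",
]
tokens = [descriptors(d) for d in docs]
vocab, vectors = frequency_vectors(tokens)
labels = kmeans(vectors, k=2)
tags = auto_tag(tokens, labels, k=2)
print(labels, tags)
```

Because the two sports texts share no descriptors with the two gaming texts, k-means separates them cleanly here; real texts overlap far more, which is why choosing the number of clusters (for example with an elbow or Zipf plot) matters.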

From there, the information can be used in any number of ways. Google’s search engine is probably the best and most widely known example. When you search for a term on Google, it pulls up pages that apply to that term, but have you ever wondered how Google can analyze billions of web pages to deliver an accurate and fast result?

Techniques like text clustering are a large part of the answer. Google’s algorithm breaks down unstructured data from web pages and turns it into a matrix model, tagging pages with keywords that are then used in search results!

Example

To help you understand the process, it’s best to visualize an example:

Let’s simulate how text clustering would analyze (and tag) this sentence.

First, the sentence is lowercased, the contraction “Let’s” is expanded to “let us”, and all punctuation is removed:

let us simulate how text clustering would analyze and tag this sentence

Then, all but the sentence’s descriptors are removed:

simulate how text clustering analyze tag sentence
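These two steps can be reproduced in a few lines of Python. The stop-word list here is hypothetical and hand-picked so that the output matches the example exactly; real pipelines use a larger standard list (which would usually also drop words such as “how”), and only the single contraction in the example is expanded.

```python
import re

# Hypothetical stop words, chosen to reproduce the example above.
STOP_WORDS = {"let", "us", "would", "and", "this"}


def strip_punctuation(sentence):
    text = sentence.lower().replace("\u2019", "'")  # normalize curly apostrophes
    text = re.sub(r"\blet's\b", "let us", text)     # expand the one contraction in the example
    text = re.sub(r"[^\w\s]", "", text)             # drop punctuation
    return " ".join(text.split())                   # collapse repeated whitespace


def keep_descriptors(text):
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


sentence = "Let’s simulate how text clustering would analyze (and tag) this sentence."
step1 = strip_punctuation(sentence)
step2 = keep_descriptors(step1)
print(step1)  # let us simulate how text clustering would analyze and tag this sentence
print(step2)  # simulate how text clustering analyze tag sentence
```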

At this point, the process becomes harder to visualize, because the computer assigns each word a weighted value that is then used for tagging.
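The article does not say which weighting scheme is used; TF-IDF (term frequency times inverse document frequency) is one common choice and is assumed here. The sketch weights the descriptors from the example sentence against a tiny made-up corpus, with `tf = count / document length` and `idf = ln(N / df)`; the corpus and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Weight each term of doc_tokens by tf * idf, with
    tf = count / len(doc_tokens) and idf = ln(N / df) over the corpus."""
    n_docs = len(corpus)
    weights = {}
    for term, count in Counter(doc_tokens).items():
        # doc_tokens is itself part of the corpus, so df is always >= 1.
        df = sum(1 for doc in corpus if term in doc)
        weights[term] = (count / len(doc_tokens)) * math.log(n_docs / df)
    return weights


# The descriptors from the example, plus three made-up documents.
doc = "simulate how text clustering analyze tag sentence".split()
corpus = [
    doc,
    "how text search works".split(),
    "text clustering groups similar documents".split(),
    "how to tag a photo".split(),
]

weights = tf_idf(doc, corpus)
for term, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term:<10} {weight:.3f}")
```

Words that appear in many documents, such as “how” and “text” here, get low weights, while words found only in this sentence, such as “simulate”, get the highest.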

Business use cases

Perhaps one of the best parts of text clustering is its ability to be used in a wide variety of business settings. Text clustering can be used anywhere from product development to customer support. Let’s take a look at a few examples in which a business could employ text clustering.

  1. Creating a product roadmap

Your customers and target audience are talking all over the web about the products and features they want, but, traditionally, it’s been difficult to aggregate all that data and turn it into an actionable report. It’s hard to know how many people really want a feature based on a handful of reviews and forum posts.

But with text clustering, all of the reviews from your customers and target audience can be analyzed and used to create a roadmap of features and products they’ll love!

You can even analyze competitor reviews to find potential deal breakers!

  2. Identifying recurring support issues

Your customer support team gets asked the same questions day in and day out, but it’s hard to truly analyze the pain points your customers run into when adopting products and to address them correctly. Text clustering will not only show you how frequent (or infrequent) an issue is, but may also help identify the root of the issue through additional tags.

  3. Creating better marketing copy

Another use case for text clustering is your marketing copy. Depending on your organization, you may have run thousands of different ads and collected plenty of data on them, but understanding how an ad’s language affected its performance can be tough.

It’s difficult to spot trends in unstructured data such as marketing copy, which is where text clustering comes into play. It can break down the topics and words associated with the highest conversion rates, enabling you to create highly relevant, high-converting web copy.

About the Author

Derek Gerber is Director of Marketing at ActivePDF. Derek represents ActivePDF’s technologies, services, and solutions on-site and in the cloud. After leaving CNN in 2011, and helping sell Tallega in 2015, Derek joined ABBYY to coordinate international lead generation and business development campaigns. He was then recruited by ActivePDF to take control of marketing and drive the company’s vision through online marketing, strategic corporate sponsorships, and targeted events. Derek has been responsible for the analysis of customer research, current market conditions, sales enablement, and researching competitor information. Derek earned his B.S. in Business Economics from UC Irvine and is certified in many fields.
