Media

How many emotions do humans have?

What do you think?

Dimensionality Reduction and Clustering

***

In the previous post, we looked at how to calculate the similarity of document pairs using the ‘Cosine Similarity’ algorithm and found which pairs of documents are more similar than others.
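
As a quick refresher, here is a minimal Python sketch of cosine similarity between document pairs. It is not the code from the previous post: the three toy documents are made up, and for brevity it uses raw word counts instead of TF-IDF weights.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as term -> weight."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy documents, for illustration only.
docs = {
    "doc1": "the election results surprised the party",
    "doc2": "the party won the election",
    "doc3": "the team won the match",
}

# Raw word counts stand in for the TF-IDF weights (an assumption of this sketch).
vectors = {name: Counter(text.split()) for name, text in docs.items()}

# One similarity score per pair of documents.
similarities = {
    (a, b): cosine_similarity(vectors[a], vectors[b])
    for a, b in combinations(docs, 2)
}

for pair, score in sorted(similarities.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

With these toy documents, the two election-related texts come out as the most similar pair.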


Now, let’s say we want to categorize or classify the documents (e.g. Politics, Sport, etc.) based on the characteristics they share.


Instead of looking at the ‘Cosine Similarity’ value for each pair of documents, we can use a clustering algorithm like ‘K-means’ to automatically group the documents based on their TF-IDF values.
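
To make the idea concrete, here is a rough sketch of K-means on TF-IDF vectors in plain Python. It is not how Exploratory does it. The four toy documents, the TF-IDF variant (term frequency times log(N / document frequency)) and the deterministic farthest-point initialization are all assumptions I picked to keep the example small and repeatable.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, dense TF-IDF vectors); assumes no document is empty."""
    vocab = sorted({term for doc in tokenized_docs for term in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Basic K-means with a deterministic farthest-point start."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector farthest from all centroids chosen so far.
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents: two about politics, two about sport.
docs = [
    "election vote party government",
    "party election campaign vote",
    "football match goal team",
    "team goal league match",
]

vocab, vectors = tfidf_vectors([doc.split() for doc in docs])
labels = kmeans(vectors, k=2)

for doc, label in zip(docs, labels):
    print(label, doc)
```

On this tiny data set the two politics documents end up in one cluster and the two sport documents in the other. Note that every distinct word becomes its own dimension, so even four short documents already produce ten-dimensional vectors.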


In this post, I’m going to talk about the challenges of building a clustering model for this type of document data and how we can overcome them, then walk you through how to build the clustering model in Exploratory.


*** 

https://blog.exploratory.io/demystifying-text-analytics-part-4-dimensionality-reduction-and-clustering-in-r-cbc8c90e0b14
