Text Documents Clustering using K-Means Algorithm

***

Introduction

Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data.

A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. 

A cluster is therefore a collection of objects which are coherent internally, but clearly dissimilar to the objects belonging to other clusters.

Consider, for example, a set of points in the plane that fall into three visibly separate groups. We can easily identify the 3 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance (in this case, geometric distance).

This is called distance-based clustering, and it is the kind of clustering I’m going to deal with here.
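To make the idea of distance-based clustering concrete, here is a minimal K-Means sketch in Python. It is only an illustration under simple assumptions (2-D points, Euclidean distance, hand-picked starting centroids), not the implementation from the linked article; the names `euclidean` and `kmeans` and the sample points are made up for this example.

```python
import math

def euclidean(a, b):
    """Geometric (Euclidean) distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, max_iter=100):
    """Group points around the nearest centroid until the centroids stop moving.

    `centroids` holds the starting guesses (an assumption of this sketch;
    real implementations often pick them at random).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical sample data: three visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 1.5),
          (8, 8), (9, 8.5), (8.5, 9),
          (1, 9), (1.5, 8.5), (2, 9)]
labels, centroids = kmeans(points, centroids=[(1, 1), (8, 8), (1, 9)])
print(labels)
```

Points that are “close” to each other end up with the same label. For text documents the same loop applies once each document is turned into a numeric vector, although a different similarity measure is often preferred there.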

Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if the cluster defines a concept common to all of those objects.

In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.

*** 

https://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
