Mahmood: Text Documents Clustering using K-Means Algorithm

***

Introduction

Clustering can be considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data.

A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”.

A cluster is therefore a collection of objects that are similar to one another but clearly dissimilar to the objects belonging to other clusters.

Consider a set of points in a plane that fall into three visibly separate groups. In this case we easily identify the 3 clusters into which the data can be divided. The similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance (in this case geometrical distance).

This is called distance-based clustering, and it is the kind of clustering I’m going to deal with here.
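
To make the distance criterion concrete, here is a minimal sketch of K-means, the algorithm named in the title, written in plain Python. It assumes 2-D points and Euclidean distance. The function names (`euclidean`, `assign`, `update`, `kmeans`), the sample points, and the choice of initial centers are all made up for illustration; this is not the implementation from the linked article.

```python
import math

def euclidean(a, b):
    """Geometric (Euclidean) distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Return, for each point, the index of its closest center."""
    return [min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of the points currently assigned to it."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            # An empty cluster keeps its previous center (an assumption of this sketch).
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Hypothetical sample data: three visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
          (1.0, 9.0), (1.5, 8.5), (2.0, 9.5)]
initial = [points[0], points[3], points[6]]  # one starting center per group

labels, centers = kmeans(points, initial)
print(labels)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each point ends up with the label of the center it is closest to, which is exactly the “close according to a given distance” rule. For text documents the points would be document vectors and the distance would typically be replaced by a similarity measure suited to them, but the loop stays the same.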

Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if the cluster defines a concept common to all those objects.

In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.

***

https://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
