***
Introduction
Clustering is often considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data.
A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”.
A cluster is therefore a collection of objects which are coherent internally, but clearly dissimilar to the objects belonging to other clusters.
For example, when the data are points that fall into three well-separated groups, we can easily identify the 3 clusters into which the data can be divided. The similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance (in this case geometrical distance).
This is called distance-based clustering, and it is the kind of clustering I’m going to deal with here.
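To make the idea of "close according to a given distance" concrete, here is a minimal Python sketch. It is only an illustration, not the K-Means algorithm itself: the sample points, the `threshold` value, and the function name `cluster_by_distance` are all assumptions made up for this example. It puts two points in the same cluster when they are within `threshold` of each other, directly or through a chain of nearby points.

```python
import math


def cluster_by_distance(points, threshold):
    """Label points so that points within `threshold` of each other
    (directly or through a chain of neighbours) share a cluster label."""
    labels = [-1] * len(points)  # -1 means "not assigned yet"
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        # Start a new cluster and grow it from point i.
        labels[i] = current
        stack = [i]
        while stack:
            p = stack.pop()
            for q in range(len(points)):
                # Geometrical (Euclidean) distance is the similarity criterion.
                if labels[q] == -1 and math.dist(points[p], points[q]) <= threshold:
                    labels[q] = current
                    stack.append(q)
        current += 1
    return labels


# Hypothetical 2-D data: three well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9),
          (9.0, 1.0), (9.2, 1.1)]
labels = cluster_by_distance(points, threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
```

With this made-up data the three groups come out as three clusters. The result depends heavily on the chosen threshold, which is one reason practical algorithms such as K-Means take a different route and ask for the number of clusters instead.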
Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if that cluster defines a concept common to all of those objects.
In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.
***
https://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm