Mahmood: Text Recognition with k-means Clustering

***

Abstract:

A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations.

This paper proposes an innovative approach to improving classification performance on Persian texts by using a very large thesaurus.

The proposed method is a flexible way to recognize and categorize Persian texts, and it uses the thesaurus as a source of helpful knowledge.

On the corpus, the method obtains a more representative set of word frequencies when it uses the thesaurus than when the thesaurus is disabled.
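The following is a minimal sketch of how a thesaurus can make word frequencies more representative. It assumes the thesaurus is reduced to a hypothetical mapping from each word to a canonical head word for its synonym group. The English entries are placeholders, not the paper's Persian thesaurus, and the paper's exact merging rule may differ. Counting head words instead of raw tokens pools the frequencies of synonyms into a single feature.

```python
from collections import Counter

# Hypothetical thesaurus: each word maps to the head word of its synonym group.
# These entries are illustrative placeholders, not taken from the paper.
THESAURUS = {
    "car": "car",
    "automobile": "car",
    "vehicle": "car",
    "big": "big",
    "large": "big",
    "huge": "big",
}


def term_frequencies(tokens, thesaurus=None):
    """Count token frequencies, optionally merging synonyms via a thesaurus."""
    if thesaurus is None:
        # Thesaurus disabled: every surface form is its own feature.
        return Counter(tokens)
    # Thesaurus enabled: replace each token by its group head word.
    # Words missing from the thesaurus are counted as-is.
    return Counter(thesaurus.get(tok, tok) for tok in tokens)


tokens = "the large car and the big automobile".split()
print(term_frequencies(tokens))              # synonyms counted separately
print(term_frequencies(tokens, THESAURUS))   # synonyms pooled
```

Here, with the thesaurus enabled, "large"/"big" and "car"/"automobile" each collapse into one feature with count 2. Without it, the four words remain separate features with count 1.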

The thesaurus we use considers two types of word relationships.

This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval.

The k-nearest neighbor classifier, the decision tree classifier and the k-means clustering algorithm are applied to the frequency-based features.
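To make the last step concrete, here is a minimal pure-Python sketch. It assumes each document is already a vector of (thesaurus-aware) word frequencies over a shared vocabulary. It shows standard Lloyd's k-means and a plain majority-vote k-nearest-neighbor classifier, both using squared Euclidean distance. The distance measure, the seeding (first k documents as initial centroids), the toy vocabulary and all function names are assumptions for illustration, not the paper's settings. The decision tree classifier is omitted.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def k_means(points, k, max_iter=100):
    """Lloyd's k-means. Seeds with the first k points (a simplifying
    assumption; random or k-means++ seeding is more common)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels, centroids


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy frequency vectors over a hypothetical vocabulary: car, big, sport, team.
docs = [[3, 1, 0, 0], [0, 0, 2, 3], [2, 2, 0, 0], [0, 1, 3, 2]]
labels, centroids = k_means(docs, k=2)
print(labels)  # → [0, 1, 0, 1]
print(knn_predict(docs, ["cars", "sports", "cars", "sports"], [1, 2, 0, 0], k=3))  # → cars
```

In this toy run, k-means separates the two vehicle-heavy documents from the two sports-heavy ones. The k-NN query is labeled "cars" by two of its three nearest neighbors. Because both methods only compare frequency vectors, merging synonyms upstream directly changes which documents look similar.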

Experimental results indicate that enabling the thesaurus significantly improves the method's performance in both text classification and clustering.

***

https://www.semanticscholar.org/paper/f93df05d8a814faa5d3004775ee2b6969e053d01

https://rcs.cic.ipn.mx/2014_84/Text%20Recognition%20with%20k-means%20Clustering.pdf
