***
Abstract:
The growth of the Internet has produced a high volume of natural-language text data.
Such data can be sparse and may contain uninformative features, which increase the dimensionality of the data.
This high dimensionality, in turn, decreases the efficiency of text mining tasks such as clustering.
Transforming high-dimensional data into a lower-dimensional representation is therefore an important pre-processing step before clustering.
In this paper, a dimensionality reduction method based on a deep autoencoder neural network, named DRDAE, is proposed to provide optimized and robust features for text clustering.
DRDAE selects a salient, less correlated feature space from the high-dimensional feature space.
To evaluate the proposed algorithm, k-means is used to cluster text documents.
The proposed method is tested on five benchmark text datasets.
Simulation results demonstrate that the proposed algorithm clearly outperforms other conventional dimensionality reduction methods in the literature in terms of the RI measure.
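
The general idea of autoencoder-based dimensionality reduction can be illustrated with a minimal sketch. This is not the DRDAE architecture, which the abstract does not specify. It assumes a single tanh hidden layer, a linear decoder, and plain gradient descent on the mean squared reconstruction error. The function name `train_autoencoder` and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder (illustrative, not DRDAE).

    Returns (encode, losses). `encode` maps rows with the same number of
    columns as X to `n_hidden` features; `losses` holds the mean squared
    reconstruction error measured at the start of each epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_features))  # decoder weights
    b2 = np.zeros(n_features)
    losses = []

    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # low-dimensional code
        X_hat = H @ W2 + b2        # linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backpropagate the mean squared error through both layers.
        grad_out = 2.0 * err / err.size
        grad_W2 = H.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W2.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W1 = X.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)

        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses
```

In a pipeline like the one the abstract describes, `X` would be a document-term matrix (for example TF-IDF weights), and the rows of `encode(X)` would be passed to k-means in place of the original high-dimensional vectors.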
***
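
The evaluation measure can also be sketched briefly. Assuming that "RI" in the abstract refers to the Rand index, the score is the fraction of document pairs on which the predicted clustering and the reference labels agree, meaning both place the pair in the same group or both place it in different groups. The function name `rand_index` is a hypothetical helper, not code from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the Rand index of two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("at least two items are required")
    # A pair counts as an agreement when both labelings put it in the
    # same cluster, or both put it in different clusters.
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

Because only pair co-membership is compared, the score does not depend on how the clusters are numbered: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 even though the label values are swapped.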