***
Abstract:
One of the challenging tasks in text classification is reducing the dimensionality of the feature space.
This paper discusses an enhanced text classification method that uses the Bag-of-Words representation model with term frequency-inverse document frequency (tf-idf) weighting and the word embedding technique GloVe to find words with similar semantic meaning.
Among a group of words with similar meanings, the word with the highest sum of tf-idf scores is selected as the most representative word.
The performance of the proposed method is compared with that of other dimension reduction methods, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Latent Semantic Indexing (LSI), and a hybrid PCA+LDA approach, using the Naïve Bayes classifier.
Experimental results on three datasets, namely the BBC, Classic4, and 20 Newsgroups datasets, show that the proposed algorithm gives better classification results than existing dimension reduction techniques.
Lastly, we define a new performance evaluation metric to assess the classifier's performance on the reduced feature set.
***
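The abstract only describes the reduction step at a high level, so the following Python sketch illustrates one plausible reading of it: GloVe embeddings group semantically similar vocabulary terms, and within each group only the term with the highest summed tf-idf is kept. The greedy cosine-similarity grouping, the 0.8 similarity threshold, the GloVe text-file loader, and the function names are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of tf-idf + GloVe feature reduction (not the paper's code).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def load_glove(path):
    """Load GloVe vectors from a plain-text file (word dim1 dim2 ...)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def reduce_features(docs, glove, sim_threshold=0.8):
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)                      # documents x terms tf-idf matrix
    terms = vec.get_feature_names_out()
    tfidf_sums = np.asarray(X.sum(axis=0)).ravel()   # summed tf-idf per term

    # Normalize GloVe vectors so a dot product gives cosine similarity.
    emb = {t: glove[t] / np.linalg.norm(glove[t]) for t in terms if t in glove}

    # Greedy grouping: visit terms in decreasing tf-idf order, keep the first
    # term of each similarity group (i.e. the one with the highest tf-idf sum)
    # and drop the terms that are highly similar to it.
    kept, dropped = [], set()
    order = np.argsort(-tfidf_sums)
    for i in order:
        t = terms[i]
        if t in dropped:
            continue
        kept.append(i)
        if t not in emb:
            continue
        for j in order:
            u = terms[j]
            if u == t or u in dropped or u not in emb:
                continue
            if float(emb[t] @ emb[u]) >= sim_threshold:
                dropped.add(u)                       # t represents u
    return X[:, kept], [terms[i] for i in kept]
```

The reduced matrix returned by `reduce_features` could then be fed to, for example, scikit-learn's `MultinomialNB` to run the kind of Naïve Bayes comparison the abstract mentions; the new evaluation metric it refers to is not specified here, so it is not reproduced in the sketch.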
https://www.sciencedirect.com/science/article/pii/S2667096822000052