***
Abstract:
Text mining refers generally to the process of extracting interesting and non-trivial patterns or knowledge from unstructured text data. It is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. Standard text mining and information retrieval techniques for text documents usually rely on word matching. An alternative approach to information retrieval is clustering. In clustering, document pre-processing is a critical step that has a large impact on how successfully knowledge can be extracted.
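The abstract does not say which pre-processing steps were used. The following is a minimal sketch that assumes the common choices: lowercasing, tokenizing on runs of letters, and dropping stop words and very short tokens. The stop-word list and the `preprocess` function are illustrative assumptions, not part of the original project.

```python
import re

# Tiny illustrative stop-word list (an assumption; real systems use a
# much larger list).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "was", "with",
})

# Tokens are maximal runs of lowercase letters; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text: str, min_len: int = 2) -> list[str]:
    """Lowercase, tokenize, and remove stop words and tokens shorter than min_len."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]


print(preprocess("The clustering of documents is an important step."))
```

A fuller pipeline would typically add stemming or lemmatization here. That step is omitted so the sketch stays dependency-free.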
Document clustering is a technique used to group similar documents. During the course of the project we implemented tf-idf term weighting and singular value decomposition (SVD) for dimensionality reduction. We propose effective pre-processing and dimensionality reduction techniques that help document clustering. Finally, we chose the dimensionality reduction technique that performed best in terms of both clustering quality and computational efficiency.
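The abstract names tf-idf and SVD but not the exact weighting scheme or clustering algorithm. The following is a minimal sketch under stated assumptions. It uses raw term frequency times log(N / df) as the tf-idf weight, L2-normalizes each document vector, and keeps the top-k singular directions as in latent semantic analysis. K-means stands in as the clustering step, although the abstract does not say which algorithm was used. All function names are hypothetical.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase and split on letter runs (stand-in for full pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return an L2-normalized document-term matrix weighted by tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    col = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, tf in Counter(toks).items():
            X[i, col[term]] = tf * math.log(n_docs / df[term])
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros
    return X / norms


def svd_reduce(X, k):
    """Project documents onto the top-k singular directions (truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def kmeans(Z, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [Z[0]]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


docs = [
    "cats purr and cats meow",
    "kittens meow and purr",
    "cats and kittens sleep",
    "stocks rise and markets rally",
    "markets fall and stocks drop",
    "stocks and markets trade",
]
Z = svd_reduce(tfidf_matrix(docs), k=2)
labels = kmeans(Z, k=2)
print(labels)
```

On this toy corpus the three pet-themed documents should fall into one cluster and the three market-themed ones into the other. Comparing dimensionality reduction techniques, as the abstract describes, would mean swapping `svd_reduce` for an alternative and measuring cluster quality and run time for each.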
***