Mahmood: Probabilistic Word Selection via Topic Modeling

Abstract—

We propose selective supervised Latent Dirichlet Allocation (ssLDA) to boost the prediction performance of the widely studied supervised probabilistic topic models.

We introduce a Bernoulli distribution for each word in a given document, which selects the word as either strongly or weakly discriminative with respect to its assigned topic.

The Bernoulli distribution is parameterized by the discrimination power of the word for its assigned topic.

As a result, a document is represented as a “bag-of-selective-words”, which offers a new perspective compared with the probabilistic “bag-of-topics” of topic modeling and the flat “bag-of-words” of traditional natural language processing.

Because ssLDA inherits the general framework of supervised LDA (sLDA), it can also predict many types of response specified by a generalized linear model (GLM).

In this paper we focus on applying this word selection mechanism to single-label document classification: we use variational inference to approximate the intractable posterior and derive maximum-likelihood estimates of the ssLDA parameters.

Experiments on textual documents show that ssLDA not only performs competitively with state-of-the-art classification approaches based on both the flat “bag-of-words” and the probabilistic “bag-of-topics” representations, but can also discover the discrimination power of the words in each topic, in a way that is compatible with our rational knowledge.
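To make the word selection step concrete, here is a minimal sketch of how a per-word Bernoulli draw could turn a document into a “bag-of-selective-words”. It is an illustration only, not the inference procedure from the paper: the function names, the `power` lookup table, and the toy values in it are all assumptions, and the topic assignments are taken as given rather than inferred.

```python
import random
from collections import Counter


def select_words(words, topics, power, rng):
    """Draw one Bernoulli indicator per word (1 = strongly discriminative).

    `power` is a hypothetical table mapping (topic, word) to a
    discrimination power in [0, 1]; pairs missing from the table
    default to 0.0 and are therefore never selected.
    """
    indicators = []
    for word, topic in zip(words, topics):
        p = power.get((topic, word), 0.0)
        # rng.random() is uniform on [0, 1), so this is a Bernoulli(p) draw.
        indicators.append(1 if rng.random() < p else 0)
    return indicators


def bag_of_selective_words(words, indicators):
    """Count only the words whose indicator is 1."""
    return Counter(w for w, s in zip(words, indicators) if s == 1)


if __name__ == "__main__":
    # Toy document with made-up topic assignments and powers (assumptions).
    words = ["goal", "the", "match", "the"]
    topics = [0, 0, 0, 0]
    power = {(0, "goal"): 0.9, (0, "match"): 0.8, (0, "the"): 0.05}

    rng = random.Random(0)  # fixed seed so the run is repeatable
    indicators = select_words(words, topics, power, rng)
    print(bag_of_selective_words(words, indicators))
```

In this sketch a word with high discrimination power for its topic is kept most of the time, while a word such as “the” is almost always dropped, so the resulting counts emphasize the words that matter for the label.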
