Feature Subsumption for Sentiment Classification in Multiple Languages

Abstract

Two open problems in machine learning-based sentiment classification are how to extract complex features that outperform simple ones, and how to determine which types of features are most valuable. Most studies focus on character or word n-gram features, whereas substring-group features have not previously been considered in sentiment classification. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments were conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm's performance is usually superior to the best performance reported in related work, and that the proposed feature subsumption algorithm for sentiment classification works across languages. The results also show that the transductive learning-based algorithm significantly improves sentiment classification performance compared with the inductive learning-based algorithm. As for term weighting, the experiments show that "tfidf-c" outperforms all other term weighting approaches within the proposed algorithm.
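As a rough illustration of weighting substring features, the sketch below computes tf-idf weights with cosine (unit-length) normalization over character substrings. It rests on two assumptions that the abstract does not confirm: that "tfidf-c" denotes tf-idf with cosine normalization, and that plain character n-grams are an acceptable stand-in for the paper's substring-group features. The function names, parameters and sample documents are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=1, n_max=3):
    """Return all character substrings of length n_min..n_max.

    Assumption: a simple stand-in for substring features, not the
    paper's substring-group extraction.
    """
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def tfidf_cosine(docs, n_min=1, n_max=3):
    """Weight each substring by tf * log(N / df), then scale every
    document vector to unit L2 norm (assumed meaning of "tfidf-c")."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each substring.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        raw = {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm > 0 else 0.0)
                        for t, w in raw.items()})
    return vectors

# Hypothetical toy corpus, for illustration only.
docs = ["good movie", "bad movie", "good plot"]
vectors = tfidf_cosine(docs)
```

Under this weighting, a substring that appears in every document (here the single character "o") receives weight zero, while substrings concentrated in a few documents dominate each unit-length vector.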

Publication
Advances in Knowledge Discovery and Data Mining
Hua Xu
Tenured Associate Professor, Editor-in-Chief of Intelligent Systems with Applications, Associate Editor of Expert Systems with Applications, Ph.D. Supervisor
