Multi-class sentiment analysis using a hierarchical logistic model tree approach

Authors

  • Masun Nabhan Homsi, Universidad Simón Bolívar

Summary

ABSTRACT
This paper proposes a new hybrid system for multi-class sentiment analysis based on the General Inquirer (GI) dictionary and a hierarchical Logistic Model Tree (LMT) approach. The system consists of three layers. The Bipolar Layer (BL) contains one LMT (LMT-1) that classifies sentiment polarity, while the Intensity Layer (IL) comprises two LMTs (LMT-2 and LMT-3) that separately detect three positive and three negative sentiment intensities. The Grouping Layer (GL) is used only in the construction phase, where it clusters the positive and the negative instances with two k-means models, one for each polarity. In the pre-processing phase, the raw text is passed through a tokenizer, a tagger, a stemmer and finally the GI dictionary, which counts and labels only verbs, nouns, adjectives and adverbs with 24 markers that are later used to compute feature vectors. In the sentiment classification phase, feature vectors are first fed to LMT-1 and then grouped in GL according to class label; the resulting groups of instances are labeled manually, and finally positive instances are fed to LMT-2 and negative instances to LMT-3. The three trees are trained and tested on the Movie Review and SenTube datasets using 10-fold stratified cross-validation. LMT-1 yields a tree with 48 leaves and a size of 95 with 90.88% accuracy, while LMT-2 and LMT-3 each yield a tree with 1 leaf and a size of 1, with 99.28% and 99.37% accuracy respectively. Experiments show that the proposed hierarchical classification methodology performs better than other prevailing approaches.
Keywords: Multi-class sentiment analysis, hybrid approach, logistic model tree, General Inquirer (GI) dictionary.
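
The two-stage flow described in the abstract (marker counts as features, a polarity decision, then a polarity-specific intensity decision) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the toy lexicon and its three marker names stand in for the GI dictionary and its 24 markers, the coarse part-of-speech tags are hypothetical, the intensity labels "low", "medium" and "high" are invented names, and simple hand-written rules replace the trained LMT-1, LMT-2 and LMT-3 models. It is not the paper's implementation.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy lexicon (stem -> markers). The real GI dictionary and
# the 24 markers used in the paper are NOT reproduced here.
TOY_LEXICON: Dict[str, List[str]] = {
    "good": ["Positiv"],
    "great": ["Positiv", "Strong"],
    "bad": ["Negativ"],
    "awful": ["Negativ", "Strong"],
}
MARKERS: Tuple[str, ...] = ("Positiv", "Negativ", "Strong")

# Only verbs, nouns, adjectives and adverbs are counted
# (coarse tag names are an assumption of this sketch).
CONTENT_POS = {"VERB", "NOUN", "ADJ", "ADV"}


def feature_vector(tagged_stems: Sequence[Tuple[str, str]]) -> List[int]:
    """Count lexicon markers over the content words of one document."""
    counts: Counter = Counter()
    for stem, pos in tagged_stems:
        if pos in CONTENT_POS:
            counts.update(TOY_LEXICON.get(stem, []))
    return [counts[m] for m in MARKERS]


Classifier = Callable[[Sequence[int]], str]


def classify_hierarchically(
    x: Sequence[int],
    polarity_clf: Classifier,
    pos_intensity_clf: Classifier,
    neg_intensity_clf: Classifier,
) -> Tuple[str, str]:
    """Route a feature vector through the polarity stage, then through
    the intensity classifier that matches the predicted polarity."""
    polarity = polarity_clf(x)
    if polarity == "positive":
        return polarity, pos_intensity_clf(x)
    return polarity, neg_intensity_clf(x)


# Stand-in rules replacing the trained trees (assumptions, not LMTs).
def toy_polarity(x: Sequence[int]) -> str:
    return "positive" if x[0] >= x[1] else "negative"


def toy_intensity(x: Sequence[int]) -> str:
    if x[2] >= 2:
        return "high"
    if x[2] == 1:
        return "medium"
    return "low"


doc = [("great", "ADJ"), ("plot", "NOUN"), ("bad", "ADJ")]
x = feature_vector(doc)  # [1, 1, 1]
label = classify_hierarchically(x, toy_polarity, toy_intensity, toy_intensity)
print(label)  # ('positive', 'medium')
```

Passing the classifiers in as plain callables keeps the routing logic independent of the model type, so trained models could be substituted for the toy rules without changing `classify_hierarchically`.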

Published

2016-04-25

How to cite

Nabhan Homsi, M. (2016). Multi-class sentiment analysis using a hierarchical logistic model tree approach. Maskana, 5(Ed. Esp.). Retrieved from https://publicaciones.ucuenca.edu.ec/ojs/index.php/maskana/article/view/718