Vietnamese Online Hotel Reviews Classification Based on Term Features Selection

Abstract

This paper presents improved techniques for classifying user feedback on hotel service quality. The data were collected mainly from online feedback sources using a PHP program. The training set was manually tagged as NEGATIVE, POSITIVE, or NEUTRAL. In total, 2969 Vietnamese-language terms were collected. In the first part, common machine learning techniques, namely K-Nearest Neighbor (KNN), Decision Tree, Naive Bayes (NB), and Support Vector Machines (SVM), were applied for classification. In the second part, we improved the efficiency of text categorization by applying the χ² (CHI) feature selection technique. We conclude that applying feature selection significantly improved the overall performance of these general machine learning techniques.
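To make the χ² (CHI) feature selection step concrete, the following is a minimal Python sketch of how terms could be scored and filtered before classification. It is an illustration under stated assumptions, not the paper's implementation: the function names `chi_square` and `select_terms`, the toy English tokens, and the choice to take the maximum score over classes are all hypothetical, since the abstract does not specify how per-class scores are combined.

```python
from collections import Counter


def chi_square(n_tc, n_t, n_c, n):
    """Chi-square score of a term for one class, from a 2x2 contingency table.

    n_tc: documents of the class that contain the term
    n_t:  documents (any class) that contain the term
    n_c:  documents of the class
    n:    total number of documents
    """
    a = n_tc            # in class, contains term
    b = n_t - n_tc      # other classes, contains term
    c = n_c - n_tc      # in class, lacks term
    d = n - n_t - c     # other classes, lacks term
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term occurs in every document or in none: it carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_terms(docs, labels, k):
    """Return the k terms with the highest chi-square score.

    Assumption: a term's overall score is its maximum score over classes.
    Ties are broken alphabetically so the result is deterministic.
    """
    n = len(docs)
    class_counts = Counter(labels)
    term_counts = Counter()
    term_class_counts = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency, not term frequency
            term_counts[term] += 1
            term_class_counts[(term, label)] += 1
    scores = {
        term: max(
            chi_square(term_class_counts[(term, cls)], n_t, n_c, n)
            for cls, n_c in class_counts.items()
        )
        for term, n_t in term_counts.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy reviews, already tokenized (not data from the paper).
docs = [
    ["room", "clean", "staff", "friendly"],
    ["room", "dirty", "staff", "rude"],
    ["room", "clean", "breakfast", "good"],
    ["room", "dirty", "noisy"],
]
labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

print(select_terms(docs, labels, 2))
```

On this toy set, terms that appear in every review (such as "room") score zero and are dropped, while terms tied to a single class rank highest. The retained terms would then form the vocabulary passed to a classifier such as KNN, NB, or SVM.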

Publication
26th International Conference on Information Modelling and Knowledge Bases (EJC) 2016
Bang Tran
Assistant Professor

My research interests include single-cell imputation and single-cell analysis.