Friday, December 1, 2017

Language-independent data set annotation for machine learning-based sentiment analysis


Abstract:
Social media platforms provide large amounts of user-generated text that can be used for text-based sentiment analysis to gain insight into opinions on many aspects of life. Current approaches based on supervised learning require manually annotated data sets, which are time-consuming to create and specific to a single language. In this work, we present an approach to generate ground truth sentiment values for a data set from Twitter. We use a sentiment emoji lexicon and distribute known polarities of hashtags over their neighbors in a graph built on those hashtags. This approach is language-independent. Native speakers of five different languages evaluate the accuracy of the sentiment values assigned by our method on a corpus of Tweets. Our experiments show that the quality of our automatically assigned sentiment values is sufficiently high for training machine learning-based sentiment analysis.
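
To make the hashtag idea in the abstract more concrete, here is a minimal sketch of how known hashtag polarities might be spread over neighbors in a co-occurrence graph. It is not the authors' implementation: the abstract does not say how the graph is built or how scores are combined, so the co-occurrence edges, the neighbor-averaging rule, the iteration count, and the names `build_graph` and `propagate` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(tweets):
    """Connect hashtags that appear together in the same tweet.

    `tweets` is a list of hashtag lists, one list per tweet (assumed input shape).
    """
    graph = defaultdict(set)
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph, seeds, iterations=10):
    """Spread seed polarities over the graph by repeated neighbor averaging.

    `seeds` maps hashtags with a known polarity to a value in [-1, 1].
    Seed values stay fixed; every other hashtag takes the mean score of
    its neighbors. This update rule is an assumption, not the paper's.
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = seeds[node]
            else:
                updated[node] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores
```

For example, with the tweets `[["win", "happy"], ["happy", "party"], ["fail", "sad"]]` and seeds `{"win": 1.0, "fail": -1.0}`, the unlabeled hashtags `happy` and `party` drift toward positive scores and `sad` toward a negative one. Nothing in this step depends on the language of the tweet text, which is the property the abstract emphasizes. A sentiment emoji lexicon could supply seed values or an additional per-tweet signal, but how the paper combines the two is not stated in the abstract.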
https://ieeexplore.ieee.org/document/8122930