NotaRazzi: Language-independent data set annotation for machine learning-based sentiment analysis

Friday, December 1, 2017


Abstract:
Social media platforms provide large amounts of user-generated text that can be used for text-based sentiment analysis to gain insight into opinions on many aspects of life. Current approaches based on supervised learning require manually annotated data sets, which are time-consuming to create and specific to a single language. In this work, we present an approach to generate ground-truth sentiment values for a Twitter data set. We use a sentiment emoji lexicon and propagate the known polarities of hashtags to their neighbors in a graph built over those hashtags. This approach is language-independent. Native speakers of five different languages evaluate the accuracy of the sentiment values our method assigns to a corpus of tweets. Our experiments show that the automatically assigned sentiment values are of sufficiently high quality to be used for training machine learning-based sentiment analysis models.
https://ieeexplore.ieee.org/document/8122930
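The abstract describes two steps: seeding hashtag polarities from an emoji lexicon, then spreading those polarities to neighboring hashtags in a graph. The Python sketch below shows one way the two steps could fit together. It is not the authors' implementation. The emoji lexicon (`EMOJI_LEXICON`), the choice of a hashtag co-occurrence graph, the weighted-average propagation rule, and every function name are hypothetical assumptions, because the abstract does not specify them.

```python
import re
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL emoji polarity lexicon with scores in [-1, 1];
# the lexicon used in the paper is not reproduced here.
EMOJI_LEXICON = {"😀": 1.0, "😍": 1.0, "😢": -1.0, "😡": -1.0}
# \w is Unicode-aware for str patterns, so hashtags in many scripts match.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lower-cased hashtags in a tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def emoji_score(text):
    """Mean lexicon polarity of the emojis in a tweet, or None if it has none."""
    scores = [EMOJI_LEXICON[ch] for ch in text if ch in EMOJI_LEXICON]
    return sum(scores) / len(scores) if scores else None


def seed_polarities(tweets):
    """Seed polarity of a hashtag: mean emoji score of the tweets containing it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for text in tweets:
        score = emoji_score(text)
        if score is None:
            continue
        for tag in hashtags(text):
            sums[tag] += score
            counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}


def build_graph(tweets):
    """ASSUMED graph: undirected hashtag co-occurrence, weight = co-occurrence count."""
    graph = defaultdict(lambda: defaultdict(int))
    for text in tweets:
        for a, b in combinations(sorted(hashtags(text)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def propagate(graph, seeds, iterations=10):
    """Spread seed polarities to unlabeled neighbors; seed values stay fixed."""
    scores = dict(seeds)
    for _ in range(iterations):
        updated = dict(scores)
        for node, neighbors in graph.items():
            if node in seeds:
                continue  # emoji-derived seeds are never overwritten
            known = [(scores[n], w) for n, w in neighbors.items() if n in scores]
            if known:
                updated[node] = sum(s * w for s, w in known) / sum(w for _, w in known)
        scores = updated
    return scores


def label_tweet(text, scores):
    """Label a tweet by the mean polarity of its known hashtags (None if none)."""
    known = [scores[t] for t in hashtags(text) if t in scores]
    if not known:
        return None
    mean = sum(known) / len(known)
    return "positive" if mean > 0 else "negative" if mean < 0 else "neutral"


if __name__ == "__main__":
    tweets = [
        "Great day 😀 #sunny #beach",
        "#beach #surf time",
        "Stuck in traffic 😡 #monday",
        "#monday #commute again",
    ]
    scores = propagate(build_graph(tweets), seed_polarities(tweets))
    print(label_tweet("#surf", scores))     # positive
    print(label_tweet("#commute", scores))  # negative
```

In this sketch, `#surf` never appears next to an emoji, but it inherits a positive polarity from `#beach`, which does. The only text processing is a hashtag regex and an emoji lookup, with no language-specific tokenizer or word lexicon, which mirrors the language-independence goal stated in the abstract.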
