Bigdatapump participates in the www.vaalipohina.fi website, which follows social media activity related to the Finnish parliamentary elections on 19.4.2015. One element of the site is sentiment analysis of Twitter messages, which relates various emotional states to the ongoing discussion around the main political parties. But how does one actually perform real-time text analysis in Finnish?
Artificial intelligence and machine learning
Text analysis and opinion mining are among the oldest and most challenging fields of artificial intelligence and machine learning. They are fundamental for natural human-computer interaction and can enhance communication between cultures and between individuals. The benefits for readers with disabilities are significant. The rise of digitalization and social media is opening new possibilities for business applications, e.g. following trends, developing marketing strategies, grouping feedback messages and documents, and predicting customer churn. It is easy to predict that text analytics and speech recognition will make our lives easier in the coming years.
There are a few different approaches to performing sentiment analysis on a given text. Natural language processing (NLP) aims at high accuracy by mimicking human understanding. The raw text is parsed into words, whose roles and mutual relationships within the sentence are resolved using grammatical and linguistic rules. The parsed text is then analyzed using statistical methods, neural networks, or other supervised machine learning approaches. Interesting NLP projects are carried out at Stanford University: their sentiment analyzer can even handle negations and is reported to reach 80% accuracy in evaluating the sentiment of individual sentences.
The challenges of social media opinion mining
Opinion mining of social media text is inherently inaccurate. A large share of the messages are full of abbreviations, typos, unusual grammar, short-lived memes, and sarcasm. Understanding them requires general knowledge of political and societal history, as well as of stereotypes and the personal history of the discussion participants. Add to that the fact that even people do not always agree about the sentiment of a message, and it is clear that opinion mining is a huge challenge.
When mining Finnish text, the situation is even more difficult because most development in the field is done for English. NLP tools need to be readjusted for the particular grammar and vocabulary of Finnish. Google has put tremendous effort into a universal translation machine, but it still cannot translate complicated Finnish sentences properly. Analysis may be easier than translation, but the algorithms must nevertheless be generalized, modified, and readjusted for non-English languages. An increasing number of Finnish companies offer text mining as a B2B service, mainly for following social media trends and mining customer feedback.
Extracting information from individual social media messages is problematic, but fortunately many applications do not need to dig that deep. An alternative approach to text mining is based on social psychology. It has been observed that our everyday language carries fingerprints of our personalities, central values, and mental states in general. By counting and categorizing the words in a text, it is possible to get a grasp of, for instance, positive or negative attitude, various emotions, and the tendency toward social interaction. It's a bit like overhearing a discussion in a language you don't know very well but in which you recognize some words: after a moment you probably have an idea of the speakers' opinions and maybe even their personalities.
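The word-counting idea can be illustrated with a minimal Python sketch. The dictionary below is a hypothetical toy with four made-up categories, not the actual LIWC dictionary (which is proprietary and much larger). Each category score is simply the percentage of words in the text that match one of its entries, with a trailing "*" acting as a prefix wildcard.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the real LIWC
# dictionary. Entries ending in "*" match any word with that prefix.
CATEGORIES = {
    "posemo": ["good", "great", "happy", "hope*", "win*"],
    "negemo": ["bad", "angry", "fail*", "worr*", "lose*"],
    "money": ["money", "tax*", "budget*", "euro*", "cost*"],
    "work": ["work*", "job*", "employ*", "unemploy*"],
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """True if the word matches a dictionary entry (exact or prefix wildcard)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text):
    """Return, per category, the percentage of tokens that fall into it."""
    tokens = tokenize(text)
    counts = Counter()
    for word in tokens:
        # A single word may count toward several categories.
        for category, entries in CATEGORIES.items():
            if any(matches(word, e) for e in entries):
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero for empty text
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}
```

For the sentence "Taxes and unemployment worry voters, but we hope for good jobs." the sketch counts 11 words, of which two fall into posemo, two into work, and one each into negemo and money. Nothing here looks at grammar or negation; the signal comes purely from which words occur and how often.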
One important requirement of this approach is a large enough vocabulary, so that conclusions can be drawn from the analyzed text without requiring too many sentences. Once such a vocabulary exists, the approach is actually quite well suited to non-English languages as well: because the focus is on the vocabulary and not on the grammar, one can rely on automatic machine translation.
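A rough way to see why vocabulary size matters is to measure the dictionary hit rate, i.e. the fraction of words the dictionary recognizes, and estimate how much text is needed before enough hits accumulate to say anything. The sketch below is a simplification using two hypothetical toy vocabularies; real dictionaries such as LIWC's are far larger, and the target number of hits is an arbitrary illustrative parameter.

```python
import math
import re

# Hypothetical toy vocabularies of different sizes (illustration only).
SMALL_VOCAB = {"good", "bad"}
LARGE_VOCAB = {"good", "bad", "hope", "worry", "tax", "taxes", "job", "jobs",
               "money", "work", "angry", "happy"}


def hit_rate(text, vocab):
    """Fraction of word tokens in the text that appear in the vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in vocab for t in tokens) / len(tokens)


def tokens_needed(rate, target_hits):
    """Rough number of tokens needed to expect target_hits dictionary hits."""
    if rate == 0:
        return math.inf  # the vocabulary never matches, so no amount suffices
    return math.ceil(target_hits / rate)
```

On the twelve-word sentence "Taxes are bad but we hope for more jobs and good work", the small vocabulary recognizes 2 words and the larger one 6 (a hit rate of 0.5), so collecting 50 hits would take roughly 100 words with the larger vocabulary and about three times as many with the small one.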
Vaalipöhinä sentiment analysis
The sentiment analyzer at www.vaalipohina.fi uses the psycholinguistic analysis software Linguistic Inquiry and Word Count (LIWC) through an API provided by the Canadian company Receptiviti. LIWC is pioneering software that has previously been used in political contexts: it has been applied to analyzing speeches by Al-Qaeda terrorists as well as debates between candidates prior to US presidential elections. For instance, LIWC analysis of the 2012 debates between Barack Obama and Mitt Romney brought out their key differences, such as Obama's academic and Romney's business backgrounds, as well as their differing interests in people and in business. It was also used to analyze the political discussion in Germany during the weeks before the German federal elections; in that example the tweets were translated into English before the analysis.
In the solution used at www.vaalipohina.fi, we collect tweets from the Twitter Streaming API using a set of election-related search keywords. The received messages are stored in a database hosted on Amazon Web Services. Every tweet is immediately translated into English with Google's translator. Twice per hour, the most recent translated messages are concatenated and analyzed with LIWC. The software computes 78 characteristics of the text, of which the site shows a few. For example, the figures below present the intensity of money- and work-related discussion in March. The discussion consists of short-lived bursts of activity. The increased overall levels on 9.3., 10.-11.3., and 20.3. for the Finns Party (PS), the Centre Party (Keskusta), and the Social Democrats (SDP) are related to the publication of their economic programmes.
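The collection and batching steps can be sketched in Python as follows. This is not the site's actual code: the keyword list is hypothetical, the local keyword filter stands in for the filtering done by the Twitter Streaming API, the translated field stands in for the output of the translation step, and the resulting documents would then be passed to the LIWC analysis, which is not shown. Tweets are grouped into 30-minute windows (twice per hour), and the translations within each window are concatenated into one document.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    created_at: datetime
    text: str        # original Finnish text
    translated: str  # English translation (stand-in for the translation step)


# Hypothetical election-related search keywords (illustration only).
KEYWORDS = ["vaalit", "eduskuntavaalit", "sdp", "keskusta", "perussuomalaiset"]


def is_relevant(text, keywords=KEYWORDS):
    """Simple case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def half_hour_bucket(ts):
    """Round a timestamp down to the start of its 30-minute window."""
    return ts.replace(minute=0 if ts.minute < 30 else 30, second=0, microsecond=0)


def batch_documents(tweets):
    """Concatenate translated texts of relevant tweets per 30-minute window."""
    batches = defaultdict(list)
    for tw in tweets:
        if is_relevant(tw.text):
            batches[half_hour_bucket(tw.created_at)].append(tw.translated)
    # One document per window, in chronological order.
    return {window: "\n".join(texts) for window, texts in sorted(batches.items())}
```

Concatenating each window into a single document follows the reasoning of the word-counting approach: individual tweets contain too few dictionary words to be scored reliably, whereas a half-hour batch gives the analysis enough text to work with.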