What do Sentimenti tools do? – an interview with Dr Jan Kocoń

Dr Jan Kocoń is a natural language engineer and the person behind the machine learning process within SentiTool, our solution for analyzing emotions in text. Dr Kocoń coordinates the work of the linguistics team, integrates the individual elements of the tool, and works closely with the IT team.

If you had to describe Sentimenti and its tools to someone, what would you say first?

Sentimenti is a project meant to analyze the emotions hidden in text. Unlike competing solutions that recognize only the overtone of a text (positive, neutral or negative), our tools interpret the text, assign specific meanings to the words in it, and name the specific emotions people feel about those meanings. These emotions, in turn, provide the knowledge base for a machine learning mechanism that automatically recognizes emotions at the level of sentences and of the whole text.

What does it mean that we analyse emotions in text?

In the research carried out in our project we adapted the Plutchik model. It includes eight basic emotions: joy, sadness, trust, disgust, anticipation, fear, surprise and anger. We are able to estimate to what extent each of these emotions is expressed in a text.

How do we know what emotions people feel?

The knowledge base behind our project includes more than 30,000 word meanings, which 20,000 unique respondents rate for overtone and emotions. We talk about “meanings” and not “words” on purpose, because words are ambiguous; for example, “dark” means something different in “dark blue” and in “dark people”, and only in the latter case does it carry emotions. Each meaning will ultimately receive 50 ratings from different people. This tells us what feelings are evoked by particular meanings in a text. However, the emotion of a text is not a simple sum of the emotions assigned to the meanings in it...
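
As a rough illustration of how ratings from many respondents could be turned into an emotion profile for a single meaning, the sketch below averages the scores given for each emotion. The 0 to 4 rating scale, the sample data and the function name `emotion_profile` are assumptions made for this example, not details of the Sentimenti knowledge base.

```python
from collections import defaultdict
from statistics import mean


def emotion_profile(ratings):
    """Average each emotion's scores across respondents.

    `ratings` is a list of dicts, one per respondent, mapping an
    emotion name to a score on a hypothetical 0-4 scale.
    """
    by_emotion = defaultdict(list)
    for rating in ratings:
        for emotion, score in rating.items():
            by_emotion[emotion].append(score)
    return {emotion: mean(scores) for emotion, scores in by_emotion.items()}


# Three made-up respondents rating one meaning of "dark" (assumed data).
ratings_for_meaning = [
    {"fear": 3, "sadness": 2, "joy": 0},
    {"fear": 4, "sadness": 2, "joy": 0},
    {"fear": 2, "sadness": 2, "joy": 0},
]
profile = emotion_profile(ratings_for_meaning)
```

A profile like this describes one meaning only; as noted above, the emotion of a whole text cannot be obtained by simply adding such profiles together.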

What else makes the text emotion analysis tools work?

Two things help us. The first is our gargantuan database of opinions. The opinions come with associated overtones and are drawn from different areas: travel, medicine, products, services and more. We have over 10 million such texts in our database, which is an excellent source of information about the general feeling of the author. However, in order to find out what emotions a given text evokes in the reader, we also conduct our own research, analogous to the research on single meanings.

This time the subject of the studies is whole texts. The respondents attribute basic emotions to them in exactly the same way as they do with word meanings.

The second pillar of our Sentimenti tool is a combination of various machine learning methods. Experts in natural language processing provide us with tools for text analysis at the syntactic and semantic level. They also create rules for analyzing meanings in context, covering negation, conjecture, weakening or strengthening of the overtone, and so on. These rules support automatic methods, such as deep neural networks, which are used to draw the right conclusions about the emotions in the analyzed text.
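
To make the idea of context rules more concrete, here is a minimal sketch of how negation and strengthening or weakening words might adjust a word-level overtone score. The word lists, the multipliers, the toy lexicon and the function `adjust_overtones` are all hypothetical; the interview does not describe the actual rules used in Sentimenti.

```python
# Hypothetical modifier lists; real rules would be written by linguists.
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}     # strengthening
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.75}  # weakening

# Toy overtone lexicon: positive scores are good, negative scores are bad.
LEXICON = {"good": 1.0, "terrible": -1.0}


def adjust_overtones(tokens, lexicon):
    """Apply modifiers that directly precede a scored word to its score."""
    results = []
    factor = 1.0
    for token in tokens:
        word = token.lower()
        if word in NEGATIONS:
            factor *= -1.0                  # flip the sign
        elif word in INTENSIFIERS:
            factor *= INTENSIFIERS[word]
        elif word in DIMINISHERS:
            factor *= DIMINISHERS[word]
        elif word in lexicon:
            results.append((word, lexicon[word] * factor))
            factor = 1.0                    # modifiers are used up
        else:
            factor = 1.0                    # unrelated word: drop pending modifiers
    return results


scored = adjust_overtones("The food was not very good".split(), LEXICON)
```

In this toy version "not very good" ends up with a negative score even though "good" alone is positive. In a setup like the one described above, such rule outputs would serve as extra input to the learned models rather than as the final answer.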

What do you think automatic emotion analysis can be useful for?

Ultimately, I see many applications for our tools. The first area that comes to my mind is marketing, or, more precisely, display advertising. This area covers advertisements displayed in the context of web articles and matches them with the emotions that the text of the publication evokes in readers. For example, a sad text could carry an advertisement for an insurance company, and a merry, joyful text could carry an advertisement for a trip.

Another area that we could cover is brand monitoring, i.e. analyzing what customers write on the Internet about a given company and its products, and what emotions accompany those comments. Other interesting areas include sorting customers’ email complaints by the emotions contained in them, detecting conflicts arising in employee correspondence, detecting upcoming crises in social media, and even the possibility of diagnosing mental illnesses. The potential of Sentimenti tools is really huge!

What else do you plan to do in Sentimenti?

So far, we have a prototype that performs a simple text analysis at the level of meanings, together with an overtone analysis based on our huge opinion resources. Currently, in the Sentimenti team in Wroclaw, I am managing the construction of a machine learning mechanism. It will make it possible to combine information from the knowledge base of meanings with information from the natural language processing stream. We are constantly receiving new data about the feelings of people reading particular texts, and this data is our training set. The more data we gather, the better the quality of the tool.