The purpose of this document is to familiarize you with Sentitool, available as a web application at https://sentitool.sentimenti.pl. The service recognizes the emotions expressed by texts, as perceived by a statistical user of the language, in 18 available languages, including: Polish, Czech, English, French, Italian, Portuguese, Spanish, Danish, Estonian, German, Norwegian, Russian, Swedish, Finnish, Greek, Slovenian, Turkish. Access to languages and other options described in the manual depends on the user’s account level.
You can analyze longer articles, as well as short mentions, including social media posts. You also have the option of emotive analysis for each paragraph of the analyzed text separately.
The tool’s primary function is to automatically add metadata to texts in the form of 11 values:
1. Emotions
Anger, Fear, Sadness, Disgust, Surprise, Expectation, Joy, Trust
2. Sentiment
Positive Sentiment, Negative Sentiment
3. Arousal
Logging in
Logging in to the system requires a username, organization/company name and password.
The available analysis methods and the accounts of the users who can access them are assigned to the organization/company name. Each user, in turn, is assigned an Archive, where all files analyzed by that user are available.
Available services (methods of analysis)
Within the most extensive account, the following mechanisms for emotive evaluation in text are available (see table below for broader descriptions).
- Aggregators – the simplest models, aggregating data, varying the amount of data,
- Classifiers – zero-one models, differing in the way the data are analyzed,
- Regressors – the most advanced models.
Other account types may include fewer analysis methods to choose from, or give the user access to only one method.
After logging in, the user sees a screen for selecting the analysis method (drop-down list) and the type of file to be analyzed, and for starting the analysis.
The user view allows the user to load a file (the panel on the left) for analysis and to view the archive – the analysis history of a given user (the panel on the right).
Loading and processing the source file
A drag & drop window allows you to load a file for analysis. The tool accepts XLSX (XLS) files containing at least one column of text (consecutive mentions in rows) or TXT files collected in ZIP (one or more TXTs, can be in subfolders).
After the XLSX file is loaded, three selection lists become active: the service (“Service”), the worksheet, and the column containing the texts for analysis. Each is a single-selection list of methods, sheets and columns respectively. Processing, i.e. automatic analysis of emotions by the selected method, can start only after you have indicated the method and the data to be analyzed (the appropriate column). Once the analysis begins, the file is transferred to the archive.
It is also possible to delete a file loaded on the server, for which the analysis has not yet started – a trash can icon next to the green start icon.
The output files containing the results of the emotion analysis are always in XLSX format and contain columns with the results (8 emotions, 2 variables describing sentiment and, in addition, emotional arousal). These columns are appended to the columns of the analyzed file (or placed in a file created during the processing of TXT files). In the case of TXT analysis, each row of the resulting table is additionally marked with the name of the TXT file that was analyzed (the first column of the resulting file is created automatically).
Loading a ZIP file proceeds similarly to XLSX described above, but instead of selecting a sheet and column, it is possible to select the “multi-paragraph analysis” option. Then not only the entire text is analyzed, but also its individual paragraphs defined as fragments separated by end-of-line characters (“\n”, or “enter”). In the resulting XLSX, each paragraph will be numbered and classified as “first”, “last” or “middle” (all other paragraphs).
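The paragraph splitting described above can be sketched as follows. This is only an illustration of the rule, not the service's actual code: the function name `label_paragraphs` is hypothetical, and it is an assumption that empty lines are skipped and that a single-paragraph text is labeled “first”.

```python
def label_paragraphs(text: str) -> list[tuple[int, str, str]]:
    """Split a text on end-of-line characters and label each paragraph.

    Returns (number, position, paragraph) tuples, where position is
    "first", "last" or "middle".
    """
    # Assumption: empty lines are not counted as paragraphs.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    labeled = []
    for number, paragraph in enumerate(paragraphs, start=1):
        if number == 1:
            position = "first"
        elif number == len(paragraphs):
            position = "last"
        else:
            position = "middle"  # all other paragraphs
        labeled.append((number, position, paragraph))
    return labeled
```

Previewing a text this way before upload can help confirm that its line breaks fall where you expect the paragraph boundaries to be.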
Files on entry
After loading the XLSX file on the server, select the sheet and column with the text to be analyzed. The Excel file must have at least one named column containing the text to be analyzed; the name must be in the first row of the file that contains any data. The sheet can also contain other columns with any data, including the date or time of publication of the mention. This data will not be lost or overwritten.
If, when selecting a column, blank lines or names appear in the list that were not in the XLSX we are viewing, it means that the file was not loaded correctly or contains errors. After selecting the sheet and column with data and clicking on the green start icon, the analyzed file is moved to the archive list, where you can view the results after analysis.
TXT files must be saved with UTF-8 encoding and collected into a ZIP file, which may contain subfolders – then the address of the TXT file (subfolder or nested subfolders) will be included in its name indicated in the resulting XLSX.
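A minimal sketch of preparing such an archive with Python's standard `zipfile` module is shown below. The function name `build_txt_zip` and the example file names are hypothetical; the only requirements taken from this manual are UTF-8 encoding and TXT files collected in a ZIP, optionally in subfolders.

```python
import zipfile


def build_txt_zip(zip_path: str, texts: dict[str, str]) -> None:
    """Write each text as a UTF-8 encoded TXT entry of a ZIP archive.

    Keys of `texts` are entry names; a "/" in the name creates a subfolder,
    e.g. "reviews/first.txt".
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in texts.items():
            # Encode explicitly so the stored bytes are UTF-8.
            archive.writestr(name, text.encode("utf-8"))
```

For example, `build_txt_zip("mentions.zip", {"post.txt": "Great service!", "reviews/first.txt": "Zażółć gęślą jaźń"})` produces an archive with one top-level file and one file in a subfolder.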
Once the file is loaded on the server and the analysis is initiated, the file’s status in the “Download” column will change from INIT to a count of consecutive mentions processed (e.g. 64/2158). When the processing of mentions is complete, a green icon will appear, indicating that the results in XLSX are ready for download. You can then download the resulting XLSX or view the summary and visualizations.
In the archive view, you can follow the progress of the file’s processing (the “Entries” column) or stop its analysis (if, for example, you have selected the wrong sheet or the wrong column). To stop it, press the yellow icon, which is active only while the file is being processed. Once the analysis is complete, the results can be deleted from the archive (red trash can icon), after which they will no longer be available for download. Clicking the “i” icon opens the visualization of the results.
Single text analysis
A single text can also be analyzed. You need to type it or copy it into the text box in the “Single text” tab, select the analysis method and click the start icon. In the case of a single text, the results cannot be downloaded, but a summary is available in the visualizations (the “i” icon in the archive).
Output files
For XLSX input file
The results of the emotion analysis are placed after the columns from the original XLSX file and saved in the same format, with the name of the analysis method added to the input file name. An example of the resulting file is shown below:
The results take the form of 11 columns: 8 with the results of the emotion analysis, 2 for polarity (positive and negative sentiment) and 1 for arousal (how emotionally charged the mention is). For the classifier, a value of 1 indicates the presence of a feature (a given sentiment or a given emotion), and 0 its absence. Aggregators return numerical values in the range 0-1 (interpreted as the intensity of the emotion, from 0 to 100%), and the regressor returns values from 0 to 100% (as shown above).
The screenshot posted above shows the results of emotion measurement in the texts placed in the first visible column (“videoTitle”). The “service” column contains information about the service. The “percentage of emotive words” column applies only to aggregators and indicates how many of the tokens comprising the text had any emotive annotation in the given base. The “content” column, on the other hand, is a copy of the data from the analyzed column and is the only place where the result file retains information about which texts the results of the emotive analysis describe.
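The three value scales can be read side by side with a small helper like the one sketched below. The function names and the method labels "classifier", "aggregator" and "regressor" are hypothetical, and it is an assumption that regressor values are stored as numbers from 0 to 100 rather than as fractions.

```python
def intensity_percent(value: float, method: str) -> float:
    """Express an aggregator or regressor result as a percentage (0-100)."""
    if method == "aggregator":
        return value * 100.0  # aggregators return values in the range 0-1
    if method == "regressor":
        return float(value)  # assumption: already expressed on a 0-100 scale
    raise ValueError(f"no intensity scale for method: {method}")


def is_present(value: int) -> bool:
    """Interpret a classifier result: 1 means the feature is present, 0 absent."""
    return value == 1
```

Keeping the classifier separate reflects that its 0/1 output marks presence or absence and does not express intensity.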
For TXT input file
The results of the emotion analysis for all TXT files collected in a single ZIP (broken down by paragraph or not) are returned in an XLSX file generated by the analysis tool. It has the same structure as the results added to a processed XLSX file. When TXT files are analyzed paragraph by paragraph, each paragraph in the resulting XLSX is numbered and classified as “whole”, “first”, “last” or “middle” (all other paragraphs). The XLSX name is based on the ZIP file name with the service name added, and the file name given for each row includes the TXT name (as below), preceded by the folder name and a “/” character if the ZIP contained subfolders.
When using the aggregator (statistics.xlsx file below), the results of the emotion analysis include information on the percentage of emotive words (i.e., those that have an emotive annotation and express at least one emotion).
Files in the Archive
The archive stores all files processed so far by a given user. Its structure is organized chronologically, with the most recent files at the top of the list.
As you can see, it collects the results from all text processing models. Here you can download the XLSX with the analysis results (green icon). You can also delete a file from the archive (trash icon), in which case it will disappear from the archive list and you will not be able to download it again. From this list, you can also access the summary and visualization of the results from a given file (blue “i” icon).
The archive allows you to track the processing progress of a file that has just been loaded on the server and is being analyzed. The analysis can be interrupted and canceled (yellow icon), in which case no resulting XLSX will be produced.
The archive allows you to view the analysis history, including dates, methods used and the analysis settings for each input file.
It also allows you to filter files by their name, sheet or column name and method used (“Filter” field above the list).
Visualizations of the results
From the archive view, you can navigate (dark blue “i” icon) to a summary and visualization of the results from a given file.
Restrictions on data processing
The maximum size of a file sent for analysis by the Sentitool application is 5MB. Larger files must be split into smaller ones before they are sent to the application.
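One way to prepare larger data sets is sketched below: a size check for a file on disk, and a greedy grouping of TXT texts into batches that can each be packed into a separate ZIP. The names are hypothetical, and two assumptions apply: that 5MB means 5 × 1024 × 1024 bytes, and that the uncompressed UTF-8 size is an adequate rough estimate of the archive size.

```python
import os

# Assumption: the 5MB limit is interpreted as 5 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def fits_upload_limit(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is within the size limit."""
    return os.path.getsize(path) <= max_bytes


def split_into_batches(
    texts: dict[str, str], max_bytes: int = MAX_UPLOAD_BYTES
) -> list[dict[str, str]]:
    """Group texts so that each batch stays within `max_bytes` of UTF-8 data."""
    batches: list[dict[str, str]] = []
    current: dict[str, str] = {}
    current_size = 0
    for name, text in texts.items():
        size = len(text.encode("utf-8"))
        # Start a new batch when adding this text would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = {}, 0
        current[name] = text
        current_size += size
    if current:
        batches.append(current)
    return batches
```

A single text larger than the limit still ends up in a batch of its own, so it is worth checking each finished file with `fits_upload_limit` before uploading.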
Sentitool allows one file to be processed at a time by one user.
Individual accounts within the Sentitool application have built-in limits on the amount of data that can be analyzed on a given day. The limit applies to all XLSX files, ZIP files and individual texts in total.
The number of mentions analyzed in a given file can also be limited. In that case only that number of mentions will be analyzed, regardless of the actual length of the file.
The summary is divided into a description of the results (left), which gives information about the file and the service, and the results themselves: averages in the case of the regressor and aggregators, totals in the case of the classifier. Two types of charts are available. The first chart (top) always summarizes the 8 primary emotions; the second (below) summarizes sentiment and arousal.
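The difference between the two kinds of summary can be sketched as follows. This is an illustration only: the function name `summarize`, the method label "classifier" and the column names used in the example are assumptions, not the tool's actual output format.

```python
from statistics import mean


def summarize(rows: list[dict[str, float]], method: str) -> dict[str, float]:
    """Summarize result rows per column.

    Classifier results (0/1) are totaled; regressor and aggregator
    results are averaged.
    """
    if not rows:
        return {}
    combine = sum if method == "classifier" else mean
    return {column: combine(row[column] for row in rows) for column in rows[0]}
```

For a classifier, the total for a column is the number of mentions in which that emotion was detected; for the other methods, the average is the mean intensity across all mentions in the file.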
Appendix: description of the Sentimenti tool
Sentimenti tools allow automatic analysis of emotions in texts in available languages. The basic function of the tool is to automatically assign 11 values to texts as a result of sentiment and emotion analysis. The following elements of the text’s overtones are taken into account:
Anger, Fear, Sadness, Disgust, Surprise, Expectation, Joy, Trust, Positive Sentiment, Negative Sentiment, Arousal
The first eight values are core emotions from Plutchik’s model. Each has a different quality and is associated with different behaviors: for example, disgust with avoidance, anger with aggression, trust with a sense of security.
Complementing the emotion model is the sentiment model described by the last three values: positive and negative sentiment and arousal. Mentions that are explicitly negative or positive primarily express one type of sentiment, but a text can be ambivalent (express positive as well as negative sentiment in similar intensity) or neutral. Arousal can be understood as the overall intensity of all emotions in a text, its “temperature.” Neutral mentions have low arousal, while emotionally charged ones have moderate to high arousal.
Sentimenti’s emotion analysis was based on the results of a study in which words and texts were evaluated by a representative group of Polish language users. This means that its results reflect the reception of texts according to a model speaker of the chosen language.