We all have hidden biases and prejudices that affect how we treat other people. The first step in dealing with them is identifying them. This is the main idea behind PABST - the Prototyp Anti-Bias Screening Technology. PABST is a text analysis tool that looks at your outgoing communication to a selected group of people and pinpoints how you communicate differently with that group. Do you question women more frequently? Or express yourself more negatively towards foreigners? Do you say “hurry up!” 10 times more often to interns? These are the types of questions we wanted to answer, because that insight can be the start of changing your attitudes and behaviours.
We built a service consisting of two parts. The first part is the text analysis server. It’s a .NET API that takes two huge chunks of text as input: the “experiment” corpus, and the “control” corpus, which is the baseline for comparison. The output is two-fold: a list of the expressions used more frequently in the experiment corpus, and a sentiment analysis showing the overall difference in positivity between the two. A message such as “I hate crappy problems!” gets a very negative score, whereas “Beautiful puppy rainbows” scores high.
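As a rough illustration of the frequency half of that output, here is a minimal Python sketch. It is not the .NET implementation itself: the function names `tokenize` and `overused_expressions` are hypothetical, and the add-one smoothing and the ratio of relative frequencies are assumptions about one reasonable way to compare two corpora.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, then keep runs of
    # word characters and apostrophes.
    return re.findall(r"[\w']+", text.lower())


def overused_expressions(experiment, control, top=5):
    """Rank words by how much more often they occur in `experiment` than in `control`."""
    exp_counts = Counter(tokenize(experiment))
    ctl_counts = Counter(tokenize(control))
    exp_total = sum(exp_counts.values())
    ctl_total = sum(ctl_counts.values())
    vocab = len(set(exp_counts) | set(ctl_counts))

    ratios = {}
    for word, count in exp_counts.items():
        # Add-one smoothing, so a word missing from the control corpus
        # does not cause a division by zero.
        exp_rate = (count + 1) / (exp_total + vocab)
        ctl_rate = (ctl_counts[word] + 1) / (ctl_total + vocab)
        ratios[word] = exp_rate / ctl_rate

    # Highest ratio first: these are the words the experiment corpus overuses.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy corpora, made up for illustration only.
result = overused_expressions(
    "hurry up hurry up please hurry",
    "please take your time please",
)
```

A ratio above 1 means the word is relatively more common in the experiment corpus, and a ratio below 1 means it is more common in the control corpus.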
The second part is a plugin for Slack - our favourite office chat tool. We chose Slack because it has excellent APIs for extracting chat logs, and because it contains enough conversational data to make a meaningful analysis (the Prototyp Workspace is currently up to ~200k messages). Building a custom “Slack App” let us build both the UI needed to set up an analysis and the data extraction itself. The basic steps of the app were these:
- Start Analysis. The user starts a new analysis and names the group of people they want to investigate their biases toward, for example “Dog owners”.
- Categorization. The user is offered a set of dropdowns containing other Slack users, and is asked to pick out those who are in fact dog owners.
- Results. The user gets back the frequency analysis table and sentiment analysis. It could look something like this:
Text analysis is complex, but fun! It’s easy to try, but hard to master. We managed to implement a naive approach and get some results on the very first day of work. But you quickly run into issues and complicating factors such as language separation, deciding on the smallest unit of measurement, choosing the unit of comparison, and filter factors.
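To make the “unit of measurement” question concrete: one possible reading is the choice between counting single words and counting short phrases, since an expression like “hurry up” only shows up as a unit when you count pairs of words. The sketch below is an illustration under that assumption, and the helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "please hurry up".split()
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # two-word phrases, including ("hurry", "up")
```

Longer units capture more meaning per item, but each individual item occurs less often, which makes the data problem described next even worse.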
Just like machine learning, text analysis requires a lot of data. And I mean a lot. A conversation with 1,000 words in the experiment corpus is not a lot of data, and it will riddle a frequency analysis with freak outliers, such as a colleague or brand name that happens to be mentioned a few times.
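One common way to dampen such outliers is to ignore any term that has not been seen a minimum number of times. The Python sketch below shows the idea. The function name `frequent_enough`, the threshold of 5, and the counts are all made up for illustration, and this is not necessarily the filter PABST uses.

```python
from collections import Counter


def frequent_enough(counts, min_count=5):
    """Keep only the terms seen at least `min_count` times."""
    return {term: n for term, n in counts.items() if n >= min_count}


# Illustrative counts: "acme" stands in for a brand name mentioned a few times.
counts = Counter({"hurry": 12, "please": 9, "acme": 3})
kept = frequent_enough(counts)  # "acme" falls below the threshold and is dropped
```

The trade-off is that a small corpus may have very few terms left after filtering, which is another way of saying that you need a lot of data.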
The Slack API is quite nice to work with, although there are many constraints that stop you from building exactly the UI you want for your Slack App.