Prof. Dr. Helena Mihaljević, Professor for Data Science and Analytics at HTW Berlin © HTW Berlin / Alexander Rentsch

01 July 2024

"A central concern is to involve civil society stakeholders in the co-development of the technology."

Whether it is the coronavirus pandemic and the subsequent vaccination debate, the Russian war of aggression in Ukraine and the West's political response, or the alleged influencing of elections worldwide: for years, civil society actors have faced the challenge of capturing and analyzing the growing amount of online content from anti-democratic movements. Conspiracy narratives pose a particular challenge in this regard.

Under the name "Analysis and research software for AI-supported analysis of anti-democratic movements online (ARAI)", a team from HTW Berlin is working with democ e. V. to develop an AI-supported open source technology that can search and analyze not only texts, but also images and sound on Telegram. In an interview with #ai_berlin, project manager Prof. Dr. Helena Mihaljević talks in detail about the technological approach behind the BMBF-funded research project, the topic of open source, ethical principles of AI systems and gender bias.

Prof. Dr. Mihaljević, what made you decide to specialize in data science and analytics, and what inspires and motivates you most about your work?

During my work at FIZ Karlsruhe, I was involved in various applied research projects - for example, we developed methods for disambiguating authors of scientific publications or search engines for mathematical formulas. These projects sparked my interest in data science, a versatile and interdisciplinary field in which I can put my skills as a mathematician to good use.

Data analysis and data-based models are used in many different areas. This opens up the opportunity to deal with different problems and applications. I am particularly interested in projects in which I see a high level of social relevance. The complexity of working with data is another motivation. Anyone who has worked in this context knows how many dimensions a data science project has and how important it is to keep critically scrutinizing the individual steps. Today's models and technologies in the field of machine learning and NLP are powerful tools that can be used in a variety of ways. But as soon as you leave the controlled laboratory setting in which many procedures are developed and evaluated, you are confronted with a multitude of challenges that require interdisciplinary and creative solutions.

I enjoy working with experts from different disciplines and learning new things every day, especially from the application domains. This diversity and constant development make my work particularly exciting and fulfilling.

First of all, congratulations on the selection of ARAI as one of the BMBF-funded DATI pilot projects. How did this come about and can you explain how AI can contribute to supporting civil society in general?

ARAI is about developing an open source tool for AI-supported analysis and monitoring of conspiracy ideology content on the messenger service Telegram. Such monitoring is carried out by numerous civil society organizations and journalists in order to keep track of anti-democratic, often far-right movements and to be able to react to corresponding developments. We see the impact that conspiracy theories can have, especially during crises, be it the COVID-19 pandemic or wars, but also during events of major political significance such as protests or elections. The tool we are developing will be continuously evaluated in collaboration with civil society actors, and we want to actively involve them in the development process.

In one of our previous projects, we researched digital hate speech in the context of the COVID-19 pandemic. In another project with Matthias Becker from the Center for Research on Antisemitism (ZfA), we worked on the automated detection of antisemitic statements in mainstream online media. These projects included the development of annotation schemes for conspiracy narratives and antisemitic speech in social media texts, the creation and annotation of data sets, the training of detection models and the evaluation of existing technologies.

We have already cooperated closely with various civil society actors. It became clear that there is a great need for better technological solutions, especially for monitoring relevant channels and communities on platforms such as Telegram. The volume of messages can hardly be handled manually. In addition, simple methods such as keyword-based searches are often inadequate, as words can have different meanings depending on the context, users encode their statements and new trends bring with them a new vocabulary. Many civil society actors find it difficult to keep up with the rapid pace of technological development, which impairs the efficiency of their work.
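The limits of keyword-based search mentioned here can be illustrated with a small Python sketch. The keyword list and the example messages are invented for illustration and are not taken from the project; the matcher is a deliberately naive whole-word lookup.

```python
import re

# Hypothetical keyword list, invented for this illustration.
KEYWORDS = {"vaccine", "chip"}


def keyword_match(message: str) -> bool:
    """Return True if any keyword occurs as a whole word in the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return bool(tokens & KEYWORDS)


# Invented example messages.
messages = [
    "They put a chip in every vaccine to track us",  # conspiracy claim, plain spelling
    "They put a ch1p in every v4ccine to track us",  # same claim, encoded spelling
    "Booked my vaccine appointment for Tuesday",     # harmless everyday message
]

results = [keyword_match(m) for m in messages]
for message, hit in zip(messages, results):
    print(hit, "|", message)
```

In this toy setup the encoded spelling slips through while the harmless message is flagged, which is the kind of failure that motivates context-aware models over fixed word lists.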

This is why we have joined forces with democ e. V., an association dedicated to documenting and analyzing anti-democratic movements. democ is closely involved in civil society and has technical expertise in the areas of online monitoring and the development of open source software.

How does the ARAI project deal in detail with the challenge of identifying and combating the growing amount of anti-democratic online content and conspiracy theories?

In a previous project, we developed a model for detecting conspiracy theory content on Telegram. To create the training data, we deliberately chose an approach that avoids selecting data using specific keywords. The model thus learned to recognize more than just a few pre-specified conspiracy narratives. Our results show that the model's performance degrades only slightly when applied to channels outside the training dataset, as well as to messages from later time periods and on other topics. We will integrate this model into the ARAI project.
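One common way to check this kind of robustness is to hold out entire channels and a later time period when splitting the annotated data. The following Python sketch shows the idea under stated assumptions: the `Message` type, the channel names, the dates and the labels are all hypothetical, and this is not the evaluation code used in the project.

```python
from datetime import date
from typing import NamedTuple


class Message(NamedTuple):
    channel: str
    posted: date
    text: str
    label: int  # 1 = conspiracy-related, 0 = other (invented labels)


def split_for_shift(messages, held_out_channels, cutoff):
    """Split into a training set and two test sets that probe different shifts."""
    train, new_channels, later_period = [], [], []
    for m in messages:
        if m.channel in held_out_channels:
            new_channels.append(m)   # channel never seen during training
        elif m.posted >= cutoff:
            later_period.append(m)   # known channel, but later in time
        else:
            train.append(m)
    return train, new_channels, later_period


def accuracy(predict, messages):
    """Share of messages whose predicted label equals the annotated label."""
    if not messages:
        return 0.0
    return sum(predict(m.text) == m.label for m in messages) / len(messages)


# Invented toy data.
data = [
    Message("channel_a", date(2021, 1, 10), "text 1", 1),
    Message("channel_a", date(2021, 6, 5), "text 2", 0),
    Message("channel_b", date(2021, 2, 1), "text 3", 1),
    Message("channel_b", date(2022, 3, 1), "text 4", 1),
    Message("channel_c", date(2021, 4, 1), "text 5", 0),
]

train, new_channels, later_period = split_for_shift(
    data, held_out_channels={"channel_c"}, cutoff=date(2022, 1, 1)
)
```

A model would be fitted on `train` only; comparing `accuracy` on `new_channels` and `later_period` with the in-distribution score then indicates how much performance is lost under each kind of shift.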

In addition, we will include components for multimodal analysis, as content in social media is increasingly communicated multimodally. Depending on the platform and community, images, videos or audio recordings are increasingly being used. In order to do justice to this, we are planning to use methods that enable users to search multimodally, cluster content and identify trends. This will allow us to not only capture text-based content, but also analyze complex multimodal data, which is crucial in order to take into account the diversity of communication forms on modern social media platforms.
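Multimodal search is often built on a shared embedding space in which texts, images and audio are represented as vectors that can be compared directly. The interview does not say which method ARAI will use, so the following Python sketch is only an illustration of the general idea: the file names are hypothetical and the hand-written vectors stand in for the output of an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical index: hand-written vectors standing in for embeddings of
# an image, a voice message and a text post in one shared space.
index = {
    "image_001.jpg": [0.9, 0.1, 0.0],
    "voice_002.ogg": [0.1, 0.9, 0.1],
    "post_003.txt": [0.8, 0.2, 0.1],
}


def search(query_vec, index, top_k=2):
    """Return the top_k item names ranked by similarity to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:top_k]


# The query vector would come from embedding a search phrase or an example image.
hits = search([1.0, 0.0, 0.0], index)
print(hits)
```

Because all modalities share one vector space in this setup, the same similarity measure could also feed clustering or trend detection.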

The project follows an open source approach. What advantages do you see in this openness for the development and dissemination of AI tools in civil society? Is this the right way to develop future tools?

By making the software available as open source, we want to promote collaborative further development. This enables different actors to adapt the technology and use it independently. Research projects usually have a limited duration. I therefore see an open source model as a prerequisite for the long-term maintenance and usability of the developed software. Another key advantage is transparency, which is particularly important when it comes to data-driven applications.

A central concern of our project is to actively involve civil society actors in the co-development of the technology. AI models are not static - they need to be continuously developed in order to remain relevant. Users of the technology are best placed to judge when a model is losing its performance. By creating technologically suitable feedback options, we can incorporate users' daily experiences into the training and deployment processes.

In our opinion, this active involvement of users can only be realized through technologies that promote openness and collaboration. An open source approach supports precisely these values and is therefore the right way to develop future tools with social impact. It enables flexible adaptation to ever-changing requirements and creates a platform for collaborative innovation and continuous improvement.

In view of the complex ethical issues raised by AI and new regulations such as the EU's AI Act, how do you ensure that the technologies you develop comply with ethical principles and promote trust in society?

Our software aims to contribute to strengthening democracy, especially in the online space. We focus on Telegram because it plays a crucial role in the spread of conspiracy ideology content; the platform is particularly popular with many anti-democratic actors, such as far-right and Islamist groups, due to its virtual lack of moderation.

As already mentioned, our decision to make the software available as open source contributes significantly to transparency. This gives the public the opportunity to view the source code, check how our models work and suggest improvements. By actively involving civil society actors in the development process, we also ensure that different perspectives and needs are taken into account, especially with regard to ethical issues. These stakeholders help us to identify and address potential negative impacts at an early stage. We rely on regular feedback from users to continuously improve the models and ensure their performance and ethical appropriateness.

However, we believe that the need to create transparency must always be balanced against the risk of misuse of technologies and data sets. With this in mind, we pay attention to potential risks when providing technologies and strive to minimize these as far as possible. In addition to the ethical issues, we will deal intensively and continuously with the relevant legal regulations in the project and take these into account when developing the components.

In your research, you also deal with gender bias and diversity in AI systems. How do these problems manifest themselves in practice and what approaches are you taking to address them?

In two earlier projects, we looked at technologies for personnel selection that increasingly use machine learning algorithms. These technologies cover a wide range of tasks, from the automated creation and improvement of job advertisements (augmented writing), CV parsing and candidate ranking to the psychological profiling of applicants based on videos, audio recordings or texts.

The HR sector is of central importance, as it determines whether people can find employment, secure their material existence and realize their professional potential. It should therefore be our social aspiration that these technologies reduce and not reinforce existing potential for discrimination in personnel selection. The EU's AI Act also classifies personnel selection technologies as high-risk applications. Gender-specific disadvantages continue to play a significant role here.

Our first project investigated, among other things, how technologies in the field of augmented writing address gender bias and how they implement this technically. We found that different technologies that pursue the same goal - namely to evaluate and possibly improve job advertisements with regard to their exclusion potential for women - produce very different results. This raises the fundamental question of which technology delivers the best results in practice and how practitioners can make an informed decision about its use.

Ideally, these technologies should be tested in practice and their effects measured. However, this poses a major challenge, as such experiments require personal data and there are no clear recommendations on how to implement such scientific studies. The fundamental question of how AI-based (or otherwise complex algorithmic) technologies for personnel selection should be audited is far from settled. This concerns evaluation concepts and metrics as well as legal frameworks and the question of how relevant diversity characteristics - in particular migration background or migration experience - should be collected. In a second project, we have therefore developed a data trust-based concept for auditing such technologies, which ensures the participation of various stakeholders.

How do you see the AI ecosystem in Berlin and the overlap between research and industry? What makes the location so special?

Berlin has a diverse AI ecosystem. The city offers an impressive density of universities and colleges that conduct intensive research into AI, as well as numerous companies and start-ups that develop and apply AI technologies.

There are a large number of institutions in Berlin that bring together different spheres and promote the exchange between research and practice. I would particularly like to highlight the Einstein Center Digital Future, with which I am associated. It creates opportunities for interdisciplinary collaboration between scientists and supports the transfer of research results into practice. This connection between science and industry is crucial in order to develop practical and applicable solutions.

Other institutions also play a central role in this context. The Technologiestiftung Berlin and its CityLab in particular are valuable partners with whom I have already collaborated on several projects and continue to do so. These institutions promote the integration of technological innovations in urban and social contexts.

Berlin is also home to the Weizenbaum Institute for the Networked Society and the Alexander von Humboldt Institute for Internet and Society (HIIG), both of which make important contributions to research into the social and ethical implications of digital technologies. The increasing networking efforts in the field of digital humanities in Berlin and Brandenburg are also promising and promote regional and interdisciplinary cooperation.

Finally, what advice would you give to young data scientists who are interested in integrating social and ethical aspects into their work?

There is a growing research community that is working intensively on the ethical aspects of machine learning and AI. Many researchers in this field work at institutions in Berlin. I recommend keeping an eye on the numerous events organized by these institutions. These offer excellent opportunities for further education and for getting in touch with the community. Institutions with a civil society mission are increasingly employing data scientists. This opens up the opportunity for data scientists to apply their skills directly in projects with social relevance. I find interdisciplinary collaboration particularly important in order to incorporate different perspectives. Combining technological knowledge with social science, legal and ethical approaches is important in order to develop more responsible solutions.

Thank you very much for the interview.