Dr. Vera Schmitt, Head of the XplaiNLP Group at TU Berlin

30 October 2024

“AI should not only help to disseminate information more quickly, but also to check its validity immediately.”

At a time when artificial intelligence (AI) is becoming increasingly integrated into everyday life, a key question arises: What role does AI play in the spread of disinformation, and how can we counter this threat? In particular, narrative disinformation amplified by AI systems has the potential to deepen social divisions and undermine trust in democratic processes. Dr. Vera Schmitt and her team at TU Berlin are researching innovative approaches to identify and combat AI-based disinformation. In this interview, we want to learn more about the current challenges, promising strategies, and how research, politics, and society can work together to preserve the integrity of our information landscape.

Dr. Schmitt, you head the XplaiNLP group at the Quality and Usability Lab at TU Berlin. How did you personally discover your interest in research into disinformation narratives and AI? What fascinates you about this topic and what are you currently working on particularly intensively?

The subject of truth, reality and disinformation has occupied me for a long time. Even in my early youth, I was particularly interested in the ancient philosophers, Kant, Nietzsche and Wittgenstein, who dealt intensively with the concept of truth. How do we deal with our sensory perception of the world and the world of ideas? How can we be sure that our perceptions correspond to reality? And how can we determine whether the statements we make on the basis of our observations actually correspond to reality? These questions have stayed with me, and during my Ph.D. I founded the XplaiNLP research group to get to the bottom of some of them.

Within the research group we have two main focuses:

The first is how we can use natural language processing methods to identify misinformation and disinformation. Stylistic features such as hate speech and highly emotional content are often indicators of such misinformation. At the content level, we also compare statements, social media posts, and articles with known knowledge and established facts to determine whether they are misinformation or disinformation. We look not only at individual statements, but also at disinformation narratives that are often repeated across cultures.
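To make the stylistic-feature idea concrete, here is a minimal, hypothetical sketch in Python: off-the-shelf classifiers score a post for hate speech and emotional charge, and those scores serve only as weak indicators to be combined with other evidence. The specific Hugging Face model names are illustrative assumptions, not the tools the group actually uses.

```python
# Hypothetical sketch: stylistic signals (hate speech, emotional charge) as
# weak indicators of possible disinformation. Model names are assumptions;
# any comparable publicly available classifiers could be used instead.
from transformers import pipeline

hate_clf = pipeline("text-classification",
                    model="cardiffnlp/twitter-roberta-base-hate")              # assumed model
emotion_clf = pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base")  # assumed model

def stylistic_signals(text: str) -> dict:
    """Return raw classifier scores; these are indicators only, never a verdict on truth."""
    hate = hate_clf(text)[0]
    emotion = emotion_clf(text)[0]
    return {
        "hate_label": hate["label"], "hate_score": round(hate["score"], 3),
        "emotion_label": emotion["label"], "emotion_score": round(emotion["score"], 3),
    }

print(stylistic_signals("They are LYING to you again, this is an outrage!"))
```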

Another focus of our work is the explainability of AI, that is, making the language models we use to detect disinformation transparent. We focus on meaningful explanations in natural language to convey the model results in an understandable way to users of our tools with different levels of knowledge. We also study model hallucinations and work to minimize them so that the results of the models accurately reflect reality.

The capabilities of AI, especially in the area of text generation, are used daily to spread disinformation. The next US election will take place in November and it couldn't be more polarizing. What observations have you made from afar? How is AI being used specifically in the run-up to the election and are there measures in place to expose this misinformation?

In recent years, we have seen a lot of targeted disinformation in the run-up to elections, as well as around conflicts and wars, such as in Ukraine and Israel. This trend is particularly evident in the run-up to the next US elections in November, which are already considered to be extremely polarized. AI-powered models, such as generative language models, are being used on a large scale to generate content, often personalized disinformation campaigns targeted at specific demographics. This ranges from fake social media posts to automated news articles that inflame emotions or deepen existing divisions. Most problematically, the generated content is often difficult to distinguish from authentic information because it appears plausible and professional on the surface.

During the ABC debate, Donald Trump claimed that immigrants in Ohio were eating pets. This claim can be very clearly and quickly debunked as false. But it is often much harder than that. The question of whether something is true or false often cannot be answered with a simple "yes" or "no"; we also frequently have to deal with exaggerations, embellishments, and omissions of details that make an objective assessment of a situation very difficult.

The proliferation of realistic deepfakes created by models such as DALL-E, Midjourney, and Sora also leaves a lasting impression. Disinformation has taken on a new dimension, leading to considerable uncertainty as we increasingly distrust images, videos, and even audio recordings. The image of Taylor Swift as a supposed supporter of the Trump campaign is one of those examples that caused a lot of uncertainty.

Disclosure of misinformation is already underway and is actively promoted by initiatives such as the signatories of the International Fact-Checking Network and fact-checkers such as DW, dpa and BR. However, these fact-checkers often reach readers too late or not at all, especially as many young people get their information mainly through social media. The Digital Services Act (DSA), which regulates social media platforms such as TikTok, Facebook and X, could play a key role here. Among other things, the DSA regulates content moderation on social media, which means that content distributed via social networks must be consistently fact-checked and screened for deepfakes. For content moderation, AI tools such as those developed by our research group can be used specifically to detect misinformation and disinformation, significantly supporting the fact-checking process and making it more effective. Effective collaboration between humans and AI is particularly important for detecting disinformation and deepfakes: AI can provide targeted support to journalists and content moderators so that they can detect disinformation faster, more comprehensively, and in a more targeted manner.

You are working on the news-polygraph project. Can you tell us more about what this project is about? Which partners are involved, which technologies are used and how does the project help in the fight against disinformation?

The news-polygraph project is developing a platform to help journalists identify disinformation in images, video, audio and text. The research alliance consists of various partners from the media sector (DW and rbb), industry partners (Ubermetrics Technologies GmbH, delphai GmbH, Crowdee GmbH and Transfermedia GmbH) and research institutions (Fraunhofer IDMT, the German Research Center for AI (DFKI) and the XplaiNLP group at TU Berlin). Together, we are working on a platform that helps journalists in their daily work to analyze information from different modalities (such as image, audio and text). The AI tools developed in this project must not only be robust, but also transparent, so that the model results are clearly understandable for journalists.

Recognizing synthetic text, i.e. sentences generated with ChatGPT or other large language models, is nearly impossible when only small pieces of text or single sentences are available. Therefore, when analyzing text-based disinformation, we mainly focus on content and stylistic features. For content analysis, the XplaiNLP group builds a knowledge base of known facts and misinformation. Using this knowledge base, we can quickly check whether the content in question has already been identified as misinformation, or what the established facts are for a particular statement. We also analyze stylistic features, such as highly emotional content and political bias, to provide additional clues about the intent behind the information being shared.
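As a toy illustration of the content-level comparison described above (not the group's actual system), the sketch below embeds a statement and retrieves the most similar entries from a small, invented knowledge base of already fact-checked claims. The embedding model name and the example entries are assumptions.

```python
# Minimal sketch of content-level checking: retrieve the closest entries from a
# small knowledge base of already fact-checked claims. The model name and the
# toy knowledge base are illustrative assumptions, not the project's data.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model would do

knowledge_base = [
    {"claim": "Immigrants in Ohio are eating pets.", "verdict": "false"},
    {"claim": "A celebrity publicly endorsed the candidate in a viral image.", "verdict": "false"},
]
kb_embeddings = model.encode([e["claim"] for e in knowledge_base], convert_to_tensor=True)

def match_claim(statement: str, top_k: int = 3):
    """Return the most similar known claims; a human still makes the final call."""
    query = model.encode(statement, convert_to_tensor=True)
    scores = util.cos_sim(query, kb_embeddings)[0].tolist()
    ranked = sorted(zip(scores, knowledge_base), key=lambda x: x[0], reverse=True)[:top_k]
    return [(round(score, 3), entry["claim"], entry["verdict"]) for score, entry in ranked]

print(match_claim("People in Ohio are reportedly having their pets eaten by migrants."))
```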

In audio deepfake detection, machine learning models are used to analyze extracted audio features and look for patterns and inconsistencies in timing, frequency, prosody and articulation that differ from natural human speech. For example, synthetic speech often exhibits irregularities in timing and spectral content (how frequencies are distributed over time). Models can use temporal analysis to check for unnatural pauses or rhythms, and spectral analysis to identify abnormal frequency patterns not normally found in human speech.
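A hedged sketch of the kind of feature extraction this paragraph describes, using librosa: spectral statistics plus a rough pause/timing profile that a downstream classifier, trained on labelled real versus synthetic speech, could consume. The file path and the exact feature choices are illustrative assumptions.

```python
# Illustrative feature extraction for audio deepfake detection (not the
# project's actual model): spectral statistics and pause/timing cues that a
# classifier trained on labelled real vs. synthetic speech could use.
import numpy as np
import librosa

def audio_features(path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr)

    # Spectral content: how energy is distributed over frequencies and time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    flatness = librosa.feature.spectral_flatness(y=y)

    # Temporal structure: non-silent segments as a rough proxy for pause rhythm.
    intervals = librosa.effects.split(y, top_db=30)
    durations = (intervals[:, 1] - intervals[:, 0]) / sr

    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [centroid.mean(), centroid.std(), flatness.mean(), flatness.std()],
        [durations.mean(), durations.std(), len(durations)],
    ])

# features = audio_features("sample.wav")  # then feed into e.g. an SVM or a small neural net
```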

Deepfake detection in image and video uses various machine learning techniques to determine whether specific pixel areas or the entire image or video have been synthetically generated or manipulated. There are various models and approaches, such as vision transformers, autoencoders, or spectral analysis, that analyze specific features and anomalies in the image data. These approaches help to detect synthetically generated content or manipulation, for example by identifying irregularities in textures, lighting, or motion created by generative models such as GANs.
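As a simplified stand-in for the spectral-analysis approach mentioned above, the sketch below measures how much energy an image carries in the high-frequency part of its 2D Fourier spectrum, where generative models often leave characteristic artifacts. A real detector would learn a decision threshold or a full classifier from labelled data; the file path and cutoff are assumptions.

```python
# Illustrative spectral check for synthetic-image detection: compute the share
# of energy in the high-frequency band of the 2D Fourier spectrum.
import numpy as np
from PIL import Image

def highfreq_energy_ratio(path: str) -> float:
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    log_spec = np.log1p(spectrum)

    # Radial mask: compare energy far from the spectrum centre (high frequencies)
    # with total energy; unusual ratios can hint at generator artifacts.
    h, w = log_spec.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    high = log_spec[radius > 0.35 * min(h, w)].sum()
    return float(high / log_spec.sum())

# ratio = highfreq_energy_ratio("suspect.png")  # threshold/classifier learned from data
```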

Transparency and explainability play a central role in all of these tools, so that journalists can actually understand and use them. That's why we in the XplaiNLP group are increasingly working on effective explanations to enable the reliable use of AI tools.

In addition to news-polygraph, you are also working on the VeraXtract project. What are the aims of this project and how does it differ from news-polygraph? What are the most important insights you have gained so far?

The VeraXtract project grew out of questions that arose primarily from research in the news-polygraph project. The focus is on gaining a better understanding of disinformation narratives. The goal is to create a comprehensive overview of misinformation and disinformation in order to make it easier for citizens to access this information.

In our research so far, we have found that many hoaxes and disinformation are often repeated and spread in slightly different forms in different countries. By aggregating these falsehoods and disinformation into narratives, we provide a quick overview of the disinformation currently circulating, both regionally and internationally. We not only work with textual data, but also develop a combined analysis that integrates audio, image, and video data.
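A minimal sketch of this aggregation idea, under the assumption that claims are grouped purely by the semantic similarity of their text: embed each claim and greedily merge near-duplicates into candidate narratives. The model name, threshold, and example claims are illustrative, not project data.

```python
# Illustrative grouping of near-duplicate claims into candidate "narratives".
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

claims = [
    "Immigrants in Ohio are eating pets.",
    "Migrants in an Ohio town are reportedly eating people's cats and dogs.",
    "A celebrity endorsed the candidate in a viral image.",
]
emb = model.encode(claims, convert_to_tensor=True, normalize_embeddings=True)

threshold = 0.7  # similarity above which two claims count as the same narrative (assumed)
narratives: list[list[int]] = []
for i in range(len(claims)):
    for group in narratives:
        if util.cos_sim(emb[i], emb[group[0]]).item() >= threshold:
            group.append(i)
            break
    else:
        narratives.append([i])

for group in narratives:
    print([claims[i] for i in group])
```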

In the long term, the knowledge and experience gained from VeraXtract should help to strengthen society's resilience to disinformation. The solutions developed also have potential for use in content moderation in social media and could serve as the basis for new standards in the fight against disinformation in the future.

At first glance, News Polygraph and VeraXtract appear to be pursuing similar goals. How do these two projects interact? Are there concrete synergies that arise from their collaboration?

The two projects differ in their focus and in how the AI tools are used. With news-polygraph, we want to support journalists with AI tools, which have completely different requirements in terms of research and depth of analysis. In VeraXtract, we are working on aggregating disinformation into narratives that provide an overview and go into less depth. This overview is necessary to inform people better about disinformation and to be able to assign new statements to existing disinformation narratives more quickly. The results of the VeraXtract project will be integrated into a publicly accessible platform.

According to a recent study by Interface, only 22% of all AI specialists worldwide are female. VeraXtract is supported by the BMBF funding priority “AI junior research groups led by women”. What goals are you pursuing with your project in terms of gender imbalance in the field of AI?

In general, I first look at how well applicants are suited to a particular position and its tasks. If candidates are equally well suited, I try to promote female candidates. We have just hired two new female senior researchers, who are also taking on more responsibility and team leadership for the NLP and XAI teams within the group. However, the suitability of the applicants clearly comes first.

Berlin is considered one of the leading centers for AI research in Europe. How do you experience the work and collaboration between research institutions in Berlin, particularly with regard to explainable AI (XAI)?

There is some cooperation between the various research institutions such as TU Berlin, HU Berlin, FU Berlin, as well as major research institutes such as the German Research Center for AI and various Fraunhofer Institutes. These institutions often work together on an interdisciplinary basis, which facilitates the exchange of knowledge and resources.

There is networking between various players in the field of explainable AI. Here, we in the XplaiNLP research group also work together with Fraunhofer HHI and BIFOLD as part of the FakeXplain project and benefit from collaboration with partners from other institutions.

Sometimes, however, the focus gets lost, and there is no body or organization that keeps an eye on the different players working on a given issue and brings them together again and again. Within academia, we regularly network at conferences where we present our research, but we rarely come into contact with organizations and companies unless we actively pursue such collaborations. Of course, as scientists, we try to regularly participate in and initiate various kinds of events. A good example is the EU DisinfoLab, which brings together a wide range of stakeholders at least once a year to identify and combat disinformation (EU DisinfoDay).

Last but not least: How do you envision information transfer and validity checks ideally taking place in 2030? What role will AI play in this?

In 2030, I envision information dissemination and validation as a seamless, automated, and highly personalized process in which AI plays a central role. Ideally, AI will not only help disseminate information faster, but also check its validity instantly. Platforms where news and content are shared could be equipped with intelligent AI-based systems that perform real-time fact-checking and automatically detect, flag, or contextualize disinformation. Such an approach would allow correct information to be highlighted as it is consumed and reliable sources to be prioritized automatically.

Explainable AI will play a key role, as it is not only about detecting disinformation, but also about helping users understand why certain content has been flagged as misleading or false. This builds trust in the technology and enables people to make informed decisions.

In addition, AI-driven assistance systems could respond individually to a user's level of knowledge and needs, presenting information in an adapted form while checking facts and narratives for credibility. These systems could also work across modalities, checking not only text but also images, videos, and audio files for manipulation.

Overall, AI will play an integrative role in the information landscape of 2030: it will not only reduce the spread of disinformation, but also ensure reliable and transparent information by acting as a constant fact-checker, identifying disinformation early on and educating the public about its dangers.

Thank you very much for the interview.