The intersection of data management and machine learning is poised to revolutionize the way organizations handle information, and Prof. Dr. Sebastian Schelter is at the forefront of this transformation. With his recent appointment at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin, Schelter is tackling some of the most pressing challenges in AI today. From improving data quality in machine learning models to fostering responsible data practices, his work is set to have a lasting impact on both academia and industry. In his interview with #ai_berlin, Prof. Schelter discusses the motivations behind his research, the challenges of integrating data management with AI, the future of trustworthy AI, and how Berlin’s dynamic tech ecosystem plays a crucial role in driving innovation forward.
Professor Schelter, congratulations on your new role at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin. Can you share with us the journey that led you to this exciting new chapter in your career? What personal experiences or inspirations have shaped your path?
My research journey started many years ago, when I rebuilt the recommender system of Zalando as a working student. This sparked my curiosity about problems at the intersection of data management and machine learning and led me to join the database group at TU Berlin. There, I completed a PhD on massively parallel data processing systems and had the opportunity to do two summer internships in California, with IBM Research and Twitter.
After graduating, I wanted to gain real-world experience in industry and joined the newly formed ML lab of Amazon in Berlin, where I worked for several years on problems related to large-scale forecasting and data quality. After a while, I became a little frustrated with the short-term focus in industry and decided to go back to academia.
I was lucky to be able to spend several years abroad, as a fellow at New York University and as an assistant professor at the University of Amsterdam. In 2024, the unique character of BIFOLD as an institute focusing on problems at the intersection of data management and machine learning (ML) brought me back home to Berlin, and I am very grateful for this opportunity.
Your work at the DEEM Lab (Data Engineering for ML Lab) is at the intersection of data management and machine learning. Could you share some of the unique challenges your research group faces and the goals you are most passionate about achieving?
Our research addresses data-related problems in ML applications that have negative economic, societal or scientific consequences. Despite all the hype around ML and AI, it is still very difficult and expensive to develop data-driven applications. Moreover, ML models often crash or give wrong answers when exposed to real-world data. Our goal is to lower the technical barrier to working with data science technologies and to foster the responsible management of data.
A unique challenge that our lab faces is that we need to hire PhD students with expertise in both data management and ML, a rare combination. Fortunately, so far we have been lucky to attract talented people from all over the world :)
In light of the AI Act, "Trustworthy AI made in Europe" has become the phrase of the hour. Do you see your work at BIFOLD and DEEM contributing to this agenda?
Absolutely! We are already collaborating with law professors to examine the technical implications of the new regulations for high-risk applications. We are also preparing a large survey study to better understand the current challenges that practitioners face in this area.
Our aspiration is to design open-source data systems that make it easy for companies and organizations to build data applications that adhere to the digital rights of citizens. This is especially important for start-ups and SMEs, who often do not have the same resources as large corporations to compete in regulated spaces.
The "right to be forgotten" is a concept that resonates with many on a deeply personal level. How do you envision bringing this right to life within the realm of current data management practices, especially given the technical hurdles? What motivates you to tackle these challenges?
The motivation for this research comes from real-world risks. Imagine a person struggling with alcohol addiction who decides to stop buying alcoholic products. Unfortunately, this person will still be exposed to ads and recommendations for alcoholic products online, since the underlying AI models will have learned their preference for alcohol. Empowering this person to adjust their recommendations immediately via low-latency unlearning might reduce their risk of relapse.
We tackle the resulting technical challenges by “reinventing” proven database technologies for the ML world, such as tracking the provenance of data through computations and efficiently maintaining the results of queries under data updates.
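To make the idea of maintaining results under data updates a little more concrete, here is a minimal Python sketch. It is not the DEEM Lab's actual code: it assumes a simple item-to-item recommender based on co-occurrence counts, and the class and method names are hypothetical. The per-user item sets play the role of provenance, recording which parts of the model state each user contributed, so forgetting a user only subtracts those contributions instead of retraining the model from scratch.

```python
from collections import defaultdict


class IncrementalCooccurrenceRecommender:
    """Hypothetical item-to-item recommender whose model state (item
    co-occurrence counts) is maintained incrementally under inserts and
    deletions, so a user's data can be forgotten without full retraining."""

    def __init__(self):
        # Provenance: the set of items each user contributed to the model.
        self.user_items = defaultdict(set)
        # Model state: cooc[a][b] = number of users who interacted with both a and b.
        self.cooc = defaultdict(lambda: defaultdict(int))

    def add_interaction(self, user, item):
        items = self.user_items[user]
        if item in items:
            return  # duplicate interactions do not change the counts
        # Only the pairs involving the new item change.
        for other in items:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        items.add(item)

    def forget_user(self, user):
        # Undo exactly the contributions recorded for this user.
        items = self.user_items.pop(user, set())
        for a in items:
            for b in items:
                if a == b:
                    continue
                self.cooc[a][b] -= 1
                if self.cooc[a][b] == 0:
                    del self.cooc[a][b]
            if a in self.cooc and not self.cooc[a]:
                del self.cooc[a]  # drop items with no remaining neighbours

    def recommend(self, item, k=3):
        # .get avoids creating an entry for unknown items in the defaultdict.
        neighbours = self.cooc.get(item, {})
        # Highest co-occurrence count first; ties broken alphabetically.
        return sorted(neighbours, key=lambda o: (-neighbours[o], o))[:k]
```

For example, after `alice` interacts with `wine` and `beer` and `bob` with `beer` and `chips`, `recommend("beer")` returns `["chips", "wine"]`; after `forget_user("alice")` it returns only `["chips"]`, and the model state is the same as if it had been trained on `bob`'s data alone. The cost of forgetting grows with the number of items the user touched, not with the size of the whole dataset.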
Following up on that, are there specific tools or frameworks you are developing that could assist organizations in implementing this right, and how do you hope these tools will impact society?
We have developed several prototypes of open-source recommender systems for e-commerce, which can “forget” the personal data of users in milliseconds. We hope that removing personal data instantly, with a single click, becomes a standard feature of every data-driven application.
Now that we've discussed your research foci, what attracted you to BIFOLD and Berlin? How do you see the regional AI ecosystem?
BIFOLD is one of the very few places in the world with a dedicated research focus on problems at the intersection of data management and machine learning, and I am very grateful to be able to work with all the talented professors and students there.
I have always been attracted to Berlin due to its openness, its international character and its historical role in the heart of Europe. Furthermore, Berlin has a vibrant data and AI ecosystem, which is one reason why leading global companies such as Databricks, Amazon, Snowflake, SAP and Confluent have offices here.
Looking ahead, what are your aspirations for your work at BIFOLD and the DEEM Lab? What impact do you hope to have, both in the academic world and on society at large?
I am working hard to make the DEEM Lab a place for great research that brings creative people from all over the world together in Berlin. I hope to have an impact on the world by educating talented people, and I hope that I can help my students build great things and maybe even become professors themselves.
Thank you very much for all your amazing insights!