In today’s world, our eyes are confronted by an almost incomprehensible amount of data every day. And each year, figures are growing at an ever-increasing rate. So, how will it be possible to process these massive quantities in the future? Mobius Labs specializes in developing advanced multimodal AI that enables machines to see, understand, and interpret visual, audio and language data just like humans. We talked to CEO Dr. Appu Shaji to delve deeper into the world of multimodality, exploring its potential to revolutionize industries and enhance user experiences, and what it takes to turn Isaac Asimov’s fiction into reality.
Dr. Shaji, the market for AI-supported technology applications in the field of image, text and audio has gained momentum, and not just since the triumph of large language models. Can you briefly describe the vision and main goals of Mobius Labs and how your products differ from others on the market?
Human understanding and perception are multimodal, deriving from the interplay of vision, audio, and language. For example, when I read Isaac Asimov's books, there were robots performing all the tasks that humans did. The idea that we could build software capable of such comprehension always fascinated me, and that reality now seems within reach rather than mere science fiction. However, early AI seemed to be restricted to companies with large capital expenditure budgets. We are on a path to democratize it by building AI that is highly efficient to run and deploy, and also open source. Our goal is to become a major player in AI server/device-side infrastructure.
You often talk about 'Superhuman Vision' as a key concept of your technology. Can you explain what this means and how this technology meets current and future customer needs?
This is the old moniker we used. We started off as a computer vision company and realized that in many areas, computers outperform humans (for example, if you need to classify or detect billions of images consistently). However, we have expanded our ambitions to other modalities like audio and language.
One core aspect remains the same: we are producing, processing, and utilizing a massive deluge of data, making it humanly impossible to make it searchable, recommendable, or actionable. That is where the superhuman skills of machines come in.
Mobius Labs relies on open intelligence and open source initiatives. How do you integrate these philosophies into your product development and what benefits does this bring for your users?
There are two major reasons, one structural/emotional and the other strategic/commercial. What makes AI research and development unique is that it builds on fundamental work done by academia, where openness and reproducibility are central tenets. Mobius Labs, with several researchers on its founding team, has this philosophy embedded in its DNA. We really like the democratic and meritocratic nature of that world, where ideas are exchanged and proven in front of everyone. We are still quite passionate and committed to it.
But the more practical and strategic nature is how the AI landscape is set up. Just a year ago, closed-source AI systems like ChatGPT were miles ahead of their open-source counterparts. However, the gap is closing rapidly and is almost non-existent.
In such a world, open-source software offers significant end-customer value: transparency, no vendor lock-in, customizability, and full ownership.
Talent is also a major driver in core innovation in AI. Embracing OSS allowed us to collaborate freely. We are now collaborating with people around the globe (for example, engineers/scientists inside Meta, Hugging Face, Answer.AI, and many other projects). As a small startup, we cannot dream of competing with large proprietary solutions, but as an OSS community, we definitely can!
Additionally, the nature of the work we do is enormously important to the future of AI computation, especially our work on making AI computation faster, smaller, and cheaper (details at blog.mobiuslabs.com/). It is a genuine step towards democratizing AI for people who do not have much capital expenditure to spare.
We believe that the AI enterprise stack (B2B) will be dominated by open-source software. We see dynamics similar to Linux versus Microsoft. When Linux broke out in the late 90s, it still had its rough edges, but work done by the community took it to over 80% of the server-side market and, via Android, to roughly half of mobile phones. We are quite confident that the winners in the AI enterprise stack will be open source.
Data privacy is a major challenge in the AI industry, especially since the passing of the European Union's AI Act. How does Mobius Labs deal with data protection concerns, especially in connection with the processing of visual data?
Yes, privacy is a core value we abide by. The philosophy is “privacy by design.” We achieve this by simply shipping code to the user (and not the other way around where third parties take the data to their servers). We never see or have access to customer data unless our clients explicitly ask us to. AI companies, especially, have been trying to make users their product by training models on customer/user-collected data. We only pre-train our models on public datasets that are licensed permissively and build tools that can customize/fine-tune to a specific use case with a very limited size of datasets. This is also an area of active research within the company.
Your company offers solutions that enable users to implement AI without programming skills. How does this work in practice, and which industries benefit most from these solutions?
This is part of an older product line that did a lot of work on few-shot learning, coupled with a no-code AI training tool where users could train their own models. However, the generative AI stack has changed this a bit. We currently provide guidance on how to fine-tune models and, as a business service, fine-tune models for our clients.
You recently posted about your work on 1-bit machine learning models. Could you explain the benefits of this technology and how it improves the efficiency and accessibility of AI applications?
One of the major challenges of current AI solutions, if not the major challenge, is computation. In fact, the energy demands are so massive that it may become genuinely difficult to supply enough power. So, we need to figure out ways to make the models efficient. Our work on quantization is a step towards that.
To explain, most of these transformer models involve a lot of matrix multiplications with floating-point numbers. But we can strip out the decimal precision so that less computation is required (i.e., 3 * 100 is far easier to compute than 3.1415926535 * 100.4123414). The extreme version is when the numbers are all binary, which is 1-bit (0s and 1s). Typically, LLMs need 16 bits to store a number, so 1-bit is 16 times smaller. But interestingly, with binary weights, multiplications reduce to simple additions (which can be around 70 times faster).
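To make the idea concrete, here is a minimal toy sketch (not Mobius Labs' actual implementation) of 1-bit weight quantization: the weight matrix is reduced to its signs plus one per-row scale, and the matrix-vector product is then computed using only additions and subtractions of the activations, with no floating-point multiplications inside the inner loop.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # full-precision weights (e.g. fp16/fp32)
x = rng.standard_normal(8)        # input activations

# 1-bit quantization: keep only the sign of each weight, plus one scale
# per row so magnitudes stay roughly comparable to the original weights.
scale = np.abs(W).mean(axis=1)
W_bin = np.sign(W)                # entries are -1.0 or +1.0

# Binary matvec: a +1 weight adds the activation, a -1 weight subtracts it.
y_bin = scale * np.array(
    [x[row > 0].sum() - x[row < 0].sum() for row in W_bin]
)

y_full = W @ x                    # full-precision reference
print(y_full)
print(y_bin)                      # coarse approximation of y_full
```

The per-row scale is the simplest choice for illustration; real low-bit schemes (2-bit to 4-bit, grouped scales, learned zero points) recover much more accuracy, as discussed below.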
We still have more research to do to achieve the same accuracy with 1-bit models as with 16-bit models, but it is a significant step in the right direction. For example, our 2-bit to 4-bit models are coming very close to full-precision ones.
What role does Berlin and its strong AI ecosystem play for you? How is the exchange between start-ups, companies and research here?
Berlin is an interesting place due to its unique mixture of ideas and thoughts. The interactions I find most fascinating are not with tech builders but with the creative crowd (artists, photographers, DJs). Since AI and human behavior are intricately linked, these interactions are paramount and make our thinking very multi-dimensional. It is also quite a young city, with people coming from all over the world. Additionally, our office is at the Merantix AI Campus, which hosts many interesting AI companies and a vibrant community.
How do you see the future of computer vision in the next five years? What new areas of application or technological advances do you expect?
Embodied AI will become quite interesting (like the Isaac Asimov robots I talked about). AI will sense the world, encounter strange situations, and find solutions to them. We will have strong models that are constantly learning and capable of running on a multitude of hardware.
Thank you for the conversation.