‘People love to search – it’s our only way of interacting with an incredible quantity of data’. The numbers back up this observation by Dr Han Xiao, CEO and co-founder of the Berlin-based start-up Jina AI: Back in the year 2000, around 3.5 million Google searches were made per day. Today, that number has risen to five billion. That’s without counting all the billions of Tinder profiles, Amazon products and Spotify playlists that millions of people dig up on their phones and computers and with the help of virtual assistants on a daily basis. There’s no end in sight for this growth trend either.
Driven by artificial intelligence
‘When people talk about searching, they’re mostly referring to good old-fashioned text search, like finding a sentence that contains certain words’, Xiao explains. In the age of Instagram, TikTok, YouTube, Twitch and Clubhouse, however, a huge amount of important information is buried in unstructured data such as images, videos, live streams and audio content. ‘These days there’s a growing need to search for multi- and cross-media content’, says the adopted Berliner, who was previously responsible for China’s popular app WeChat as engineering lead at Tencent’s AI Lab before founding Jina AI in early 2020 with his partners Nan Wang and Bing He. ‘That’s why we need neural search, a completely new way of searching that’s driven by artificial intelligence’. Conventional search technology relies on rules that people teach a machine so that it can ‘understand’ the data it combs through and deliver the best results. Neural search instead trains a neural network on existing examples, which enables it to find relevant results, whether they’re Tumblr GIFs, sentences on Wikipedia or pictures of Pokémon.
‘The system learns the rules itself and gets better and better with time’, according to Xiao. With neural search, ‘developers don’t need to rack their brains thinking up rules’, but the challenges involved are significantly greater than with text search. ‘You do need advanced knowledge of software engineering and AI’, he says, adding that ‘to be honest, this kind of technology isn’t something that just anyone can build from the ground up’. Nor does it need to be. The open-source framework Jina has been available since May 2020 and even allows ‘developers who aren’t experts in machine learning to set up a search engine quickly’. A 3D video-game developer, for example, used the framework to build an app that lets developers add automatically determined elements to a game via their editor’s right-click menu.
In another case, a European legal tech start-up’s chatbot was able to find answers to users’ questions by trawling countless PDF documents.
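The difference between the two approaches Xiao describes can be illustrated with a deliberately tiny Python sketch. It is not Jina’s API: the three-dimensional vectors below are invented by hand, whereas in real neural search a trained neural network would produce them. Keyword search only finds documents that share a word with the query; vector search ranks documents by the cosine similarity between their vectors and the query’s vector, so it can match ‘cat’ to a sentence about a kitten.

```python
import math

# Toy corpus: each document maps to a hand-made 3-dimensional "embedding".
# In real neural search these vectors come from a trained encoder; the numbers
# here are invented purely for illustration.
DOCS = {
    "A kitten sleeping on a sofa": [0.9, 0.1, 0.0],
    "Stock markets fell sharply today": [0.0, 0.1, 0.95],
    "A puppy playing in the garden": [0.8, 0.3, 0.05],
}


def keyword_search(query: str) -> list[str]:
    """Old-fashioned text search: documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in DOCS if terms & set(doc.lower().split())]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_vec: list[float]) -> str:
    """Neural-style search: the document whose vector is most similar to the query's."""
    return max(DOCS, key=lambda doc: cosine(query_vec, DOCS[doc]))


query = "cat"
query_vec = [0.95, 0.05, 0.0]  # invented vector standing in for an encoder's output for "cat"
print(keyword_search(query))     # → [] because no document contains the word "cat"
print(vector_search(query_vec))  # → A kitten sleeping on a sofa
```

Neither function learns anything here; the point is only that once items are represented as vectors, ‘relevance’ becomes a geometric comparison rather than a word match.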
Dr Han Xiao, co-founder of Jina AI © Jina AI
Alongside its core product Jina, the start-up also provides developers with the marketplace Jina Hub and recently released two more open-source projects, DocArray and Finetuner, to give its users a more comprehensive toolkit for setting up neural search. And this is just the beginning: The company now has 45 employees, half of whom work at its Berlin location, and in November 2021 it raised $30 million in a Series A round. ‘There are a few factors to the winning formula: a strong team of founders, a world-class team of engineers, a promising and growing market, and lots of coffee’, Xiao says with a grin. He hopes the same formula will help him succeed in the North American market this year. With its clients currently spread across Europe and Asia, the company – which Forbes includes in its list of the 30 most noteworthy AI start-ups – wants to take root in the USA as well: ‘The developer community there is much bigger and stronger than anywhere else. If we can make a breakthrough there, we can really improve our software app’, he says, explaining why he plans to open a US office. ‘Secondly, SaaS subscription models are very widely accepted by both companies and individuals there. If we want to grow quickly with our monetisation, the US is definitely the market to target’.
‘Most powerful application of natural language processing (NLP)’
‘Build on our position outside Europe too’ – that’s the wish of Milos Rusic for the next few years. Rusic is CEO and co-founder of Deepset, another Berlin-based start-up that, just like Jina AI, is tackling the ‘most powerful application of NLP – neural search’, as the company’s website puts it. The founding team, comprising Malte Pietsch and Timo Möller along with Rusic, started off building customised NLP solutions. It quickly became clear, though, that ‘for scalable, widespread adoption of NLP, you need a framework’. As the CEO underlines, ‘this framework has to be simple and flexible in order to help developers working on products build individual NLP solutions. The next logical step was to release our open-source framework Haystack, which makes precisely that possible’. Since then, over 500,000 users have downloaded the software and over 1,000 organisations use the complementary tools for Haystack, including major international companies like Infineon and Alcatel-Lucent as well as a great many start-ups. ‘There’s a wide range of potential applications, but the most common is for search and for answering questions. Most uses focus on providing end users with quicker access to relevant information in large quantities of data’, according to Rusic, who studied at the Technical University of Munich and the University of California, Berkeley. ‘That could be a bank that wants to get an overview of all the risks in a market quickly, a pharma company researching a new medicine that wants to collect all existing knowledge, or it could be a customer support centre that wants to give the right answer to the customer’s question as quickly as possible’. In mid-2021, Deepset also developed a German question-answering and passage-retrieval data set, which is ‘designed to help researchers carry out more experiments and train new model architectures to improve NLP’s performance in German’, as the co-founder explains.
The Deepset team with founder Milos Rusic © Deepset
Open-source search engine for neural data
While Jina and Haystack provide frameworks for neural search, Qdrant, the latest Berlin-based start-up in this field, focuses on the ‘technology that’s essential to make neural search possible’, as Andre Zayarni explains. ‘Our engine can be used standalone to create highly scalable applications or along with frameworks like Jina or Haystack’. The deep-tech company was founded in October 2021, but Zayarni and his partner Andrei Vasnetsov only started working full-time on the start-up in January 2022, when their new enterprise received €2 million in pre-seed financing to improve practical AI solutions and make metric learning feasible. According to CTO Vasnetsov, who came to Germany from Russia two years ago, Qdrant Engine is used with a neural network encoder, ‘a special kind of neural network (NN) that produces vectors’ rather than the class probabilities that a conventional classifier outputs. ‘Thanks to modern developments there’s a wide range of ready-to-use neural networks for almost all kinds of unstructured data, from text to audio to video. Qdrant consumes the output of these NNs to provide users with a production-ready service for searching unstructured data with other unstructured data’. The Qdrant Engine was already available as open source last year. ‘We’ve had pretty good traction on GitHub so far, and what’s even more important is the positive feedback we’ve received from the developer community’, says CEO Zayarni, who started his career in Berlin in 2007 at StudiVZ, at the time the biggest social network in Germany. For the past five years he worked as CTPO at MoBerries, a job-matching platform that uses AI. ‘Our technology is already being used in production, and that’s what motivates us most’. The two founders are currently putting their team together and aim to bring the first business version of their neural search engine to market this year.
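The pipeline Vasnetsov describes, in which an encoder turns each item into a vector and a search engine then looks for the stored vectors closest to a query vector, can be sketched in a few lines of Python. The class `TinyVectorIndex` and its methods are hypothetical and are not Qdrant’s API; the vectors are invented stand-ins for encoder output, and cosine similarity is assumed as the similarity measure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TinyVectorIndex:
    """Hypothetical in-memory vector index, NOT Qdrant's API.

    Stores vectors produced by some encoder, each with a payload describing
    the original item, and answers top-k queries by cosine similarity.
    """

    dim: int
    points: list[tuple[list[float], dict]] = field(default_factory=list)

    def upsert(self, vector: list[float], payload: dict) -> None:
        # Reject vectors that don't match the index dimensionality.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.points.append((vector, payload))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, dict]]:
        # Brute force: score every stored point, then keep the k most similar.
        scored = [(self._cosine(query, vec), payload) for vec, payload in self.points]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


# Usage with invented 3-dimensional vectors standing in for encoder output.
index = TinyVectorIndex(dim=3)
index.upsert([0.9, 0.1, 0.0], {"file": "cat.jpg"})
index.upsert([0.1, 0.9, 0.1], {"file": "beach.mp4"})
index.upsert([0.8, 0.2, 0.1], {"file": "kitten.gif"})

# The query is itself unstructured data (e.g. a photo) after encoding.
results = index.search([1.0, 0.0, 0.0], k=2)
for score, payload in results:
    print(f"{payload['file']}: {score:.3f}")
```

The brute-force scan compares the query with every stored vector, which is simple but slows down linearly as the collection grows. Production vector search engines typically rely on approximate nearest-neighbour indexes instead; Qdrant’s documentation, for example, describes an HNSW-based index.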
‘We’re also working on another open-source project that will push metric learning even further forward’, Andre Zayarni tells us. ‘We’re planning to release the first version this year. Besides that, we’re building up a developer community around our technology and we’re looking for partners to develop solutions with’. The Qdrant founders are already in touch with the team at Jina and say that they are collaborating on topics where the two companies overlap. Deepset and Jina AI have also already worked together.
Andrei Vasnetsov and Andre Zayarni, founders of Qdrant © Qdrant
Open source: The future of software development
According to Milos Rusic from Deepset, these kinds of collaboration are possible thanks to Berlin’s open AI eco-system. ‘Berlin’s bigger tech firms support open source, let their employees contribute to projects and also use the technologies themselves’, says the CEO, explaining that ‘we’ve got lots of open-source companies here that we’re in regular contact with’. It’s no coincidence that all three neural search pioneers build on open source: ‘We all use open-source technologies – you can hardly avoid them nowadays’, Andre Zayarni from Qdrant underlines. ‘Open-source software is eating away at company software. It’s the future. We don’t just use it, though, we create it too: Our core technology is open-sourced. The key advantage is that it builds trust because it’s open and freely accessible, and the community is involved in development’.
They’re advantages that Han Xiao from Jina also recognises. For him, though, there’s one reason above all why ‘open-source infrastructure is the only way to build successful software and the only way to make Jina AI a successful software company: speed’. After all, infrastructure like Jina by its very nature has fewer end users than gaming software, for example. With so few users, feedback loops would be too long and too infrequent for the software to improve. That’s where the 3,000 developers and 203 external contributors who now make up the community come in handy for Jina, as they constantly contribute to developing and improving the software. ‘We believe that open source and the community are what’s best about Haystack: We receive feedback, actively discuss improvements, and the community itself contributes directly to development and technical implementation’, Milos Rusic from Deepset confirms. ‘That way, we can guarantee transparent and continuous development that will be reflected in high quality’. If you ask Qdrant CEO Zayarni, the open-source projects so far are just the beginning: ‘Admittedly, Germany and Europe aren’t world leaders in the field of open-source technologies, but that’s changing. Berlin’s open-source AI eco-system is growing rapidly. We’re definitely going to see lots of exciting new projects in this field’.