
Why 500 Global and Nvidia Just Bet €91.5m on DeepInfra’s ‘Token Factory’

TheRecursive.com

The artificial intelligence gold rush has been defined by the colossal cost of training models, a high-stakes game played by a handful of tech giants. But as companies globally move from flashy demos to real-world products, a new, more persistent cost is coming into focus: the price of actually running these models. This process, known as inference, is where the theoretical power of AI meets the reality of operational budgets, often derailing projects before they can scale.

It’s a challenge that CEE-founded, Silicon Valley-based DeepInfra is tackling head-on. To fund that push, the company has just secured a $107 million Series B round co-led by 500 Global and early Google engineer Georges Harik, with participation from Nvidia, Samsung Next, Supermicro, A.Capital Ventures, Crescent Cove, Felicis, Peak6, Upper90, and Bulgarian BrightCap Ventures.

The inference bottleneck is here

The investment brings DeepInfra’s total funding to over $133 million. It arrives at what the company’s co-founder and CEO, Nikola Borisov, calls an “inflection point”. For years, the industry’s attention was fixed on training. Now, two trends are forcing a shift.

Firstly, open-source models are rapidly approaching the performance of their proprietary counterparts, sparking a wave of innovation at a lower cost. Secondly, the emergence of autonomous agentic systems (which can require over 100 model calls for a single task) is creating a continuous, high-volume demand for computing power.

“Inference is no longer a thin layer,” Borisov explains. “It’s the system constraint that will define the majority of workloads. Most cloud platforms weren’t built for this always-on, distributed model, so we built DeepInfra from the ground up.”

A token factory, not just a cloud provider

DeepInfra was founded in 2022 by Nikola Borisov, Yessenzhar Kanapin, and Georgios Papoutsis, the engineering team that scaled the imo messenger app to over 200 million users. The company’s strategy is rooted in controlling the entire technology stack.

Their experience building latency-sensitive, global systems influenced their current mission. Instead of renting third-party capacity, the company owns and operates its own hardware across 8 data centres in the US. This vertical integration allows it to engineer for one specific workload: high-throughput inference. The company has seen explosive growth, scaling its processing volume by 8,000 times since its seed round and now handling nearly five trillion tokens per week, with nearly 30% of that volume already driven by autonomous agents.


This purpose-built approach has attracted strategic backing from industry giants like Nvidia, Samsung Next, and Supermicro. DeepInfra works closely with Nvidia, deploying its latest Blackwell and upcoming Vera Rubin GPUs to optimise performance. The goal is to create what SiliconAngle has dubbed a “token factory”, designed to handle the relentless demands of production-scale AI without the unpredictable latency and ballooning costs associated with general-purpose clouds.

‘The lowest blended price on the market’

DeepInfra’s core appeal to developers is its affordable pricing. By serving over 190 open-source models on its optimised hardware, it can pass significant savings to its customers. The company’s own benchmarks provide a compelling case. When running GLM-5, a powerful open-source reasoning model, DeepInfra offers the lowest blended price on the market at $1.24 per million tokens, a full 20% cheaper than the industry average. This cost advantage is critical for the new wave of “thinking models” that generate a high volume of internal tokens before delivering an answer.
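For readers unfamiliar with the term, a “blended” price is typically a usage-weighted average of the separate input- and output-token rates. A minimal sketch of that arithmetic is below; the rates and the 75/25 input/output mix are hypothetical, chosen only to illustrate how a headline figure like $1.24 per million tokens can be composed, and are not DeepInfra’s published prices.

```python
def blended_price(input_rate: float, output_rate: float,
                  input_share: float = 0.75) -> float:
    """Per-million-token price, weighted by the input/output token mix.

    input_rate, output_rate: $ per million tokens for each token type.
    input_share: fraction of all tokens that are input tokens.
    """
    return input_rate * input_share + output_rate * (1 - input_share)

# Hypothetical rates: $0.60/M input, $3.16/M output, 75% of tokens are input.
price = blended_price(input_rate=0.60, output_rate=3.16, input_share=0.75)
# 0.60 * 0.75 + 3.16 * 0.25 = 0.45 + 0.79 = 1.24
```

Because “thinking models” emit many internal reasoning tokens, shifting the mix toward output tokens raises the blended price, which is why output-heavy workloads feel the cost advantage most.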

For cost-sensitive startups and scaleups, this difference can be the deciding factor between a viable product and an unsustainable burn rate. The platform’s architecture includes clever cost-saving mechanics, such as discounted pricing for cached, or repeated, input tokens. Many applications repeatedly send the same instructions or context; by identifying and billing this static text at a lower rate, DeepInfra says it can materially reduce costs for common applications like multi-turn chatbots and RAG pipelines.
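The cached-token mechanic can be sketched in a few lines. All rates and the 50% cache discount below are hypothetical placeholders (the article does not state DeepInfra’s actual figures); the point is only the shape of the calculation, in which a resent system prompt is billed at a reduced rate after the first turn.

```python
def inference_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                   input_rate: float = 1.00, output_rate: float = 3.00,
                   cache_discount: float = 0.5) -> float:
    """Cost in dollars for one model call; rates are $ per million tokens.

    cached_tokens: the portion of input_tokens billed at the discounted
    cache rate (e.g. a system prompt the provider has already seen).
    """
    fresh = input_tokens - cached_tokens
    return (fresh * input_rate
            + cached_tokens * input_rate * cache_discount
            + output_tokens * output_rate) / 1_000_000

# A chatbot turn: 2,000-token system prompt + 500 fresh tokens, 400 out.
no_cache = inference_cost(2_500, 0, 400)       # full input rate on everything
with_cache = inference_cost(2_500, 2_000, 400)  # system prompt hits the cache
```

With these placeholder numbers the cached call costs $0.0027 versus $0.0037 uncached, and the saving compounds across every turn of a long conversation or every query in a RAG pipeline that reuses the same context.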

Why investors are backing the thesis

The investor lineup also reflects a growing conviction that the AI stack’s infrastructure layer will be as crucial as the models themselves. Tony Wang, Managing Partner at 500 Global, sees a clear need for specialised platforms as new AI-driven workflows emerge.

“Enterprises and developers building with open source and agent-driven AI need infrastructure that was designed to be flexible, fast and reliable,” Wang notes. “We backed DeepInfra because… we believe purpose-built inference infrastructure will be fundamental to the next phase of AI as compute was to the last.”

With the new capital, DeepInfra plans to expand its global compute capacity, with new locations planned for Europe and Asia, while continuing to refine its developer tools. As the AI industry matures, the focus is shifting from what’s possible to what’s practical. DeepInfra is betting that by solving the unglamorous but essential problem of inference cost, it can become the foundation for the next generation of AI applications.

