The AI compute shortage explained by Nvidia, Crusoe, & MosaicML

AI compute costs are eating up startups’ runway just as fundraising is getting tougher. How can startups navigate the shortage to get the compute resources they need? How should they shop for compute providers across different clouds? And how can the industry keep up with demand without exacerbating climate change?

SignalFire brought together leaders in the compute space for a real-talk panel at our SF headquarters to lay out how startups can build with AI without breaking the bank. The top takeaways include:

  • The AI compute shortage stems from a sudden spike in demand and the complexity of building modern GPUs, and closing the gap will require algorithmic solutions that enhance efficiency.
  • Startups and other compute buyers should use a multi-cloud approach, testing which use cases perform best with which providers rather than trying to dodge egress costs using a one-cloud-fits-none approach.
  • AI’s contributions to humanity make compute “energy well spent”, and new approaches to data center cooling and software-based efficiency improvements will reduce power consumption and climate impact.

An event flyer depicting the panelists at SignalFire's AI Compute Event

Here are all the top insights on AI compute from our discussion with Nvidia Chief Platform Architect for data center products Robert Ober, Crusoe Energy Co-founder and CEO Chase Lochmiller, and MosaicML Co-founder and CEO Naveen Rao, hosted by SignalFire’s AI Lab lead Veronica Mercado. And if you’re building something special in AI and want access to help with compute, recruiting, data science, and marketing, SignalFire would be excited to talk to you!

Is there a compute shortage? Yes, but not because there aren’t enough GPUs


It’s just that they’re all locked up in contracts. The hockey-stick growth of ChatGPT and the AI sector in general has put massive stress on the whole semiconductor industry and supply chain. This has pushed companies that were ahead of the curve to reserve any available GPUs, which have doubled in price since 2020. So while you might be able to get a price quote for a spot instance, vendors can’t fulfill those allocations, and getting a cluster is next to impossible without inside connections.

Essentially, the software demand has massively outstripped our physical infrastructure for producing the hardware. Meanwhile, the complexity of chips, high-performance networking, and packaging has grown significantly, pushing prices and failure rates up, and yield down. 

“You can’t just press a button and build 10X more” –Nvidia’s Robert Ober


Ober from Nvidia says the leaders of the big cloud businesses are asking for production to suddenly increase by 10X, but he emphasizes that “this is real hardware. You can’t just press a button and build 10X more . . . These are truly the most complex systems anyone’s ever built.” With demand and complexity growing quickly, scaling up compute manufacturing will take time. In the meantime, we’ll need ways to get more out of existing hardware by optimizing what Mosaic calls “model FLOPs utilization,” for example by securely intermixing users so a given piece of hardware is running all the time.
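For readers who want a number to track, model FLOPs utilization is commonly computed as the FLOPs per second a training run actually achieves divided by the hardware's theoretical peak. The sketch below is not from the panel. It uses the widely cited approximation of about 6 FLOPs per parameter per token for a transformer's forward and backward pass, which ignores attention FLOPs. The parameter count, throughput, GPU count, and peak figure in the usage lines are illustrative placeholders, not measurements.

```python
def model_flops_utilization(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Estimate MFU: achieved training FLOPs/s over theoretical peak FLOPs/s.

    Assumes roughly 6 FLOPs per parameter per token (forward + backward pass),
    a common approximation that leaves out attention-specific FLOPs.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical inputs: a 7B-parameter model processing 20,000 tokens/s
# on 8 GPUs, each with an assumed peak of 312e12 FLOPs/s.
mfu = model_flops_utilization(7e9, 20_000, 8, 312e12)
print(f"Estimated MFU: {mfu:.1%}")
```

A low value suggests the hardware is spending much of its time idle or waiting on data and communication, which is the kind of slack the panelists argue needs to be squeezed out.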


SignalFire’s AI compute event panelists (from left): Nvidia Chief Platform Architect Robert Ober, Crusoe Energy co-founder and CEO Chase Lochmiller, and MosaicML co-founder and CEO Naveen Rao, moderated by SignalFire’s AI Lab lead Veronica Mercado.

Algorithmic solutions may be our best hope of closing the gap between surging demand and lagging supply. Of course we’ll continue to need advanced packaging innovations and better chips so we get more performance per watt and have more compute to apply. But algorithmic innovations have outpaced hardware improvements of late, and are our best chance for doing more with less given the hardware shortage.

How can customers optimize their AI compute spend? Experimenting across multi-cloud environments


Founders may be tempted to try to save money on training and deploying their models by configuring their own compute, building and running their own mini cluster and network infrastructure out of a garage. Instead, they’re likely best off turning to a vendor that lives and breathes efficiency. But the desire to home-brew compute shows a failure of the cloud ecosystem: big clouds should achieve such efficiencies of scale, and pass on enough of the savings, that no one would want to do it themselves. Unfortunately, some large cloud providers bundle in managed services that startups don’t actually need, and their cloud egress costs can be daunting.

Lochmiller of Crusoe says we’re suffering from the “Hotel California cloud model”: you can check in your data anytime you want, but you can never leave. Obnoxious egress fees can bully startups into sticking with one cloud provider. But the improved fit and flexibility of using different, smaller, specialized providers for different use cases is likely to outweigh those egress fees.

“Multi-cloud is of much greater value”
–MosaicML’s Naveen Rao


Due to differences in their internal network infrastructure, control planes, and instances, one cloud may be best for CNN inference, another for large language model inference, another for training a small model across a couple of nodes, and another for when you need 4000 GPUs. You might run training, workloads, and customer data in different places to get the best provider for each. And the compute itself is so expensive that the added fees are drops in the bucket. Startups can also use intermediaries that stream data across providers to seek out the highest efficiency. Rao from MosaicML said it found Amazon S3 was up to three times more expensive than the next competitor, so staying locked into a single name-brand cloud can be very costly.


We served boba to keep everyone cool while discussing hot topics in AI

Rao breaks down the fallacy of egress and streaming costs, saying, “When you’re training a large language [model] like MPT-7B, it costs $200,000 in compute. About $800 of that is streaming. It’s not that much, right? It’s less than half a percent. The flexibility to have multi-cloud is of much greater value to you than the loss of streaming.” So you and your startup should hunt around, check that you can actually get the instances you’re promised, experiment to see what gets the best efficiency where, and repeat the process as your needs change and scale.
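Rao's arithmetic is easy to check and to rerun with your own bills. The short sketch below uses only the two figures he quoted ($200,000 in compute, about $800 of it streaming); the function name is ours, chosen for illustration.

```python
def streaming_share(total_compute_cost, streaming_cost):
    """Return streaming cost as a fraction of the total compute bill."""
    return streaming_cost / total_compute_cost


# Figures quoted by Rao for training MPT-7B.
share = streaming_share(200_000, 800)
print(f"Streaming share of compute spend: {share:.2%}")  # 0.40%
```

At 0.4%, a second provider only needs to be a fraction of a percent cheaper or more efficient on the compute itself for the move to pay for its own streaming cost, which is the heart of Rao's argument.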

How do we minimize the climate impact of AI compute? Energy usage isn’t bad, but it needs to be efficient


“If all this innovation accelerates the climate crisis, what’s the point?” Lochmiller declares. 

The compute- and energy-intensive nature of AI has raised concerns about how its environmental impact could hasten climate change, which also threatens to sour public perception and invite onerous regulation.

Lochmiller says data centers represent 1 to 1.5% of global power consumption today. That was forecast to grow to 8% by 2030, but with the AI boom, he says it will probably hit 10% sooner than that. Our panelists agreed that demand for AI has been consistently growing about 10 times per year, yet we’re only improving compute supply by three times.
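To see how quickly that mismatch compounds, here is a minimal sketch. It assumes, as a simplification the panel did not spell out, that both the roughly 10 times demand growth and the roughly three times supply improvement are annual rates that stay constant.

```python
def demand_supply_gap(years, demand_growth=10.0, supply_growth=3.0):
    """Ratio of demand to supply after `years`, starting from parity.

    Assumes constant annual multipliers for both (an assumption, not a
    figure confirmed by the panelists beyond the headline rates).
    """
    return (demand_growth / supply_growth) ** years


for year in range(1, 4):
    print(f"Year {year}: demand is about {demand_supply_gap(year):.1f}x supply")
```

Under these assumptions the shortfall widens by a factor of about 3.3 each year, reaching roughly 37 times after three years, which is why the panelists look to efficiency gains as well as new capacity.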

One thing that’s not likely to save the planet right away is edge computing. Phones lack sufficient compute power to do learning at the edge, so tokenizing data to be sent to the cloud for processing will remain the norm. 

“Energy well spent” –Nvidia’s Robert Ober


Luckily, AI is latency-tolerant, since feeding tokens through a massive model already takes some time. This could lead to more data centers being built closer to where energy is cheap and abundant, since the added latency is less noticeable. New data center cooling systems that run cold water through copper pipes could also help by transferring heat away from chips more efficiently. Plus, this could reduce the use of energy-sucking HVAC units and high-failure-rate fans. Applying AI itself to designing more effective hardware could also minimize electricity consumption.

SignalFire’s AI compute event speakers after the panel (from left): Crusoe Energy co-founder and CEO Chase Lochmiller, MosaicML co-founder and CEO Naveen Rao, Nvidia Chief Platform Architect Robert Ober, and SignalFire’s AI Lab lead Veronica Mercado.

But overall, it’s important to remember that “[powering AI is] energy well spent, because it allows us to do things that were impossible before,” from genomics to autonomous vehicles, Ober insists. 

Lochmiller concludes that “people often conflate this idea that using energy is bad, that we should use less. It’s actually the opposite. If you look at the correlation of the human development index to the amount of energy used, more advanced societies use more energy and it’s going to continue to be the case. It’s more a matter of how efficient we are.”

SignalFire is an AI-native venture firm that’s been building and refining its own models for a decade. Our Beacon AI data platform helps our investors spot amazing founders and helps our portfolio companies recruit the best talent. Along with our seed-to-Series B investment practice, we recently launched the SignalFire AI Lab to pair technology leaders and sector experts with corporates as data providers, design partners, and initial customers. We’d love to hear about what you’re building!


Announcing the SignalFire AI Lab

Today, we’re announcing the SignalFire AI Lab. We’re excited to propel the next wave of category-defining startups by providing the resources, capital, and credibility to help tomorrow’s AI leaders today. As an AI-native VC, we’re earmarking $50M to back this new program.

The massive potential of AI is inspiring top talent from across industries to explore building startups in the space. SignalFire’s own in-house Beacon AI shows that high-quality AI talent from tech giants is leaving to start AI companies of their own. To discover and test product opportunities, these innovators and sector experts need access to data sets, compute infrastructure, corporate design partners, recruiting technology, peer community, and AI mentors.

The SignalFire AI Lab brings founders together with their future partners and customers at the earliest stages of ideation to help these entrepreneurs build smarter and grow more quickly in a space where speed matters.

The program is based on our decade of experience developing AI in-house in order to source investments and assist portfolio companies with recruiting. We’ve proven the value of tightly pairing developers and users. Our AI PhDs, engineers, and data scientists work hand-in-hand with our investors and recruiters to test and refine our models with human feedback loops. Now we’re building a similar bridge between founders looking for product guidance and corporate partners with big problems to solve.

Alongside funding at competitive terms, companies in the SignalFire AI Lab receive:

  • Direct access to corporate design partners and their data sets across enterprise, healthcare, cybersecurity, law, finance, and other sectors
  • Developer credits and engineering assistance from top compute and large language model providers, plus our Beacon AI for recruiting
  • Mentorship from AI leaders, corporate executives, chief security officers, fellow founders, and functional experts

The SignalFire AI Lab is also designed to address corporate partners’ biggest questions around AI. Where should they be applying it? What tools do they need? How can they build safely? How will it impact their headcount planning?

The AI Lab will connect corporates to vetted, experienced entrepreneurs—AI developers and infrastructure providers who understand the unique demands of the enterprise. With so many hucksters pivoting out of crypto and adding “.ai” to their names, corporates need an experienced copilot to help them navigate a complicated, cloudy landscape that’s evolving daily. SignalFire’s AI Lab is developing a playbook that corporates can rely on to learn best practices to help define their AI strategy.

SignalFire will continue to lead funding rounds from seed to Series B while providing deep value-adds across recruiting, go-to-market, data science, and more. Now with the AI Lab, we’ll offer the fastest path for seasoned builders to find and win the biggest opportunities in the future of computing. If you’re a former founder, big tech manager, or sector expert planning to build an AI startup, please get in touch here. And if you’re at a corporate and interested in partnering with the AI Lab, please connect with our team here.



The above is not to be construed as an offer to sell or the solicitation of an offer to buy any security.
