Kolena and why we need industrial-strength tools for AI testing

Published on Sep 26, 2023

What’s the best AI model for your use case? Will it work without hallucinating? These questions plague every enterprise trying to adopt AI.

This stems from a huge shift in how we build software. The first wave of machine learning tooling was about in-house experimentation and training, and for good reason: the workflow was complex and third-party infrastructure tooling was insufficient. Traditional ML work focused on model architecture and training hyperparameters, all driving toward consistent performance for a company’s specific product.

An illustration of the traditional ML workflow

Foundation models changed everything

With zero-shot learning driving ML product development, we now have machine-learning-as-a-service. Enterprises can license and integrate an AI model without building a full-fledged MLOps and tooling stack. But which model should they choose? They need a rigorous testing solution to validate and compare the best models on the market for their use cases.

An illustration of the traditional machine learning pipeline

Without a methodical way to verify performance, adopting foundation models carries significant risks: factuality, jailbreaking, privacy, hallucinations, and other failure modes. This is why the modern ML stack is test-driven. It’s all about gathering training data, fine-tuning models, and then checking that they consistently work as expected.

Kolena testing tools catalyze AI adoption

Kolena turns AI model comparison, testing, and validation into a science instead of a haphazard art. It allows developers to build AI systems characterized by safety, reliability, and fairness. By providing meticulous assessment and comprehensive analysis of every dimension of AI models and their data, Kolena adopts a highly granular approach: unit testing for machine learning. It ensures that AI models undergo rigorous testing at the scenario level before their deployment to users, significantly reducing risk to the business. That kind of peace of mind will catalyze adoption by bigger businesses, regulated sectors, and industries like healthcare that have no margin for error.

When we led Kolena’s seed round, we spoke with many practitioners from the industry. There was a clear need, but no existing tooling, for building reliable, enterprise-ready AI products. One leader we spoke with at a well-known Silicon Valley company told us, “We do this using aggregate metrics currently, which leaves huge blind spots in our model validation process.” This is why customers were eager to use Kolena before the product was even built. They were dedicating significant headcount to ad hoc testing and were in such dire need of tooling that they considered building proprietary dashboards. Building in-house proved too complex, with big questions around data segmentation, testing diversity, and applying perturbations. Most enterprises preferred to buy rather than build, and now they have Kolena.
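The blind spot that leader described is easy to demonstrate. A minimal sketch, with entirely hypothetical data and scenario names (not from Kolena), shows how an acceptable-looking aggregate accuracy can conceal a failing scenario:

```python
# Hypothetical sketch: aggregate metrics can hide scenario-level failures.
# The scenarios and numbers below are illustrative, not real benchmark data.

def accuracy(results):
    """Fraction of correct predictions (1 = correct, 0 = incorrect)."""
    return sum(results) / len(results)

# Model outcomes grouped by test scenario
results_by_scenario = {
    "daytime_images": [1] * 95 + [0] * 5,     # 95% accurate
    "nighttime_images": [1] * 60 + [0] * 40,  # only 60% accurate
}

# The aggregate number looks passable...
all_results = [r for rs in results_by_scenario.values() for r in rs]
print(f"aggregate: {accuracy(all_results):.0%}")  # prints "aggregate: 78%"

# ...but a per-scenario breakdown exposes the weak spot.
for scenario, rs in results_by_scenario.items():
    print(f"{scenario}: {accuracy(rs):.0%}")
```

A single 78% headline figure would pass many review gates, while the nighttime scenario quietly fails 4 out of 10 cases; testing at the scenario level is what surfaces it.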

An illustration of the flow for testing model performance

As an AI-native venture fund that builds and tests its own models for investment sourcing and portfolio recruiting, SignalFire saw the need for Kolena early on and led its $6M seed round. We’ve since used our Beacon AI data platform to assist Kolena with commercial, technical, and leadership recruiting searches, and data pulls of potential customer lists. Now we’re excited to back its $15M Series A led by our friend David Hornik at Lobby Capital.

Kolena is fixing a broken testing workflow that even leading AI organizations like OpenAI still handle manually, wasting time and risking mistakes. Manual testing is tedious, time-consuming, and hard to manage at scale without the proper tooling. By giving time, energy, headcount, and assurance back to enterprises, Kolena is unlocking the next stage of AI adoption.

*Portfolio company founders listed above have not received any compensation for this feedback and may or may not have invested in a SignalFire fund. These founders may or may not serve as Affiliate Advisors, Retained Advisors, or consultants to provide their expertise on a formal or ad hoc basis. They are not employed by SignalFire and do not provide investment advisory services to clients on behalf of SignalFire. Please refer to our disclosures page for additional disclosures.
