In a move that could accelerate enterprise adoption of generative AI, startup Galileo has unveiled a suite of AI models designed specifically to evaluate the outputs of large language models (LLMs) such as GPT-3.
The new offering, dubbed Galileo Luna, represents a first-of-its-kind approach to GenAI evaluation built on what the company calls Evaluation Foundation Models (EFMs). These specialized models are fine-tuned for tasks such as detecting hallucinations, toxic language, data leaks, and malicious prompts in the responses of AI systems.
“For gen AI to achieve mass adoption, it’s crucial that enterprises can evaluate hundreds of thousands of AI responses for hallucinations, toxicity, security risk, and more, in real time,” said Vikram Chatterji, Co-Founder and CEO of Galileo. “In speaking with customers, we found that existing approaches, such as human evaluation or LLM-based evaluation, were too expensive and slow, so we set out to solve that. With Galileo Luna®, we’re setting new benchmarks for speed, accuracy, and cost efficiency. Luna® can evaluate millions of responses per month 97% cheaper, 11x faster, and 18% more accurately than evaluating using OpenAI GPT3.5.”
The key innovation in Luna is its use of right-sized, purpose-built EFMs rather than massively overparameterized models. Galileo claims this approach yields major gains in evaluation speed, cost, and precision over conventional techniques.
According to the company, Luna exceeds industry benchmarks for detecting issues such as hallucinations by up to 20%, costs 30 times less than traditional methods, delivers evaluations in milliseconds, and requires no expensive ground-truth datasets. The models can also be rapidly customized to reach over 95% accuracy on specialized enterprise use cases.
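Galileo has not published Luna's architecture or API, but the general pattern it describes, replacing a large LLM judge with a compact, purpose-built evaluator, can be sketched with off-the-shelf components. In the minimal sketch below, a small open-source NLI cross-encoder stands in for an EFM and checks whether a response is grounded in its source context; the model name and threshold are illustrative assumptions, not part of Luna.

```python
# Illustrative sketch only: Galileo has not published Luna's internals or API.
# It approximates the idea of a "right-sized" evaluator by using a small
# off-the-shelf NLI cross-encoder to check whether an LLM response is supported
# by its source context, rather than asking a large LLM to judge the output.
from transformers import pipeline

# Compact (~184M-parameter) cross-encoder used here as a stand-in for a
# purpose-built Evaluation Foundation Model; it is NOT one of Galileo's models.
nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")

def groundedness_score(context: str, response: str) -> float:
    """Probability that `response` is entailed by (grounded in) `context`."""
    scores = nli({"text": context, "text_pair": response}, top_k=None)
    return next(s["score"] for s in scores if s["label"].lower() == "entailment")

context = "The report covers Q1 2024 revenue, which grew 12% year over year."
response = "Q1 2024 revenue grew 20% year over year."

score = groundedness_score(context, response)
threshold = 0.5  # arbitrary cutoff chosen for illustration
if score < threshold:
    print(f"Possible hallucination (entailment probability {score:.2f})")
else:
    print(f"Response appears grounded (entailment probability {score:.2f})")
```

Because the evaluator is a small encoder rather than a multi-billion-parameter LLM, each check runs in milliseconds on modest hardware, which is the cost and latency advantage Galileo is claiming for its purpose-built models.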
“Evaluations are essential for safe, reliable AI products, but existing methods have been very costly and slow,” said Alex Klug, Head of Data Science & AI at HP Inc., one of Galileo’s customers. “Luna overcomes those hurdles in a way that’s a real game changer.”
Already integrated into Galileo's AI governance platforms, Protect and Evaluate, Luna is being used by major enterprises to handle millions of GenAI queries each month while safeguarding against harmful outputs and reducing operating costs.
With generative AI poised for broader enterprise adoption, Galileo's Luna models could help remove one of the biggest remaining bottlenecks: robustly and affordably evaluating these powerful but still flawed AI systems.