
Haize Labs’ Sphynx Puts AI Hallucination Detectors to the Test

Sphynx employs a fuzz-testing approach to challenge the robustness of hallucination detection models.

Haize Labs, an AI research company founded by Leonard Tang and Steve Li, has introduced Sphynx, an innovative tool designed to test and expose vulnerabilities in AI hallucination detection models. As artificial intelligence systems become increasingly integrated into critical sectors, the need for reliable hallucination detection has never been more pressing.


Sphynx applies fuzz testing to hallucination detectors: it generates subtle variations of input queries, semantically equivalent to the original questions, that can nonetheless confuse AI systems.

How Sphynx Works

Sphynx operates by taking three inputs: a question, an answer, and context. It then generates variations of the question that preserve the original intent but may cause hallucination detectors to falter.
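The post does not spell out the internals, but a minimal sketch of this loop might look like the following, where `paraphrase` and `detector` are hypothetical stand-ins rather than the actual Sphynx API:

```python
# Minimal sketch of a Sphynx-style check (illustrative only; `paraphrase`
# and `detector` are hypothetical stand-ins, not the actual Sphynx API).

def find_gotchas(question, answer, context, paraphrase, detector, n=10):
    """Return rephrasings of `question` that flip the detector's verdict."""
    # Verdict on the original (question, answer, context) triple; Sphynx
    # hunts for variants that flip it.
    original_verdict = detector(question, answer, context)

    gotchas = []
    for variant in paraphrase(question, n=n):
        # A semantically equivalent rephrasing should keep the same verdict;
        # a flip means the detector is brittle to surface wording.
        if detector(variant, answer, context) != original_verdict:
            gotchas.append(variant)
    return gotchas
```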

For example, the tool transformed the question “Paap is a film directed by the eldest child of which other Indian director?” into “What Indian director’s eldest child directed the film Paap?”

While both questions have the same answer (Mahesh Bhatt), some hallucination detectors incorrectly flagged the answer as a hallucination for the rephrased version.


The tool uses a beam search algorithm to efficiently find these “gotchas” – instances where slight rephrasing causes hallucination detectors to flip their judgment from correct to incorrect.
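As a rough illustration of how such a search might proceed (the rephrasing and scoring helpers below are assumptions, not published Sphynx internals):

```python
# Illustrative beam search for "gotchas" (the helpers here are assumed;
# Haize Labs has not published these internals in the announcement).

def beam_search_gotchas(question, answer, context,
                        rephrase, hallucination_prob,
                        beam_width=4, depth=3):
    """Expand rephrasings that push the detector toward flipping its verdict.

    `rephrase(q)` yields candidate paraphrases of q; `hallucination_prob`
    is the detector's probability that (q, answer, context) is a
    hallucination. For a correct answer, a higher probability means the
    detector is closer to a false positive, i.e., a gotcha.
    """
    beam = [question]
    for _ in range(depth):
        # Expand every beam member, then keep the most adversarial variants.
        candidates = [v for q in beam for v in rephrase(q)]
        candidates.sort(key=lambda q: hallucination_prob(q, answer, context),
                        reverse=True)
        beam = candidates[:beam_width]
        # Any variant past the decision threshold already flips the verdict.
        flips = [q for q in beam
                 if hallucination_prob(q, answer, context) > 0.5]
        if flips:
            return flips
    return []
```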

Testing Leading AI Models

Haize Labs put Sphynx to work testing several prominent AI models, including GPT-4 from OpenAI, Claude-3.5-Sonnet from Anthropic, Llama 3 from Meta, and Lynx from Patronus AI. The results were measured using two metrics (sketched in code after this list):

Question Robustness: The ability to judge the original question-answer pair correctly.

Variant Robustness: The ability to maintain a correct judgment across rephrased versions of the question.
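Haize Labs does not give exact formulas for these metrics; the sketch below is one plausible reading, assuming per-question records of the detector's verdicts on the original and on its variants:

```python
# One plausible computation of the two metrics (the exact aggregation
# Haize Labs uses is an assumption here). `results` is a list of dicts,
# one per original question, e.g.:
#   {"original_correct": True, "variants_correct": [True, False, True]}

def question_robustness(results):
    """Fraction of original questions the detector judges correctly."""
    return sum(r["original_correct"] for r in results) / len(results)

def variant_robustness(results):
    """Fraction of questions where every adversarial variant also keeps
    the correct verdict (no rephrasing flips the judgment)."""
    return sum(all(r["variants_correct"]) for r in results) / len(results)
```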

The results, based on a subset of 100 questions and their adversarial variants, were as follows:

[Results table (source: haizelabs)]

These scores reveal significant room for improvement in the consistency of hallucination detection across various phrasings of the same question, even for top-performing models.

The development of Sphynx highlights the limitations of static dataset approaches in building and testing AI models. While these models may perform well on static benchmarks (with reported 80%+ robustness), they struggle significantly when faced with dynamic testing methods like Sphynx.

Haize Labs emphasizes the importance of rigorous, scalable, and automatic testing to understand and address the corner cases and weaknesses of language models. The company suggests that such “haizing” – intensive fuzz-testing – is crucial for developing truly reliable AI systems.

The tool is open-source, allowing researchers and developers to test and improve their own hallucination detection models. Haize Labs hopes that by exposing these vulnerabilities, the AI community will be motivated to develop more robust and consistent models.

As AI continues to advance and integrate into various aspects of our lives, tools like Sphynx play a crucial role in ensuring the reliability and trustworthiness of these systems. By forcing failures to happen during development rather than in real-world applications, Sphynx represents an important step towards more dependable AI technologies. 

In the words of Haize Labs, the introduction of Sphynx signals that “It’s a bad day to be a language model.” This cheeky tagline encapsulates the challenge that Sphynx presents to AI systems, serving as a call to action for the AI community to strive for models that don’t just appear intelligent, but can consistently provide accurate and reliable responses across a wide range of linguistic variations.
