Tonic.ai Introduces World’s First Secure Unstructured Data Lakehouse for LLMs

Tonic.ai, a pioneer in data synthesis solutions for software and AI developers, has announced the debut of Tonic Textual, the world’s first secure data lakehouse designed specifically for large language models (LLMs). This novel platform allows AI developers to effortlessly and securely use unstructured data for retrieval-augmented generation (RAG) systems and LLM fine-tuning, solving two major obstacles in corporate AI adoption.

The Untapped Value of Unstructured Data

Enterprises are investing heavily in generative AI efforts, motivated by its revolutionary potential. However, effective implementation of this technology requires proprietary data, which is frequently stored in disorganised, unstructured formats across many file types and contains sensitive information about customers, employees, and company secrets. According to IDC, unstructured data accounts for about 90% of enterprise data, with organisations estimated to have created up to 73,000 exabytes in 2023 alone. To be used for AI projects, this data must be pulled from siloed locations and standardised, which consumes a significant amount of development effort.

“We’ve heard time and again from our enterprise customers that building scalable, secure unstructured data pipelines is a major blocker to releasing generative AI applications into production,” said Adam Kamor, Co-Founder and Head of Engineering at Tonic.ai. “Textual is specifically architected to meet the complexity, scale, and privacy demands of enterprise unstructured data and allows developers to spend more time on data science and less on data preparation, securely.”

The Importance of Privacy in AI

Data privacy is critical for businesses, especially when utilising third-party model services. According to the same IDC poll, 46% of firms consider data privacy compliance to be a major barrier when exploiting proprietary unstructured data in AI systems. Protecting sensitive data from model memorization and accidental exfiltration is critical for avoiding costly compliance violations.

“AI data privacy is a challenge the team is uniquely positioned to solve due to their deep experience building privacy-preserving synthetic data solutions,” said George Mathew, Managing Director at Insight Partners. “As enterprises make inroads implementing AI systems as the backbone of their operations, Tonic has built an innovative product in Textual to supply secured data that protects customer information and enables organizations to leverage AI responsibly.”

Introducing the Secure Data Lakehouse for LLMs

Tonic Textual is the first data lakehouse for generative AI, capable of extracting, managing, enriching, and deploying unstructured data for AI development. Its key capabilities include:

  • Automated Data Pipelines: Create, schedule, and automate unstructured data pipelines to extract and transform data into standardised formats for embedding, vector database ingestion, or LLM fine-tuning. Textual supports the most popular unstructured free-text data formats, including TXT, PDF, CSV, TIFF, JPG, PNG, JSON, DOCX, and XLSX.
  • Sensitive Data Protection: Automatically discover, categorise, and redact sensitive information in unstructured data, with the option to reseed redactions with synthetic data to preserve semantic meaning. Textual employs proprietary named entity recognition (NER) models trained on a variety of datasets to provide comprehensive protection.
  • Enhanced Data Enrichment: Use document metadata and contextual entity tags to increase retrieval speed and context relevance in RAG systems.

Looking ahead, Tonic.ai intends to facilitate constructing generative AI systems using private data without sacrificing privacy, with upcoming capabilities including:

  • Native SDK Integrations: Use popular embedding models, vector databases, and AI developer platforms to build completely automated, end-to-end data pipelines.
  • Increased Data Management Capabilities: New capabilities for data cataloguing, classification, quality management, privacy compliance reporting, and identity and access management.
  • Data Connector Library: Native interfaces with cloud data lakes, object stores, cloud storage, file-sharing platforms, and business SaaS applications link AI systems to data throughout the organisation.
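The redact-and-reseed idea described under Sensitive Data Protection above can be illustrated in miniature. The sketch below is a hypothetical toy, not Tonic's API: it stands in a simple email regex for Textual's proprietary NER models, and reuses the same synthetic value for repeated occurrences so cross-references within a document survive redaction.

```python
import re

# Hypothetical sketch of "redact and reseed": detect a sensitive entity
# (here, email addresses via regex, standing in for NER) and replace each
# one with a consistent synthetic stand-in. Not Tonic Textual's actual API.

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}")

def redact_and_reseed(text: str) -> str:
    mapping: dict[str, str] = {}

    def synthesize(match: re.Match) -> str:
        original = match.group(0)
        # Reuse the same synthetic value for repeated occurrences,
        # preserving references across the document.
        if original not in mapping:
            mapping[original] = f"user{len(mapping) + 1}@example.com"
        return mapping[original]

    return EMAIL_RE.sub(synthesize, text)

doc = "Contact jane.doe@acme.com; escalations also go to jane.doe@acme.com."
print(redact_and_reseed(doc))
```

Because both occurrences map to the same synthetic address, downstream RAG or fine-tuning pipelines still see that the two mentions refer to one person, while the real address never leaves the secure boundary.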

“Companies have amassed a staggering amount of unstructured data in the cloud over the last two decades; unfortunately, its complexity and the nascency of analytical methods have prevented its use,” said Oren Yunger, Managing Partner at Notable Capital. “Generative AI has finally unlocked the use case for that data, and Tonic has stepped in to solve the complexity problem in a way that reflects its core mission to transform how businesses handle and leverage sensitive data while still enabling developers to do their best work.”

Anshika Mathews
Anshika is an Associate Research Analyst working for the AIM Leaders Council. She holds a keen interest in technology, related policy-making, and its impact on society.