
Cohesity Gaia’s Evolution and Ethical Considerations in AI-Powered Analytics

Generative AI is like a proverbial debate about fire - a matchstick.

In the era of exponential data growth and technological innovation, the need for advanced solutions to manage, analyze, and derive insights from vast amounts of data has never been more crucial. Enter Cohesity Gaia, a groundbreaking platform poised to redefine the landscape of data management and analytics. As organizations increasingly recognize the transformative potential of AI-powered analytics, Cohesity Gaia emerges as a pioneering solution poised to revolutionize the way businesses derive value from their data. With a strategic focus on adoption, scalability, and long-term viability, Gaia represents not just a technological advancement, but a catalyst for driving sustainable growth and competitive advantage in an increasingly data-driven world.

To give us exclusive insights into Gaia, we spoke with Sanjay Poonen, CEO & President of Cohesity. An accomplished leader in the tech industry, Poonen holds two patents. He earned his MBA from Harvard Business School, where he graduated as a Baker Scholar. He also holds a master’s degree in management science and engineering from Stanford University and a bachelor’s degree in computer science, math and engineering from Dartmouth College.

AIM Research: What is it about Gaia that makes it special for discussion, and what are your thoughts on the entire space of our ages?

“Generative AI is a once in a lifetime opportunity.”

Sanjay Poonen: Generative AI is a once-in-a-lifetime kind of opportunity. I’ve been in predictive analytics since I was an undergrad in computer science. I came to this country as an immigrant on a scholarship to go to college, and when I studied computer science, we wouldn’t call it AI, we’d call it predictive analytics, expert systems, it was all rule-based. But I think the entire advent of GPUs, inference training models, and embeddings has been an enormous breakthrough, and if I was a young computer scientist in my 20s coming out of college now compared to when I came out of college, now is the best time to get into the tech world. So for those of us a little older, I reflect on the forces that are coming together. There’s clearly a big move to the cloud, there’s a big move into generative AI and security, and we [Cohesity] play at the center of all of that. And I think Generative AI for us is the tip of the spear, to optimize what we can do to allow people the power of generative AI to search data. We [companies] store a lot of secondary data in backup and archives, and that usually looks like a tape, and in the old days it was a tape.

Primary data is what’s stored in what is commonly called hot systems, where you can actively get to the data. That’s why it’s called primary. It might be filers, file logic systems, databases, live email. At some point in time, that data starts to age, and it becomes colder. That’s what the world calls secondary data. That’s backup, archives, vaults, etc. So typically one example of secondary data is backup data. So we’ve been the company that’s looking to secure and protect the world’s data and provide insights into it, and all of a sudden that estate has got significantly bigger with the promise of what we can bring together, with Veritas.

But prior to that, Generative AI allows us to crack open all that data that we manage on our platform, and provide ways by which people can search and analyze and get summaries of that data. That’s the most important breakthrough of Generative AI.

AIM Research: What are the general-purpose capabilities of Gaia, and how do they compare to offerings from other companies?

“Getting insight into that bottom of the iceberg is as tough as looking at the bottom of the iceberg.”

Sanjay Poonen: We are patent pending on this because no one has taken this approach, retrieval augmented generation, RAG, to secondary data. Everybody’s been focused on protecting and securing that data. There’s data security, data protection, inventions, scanning algorithms, everything that protects you from the bad guys. But the ability to get insights into a lot of that data that’s sitting there on your platform, and to be able to ask questions, to have a conversation. For example, let’s just say you’re a bank and you have a lot of loan or risk data, that’s old files, and you want to search and summarize that. So on your backup, you should be able to write a query in Gaia that says ‘please summarize all my documents from the past’. Or maybe I sent you an email 10 years ago and I want to get a summary of what I did. So any unstructured data or structured data query on your old data, it’s typically sitting on our platform, and the world’s data is on our platform. If we [Cohesity] secure the world’s data, which is our mission, then writing a query on that data in the past was impossible to get an answer to, because you had to do what’s called rehydration. You took the data out of your system and rehydrated it before you could query it. So if you think of data like an iceberg, the top of the iceberg is primary data. As it ages, it becomes secondary, it goes to the bottom of the iceberg. Getting insight into that bottom of the iceberg is as tough as looking at the bottom of the iceberg. It’s opaque, it’s dark. Before, if you wanted to get insight, you had to take the data out of the bottom of the iceberg, bring it up to the top of the iceberg, and then you could get insight. We changed that fundamentally. That’s what Gaia does.

AIM Research: What were some of the initial difficulties faced in building Gaia, both from a technological standpoint and in terms of organizational structure?

“We are now three or four years ahead of every one of our competitors in using Generative AI to discover insights on business data.”

Sanjay Poonen: None of us knew that this approach was an option until ChatGPT and OpenAI came out last year. So about this time last year, I was playing around with ChatGPT and was amazed that it could summarize some of the speeches I gave 15 years ago in analytics. I was president of SAP and ran the analytics business there. Later on I was COO of VMware and very involved in the end user computing and mobility business. I gave a lot of speeches. Much of that is in text and word form on the web. I wanted to see if OpenAI and ChatGPT could summarize my own speeches. They did a pretty good job. So I came away with the learning that fundamentally this technology is one of the best summarization tools for lots of data. Well, if the world’s data is on our platform, could we summarize it, became my question. So I went to Microsoft and asked, is there an ability of OpenAI (because Microsoft was at the frontier of driving AI and in many senses they still are at the frontier of doing that), can you use OpenAI to basically search and navigate all our data on our platform? And they were like yeah, there’s this technique called Retrieval Augmented Generation (RAG), look at applying RAG to secondary data. That’s where the seed was born.

We went back, but we kept it under stealth because we wanted to be the first to do it. So, we patented that immediately because we know that we may have a two, three, or four year head start on everybody else, but everyone else ultimately could do the same thing. And then we got to work implementing it. Our founding team were like kids in a candy store: they downloaded every computer science paper on RAG, they started reading about it and coding furiously, and here we are. For me it was idea generation, and then we all got to work. Interestingly, there was a person inside our company who was looking to leave to start a company that could do this type of Generative AI on top of Cohesity, so I said no, he’s got to stay here. Greg Statton is his name, and he became a very key founding contributor to this effort.

So, really I call this the most fundamental innovation that Cohesity has worked on since we were founded 10 years ago. Mohit [Aron] founded this company with incredible tech, and we’re deeply grateful to him, but I have not seen an innovative idea like this since we founded the company. So that gave almost like a rebirth moment for the company at about our 10 year anniversary, to do something absolutely phenomenal. Then we started working closely with not just Microsoft, but Microsoft and Nvidia. And you’re going to hear us doing more with Nvidia. Then we started working with Google. So I think in the ecosystem, Microsoft, Nvidia, Google, and Amazon are gonna be very key partners. We are now three or four years ahead of every one of our competitors in using Generative AI to discover insights on business data. Everybody’s using AI for security purposes. We are, others are. That’s like table stakes, that you have to do. We’ve been doing that for years, and are going to keep doing this. But for this use case, retrieval augmented generation, we are several years ahead of everybody else and we want to stay innovative in that way.

AIM Research: How does Gaia differentiate itself from AI co-pilots, despite sharing a similar objective of democratizing data and insights within organizations? What sets Gaia apart and enables it to take a step ahead in this domain?

“The power of a co-pilot is not the ability to build it.”

Sanjay Poonen: The power of a co-pilot is not the ability to build it. Anybody can build a co-pilot. But you have to have the data. Why is Co-pilot so powerful in Microsoft’s hands? Because a lot of their data is email or they have a billion users of Office that they can now easily add that [AI] to. So they have either the tool or the data. Think of us like an Oracle, Snowflake, or Microsoft Office because all of that secondary data is on our platform. We set the vision to secure and provide insights into the world’s data. We have a couple of single-digit exabytes of data on our platform.

With Veritas, we will have hundreds of exabytes on our platform. So that data is sitting on our platform, jointly across Cohesity and Veritas’s data protection business. Imagine now, Gaia is a co-pilot for our data. Being able to use that tool to ask questions on the data that’s in our format is huge and no one else can do it. But then we can also apply that to filers that are sitting in primary data, like Isilon or NetApp. There’s no reason why we could not apply Generative AI there too. And that’s on our roadmap to get done. That opens up a huge avenue of Gaia being a co-pilot type product, to have conversational AI on data sitting on Cohesity or non-Cohesity stores.

AIM Research: How does Gaia address the significant responsibility associated with handling vast amounts of data, particularly in light of ethical considerations and potential legal challenges? What strategic approach does Gaia adopt to ensure the responsible management of such extensive data resources?

“Generative AI is like a proverbial debate about fire – a matchstick.”

Sanjay Poonen: Generative AI is like a proverbial debate about fire, a matchstick. Is it fire that keeps you warm, or is it like kryptonite that arsonists can essentially use to light a fire and potentially burn down a house? It’s both. So, we have to use Generative AI very responsibly. We were one of the first: at the time we started announcing it last year, in April and May, well before we shipped it, we started to talk about this concept of responsible AI.

So what that means is if I have role-based access controls that allow me to ask a question on the data, then I should also be allowed to ask a question about the historical versions of that data. For example, if I’m permitted to ask a question about the risk within a set of documents that I have access to, and I also have access to the historical versions of those documents, when I pose that query through Gaia, it should allow me to do so. However, if I’m not authorized to inquire about current or historical versions of that document, I shouldn’t be allowed to, and that’s how Gaia will operate. Gaia will operate within the role-based access controls, within the guardrails of fine-grained privileges for accessing documents or any other data. Take email, for instance: even though I’m the CEO of the company, I’m not permitted to read other people’s email. During a compliance check, our legal department might be authorized to check particular emails for individuals who may have violated regulations. Even when Gaia is used to read or summarize emails, it will all be done within this responsible AI framework.
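The access rule described above can be sketched in a few lines of Python. This is a toy illustration, not Gaia's implementation or API: the `DocumentVersion` class, the `allowed_roles` field, the sample archive, and the keyword-overlap ranking are all hypothetical stand-ins. The one idea it shows is that the permission filter runs before retrieval, and applies to current and historical versions alike.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentVersion:
    """Hypothetical record for one stored version of a document."""
    doc_id: str
    version: int
    allowed_roles: frozenset
    text: str


def authorized_versions(versions, user_roles):
    """Keep only the versions (current or historical) the user's roles permit."""
    roles = set(user_roles)
    return [v for v in versions if roles & v.allowed_roles]


def retrieve(versions, user_roles, query, top_k=3):
    """Apply the access filter first, then rank by naive keyword overlap.

    A real RAG system would rank with embeddings; keyword overlap is
    used here only to keep the sketch self-contained.
    """
    terms = set(query.lower().split())
    candidates = authorized_versions(versions, user_roles)
    scored = [(len(terms & set(v.text.lower().split())), v) for v in candidates]
    scored = [(score, v) for score, v in scored if score > 0]
    # Highest score first; ties broken by doc_id, then newest version first.
    scored.sort(key=lambda pair: (-pair[0], pair[1].doc_id, -pair[1].version))
    return [v for _, v in scored[:top_k]]


# Made-up sample data for illustration only.
ARCHIVE = [
    DocumentVersion("loan-policy", 1, frozenset({"risk"}), "loan risk limits for 2015"),
    DocumentVersion("loan-policy", 2, frozenset({"risk"}), "updated loan risk limits"),
    DocumentVersion("ceo-mail", 1, frozenset({"legal"}), "email about loan risk review"),
]

if __name__ == "__main__":
    for hit in retrieve(ARCHIVE, {"risk"}, "loan risk"):
        print(hit.doc_id, hit.version)
```

In this sketch a user holding only the `risk` role sees both versions of the loan policy but never the email, while a user holding only the `legal` role sees the email alone. Only the filtered passages would be handed to a language model for summarization, so the model never receives text the user could not have opened directly.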

AIM Research: How does Gaia enhance human intelligence in enterprises by facilitating decision-making and providing access to data and insights, without aiming to replace human intelligence?

“Gaia is a summarization tool of all your historical data with links to the actual documents it got it from.”

Sanjay Poonen: Again, let’s use a metaphor, would you drive a car today without GPS? Does that insult your human intelligence? No, it’s an assistance that helps you get there faster. Gaia has to be viewed like a GPS. It’s a summarization tool of your data. Or you can pick another example, if you were to write a document today would you do it without spell check? Does that insult your human intelligence that you have to spell check? You hope your spelling is right, but you have something helping you.

So that’s how I tend to view it. Gaia is a summarization tool of all your historical data with links to the actual documents it got it from. Do I want a summarization tool that can read thousands of pages and give me a synopsis? I might still then use my human intelligence on top of that summarization to get more. So all of these, to me, are just helpers. If we want to make our value add that we’re going to read a thousand pages of documents and summarize it, you know humans can do better. In the same way, do I want to have a human being going around and manually checking all the streets as a replacement for Google GPS? Not at all.

I want the best mapping tool, so I can listen to music in my car. I shouldn’t have to worry about pulling off to the side and looking at an atlas like in the old days. To me, these debates are often about AI, but if we look at fundamental innovations like the printing press or the automobile, we aren’t going around in horse carriages anymore, we are going around in cars, there are jets flying. All of these are debates that society should have every time there is a new innovation. But we should come back to the fact that these things are not replacing human beings; they’re actually making us a lot more productive.

AIM Research: How does Gaia contribute to enhancing productivity without replacing humans? What strategic support does Gaia offer clients to address adoption and scalability challenges, including issues like data security, privacy, governance, talent acquisition, and cultural integration?

“Gaia stands for Generative AI app”

Sanjay Poonen: I think the approach we’re taking is by workload. We’re starting off with M365, which is things like emails and OneDrive, which are documents in the cloud. And then we’re going to expand to other documents, and then we’ll tailor our approach based on industry and the specific use cases. For instance, when we approach a bank, we ask them: what is the question you would want to put into this tool if you now had access to all of your historical data and documents? It might be summarize my loan data, or the bank’s risk data.

Show me some geoscience data about oil and gas. Show me my medical records. What we want to do is learn what those questions are, take a sample of the customer’s data, prototype what Gaia could look like, and show them. It’s a little bit like an analytical BI tool. Business intelligence tools, typically would take people’s data and then produce pretty dashboards to show you what it looked like. In some senses, Gaia is exactly that. Gaia stands for generative AI app. So it takes your data and summarizes it, and if there are any restrictions on what data it could operate on. It’s operating on unstructured data. So we’re going to cover the entire universe of unstructured data, emails, files, and then we’re going to get to the universe of structured data. And that’s a little bit different because it’s databases and queries.

AIM Research: What long-term vision does Gaia have for its evolution and improvement in functionality? Looking ahead, what are the key areas of focus and development for Gaia over the next five to ten years?

“We are a Snowflake meets Palo Alto type opportunity. That’s both a data play around analyzing data and a security play.”

Sanjay Poonen: For the next year, we want to advance all the workloads and by vertical industry. This quickly becomes a vertical industry pitch. What are we doing in oil & gas? What are we doing in retail? What are we doing in banks? What are we doing in hospitals? We’re in the public sector. I lived that type of thinking at SAP because I ran the industry groups there. Very quickly, you start to have a very relevant conversation for a particular industry and with that customer, and then you can go to every company in that same industry with the same proposition. Once you’ve solved what Gaia could do for one oil and gas company, you can do it for everyone in that sector. It’s not proprietary then because the questions they’re asking on their data are the same questions another oil and gas peer is asking.

So, we will learn a lot about what those use cases are and apply them by vertical industry to everyone in a particular industry: banking, financial services, healthcare, retail, oil and gas, manufacturing. Some are going to be horizontal scenarios across all industries. Compliance is a good example. Email compliance or discovery, e-discovery, is fairly horizontal across all industries. But there are going to be some very vertical, specific data sets. So, this for me tugs a little bit at my heart. I fundamentally believe the opportunity for this industry, data protection, is a security play and an insights AI play. Today most companies have been saying, we use AI for security. We [Cohesity] are a Snowflake meets Palo Alto type opportunity. That’s both a data play around analyzing data and a security play, and that’s a tremendous opportunity for us to go and prosecute, and to do very well.

Anshika Mathews
Anshika is an Associate Research Analyst working for the AIM Leaders Council. She holds a keen interest in technology and related policy-making and its impact on society. She can be reached at