The interplay of AI, data science, and modern data infrastructure is revolutionizing industries, reshaping how organizations leverage data for decision-making and innovation. This convergence, when seamlessly integrated, unlocks unprecedented opportunities to extract actionable insights from extensive datasets. Beyond enhancing internal processes, this synergy transforms customer experiences through personalized interactions. Staying attuned to this interplay is crucial for organizations navigating the data-driven era. This exploration will delve into motivations, strategies, breakthroughs, and future anticipations associated with this transformative integration.
To learn how this interplay unfolds, this week’s CDO Insights features Chet Kapoor, Chairman and CEO of DataStax. Chet brings over two decades of leadership in the technology sector, with notable roles at pioneering software and cloud companies including Google, IBM, BEA Systems, WebMethods, and NeXT. He steered Apigee to become a prominent technology provider for digital business during his tenure as Chairman and CEO, leading it through a successful initial public offering before its acquisition by Google in 2016. At Google (Apigee), he played a key role in advancing the cross-cloud API management platform, catering to the demands of a multi- and hybrid-cloud environment.
This interview promises to be a deep dive into the strategies, challenges, and future visions of a leader at the forefront of leveraging AI, data science, and modern data infrastructure for organizational success. It is an exploration of the dynamic landscape where technology meets strategic vision, providing valuable insights for professionals navigating similar journeys in the ever-evolving realm of digital transformation.
AIM: Could you briefly introduce DataStax and elaborate on the essential role that cloud and infrastructure play in shaping the dynamics of DataStax, particularly in the context of the evolving Gen AI era?
Chet Kapoor: DataStax was born in 2010 when it was very clear that you needed a database that scaled in a massive way, and other databases were not getting it done. It was a project that came out of Facebook. Jonathan Ellis, one of the company’s co-founders, created this project in Apache called Apache Cassandra. We were born as a company, and we’ve been extremely successful since then.
So let me give you a different view. Your audience probably uses Netflix or Spotify, buys coffee from Starbucks, has an iPhone, and tracks a package on FedEx. All of those things happen based on Apache Cassandra technologies. We participated in making this technology available to the world. Over the last few years, we’ve taken this awesome, scalable database and made it available as a cloud service in the most modern way, by making it serverless, which means separating compute and storage. That has been a phenomenal growth driver for us because people have been going from self-managed to cloud with their mission-critical apps and as part of the mission-critical stack. So, giving them the opportunity not to lose any of the greatness of Cassandra, but bringing it to the cloud, has been something that we’ve been helping customers with. That’s been outrageously successful.
We are geeks; we are technologists. We love what we do and want to be at the forefront of innovation; our customers expect us to do that. We saw OpenAI start taking off and we started talking about it internally in November of last year. In March, we decided that we were going to pivot the company. We started as a database company, then became a database as a service and cloud company, and now we’re a Generative AI company. That has been a welcome change for our customers and, more importantly, our developers. Our goal is to give them all the flexibility of a vector database built on Apache Cassandra, as well as make sure that it is available in their ecosystem because developers use tools and we need to make sure that our products work well with those tools.
AIM: Transitioning from a database-centric approach to the Gen AI era, how does your company navigate this shift? What factors position you to seamlessly integrate modern-day infrastructure with the demands of Gen AI, recognizing the critical interplay between the two?
Chet Kapoor: It was clear to us earlier than most – not all, but earlier than most – that people will start building enterprise Gen AI apps beyond ChatGPT. There’s a promise of LLMs, which is beautiful for the world. But there will be an aspect of this where people will leverage LLMs to build enterprise apps. What does DataStax do? We have enterprise data. So we were convinced that for the enterprise customers that we work with and for developers that build apps, they needed to combine LLMs with enterprise data because you’re not going to fine-tune personal data about chat in OpenAI; you’re going to keep that separate, but you need to combine the two to give you accurate answers. And so our take was that we needed to support vector capabilities because that’s the embeddings of the language that large language models “speak.” And how do we bring other techniques like RAG together so people can build accurate and relevant applications?
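To make the idea of combining LLMs with enterprise data concrete, here is a minimal sketch of the retrieval step in RAG. It is an illustration only, not DataStax’s implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, and all names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents,
                    key=lambda doc: cosine(query_vector, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Combine retrieved enterprise data with the question for an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical enterprise data that stays outside the model itself.
DOCS = [
    "Order 1001 shipped on Monday and arrives Thursday.",
    "Refunds are processed within five business days.",
    "Support is available by chat from 9am to 5pm.",
]

if __name__ == "__main__":
    print(build_prompt("when are refunds processed", DOCS))
```

In a production setup the prompt would be sent to an LLM and the similarity search would run inside a vector store; the sketch only shows how private data is kept separate from the model and joined with the question at query time.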
AIM: Amidst the shift to generative AI and providing infrastructure as a service, what challenges has your company faced, especially considering the recent move from theoretical discussions to actual production applications?
Chet Kapoor: Let’s start with building the Gen AI apps. You need to worry about three things. One is it’s got to be easy to use. That means a developer’s experience needs to be easy. It will fail to work if it’s too hard. Ease of use becomes really, really important. The ease of use is not about the graphical user interface. It’s about APIs and catering to specific languages, like JavaScript and Python. We are making sure we’re working with the right frameworks, such as LangChain, LlamaIndex, and others. The third thing is to ensure we have an easy-to-use vector database.
Let me go back even further. Predictive analytics did not take off. Not that it wasn’t used, but it didn’t become a category by itself that changed the world. And it didn’t, because it was not easy to use. You need to have very specialised skills to do predictive analytics. So, every wave that has happened before this has made it easy for developers to build applications. So that’s the first one.
Secondly, you need to make sure you have production features. You need to ensure scale, throughput, and availability, but the most important thing on the production piece is price performance. Budgets will not double, so they need to fit in an envelope. You will have additional spending, but they must fit in, so the production component becomes important.
The third one is interesting. I’ve been doing this for a while. Generally, when you build apps, you think about ease of use and scale, or ease of scale and production, or ease of use and developer experience. There’s a new metric that you need to worry about called relevance. So now, developers have to think about what F1 is, what precision is, and what recall is. They don’t need to know all the details of it, but it becomes a new metric. The learnings from our developers have been that it is a combination of making it easy, making sure they can get a project into production, and making sure the responses are relevant.
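For readers unfamiliar with the relevance metrics mentioned above, the following is a small sketch of how precision, recall, and F1 can be computed for one retrieval result. The document IDs and the helper name `precision_recall_f1` are illustrative assumptions, not taken from any DataStax product.

```python
def precision_recall_f1(retrieved: list[str], relevant: list[str]) -> tuple[float, float, float]:
    """Score one retrieval result against a known set of relevant items."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)

    # Precision: what fraction of the returned items were actually relevant?
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of the relevant items were returned?
    recall = hits / len(relevant_set) if relevant_set else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical example: four documents returned, three truly relevant.
    p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here two of the four returned documents are relevant, so precision is 0.50, while two of the three relevant documents were found, so recall is about 0.67; F1 balances the two into a single number.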
AIM: With no specific roles like Gen AI developers a year ago, how did you shape a team capable of handling the interplay between foundational models and modern infrastructure at DataStax? What strategies fueled the creation of roles across a diverse stack, given the sudden emergence of Gen AI and the demand for adaptive data and analytics professionals?
Chet Kapoor: I have been through a few waves before: I saw client-server; I saw the web; I’ve seen mobile; I’ve seen the cloud happen. Whenever you’re at the wave’s beginning, the iteration speed is important. And so you have to make sure you’re focused on it. One of the most important things was to start taking an outside-in view of everything. What does the developer need? And more importantly, which developers are the ones that are going to be the kingmakers or the change makers?
The second thing beyond the developer needs was that the velocity of change was going to be very high in the beginning. So it was important for us to pick a team of deep computer scientists who understood AI and Gen AI but also very much understood that things would change quickly. And we are catering to the developer.
Now, let me point out something on the developer piece—we sometimes get nuanced on personas as technologists and as folks in the industry. And we sometimes end up going after a small base of developers, data engineers, for example. How many data engineers are in the world? Less than a million? 800,000? Maybe 2 million. How many JavaScript developers are in the world? 20 million?
I think it’s really important to understand that if Gen AI is going to be successful, it needs to be something that JavaScript developers will have to do. And that was our perspective on this. This is not to say that data scientists cannot use Python to do this; MLOps cannot use IT to do this. But this wave will not be successful if you don’t make it available to a broad set of developers. Why was the web successful? Because of HTML. Why was mobile successful? Because an average developer could use Xcode to make it happen. Why was the cloud successful? Because it was so accessible. You could go to AWS and say I want to pay by the second. That is what Gen AI needs to become, not an esoteric technology that only a subset of the entire technical population understands.
AIM: For companies venturing into the realm of combining database services with Gen AI, what concise advice would you give, considering the compute-intensive nature and diverse data demands? Drawing from your experience of leveraging strengths in this arena, what insights can you share for those initiating a similar journey in this dynamic landscape?
Chet Kapoor: Let me start with one statement. Don’t fake it. You cannot read and learn and become an expert. You have to have experience and you have to have lived it. And it is very clear to me that the reason why we will be successful is because we’ve been building databases as a service for a long time. We understand the mechanics of data, but the other part is we also understand the mechanics of what it takes to build apps. And that’s my personal experience with a lot of the leadership team and that’s why we are able to bring this together and take advantage of it. So if you’re running a product company and you have experience in one or the other, make sure you get experienced people who have done the other part that you haven’t done because you won’t be able to fake it. This market is going really fast. And people need real products to solve real problems.
AIM: In the ever-changing realm of AI and data science, where do you anticipate the interplay with infrastructure heading in the next one to two years? Looking further, what are your predictions for the next five to ten years, acknowledging the inherent unpredictability in the AI landscape?
Chet Kapoor: I’ll give you two different answers. One is, from a consumer perspective, you’re going to see Gen AI show up everywhere. It’ll show up on your phone. It’ll show up in search. It’ll show up as ChatGPT. The other day, I heard that somebody is using ChatGPT to brainstorm, so they put GPT on listen mode. It records the entire conversation and then gives them a summary of what their thoughts were. So it’s seeping into our lives all the time. It has certainly dominated my dinner conversations.
On the enterprise side, people will start with incremental use cases. They will do things like chatbots and copilots, which are great. But the real fun starts when they do transformative use cases. When they start talking about people with agents rather than just people by themselves. When they start thinking about new revenue models, things start changing. I see a world five or ten years from now where it is not people versus AI but people with AI versus people without – and the people with AI will win, just like people with paper maps are less efficient than people with Google Maps. And we’ve already proven that again and again.
This will show up in everything we do. It is going to transform the P&L of a lot of different companies. It is going to change how they work. It is going to change how they partner and how they serve their customers. It is probably one of the most significant technology or industrial revolutions that has ever happened. Over five or ten years, we will realise what we’ve realised with every industrial revolution: It helps us become more productive. What happens when a company becomes more productive? It makes more money. What happens when it makes more money? It hires more people. So, guess what people are concerned about? People are concerned about job losses. There will be some as we transition, but if productivity increases, so will jobs. I’ve been excited about that for some time. People will have to retrain and reskill, things like that. But we as a society, as a human race, have figured this out many times before in the last 500 years.
AIM: In the shift towards robust infrastructure and Gen AI, what are your thoughts on job dynamics? Historically, automation led to more jobs, but with the rise of intelligent automation, could increased productivity concentrate wealth without creating additional employment opportunities? How do you envision the future of work in this context?
Chet Kapoor: It will always require human judgment. It will. So, let’s take a quote from Vinod Khosla recently. He said between five and ten years, AI will be able to do 80% of 80% of jobs. I don’t know whether it’s five, ten, or twenty years; they will do a percentage of the jobs, just like I talked about maps as an example. I used to be able to read a map, and now I don’t need to. I talk about the location, and it goes and does it. That’s easier for me because I can do other things.
The same thing will likely happen with many other tasks we do regularly. 20% of jobs won’t be replaced, and for 20% of any particular job, you will still need human interaction. And when that happens, we as a society or human race will figure out a way to do other things that we can innovate on, to create and prosper like we did when we went through other industrial revolutions.
I am bullish about working through the ethics, governance, and societal issues related to these things because we have in the past. Why will this be any different? We’ll figure this out in partnership with regulating bodies, and we are now starting to engage much earlier than in the past. Some will say it will stifle the competition, and yes, it might. But at least we’re engaged in a conversation, in a dialogue on figuring out: How good is artificial intelligence? When is it coming? How does it affect people? How does it affect economies? All those conversations are happening today.
My closing thoughts are really simple. There is no AI without data. We love where DataStax is positioned because we bring data to AI so that you can create more relevant apps. The second thing is facing the issues we have with this new industrial revolution head on, because it’s people with AI versus people without, and you never want to be on the other side. You don’t want to be on the backside of this. So, I continue to be super excited.