Integrating Generative AI and Large Language Models (LLMs) into your data landscape represents a paradigm shift in how organizations apply artificial intelligence to data-driven solutions. These technologies offer unprecedented capabilities to process, understand, and generate human-like text, transforming how businesses interact with data. By bringing them into the data infrastructure, organizations can enhance analytics, automate tasks, and drive innovation across sectors.
Sridhar Ramaswamy, previously a co-founder of Neeva, joined Snowflake after the company acquired Neeva in 2023 and currently serves as Senior Vice President of AI. Before Snowflake, he spent over 15 years at Google, starting as a software engineer and rising to SVP of Ads & Commerce. He holds a Ph.D. in computer science from Brown University.
The interview explores the current market landscape in Silicon Valley amidst challenges like layoffs. It shifts focus to Snowflake, known for its data platform, and its integration of cutting-edge tech like Generative AI and Large Language Models. Topics include secure data practices and trade-offs when using vast internet resources. It covers summit highlights, democratizing data platforms, managing diverse data formats, and Snowflake’s vision for Gen AI and LLMs, ensuring responsible tech use for business value and community progress.
AIM: In the ever-evolving landscape of Silicon Valley, market shifts can be quite the rollercoaster. How do you interpret the current market sentiment, particularly in light of recent challenges such as layoffs and industry downturns? Are we witnessing a resilient bounce back, or do lingering concerns persist within the industry?
Sridhar Ramaswamy: I think the economy is doing okay, even though people have a hard time telling whether inflation has really peaked and whether interest rates have peaked or will begin to come down. So there is cautious optimism regarding the economy. But this is a pretty tough year for startups. There was an article this morning in the New York Times about how many startups are shutting down. The difference between operating in a 0% interest rate environment and a 5% interest rate environment is massive for startups, and that is continuing. Some of the excitement around AI and what that technology can do makes people believe everything is fine with startups. Yes, those companies are raising a lot of money, but soon there will also be a time for them to show the business value of what they are doing. So it’s a pretty tough environment for startups, and I would give the current excitement and ease of raising money in AI at most an 18-month window.
AIM: Snowflake, renowned as the ultimate data platform for professionals in analytics and machine learning, holds a significant place in the industry. With the rise of Generative AI and Large Language Models (LLMs), how is Snowflake incorporating these technologies into its platform? What initiatives are underway in this space?
Sridhar Ramaswamy: The core Snowflake thesis is that having a data strategy and a data platform is a must for every enterprise, and there are companies like Fidelity for which we are the system of record. This is where all their data flows in and how they exchange data with other companies. People used to FTP files if they needed data from another company; now, companies say, hey, we want this done via data transfer, and people are also beginning to build applications on top of Snowflake. We have worked on generative AI since early last year and released one of the best AI-native search engines early this year, so we have long experience in the area.
First and foremost, language models are almost an interaction mechanism between us and the software we use. Remember, if you want to use a website, you must conform to its rules. You grew up in India; you know that we write dates differently in India than they do in the US, and it’s constant confusion.
But to me, those are things that language models address. It is easier to interact with software: it’s easier to get back information rather than just a bunch of links, and you can aspire to get believable summaries, citations, and so on. What we’re doing at Snowflake is building on that. How do we make it easier for people to interrogate the data in Snowflake? We’ve created a platform layer called Cortex, with semantic search and language models built into the Snowflake platform. If you want to create a chatbot on a table, you should be able to do that in minutes. If you want a more sophisticated application, like a copilot to help you generate SQL, we’re working on that. We want to let you take these models, embed them, and create an application for your users. So we think of this as the next logical step, an accelerant for democratizing access to data in Snowflake and in cloud storage. But this is the beginning of a journey. This is when people are beginning to build applications but also asking questions like, okay, what’s the ROI? Does this mean lots more queries? How are we creating value? These are all great conversations.
AIM: How does Snowflake’s emphasis on secure data impact leveraging vast internet resources for applications like ChatGPT, aiming for AGI? Are there trade-offs when aiming for advanced applications in such a secure environment?
Sridhar Ramaswamy: I don’t think there is a trade-off. There is a very important aspect of making sure that language models have linguistic capabilities; that’s where training on the internet and on all of the books we can find matters. At the end of the day, that gives the language model a base level of language understanding: it understands English, Hindi, and German. Now, when you come to an enterprise setting, maybe you want to take one of these base models but fine-tune it on some amount of data. Maybe you’re in the health area, and you have a bunch of documents or web pages with lingo that is specific to your particular industry. That is an easy add-on on top of the base models. And then, you can harness the power of your data to enable AI applications. To give you a simple example, when I think of a chatbot for a particular situation, I think in terms of what is going to ground the chatbot; in other words, what kind of retrieval system you need to run. If a user asks a question, you go back and look at the authoritative sources for that question and use the content of those pages or articles to generate the answer. I think there is very little trade-off. Things work pretty smoothly.
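The grounding loop described here, retrieving authoritative sources first and generating the answer only from their contents, is the pattern commonly called retrieval-augmented generation. A minimal sketch, not Snowflake's implementation: the keyword retriever and the `generate()` stub are illustrative stand-ins for a real semantic index and language model.

```python
# Minimal retrieval-augmented generation (RAG) loop.
# retrieve() and generate() are toy stand-ins for a real
# semantic index and an actual LLM call.

def retrieve(question, documents, top_k=2):
    """Rank documents by naive keyword overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def generate(question, context):
    """Stub generator: a real system would prompt an LLM with the context."""
    return f"Answer to {question!r} based on: {' | '.join(context)}"

def answer(question, documents):
    context = retrieve(question, documents)
    if not context:
        return "Please talk to an analyst."  # refuse rather than hallucinate
    return generate(question, context)

docs = [
    "Q3 revenue grew 12 percent year over year.",
    "The health segment uses industry-specific lingo.",
    "Semantic search grounds answers in authoritative sources.",
]
print(answer("How did revenue grow in Q3?", docs))
```

The key property is that the generator only ever sees retrieved context, so answers stay anchored to the documents rather than to whatever the model learned from the open internet.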
AIM: Could you guide us through the enhancements showcased during the summit this year? In particular, how does integrating generative AI into data discovery platforms democratize usage for a wider audience? Can you illustrate the user journey through the platform for a clearer understanding?
Sridhar Ramaswamy: Many people in the company end up using Snowflake data, but let’s talk about how they use it. For example, a business user might want to know how revenue is doing within a particular sector. I look at a report of Snowflake revenue that comes every morning, updated for the previous day. I click on it; it goes to a Tableau dashboard. I look at some slices; I do this ritualistically every morning at 5 am as soon as I get up. But if I want a change, and it’s not in the dashboard already, I have to go find the analyst responsible for that dashboard and say, hey, I want this other slice, or I want this other bit of information. They, in turn, figure out the SQL that is being used to populate the data in this visualization tool. They tinker with it and make sure it goes through a QA process. Then it gets checked in, and maybe five days later they say, “Hey, the report is updated.” That is the speed at which data consumption and change work today.

Where we want to be is this: perhaps I still get the daily email that says this is what revenue did yesterday, but when I click on it, I get taken to a more conversational interface where I can say things like “slice yesterday’s revenue by our RBPS,” and it generates the SQL and creates that table for me. To me, it’s that sort of interactivity. Remember, you don’t need to be a SQL expert to do this, and you want to short-circuit the multi-day process of creating something like this. Now, this is not a replacement for analysts; they still have to bring in the data, clean the data, and define the right views so that the AI models can generate SQL on the fly and produce answers. But, in essence, I think that’s the difference. Right now, data consumption is very stylized. It works in a certain way. Change works in a certain way.
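One way to picture "generate SQL on the fly over analyst-defined views" is a layer that only emits queries against vetted views and columns. This is a hypothetical sketch, not Snowflake's copilot: the view and column names are invented, and a real system would pair a language model with a semantic model of the schema rather than a template.

```python
# Toy natural-language-to-SQL layer over analyst-curated views.
# ALLOWED_VIEWS plays the role of the "right views" analysts define;
# all names here are hypothetical.

ALLOWED_VIEWS = {
    "daily_revenue": {"revenue", "region", "product", "day"},
}

def slice_revenue(view, metric, dimension):
    """Build a grouped aggregate, but only from vetted views and columns."""
    cols = ALLOWED_VIEWS.get(view)
    if cols is None or metric not in cols or dimension not in cols:
        raise ValueError("unknown view or column; ask an analyst")
    return (
        f"SELECT {dimension}, SUM({metric}) AS total_{metric} "
        f"FROM {view} GROUP BY {dimension}"
    )

# A request like "slice yesterday's revenue by region" becomes:
print(slice_revenue("daily_revenue", "revenue", "region"))
```

Constraining generation to a curated schema is what turns the multi-day analyst loop into an interactive one without letting the model query arbitrary tables.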
You have to conform to the rules of the game with respect to the tools that you use. Powered by generative AI, the promise is that it’s a lot more fluid and a lot more interactive. And in an enterprise setting, it also needs to be a lot more believable. Part of the problem with ChatGPT is that it hallucinates; you don’t know when it’s right or wrong, and in a business setting, people will not tolerate that. So these tools have to become a lot more failsafe. We think about creating these tools so that if the tool doesn’t know the answer to something, it’s way better off saying “please talk to an analyst” than producing something fake. That’s a core part of how we think about AI.
AIM: How does Snowflake handle the challenges posed by diverse data formats, especially unstructured data that might lead to erroneous outcomes or hallucinations? Could you shed light on the strategies employed to manage this multi-dimensional data landscape and ensure robust capabilities within the company to handle varied data formats effectively?
Sridhar Ramaswamy: Our CEO Frank Slootman famously said, “There is no AI strategy without a data strategy.” And I don’t think the hard work of ensuring that data is cleansed, that it is believable, and that it is brought in on time by the systems that create it will go away. I honestly don’t think that hard work goes away.
But once that data foundation is there, we in the AI team make sure there is a high degree of precision in everything we do. What I mean by that is that we never recommend to any of our customers, for example, that they just use a fine-tuned model out of the box, because you can’t trust the output from any language model on its own; it’s not grounded in reality, and it makes up the best answer that it can. So we are building semantic search into the core of Snowflake to serve as a grounding, or retrieval, system. If you ask a question, we first retrieve the documents that are likely most relevant to your question, and then we use the contents of those documents to generate an answer.
Similarly, if you ask us to generate SQL, we will generate SQL only when we are very confident that it is the right SQL. When we do that kind of fine-tuning, when we create systems like this, we emphasize making them safe. Machine learning people call it the precision-recall trade-off: for AI to be believable, it needs to operate at very high precision with as much recall as possible, meaning answer as many questions as you can, but when you do answer, you had better be sure you’re right.
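The precision-recall gating described above can be sketched as a simple confidence threshold: answer only when the model's score clears the bar, otherwise defer to a human. The scores and the threshold value here are illustrative assumptions, not anything Snowflake has published.

```python
# Confidence gate trading recall for precision: refuse to answer
# below a threshold instead of risking a wrong answer.
# Scores and threshold are illustrative.

def gated_answer(candidates, threshold=0.9):
    """candidates: list of (answer, confidence) pairs from a model."""
    best, score = max(candidates, key=lambda c: c[1])
    if score < threshold:
        return "Please talk to an analyst."  # high precision: defer
    return best

# Confident candidate passes the gate; an uncertain one is deferred.
print(gated_answer([("42", 0.95), ("41", 0.60)]))
print(gated_answer([("guess", 0.50)]))
```

Raising the threshold lowers recall (fewer questions answered) but raises precision (the answers given are more likely right), which is the trade the passage argues enterprises will demand.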
AIM: How does Snowflake balance making data-driven technologies accessible to users while ensuring they have the necessary proficiency when utilizing features through Gen AI and LLM? What proficiency level does Snowflake expect from users building models, and how does the platform maintain safety measures for data accuracy?
Sridhar Ramaswamy: Machine learning is a whole other area, almost independent of AI and large language models. If you want to do anything predictive, let’s say you’re in a company and you want to predict next year’s revenue for Analytics India, you have no business asking your language model for the answer to that question. You need to have data scientists look at your data and actually build a predictive model for that revenue. It is really important to understand that language models bring language capabilities and vision models bring vision capabilities. There are lots of other things, like anomaly detection or different kinds of regression models, such as logistic regression predicting the probability of an event. Those are all largely orthogonal to what language models can provide, and those disciplines are not going away anytime soon. Your ability to generate code for some of these, at the hands of a capable programmer, gets enhanced with things like a copilot. That stays as is.

But when the data is there and we create tools to answer questions, we very much envision that essentially every chatbot will have an introduction where it says, “Hey, I’m the revenue chatbot for Analytics India. I can answer these kinds of questions about revenue projections and revenue slices, but I can’t really answer questions about other areas.” This context setting is going to be important, and these chatbots also have to be created in such a way that if they are asked a question they have no idea about, they refrain from generating answers. Chatbots like this have no business answering questions about, say, Indian politics. That kind of frame setting will be an important part of how we create these applications.
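The frame setting described here, a chatbot that announces its scope and refuses out-of-domain questions, might look like the following sketch. The keyword check is a toy stand-in for a real intent classifier, and the class name and topic list are invented for illustration.

```python
# A scoped chatbot: introduces its domain, answers only in-scope
# questions, and refuses everything else rather than guessing.
# The keyword matcher is a stand-in for a real intent classifier.

class RevenueChatbot:
    INTRO = ("Hi, I'm the revenue chatbot. I can answer questions about "
             "revenue projections and revenue slices.")
    IN_SCOPE = {"revenue", "projection", "slice", "forecast"}

    def reply(self, question):
        terms = set(question.lower().split())
        if terms & self.IN_SCOPE:
            return f"(answering in scope) {question}"
        return "That's outside my scope; I can only discuss revenue."

bot = RevenueChatbot()
print(bot.INTRO)
print(bot.reply("What is the revenue forecast for Q4?"))
print(bot.reply("Who will win the election?"))
```

The refusal path is the point: an out-of-scope question gets an explicit decline instead of a generated answer, mirroring the "refrain from generating answers" behavior described above.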
AIM: How does Snowflake envision leveraging Gen AI and LLMs within data-driven technology to ensure both the sustained development of cutting-edge innovation and the responsible use of these technologies for creating business value and advancing the broader community?
Sridhar Ramaswamy: Snowflake, to a certain extent, is in an easier position because we provide business tools, and businesses are making money. It’s a constrained problem, rather than the unconstrained world of people generating any kind of content they want. AI does introduce real risks in the consumer realm: you cannot trust any voice you hear anymore, because it could be AI-generated, and you cannot trust any video you see anymore, because it could also be AI-generated, so it’s going to be hard to tell what is real and what is not. That’s on the consumer side.
The business and enterprise side is more where we play. There’s a lot of work on giving models better reasoning capability and on getting them to be better at generating SQL. Most of the benchmarks out there are not very sophisticated. I’ll give you one example: I’ve met customers who have multiple Snowflake deployments, with individual deployments of 100,000 tables. These are complex situations. So there’s a lot of work to do to make language models better at SQL generation. There’s also a lot of work to do to make them better at what is called API calling, where they can call some other business function to get information about what they should be doing. So the technology is very much in its infancy when it comes to business applications. But we think that by integrating it into the core of Snowflake, making it super easy to use, and giving guidelines for how to create safe applications, we can go a long way in harnessing the power of this technology to bring data to the fingertips of more people in more companies.

We have an AI governance council, and for any models that we put out, we want to ensure they’re safe to use and don’t generate abusive content. That work needs to happen on an ongoing basis across everybody working on language models, and we see ourselves as important and responsible stewards there. But it’s really about having our customers access data more easily and freely, but also more safely and reliably, which I think is the value that Snowflake as a company can bring to the data world.