Video footage is everywhere, but most of it sits unused. Spot AI is changing that by turning ordinary security cameras into powerful tools for insight and action. Their plug-and-play platform helps businesses improve safety, efficiency, and operations without replacing a single camera. Behind this vision is Rish Gupta, who made the decision to start his first company during the final 6 kilometers of a marathon on a 100-degree day in Bombay. With no background in business, he scaled it to 60+ employees and millions of users before selling it and moving to the U.S. in 2015. Now, captivated by the videofication of everything, Rish and his co-founders are building Spot AI to be the “easy button” for video, unlocking the richest data source available.
Kashyap: Welcome to another episode of The AIM Media House Podcast: Simulated Reality.
Today, we’re excited to have Rish Gupta, the Co-founder and CEO of Spot AI.
Rish, I’ve been reading up on your journey over the past few days, and it’s truly fascinating. You’ve had a unique path both as a founder and a technologist, and I’m especially curious about what inspired the idea for Spot AI.
But before we get into that, I’d love to hear about your overall experience in Silicon Valley, building startups, navigating the ecosystem, and everything in between. What have been some of the key lessons you’ve picked up along the way?
Rish: It’s been an epic ride. If you think about this seven-by-seven-mile city, it’s probably driving most of the innovation in the world. Every time we look at the city, it’s just amazing to realize that most of the digital products we use, whether it’s X or Google, came from within eyesight or just a short drive from here. So, it’s pretty epic to be a part of that.
The second thing that makes it really exciting is this: in other big cities around the world, whether it’s London, New York, or anywhere else, if you have a completely ridiculous idea, most people will tell you all the reasons why it’s going to fail. But here, you’ll meet people who say, “Oh yeah? Can you think bigger?” And that kind of push is what makes this place really exciting.
Plus, it’s beautiful: gorgeous weather, the ocean, just an amazing place to be.
Kashyap: I completely agree with that. Now, shifting to Spot AI, I’d love to dive into the idea behind it. What was the initial thought process?
Every founder usually draws inspiration from a problem they’ve encountered or something they were trying to solve in a previous role. Was there a particular moment or experience that sparked the idea for Spot AI?
And like with most startups, the original vision often evolves over time as technology shifts and real-world feedback comes in. How did your initial idea transform into what Spot AI is today?
Rish: I think there are three dimensions to think about with a startup. There is the technology wave that you want to ride, which I think is the most important dimension. Then there’s a distribution wave that you can create. And then there’s a product form factor that you build to bring those two together: the technology wave that’s happening, the distribution wave you can create, and the product that connects them.
What I mean by that is, for Spot it was very clear. When the three of us were starting out, the clear indication was that two things in technology were going to hold true, and there were enough proof points that you could project them outwards.
One was that the edge computing stack was going to get better and better. Even in 2018, 2019, and 2020, you could see NVIDIA GPUs improving, and you could deploy more and more capacity on the edge. Capacity was almost doubling every year. Project that ten years out, and you have roughly a 1,000x improvement.
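(For context, the rough arithmetic behind that projection: a capability that doubles every year compounds over ten years to $2^{10} = 1024 \approx 1000\times$, which is where the 1,000x figure comes from.)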
And then second, AI models were getting phenomenally better. Things that five years ago would have required a team of engineers and a supercomputer, or at least a much more powerful machine, could now be run on a MacBook Pro by a single engineer with probably just a couple of hours of coding using open-source models. So you knew the state of AI was advancing at a really rapid pace. Project that again for ten years and you’re like, this is not going to slow down, it’s just going to get better and better. And we’ve seen that happen in the last two years at an accelerated pace.
Once we had those, it was like, what is the most complex data source that we can apply this to? The richest data source available is video. Video is interesting because you cannot put all the video in the cloud because of bandwidth constraints; bandwidth is just not increasing fast enough. That gave us the idea that we could make video data really, really useful for companies.
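As a back-of-envelope sketch of that bandwidth constraint (all numbers here are illustrative assumptions, not Spot AI’s figures), a single site can saturate its uplink long before every stream reaches the cloud:

```python
# Back-of-envelope: why streaming every camera to the cloud breaks down.
# All numbers are assumed for illustration.
cameras = 100          # assumed camera count for a mid-size facility
bitrate_mbps = 4.0     # assumed per-camera 1080p H.264 stream
uplink_mbps = 200.0    # assumed business internet upstream capacity

total_mbps = cameras * bitrate_mbps
print(f"Aggregate video: {total_mbps:.0f} Mbps vs. {uplink_mbps:.0f} Mbps uplink")
print(f"Oversubscribed {total_mbps / uplink_mbps:.1f}x -> process video at the edge")
```

Under these assumed numbers, 100 cameras produce 400 Mbps of video against a 200 Mbps uplink, which is why processing at the edge becomes the default.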
Then it was about, who do we make it useful for? If you think about which industries have the least visibility into their operations, this comes to the distribution point. If you’re in a digital realm or a pure IT world, most of what you’re doing stays in some kind of computer code or metadata on a computer. But if you’re in the physical world, where things are moving, objects are moving, trucks are moving, all kinds of pallets and forklifts and so on, you don’t have digital footprints of that. Cameras allow you to create a digital footprint of that.
That brought us to the conjunction of this video data, which we could make immensely powerful, and this multi-trillion-dollar physical economy. That was the idea: make video data incredibly useful for the physical economy. Then there were a whole bunch of layers of how we productized that over the years into different form factors. As the technology has become more mature, it has led to more and more interesting products.
Kashyap: Video surveillance, especially from the perspective of using it in viable and meaningful ways, has been a problem statement that many have tried to solve. Traditionally, you put humans behind screens to observe security footage. There are big casinos and big manufacturing plants where this kind of surveillance has been used. What does Spot AI bring into that equation? I really want to understand what proactive video intelligence is versus traditional surveillance intelligence with a human sitting behind those sets of cameras.
Rish: We never thought of ourselves from day one as a surveillance company, or thought of the market we were in as video surveillance. I think that was the first market when IP cameras became popular for businesses in the ’90s and 2000s. That was the first use case that people deployed into.
We thought of it, as I said, in terms of the AI and the edge compute stack. This is one of the richest data sources. And if you can create a digital footprint of everything that’s happening in a factory, or in a logistics warehouse, or in a hospital, or in a retail outlet, then what can you do with that data?
So video is just a source of data, and then you think about who you can empower with it, whether it’s the operations team, the safety team, even the security team. That’s how we have approached the market.
And the difference that has made is that it’s not so much about whether we can prevent certain things from happening. Yes, we can. But the approach has always been: what data would make this person’s job better?
So, if you are on the operations team in a retail outlet, knowing the number of transactions you’re having, how many aisles a typical person walks through before they make a transaction, the number of people entering your store at different hours of the day, or how many times someone is at the kiosk when there’s nobody behind the counter to help them: these are all data points that are incredibly valuable.
Same thing for a safety officer. It’s not just about preventative versus reactive measures. It’s about unearthing insights. A typical safety manager can walk around the floor and get a sense of safety compliance, but with cameras, you’re getting 24/7 data on where the biggest risks are in the factory and which parts of the floor tend to have the most problems.
And yes, that leads to proactive measures people can take to avoid safety incidents. But it also means you can use the data to lower your insurance cost. You can use the data to actually measure safety across different factories and create metric-driven decisions for your business.
That’s what we mean. The way Spot has been different from other camera companies is that it’s not about installing cameras, which is what they were about. 70 to 80% of their revenue comes from hardware. The emphasis is on: how many pieces of hardware can I sell? Can I sell you more cameras? If I’ve maxed out on cameras, I want to sell you other surveillance widgets, whether it’s alarm systems, door access control systems, etc.
For us, it’s been: now that you already have these cameras, how do we make these cameras incredibly useful? We’re not in the business of trying to sell you more cameras. We’re in the business of making these cameras incredibly more useful.
Our whole technology stack has been about taking the compute away from the camera, centralizing it, so you can bring any camera onto the platform. That was very, very different from other camera manufacturers. Then we built a whole data pipeline focused on maximizing the value that this data can bring.
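A minimal sketch of what camera-agnostic ingestion can look like in principle (the URL is a placeholder and the inference call is hypothetical; this is not Spot AI’s actual stack): most IP cameras, regardless of brand, expose a standard RTSP stream, so a central box can pull video from any of them and run the compute in one place.

```python
import cv2  # pip install opencv-python

# Pull frames from any RTSP-capable IP camera; the URL below is a placeholder.
stream = cv2.VideoCapture("rtsp://192.168.1.10:554/stream1")

while stream.isOpened():
    ok, frame = stream.read()
    if not ok:
        break
    # Centralized inference would happen here, decoupled from the camera:
    # detections = model.infer(frame)  # hypothetical model call

stream.release()
```

Because the compute lives off the camera, the same loop works whatever brand produced the stream, which is the design choice Rish describes.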
Kashyap: We’ve been trying to understand how your thought process has evolved from product development to scaling, which is crucial. As a media company, we always explore a theme each year. This year, it’s agents.
Every company today is starting to think seriously about bringing in AI agents. Last year, it was generative AI. Before that, it was scalability through MLOps. And before that, it was data engineering. So there’s always a thematic evolution in how state-of-the-art startups like Spot AI adapt to market shifts. When the conversation around agents started picking up, Salesforce hosted an event focused entirely on AI agents. Snowflake and Databricks followed with similar messaging. I was at NVIDIA GTC, and Jensen Huang mentioned AI agents multiple times; it’s clearly front and center.
Have you identified any viable use cases where agentic AI could play a meaningful role in what Spot is building, particularly in the context of video surveillance or operational intelligence? How are you thinking about integrating agents into your existing platform, and what does that unlock in terms of capability or customer value?
Rish: We actually started looking at how to apply agents in the video space almost a year ago, and we were probably the first company in this category to start thinking about it and launching products. If you do a search for who builds video AI agents, Spot AI will be the number one result. That’s because we were the first to do it and probably still among the very few who do it well, or do it at all.
For us, the way we think about agents is very simple. We’re a camera-agnostic platform, which means that if you’re a large company, you can bring in any brand of camera across your different locations onto a single AI platform. This platform is completely open, which means your video data platform can connect to your ERP system, your point-of-sale system, your access control systems. So it becomes part of your business intelligence stack, with video data being a component.
Right now, we’re delivering this data through application layers, whether that’s desktop, mobile, browsers, etc., and users receive data coming from their factories, warehouses, retail shops, and more. But even now, a human still has to look at that data and make decisions. A safety officer, for instance, still has to look through incidents, generate reports, assign safety scores, and recommend improvements. There’s still a lot of manual work involved after the data is received.
The beauty of agent tech is that we can now go to safety teams and say: just give the system a goal, like reducing safety incidents by 10%, and the agent will generate reports, analysis, and recommendations automatically. You can even automate next steps such as sending a warning or coaching message to someone when they violate a policy.
So instead of a person consuming the data and then deciding what to do next, the agent handles that entire workflow. Whether it’s improving safety compliance, reducing operational delays, or securing a property (like preventing loitering after 10 p.m. or tailgating), the agent figures out what needs to happen next and drives you toward that goal.
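As an illustrative sketch of that goal-driven pattern (every name here is invented for illustration; it is not Spot AI’s API), the core loop is simple: a detection comes in, the agent checks it against the policy the team configured, and it triggers the next step itself.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # e.g. "loitering", "no_hard_hat", "tailgating"
    camera_id: str
    timestamp: str

# Hypothetical policy table: which follow-up action each detection triggers.
POLICY = {
    "loitering": "send_security_alert",
    "no_hard_hat": "send_coaching_message",
    "tailgating": "flag_for_review",
}

def handle(event: Event) -> str:
    """Map a detected event to the next step the agent takes on its own."""
    action = POLICY.get(event.kind, "log_only")
    print(f"[{event.timestamp}] {event.kind} on {event.camera_id} -> {action}")
    return action

handle(Event("loitering", "cam-42", "2024-05-01T22:15:00"))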
It completely augments and supplements the team, allowing them to do a lot more. It’s like having an additional team member working alongside you. Just as you see with agent tools in the digital world, like Cursor for engineers, we’re now seeing that kind of augmentation happen in the physical world for safety teams, security teams, and ops teams.
Kashyap: You’ve brought the conversation around to the human-in-the-loop argument. As a founder, do you envision your AI, or AI in general, operating without a human in the loop over time? Or do you always envision Spot AI’s use cases keeping humans in the loop, at least for the final decision-making, especially for critical decisions in sensitive areas where this technology can be deployed?
Rish: Just to clarify, “human in the loop” within the industry can be used in two different ways. One is whether we have humans within Spot who are tagging data and making things happen. Is our AI functioning automatically, or are there actually people sitting in some remote office tagging a bunch of data?
Our AI works really well with more than a billion hours of video data, which we use to fine-tune all our models. So we don’t need a human in the loop to do the detections.
Then comes decision-making on the customer or enterprise end: defining and designing the parameters of what the agent will do. That part is 100% human-in-the-loop. It’s not an open-ended system where you tell it to “go solve safety” and let it run wild. A safety officer still defines the constraints: the behaviors they want to detect and understand in a factory, the kinds of actions they allow, and the kinds of actions they want the agent to trigger. The agent delivers specific kinds of results.
It’s no different from working with any other team member. You’ll look at the first month of results, then the next two months of results, and then start fine-tuning your instructions to the agent. You might adjust the number of behaviors you want to track, or the set of actions you want the system to take or not take, and that evolves with the company.
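As a sketch of what defining those constraints might look like in practice (all field names and values here are invented for illustration), the human-authored configuration could be as explicit as:

```python
# Hypothetical illustration of human-in-the-loop constraint definition:
# a person decides what gets detected and what the agent is allowed to do.
agent_config = {
    "goal": "reduce recordable safety incidents by 10%",
    "behaviors_to_detect": ["no_hard_hat", "forklift_near_miss", "blocked_exit"],
    "allowed_actions": ["weekly_report", "coaching_message"],
    "forbidden_actions": ["disciplinary_escalation"],  # reserved for humans
}
```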
So yes, at the customer end, this is not about replacing human beings. It’s about making them more efficient. If you were an accountant in the 1970s or 1980s, you were manually collecting receipts and making files to track expenses. Then computers came in, then the internet, and now tools like QuickBooks from Intuit. The accountant isn’t redundant; they’re just doing higher-order work now. They’re doing more FP&A, talking to business leaders, analyzing expenses at a deeper level.
What we’re doing is similar. Take the safety officer example. If they had to manually check safety practices, maybe they’d walk the factory floor once a day, more likely once a week, and jot things down on a clipboard or in a file. They’d love to do it more often, but there’s a physical limit to how much they can do. But with 10 cameras or even 100 cameras watching the entire factory, you can have a safety report running every hour. The fidelity of the data you can collect becomes immensely more powerful and empowering to the human being.
Kashyap: Surveillance has always been a polarizing topic. While there are undeniably powerful and positive use cases like preventing accidents in large industrial plants, there’s also a long history of concerns around misidentification, privacy violations, and misuse of data. As a company building systems that rely on cameras and physical data, I imagine this question comes up often. Have you established a clear framework or philosophy around where your surveillance technology should and shouldn’t be used? And more broadly, how do you ensure that what you’re building consistently drives productivity, safety, and positive outcomes rather than enabling misuse?
Rish: First of all, we don’t see ourselves as a surveillance system. We truly see ourselves as asking: what data can we provide to our customers to make their jobs easier?
Once we look at it from that lens, we can start to understand what data is typically more useful to customers and what data could potentially be used in harmful ways.
For example, if someone starts asking us for racial data of people entering their stores or factory floors, or anything that could be discriminatory or stereotype people into groups—we don’t provide that. Can AI potentially detect that? Yes. But we’re not going to explore or expose that kind of data to our customers.
The second point: if you’re a private customer walking into a store and haven’t consented to having your personal information used, we’re not going to link you up to an external database. We’re not going to look at your face and say, “This person is Person X who resides in this location and drives this car,” for marketing purposes. We’re not going to identify you or target you through ads or anything else. That kind of data crosses a line in terms of code of conduct and privacy.
On the other hand, there’s data that’s extremely helpful. For example, if you’re doing a specific job and are expected to be in certain locations, whether from a safety or customer service perspective, we can help determine whether you were at those locations or not. Were you able to take the actions necessary to succeed in your role? We can provide that data. That’s the kind of data we focus on.
Then comes the implementation layer: even if you have the data, how do you implement it effectively within an organization? We follow the “three T’s”: transparency, training, and trend.
Transparency means ensuring the data isn’t just available to a few people at the C-suite level. When we go into organizations, we make our systems available to anyone who can benefit from them, team members included. It’s not just your boss who has access to this data. As a safety officer or frontline worker, you can also see your own safety data. You can see when you’re making mistakes, and also when you’re doing amazing things on the safety or operations side. The data flows both ways; it’s transparent.
Training is the second. We have a great customer success team, and we support our clients in training their teams to use the AI effectively. We’ve seen countless examples where people were initially scared of it, then used it, and found it incredibly useful. It may have taken them two days to learn, but they were glad they did.
Trend is the third. We don’t isolate incidents. We’re not flagging someone for taking one five-minute-long break or for not wearing a safety hat for two seconds. Instead, we look at trends over time. Is this person trending toward being more productive, safer, more helpful to the organization? Or are they deviating from the company’s established standards? These trends help people on both sides understand how to improve.
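A small sketch of that trend-over-incident idea (the numbers and threshold are illustrative only): compare a recent rolling average against a longer baseline, rather than reacting to any single event.

```python
# Flag a worsening trend, not a one-off incident.
weekly_flags = [4, 5, 3, 6, 8, 9, 11]   # safety flags per week for one team

def trending_up(series, window=3, tolerance=1.25):
    """True if the recent average exceeds the earlier baseline by `tolerance`."""
    recent = sum(series[-window:]) / window
    baseline = sum(series[:-window]) / (len(series) - window)
    return recent > baseline * tolerance

print(trending_up(weekly_flags))  # True -> worth a conversation, unlike one bad day
```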
So beyond just providing data, these are the three pillars on the implementation side that really help our customers.
Kashyap: You mentioned earlier that your strategy was shaped by a few core assumptions—like the continued growth of compute power. I recently heard another AI CEO say that the most successful companies tend to have an assumption-to-knowledge ratio greater than one—they make bold bets based on conviction rather than waiting for perfect clarity. With that in mind, what are some of the key assumptions you’re operating under today? What trends are you observing in the industry that are helping shape the direction of your company—whether in how you scale, improve your technology, or go deeper into customer workflows?
Rish: I think the first thing I’m going to say is probably not a huge surprise to people who are deep in the AI space or actively implementing AI applications. You’re moving from a system-of-record software world, where everything you did was stored in some kind of database, to a new paradigm. Your Salesforce, your Atlassian, or other tools connected to databases and triggered actions, and that was a truly useful world.
Now, we’re entering an era where you have to think about what you’re building in the application in terms of: how does it help a team or a specific set of individuals do their job? How can it take parts of the job and do it for them or do it much more intensely than they humanly could? I can only watch a video for five minutes because I have other things to do. But now AI can watch 24 hours of video and give me a summary.
So when you start looking at what a person does day to day and how they spend their hours—you can ask: which of those hours can be minimized, or how can we make the output 10x better? That’s one lens through which applications are now being designed. It’s a product point of view, but it’s what’s going to happen.
The second thing, from a technology side, is this: a couple of years ago, many people believed that the model layer—the models of the world—would create all the applications of the world. But we believed the application layer would continue to matter, as long as the application layer had both an understanding of workflows and proprietary data.
Today, we’re sitting on multiple billions of hours of physical video data—data you can’t find on YouTube or in open sources. This ranges from understanding how humans interact in factory settings, how gravity behaves when objects fall, what happens when two forklifts cross paths—tons of examples that we’re learning from to make our systems better.
So our bet is that over the next few years, the people who understand human workflows better and who have a proprietary data advantage—and become really good at training and deploying models—will have a huge edge.
From that perspective, our goal is not just to get more data, but to go deeper into our customers’ workflows. It’s almost counterintuitive, but our focus is not to get the next thousand customers quickly. Instead, it’s to get the next few hundred customers who are willing to work more deeply with us on higher-value use cases.
If a customer comes to us and says they just want a basic AI veneer, as a business, we almost have to say no. We focus on customers who are going deeper with us, because that’s what allows us to build a compelling technology and company for the future—and to build agents around that.
“Agents” will become a very generic, overused term—like a cloud of workers. But what it comes down to is this: are you able to take something a human being needs to do and either do it much more intensely, or eliminate the need for them to do it entirely—so they can focus on other things?