Tech Disruptors, a podcast series hosted by Bloomberg, invites CEOs, thought leaders and management teams to share their views on disruptive trends and their impact on decision making and strategy formulation. Here, Sravanth Aluru (CEO, Avataar) is in conversation with Mandeep Singh (Senior Analyst, Bloomberg Intelligence). Some interesting excerpts from their conversation:
Mandeep: We host a lot of CEOs and executives on this podcast and most of them have been from publicly traded companies. We were attracted by your background and what you're doing in the space of 3D content. So maybe we can start off with your background. How did you start this company and who are your main backers?
Sravanth: I started my journey as a computer vision fanatic in the early 2000s at Microsoft. Fresh out of IIT-Bombay, an engineering college, I was interested in immersive technologies and their evolution. If you recall, the Xbox used to have something called Kinect, a depth sensor sitting on top of it doing real-time camera AI. It powered technologies like skeleton tracking, allowing a multiplayer gaming use case to be built on top of it. That's where my passion for the convergence of AI and computer vision started, which I think is one of the reasons I started Avataar. After B-school at Wharton, I spent about six years as a tech investment banker, where I saw a lot of organic and inorganic investment happening from the OEM giants and equally from the smartphone ecosystem, social players and others. They started looking at the need for the digital web to shift from the flat device we call mobile today into something more pari passu with the physical experience we have, which has spatial depth. That's when I started realizing that 60% of our sensory perception is visual in nature, and today the digital web is flat. So we are literally not attuning to 60% of our sensory perception as humans. That, I think, validates the need you will see play out in the next decade as Web3 evolves. But that was probably where I really started feeling there was going to be an inflection point. Our technology has come to a point where, photorealism-wise, it is very hard to differentiate manipulated reality from physical reality. That is the Turing-test moment of this shift from 2D to 3D. This thesis started my passion for the space and my decision to leave Deutsche Bank and start Avataar.
I've been very fortunate to have my co-founder working with me. We've been together for the last 10 years and are supported by a very strong leadership team with a persistent focus. My co-founder runs the product, I run the engineering side, and both of us are part of the market-oriented efforts so that we can think of PMF holistically. We have Sequoia and Tiger Global as our institutional backers, and I am grateful to them on this journey, given that we spent about three to four years on R&D trying to create the building blocks of 3D. We built consumer experiences using this power of spatial depth to remove friction from a consumer's journey. We've invested heavily in R&D and raised about $55 million in total. I'm fortunate to say that we have been an inside engineering engine powering a lot of the large names that you see today shifting from 2D to 3D. Whether it is AR or VR or the metaverse, the fundamental building block required is the shift from 2D to 3D, from a discovery perspective and equally from a content perspective.
Mandeep: Is that why you picked the name Avataar? Because everything will be 3D, or can 3D exist without an "Avataar"?
Sravanth: Avataar is a Sanskrit word for incarnation. It is actually not a human "Avataar"; it means incarnating anything again. Our real core purpose is to remove this divide between digital reality and physical reality, such that tomorrow, when they are blended, you can't make out the difference. For us, it was literally creating an "Avataar" of everything that's physical or imaginative. It was effectively saying that we are here to create the building block that bridges the gap between the 2D and 3D formats, which requires reincarnation.
Mandeep: So when you say you are creating the building blocks for 3D content, I'm curious to know why you think there is an inflection point now for 3D content. Is it due to hardware availability or something else? I mean, we've been talking about 3D for years. So why now?
Sravanth: I have been tracking this for two decades, and my personal view is that there are two sides to it: hardware and consumer awareness. From a hardware perspective, back when I was at Microsoft, the Xbox was still a gaming console and was probably in less than 3-4% of total households, which was not meaningful penetration. The shift happened through mobile and mobile AR. In the selfie generation, the camera became very important in driving CPU and GPU capabilities on a mobile phone, primarily driven by the Augmented Reality inflection. That, I would say, is one of the hardware inflections that was needed: a device that's now in the consumer's hand, which means there is real penetration, and that has the hardware capability that's required.
The second evolution is more consumer-driven. As the camera has evolved from selfies to using AI for image recognition, consumers have come up the adoption curve in the last 10 years, and that has been a phenomenal shift, specifically among Gen Z and younger millennials. Today, extending the camera to deliver utility value in a consumer journey is a viable option, which wasn't the case five years ago. You can see the evidence with giants like Snapchat and how they've grown camera-based engagement.
Mandeep: Okay. But I'm curious: where does your technology integrate with what Snapchat's camera is doing?
Sravanth: Let's take an example: any captive platform today that owns a captive supply-demand ecosystem. In commerce, there are collections churning out every season that need to be delivered into consumers' homes, so they can see one of those virtual couches in their living spaces and then make a decision. What Snapchat has done very well is create a camera interface to do that. However, the entire ecosystem behind it still needs to be built. Creating the content is all manual, which leads to the bigger challenge of scaling up.
What we've done is take a neural deep-learning approach, which is perhaps why we are part of the Metaverse Standards Forum and the Khronos Group, consortiums working on creating open standards. There's an AI element where an algorithm today can do what humans were doing yesterday and, in the process, create the building block required to get suppliers 3D-ready to start with. Then comes the next challenge: how do you drop it into a consumer's home? How do you create a friction-free consumer journey with this new dimension of depth that's been added to the camera? That is the other area where we've focused within commerce.
Now, how are we different? We don't think of ourselves as a captive platform. The biggest differentiation for us is that we are a white-label platform, with a very clear focus on being the building-block enabler, an inside engine that solves for the prerequisites before anyone is able to do what they want to do at scale. What Snapchat is doing is the next step, which is creating that impact for the consumers they own and bringing it into a particular use case. We think of ourselves as a horizontal underlying enabler, while each of these captive platforms is a vertical on top.
Mandeep: Could I think of your service as an API call that a Snapchat or Roblox or Epic or Unity or anyone else in this space is making?
Sravanth: We are not yet in the gaming space; we've been primarily focused on utility value so far, so commerce is our core focus. But yes, similar platforms are using us as a SaaS offering, and a better way to think of us would be as an underlying white-label platform-as-a-service that they call within their captive ecosystems.
Mandeep: So what are the chances that they would want to do this on their own, as opposed to using a service or a platform as a service call like yours?
Sravanth: That's a brilliant question. The core focus for us is actually very different from, say, an eCommerce player doing D2C engagement of selling goods. Their business model is about selling goods. What we are really solving for is the supplier challenge for their marketplaces. So what you would see is that the vendors in marketplaces today use us to actually get onto the seller interfaces. That's a pre-step to the marketplaces. You should think of us as a partner, rather than competition, for them.
Mandeep: There is always that risk that if a company is making too many calls to this service, the question comes up: should we outsource it, or should we build it in-house? Where do you fit in that stack?
Sravanth: We recently announced Shopify and BigCommerce integrations. Shopify is a widely used e-store platform that powers the entire stack. What we've done is partner with Shopify to pre-integrate our entire stack within the Shopify system. On the supply side, the ERP integration is already done, and on the frontend side, our renderer goes and sits within the Shopify e-store; then we give the data back to the vendors, for any merchant that we onboard. Anything that you see within a camera for our partners is all by Avataar, and it's on Avataar's cloud. That's why I say we are a service platform: we are the ones who actually own the consumer engagement within the camera, and then give analytics back on consumer behavior to the merchants.
Our front end actually gets integrated into our partners' websites and apps, and then we control it from our cloud. If you were to ask me, it is too early to bet on today's formats being the final formats. There's going to be a lot of consumer-data-driven evolution of the consumer experience, so we need to own the consumer data to evolve the consumer experience forward. From our vision's perspective, we own the front end as much as we own the backend.
Mandeep: So just to tie everything together, it has to start from a camera. Whether that camera is on a phone or on an AR or VR device, that's how your product will be rendered.
Sravanth: We also serve the virtual room through a desktop. So anyone who doesn't have a camera device is shown the same 3D experience, where they can rotate and swivel around a virtual, life-size room. We want to ensure that we serve the consumers who haven't yet shifted into VR (which is around 30% of total consumers).
Snapchat and Deloitte released an industry research report last year, and they observed that in the US there are about a hundred million shoppers using AR, and about 96% of those shoppers say, "I want AR for my next shopping experience."
Mandeep: So are you of the view that AR will hit mainstream before VR?
Sravanth: AR has already hit the mainstream in utility value, across logistics, where camera and computer vision do a lot of the sorting and things like that. From a consumer perspective, you can see it in AR ads. That's a huge industry and has been growing very healthily over the last five years.
Then there is the AR commerce utility value. VR is still early because VR headsets haven't yet reached an inflection point; AR you can experience on a mobile device today.
Mandeep: So Apple can really develop their AR business without even launching a headset. Would that be a fair guess in terms of what they could do potentially with the current smartphones?
Sravanth: They already have. If you look at ARKit, along with RealityKit, it is actually a software platform that allows developers to create 3D experiences on a mobile phone today. And that's been their strategy for the last five years. So yes, you're right; you actually summarized it well, at least from the info I have.
Mandeep: So let's get to the business model. How are you monetizing, or is it too early to think about monetization for what you're doing with that platform-as-a-service offering?
Sravanth: I think we've already covered the journey of utility value and monetization for mobile AR. Today on Shopify, we get 1% of the transaction GMV that goes through any product that has been converted to 3D through our platform. We have been able to do that because we are seeing about a 3X uplift in transactions across geographies today.
We are live with consumer electronics, large appliances, home-improvement and furniture use cases, and equally fashion, accessories, jewelry and sunglasses. With that kind of utility value, our model has been to first drive higher revenues for our clients, and then take a small cut of it, leaving 99% of the value on the table. That's been our flywheel. Once you combine a revenue-growth solution with the cost-savings proposition of the AI we have on the supply side, you're literally talking about the two things that really matter from an ROI perspective.
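To make that flywheel concrete, here is a minimal back-of-the-envelope sketch. Only the 1%-of-GMV fee and the roughly 3X transaction uplift come from the conversation above; the baseline GMV figure is a hypothetical placeholder.

```python
# Back-of-the-envelope sketch of the revenue model described above.
# Only the 1%-of-GMV fee and the ~3X uplift are from the conversation;
# the baseline GMV is a made-up illustrative number.

baseline_gmv = 100_000                      # hypothetical monthly GMV before 3D
uplift_factor = 3                           # ~3X transaction uplift cited for 3D-enabled products
gmv_with_3d = baseline_gmv * uplift_factor

platform_fee = 0.01 * gmv_with_3d           # 1% of GMV flowing through 3D-converted products
merchant_gain = (gmv_with_3d - baseline_gmv) - platform_fee

print(f"GMV with 3D enabled:        {gmv_with_3d:,}")
print(f"Platform fee (1% of GMV):   {platform_fee:,.0f}")
print(f"Merchant's incremental GMV: {merchant_gain:,.0f}")
```

Under these assumed numbers the merchant keeps the vast majority of the incremental GMV, which is the "leave 99% of the value on the table" framing in the answer above.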
Mandeep: So probably what you are doing is what a direct-response ad was doing before in terms of driving conversion, and that's because the 3D experience is more engaging. So I'm guessing the conversion for a 3D-rendered image is higher than for a 2D-rendered image.
Sravanth: We are seeing almost a 200% uplift on average across the categories we are driving, across geographies.
Mandeep: When we talk to Unity and Roblox, this term “Neural Radiance Fields” comes up a lot. Is that the underlying technology you are using or are you using something else for the 3D content that you are rendering?
Sravanth: I'm proud to say that we have developed the first commercial Neural Radiance Fields (NeRF)-based AR tech in the world. Neural radiance fields are the future of 3D because they solve the historical manual-creation problem that the mesh world had. Equally, from a consumer perspective, there's a term called the uncanny valley: the eerie feeling people get when they see a close-to-real, but not quite real, representation. Our subconscious minds are so trained by what our eyes see every day that we catch it, like a subconscious emotion that gets triggered. NeRF does solve that problem, because today's 3D is a cheat code that tries to squeeze the laws of physics around us into equations that can run on mobile devices. NeRF makes 3D as simple as a photo. Anyone can now just take any camera, simply scan a product at home, and get an as-is version of the product. Large-scale merchants can set up a photo studio and not worry about scaling it up, which is the biggest unlock this technology brings for many problems. This technology will democratize 3D and will be looked at as the biggest building-block inflection in the shift from the 2D web to the 3D web.
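For readers unfamiliar with the technique mentioned here, the following is a minimal toy sketch of the core NeRF idea, not Avataar's actual system: a learned function maps a 3D point and a viewing direction to a color and a density, and a pixel is produced by volume-rendering those predictions along a camera ray. The `toy_field` function below is a hypothetical stand-in for the trained network.

```python
import numpy as np

def toy_field(points, view_dir):
    """Stand-in for the trained network: returns (rgb, density) per 3D point.

    A real NeRF uses an MLP fitted to posed photographs; here we fake a soft
    sphere so the sketch runs on its own.
    """
    density = np.exp(-4.0 * np.linalg.norm(points, axis=-1))       # opaque near the origin
    rgb = np.tile(np.clip(view_dir, 0.0, 1.0), (len(points), 1))   # view-dependent tint
    return rgb, density

def render_ray(origin, direction, n_samples=64, near=0.5, far=3.0):
    """Classic volume rendering: accumulate color weighted by transmittance."""
    ts = np.linspace(near, far, n_samples)
    pts = origin + ts[:, None] * direction
    rgb, sigma = toy_field(pts, direction)
    delta = ts[1] - ts[0]
    alpha = 1.0 - np.exp(-sigma * delta)                            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))   # light surviving so far
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)                     # final pixel color

pixel = render_ray(origin=np.array([0.0, 0.0, -2.0]),
                   direction=np.array([0.0, 0.0, 1.0]))
print(pixel)
```

In a real NeRF, the stand-in field is a neural network trained on ordinary posed photos of the object, which is what makes capture "as simple as a photo" in the answer above.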
Mandeep: Then the question is, since you mentioned Google DeepMind, how much of what you have built is proprietary versus using an open-source version of what DeepMind is offering?
Sravanth: The big beauty of Neural Radiance Fields is their deep-learning nature. Unlike a deterministic algorithm, the system learns with every conversion. Any conversion that happens today on our technology or renderer improves the time, cost and quality of the next one. So what we've created is a win-win case for a central platform that gets the benefit of every partner and gives the cumulative benefit back to each of them. That is a very important change from the way this was being done earlier to how we are tackling the problem. I hope that gives you a little bit of comfort on the value we create for anyone who's thinking of doing, say, 2D-to-3D at scale: our platform has the benefit of the data. We are also strong believers in open standards and open source, because our vision at Avataar is democratizing 3D. We are not a captive platform, so our aspirations are different. We want the industry to advance!
Mandeep: In terms of the training data that you have accumulated over time, it is more centered around eCommerce. I mean, going back to your partnership with Shopify, the algorithm works best when you have been training it over time. Would that be a fair assumption?
Sravanth: We have a degree of freedom there, because we learn surfaces and materials, not categories. For our training, we are surface-agnostic and can actually handle any object. So you'd see that today we are category-agnostic from an engineering standpoint and a GTM standpoint. We wanted to prove utility value, so we picked commerce.
Mandeep: So maybe paint a picture for us, 5 or 10 years out. How big do you think this 3D-image-driven commerce could become, and what would you need to do what you're doing at scale?
Sravanth: 3D commerce is a decision that everyone wants to make, and it is becoming a standard. The decision is primarily driven by the ROI proposition. So 10 years down the line, I cannot imagine any catalog not being 3D-enabled; the ability to at least drop that couch into your home will have happened. Today brands might be 3D-ready with 5% of their catalog from a consumer perspective; I think that'll grow to a hundred percent in the next five years. I think there is still a lot to do on the metaverse side of the story, which is the true evolution. If you look at the research from, say, Morgan Stanley, you will see extraordinary numbers being assigned to the industry size.
Mandeep: I guess maybe we can spend one more minute on what we are seeing with autonomous driving and Tesla. They are also doing 3D rendering through all the cameras they have inside the car. Are they using something else for that 3D conversion? Is there a use case for you in that?
Sravanth: There are two ways to do it. One is to take a LIDAR sensor and put in expensive hardware, which can do the 3D scanning through the hardware approach of lasers. The other is a software approach: the ability to use just any camera, with no hardware change, where AI does what is needed to shift from 2D to 3D. I'd say Google right now is trying to do that: keep the cost down yet solve the problem. With Tesla, there are plenty of LIDAR sensors around, so for the flagship product this isn't even a challenge. This is honestly a challenge for the products that are more price-sensitive.
Mandeep: How long do you think it will take for somebody to get started on something similar to what you are doing? What are the other things that you feel someone else could do that you haven't done?
Sravanth: There are a lot of opportunities to use this upgrade of spatial depth for things we do today. A salesperson, for example, could see the background of the person they're talking to, or see their LinkedIn profile. While I speak, the AI might listen to our conversation and start dropping interesting pointers on how to take the conversation deeper and build a relationship. Those things would be super productive. Imagine making someone superhuman in at least one use case. There'll be many such use cases that emerge out of this fundamental shift. Imagine driving, if you could have everything co-located with physical reality: instead of a turn you see on a flat screen today, what if there's a big arrow appearing on the road? So I do think there is a clear upgrade of utility value in digital Web 3.0. It will decentralize the experience such that consumers will, first of all, pull rather than be pushed to, so a lot of the privacy problems, not just in payments but experience-wise, can be solved, and equally you'd see a lot of the Web3 startups start thinking about 3D and the metaverse.
Mandeep: What is one technology or trend that you are most excited about over the next two years?
Sravanth: I think Web3. I do think there is a serious utility value and I think that is the future. In my view, Web3 would be decentralization of the experience, payments, every bit of it, not just one chain, one crypto, or use case. But I think all of those are real and I'm very excited about what that does to the digital world and how we live our lives.
Mandeep: Any misconceptions about the 3D rendering or 3D experiences that you want to clear on this podcast?
Sravanth: I feel most people think it's very expensive; that was yesterday's problem. It's also believed that it's very cumbersome for the consumer experience, or that consumers are still not ready. I think that is also a fact of yesterday and isn't relevant anymore. Enough evidence is available for anyone who is serious about looking into it.
Mandeep: What could go wrong with the assumptions that you're making around the adoption of 3D?
Sravanth: I actually think the hardware side is the real question mark, because you really need to solve wearable tech to make this happen in a real way. Until then, it'll always be a percentage. Like I said, AR is at 30% today. I think that's the real challenge. I don't think there's hardware yet, that I know of, that clearly challenges the mobile phone, and we need that moment to happen in the next five years.
Mandeep: When are you going public?
Sravanth: Hopefully within the decade. We are seeing a lot, but it's too early for me to say. We have aspirations of playing the long game, so that would likely be an outcome of our journey.