Edge AI: how to make the magic happen with Kubernetes

Sci-fi experiences here today — in your industry

Picture this: You walk into a Macy's store, taking a break from the virtual world to indulge in the tactile experience of brick-and-mortar shopping.

As you enter, a tall and broad-shouldered Android named Friday greets you warmly.

"Welcome back, Lucy. How are you doing today? Did you like the pink sweater you bought last week?"

Taken aback by the personal touch, you reply, "I'm doing well, thank you. I'm actually looking for a blouse to go with that sweater."

Friday engages you in conversation, subtly inquiring about your preferences, until finally, the smart mirror built into its torso lights up. It scans your body and overlays selected clothing options on your avatar. Impressed?

Now, with a nod of your head, one of Friday's minions retrieves your chosen attire and some matching accessories, and leads you to the next available dressing room.

Sci-fi has been dreaming (not always positively) about this kind of thing for decades — can you believe that Minority Report came out more than 20 years ago?

But at last the future is here. This is the evolving reality in retail, healthcare, agriculture, and a wide range of other sectors, thanks to rapid maturation of artificial intelligence (AI). From self-driving cars and other autonomous vehicles, to fruit-picking robots and AI medical imaging diagnoses, AI is now truly everywhere.

AI: the magic behind the curtain

As enchanting as the customer experience in our Macy’s scenario seems, the technological orchestration behind it is no less impressive.

When we say ‘AI’ we are probably talking about the seamless integration of so many different technologies:

Text-to-speech (TTS), to convert Friday’s script and product names into spoken audio
Speech-to-text (STT), to recognize your responses and store them
Object detection, to recognize apparel and customers
Natural language processing, to extract meaning from your spoken responses
Image generation to create sample outfits from prompts

And of course behind it all, an up-to-date, vectorized database of available store merchandise and customer records.

When I said that AI has been maturing rapidly, I wasn’t kidding. These complex capabilities are now really in reach of every business, including yours, thanks to a blooming landscape of open source technologies and simple SaaS products.

I recently built a demo mimicking this interactive personal shopping experience. It’s not quite Friday, and it’s not in an android body — but it is in your browser for you to try out right here.

Our example smart shopping assistant shows how AI can transform the shopping experience.

By the way: as someone who doesn’t code as much as I used to, I was still able to implement this entire stack in less than a day, and most of the time was wrestling with CSS to make rounded corners! The actual TTS, STT, and LLM portions were relatively easy, using the amazing OpenAI APIs.

Of course in the real world I probably wouldn’t want to send my corporate data into the bowels of OpenAI, which is why I strongly urge you to check out our open source LocalAI project, which enables you to run the whole lot on-prem.

AI’s true home is on the edge

Today, training and deep learning of models happens at huge expense in clouds and data centers, because it’s so computationally intensive. Many of the core processing and services we talk about above run on cloud services, where they can easily scale and be managed.

But for actual business-critical inferencing work in the real world, retail assistants and other AI workloads probably shouldn’t live in the cloud. They need to live at the edge. In fact, we believe the natural home for most AI workloads will be running at the edge of the network.

Why? Distributed computing puts processing power and compute resources closer to the user and the business process. It sidesteps several key challenges:

Connectivity: AI workloads can involve a huge amount of data. Yet, particularly in rural or industrial use cases, internet connections at edge locations are often intermittent or slow, presenting a major bottleneck. And 5G rollouts won’t fix that any time soon. If you process the data on the device, you don’t need to connect to the internet.
Latency: Even if you have connectivity to the DC or cloud, latency becomes a factor. A few hundred milliseconds might not sound like much, but in a real-time interaction, it's an eternity. Running AI workloads at the edge enables real-time experiences with almost instantaneous response times.
Cloud costs: Hyperscalers charge to ingest your data, move it between availability zones, and extract it again. Across millions of AI interactions, these costs add up.
Data privacy: Many AI workloads will gather sensitive, regulated data. Do you really want your body measurements and shopping history floating around in the cloud? With edge computing, your personal sensitive data is processed locally right there on the edge server and that’s where it can stay, if compliance demands it.

But edge introduces its own challenges…

Anyone who has tried deploying edge computing infrastructure at scale, whether on Kubernetes or another platform, will tell you that it’s hard work.

You’ll be confronted with challenges around deploying and onboarding hardware, on an ongoing basis. How can you get your Friday android booted and active when there’s no IT expert in the store to ninja the CLI?

You have to address security, when devices may be vulnerable to physical tampering. At minimum you need to consider encryption and a means of verifying device trust, at boot and beyond.

And you need to manage hardware and software stacks at scale, from monitoring to patching. This itself is a major challenge when considering the hardware limitations of small form-factor edge devices, and the intermittent connectivity between the device and management infrastructure back at HQ. You need an architecture with zero-risk updates and the ability to run effectively in an air-gap environment.

…and AI amplifies the challenges of the edge

Adding AI into edge environments introduces further layers of complexity. It’s not just infrastructure you need to manage now, but also:

Models: Your data scientists have an overwhelming number of ready-made datasets and models to choose from on popular repositories like Hugging Face. How can you help them quickly try out and deploy these models — and then keep them up to date on a daily or weekly basis?

AI engines: Engines such as Seldon, BentoML and Kserve need constant maintenance, updates, and tuning for optimal performance. Updating these across many locations becomes tedious and error prone.

AI models process incoming data in real-time, turning raw inputs like voice commands or sensor readings into actionable insights or personalized interactions. AI engines such as Seldon, BendoML and Kserve run those AI models. Think of it like this: AI models are workloads, and the AI engine is the runtime in which these models are executed.

Solving the challenges of edge for your AI workloads

This is the problem space we've been attacking as we build Palette EdgeAI, announced today.

Palette EdgeAI helps you deploy and manage the complete stack, from the edge OS and Kubernetes infrastructure, to the models powering your innovative AI apps.

Without diving too deep into the feature list, EdgeAI enables you to:

Deploy and manage your edge AI stacks to edge locations at scale, from easy hardware onboarding options to repeatable ‘blueprints’ that include your chosen AI engine.
Update your edge models frequently without risk of downtime, with easy repo integration and safe, version-controlled updates and rollbacks.
Secure critical intellectual property and sensitive data, with a complete security architecture from silicon to app, including immutability, secure boot, SBOM scans and air-gap mode.

The integration of AI and edge computing is not just an intriguing possibility; it's a necessity for the next leap in technology and user experience. As we stand on this exciting frontier, one thing is clear: the future of shopping, healthcare, business and many other aspects of life will be smarter, faster, and more personalized than ever.

There are edge AI use cases coming for your industry, too. The growth forecasts are stratospheric: the global Edge AI software market is set to grow from $590 million in 2020 to $1.83 trillion by 2026, according to MarketsandMarkets Research.

Ready to be a part of this future? Of course you are. But the benefits of AI will only be in your grasp if you can tackle the challenges of the edge. That's where we can help. So why not take a look at Palette EdgeAI and let us know what you think?

Tags:

Edge Computing

Enterprise Scale

Operations

Cloud