AI Tech Stack

Understand AI Tech stack via Rufus

Dec 18, 2025

Let’s understand the AI tech stack through an example: Amazon’s Rufus.

When you ask Rufus, “What are the best running shoes for flat feet?”, it doesn’t just guess. The request passes through all five layers of the AI tech stack.

Layer 1: Infrastructure

Running AI is expensive. Every time you ask a question, a massive server has to think.

Standard computer chips (CPUs) that run your laptop aren't good at the type of math required for modern AI. For that, you need Graphics Processing Units (GPUs). These chips were originally designed for video games, but it turns out they are perfect for training huge AI models.

NVIDIA designs the GPUs that power almost the entire AI revolution, but some players, like Google, are designing their own chips.

Amazon has done something similar: vertical integration.

Amazon doesn’t just buy chips; they design them. Rufus runs largely on AWS Trainium and AWS Inferentia chips.

Trainium: Custom silicon designed primarily to teach the model (training).
Inferentia: Custom silicon designed solely to run the model (inference) when you ask a question.

By controlling the hardware, Amazon lowers the “cost per query” significantly compared to competitors who rely entirely on generic NVIDIA GPUs.

The bottom layer of the stack is an economics game. The winner is the one who can generate intelligence at the lowest cost per watt.

Layer 2: Models (Intelligence)

This is the layer that supplies the actual intelligence. A model is like something that has read the internet and learned to predict the next word in a sentence.

There are different models, such as GPT-5, open-source models like Llama, and Google’s Gemini 3.

Rufus uses a composite model approach.

Amazon trained a custom Large Language Model (LLM) specifically on its own data—billions of product descriptions, reviews, and Q&A threads. This model is the “shopping expert.”
For complex reasoning or conversational flow, Amazon leverages general-purpose powerhouse models like Anthropic’s Claude or their own Amazon Nova family.

Layer 3: Data

When you ask Rufus a question, it doesn’t answer immediately. First, it queries the Amazon Catalog and the Customer Reviews Database. Why is this required? Because the model was trained months ago. It doesn’t know that the price of a Sony TV dropped 10 minutes ago or that a new review was posted today complaining about battery life. Answering from stale training data alone would give users a poor experience. Grounding the answer in retrieved data also reduces the risk of hallucination.

This is often done through a technique called RAG (Retrieval-Augmented Generation). Before the model answers your question, the system quickly searches a private database of documents, finds the relevant information, pastes it into the prompt, and then asks the model to answer based on that context.

How RAG (Retrieval-Augmented Generation) Helps

Retrieval: When you ask a question, the system doesn’t send it to the AI yet. First, it searches the Amazon Catalog to find the exact product specs and the Reviews Database to find customer opinions relevant to your specific question (e.g., “waterproof”).
Augmentation: The system takes that fresh data (today’s price, the “in stock” status, and the top 5 relevant reviews) and pastes it into a hidden instruction for the AI.
Generation: Now the AI answers. It isn’t reciting from memory; it is reading the “cheat sheet” the system just gave it and summarizing the facts.
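
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Rufus is actually built: the Product class, the CATALOG data, the keyword matching, and the llm callable are all hypothetical stand-ins for a real catalog, a real retrieval system, and a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Product:
    name: str
    price: float
    in_stock: bool
    reviews: list[str]


# Hypothetical stand-in for the product catalog and reviews database.
CATALOG = [
    Product("TrailRunner Boot", 89.99, True,
            ["Fully waterproof in heavy rain", "Runs a bit narrow"]),
    Product("City Sneaker", 59.99, False,
            ["Comfortable but not waterproof", "Great for walking"]),
]


def retrieve(question: str, top_k: int = 5) -> list[tuple[Product, list[str]]]:
    """Retrieval: keep reviews that share a meaningful word with the question."""
    # Naive keyword match; words of 3 letters or fewer are skipped as filler.
    keywords = {w.strip("?.,!").lower() for w in question.split()}
    keywords = {w for w in keywords if len(w) > 3}
    hits = []
    for product in CATALOG:
        relevant = [r for r in product.reviews
                    if keywords & {w.lower() for w in r.split()}]
        if relevant:
            hits.append((product, relevant[:top_k]))
    return hits


def augment(question: str, hits: list[tuple[Product, list[str]]]) -> str:
    """Augmentation: paste the fresh facts into a hidden prompt."""
    lines = ["Answer using ONLY the facts below.", ""]
    for product, reviews in hits:
        stock = "in stock" if product.in_stock else "out of stock"
        lines.append(f"{product.name}: ${product.price:.2f}, {stock}")
        lines.extend(f"  review: {r}" for r in reviews)
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Generation: the model answers from the prompt, not from memory."""
    return llm(augment(question, retrieve(question)))
```

Calling answer("Which shoes are waterproof?", llm=some_model) would hand the model a prompt that already contains the current price, the stock status, and only the reviews that mention waterproofing. A real system would replace the keyword match with a proper search index.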

To find that data, it uses vector search:

Vector Search: It converts your question into math (vectors) to find products that feel similar to your request, even if they don’t use the exact same keywords.
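
To make "finding products that feel similar" concrete, here is a small sketch of the idea using cosine similarity. The three-number vectors are made up by hand for illustration; in a real system an embedding model produces vectors with hundreds of dimensions, and a vector database does the search at scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return 1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption, not real embeddings).
# Dimensions loosely mean: [arch support, cushioning, formality].
PRODUCT_VECTORS = {
    "Stability running shoe": [0.9, 0.7, 0.1],
    "Minimalist racing flat": [0.1, 0.2, 0.1],
    "Leather dress shoe": [0.2, 0.1, 0.9],
}


def search(query_vector: list[float], top_k: int = 2) -> list[str]:
    """Rank products by how closely their vector matches the query vector."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in PRODUCT_VECTORS.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Pretend this is the embedding of "running shoes for flat feet".
query = [0.8, 0.6, 0.0]
results = search(query)
```

With these toy numbers, the stability shoe ranks first even though its name never mentions "flat feet", which is the point of searching by meaning rather than by keyword.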

Layer 4: Orchestration

Users don’t just ask questions; they want actions. ‘Add this to my cart’ or ‘Compare these three.’ A raw AI model cannot click buttons or navigate a website.

This layer uses ‘Agents’—software that can use tools. Amazon uses an orchestration framework (likely built on Amazon Bedrock Agents) to break your request into steps:

Intent Recognition: The user wants to compare items, not buy them.
Tool Use: The AI triggers a code function: compare(A, B).
Formatting: The system takes the raw data and structures it into a UI card rather than a wall of text.
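
The three steps can be sketched as a tiny dispatcher. Everything here is a hypothetical illustration: the SPECS data, the function names, and the keyword-based intent check are assumptions, and none of this is the Amazon Bedrock Agents API. In a real agent, the model itself decides which tool to call.

```python
from typing import Callable

# Hypothetical product data the tools can read.
SPECS = {
    "Shoe A": {"price": 89.99, "weight_g": 280},
    "Shoe B": {"price": 74.50, "weight_g": 310},
}


def recognize_intent(message: str) -> str:
    """Intent recognition: a keyword rule standing in for the model's decision."""
    text = message.lower()
    if "compare" in text:
        return "compare"
    if "add" in text and "cart" in text:
        return "add_to_cart"
    return "answer"


def compare(a: str, b: str) -> dict:
    """Tool: fetch the raw specs for two products."""
    return {name: SPECS[name] for name in (a, b)}


def add_to_cart(item: str) -> dict:
    """Tool: pretend to add an item to the cart."""
    return {"added": item}


# Registry of tools the orchestrator is allowed to call.
TOOLS: dict[str, Callable[..., dict]] = {
    "compare": compare,
    "add_to_cart": add_to_cart,
}


def format_card(intent: str, data: dict) -> dict:
    """Formatting: wrap raw data in a structure the UI can render as a card."""
    return {"type": f"{intent}_card", "data": data}


def handle(message: str, items: list[str]) -> dict:
    """Run the full loop: intent, then tool, then formatted result."""
    intent = recognize_intent(message)
    tool = TOOLS.get(intent)
    if tool is None:
        # No tool matches, so fall back to a plain text reply.
        return {"type": "text", "data": message}
    return format_card(intent, tool(*items))
```

For example, handle("Compare these two", ["Shoe A", "Shoe B"]) returns a compare_card holding both products' specs, which the interface can render as a table instead of a wall of text.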

Layer 5: Application Interface

A chat box is often a bad interface for shopping. You don’t want to read about a shirt; you want to see it.

Rufus doesn’t just output text.

If you ask for recommendations, it slides up a Carousel of products.
If you ask for a comparison, it generates a Comparison Table.
It suggests Follow-up Questions (What about battery life?) to guide users who don’t know what to ask.


About Author

Shailesh Sharma | LinkedIn

I help PMs and business leaders excel in Product, Strategy, and AI using First Principles Thinking. For more, check out my Live cohort course, PM Interview Mastery Course, Cracking Strategy, and other Resources.


A guest post by

Apoorva Mittal

Product Manager

Technomanagers
