Amazon SQL Interview: 10 Scenario-Based SQL Questions for Data Analysts and Data Engineers

Preparing for a SQL interview at Amazon or a similar data-driven company? Expect scenario-based questions that test how you think with data, not just how well you remember syntax. This guide walks through 10 real interview-style problems that cover core concepts like window functions, CTEs, recursive queries, ranking, and time-based analysis. You will see how to break each problem into steps, understand why the approach works, and learn common pitfalls to avoid. We will use simple SQL Server style syntax so you can focus on ideas first.

We will break down each question step by step, just like in the video.

 

Question 1: Calculate Running Total of Quantity Sold per Product in Each Month

Problem Overview

You have a sales table with product_id, quantity, and sale_date. The goal is to produce a running total of quantity sold per product for each month. This is a classic use of window functions.

Step-by-Step Solution

  1. Select product_id and extract the month from sale_date.
  2. Compute the running total using SUM over a partition of product and month, ordered by sale_date.
  3. Include sale_date and product_id in the output for clarity.

Example result idea: if product_id 2 has multiple orders in July and August, you will see a fresh running total for July and a fresh one for August.

SELECT
  product_id,
  CAST(sale_date AS date) AS sale_date,
  DATEFROMPARTS(YEAR(sale_date), MONTH(sale_date), 1) AS month_start,
  SUM(quantity) OVER (
    PARTITION BY product_id, YEAR(sale_date), MONTH(sale_date)
    ORDER BY sale_date, product_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales
ORDER BY product_id, sale_date;

Key Takeaway

  • Ordering by sale_date ensures the running total grows in time.
  • Partitioning by product and month resets the total each month.

Question 2: Identify Maximum Consecutive Login Days per User

Problem Overview

You have a login table with user_id and activity_date. Find each user’s longest consecutive login streak. This is a classic gap and island problem.

Step-by-Step Solution

First, assign row numbers per user ordered by activity_date. Then subtract that row number from the activity_date to form a stable group key for consecutive days. Finally, count streak length per group and pick the maximum per user.

WITH rn AS (
  SELECT
    user_id,
    CAST(activity_date AS date) AS activity_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) AS rn
  FROM user_logins
),
grp AS (
  SELECT
    user_id,
    activity_date,
    DATEADD(day, -rn, activity_date) AS grp_key
  FROM rn
),
streaks AS (
  SELECT
    user_id,
    MIN(activity_date) AS start_date,
    MAX(activity_date) AS end_date,
    COUNT(*) AS streak_length
  FROM grp
  GROUP BY user_id, grp_key
)
SELECT
  s.user_id,
  s.streak_length,
  s.start_date,
  s.end_date
FROM streaks s
QUALIFY ROW_NUMBER() OVER (PARTITION BY s.user_id ORDER BY s.streak_length DESC) = 1;

If QUALIFY is not supported in your SQL engine, wrap the final select in another CTE and filter with ROW_NUMBER = 1.

Key Takeaway

  • Subtracting row numbers from dates creates stable groups for consecutive sequences.
  • You can also return the start and end dates of each streak using MIN and MAX.

Question 3: Find the Third Highest Salary per Department

Problem Overview

You have an employee table with department_id and salary. Find the third highest salary per department, handling ties correctly.

Step-by-Step Solution

Use DENSE_RANK over department, ordered by salary in descending order. DENSE_RANK handles ties without skipping ranks.

WITH ranked AS (
  SELECT
    department_id,
    employee_id,
    salary,
    DENSE_RANK() OVER (
      PARTITION BY department_id
      ORDER BY salary DESC
    ) AS rnk
  FROM employees
)
SELECT
  department_id,
  employee_id,
  salary
FROM ranked
WHERE rnk = 3;

Key Takeaway

  • Choose the ranking function based on tie handling. DENSE_RANK is usually the right choice for Nth highest.

Question 4: Find Users Who Purchased Every Product

Problem Overview

Given users, products, and sales (user_id, product_id), return users who purchased all products in the product catalog.

Step-by-Step Solution

Count distinct products per user and compare it to the total number of products.

SELECT
  s.user_id
FROM sales s
GROUP BY s.user_id
HAVING COUNT(DISTINCT s.product_id) = (SELECT COUNT(*) FROM products);

Example idea: if there are 3 products, any user with 3 distinct product purchases qualifies.

Key Takeaway

  • GROUP BY with HAVING and a count comparison solves this cleanly.

Question 5: Identify Users Placing More Than One Order Within 1 Minute

Problem Overview

In retail, duplicate or rapid repeat orders happen. Given order_id, user_id, and order_time, find users who place multiple orders within 60 seconds.

Step-by-Step Solution

Use LAG to access the prior order_time per user, then compute the time difference.

WITH t AS (
  SELECT
    user_id,
    order_id,
    order_time,
    LAG(order_time) OVER (
      PARTITION BY user_id
      ORDER BY order_time
    ) AS prev_time
  FROM orders
)
SELECT
  user_id,
  order_id,
  order_time,
  prev_time,
  DATEDIFF(second, prev_time, order_time) AS diff_seconds
FROM t
WHERE prev_time IS NOT NULL
  AND DATEDIFF(second, prev_time, order_time) <= 60;

This flags sequences where the gap between consecutive orders is 60 seconds or less.

Key Takeaway

  • Compare each row to its previous row using LAG to detect near-duplicate events.

Question 6: Rank Products by Revenue in Each Quarter

Problem Overview

Compute total revenue per product per quarter, then rank products within each quarter by revenue.

Step-by-Step Solution

Aggregate by year and quarter, then apply a ranking function on the totals.

WITH rev AS (
  SELECT
    product_id,
    YEAR(sale_date) AS yr,
    DATEPART(quarter, sale_date) AS qtr,
    SUM(revenue) AS total_revenue
  FROM sales
  GROUP BY product_id, YEAR(sale_date), DATEPART(quarter, sale_date)
)
SELECT
  product_id,
  yr,
  qtr,
  total_revenue,
  RANK() OVER (
    PARTITION BY yr, qtr
    ORDER BY total_revenue DESC
  ) AS revenue_rank
FROM rev
ORDER BY yr, qtr, revenue_rank, product_id;

If revenue needs to be computed, use quantity * unit_price in the SUM.

Key Takeaway

  • Aggregate first, rank second. Order by total_revenue in descending order for top earners.

Question 7: Generate an Organization Chart Using Employee to Manager Hierarchy

Problem Overview

You have employees with employee_id, name, and manager_id. Build the full reporting chain. This calls for a recursive CTE with a self-join.

Step-by-Step Solution

Start from top-level managers, then walk down the tree. Build a path column to show the chain.

WITH org_chart AS (
  -- Anchor: top-level managers
  SELECT
    e.employee_id,
    e.name,
    e.manager_id,
    CAST(e.name AS VARCHAR(MAX)) AS path
  FROM employees e
  WHERE e.manager_id IS NULL

  UNION ALL

  -- Recursive: direct reports
  SELECT
    e.employee_id,
    e.name,
    e.manager_id,
    CAST(oc.path + ' -> ' + e.name AS VARCHAR(MAX)) AS path
  FROM employees e
  JOIN org_chart oc
    ON e.manager_id = oc.employee_id
)
SELECT
  employee_id,
  name,
  manager_id,
  path
FROM org_chart
ORDER BY path;

This returns each employee with a readable chain, for example CEO -> VP -> Manager -> Analyst.

Key Takeaway

  • A clear anchor for top nodes and a recursive join for children builds the hierarchy. A path column helps visualize the structure.

Question 8: Identify Customers with Month over Month Revenue Increase

Problem Overview

Given orders with customer_id, order_date, and revenue, find customers whose revenue increased from the previous month.

Step-by-Step Solution

Aggregate revenue per customer per month. Use LAG to fetch the previous month’s revenue, then filter.

WITH monthly AS (
  SELECT
    customer_id,
    YEAR(order_date) AS yr,
    MONTH(order_date) AS mo,
    SUM(revenue) AS total_revenue
  FROM orders
  GROUP BY customer_id, YEAR(order_date), MONTH(order_date)
),
cmp AS (
  SELECT
    customer_id,
    yr,
    mo,
    total_revenue,
    LAG(total_revenue) OVER (
      PARTITION BY customer_id
      ORDER BY yr, mo
    ) AS prev_revenue
  FROM monthly
)
SELECT
  customer_id,
  yr,
  mo,
  total_revenue,
  prev_revenue
FROM cmp
WHERE prev_revenue IS NOT NULL
  AND total_revenue > prev_revenue
ORDER BY customer_id, yr, mo;

Example idea: if a customer had 500 last month and 700 this month, they qualify.

Key Takeaway

  • LAG is perfect for month over month comparisons. Always order by year and month.

Question 9: Calculate Median Revenue per Product

Problem Overview

You need the median revenue per product. Median is the middle value in a sorted list. For SQL Server, PERCENTILE_CONT returns the continuous 50th percentile.

Step-by-Step Solution

Compute the percentile per product with a window function. Use DISTINCT to return one row per product.

WITH med AS (
  SELECT
    product_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue)
      OVER (PARTITION BY product_id) AS median_revenue
  FROM sales
)
SELECT DISTINCT
  product_id,
  median_revenue
FROM med
ORDER BY product_id;

Key Takeaway

  • PERCENTILE_CONT gives a precise median for continuous distributions. It is ideal for financial stats.

Question 10: Detect Dates with No Sales in a Month

Problem Overview

Find all dates in a given month where no sales were recorded. This requires a complete list of dates and a comparison to actual sale dates.

Step-by-Step Solution

Generate a calendar for the month with a recursive CTE. Get distinct sale dates. Left join to spot missing days.

-- Generate all dates for August 2023
WITH all_dates AS (
  SELECT CAST('2023-08-01' AS date) AS sale_date
  UNION ALL
  SELECT DATEADD(day, 1, sale_date)
  FROM all_dates
  WHERE sale_date < '2023-08-31'
),
sale_days AS (
  SELECT DISTINCT CAST(sale_date AS date) AS sale_date
  FROM sales
)
SELECT
  ad.sale_date
FROM all_dates ad
LEFT JOIN sale_days s
  ON ad.sale_date = s.sale_date
WHERE s.sale_date IS NULL
ORDER BY ad.sale_date
OPTION (MAXRECURSION 0);

This returns every date in August with no sales. If your engine needs a different recursion limit, adjust accordingly.

Key Takeaway

  • Generate a continuous date set, then left join to identify gaps. This pattern works for many gap-detection tasks.

Sample Data Scripts -

-- 1. sales table
CREATE TABLE sales (
    order_id INT,
    user_id INT,
    product_id INT,
    sale_date DATE,
    quantity INT,
    revenue DECIMAL(10,2)
);
INSERT INTO sales (order_id, user_id, product_id, sale_date, quantity, revenue) VALUES
(101, 1, 1, '2023-08-01', 2, 2000.00),
(102, 2, 2, '2023-08-01', 3, 1500.00),
(103, 1, 1, '2023-08-02', 1, 1000.00),
(104, 3, 3, '2023-08-03', 5, 4000.00),
(105, 4, 2, '2023-08-05', 2, 1000.00),
(106, 2, 3, '2023-08-07', 1, 800.00);


-- 2. products table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10,2)
);
INSERT INTO products (product_id, product_name, price) VALUES
(1, 'Laptop', 1000.00),
(2, 'Tablet', 500.00),
(3, 'Phone', 800.00);

-- 3. user_activity table
CREATE TABLE user_activity (
    user_id INT,
    activity_date DATE
);
INSERT INTO user_activity (user_id, activity_date) VALUES
(1, '2023-08-01'),
(1, '2023-08-02'),
(1, '2023-08-03'),
(2, '2023-08-01'),
(2, '2023-08-03'),
(3, '2023-08-02');


-- 4. employees table with department
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    salary INT,
    department_id INT,
    manager_id INT
);
INSERT INTO employees (emp_id, name, salary, department_id, manager_id) VALUES
(1, 'Alice', 90000, 101, NULL),
(2, 'Bob', 85000, 101, 1),
(3, 'Charlie', 87000, 102, 1),
(4, 'Diana', 86000, 101, 2),
(5, 'Eve', 83000, 102, 3);

-- 5. users table
CREATE TABLE users (
    user_id INT PRIMARY KEY
);
INSERT INTO users (user_id) VALUES
(1), (2), (3), (4);

-- 6. orders table
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    revenue DECIMAL(10,2)
);
INSERT INTO orders (order_id, customer_id, order_date, revenue) VALUES
(201, 1, '2023-07-01', 500.00),
(202, 1, '2023-08-01', 700.00),
(203, 2, '2023-07-15', 600.00),
(204, 2, '2023-08-01', 800.00),
(205, 3, '2023-08-03', 300.00),
(206, 1, '2023-09-01', 900.00);

-- 7. sales for median query
INSERT INTO sales (order_id, user_id, product_id, sale_date, quantity, revenue) VALUES
(107, 4, 2, '2023-08-08', 2, 1000.00),
(108, 4, 3, '2023-08-09', 1, 800.00);

-- 8. Additional order times table
CREATE TABLE order_times (
    order_id INT,
    user_id INT,
    order_time DATETIME
);

INSERT INTO order_times (order_id, user_id, order_time) VALUES
(301, 1, '2023-08-01 10:00:00'),
(302, 1, '2023-08-01 10:00:30'),
(303, 2, '2023-08-01 11:00:00'),
(304, 2, '2023-08-01 11:02:00');

Wrapping Up These Amazon SQL Challenges

These 10 questions map to the core skills most teams test: window functions, CTEs (including recursive ones), ranking, date math, and joins. Practice them with sample data and try small variations, for example Nth highest per group, or different time windows with LAG and LEAD. That builds intuition you can apply under pressure.

If you want to add structure to your learning, consider the Google Data Analytics Professional Certificate on Coursera. It is a solid path for analysts who want hands-on practice. You can check out the Google Data Analytics Professional Certificate using this link: Google Data Analytics Professional Certificate. For a deeper path with more advanced topics, see the Google Advanced Data Analytics Professional Certificate.

Looking to upgrade your work setup? Here are tools that pair well with long SQL sessions and analysis work:

Want a breakdown of budget, mid-range, and premium picks? Explore the full guide: 10 Must-Have Tech Gear for Programmers.

Stay connected with Learn at Knowstar:

Thanks for reading. Which question will you practice first? Try writing one query from memory today, then compare it to the solutions here. Consistent reps turn these patterns into muscle memory.


Post a Comment

Previous Post Next Post

Contact Form