Google SQL Interview Question - How to Find the Median Google Search Frequency Using SQL

Finding the median number of searches made by users last year was a real SQL problem asked in a Google interview. The Google marketing team needed this info to feature in a Super Bowl ad. Since Google handles trillions of searches, analyzing raw data isn't practical. Instead, they use a summary table that lists the number of users and how many searches each made. This post walks you through solving this problem step-by-step using SQL, from understanding the problem to writing the final query.

   

Understanding the Problem and Summary Table

Google’s question focuses on calculating the median number of searches performed by users last year. The challenge is the huge volume of data — trillions of searches — so Google uses an aggregated table rather than raw data.

The table in question is called search_frequency. It doesn’t list individual user searches but summarizes the data like this:

Number of Searches Number of Users
1 2
2 2
3 3

This means 2 users made 1 search each, 2 users made 2 searches each, and 3 users made 3 searches each, and so on.

The question is: How do we find the median search frequency based on this summary? The median should be rounded to one decimal place as per the task.

What Is Median and How Do You Find It?

Before jumping into SQL, it’s important to understand what a median really is. The median is the middle value in a sorted list of numbers.

Say we have this sorted data set:

1, 2, 5, 7, 9

Here, the median is 5 because it’s right in the middle of these five values.

If the list has an even number of values, the median is the average of the two middle numbers. For example:

5, 7, 9, 12

The middle numbers are 7 and 9, so the median would be (7 + 9) ÷ 2 = 8.

The key step is sorting the data first. Without sorting, you can’t accurately find the median.

Here is a quick overview to find the median in a data set:

  1. Sort the data in ascending order
  2. If the number of data points is odd, pick the middle element
  3. If even, calculate the average of the two middle elements

Step 1: Expanding the Summarized Data Table

The search_frequency table only has counts of users for each search frequency. To find the median accurately, you need the full list of every single user’s search count — not just the summarized counts.

That means expanding the summary into a full dataset where each row represents a single user's search count.

Let’s look at what expanded data looks like for the sample above:

Search Count
1
1
2
2
3
3
3

Notice that the count of rows now matches the total number of users (2 + 2 + 3 = 7), each with their individual search count.

To do this expansion in SQL, you can use a recursive common table expression (CTE). If you’re not familiar with recursive CTEs, they allow a query to repeatedly run until a limit is reached. Here’s how:

  • Anchor query: Starts by selecting search counts and number of users from the summary table, with a starting row count of 1.
  • Recursive part: Adds rows by incrementing a counter until it equals the number of users for that search count.

This process "explodes" each summarized row into multiple rows representing individual users.

A simplified version of the recursive CTE looks like this:

WITH RECURSIVE search_expanded AS (
  SELECT 
    searches,
    users,
    1 AS count
  FROM search_frequency

  UNION ALL

  SELECT
    searches,
    users,
    count + 1
  FROM search_expanded
  WHERE count < users
)
SELECT searches FROM search_expanded ORDER BY searches;

This query generates a full list of all users and their search counts, making it ready for median calculation.

Step 2: Calculating the Median Using SQL Window Functions

Once you have the expanded dataset, the next step is to find the median. SQL makes this easier with the percentile_cont function, which calculates continuous percentiles.

The median corresponds to the 50th percentile (0.5).

Here’s how you’d write the SQL snippet:

SELECT DISTINCT 
  ROUND(
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY searches) OVER (),
    1
  ) AS median
FROM search_expanded;

Breaking it down:

  • PERCENTILE_CONT(0.5) computes the median (the 50% percentile).
  • WITHIN GROUP (ORDER BY searches) tells the function how to sort the data.
  • OVER () is necessary for the window function to work without partitions (we want the median across all rows).
  • ROUND(..., 1) rounds the median value to one decimal place.
  • DISTINCT ensures the final output shows just one row with the median value.

This query will return the median search count for all users based on the expanded data.

Additional Resources and Learning Materials

If you want to strengthen your SQL skills further, especially recursive CTEs, here are some useful resources from Learn at Knowstar:

You might also consider the following courses and books, which help build strong data analytics and SQL knowledge:

  • Google Data Analytics Professional Certificate
  • Google Advanced Data Analytics Professional Certificate
  • Recommended books:
    • T-SQL Fundamentals by Itzik Ben-Gan
    • Ace the Data Science Interview

For hands-on practice, interactive platforms like LearnSQL offer practical exercises to try out SQL queries directly in your browser.

Connect with Learn at Knowstar

Stay updated and get more learning materials from Learn at Knowstar through:

Don’t forget to subscribe, like, and share their content to support more quality tutorials.

Recommended Laptop for Data Analytics

A reliable laptop makes a big difference when working with SQL and large datasets. The Lenovo IdeaPad Slim3 Gen8 is a solid choice. It handles data analytics tools smoothly and offers good value.

Check it out here: Lenovo IdeaPad Slim3 Gen8

Conclusion

Calculating the median search frequency from a summarized table can seem complicated, but breaking it down helps. First, expand the summary into individual rows using a recursive CTE. Then, use SQL's percentile_cont function to find and round the median.

This approach not only solves the Google interview question but also gives you insight into working with aggregated data and applying window functions in SQL.

For anyone preparing for data roles or interviews, mastering these techniques boosts your ability to handle real-world data problems confidently.

Keep exploring SQL functions and recursive queries—they’re powerful tools in your data toolkit.

Post a Comment

Previous Post Next Post

Contact Form