SQL Query to forward fill values for NULL records with the Last Not Null value | The Last Not NULL puzzle

There are situations where we have missing data and we need to forward fill the missing values (Nulls) with the last populated value. This is also popularly known as the Last Not Null puzzle and is asked in many Data analyst interviews for FAANG companies.

The video tutorial is available here - 

https://youtu.be/4MH8iRnhPO0

Today we are going to solve this by writing a SQL Query.

We are going to work the Currency Rate data. Please use the below scripts to create the table and load data in it.



CREATE TABLE [dbo].[CurrencyRate](

[CurrencyKey] [int] NOT NULL,

[DateKey] [int] NOT NULL,

[EndOfDayRate] [float] NULL,

[Date] [datetime] NULL );


INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20201229,0.999800039992002, '2020-12-29');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20201230,1.00090081072966, '2020-12-30');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20201231,0.999600159936026, '2020-12-31');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210101,Null, '2021-01-01');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210102,Null, '2021-01-02');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210103,Null, '2021-01-03')

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210104,0.999500249875062, '2021-01-04');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210105,1.000200040008, '2021-01-05');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210106,0.999200639488409, '2021-01-06');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210107,1.000200040008, '2021-01-07');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210108,0.999600159936026, '2021-01-08');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210109,Null, '2021-01-09');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210110,Null, '2021-01-10');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210111,1.00090081072966, '2021-01-11');

INSERT [dbo].[CurrencyRate] ([CurrencyKey], [DateKey], [EndOfDayRate], [Date]) VALUES (3, 20210112,0.99930048965724, '2021-01-12');


For this data, we need to fill the NULLs with the last populated Not Null value from the previous records.

The first step is to group the Null records with the previous Not Null value. The Not Null values which do not have any succeeding NULL values will be grouped independently as well.

The best way to achieve this grouping is by using the Count(column name) function with Windows partitioning.

Select * ,

Count(Rate) OVER (Partition by CurrencyKey Order by DateKey) as Grp 

From CurrencyRate

This will generate a count for each row based on the previous Not Null values. For ex - if  there is one previous record with a Not NULL value and the current value is Not NULL as well, the count will be 2.

Lets assume the next(third) record now has a NULL value. The count will still be 2 as the count(colname) function does not count NULL values. In this way, the NULL values will get grouped in the same group as the last Not NULL value.



The below video tutorial explains the differences between Count(*), Count(1) and Count(colname) in detail.



Now that we have the groups, we can use the FIRST_VALUE function to select the value from the first record in that group. the first record will always have the Not NULL value.

With CurrencyGrp as 

(Select * ,

Count(Rate) OVER (Partition by CurrencyKey Order by DateKey) as Grp 

From CurrencyRate) 


Select *,

FIRST_VALUE(Rate) OVER(Partition by CurrencyKey , Grp Order by DateKey) as CurrencyRate 

FROM CurrencyGrp




If using an older version of SQL Server and the FIRST_VALUE function is not available, then the MAX() function will work as well.

Below is the query using MAX().

With CurrencyGrp as 

(Select * ,

Count(Rate) OVER (Partition by CurrencyKey Order by DateKey) as Grp 

From CurrencyRate) 


Select *,

MAX(Rate) OVER(Partition by CurrencyKey , Grp Order by DateKey) as CurrencyRate 

FROM CurrencyGrp

Post a Comment

Previous Post Next Post

Contact Form