There might be scenarios where we need to convert data from columns into rows.
Imagine the below Sales dataset -
Region Jan-2022 Feb-2022 March-2022
A 2000 4500 3298
B 1876 1003 2500
Now instead of having a separate column for each month's sales, we wish to have an output as shown below -
Region Month Sales
A Jan-2022 2000
A Feb-2022 4500
A March-2022 3298
B Jan-2022 1876
B Feb-2022 1003
B March-2022 2500
The easiest way to achieve this is using the Pandas melt function.
Syntax -
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
The function Unpivots a DataFrame from wide to long format, optionally leaving identifiers set.
It is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
Parameters
Column(s) to use as identifier variables.
value_vars tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name scalar
Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name scalar, default ‘value’
Name to use for the ‘value’ column.
col_level int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index bool, default True
If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.
Solution for our Problem -
df.melt(id_vars=["Region"],
var_name="Month",
value_name="Sales")