Pivoting a Multi-Index Pandas DataFrame
Pandas is a popular Python library for data analysis and manipulation. It provides a powerful data structure called the DataFrame, which is similar to a table in a relational database. DataFrames can have multiple levels of index, which can make them challenging to work with. In this blog post, we’ll show you how to pivot a multi-index Pandas DataFrame to create a new DataFrame with columns and rows.
Suppose we have a Pandas DataFrame that looks like this:
Cases
Category Country
Confirmed Afghanistan 209451
Albania 334457
Algeria 271496
Andorra 47890
Angola 105288
... ...
Death West Bank and Gaza 5708
Yemen 2159
Zambia 4057
Zimbabwe 5671
[603 rows x 1 columns]
This DataFrame has a multi-level index, with the first level corresponding to the “Category” (either “Confirmed” or “Death”) and the second level corresponding to the country name. The DataFrame has only one column with the total number of confirmed or deaths cases for each country.
Suppose we want to create a new DataFrame with the “Death” and “Confirmed” categories as columns and the country names as rows. To do this, we need to pivot the DataFrame so that the first level of the index becomes the columns and the second level becomes the rows.
We can use the .unstack() method to pivot the DataFrame. The .unstack() method pivots the DataFrame based on the specified level(s) of the index. In this case, we want to pivot the DataFrame based on the first level of the index (i.e., the “Category” level), so we’ll pass level=0 to the .unstack() method.
Here’s the code to pivot the DataFrame:
new_df = df.unstack(level=0)['Value']
The resulting DataFrame looks like this:
Death Confirmed
Afghanistan 5793 209451
Albania 7591 334457
Algeria 7534 271496
Andorra 652 47890
Angola 2757 105288
... ... ...
West Bank and Gaza 5708 307127
Yemen 2159 8829
Zambia 4057 114209
Zimbabwe 5671 128940
[201 rows x 2 columns]
In this new DataFrame, the index consists of the country names, and there are only 201 rows (one for each country). The “Death” and “Confirmed” categories are now columns in the DataFrame.