Pivoting a Multi-Index Pandas DataFrame

Posted on Feb 24, 2023

Pandas is a popular Python library for data analysis and manipulation. It provides a powerful data structure called the DataFrame, which is similar to a table in a relational database. DataFrames can have multiple levels of index, which can make them challenging to work with. In this blog post, we’ll show you how to pivot a multi-index Pandas DataFrame to create a new DataFrame with columns and rows.

Suppose we have a Pandas DataFrame that looks like this:

                                   Cases
Category Country                       
Confirmed Afghanistan           209451
          Albania               334457
          Algeria               271496
          Andorra                47890
          Angola                105288
          ...                      ...
Death     West Bank and Gaza      5708
          Yemen                   2159
          Zambia                  4057
          Zimbabwe                5671

[603 rows x 1 columns]

This DataFrame has a multi-level index, with the first level corresponding to the “Category” (either “Confirmed” or “Death”) and the second level corresponding to the country name. The DataFrame has only one column with the total number of confirmed or deaths cases for each country.

Suppose we want to create a new DataFrame with the “Death” and “Confirmed” categories as columns and the country names as rows. To do this, we need to pivot the DataFrame so that the first level of the index becomes the columns and the second level becomes the rows.

We can use the .unstack() method to pivot the DataFrame. The .unstack() method pivots the DataFrame based on the specified level(s) of the index. In this case, we want to pivot the DataFrame based on the first level of the index (i.e., the “Category” level), so we’ll pass level=0 to the .unstack() method.

Here’s the code to pivot the DataFrame:

new_df = df.unstack(level=0)['Value']

The resulting DataFrame looks like this:

                       Death  Confirmed
Afghanistan             5793     209451
Albania                 7591     334457
Algeria                 7534     271496
Andorra                  652      47890
Angola                  2757     105288
...                      ...        ...
West Bank and Gaza      5708     307127
Yemen                   2159       8829
Zambia                  4057     114209
Zimbabwe                5671     128940

[201 rows x 2 columns]

In this new DataFrame, the index consists of the country names, and there are only 201 rows (one for each country). The “Death” and “Confirmed” categories are now columns in the DataFrame.