There are many good answers on melt and pivot on SO. In your sample df, the Sum column is of string type. Convert it to int and use pivot_table. The key difference between pivot and pivot_table is that when your index/column combinations contain duplicated entries, you need to use pivot_table with an aggregate function. If you don't pass one, the default is mean.
df['Sum'] = df['Sum'].astype(int)
df.pivot_table(index='Tech', columns=['Year', 'Scenario'], values='Sum')

Year      2010        2015
Scenario Scen1 Scen2 Scen1 Scen2
Tech
x            1     7     4    10
y            2     8     5    11
z            3     9     6    12
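To see why the aggregate function matters, here is a minimal sketch on made-up data (the question's original df is not shown, so these rows are hypothetical). Two rows share the same Tech/Year/Scenario key, so pivot() raises a ValueError, while pivot_table() averages them by default, or sums them when aggfunc='sum' is passed.

```python
import pandas as pd

# Hypothetical long-format data; the two x/2010/Scen1 rows are a deliberate duplicate.
df = pd.DataFrame({
    "Tech": ["x", "x", "y"],
    "Year": [2010, 2010, 2010],
    "Scenario": ["Scen1", "Scen1", "Scen1"],
    "Sum": ["1", "3", "2"],  # strings, like the Sum column in the question
})
df["Sum"] = df["Sum"].astype(int)

# pivot() cannot place two values in the same cell, so it raises ValueError.
pivot_raised = False
try:
    df.pivot(index="Tech", columns=["Year", "Scenario"], values="Sum")
except ValueError as exc:
    pivot_raised = True
    print("pivot failed:", exc)

# pivot_table() aggregates duplicates instead: mean by default ...
mean_table = df.pivot_table(index="Tech", columns=["Year", "Scenario"], values="Sum")
# ... or any other function passed as aggfunc.
sum_table = df.pivot_table(index="Tech", columns=["Year", "Scenario"],
                           values="Sum", aggfunc="sum")
print(mean_table)
print(sum_table)
```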
Note: The same can be done using groupby. Since you need columns at two levels, you need to unstack twice.
df.groupby(['Tech', 'Scenario', 'Year'])['Sum'].mean().unstack().unstack()
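A sketch checking that the groupby route and pivot_table agree, on a hypothetical reconstruction of the data whose numbers are chosen to reproduce the table above. It also shows unstack() taking a list of levels, which does both steps in one call; the column order from that call is not guaranteed to match, so a comparison should ignore ordering.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data (the original df is not shown).
techs = ["x", "y", "z"]
combos = [(2010, "Scen1", 1), (2015, "Scen1", 4), (2010, "Scen2", 7), (2015, "Scen2", 10)]
records = [
    {"Tech": tech, "Year": year, "Scenario": scen, "Sum": str(base + i)}
    for year, scen, base in combos
    for i, tech in enumerate(techs)
]
df = pd.DataFrame(records)
df["Sum"] = df["Sum"].astype(int)

via_pivot_table = df.pivot_table(index="Tech", columns=["Year", "Scenario"], values="Sum")

# Unstacking twice: Year moves to the columns first, then Scenario,
# so the resulting column levels are (Year, Scenario).
via_groupby = df.groupby(["Tech", "Scenario", "Year"])["Sum"].mean().unstack().unstack()

# Same reshape in one call by naming both levels.
via_groupby_once = (
    df.groupby(["Tech", "Scenario", "Year"])["Sum"].mean().unstack(["Year", "Scenario"])
)
print(via_groupby)
```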
But suppose we wish to do time series operations with the variables. A better representation would be one where the columns are the unique variables and an index of dates identifies individual observations. To reshape the data into this form, we use the DataFrame.pivot() method (also implemented as a top-level function, pivot()).

If the values argument is omitted, and the input DataFrame has more than one column of values which are not used as column or index inputs to pivot(), then the resulting “pivoted” DataFrame will have hierarchical columns whose topmost level indicates the respective value column. Selecting one of those top-level columns returns a view on the underlying data in the case where the data are homogeneously typed.

The related stack() method compresses a level in the DataFrame columns and returns a Series in the case of a simple column Index, or a DataFrame in the case of a MultiIndex in the columns.
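In[1]: import numpy as np
...: import pandas as pd
...: import pandas._testing as tm  # private test helpers; makeTimeDataFrame may be missing in newer pandas
In[2]: def unpivot(frame):
...:     N, K = frame.shape
...:     data = {
...:         # column-major ("F") order: all of column A, then B, and so on
...:         "value": frame.to_numpy().ravel("F"),
...:         "variable": np.asarray(frame.columns).repeat(N),
...:         "date": np.tile(np.asarray(frame.index), K),
...:     }
...:     return pd.DataFrame(data, columns=["date", "variable", "value"])
...: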
In[3]: df = unpivot(tm.makeTimeDataFrame(3))
In[4]: df
Out[4]:
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804
In[5]: filtered = df[df["variable"] == "A"]
In[6]: filtered
Out[6]:
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059
In[7]: pivoted = df.pivot(index="date", columns="variable", values="value")
In[8]: pivoted
Out[8]:
variable           A         B         C         D
date
2000-01-03  0.469112 -1.135632  0.119209 -2.104569
2000-01-04 -0.282863  1.212112 -1.044236 -0.494929
2000-01-05 -1.509059 -0.173215 -0.861849  1.071804
In[9]: df["value2"] = df["value"] * 2
In[10]: pivoted = df.pivot(index="date", columns="variable")
In[11]: pivoted
Out[11]:
               value                                  value2
variable           A         B         C         D         A         B         C         D
date
2000-01-03  0.469112 -1.135632  0.119209 -2.104569  0.938225 -2.271265  0.238417 -4.209138
2000-01-04 -0.282863  1.212112 -1.044236 -0.494929 -0.565727  2.424224 -2.088472 -0.989859
2000-01-05 -1.509059 -0.173215 -0.861849  1.071804 -3.018117 -0.346429 -1.723698  2.143608
In[12]: pivoted["value2"]
Out[12]:
variable           A         B         C         D
date
2000-01-03  0.938225 -2.271265  0.238417 -4.209138
2000-01-04 -0.565727  2.424224 -2.088472 -0.989859
2000-01-05 -3.018117 -0.346429 -1.723698  2.143608
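The hand-written unpivot() helper has a library counterpart in DataFrame.melt(). A minimal round-trip sketch on small hypothetical data (two variables, three dates): pivot the long table to wide form, then melt it back.

```python
import pandas as pd

# Hypothetical long-format data in the same shape as the unpivot() output:
# one row per (date, variable) pair.
dates = pd.date_range("2000-01-03", periods=3, freq="D")
long_df = pd.DataFrame({
    "date": list(dates) * 2,
    "variable": ["A"] * 3 + ["B"] * 3,
    "value": [0.5, -0.25, 1.0, 2.0, -1.5, 0.75],
})

# Long -> wide: one column per variable, dates as the index.
wide = long_df.pivot(index="date", columns="variable", values="value")

# Wide -> long again: melt() stacks the A/B columns back into rows.
back = (
    wide.reset_index()
        .melt(id_vars="date", var_name="variable", value_name="value")
)
print(back)
```

melt() walks the value columns in order (A, then B), so the rows come back in the same order as long_df here.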
In[13]: tuples = list(
....:     zip(
....:         *[
....:             ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
....:             ["one", "two", "one", "two", "one", "two", "one", "two"],
....:         ]
....:     )
....: )
....:
In[14]: index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
In[15]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])
In[16]: df2 = df[:4]
In[17]: df2
Out[17]:
                     A         B
first second
bar   one     0.721555 -0.706771
      two    -1.039575  0.271860
baz   one    -0.424972  0.567020
      two     0.276232 -1.087401
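A sketch of what stack() and unstack() do to a frame shaped like df2. It uses deterministic values instead of np.random.randn so the result is predictable, and rebuilds the index with MultiIndex.from_product, which yields the same four (first, second) pairs as df[:4]. Recent pandas versions may emit a FutureWarning about the stack implementation; it does not change this result.

```python
import numpy as np
import pandas as pd

# Same index shape as df2: two row levels ("first", "second") and two plain columns.
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]], names=["first", "second"])
df2 = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2), index=index, columns=["A", "B"])

# stack() moves the column labels into a new innermost row level; with a
# simple column Index the result is a Series with a three-level index.
stacked = df2.stack()

# unstack() pivots that innermost level back into columns.
restored = stacked.unstack()
print(stacked)
```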
When pivot() is called without the values argument, pandas creates a hierarchical column index (MultiIndex) for the new table. You can think of a hierarchical index as a set of trees of indices: each indexed column or row is identified by a unique sequence of values defining the “path” from the topmost index to the bottom index. The first level of the column index holds all columns that we have not specified in the pivot invocation (in this case USD and EU). The second level holds the unique values of the column passed as columns (here CType).

We can use this hierarchical column index to select the values of a single column from the original table. For example, for a table p pivoted without values, p.USD returns a pivoted DataFrame with the USD values only, equivalent to pivoting with values='USD'.

Instead of pivoting, we can also stack a DataFrame: the result is a Series with a MultiIndex composed of the initial index as the first level and the table columns as the second. Unstacking gets us back to the original data structure.

Stacking and unstacking can also be applied to data with flat (i.e. non-hierarchical) indices. In this case, one of the indices is de facto removed (the column index if stacking, the row index if unstacking) and its values are nested in the other index, which becomes a MultiIndex. Therefore, the result is always a Series with a hierarchical index.
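A sketch of the flat-index case with a small hypothetical frame: stack() nests the column labels under the row labels, while unstack() nests the row labels under the column labels. Both return a Series with a two-level index.

```python
import pandas as pd

# Hypothetical frame with flat (single-level) row and column indices.
flat = pd.DataFrame({"USD": [1, 2], "EU": [3, 4]}, index=["Item0", "Item1"])

# Stacking removes the column index: Series indexed by (row label, column label).
stacked = flat.stack()

# Unstacking removes the row index: Series indexed by (column label, row label).
unstacked = flat.unstack()

print(stacked)
print(unstacked)
```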
In[1]:
from collections import OrderedDict
from pandas import DataFrame
import pandas as pd
import numpy as np

table = OrderedDict((
    ("Item", ['Item0', 'Item0', 'Item1', 'Item1']),
    ('CType', ['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD', ['1$', '2$', '3$', '4$']),
    ('EU', ['1€', '2€', '3€', '4€'])
))
d = DataFrame(table)
d

In[2]:
p = d.pivot(index='Item', columns='CType', values='USD')
p
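Continuing this example, a sketch of the hierarchical-columns case: pivoting without values keeps both USD and EU as top-level column labels, and selecting USD reproduces p. The plain dict replaces OrderedDict, assuming Python 3.7+, where dicts keep insertion order.

```python
import pandas as pd

# Same data as the table above.
d = pd.DataFrame({
    "Item": ["Item0", "Item0", "Item1", "Item1"],
    "CType": ["Gold", "Bronze", "Gold", "Silver"],
    "USD": ["1$", "2$", "3$", "4$"],
    "EU": ["1€", "2€", "3€", "4€"],
})

# Without values=, every remaining column (USD and EU) becomes a top-level
# column label, and the CType values form the second level.
p_all = d.pivot(index="Item", columns="CType")

# Selecting one top-level label gives the single-value pivot.
p_usd = p_all["USD"]
p = d.pivot(index="Item", columns="CType", values="USD")
print(p_all)
```

Attribute access (p_all.USD) behaves like p_all["USD"] as long as the label is a valid Python identifier. Missing combinations, such as Item0/Silver, show up as NaN.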
How to change MultiIndex columns to standard columns
Given a DataFrame with MultiIndex columns:

import numpy as np
import pandas as pd

# build an example DataFrame (codes= is called labels= in older pandas versions)
midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']],
                     codes=[[1, 1, 0], [1, 0, 1]])
df = pd.DataFrame(np.random.randn(2, 3), columns=midx)

In[2]: df
Out[2]:
        one                zero
          y         x         y
0  0.785806 -0.679039  0.513451
1 -0.337862 -0.350690 -1.423253
If you want to change the columns to standard columns (not MultiIndex), just rename the columns.
df.columns = ['A', 'B', 'C']
In[3]: df
Out[3]:
          A         B         C
0  0.785806 -0.679039  0.513451
1 -0.337862 -0.350690 -1.423253
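Renaming by hand discards the original labels. If you would rather keep them, one common alternative (not part of the original answer) is to join each column tuple into a single string. This sketch rebuilds the same MultiIndex and flattens it that way.

```python
import numpy as np
import pandas as pd

# Same MultiIndex columns as in the example above.
midx = pd.MultiIndex(levels=[["zero", "one"], ["x", "y"]], codes=[[1, 1, 0], [1, 0, 1]])
df = pd.DataFrame(np.random.randn(2, 3), columns=midx)

# Iterating over MultiIndex columns yields tuples such as ("one", "y");
# joining them keeps both levels' information in a flat name.
joined = df.copy()
joined.columns = ["_".join(col) for col in joined.columns]
print(joined.columns.tolist())
```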