Following your example:
df1 = df1.join(df2, on=['Body', 'Season'])
or just call join without on; by default it uses the index levels the two DataFrames have in common:
df1 = df1.join(df2)
Resulting df1:
                          A         B      Mood
Body Season Item
sun  summer one   -0.483779  0.981052      Good
     winter one   -0.309939  0.803862       Bad
            two   -0.413732  0.025331       Bad
moon summer one   -0.926068 -1.316808      Ugly
            two    0.221627 -0.226154      Ugly
            three  1.064856  0.402827      Ugly
     winter one    0.526461 -0.932231  Confused
            two   -0.296415 -0.812374  Confused
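You could patch in a helper method like this:
from pandas import DataFrame

def merge_multi(self, df, on):
    # Move the index levels into columns, join on the chosen ones,
    # then restore the original index levels.
    return self.reset_index().join(df, on=on).set_index(self.index.names)

DataFrame.merge_multi = merge_multi
df1.merge_multi(df2, on=['Body', 'Season'])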
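As a minimal, self-contained sketch of the helper above: the frames below are made-up stand-ins for df1 and df2 (the values are hypothetical), and merge_multi is written as a plain function instead of being patched onto DataFrame. It assumes the right-hand frame is indexed by exactly the levels named in on.

```python
import pandas as pd

# Hypothetical stand-in for df1: three index levels, one data column.
idx = pd.MultiIndex.from_tuples(
    [("sun", "summer", "one"), ("sun", "winter", "one"), ("moon", "summer", "one")],
    names=["Body", "Season", "Item"],
)
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=idx)

# Hypothetical stand-in for df2: indexed by a subset of df1's levels.
df2 = pd.DataFrame(
    {"Mood": ["Good", "Bad", "Ugly"]},
    index=pd.MultiIndex.from_tuples(
        [("sun", "summer"), ("sun", "winter"), ("moon", "summer")],
        names=["Body", "Season"],
    ),
)


def merge_multi(left, right, on):
    """Join `right` onto `left` using some of `left`'s index levels as keys."""
    levels = list(left.index.names)
    # Levels become ordinary columns, so `on` can refer to them by name.
    return left.reset_index().join(right, on=on).set_index(levels)


joined = merge_multi(df1, df2, on=["Body", "Season"])
print(joined)
```

Keeping the helper as a plain function avoids modifying the DataFrame class globally, which can be easier to reason about in shared code.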
You can join a singly-indexed DataFrame with a level of a multi-indexed DataFrame. The level will match on the name of the index of the singly-indexed frame against a level name of the multi-indexed frame.
When DataFrames are merged on a string that matches an index level in both frames, the index level is preserved as an index level in the resulting DataFrame.
DataFrame.join() is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
left_on: Columns or index levels from the left DataFrame to use as keys. Can either be column names, index level names, or arrays with length equal to the length of the DataFrame.
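To illustrate joining a singly-indexed frame against one level of a multi-indexed frame, here is a small sketch. The frame names, keys, and values are hypothetical; the only thing that matters is that the single index is named key, matching a level name in the MultiIndex.

```python
import pandas as pd

# Singly-indexed frame; the index name "key" is what gets matched.
single = pd.DataFrame(
    {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
    index=pd.Index(["K0", "K1", "K2"], name="key"),
)

# Multi-indexed frame with a level that is also called "key".
multi = pd.DataFrame(
    {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]},
    index=pd.MultiIndex.from_tuples(
        [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
        names=["key", "Y"],
    ),
)

# Rows of `single` are repeated for every matching entry of the "key" level.
result = single.join(multi, how="inner")
print(result)
```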
In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ...:                     'B': ['B0', 'B1', 'B2', 'B3'],
   ...:                     'C': ['C0', 'C1', 'C2', 'C3'],
   ...:                     'D': ['D0', 'D1', 'D2', 'D3']},
   ...:                    index=[0, 1, 2, 3])
   ...:
In [2]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
   ...:                     'B': ['B4', 'B5', 'B6', 'B7'],
   ...:                     'C': ['C4', 'C5', 'C6', 'C7'],
   ...:                     'D': ['D4', 'D5', 'D6', 'D7']},
   ...:                    index=[4, 5, 6, 7])
   ...:
In [3]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
   ...:                     'B': ['B8', 'B9', 'B10', 'B11'],
   ...:                     'C': ['C8', 'C9', 'C10', 'C11'],
   ...:                     'D': ['D8', 'D9', 'D10', 'D11']},
   ...:                    index=[8, 9, 10, 11])
   ...:
In [4]: frames = [df1, df2, df3]
In [5]: result = pd.concat(frames)
pd.concat(objs, axis=0, join='outer', ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False,
          copy=True)
In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])
In [7]: result.loc['y']
Out[7]:
    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7
frames = [process_your_file(f) for f in files]
result = pd.concat(frames)
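The list-comprehension pattern above can be sketched end to end as follows. Here process_your_file is a hypothetical stand-in that just calls pd.read_csv, and files is a list of in-memory buffers with made-up contents in place of real paths.

```python
import io

import pandas as pd


def process_your_file(f):
    # Hypothetical loader: a real one might also clean or reshape the data.
    return pd.read_csv(f)


# In-memory stand-ins for files on disk (contents are invented).
files = [io.StringIO("A,B\n1,2\n"), io.StringIO("A,B\n3,4\n")]

# Build every frame first, then concatenate once.
frames = [process_your_file(f) for f in files]
result = pd.concat(frames, ignore_index=True)
print(result)
```

Collecting the frames in a list and calling pd.concat once is generally preferable to concatenating inside the loop, since each concat call copies the data accumulated so far.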
With an outer join, if the indexes do not align, the result will be the union of the indexes.
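The point about outer joins and the union of indexes can be checked with a small sketch. The frames a and b and their labels are hypothetical and chosen so that only the label q is shared.

```python
import pandas as pd

# Two frames whose indexes only partly overlap.
a = pd.DataFrame({"x": [1, 2]}, index=["p", "q"])
b = pd.DataFrame({"y": [3, 4]}, index=["q", "r"])

# how="outer" keeps every label from either side; gaps are filled with NaN.
outer = a.join(b, how="outer")
print(outer)
```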
Consider the DataFrames left and right:
left = pd.DataFrame([
    ['a', 1],
    ['b', 2]
], list('XY'), list('AB'))
left
   A  B
X  a  1
Y  b  2
right = pd.DataFrame([
    ['a', 3],
    ['b', 4]
], list('XY'), list('AC'))
right
   A  C
X  a  3
Y  b  4
join
Think of join as wanting to combine two DataFrames based on their respective indexes. If there are overlapping columns, join requires a suffix (lsuffix or rsuffix) to disambiguate them; here we add one to the overlapping column name from the left DataFrame. Our two DataFrames do have an overlapping column name, A.
left.join(right, lsuffix='_')
  A_  B  A  C
X  a  1  a  3
Y  b  2  b  4
We can tell join to use a specific column in the left DataFrame as the join key, but it will still use the index from the right.
left.reset_index().join(right, on='index', lsuffix='_')
  index A_  B  A  C
0     X  a  1  a  3
1     Y  b  2  b  4
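A more typical use of on is a lookup: a key column on the left is matched against the index of the right. The sketch below uses hypothetical orders and names frames to show this.

```python
import pandas as pd

# Left frame: the join key lives in an ordinary column.
orders = pd.DataFrame({"customer": ["c1", "c2", "c1"], "amount": [10, 20, 30]})

# Right frame: the join key is the index.
names = pd.DataFrame(
    {"name": ["Ann", "Bob"]},
    index=pd.Index(["c1", "c2"], name="customer"),
)

# Each value in orders["customer"] is looked up in names.index.
joined = orders.join(names, on="customer")
print(joined)
```

The result keeps the left frame's row order and index, so repeated keys on the left simply repeat the matching right-hand row.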
merge aligns on columns instead: by default it joins on the column names the two DataFrames share. This simple example finds the overlapping column to be 'A' and combines based on it.
left.merge(right)
   A  B  C
0  a  1  3
1  b  2  4
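As a rough comparison, merge can be made to behave like join by telling it to use both indexes. The sketch below reuses the left and right frames from the example; the variable names on_column and on_index are only illustrative.

```python
import pandas as pd

left = pd.DataFrame([["a", 1], ["b", 2]], list("XY"), list("AB"))
right = pd.DataFrame([["a", 3], ["b", 4]], list("XY"), list("AC"))

# Column-based: the shared column "A" is the key, and the index is reset.
on_column = left.merge(right, on="A")

# Index-based: both indexes are the key, as join does by default.
# The suffixes mirror lsuffix="_" from the join example.
on_index = left.merge(right, left_index=True, right_index=True, suffixes=("_", ""))

print(on_column)
print(on_index)
```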
A MultiIndex (multi-level index) DataFrame is a DataFrame with multiple levels of indexing, also called hierarchical indexing. You can create a MultiIndex in the following ways.
In this article, I will explain working with a MultiIndex pandas DataFrame through several examples, such as creating a MultiIndex DataFrame, converting a MultiIndex to columns, and dropping a level from a MultiIndex.
Use the pandas DataFrame.reset_index() function to convert MultiIndex levels to columns. The default setting for its drop parameter is drop=False, which keeps the index values as columns and gives the DataFrame a new index starting from zero.
1. Create MultiIndex pandas DataFrame (Multi level Index)
Step 1: Create MultiIndex for Index
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                         ("r1", "rB")],
                                        names=['Courses', 'Fee'])
Step 2: Create MultiIndex for Column
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
                                  ("Gasoline", "Ford"),
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])
Step 3: Create DataFrame
data = [[100, 300, 900, 400], [200, 500, 300, 600]]
df = pd.DataFrame(data, columns=cols, index=multi_index)
print(df)
This yields the output below.
             Gasoline        Electric
               Toyoto  Ford     Tesla  Nio
Courses Fee
r0      rA        100   300       900  400
r1      rB        200   500       300  600
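Besides from_tuples, pandas offers other MultiIndex constructors. The sketch below shows from_arrays building the same column index as above and from_product building every combination of two label lists. The variable names are illustrative only.

```python
import pandas as pd

# Same column index as in Step 2, built from a list of tuples.
cols_from_tuples = pd.MultiIndex.from_tuples(
    [("Gasoline", "Toyoto"), ("Gasoline", "Ford"),
     ("Electric", "Tesla"), ("Electric", "Nio")]
)

# from_arrays takes one list per level instead of one tuple per entry.
cols_from_arrays = pd.MultiIndex.from_arrays(
    [["Gasoline", "Gasoline", "Electric", "Electric"],
     ["Toyoto", "Ford", "Tesla", "Nio"]]
)

# from_product builds the full cross product of the given labels.
grid = pd.MultiIndex.from_product(
    [["r0", "r1"], ["rA", "rB"]], names=["Courses", "Fee"]
)

print(cols_from_tuples.equals(cols_from_arrays))
print(list(grid))
```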
If a column has the same name as an index level, reset_index() raises an error because it cannot insert a column that already exists. You can get around this by renaming the index levels first.
df.index = df.index.set_names(['new_index1', 'new_index2'])
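A small sketch of that name clash and the rename workaround follows. The frame is a hypothetical one with a single-level column named Fee that collides with the index level Fee; the new level names are the ones used above.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("r0", "rA"), ("r1", "rB")], names=["Courses", "Fee"]
)
# The column "Fee" has the same name as the second index level.
df = pd.DataFrame({"Fee": [100, 200]}, index=idx)

error = None
try:
    df.reset_index()
except ValueError as exc:
    # pandas refuses to insert a column whose name already exists.
    error = exc

# Renaming the index levels removes the clash.
df.index = df.index.set_names(["new_index1", "new_index2"])
flat = df.reset_index()
print(flat)
```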