join series on multiindex in pandas

Following your example:

df1 = df1.join(df2, on = ['Body', 'Season'])

Alternatively, call join without on; by default it joins on the index levels that the two DataFrames have in common:

df1 = df1.join(df2)

Resulting df1:

                          A         B      Mood
Body Season Item
sun  summer one   -0.483779  0.981052      Good
     winter one   -0.309939  0.803862       Bad
            two   -0.413732  0.025331       Bad
moon summer one   -0.926068 -1.316808      Ugly
            two    0.221627 -0.226154      Ugly
            three  1.064856  0.402827      Ugly
     winter one    0.526461 -0.932231  Confused
            two   -0.296415 -0.812374  Confused
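
For readers without the original frames at hand, here is a minimal, self-contained sketch of this join. The numeric values are made up for illustration (they are not the data from the question); only the index level names Body, Season and Item and the Mood column follow the example. The last step assumes that join accepts a named Series in place of a DataFrame, which is how the Series case from the title would be handled.

```python
import pandas as pd

# Left frame: three index levels (Body, Season, Item). Values are illustrative.
left_index = pd.MultiIndex.from_tuples(
    [
        ("sun", "summer", "one"),
        ("sun", "winter", "one"),
        ("sun", "winter", "two"),
        ("moon", "summer", "one"),
        ("moon", "winter", "one"),
    ],
    names=["Body", "Season", "Item"],
)
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=left_index)

# Right frame: only the two levels the frames share (Body, Season).
right_index = pd.MultiIndex.from_tuples(
    [("sun", "summer"), ("sun", "winter"), ("moon", "summer"), ("moon", "winter")],
    names=["Body", "Season"],
)
df2 = pd.DataFrame({"Mood": ["Good", "Bad", "Ugly", "Confused"]}, index=right_index)

# With no `on`, join aligns on the index levels the two frames have in common.
joined = df1.join(df2)

# A named Series can be joined the same way; its name becomes the column name.
joined_from_series = df1.join(df2["Mood"])

# Flatten the index to make the result easy to inspect.
print(joined.reset_index())
```

Every row of df1 picks up the Mood that belongs to its (Body, Season) pair, regardless of Item.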

You could patch in a nice function like:

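import pandas as pd

def merge_multi(self, df, on):
    # Move the index levels to columns, join on them, then restore the index.
    return self.reset_index().join(df, on=on).set_index(self.index.names)

# Patch the helper onto DataFrame so it can be called like a method.
pd.DataFrame.merge_multi = merge_multi
df1.merge_multi(df2, on=['Body', 'Season'])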
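
If patching DataFrame feels too invasive, the same idea can be written as a plain function. This is only a sketch: the name merge_multi_fn and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def merge_multi_fn(left, right, on):
    """Join `right` onto `left` using the index levels of `left` named in `on`."""
    # Turn the index levels into ordinary columns so they can act as join keys.
    flat = left.reset_index()
    # Match the key columns of `flat` against the index of `right`.
    joined = flat.join(right, on=on)
    # Restore the original index levels of `left`.
    return joined.set_index(list(left.index.names))


# Illustrative data (hypothetical values).
left = pd.DataFrame(
    {"A": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("sun", "summer", "one"), ("sun", "winter", "one"), ("sun", "winter", "two")],
        names=["Body", "Season", "Item"],
    ),
)
right = pd.DataFrame(
    {"Mood": ["Good", "Bad"]},
    index=pd.MultiIndex.from_tuples(
        [("sun", "summer"), ("sun", "winter")], names=["Body", "Season"]
    ),
)

result = merge_multi_fn(left, right, on=["Body", "Season"])
print(result)
```

Keeping it a function avoids changing DataFrame for every other piece of code in the same process.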

Suggestion : 2

You can join a singly-indexed DataFrame with a level of a multi-indexed DataFrame. The name of the singly-indexed frame's index is matched against a level name of the multi-indexed frame.

When DataFrames are merged on a string that matches an index level in both frames, the index level is preserved as an index level in the resulting DataFrame.

DataFrame.join() is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

left_on: Columns or index levels from the left DataFrame to use as keys. Can be column names, index level names, or arrays with length equal to the length of the DataFrame.
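
As a small illustration of the first point, the sketch below joins a singly-indexed frame against one level of a multi-indexed frame. The frame contents and the names key and Y are chosen for the example only.

```python
import pandas as pd

# Singly-indexed frame; the index name "key" is what gets matched.
left = pd.DataFrame(
    {"A": ["A0", "A1", "A2"]},
    index=pd.Index(["K0", "K1", "K2"], name="key"),
)

# Multi-indexed frame with a level that is also called "key".
right = pd.DataFrame(
    {"C": ["C0", "C1", "C2", "C3"]},
    index=pd.MultiIndex.from_tuples(
        [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
        names=["key", "Y"],
    ),
)

# join matches left's index against the "key" level of right's MultiIndex.
result = left.join(right, how="inner")
print(result)
```

The row for K2 in left appears twice in the result, once for each matching (key, Y) pair in right.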

import pandas as pd

# Three frames with the same columns and disjoint indexes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

df3 = pd.DataFrame({
    'A': ['A8', 'A9', 'A10', 'A11'],
    'B': ['B8', 'B9', 'B10', 'B11'],
    'C': ['C8', 'C9', 'C10', 'C11'],
    'D': ['D8', 'D9', 'D10', 'D11']
}, index=[8, 9, 10, 11])

frames = [df1, df2, df3]

# Stack the frames vertically with pd.concat
result = pd.concat(frames)

# Signature as listed in older pandas documentation
# (join_axes was removed in pandas 1.0):
# pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
#           keys=None, levels=None, names=None, verify_integrity=False,
#           copy=True)

# keys adds an outer index level that labels each input frame
result = pd.concat(frames, keys=['x', 'y', 'z'])
result.loc['y']

    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7

# Build the list of frames first, then concatenate once.
# process_your_file and files stand in for your own loader and file list.
frames = [process_your_file(f) for f in files]
result = pd.concat(frames)

Suggestion : 3

In short: join combines two dataframes on their indexes, while merge combines them on columns. With an outer join (how = 'outer'), indexes that do not align produce a result whose index is the union of both indexes.

Consider the dataframes left and right

left = pd.DataFrame([
   ['a', 1],
   ['b', 2]
], list('XY'), list('AB'))
left

   A  B
X  a  1
Y  b  2

right = pd.DataFrame([
   ['a', 3],
   ['b', 4]
], list('XY'), list('AC'))
right

   A  C
X  a  3
Y  b  4

join
Think of join as combining two dataframes based on their respective indexes. If there are overlapping columns, join will want you to add a suffix to the overlapping column name from the left dataframe. Our two dataframes do have an overlapping column name, A.

left.join(right, lsuffix = '_')

  A_  B  A  C
X  a  1  a  3
Y  b  2  b  4

We can tell join to use a specific column in the left dataframe to use as the join key, but it will still use the index from the right.

left.reset_index().join(right, on = 'index', lsuffix = '_')

  index A_  B  A  C
0     X  a  1  a  3
1     Y  b  2  b  4
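
To see what join does when the indexes do not line up, here is a short sketch. The frame called other is hypothetical: it is a variant of right whose index only partly overlaps with left's.

```python
import pandas as pd

left = pd.DataFrame([["a", 1], ["b", 2]], index=list("XY"), columns=list("AB"))

# Hypothetical frame: shares label X with left, but has Z instead of Y.
other = pd.DataFrame([["a", 3], ["c", 5]], index=list("XZ"), columns=list("AC"))

# how="outer" keeps every label from both indexes; unmatched cells become NaN.
outer = left.join(other, lsuffix="_", how="outer")
print(outer)
```

The result is indexed by the union of both indexes (X, Y and Z), with missing values wherever one side has no row for a label.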

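merge, by contrast, aligns on columns rather than indexes. By default it looks for columns that the two dataframes share; this simple example finds the overlapping column to be 'A' and combines based on it.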

left.merge(right)

   A  B  C
0  a  1  3
1  b  2  4
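
To make the contrast with join concrete, the sketch below runs merge twice on the same two frames: once on the shared column, and once on the indexes. The variable names by_column and by_index and the suffix strings are illustrative choices.

```python
import pandas as pd

left = pd.DataFrame([["a", 1], ["b", 2]], index=list("XY"), columns=list("AB"))
right = pd.DataFrame([["a", 3], ["b", 4]], index=list("XY"), columns=list("AC"))

# Merging on a column discards the original X/Y index in favour of 0, 1, ...
by_column = left.merge(right, on="A")

# Merging on the indexes keeps X/Y; the shared column A needs suffixes.
by_index = left.merge(
    right, left_index=True, right_index=True, suffixes=("_left", "_right")
)

print(by_column)
print(by_index)
```

The second call behaves much like left.join(right, ...) with both suffixes set.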

Suggestion : 4

A multi-level index DataFrame is a DataFrame with multiple levels of (hierarchical) indexing. You can create a MultiIndex (multi-level index) in the following ways.

In this article, I will explain working with MultiIndex pandas DataFrames through several examples, such as creating a MultiIndex DataFrame, converting a MultiIndex to columns, and dropping a level from a MultiIndex.

1. Create MultiIndex pandas DataFrame (Multi level Index)

Use the pandas DataFrame.reset_index() function to convert the MultiIndex (multi-level index) levels to columns. With the default drop=False, the index values are kept as columns and the DataFrame gets a new index starting from zero.

Step 1: Create MultiIndex for Index
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
      ("r1", "rB")
   ],
   names = ['Courses', 'Fee'])

Step 2: Create MultiIndex for Column

cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
   ("Gasoline", "Ford"),
   ("Electric", "Tesla"),
   ("Electric", "Nio")
])

Step 3: Create DataFrame

data = [
   [100, 300, 900, 400],
   [200, 500, 300, 600]
]

df = pd.DataFrame(data, columns = cols, index = multi_index)
print(df)

This yields the output below.

            Gasoline      Electric
              Toyoto Ford    Tesla  Nio
Courses Fee
r0      rA       100  300      900  400
r1      rB       200  500      300  600

If a column has the same name as one of the index levels, reset_index() raises an error because it cannot insert a column that already exists. You can get around this by renaming the index levels first.

df.index = df.index.set_names(['new_index1', 'new_index2'])
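
The name clash and its fix can be reproduced with a small sketch. The frame below is a simplified, hypothetical one (a column deliberately named Fee, like one of the index levels), not the frame built in the steps above.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("r0", "rA"), ("r1", "rB")], names=["Courses", "Fee"]
)
# The column "Fee" deliberately clashes with the index level "Fee".
df = pd.DataFrame({"Fee": [100, 200], "Discount": [10, 20]}, index=idx)

error_message = None
try:
    df.reset_index()
except ValueError as exc:
    # reset_index cannot insert a column whose name is already taken.
    error_message = str(exc)

# Rename the index levels so they no longer collide with any column.
df.index = df.index.set_names(["Courses", "Fee_level"])
flat = df.reset_index()
print(flat)
```

After the rename, both index levels become ordinary columns and the frame gets a fresh index starting from zero.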