Interpolating a multi-index pandas DataFrame

You can use scipy.interpolate.LinearNDInterpolator to do what you want. If the DataFrame has a MultiIndex with the levels 'a', 'b' and 'c' and a column named 'result', then:

from scipy.interpolate import LinearNDInterpolator as lNDI

print(lNDI(points=df.index.to_frame().values, values=df.result.values)([1.3, 1.7, 1.55]))
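
To try the one-liner end to end, here is a minimal sketch that rebuilds the eight-row frame from the question. Building it with MultiIndex.from_product is an assumption for illustration; the original post does not show how df was created.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import LinearNDInterpolator

# Assumed construction of the question's data: levels a, b, c each take the values 1 and 2.
index = pd.MultiIndex.from_product([[1, 2], [1, 2], [1, 2]], names=["a", "b", "c"])
df = pd.DataFrame({"result": [6, 9, 8, 11, 7, 10, 9, 12]}, index=index)

# Each index tuple is a point in 3-D space; the column holds the value at that point.
interp = LinearNDInterpolator(points=df.index.to_frame().values,
                              values=df["result"].values)

value = interp([1.3, 1.7, 1.55])
print(value)
```

The printed value should agree with the 9.35 the question arrives at by hand, because the sample data is exactly linear in a, b and c.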

Now, if you have a DataFrame whose index holds all the (a, b, c) tuples you want to calculate, you can do, for example:

import pandas as pd
from scipy.interpolate import LinearNDInterpolator as lNDI

def pd_interpolate_MI(df_input, df_toInterpolate):
    # create the interpolation function from the known points
    func_interp = lNDI(points=df_input.index.to_frame().values,
                       values=df_input.result.values)
    # calculate the value for each unknown index
    df_toInterpolate['result'] = func_interp(df_toInterpolate.index.to_frame().values)
    # return the dataframe with the new values
    return pd.concat([df_input, df_toInterpolate]).sort_index()

Then, for example, with your df and df_toI = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1.3, 1.7, 1.55), (1.7, 1.4, 1.9)], names=df.index.names)), you get:

print(pd_interpolate_MI(df, df_toI))

               result
a   b   c
1.0 1.0 1.00     6.00
        2.00     9.00
    2.0 1.00     8.00
        2.00    11.00
1.3 1.7 1.55     9.35
1.7 1.4 1.90    10.20
2.0 1.0 1.00     7.00
        2.00    10.00
    2.0 1.00     9.00
        2.00    12.00
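
One caveat worth knowing: LinearNDInterpolator only interpolates inside the convex hull of the known points and returns its fill_value (NaN by default) everywhere else. A minimal sketch, assuming the same 2x2x2 grid of points as in the question:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Assumed data: the eight corner points from the question and their results.
points = np.array([(a, b, c) for a in (1, 2) for b in (1, 2) for c in (1, 2)],
                  dtype=float)
values = np.array([6, 9, 8, 11, 7, 10, 9, 12], dtype=float)

interp = LinearNDInterpolator(points, values)
inside = interp([1.7, 1.4, 1.9])    # within the cube spanned by the points
outside = interp([2.5, 1.0, 1.0])   # a = 2.5 lies beyond the known range

print(inside, outside)
```

If NaN is not acceptable for such rows, the fill_value argument of LinearNDInterpolator can be set to another constant, or those rows can be handled separately.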

Suggestion : 2

Interpolate values according to different methods.

Please note that only method='linear' is supported for DataFrames/Series with a MultiIndex.

'linear': ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. It is the default method.

New in version 0.18.1: added support for the 'akima' method, and added the interpolate method 'from_derivatives', which replaces 'piecewise_polynomial' in scipy 0.18; backwards-compatible with scipy < 0.18.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
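
To illustrate what "treat the values as equally spaced" means in practice, the sketch below compares method='linear' with method='index' on a plain (non-MultiIndex) Series. The index values 0, 1 and 10 are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: the missing value sits at index 1, far closer to 0 than to 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

by_position = s.interpolate(method="linear")  # ignores the index: midpoint of 0 and 10
by_index = s.interpolate(method="index")      # uses the index values as x-coordinates

print(by_position.tolist())
print(by_index.tolist())
```

The linear method fills the gap with 5.0, while the index method fills it with 1.0. That difference is why the workarounds for a MultiIndex matter when the interpolation level is unevenly spaced.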

Suggestion : 3

I need to interpolate a multi-index DataFrame.

This is the main DataFrame:

a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12

I need to find the result for:

1.3 1.7 1.55

stage 1:

a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
1.3 1 1 6.3
1.3 1 2 9.3
1.3 2 1 8.3
1.3 2 2 11.3
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12

stage 3:

a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
1.3 1 1 6.3
1.3 1 2 9.3
1.3 1.7 1 7.7
1.3 1.7 1.55 9.35
1.3 1.7 2 10.7
1.3 2 1 8.3
1.3 2 2 11.3
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12
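
The staged calculation in the question is linear interpolation along one axis at a time. A minimal NumPy sketch of those stages (including the intermediate step along b, which the question does not list), assuming the eight results are arranged on a 2x2x2 grid; lerp is a hypothetical helper written for this example:

```python
import numpy as np

# grid[i, j, k] is the result at a = 1 + i, b = 1 + j, c = 1 + k (data from the question).
grid = np.array([[[6.0, 9.0], [8.0, 11.0]],
                 [[7.0, 10.0], [9.0, 12.0]]])

def lerp(low, high, x, x0=1.0, x1=2.0):
    """Linear interpolation between the values known at x0 and x1."""
    t = (x - x0) / (x1 - x0)
    return low + t * (high - low)

along_a = lerp(grid[0], grid[1], 1.3)         # 2x2 array indexed by (b, c)
along_b = lerp(along_a[0], along_a[1], 1.7)   # length-2 array indexed by c
along_c = lerp(along_b[0], along_b[1], 1.55)  # single value at (1.3, 1.7, 1.55)

print(along_a)
print(along_b)
print(along_c)
```

The first array reproduces the a = 1.3 rows of stage 1, the second the (1.3, 1.7, c) rows of stage 3, and the final value is the 9.35 being asked for.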

Suggestion : 4

I found this hacky work-around that gets rid of the MultiIndex and uses a combination of groupby and transform:

import numpy as np
import pandas as pd

def multiindex_interp(x, interp_col, step_col):
    x = x.copy()  # work on a copy so the group handed in by groupby is not modified
    valid = x[interp_col].notna()
    invalid = ~valid

    x['last_valid_value'] = x[interp_col].ffill()
    x['next_valid_value'] = x[interp_col].bfill()

    # Generate a new Series filled with NaNs
    x['last_valid_step'] = np.nan
    # Copy the step value where we have a valid value
    x.loc[valid, 'last_valid_step'] = x.loc[valid, step_col]
    x['last_valid_step'] = x['last_valid_step'].ffill()

    x['next_valid_step'] = np.nan
    x.loc[valid, 'next_valid_step'] = x.loc[valid, step_col]
    x['next_valid_step'] = x['next_valid_step'].bfill()

    # Simple linear interpolation = distance from last step / (range between closest valid steps)
    # * difference between closest values + last value
    interpolated = ((x[step_col] - x['last_valid_step'])
                    / (x['next_valid_step'] - x['last_valid_step'])
                    * (x['next_valid_value'] - x['last_valid_value'])
                    + x['last_valid_value'])
    x.loc[invalid, interp_col] = interpolated[invalid]
    return x

test_df = test_df.reset_index(drop=False)
# apply hands each group to the function as a whole DataFrame, which it needs;
# group_keys=False keeps the original row index so the assignment below aligns
grouped = test_df.groupby(['iso3', 'sex', 'year'], group_keys=False)
interpolated = grouped.apply(multiindex_interp, 'value', 'age_start')
test_df['value'] = interpolated['value']
test_df

   iso3  sex  year  age_start  value
0   CAN    1  1990       0.00  16.00
1   CAN    1  1990       0.01  16.03
2   CAN    1  1990       0.10  16.30
3   CAN    1  1990       1.00  19.00
4   CAN    1  1991       0.00  20.00
5   CAN    1  1991       0.01  20.03
6   CAN    1  1991       0.10  20.30
7   CAN    1  1991       1.00  23.00
8   CAN    2  1990       0.00  24.00
9   CAN    2  1990       0.01  24.03
10  CAN    2  1990       0.10  24.30
11  CAN    2  1990       1.00  27.00
...
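
As a sanity check on the interpolation formula used in the function, the same numbers fall out of np.interp. This sketch assumes that, within the group (CAN, 1, 1990), the values at age_start 0.00 and 1.00 were known and those at 0.01 and 0.10 were missing; the original post does not show the input frame.

```python
import numpy as np

known_steps = np.array([0.0, 1.0])      # age_start where a value exists (assumed)
known_values = np.array([16.0, 19.0])   # the corresponding values (assumed)
missing_steps = np.array([0.01, 0.10])  # age_start rows to fill in

# distance from last step / range between valid steps * value difference + last value
filled = np.interp(missing_steps, known_steps, known_values)
print(filled)
```

Under those assumptions the filled values match rows 1 and 2 of the output (16.03 and 16.30).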

You can try something like this:

test_df.groupby(level=[0, 1, 2]).apply(
    lambda g: g.reset_index(level=[0, 1, 2], drop=True).interpolate(method='index')
)

Output:

                         value
iso3 sex year age_start
CAN  1   1990 0.00       16.00
              0.01       16.03
              0.10       16.30
              1.00       19.00
         1991 0.00       20.00
              0.01       20.03
              0.10       20.30
              1.00       23.00
     2   1990 0.00       24.00
              0.01       24.03
              0.10       24.30
              1.00       27.00
         1991 0.00       28.00
              0.01       28.03
              0.10       28.30
              1.00       31.00
USA  1   1990 0.00        0.00
              0.01        0.03
              0.10        0.30
              1.00        3.00
         1991 0.00        4.00
              0.01        4.03
              0.10        4.30
              1.00        7.00
     2   1990 0.00        8.00
              0.01        8.03
              0.10        8.30
              1.00       11.00
         1991 0.00       12.00
              0.01       12.03
              0.10       12.30
              1.00       15.00
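
For a self-contained run of this groupby approach, here is a sketch on a small made-up frame. The frame only mimics the shape of test_df, whose construction the original post does not show; the positions of the NaN values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_df: one country and sex, two years, four age_start steps.
idx = pd.MultiIndex.from_product(
    [["CAN"], [1], [1990, 1991], [0.0, 0.01, 0.1, 1.0]],
    names=["iso3", "sex", "year", "age_start"],
)
test_df = pd.DataFrame(
    {"value": [16.0, np.nan, np.nan, 19.0, 20.0, np.nan, np.nan, 23.0]},
    index=idx,
)

# Within each (iso3, sex, year) group, drop those levels so that only age_start
# remains as a plain index, then interpolate against the age_start values.
out = test_df.groupby(level=[0, 1, 2]).apply(
    lambda g: g.reset_index(level=[0, 1, 2], drop=True).interpolate(method="index")
)
print(out)
```

Because each group is reduced to a single-level index before interpolating, the MultiIndex restriction on method='index' no longer applies.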

This might come a little late, but I ran into the same problem today. What I came up with is also just a workaround, but it uses pandas built-ins at least. My approach was to reset the index, then group by the first subset of index columns (i.e. all but age_start). These sub-DataFrames can then be interpolated with the method='index' parameter and put back together into a whole frame with pd.concat. The resulting DataFrame then gets its original index reassigned.

idx_names = test_df.index.names
test_df = test_df.reset_index()
concat_list = [
    grp.set_index('age_start').interpolate(method='index')
    for _, grp in test_df.groupby(['iso3', 'sex', 'year'])
]
test_df = pd.concat(concat_list)
test_df = test_df.reset_index().set_index(idx_names)
test_df

                         value
iso3 sex year age_start
CAN  1   1990 0.00       16.00
              0.01       16.03
              0.10       16.30
              1.00       19.00
         1991 0.00       20.00
              0.01       20.03
              0.10       20.30
              1.00       23.00
     2   1990 0.00       24.00

I got back to this problem today and found a bug in my originally proposed solution. When the multi-index is not ordered as it is in your example, the above code sorts your DataFrame by index values. To get around this, I joined the result back into a DataFrame with the original index so that index order is preserved. I've also put it inside a function.

def interp_multiindex(df, interp_idx_name):
    """
    Provides index-based interpolation for pd.MultiIndex, which usually
    only supports linear interpolation. Interpolates the full DataFrame.

    Parameters
    ----------
    df : pd.DataFrame
        The DataFrame with NaN values
    interp_idx_name : str
        The name of the MultiIndex level on which index-based interpolation
        should take place

    Returns
    -------
    df : pd.DataFrame
        The DataFrame with index-based interpolated values
    """
    # Get all index level names in order
    existing_multiidx = df.index
    # Remove the name on which interpolation will take place
    noninterp_idx_names = [
        idx_name
        for idx_name in existing_multiidx.names
        if idx_name != interp_idx_name
    ]
    df = df.reset_index()
    concat_list = [
        grp.set_index(interp_idx_name).interpolate(method='index')
        for _, grp in df.groupby(noninterp_idx_names)
    ]
    df = pd.concat(concat_list)
    df = df.reset_index().set_index(existing_multiidx.names)
    df = pd.DataFrame(index=existing_multiidx).join(df)
    return df
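
The last step of the function, joining onto an empty frame that carries the original index, is what restores the row order. A minimal sketch of that step in isolation, using a made-up two-level index and a stand-in for the sorted, interpolated result:

```python
import pandas as pd

names = ["iso3", "age_start"]

# Hypothetical original index, deliberately not in sorted order.
original_idx = pd.MultiIndex.from_tuples(
    [("USA", 1.0), ("CAN", 0.0), ("CAN", 1.0), ("USA", 0.0)], names=names
)

# Stand-in for the interpolated result, which comes back sorted by index.
sorted_df = pd.DataFrame(
    {"value": [16.0, 19.0, 0.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [("CAN", 0.0), ("CAN", 1.0), ("USA", 0.0), ("USA", 1.0)], names=names
    ),
)

# A left join onto an empty frame with the original index keeps that index's order.
restored = pd.DataFrame(index=original_idx).join(sorted_df)
print(restored)
```

The rows of restored follow original_idx rather than the sorted order, with each value looked up by its index tuple.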