You can use scipy.interpolate.LinearNDInterpolator to do what you want. If the dataframe has a MultiIndex with the levels 'a', 'b' and 'c', then:
from scipy.interpolate import LinearNDInterpolator as lNDI
print(lNDI(points=df.index.to_frame().values, values=df.result.values)([1.3, 1.7, 1.55]))
Now, if you have a dataframe whose index holds all the tuples (a, b, c) you want to calculate, you can do, for example:
import pandas as pd
from scipy.interpolate import LinearNDInterpolator as lNDI

def pd_interpolate_MI(df_input, df_toInterpolate):
    # create the interpolation function from the known points
    func_interp = lNDI(points=df_input.index.to_frame().values, values=df_input.result.values)
    # calculate the values for the unknown index tuples
    df_toInterpolate['result'] = func_interp(df_toInterpolate.index.to_frame().values)
    # return a dataframe with both the known and the new values
    return pd.concat([df_input, df_toInterpolate]).sort_index()
Then, for example, with your df and
df_toI = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1.3, 1.7, 1.55), (1.7, 1.4, 1.9)], names=df.index.names))
you get:
print(pd_interpolate_MI(df, df_toI))
              result
a   b   c
1.0 1.0 1.00    6.00
        2.00    9.00
    2.0 1.00    8.00
        2.00   11.00
1.3 1.7 1.55    9.35
1.7 1.4 1.90   10.20
2.0 1.0 1.00    7.00
        2.00   10.00
    2.0 1.00    9.00
        2.00   12.00
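For reference, here is a self-contained sketch of the same idea that can be pasted into a fresh session. The frame is rebuilt from the table in the question. The query point (3, 3, 3) is my own addition, chosen to show that LinearNDInterpolator returns NaN (its default fill_value) for points outside the convex hull of the known points.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import LinearNDInterpolator

# Rebuild the example frame from the question: 'a', 'b', 'c' form the MultiIndex.
df = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 1, 2, 2, 1, 1, 2, 2],
        "c": [1, 2, 1, 2, 1, 2, 1, 2],
        "result": [6, 9, 8, 11, 7, 10, 9, 12],
    }
).set_index(["a", "b", "c"])

# Known coordinates (one row per index tuple) and the values measured there.
points = df.index.to_frame().to_numpy(dtype=float)
values = df["result"].to_numpy(dtype=float)
interp = LinearNDInterpolator(points, values)

inside = interp([[1.3, 1.7, 1.55]])[0]   # point inside the grid
outside = interp([[3.0, 3.0, 3.0]])[0]   # point outside the grid (hypothetical query)
print(inside)
print(outside)
```

If you need a number instead of NaN outside the grid, the constructor takes a fill_value argument.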
Please note that pandas' own interpolate() supports only method='linear' for DataFrames/Series with a MultiIndex. 'linear' is the default; it ignores the index and treats the values as equally spaced:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
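To make the difference between ignoring and using the index concrete, here is a small sketch on a single-level index. The series is made up for illustration: one gap at label 1, between the known labels 0 and 10.

```python
import numpy as np
import pandas as pd

# One missing value between index labels 0 and 10, located at label 1.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index: the gap is treated as halfway between its neighbours.
by_position = s.interpolate(method="linear")
# 'index' uses the index values: label 1 is one tenth of the way from 0 to 10.
by_index = s.interpolate(method="index")

print(by_position.tolist())  # [0.0, 5.0, 10.0]
print(by_index.tolist())     # [0.0, 1.0, 10.0]
```

This is why the answers further down first reduce each group to a single-level index before calling interpolate(method='index').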
I need to interpolate a multi-index DataFrame.
This is the main dataframe:
a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12
I need to find the result for:
1.3 1.7 1.55
stage 1:
a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
1.3 1 1 6.3
1.3 1 2 9.3
1.3 2 1 8.3
1.3 2 2 11.3
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12
stage 3:
a b c result
1 1 1 6
1 1 2 9
1 2 1 8
1 2 2 11
1.3 1 1 6.3
1.3 1 2 9.3
1.3 1.7 1 7.7
1.3 1.7 1.55 9.35
1.3 1.7 2 10.7
1.3 2 1 8.3
1.3 2 2 11.3
2 1 1 7
2 1 2 10
2 2 1 9
2 2 2 12
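The staged tables can be checked by hand with plain one-dimensional linear interpolation, one axis at a time. The sketch below is my reading of those tables (the intermediate step along b is not shown as its own stage, but its rows appear in the last table); the helper name lerp is made up for illustration.

```python
def lerp(x, x0, x1, y0, y1):
    """Linear interpolation between (x0, y0) and (x1, y1), evaluated at x."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

# Known results on the corners of the grid, keyed by (a, b, c).
known = {
    (1, 1, 1): 6, (1, 1, 2): 9, (1, 2, 1): 8, (1, 2, 2): 11,
    (2, 1, 1): 7, (2, 1, 2): 10, (2, 2, 1): 9, (2, 2, 2): 12,
}
a, b, c = 1.3, 1.7, 1.55

# Interpolate along a: gives the four rows with a = 1.3, keyed by (b, c).
along_a = {(bb, cc): lerp(a, 1, 2, known[(1, bb, cc)], known[(2, bb, cc)])
           for bb in (1, 2) for cc in (1, 2)}
# Interpolate along b: gives the rows (1.3, 1.7, 1) and (1.3, 1.7, 2), keyed by c.
along_b = {cc: lerp(b, 1, 2, along_a[(1, cc)], along_a[(2, cc)]) for cc in (1, 2)}
# Interpolate along c: gives the requested point (1.3, 1.7, 1.55).
result = lerp(c, 1, 2, along_b[1], along_b[2])

print(along_a)
print(along_b)
print(round(result, 2))
```

Up to floating-point rounding, this reproduces the 6.3, 9.3, 8.3 and 11.3 rows, then 7.7 and 10.7, and finally 9.35.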
I found this hacky work-around that gets rid of the MultiIndex and uses a combination of groupby and transform:
def multiindex_interp(x, interp_col, step_col):
    x = x.copy()  # work on a copy so the group passed in is not modified
    valid = ~pd.isnull(x[interp_col])
    invalid = ~valid
    x['last_valid_value'] = x[interp_col].ffill()
    x['next_valid_value'] = x[interp_col].bfill()
    # Generate a new Series filled with NaNs
    x['last_valid_step'] = np.nan
    # Copy the step value where we have a valid value
    x.loc[valid, 'last_valid_step'] = x.loc[valid, step_col]
    x['last_valid_step'] = x['last_valid_step'].ffill()
    x['next_valid_step'] = np.nan
    x.loc[valid, 'next_valid_step'] = x.loc[valid, step_col]
    x['next_valid_step'] = x['next_valid_step'].bfill()
    # Simple linear interpolation = distance from last step / (range between closest valid steps) *
    # difference between closest values + last value
    x.loc[invalid, interp_col] = (x[step_col] - x['last_valid_step']) / (x['next_valid_step'] - x['last_valid_step']) \
        * (x['next_valid_value'] - x['last_valid_value']) \
        + x['last_valid_value']
    return x

test_df = test_df.reset_index(drop=False)
# transform() expects a function that also works column by column, so apply() is used here;
# group_keys=False keeps the original row index so the result aligns with test_df
grouped = test_df.groupby(['iso3', 'sex', 'year'], group_keys=False)
interpolated = grouped.apply(multiindex_interp, 'value', 'age_start')
test_df['value'] = interpolated['value']
test_df
   iso3  sex  year  age_start  value
0   CAN    1  1990       0.00  16.00
1   CAN    1  1990       0.01  16.03
2   CAN    1  1990       0.10  16.30
3   CAN    1  1990       1.00  19.00
4   CAN    1  1991       0.00  20.00
5   CAN    1  1991       0.01  20.03
6   CAN    1  1991       0.10  20.30
7   CAN    1  1991       1.00  23.00
8   CAN    2  1990       0.00  24.00
9   CAN    2  1990       0.01  24.03
10  CAN    2  1990       0.10  24.30
11  CAN    2  1990       1.00  27.00
...
You can try something like this:
test_df.groupby(level=[0, 1, 2])\
    .apply(lambda g: g.reset_index(level=[0, 1, 2], drop=True)
                      .interpolate(method='index'))
Output:
                         value
iso3 sex year age_start
CAN  1   1990 0.00       16.00
              0.01       16.03
              0.10       16.30
              1.00       19.00
         1991 0.00       20.00
              0.01       20.03
              0.10       20.30
              1.00       23.00
     2   1990 0.00       24.00
              0.01       24.03
              0.10       24.30
              1.00       27.00
         1991 0.00       28.00
              0.01       28.03
              0.10       28.30
              1.00       31.00
USA  1   1990 0.00        0.00
              0.01        0.03
              0.10        0.30
              1.00        3.00
         1991 0.00        4.00
              0.01        4.03
              0.10        4.30
              1.00        7.00
     2   1990 0.00        8.00
              0.01        8.03
              0.10        8.30
              1.00       11.00
         1991 0.00       12.00
              0.01       12.03
              0.10       12.30
              1.00       15.00
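Since the input frame itself is not shown here, this is a minimal runnable sketch of the same groupby approach on a small made-up frame. The frame (one country, two sexes, one year, NaN at age_start 0.01 and 0.10) is an assumption chosen to match the first rows of the output above, and level names are used instead of positions for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame: known values at age_start 0.0 and 1.0, gaps in between.
idx = pd.MultiIndex.from_product(
    [["CAN"], [1, 2], [1990], [0.0, 0.01, 0.1, 1.0]],
    names=["iso3", "sex", "year", "age_start"],
)
test_df = pd.DataFrame(
    {"value": [16.0, np.nan, np.nan, 19.0, 24.0, np.nan, np.nan, 27.0]},
    index=idx,
)

group_levels = ["iso3", "sex", "year"]

# Within each group, drop the grouping levels so only 'age_start' remains as the
# index, then interpolate against those index values. apply() puts the group keys
# back in front, which restores the four-level MultiIndex.
filled = test_df.groupby(level=group_levels).apply(
    lambda g: g.reset_index(level=group_levels, drop=True).interpolate(method="index")
)
print(filled)
```

Note that groupby sorts the group keys by default, so the rows come back in sorted group order.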
This might come a little late, but I ran into the same problem today. What I came up with is also just a workaround, but it uses pandas built-ins at least. My approach was to reset the index, then group by the first subset of index columns (i.e. all but age_start). These sub-DataFrames can then be interpolated with the method='index' parameter and put back together into a whole frame with pd.concat. The resulting DataFrame then gets its original index reassigned.
idx_names = test_df.index.names
test_df = test_df.reset_index()
concat_list = [grp.set_index('age_start').interpolate(method = 'index') for _, grp in test_df.groupby(['iso3', 'sex', 'year'])]
test_df = pd.concat(concat_list)
test_df = test_df.reset_index().set_index(idx_names)
test_df
                         value
iso3 sex year age_start
CAN  1   1990 0.00       16.00
              0.01       16.03
              0.10       16.30
              1.00       19.00
         1991 0.00       20.00
              0.01       20.03
              0.10       20.30
              1.00       23.00
     2   1990 0.00       24.00
I got back to this problem today and found a bug in my originally proposed solution. When the multi-index is not ordered as it is in your example, the above code sorts your DataFrame by index values. To get around this, I joined the result back into a DataFrame with the original index so that index order is preserved. I've also put it inside a function.
def interp_multiindex(df, interp_idx_name):
    """
    Provides index-based interpolation for pd.MultiIndex, which usually only
    supports linear interpolation. Interpolates the full DataFrame.

    Parameters
    ----------
    df : pd.DataFrame
        The DataFrame with NaN values
    interp_idx_name : str
        The name of the multiindex level on which index-based interpolation
        should take place

    Returns
    -------
    df : pd.DataFrame
        The DataFrame with index-based interpolated values
    """
    # Get all index level names in order
    existing_multiidx = df.index
    # Remove the name on which interpolation will take place
    noninterp_idx_names = [idx_name for idx_name in existing_multiidx.names
                           if idx_name != interp_idx_name]
    df = df.reset_index()
    concat_list = [grp.set_index(interp_idx_name).interpolate(method='index')
                   for _, grp in df.groupby(noninterp_idx_names)]
    df = pd.concat(concat_list)
    df = df.reset_index().set_index(existing_multiidx.names)
    df = pd.DataFrame(index=existing_multiidx).join(df)
    return df