scipy curve fit fails power law

  • Last Update :
  • Techknowledgy :

I fit them - as suggested by @roadrunner66 - by transforming the power law in a linear function:

y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)

So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:

[
   [Variables]
]
lN: 2.35450302 + /- 0.019531 (0.83%) (init= 1.704748)
a: -0.08035342 + /- 0.005158 (6.42%) (init=-0.5)

Here is the entire code with a couple of inline comments:

import numpy as np
import matplotlib.pyplot as plt
from lmfit
import minimize, Parameters, Parameter, report_fit

# generate some data with noise
xData = np.linspace(0.01, 100., 50.)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()

# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()

def decay(params, x, data):

   lN = params['lN'].value
a = params['a'].value

# our linear model
model = a * x + lN
return model - data # that 's what you want to minimize

# create a set of Parameters
params = Parameters()
params.add('lN', value = np.log(5.5), min = 0.01, max = 100) # value is the initial value
params.add('a', value = -0.5, min = -1, max = -0.001) # min, max define parameter bounds

# do fit, here with leastsq model
result = minimize(decay, params, args = (lx, ly))

# write error report
report_fit(params)

# plot data
xnew = np.linspace(0., 100., 5000.)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.values['lN']) * xnew ** (result.values['a']), 'r')
plt.show()

Suggestion : 2

where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e0-7 for N and a, respectively, with absolutely no intersection with the data. From fits I've put in manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative, which leads to a fit with the wrong sign slope.,I can't figure out how to get a believable, let alone reliable, fit out of this routine, but I can't find any other good Python curve fitting routines. Do I need to write my own least-squares algorithm or is there something I'm doing wrong here?,In the original post, I showed a solution that uses lmfit which allows to assign bounds to your parameters. Starting with version 0.17, scipy also allows to assign bounds to your parameters directly (see documentation). Please find this solution below after the EDIT which can hopefully serve as a minimal example on how to use scipy's curve_fit with parameter bounds.,"I can't find any other good Python curve fitting routines..." Take a look at lmfit (lmfit.github.io/lmfit-py); in particular, see lmfit.github.io/lmfit-py/model.html - Warren Weckesser

So, I'm trying to fit a set of data with a power law of the following kind:

def f(x, N, a): # Power law fit
if a > 0: return N * x ** (-a)
else: return 10. ** 300
par, cov = scipy.optimize.curve_fit(f, data, time, array([10 ** (-7), 1.2]))

I fit them - as suggested by @roadrunner66 - by transforming the power law in a linear function:

y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)

So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:

[
   [Variables]
] lN: 2.35450302 + /- 0.019531 (0.83%) (init= 1.704748) a: -0.08035342 +/ - 0.005158(6.42 % )(init = -0.5)

Here is the entire code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize
import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a): return N * x ** (-a)
def f_log(x, lN, a): return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds = (0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds = ([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, * popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()

Suggestion : 3

The model function, f(x, …). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.,Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).,Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions:,Lower and upper bounds on parameters. Defaults to no bounds. Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use np.inf with an appropriate sign to disable bounds on all or some parameters.

>>>
import matplotlib.pyplot as plt >>>
   from scipy.optimize
import curve_fit
>>> def func(x, a, b, c):
   ...
   return a * np.exp(-b * x) + c
>>> xdata = np.linspace(0, 4, 50) >>>
   y = func(xdata, 2.5, 1.3, 0.5) >>>
   rng = np.random.default_rng() >>>
   y_noise = 0.2 * rng.normal(size = xdata.size) >>>
   ydata = y + y_noise >>>
   plt.plot(xdata, ydata, 'b-', label = 'data')
>>> popt, pcov = curve_fit(func, xdata, ydata) >>>
   popt
array([2.56274217, 1.37268521, 0.47427475]) >>>
   plt.plot(xdata, func(xdata, * popt), 'r-',
      ...label = 'fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
>>> popt, pcov = curve_fit(func, xdata, ydata, bounds = (0, [3., 1., 0.5])) >>>
   popt
array([2.43736712, 1., 0.34463856]) >>>
   plt.plot(xdata, func(xdata, * popt), 'g--',
      ...label = 'fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
>>> plt.xlabel('x') >>>
   plt.ylabel('y') >>>
   plt.legend() >>>
   plt.show()

Suggestion : 4

Fit examples with sinusoidal functions Generating the data Fitting the data A clever use of the cost function, Fitting a power-law to data with errors Generating the data Fitting the data,If your data is well-behaved, you can fit a power-law function by first converting to a linear equation by using the logarithm. Then use the optimize function to fit a straight line. Notice that we are weighting by positional uncertainties during the fit. Also, the best-fit parameters uncertainties are estimated from the variance-covariance matrix. You should read up on when it may not be appropriate to use this form of error estimation. If you are trying to fit a power-law distribution, this solution is more appropriate. ,Fitting gaussian-shaped data does not require an optimization routine. Just calculating the moments of the distribution is enough, and this is much faster.

   1 from pylab
   import *
   2 from scipy
   import *
   3
   4 #
   if you experience problem "optimize not found",
      try to uncomment the following line.The problem is present at least at Ubuntu Lucid python scipy package
   5 # from scipy
   import optimize
   6
   7 # Generate data points with noise
   8 num_points = 150
   9 Tx = linspace(5., 8., num_points)
   10 Ty = Tx
   11
   12 tX = 11.86 * cos(2 * pi / 0.81 * Tx - 1.32) + 0.64 * Tx + 4 * ((0.5 - rand(num_points)) * exp(2 * rand(num_points) ** 2))
   13 tY = -32.14 * cos(2 * pi / 0.8 * Ty - 1.94) + 0.15 * Ty + 7 * ((0.5 - rand(num_points)) * exp(2 * rand(num_points) ** 2))
   1 # Fit the first set
   2 fitfunc = lambda p, x: p[0] * cos(2 * pi / p[1] * x + p[2]) + p[3] * x # Target
   function
   3 errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target
   function
   4 p0 = [-15., 0.8, 0., -1.] # Initial guess
   for the parameters
   5 p1, success = optimize.leastsq(errfunc, p0[: ], args = (Tx, tX))
   6
   7 time = linspace(Tx.min(), Tx.max(), 100)
   8 plot(Tx, tX, "ro", time, fitfunc(p1, time), "r-") # Plot of the data and the fit
   9
   10 # Fit the second set
   11 p0 = [-15., 0.8, 0., -1.]
   12 p2, success = optimize.leastsq(errfunc, p0[: ], args = (Ty, tY))
   13
   14 time = linspace(Ty.min(), Ty.max(), 100)
   15 plot(Ty, tY, "b^", time, fitfunc(p2, time), "b-")
   16
   17 # Legend the plot
   18 title("Oscillations in the compressed trap")
   19 xlabel("time [ms]")
   20 ylabel("displacement [um]")
   21 legend(('x position', 'x fit', 'y position', 'y fit'))
   22
   23 ax = axes()
   24
   25 text(0.8, 0.07,
      26 'x freq :  %.3f kHz \n y freq :  %.3f kHz' % (1 / p1[1], 1 / p2[1]),
      27 fontsize = 16,
      28 horizontalalignment = 'center',
      29 verticalalignment = 'center',
      30 transform = ax.transAxes)
   31
   32 show()
   1 # Target
   function
   2 fitfunc = lambda T, p, x: p[0] * cos(2 * pi / T * x + p[1]) + p[2] * x
   3 # Initial guess
   for the first set 's parameters
   4 p1 = r_[-15., 0., -1.]
   5 # Initial guess
   for the second set 's parameters
   6 p2 = r_[-15., 0., -1.]
   7 # Initial guess
   for the common period
   8 T = 0.8
   9 # Vector of the parameters to fit, it contains all the parameters of the problem, and the period of the oscillation is not there twice!
      10 p = r_[T, p1, p2]
   11 # Cost
   function of the fit, compare it to the previous example.
   12 errfunc = lambda p, x1, y1, x2, y2: r_[
      13 fitfunc(p[0], p[1: 4], x1) - y1,
      14 fitfunc(p[0], p[4: 7], x2) - y2 15]
   16 # This time we need to pass the two sets of data, there are thus four "args".
   17 p, success = optimize.leastsq(errfunc, p, args = (Tx, tX, Ty, tY))
   18 time = linspace(Tx.min(), Tx.max(), 100) # Plot of the first data and the fit
   19 plot(Tx, tX, "ro", time, fitfunc(p[0], p[1: 4], time), "r-")
   20
   21 # Plot of the second data and the fit
   22 time = linspace(Ty.min(), Ty.max(), 100)
   23 plot(Ty, tY, "b^", time, fitfunc(p[0], p[4: 7], time), "b-")
   24
   25 # Legend the plot
   26 title("Oscillations in the compressed trap")
   27 xlabel("time [ms]")
   28 ylabel("displacement [um]")
   29 legend(('x position', 'x fit', 'y position', 'y fit'))
   30
   31 ax = axes()
   32
   33 text(0.8, 0.07,
      34 'x freq :  %.3f kHz' % (1 / p[0]),
      35 fontsize = 16,
      36 horizontalalignment = 'center',
      37 verticalalignment = 'center',
      38 transform = ax.transAxes)
   39
   40 show()
   1 from scipy
   import optimize
   2 from numpy
   import *
   3
   4 class Parameter:
      5 def __init__(self, value):
      6 self.value = value
   7
   8 def set(self, value):
      9 self.value = value
   10
   11 def __call__(self):
      12
   return self.value
   13
   14 def fit(function, parameters, y, x = None):
      15 def f(params):
      16 i = 0
   17
   for p in parameters:
      18 p.set(params[i])
   19 i += 1
   20
   return y - function(x)
   21
   22
   if x is None: x = arange(y.shape[0])
   23 p = [param() for param in parameters]
   24 optimize.leastsq(f, p)
   1 # giving initial parameters
   2 mu = Parameter(7)
   3 sigma = Parameter(3)
   4 height = Parameter(5)
   5
   6 # define your
   function:
   7 def f(x): return height() * exp(-((x - mu()) / sigma()) ** 2)
   8
   9 # fit!(given that data is an array with the data to fit)
   10 fit(f, [mu, sigma, height], data)
   1 from pylab
   import *
   2
   3 gaussian = lambda x: 3 * exp(-(30 - x) ** 2 / 20.)
   4
   5 data = gaussian(arange(100))
   6
   7 plot(data)
   8
   9 X = arange(data.size)
   10 x = sum(X * data) / sum(data)
   11 width = sqrt(abs(sum((X - x) ** 2 * data) / sum(data)))
   12
   13 max = data.max()
   14
   15 fit = lambda t: max * exp(-(t - x) ** 2 / (2 * width ** 2))
   16
   17 plot(fit(X))
   18
   19 show()

Suggestion : 5

I frequently use power law to study the variation of stiffness with stress and create constitutive laws for materials. Let’s see how to do a power fitting with scipy’s curve_fit and lmfit.,This is where lmfit (my favorite fitting package) comes into play. As the complexity of fitting function and parameter bounds increases curve_fit becomes less accurate and more crumbersome.,Trust me, as soon as your fitting functions become complex you will run towards lmfit with its concise and powerful way of defining and bounding parameters.,Scipy has a powerful module for curve fitting which uses non linear least squares method to fit a function to our data. The documentation is available it https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html.

where, m is usually the slope of the line and c is the intercept when x = 0 and x (Time), y (Stress) is our data. The y axis data is usually the measured value during the experiment/simulation and we are trying to find how the y axis quantity is dependent on the x axis quantity.

# Importing numpy
for creating data and matplotlib
for plotting
import numpy as np
import matplotlib.pyplot as plt

# Creating x axis data
x = np.linspace(0, 10, 100)

# Creating random data
for y axis
# Here the slope(m) is predefined to be 2.39645
# The intercept is 0
# random.normal method just adds some noise to data
y = 2.39645 * x + np.random.normal(0, 2, 100)

# Plotting the data
plt.scatter(x, y, c = 'black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.show()

To use curve_fit for the above data, we need to define a linear function which will be used to find fitting. The output will be the slope (m) and intercept ( c ) for our data, along with any variance in these values and any fitting statistics ($R^2$ or $\chi^2$).

# Calling the scipy 's curve_fit function from optimize module
from scipy.optimize
import curve_fit

# Defining a fitting fucntion
def linear_fit(x, m, c):
   return m * x + c

''
'
1. Using the curve_fit
function to fit the random linear data
2. Params returns an array with the best
for values of the different fitting parameters.In our
case first entry in params will be the slope m and second entry would be the intercept.
3. Covariance returns a matrix of covariance
for our fitted parameters.
4. The first argument f is the defined fitting
function.
5. xdata and ydata are the x and y data we generated above.
''
'
params, covariance = curve_fit(f = linear_fit, xdata = x, ydata = y)

print('Slope (m) is ', params[0])
print('Intercept (c) is ', params[1])
print(covariance)
Slope(m) is 2.430364409972301
Intercept(c) is - 0.11571245549643683[[0.00434152 - 0.02170757]
   [-0.02170757 0.14544807]]

Although covariance matrix provides information about the variance of the fitted parameters, mostly everyone uses the standard deviation and coefficient of determination ($R^2$) value to get an estimate of ‘goodness’ of fit. I use it all the time for simpler fittings (linear, exponential, power etc).

# Getting the standard deviation
standarddevparams2 = np.sqrt(np.diag(covariance2))

# Getting the R ^ 2 value
for the fitting
# Read more at https: //en.wikipedia.org/wiki/Coefficient_of_determination

   # Step 1: Get the residuals
for fitting
residuals = y - linear_fit(x, params2[0], params2[1])

# Step 2: Get the sum of squares of residual
squaresumofresiduals = np.sum(residuals ** 2)

# Step 3: Get the total sum of squares using the mean of y values
squaresum = np.sum((y - np.mean(y)) ** 2)

# Step 4: Get the R ^ 2
R2 = 1 - (squaresumofresiduals / squaresum)

print('The slope (m) is ', params2[0], '+-', standarddevparams2[0])
print('The intercept (c) is ', params2[1], '+-', standarddevparams2[1])
print('The R^2 value is ', R2)

Let’s try our linear fitting using lmfit!

''
'
1. Importing the module.If you dont have lmfit, you can download it on pip or conda.Instructions at https: //lmfit.github.io/lmfit-py/installation.html
   2. Parameters is the main parameters class.
3. minimize is the main fitting
function.Takes our parameters and spits the best fit parameters.
4. fit_report provides the parameter fit values and different statistical values.
''
'

from lmfit
import Parameters, minimize, fit_report

# Define the fitting
function
def linear_fitting_lmfit(params, x, y):
   m = params['m']
c = params['c']
y_fit = m * x + c
return y_fit - y

# Defining the various parameters
params = Parameters()
# Slope is bounded between min value of 1.0 and max value of 3.0
params.add('m', min = 1.0, max = 3.0)
# Intercept is made fixed at 0.0 value
params.add('c', value = 0.0, vary = False)

# Calling the minimize
function.Args contains the x and y data.
fitted_params = minimize(linear_fitting_lmfit, params, args = (x, y, ), method = 'least_squares')

# Getting the fitted values
m = fitted_params.params['m'].value
c = fitted_params.params['c'].value

# Printing the fitted values
print('The slope (m) is ', m)
print('The intercept (c) is ', c)

# Pretty printing all the statistical data
print(fit_report(fitted_params))