I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x**a
ln(y) = ln(N * x**a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
    lN:   2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
    a:   -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()

# transform the data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()

def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data  # that's what you want to minimize

# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100)  # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001)         # min, max define parameter bounds

# do the fit, here with the default leastsq algorithm
result = minimize(decay, params, args=(lx, ly))

# write error report (the best-fit parameters live on the result object)
report_fit(result)

# plot the data and the back-transformed fit
xnew = np.linspace(0.01, 100., 5000)
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.params['lN'].value) * xnew ** result.params['a'].value, 'r')
plt.show()
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've done manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition that forces a to be positive results in a worse fit, because the routine chooses a negative a, which gives a fit with the wrong sign of slope.

I can't figure out how to get a believable, let alone reliable, fit out of this routine, but I can't find any other good Python curve fitting routines. Do I need to write my own least-squares algorithm, or is there something I'm doing wrong here?

In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see the documentation). Please find this solution below after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.

"I can't find any other good Python curve fitting routines..." Take a look at lmfit (lmfit.github.io/lmfit-py); in particular, see lmfit.github.io/lmfit-py/model.html - Warren Weckesser
So, I'm trying to fit a set of data with a power law of the following kind:
def f(x, N, a):  # Power law fit
    if a > 0:
        return N * x ** (-a)
    else:
        return 10. ** 300
par, cov = scipy.optimize.curve_fit(f, data, time, array([10 ** (-7), 1.2]))
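A minimal sketch of the bounds-based alternative mentioned in the answer (assuming scipy >= 0.17 and the question's data and time arrays) would look like this:

# Sketch only (assumes scipy >= 0.17 and the question's `data`/`time` arrays):
# keep N and a non-negative through bounds instead of returning a huge penalty.
import numpy as np
from scipy.optimize import curve_fit

def f(x, N, a):
    return N * x ** (-a)

par, cov = curve_fit(f, data, time, p0=[1e-7, 1.2],
                     bounds=([0, 0], [np.inf, np.inf]))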
EDIT: here is the entire code using scipy's curve_fit with parameter bounds:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))

# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)

def f(x, N, a):
    return N * x ** (-a)

def f_log(x, lN, a):
    return a * x + lN

# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))

xnew = np.linspace(0.01, 100., 5000)

# plot the data and both fits
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()
The relevant curve_fit parameters (from the scipy documentation):

f : The model function, f(x, …). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.

p0 : Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).

sigma : Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions.

bounds : Lower and upper bounds on parameters. Defaults to no bounds. Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use np.inf with an appropriate sign to disable bounds on all or some parameters.
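As an illustration of sigma (a sketch of my own, not part of the quoted documentation), per-point measurement uncertainties can be passed in as weights:

# Sketch: using sigma to weight curve_fit by per-point uncertainties.
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, c):
    return m * x + c

xd = np.linspace(0, 10, 20)
yd = 3.0 * xd + 1.0 + np.random.normal(0, 0.5, xd.size)
yerr = np.full_like(yd, 0.5)                # assumed 1-sigma error on each point

popt, pcov = curve_fit(line, xd, yd, p0=[1.0, 0.0],
                       sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))               # standard errors of m and c

The documentation's own example, reformatted below, then shows an unbounded fit followed by a fit with parameter bounds.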
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.optimize import curve_fit

>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c

>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> rng = np.random.default_rng()
>>> y_noise = 0.2 * rng.normal(size=xdata.size)
>>> ydata = y + y_noise
>>> plt.plot(xdata, ydata, 'b-', label='data')

>>> popt, pcov = curve_fit(func, xdata, ydata)
>>> popt
array([2.56274217, 1.37268521, 0.47427475])
>>> plt.plot(xdata, func(xdata, *popt), 'r-',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

>>> popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 1., 0.5]))
>>> popt
array([2.43736712, 1.        , 0.34463856])
>>> plt.plot(xdata, func(xdata, *popt), 'g--',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

>>> plt.xlabel('x')
>>> plt.ylabel('y')
>>> plt.legend()
>>> plt.show()
The examples below fit sinusoidal functions (generating the data, fitting it, and a clever use of the cost function to share a parameter between two data sets), then show a simplified fitting wrapper and a moments-based fit of gaussian-shaped data; a power-law fit to data with errors is also discussed.

If your data is well-behaved, you can fit a power-law function by first converting it to a linear equation by using the logarithm, and then use the optimize function to fit a straight line. Notice that we are weighting by positional uncertainties during the fit. Also, the best-fit parameter uncertainties are estimated from the variance-covariance matrix; you should read up on when it may not be appropriate to use this form of error estimation. If you are trying to fit a power-law distribution, this solution is more appropriate. (A minimal sketch of this log-space approach is given after the gaussian example below.)

Fitting gaussian-shaped data does not require an optimization routine: just calculating the moments of the distribution is enough, and this is much faster.
from pylab import *
from scipy import *

# If you experience the problem "optimize not found", try to uncomment the following line.
# The problem is present at least in the Ubuntu Lucid python-scipy package.
# from scipy import optimize

# Generate data points with noise
num_points = 150
Tx = linspace(5., 8., num_points)
Ty = Tx

tX = 11.86 * cos(2 * pi / 0.81 * Tx - 1.32) + 0.64 * Tx + 4 * ((0.5 - rand(num_points)) * exp(2 * rand(num_points) ** 2))
tY = -32.14 * cos(2 * pi / 0.8 * Ty - 1.94) + 0.15 * Ty + 7 * ((0.5 - rand(num_points)) * exp(2 * rand(num_points) ** 2))
# Fit the first set
fitfunc = lambda p, x: p[0] * cos(2 * pi / p[1] * x + p[2]) + p[3] * x  # Target function
errfunc = lambda p, x, y: fitfunc(p, x) - y  # Distance to the target function
p0 = [-15., 0.8, 0., -1.]  # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:], args=(Tx, tX))

time = linspace(Tx.min(), Tx.max(), 100)
plot(Tx, tX, "ro", time, fitfunc(p1, time), "r-")  # Plot of the data and the fit

# Fit the second set
p0 = [-15., 0.8, 0., -1.]
p2, success = optimize.leastsq(errfunc, p0[:], args=(Ty, tY))

time = linspace(Ty.min(), Ty.max(), 100)
plot(Ty, tY, "b^", time, fitfunc(p2, time), "b-")

# Legend the plot
title("Oscillations in the compressed trap")
xlabel("time [ms]")
ylabel("displacement [um]")
legend(('x position', 'x fit', 'y position', 'y fit'))

ax = axes()

text(0.8, 0.07,
     'x freq : %.3f kHz \n y freq : %.3f kHz' % (1 / p1[1], 1 / p2[1]),
     fontsize=16,
     horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

show()
# Target function
fitfunc = lambda T, p, x: p[0] * cos(2 * pi / T * x + p[1]) + p[2] * x
# Initial guess for the first set's parameters
p1 = r_[-15., 0., -1.]
# Initial guess for the second set's parameters
p2 = r_[-15., 0., -1.]
# Initial guess for the common period
T = 0.8
# Vector of the parameters to fit; it contains all the parameters of the problem,
# and the period of the oscillation is not there twice!
p = r_[T, p1, p2]
# Cost function of the fit; compare it to the previous example.
errfunc = lambda p, x1, y1, x2, y2: r_[
    fitfunc(p[0], p[1:4], x1) - y1,
    fitfunc(p[0], p[4:7], x2) - y2
]
# This time we need to pass the two sets of data; there are thus four "args".
p, success = optimize.leastsq(errfunc, p, args=(Tx, tX, Ty, tY))

# Plot of the first data and the fit
time = linspace(Tx.min(), Tx.max(), 100)
plot(Tx, tX, "ro", time, fitfunc(p[0], p[1:4], time), "r-")

# Plot of the second data and the fit
time = linspace(Ty.min(), Ty.max(), 100)
plot(Ty, tY, "b^", time, fitfunc(p[0], p[4:7], time), "b-")

# Legend the plot
title("Oscillations in the compressed trap")
xlabel("time [ms]")
ylabel("displacement [um]")
legend(('x position', 'x fit', 'y position', 'y fit'))

ax = axes()

text(0.8, 0.07,
     'x freq : %.3f kHz' % (1 / p[0]),
     fontsize=16,
     horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

show()
from scipy import optimize
from numpy import *

class Parameter:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

    def __call__(self):
        return self.value

def fit(function, parameters, y, x=None):
    def f(params):
        i = 0
        for p in parameters:
            p.set(params[i])
            i += 1
        return y - function(x)

    if x is None:
        x = arange(y.shape[0])
    p = [param() for param in parameters]
    optimize.leastsq(f, p)
# giving initial parameters
mu = Parameter(7)
sigma = Parameter(3)
height = Parameter(5)

# define your function:
def f(x):
    return height() * exp(-((x - mu()) / sigma()) ** 2)

# fit! (given that data is an array with the data to fit)
fit(f, [mu, sigma, height], data)
from pylab import *

gaussian = lambda x: 3 * exp(-(30 - x) ** 2 / 20.)

data = gaussian(arange(100))

plot(data)

X = arange(data.size)
x = sum(X * data) / sum(data)
width = sqrt(abs(sum((X - x) ** 2 * data) / sum(data)))

max = data.max()

fit = lambda t: max * exp(-(t - x) ** 2 / (2 * width ** 2))

plot(fit(X))

show()
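Below is a minimal sketch of the weighted log-space power-law fit described earlier. It assumes hypothetical arrays xdata, ydata and 1-sigma uncertainties yerr; it illustrates the idea rather than reproducing the cookbook's recipe verbatim.

# Sketch of a weighted power-law fit in log space: y = amp * x**index.
# Assumes hypothetical arrays xdata, ydata and 1-sigma uncertainties yerr.
import numpy as np
from scipy import optimize

logx = np.log10(xdata)
logy = np.log10(ydata)
logyerr = yerr / (ydata * np.log(10))        # d(log10 y) = dy / (y ln 10)

fitfunc = lambda p, x: p[0] + p[1] * x       # straight line in log-log space
errfunc = lambda p, x, y, err: (y - fitfunc(p, x)) / err   # weighted residuals

pinit = [1.0, -1.0]
pfinal, covar, infodict, mesg, ier = optimize.leastsq(
    errfunc, pinit, args=(logx, logy, logyerr), full_output=True)

amp = 10.0 ** pfinal[0]                      # back-transform the intercept
index = pfinal[1]                            # the slope is the power-law index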
I frequently use a power law to study the variation of stiffness with stress and to create constitutive laws for materials. Let's see how to do a power-law fit with scipy's curve_fit and lmfit.

Scipy has a powerful module for curve fitting which uses a non-linear least squares method to fit a function to our data. The documentation is available at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html.

This is where lmfit (my favorite fitting package) comes into play: as the complexity of the fitting function and of the parameter bounds increases, curve_fit becomes less accurate and more cumbersome. Trust me, as soon as your fitting functions become complex you will run towards lmfit, with its concise and powerful way of defining and bounding parameters.
For a linear fit, $y = m x + c$, where m is usually the slope of the line, c is the intercept when x = 0, and x (Time) and y (Stress) are our data. The y-axis data is usually the measured value during the experiment/simulation, and we are trying to find how the y-axis quantity depends on the x-axis quantity.
# Importing numpy for creating data and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt

# Creating x axis data
x = np.linspace(0, 10, 100)

# Creating random data for y axis
# Here the slope (m) is predefined to be 2.39645
# The intercept is 0
# random.normal just adds some noise to the data
y = 2.39645 * x + np.random.normal(0, 2, 100)

# Plotting the data
plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.show()
To use curve_fit for the above data, we need to define a linear function which will be used for the fit. The output will be the slope (m) and intercept (c) for our data, along with the variance in these values and any fitting statistics ($R^2$ or $\chi^2$).
# Calling scipy's curve_fit function from the optimize module
from scipy.optimize import curve_fit

# Defining a fitting function
def linear_fit(x, m, c):
    return m * x + c

'''
1. Using the curve_fit function to fit the random linear data.
2. params returns an array with the best-fit values of the different fitting
   parameters. In our case the first entry in params will be the slope m and
   the second entry will be the intercept c.
3. covariance returns a matrix of covariance for our fitted parameters.
4. The first argument f is the defined fitting function.
5. xdata and ydata are the x and y data we generated above.
'''
params, covariance = curve_fit(f=linear_fit, xdata=x, ydata=y)

print('Slope (m) is ', params[0])
print('Intercept (c) is ', params[1])
print(covariance)
Slope (m) is  2.430364409972301
Intercept (c) is  -0.11571245549643683
[[ 0.00434152 -0.02170757]
 [-0.02170757  0.14544807]]
Although the covariance matrix provides information about the variance of the fitted parameters, most people use the standard deviation and the coefficient of determination ($R^2$) to get an estimate of the 'goodness' of fit. I use them all the time for simpler fittings (linear, exponential, power, etc.).
# Getting the standard deviation of the fitted parameters
standarddevparams = np.sqrt(np.diag(covariance))

# Getting the R^2 value for the fitting
# Read more at https://en.wikipedia.org/wiki/Coefficient_of_determination
# Step 1: Get the residuals for the fit
residuals = y - linear_fit(x, params[0], params[1])
# Step 2: Get the sum of squares of the residuals
squaresumofresiduals = np.sum(residuals ** 2)
# Step 3: Get the total sum of squares using the mean of the y values
squaresum = np.sum((y - np.mean(y)) ** 2)
# Step 4: Get the R^2
R2 = 1 - (squaresumofresiduals / squaresum)

print('The slope (m) is ', params[0], '+-', standarddevparams[0])
print('The intercept (c) is ', params[1], '+-', standarddevparams[1])
print('The R^2 value is ', R2)
Let’s try our linear fitting using lmfit!
'''
1. Importing the module. If you don't have lmfit, you can install it with pip or conda.
   Instructions at https://lmfit.github.io/lmfit-py/installation.html
2. Parameters is the main parameters class.
3. minimize is the main fitting function. It takes our parameters and spits out the best-fit parameters.
4. fit_report provides the parameter fit values and different statistical values.
'''
from lmfit import Parameters, minimize, fit_report

# Define the fitting function
def linear_fitting_lmfit(params, x, y):
    m = params['m']
    c = params['c']
    y_fit = m * x + c
    return y_fit - y

# Defining the various parameters
params = Parameters()
# Slope is bounded between min value of 1.0 and max value of 3.0
params.add('m', min=1.0, max=3.0)
# Intercept is made fixed at 0.0 value
params.add('c', value=0.0, vary=False)

# Calling the minimize function. args contains the x and y data.
fitted_params = minimize(linear_fitting_lmfit, params, args=(x, y,), method='least_squares')

# Getting the fitted values
m = fitted_params.params['m'].value
c = fitted_params.params['c'].value

# Printing the fitted values
print('The slope (m) is ', m)
print('The intercept (c) is ', c)

# Pretty printing all the statistical data
print(fit_report(fitted_params))
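Since the post is ultimately about power-law fitting, here is a rough sketch (my own illustration, not the original post's code) of the same lmfit pattern applied directly to y = N * x**a, with hypothetical synthetic data:

# Minimal sketch (not from the original post): the same lmfit pattern applied
# to a power law y = N * x**a, with hypothetical synthetic data.
import numpy as np
from lmfit import Parameters, minimize, fit_report

xp = np.linspace(0.1, 10, 200)
yp = 2.5 * xp ** 0.7 + np.random.normal(0, 0.2, xp.size)

def power_residual(params, x, y):
    N = params['N'].value
    a = params['a'].value
    return N * x ** a - y                            # residuals to be minimized

params_p = Parameters()
params_p.add('N', value=1.0, min=0.0)                # amplitude kept non-negative
params_p.add('a', value=0.5, min=0.0, max=2.0)       # exponent bounded to a sensible range

result_p = minimize(power_residual, params_p, args=(xp, yp), method='least_squares')
print(fit_report(result_p))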