You are only plotting the fit at the same x points as the original data in your call:

ax.plot(M, V, 'go', label='data')
ax.plot(M, func(M, *popt), '-', label='fit')

To fix this, evaluate the fitted function on a wider, denser range of x values; here we use every integer from 700 up to (but not including) 1200:

toplot = np.arange(700, 1200)
ax.plot(toplot, func(toplot, *popt), '-', label='fit')
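For reference, a self-contained sketch of the whole idea (the data M, V and the model func here are made up for illustration; the original question defines its own):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Stand-in model and data; the real M, V and func come from the question.
def func(x, a, b):
    return a * np.exp(-x / b)

M = np.linspace(700, 1200, 15)                   # sparse x values of the "data"
V = func(M, 5.0, 400.0) + np.random.normal(0, 0.02, M.size)

popt, pcov = curve_fit(func, M, V, p0=[1.0, 500.0])

fig, ax = plt.subplots()
ax.plot(M, V, 'go', label='data')
toplot = np.arange(700, 1200)                    # dense x grid -> smooth curve
ax.plot(toplot, func(toplot, *popt), '-', label='fit')
ax.legend()
plt.show()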
Fitting functions with scipy.optimize curve_fit

Fitting a function that describes the expected occurrence of data points to real data is often required in scientific applications. A possible optimizer for this task is curve_fit from scipy.optimize. In the following, an example application of curve_fit is given. Suppose there is a peak of normally (Gaussian) distributed data (mean: 3.0, standard deviation: 0.3) on top of an exponentially decaying background. This distribution can be fitted with curve_fit within a few steps:
In this example, the observed y values are the heights of the histogram bins, while the observed x values are the centers of the histogram bins (binscenters). It is necessary to pass the name of the fit function, the x values and the y values to curve_fit. Furthermore, an optional argument containing rough estimates for the fit parameters can be given with p0. curve_fit returns popt and pcov, where popt contains the fit results for the parameters, while pcov is the covariance matrix, the diagonal elements of which represent the variance of the fitted parameters.
# 1.) Necessary imports.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# 2.) Define fit function.
def fit_function(x, A, beta, B, mu, sigma):
    return A * np.exp(-x / beta) + B * np.exp(-1.0 * (x - mu) ** 2 / (2 * sigma ** 2))

# 3.) Generate exponential and gaussian data and histograms.
data = np.random.exponential(scale=2.0, size=100000)
data2 = np.random.normal(loc=3.0, scale=0.3, size=15000)
bins = np.linspace(0, 6, 61)
data_entries_1, bins_1 = np.histogram(data, bins=bins)
data_entries_2, bins_2 = np.histogram(data2, bins=bins)

# 4.) Add histograms of exponential and gaussian data.
data_entries = data_entries_1 + data_entries_2
binscenters = np.array([0.5 * (bins[i] + bins[i + 1]) for i in range(len(bins) - 1)])

# 5.) Fit the function to the histogram data.
popt, pcov = curve_fit(fit_function, xdata=binscenters, ydata=data_entries, p0=[20000, 2.0, 2000, 3.0, 0.3])
print(popt)

# 6.) Generate enough x values to make the curves look smooth.
xspace = np.linspace(0, 6, 100000)

# Plot the histogram and the fitted function.
plt.bar(binscenters, data_entries, width=bins[1] - bins[0], color='navy', label=r'Histogram entries')
plt.plot(xspace, fit_function(xspace, *popt), color='darkorange', linewidth=2.5, label=r'Fitted function')

# Make the plot nicer.
plt.xlim(0, 6)
plt.xlabel(r'x axis')
plt.ylabel(r'Number of entries')
plt.title(r'Exponential decay with gaussian peak')
plt.legend(loc='best')
plt.show()
plt.clf()
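Since pcov is returned alongside popt, the one-sigma uncertainties of the fitted parameters can be estimated from its diagonal. A small follow-up snippet (reusing popt and pcov from the fit above) could look like this:

# Approximate one-sigma uncertainties: square roots of the diagonal
# elements of the covariance matrix returned by curve_fit.
perr = np.sqrt(np.diag(pcov))
for name, value, error in zip(['A', 'beta', 'B', 'mu', 'sigma'], popt, perr):
    print(name, '=', value, '+-', error)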
Scipy has a powerful module for curve fitting which uses a non-linear least-squares method to fit a function to our data. The documentation is available at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html.

The easiest way to fit a function to data would be to import that data into Excel and use its predefined Trendline function. The Trendline option is quite robust for a common set of functions (linear, power, exponential, etc.), but it lacks the complexity and rigour often required in engineering applications. This is where our best friend Python comes into the picture. Power-law fitting, for example, is probably one of the most common types of fitting in engineering: most engineering parameters saturate after some time, which is captured by a decreasing power function (a power law of the basic form $y = ax^b$).

2. Using Scipy's curve_fit module

We start with a simple straight line, $y = mx + c$, where m is the slope of the line and c is the intercept (the value of y when x = 0), and x (Time) and y (Stress) are our data. The y-axis data is usually the value measured during the experiment or simulation, and we are trying to find how the y-axis quantity depends on the x-axis quantity. Later we will also ask: what if we want the line to pass through the origin (x = 0, y = 0), i.e. the intercept to be zero, and what if we want some statistics parameters such as $R^2$? That is where lmfit comes in.
# Importing numpy for creating data and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt

# Creating x axis data
x = np.linspace(0, 10, 100)

# Creating random data for y axis
# Here the slope (m) is predefined to be 2.39645
# The intercept is 0
# random.normal just adds some noise to the data
y = 2.39645 * x + np.random.normal(0, 2, 100)

# Plotting the data
plt.scatter(x, y, c='black')
plt.xlabel('Time (sec)')
plt.ylabel('Stress (kPa)')
plt.show()
To use curve_fit for the above data, we need to define a linear function which will be used for the fitting. The output will be the slope (m) and intercept (c) for our data, along with the variance in these values and some fitting statistics ($R^2$ or $\chi^2$).
# Calling scipy's curve_fit function from the optimize module
from scipy.optimize import curve_fit

# Defining a fitting function
def linear_fit(x, m, c):
    return m * x + c

'''
1. Using the curve_fit function to fit the random linear data.
2. params returns an array with the best-fit values of the fitting parameters.
   In our case the first entry in params will be the slope m and the second
   entry will be the intercept c.
3. covariance returns a matrix of covariance for our fitted parameters.
4. The first argument f is the defined fitting function.
5. xdata and ydata are the x and y data we generated above.
'''
params, covariance = curve_fit(f=linear_fit, xdata=x, ydata=y)

print('Slope (m) is ', params[0])
print('Intercept (c) is ', params[1])
print(covariance)
Slope (m) is  2.430364409972301
Intercept (c) is  -0.11571245549643683
[[ 0.00434152 -0.02170757]
 [-0.02170757  0.14544807]]
Although the covariance matrix provides information about the variance of the fitted parameters, most people use the standard deviation and the coefficient of determination ($R^2$) to get an estimate of the 'goodness' of fit. I use it all the time for simpler fittings (linear, exponential, power, etc.).
# Getting the standard deviation of the fitted parameters
standarddevparams = np.sqrt(np.diag(covariance))

# Getting the R^2 value for the fitting
# Read more at https://en.wikipedia.org/wiki/Coefficient_of_determination
# Step 1: Get the residuals for the fitting
residuals = y - linear_fit(x, params[0], params[1])
# Step 2: Get the sum of squares of the residuals
squaresumofresiduals = np.sum(residuals ** 2)
# Step 3: Get the total sum of squares using the mean of the y values
squaresum = np.sum((y - np.mean(y)) ** 2)
# Step 4: Get the R^2
R2 = 1 - (squaresumofresiduals / squaresum)

print('The slope (m) is ', params[0], '+-', standarddevparams[0])
print('The intercept (c) is ', params[1], '+-', standarddevparams[1])
print('The R^2 value is ', R2)
Let’s try our linear fitting using lmfit!
'''
1. Importing the module. If you don't have lmfit, you can install it with pip
   or conda. Instructions at https://lmfit.github.io/lmfit-py/installation.html
2. Parameters is the main parameters class.
3. minimize is the main fitting function. It takes our parameters and spits out
   the best-fit parameters.
4. fit_report provides the parameter fit values and different statistical values.
'''
from lmfit import Parameters, minimize, fit_report

# Define the fitting function
def linear_fitting_lmfit(params, x, y):
    m = params['m']
    c = params['c']
    y_fit = m * x + c
    return y_fit - y

# Defining the various parameters
params = Parameters()
# Slope is bounded between a min value of 1.0 and a max value of 3.0
params.add('m', min=1.0, max=3.0)
# Intercept is fixed at a value of 0.0
params.add('c', value=0.0, vary=False)

# Calling the minimize function. args contains the x and y data.
fitted_params = minimize(linear_fitting_lmfit, params, args=(x, y,), method='least_squares')

# Getting the fitted values
m = fitted_params.params['m'].value
c = fitted_params.params['c'].value

# Printing the fitted values
print('The slope (m) is ', m)
print('The intercept (c) is ', c)

# Pretty printing all the statistical data
print(fit_report(fitted_params))
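If only the parameter uncertainty is of interest, it can also be read directly from the result object. A small sketch reusing fitted_params and m from above (note that stderr can be None when lmfit cannot estimate it, e.g. for fixed parameters):

# Standard error of the fitted slope as estimated by lmfit.
m_stderr = fitted_params.params['m'].stderr
if m_stderr is not None:
    print('The slope (m) is ', m, '+-', m_stderr)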
Least-squares problems occur often when fitting a non-linear function to data. While it is possible to construct the optimization problem ourselves, scipy provides a helper function for this purpose, scipy.optimize.curve_fit() (a small sketch of it is given at the end of this section).

More generally, mathematical optimization deals with the problem of numerically finding minimums (or maximums or zeros) of a function. In this context, the function is called the cost function, objective function, or energy. The scale of an optimization problem is pretty much set by the dimensionality of the problem, i.e. the number of scalar variables on which the search is performed. scipy provides scipy.optimize.minimize() to find the minimum of scalar functions of one or more variables; the simple conjugate gradient method can be used by setting the parameter method to "CG":
>>> import numpy as np
>>> from scipy import optimize
>>> def f(x):
...     return -np.exp(-(x - 0.7) ** 2)
>>> result = optimize.minimize_scalar(f)
>>> result.success  # check if solver was successful
True
>>> x_min = result.x
>>> x_min
0.699999999...
>>> x_min - 0.7
-2.16...e-10
>>> def f(x):   # The rosenbrock function
...     return .5 * (1 - x[0]) ** 2 + (x[1] - x[0] ** 2) ** 2
>>> optimize.minimize(f, [2, -1], method="CG")
     fun: 1.6...e-11
     jac: array([-6.15...e-06,  2.53...e-07])
 message: ...'Optimization terminated successfully.'
    nfev: 108
     nit: 13
    njev: 27
  status: 0
 success: True
       x: array([0.99999..., 0.99998...])
>>> def jacobian(x):
...     return np.array((-2 * .5 * (1 - x[0]) - 4 * x[0] * (x[1] - x[0] ** 2), 2 * (x[1] - x[0] ** 2)))
>>> optimize.minimize(f, [2, -1], method="CG", jac=jacobian)
     fun: 2.957...e-14
     jac: array([ 7.1825...e-07, -2.9903...e-07])
 message: 'Optimization terminated successfully.'
    nfev: 16
     nit: 8
    njev: 16
  status: 0
 success: True
       x: array([1.0000..., 1.0000...])
>>> def f(x):   # The rosenbrock function
...     return .5 * (1 - x[0]) ** 2 + (x[1] - x[0] ** 2) ** 2
>>> def jacobian(x):
...     return np.array((-2 * .5 * (1 - x[0]) - 4 * x[0] * (x[1] - x[0] ** 2), 2 * (x[1] - x[0] ** 2)))
>>> optimize.minimize(f, [2, -1], method="Newton-CG", jac=jacobian)
     fun: 1.5...e-15
     jac: array([ 1.0575...e-07, -7.4832...e-08])
 message: ...'Optimization terminated successfully.'
    nfev: 11
    nhev: 0
     nit: 10
    njev: 52
  status: 0
 success: True
       x: array([0.99999..., 0.99999...])
>>> def hessian(x):  # Computed with sympy
...     return np.array(((1 - 4 * x[1] + 12 * x[0] ** 2, -4 * x[0]), (-4 * x[0], 2)))
>>> optimize.minimize(f, [2, -1], method="Newton-CG", jac=jacobian, hess=hessian)
     fun: 1.6277...e-15
     jac: array([ 1.1104...e-07, -7.7809...e-08])
 message: ...'Optimization terminated successfully.'
    nfev: 11
    nhev: 10
     nit: 10
    njev: 20
  status: 0
 success: True
       x: array([0.99999..., 0.99999...])
>>> def f(x):   # The rosenbrock function
...     return .5 * (1 - x[0]) ** 2 + (x[1] - x[0] ** 2) ** 2
>>> def jacobian(x):
...     return np.array((-2 * .5 * (1 - x[0]) - 4 * x[0] * (x[1] - x[0] ** 2), 2 * (x[1] - x[0] ** 2)))
>>> optimize.minimize(f, [2, -1], method="BFGS", jac=jacobian)
      fun: 2.6306...e-16
 hess_inv: array([[0.99986..., 2.0000...],
                  [2.0000..., 4.498...]])
      jac: array([ 6.7089...e-08, -3.2222...e-08])
  message: ...'Optimization terminated successfully.'
     nfev: 10
      nit: 8
     njev: 10
   status: 0
  success: True
        x: array([1., 0.99999...])
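For completeness, here is a minimal sketch of the scipy.optimize.curve_fit() helper mentioned above (the linear model f, the noise level, and the random seed are arbitrary choices for illustration, not part of the examples above):

import numpy as np
from scipy import optimize

# Synthetic noisy data generated from y = 2*x + 1 (arbitrary "true" values).
rng = np.random.default_rng(0)
xdata = np.linspace(0, 10, 50)
ydata = 2.0 * xdata + 1.0 + rng.normal(0, 0.5, xdata.size)

def f(x, a, b):
    # Model whose parameters a and b are adjusted by curve_fit.
    return a * x + b

# curve_fit builds and solves the least-squares problem for us.
params, params_cov = optimize.curve_fit(f, xdata, ydata)
print(params)   # expected to be close to [2., 1.]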