There is a package in scipy which does what I want for unnatural splines.

```
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 5, 1.0 / 6)
xs = np.arange(0, 5, 1.0 / 500)
y = np.sin(x + 1) + .2 * np.random.rand(len(x)) - .1
knots = np.array([1, 2, 3, 4])
tck = interpolate.splrep(x, y, s=0, k=3, t=knots, task=-1)
ys = interpolate.splev(xs, tck, der=0)
plt.figure()
plt.plot(xs, ys, x, y, 'x')
```

From the `splrep` documentation:

- `w`: Strictly positive rank-1 array of weights, the same length as `x` and `y`. The weights are used in computing the weighted least-squares spline fit. If the errors in the `y` values have standard deviation given by the vector `d`, then `w` should be `1/d`. Default is `ones(len(x))`.
- `t`: If `task=-1`, find the weighted least-squares spline for a given set of knots, `t`. These should be interior knots, as knots on the ends will be added automatically.
- This routine zero-pads the coefficients array `c` to have the same length as the array of knots `t` (the trailing `k + 1` coefficients are ignored by the evaluation routines `splev` and `BSpline`). This is in contrast with `splprep`, which does not zero-pad the coefficients.
- Returns a tuple `(t, c, k)` containing the vector of knots, the B-spline coefficients, and the degree of the spline.

```
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.interpolate import splev, splrep
>>> x = np.linspace(0, 10, 10)
>>> y = np.sin(x)
>>> spl = splrep(x, y)
>>> x2 = np.linspace(0, 10, 200)
>>> y2 = splev(x2, spl)
>>> plt.plot(x, y, 'o', x2, y2)
>>> plt.show()
```

First, we know that each cubic function must intersect the data points on its left and right, which gives us \(2(n-1)\) equations. Next, we want each cubic function to join as smoothly with its neighbors as possible, so we constrain the splines to have continuous first and second derivatives at the interior data points \(i = 2,\ldots,n-1\). These equations are linear in the unknown coefficients \(a_i, b_i, c_i\), and \(d_i\), so we can put them in matrix form and solve for the coefficients of each spline by left division. Remember that whenever we solve the matrix equation \(Ax = b\) for \(x\), we must be sure that \(A\) is square and invertible; in the case of finding cubic spline equations, the \(A\) matrix is always square and invertible as long as the \(x_i\) values in the data set are unique. Below we create the appropriate system of equations and find the coefficients of the cubic splines by solving the system in matrix form.
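Written out for pieces \(S_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i\) on \([x_i, x_{i+1}]\) (this particular parametrization is an assumption on my part, chosen to match the \(8 \times 8\) system used below for \(n = 3\) points), the constraints are:

\[
\begin{aligned}
S_i(x_i) &= y_i, & S_i(x_{i+1}) &= y_{i+1}, & i &= 1,\ldots,n-1,\\
S_i'(x_{i+1}) &= S_{i+1}'(x_{i+1}), & S_i''(x_{i+1}) &= S_{i+1}''(x_{i+1}), & i &= 1,\ldots,n-2,\\
S_1''(x_1) &= 0, & S_{n-1}''(x_n) &= 0. & &
\end{aligned}
\]

The last line is the "natural" boundary condition. Counting: \(2(n-1)\) interpolation equations, \(2(n-2)\) smoothness equations, and \(2\) boundary equations give \(4(n-1)\) equations for the \(4(n-1)\) unknown coefficients.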

```
from scipy.interpolate import CubicSpline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-poster')
```

```
x = [0, 1, 2]
y = [1, 3, 2]
# bc_type = 'natural' adds the constraints described above
f = CubicSpline(x, y, bc_type='natural')
x_new = np.linspace(0, 2, 100)
y_new = f(x_new)
```

```
plt.figure(figsize = (10, 8))
plt.plot(x_new, y_new, 'b')
plt.plot(x, y, 'ro')
plt.title('Cubic Spline Interpolation')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```

```
b = np.array([1, 3, 3, 2, 0, 0, 0, 0])
b = b[:, np.newaxis]
A = np.array([[0, 0, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 1],
              [1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 8, 4, 2, 1],
              [3, 2, 1, 0, -3, -2, -1, 0],
              [6, 2, 0, 0, -6, -2, 0, 0],
              [0, 2, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 12, 2, 0, 0]])
```

`np.dot(np.linalg.inv(A), b)`

```
array([[-0.75],
       [ 0.  ],
       [ 2.75],
       [ 1.  ],
       [ 0.75],
       [-4.5 ],
       [ 7.25],
       [-0.5 ]])
```
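Explicitly inverting \(A\) works, but `np.linalg.solve` solves the same system without forming the inverse and is the more numerically stable choice; a self-contained sketch reproducing the system above:

```python
import numpy as np

# Same 8x8 system as above: rows encode interpolation, continuity of the
# first and second derivatives, and the natural-boundary constraints
# for the two cubic pieces.
A = np.array([[0, 0, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 1],
              [1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 8, 4, 2, 1],
              [3, 2, 1, 0, -3, -2, -1, 0],
              [6, 2, 0, 0, -6, -2, 0, 0],
              [0, 2, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 12, 2, 0, 0]], dtype=float)
b = np.array([1, 3, 3, 2, 0, 0, 0, 0], dtype=float)

# Solve A x = b directly instead of computing inv(A) @ b.
coeffs = np.linalg.solve(A, b)
print(coeffs)  # [-0.75  0.    2.75  1.    0.75 -4.5   7.25 -0.5 ]
```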

Smoothing splines, as described in class, minimize the sum of squared errors subject to a penalty that depends on the wiggliness of the function. The resulting solution is a cubic spline with knots at every data value, regularized according to a smoothing parameter.

We use the csaps library to implement smoothing splines. The smoothing parameter takes values between 0 and 1, and is the weight attached to the error sum of squares in the weighted average between the error sum of squares and the wiggliness penalty. A smoothing parameter of 0 corresponds to a least-squares fit, and a parameter of 1 corresponds to connecting the dots.

The function below is one such better transformation of the raw data: depending on the parameters, it applies a ReLU or a truncated cubic to the input data. Let's see what that looks like. Notice that we can apply the function h(x) directly in the formula, because we defined it earlier in the notebook.
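The definition of h(x) is not included in this excerpt; a plausible sketch, assuming the usual truncated-power form (the name `h` and the `power` parameter are assumptions, not taken from the source):

```python
import numpy as np

def h(x, knot, power=1):
    # Truncated power basis about `knot`:
    # power=1 gives a ReLU-style hinge, power=3 a truncated cubic.
    return np.maximum(0.0, np.asarray(x, dtype=float) - knot) ** power

x = np.array([0.0, 1.0, 4.0])
print(h(x, 2))           # hinge at knot 2 -> [0. 0. 2.]
print(h(x, 2, power=3))  # truncated cubic -> [0. 0. 8.]
```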

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm
%matplotlib inline
```

```
diab = pd.read_csv("data/diabetes.csv")
print("""
# Variables are
# subject: subject ID number
# age: age diagnosed with diabetes
# acidity: a measure of acidity called base deficit
# y: natural log of serum C-peptide concentration
# Original source is Sockett et al. (1987)
# mentioned in Hastie and Tibshirani's book
# "Generalized Additive Models".
""")
diab.head()
```


```
diab.plot.scatter(x = 'age', y = 'y', c = 'Red', title = "Diabetes data")
plt.xlabel("Age at Diagnosis")
plt.ylabel("Log C-Peptide Concentration")
plt.show()
```

`fit1_lm = sm.ols('y ~ age', data=diab).fit()`

```
xpred = pd.DataFrame({
"age": np.arange(0, 16.1, 0.1)
})
```