Logistic Regression Example in Python: Step-by-Step Guide
In this guide, we’ll show a logistic regression example in Python, step-by-step.
At this point, we have the logistic regression model for our example in Python!
Finally, we can fit the logistic regression in Python on our example dataset.
Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num '],
      dtype='object')
0    188
1    106
Name: target, dtype: int64
['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
['cp_2', 'cp_3', 'cp_4', 'exang', 'fbs', 'restecg_1.0', 'restecg_2.0', 'sex']
LogisticRegression(C = 1.0, class_weight = None, dual = False, fit_intercept = True, intercept_scaling = 1, l1_ratio = None, max_iter = 100, multi_class = 'auto', n_jobs = None, penalty = 'none', random_state = None, solver = 'lbfgs', tol = 0.0001, verbose = 0, warm_start = False)
Log loss = 0.35613
AUC = 0.92424
Average Precision = 0.89045
Using 0.5 as threshold:
Accuracy = 0.83019
Precision = 0.76190
Recall = 0.80000
F1 score = 0.78049
Classification Report
              precision    recall  f1-score   support
           0       0.88      0.85      0.86        33
           1       0.76      0.80      0.78        20
    accuracy                           0.83        53
   macro avg       0.82      0.82      0.82        53
weighted avg       0.83      0.83      0.83        53
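The metrics listed above can all be computed with functions from sklearn.metrics. The following is a minimal sketch on synthetic data generated with make_classification, which is an assumption standing in for the dataset used in this guide, so its numbers will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             classification_report, f1_score, log_loss,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): not the dataset used in the guide.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probability of the positive class; log loss, AUC and
# average precision are computed from probabilities, not hard labels.
proba = clf.predict_proba(X_test)[:, 1]
# Hard labels obtained by applying a 0.5 threshold to the probabilities.
pred = (proba >= 0.5).astype(int)

metrics = {
    "Log loss": log_loss(y_test, proba),
    "AUC": roc_auc_score(y_test, proba),
    "Average Precision": average_precision_score(y_test, proba),
    "Accuracy": accuracy_score(y_test, pred),
    "Precision": precision_score(y_test, pred),
    "Recall": recall_score(y_test, pred),
    "F1 score": f1_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name} = {value:.5f}")

print("Classification Report")
print(classification_report(y_test, pred))
```

Only the last four metrics depend on the threshold; changing 0.5 to another value trades precision against recall while leaving log loss, AUC and average precision unchanged.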
In this code, we load the digits dataset with the load_digits function from the sklearn library. The dataset ships with sklearn, so there is no need to download or upload it separately.
from sklearn.datasets import load_digits
digits = load_digits()
With the data loaded, the commands below show that the dataset contains 1797 images and 1797 labels. Each image is stored as a flat vector of 64 pixel values (8 x 8).
print('Image Data Shape', digits.data.shape)
print('Label Data Shape', digits.target.shape)
- plot.figure(figsize=(30, 4)) creates the figure and sets its width and height in inches.
- for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])): loops over images 5 to 9 together with their labels, keeping a running index.
- plot.subplot(1, 5, index + 1) selects position index + 1 in a grid of 1 row and 5 columns.
- plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray) reshapes the flat 64-value vector into an 8 x 8 image and displays it in grayscale.
- plot.title('Set: %i\n' % label, fontsize=30) sets the subplot title to the label of the image.
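import numpy as np
import matplotlib.pyplot as plot

plot.figure(figsize=(30, 4))
# Show images 5 to 9 side by side, each titled with its label.
for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])):
    plot.subplot(1, 5, index + 1)
    # The module is imported as `plot`, so the colormap is plot.cm.gray (not plt.cm.gray).
    plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray)
    plot.title('Set: %i\n' % label, fontsize=30)
plot.show()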
Here we import LogisticRegression from sklearn.linear_model. sklearn provides the implementation, so we can focus on modeling the dataset.
from sklearn.linear_model import LogisticRegression
In the code below we create an instance of the model. All parameters that are not specified are set to their defaults.
logisticRegression = LogisticRegression()
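Once the model instance exists, the usual next steps are to split the data, fit the model, and score it on the held-out part. The sketch below is one way this could look; the 75/25 split, the random_state value, and the raised max_iter (chosen so the lbfgs solver converges on unscaled pixel values) are assumptions and are not taken from the text above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()

# Hold out 25% of the images for testing (split size is an assumption).
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# max_iter is raised above the default of 100 (assumption) to help
# the solver converge on the raw, unscaled pixel values.
logisticRegression = LogisticRegression(max_iter=5000)
logisticRegression.fit(x_train, y_train)

# Predict the digit for every held-out image and measure mean accuracy.
predictions = logisticRegression.predict(x_test)
score = logisticRegression.score(x_test, y_test)
print("Test accuracy:", score)
```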
Scikit-learn provides a ColumnTransformer class which sends specific columns to a specific transformer, making it easy to fit a single predictive model on a dataset that combines both kinds of variables (heterogeneously typed tabular data).
It splits the columns of the original dataset based on the column names or indices provided. We obtain as many subsets as the number of transformers passed into the ColumnTransformer.
The important thing is that ColumnTransformer is like any other scikit-learn transformer. In particular, it can be combined with a classifier in a Pipeline.
In this example, we use a ColumnTransformer to apply different preprocessing to categorical and numerical variables.
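To make the column splitting concrete, here is a minimal sketch on a small hand-made DataFrame. The DataFrame, its column names, and the toy_preprocessor name are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numerical columns and one categorical column.
toy = pd.DataFrame({
    "age": [25, 38, 47, 52],
    "hours-per-week": [40, 50, 40, 35],
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
})

# Each (name, transformer, columns) triple receives only its own columns.
toy_preprocessor = ColumnTransformer([
    ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
    ("standard_scaler", StandardScaler(), ["age", "hours-per-week"]),
])

transformed = toy_preprocessor.fit_transform(toy)
# 3 one-hot columns (one per workclass category) + 2 scaled numerical columns.
print(transformed.shape)  # (4, 5)
```

The outputs of the two transformers are concatenated side by side, in the order the transformers are listed.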
import pandas as pd

adult_census = pd.read_csv("../datasets/adult-census.csv")
# drop the duplicated column `"education-num"` as stated in the first notebook
adult_census = adult_census.drop(columns="education-num")

target_name = "class"
target = adult_census[target_name]
data = adult_census.drop(columns=[target_name])
from sklearn.compose import make_column_selector as selector
numerical_columns_selector = selector(dtype_exclude = object)
categorical_columns_selector = selector(dtype_include = object)
numerical_columns = numerical_columns_selector(data)
categorical_columns = categorical_columns_selector(data)
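As a quick illustration of what the selectors return, the sketch below applies them to a small hypothetical DataFrame. The categorical column is created with an explicit object dtype, which is an assumption made so the example behaves the same regardless of how the installed pandas version stores strings by default.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy data; "workclass" is forced to object dtype (assumption).
toy = pd.DataFrame({
    "age": [25, 38, 47],
    "workclass": pd.Series(["Private", "State-gov", "Private"], dtype=object),
})

# Calling a selector on a DataFrame returns a list of matching column names.
toy_numerical = selector(dtype_exclude=object)(toy)
toy_categorical = selector(dtype_include=object)(toy)

print(toy_numerical)    # ['age']
print(toy_categorical)  # ['workclass']
```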
from sklearn.preprocessing import OneHotEncoder, StandardScaler
categorical_preprocessor = OneHotEncoder(handle_unknown = "ignore")
numerical_preprocessor = StandardScaler()
from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer([
    ('one-hot-encoder', categorical_preprocessor, categorical_columns),
    ('standard_scaler', numerical_preprocessor, numerical_columns)
])
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
model = make_pipeline(preprocessor, LogisticRegression(max_iter = 500))
from sklearn import set_config
set_config(display = 'diagram')
model
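A pipeline built this way is fitted and scored like any other estimator. The sketch below shows the idea end to end on synthetic data, since the adult-census.csv file is not available here; the generated columns, the labeling rule, and all names prefixed with toy_ are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data (assumption), standing in for adult-census.csv.
rng = np.random.default_rng(0)
n_samples = 200
toy_data = pd.DataFrame({
    "age": rng.integers(18, 70, size=n_samples),
    "hours-per-week": rng.integers(20, 60, size=n_samples),
    "workclass": rng.choice(["Private", "State-gov", "Self-emp"], size=n_samples),
})
# Made-up labeling rule: a larger age + hours total tends to give ">50K".
signal = toy_data["age"] + toy_data["hours-per-week"] + rng.normal(0, 5, n_samples)
toy_target = np.where(signal > signal.median(), ">50K", "<=50K")

# Same structure as the pipeline above: preprocessing followed by a classifier.
toy_model = make_pipeline(
    ColumnTransformer([
        ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
        ("standard_scaler", StandardScaler(), ["age", "hours-per-week"]),
    ]),
    LogisticRegression(max_iter=500),
)

data_train, data_test, target_train, target_test = train_test_split(
    toy_data, toy_target, random_state=42)

# fit() learns the encoder, the scaler, and the classifier on the training set;
# score() applies the same preprocessing to the test set before predicting.
toy_model.fit(data_train, target_train)
test_accuracy = toy_model.score(data_test, target_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the preprocessing lives inside the pipeline, the scaler statistics and the one-hot categories are learned from the training split only, which avoids leaking information from the test split.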