Why doesn't statsmodels GLM have R^2 in results?

I'm not sure why it's implemented this way, but the SO answer above and some Wikipedia reading let me calculate R^2 by hand easily:

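import numpy as np

# y is the observed response; m1 is the fitted statsmodels GLM results object
sst_val = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
sse_val = np.sum(m1.resid_response ** 2)  # residual sum of squares
r2 = 1.0 - sse_val / sst_val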
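
The calculation above is the ordinary least-squares R^2 applied to a GLM's response residuals. For non-Gaussian families, one commonly used alternative is a deviance-based pseudo R^2, 1 - D/D_null, where D_null is the deviance of an intercept-only model. The sketch below works this out by hand for the Poisson case only; the function names and the sample numbers are made up for illustration. With statsmodels, the same ratio can be formed from the `deviance` and `null_deviance` attributes of a fitted GLM results object.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    The y * log(y / mu) term is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def deviance_r2(y, mu):
    """Deviance-based pseudo R^2: 1 - D(model) / D(intercept-only model)."""
    null_mu = [sum(y) / len(y)] * len(y)  # intercept-only fit predicts the mean
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)

# Hypothetical counts and fitted means, for illustration only
y_obs = [0, 1, 3, 6]
mu_hat = [0.5, 1.2, 2.8, 5.5]
print(deviance_r2(y_obs, mu_hat))
```

A perfect fit gives 1 and the intercept-only fit gives 0, which mirrors how the sum-of-squares R^2 behaves for a linear model.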

Suggestion : 2

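We will now state the formula for R² in terms of RSS and TSS as follows: R² = 1 - RSS/TSS.

The formula for Adjusted-R² is: Adjusted-R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of regression variables, excluding the intercept.

Using our formula for Deviance:

Adjusted-R² has some problems, notably: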
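
As a rough sketch, R² in terms of RSS and TSS and its adjusted version can be written as two small functions. The adjustment used here is the standard textbook one for n observations and k regression variables (excluding the intercept); it is assumed rather than quoted from the text, and the function names and numbers are illustrative only.

```python
def r_squared(rss, tss):
    """R^2 = 1 - RSS / TSS."""
    return 1.0 - rss / tss

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n_predictors (k) excludes the intercept.
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical sums of squares, for illustration only
r2 = r_squared(rss=10.0, tss=50.0)
adj_r2 = adjusted_r_squared(r2, n_obs=20, n_predictors=2)
print(r2, adj_r2)
```

With statsmodels OLS results, the corresponding values are available as the `rsquared` and `rsquared_adj` attributes.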

import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm

# Load the data set
df = pd.read_csv('taiwan_real_estate_valuation_curated.csv', header=0)

# Model 1: regress house price on house age, with an intercept
y = df['HOUSE_PRICE_PER_UNIT_AREA']
X = df['HOUSE_AGE_YEARS']
X = sm.add_constant(X)
olsr_model = OLS(endog=y, exog=X)
olsr_results = olsr_model.fit()

print(olsr_results.summary())

# Model 2: add the number of convenience stores as a second regressor
y = df['HOUSE_PRICE_PER_UNIT_AREA']
X = df[['HOUSE_AGE_YEARS', 'NUM_CONVENIENCE_STORES_IN_AREA']]
X = sm.add_constant(X)

olsr_model = OLS(endog=y, exog=X)
olsr_results = olsr_model.fit()

print(olsr_results.summary())

Suggestion : 3

Note: If your response column is binomial, then you must convert that column to a categorical (.asfactor() in Python and as.factor() in R) and set family = binomial. The following configurations can lead to unexpected results.

When the family is left to be determined automatically:
  • and the response is Enum with cardinality = 2, then the family is automatically determined as binomial.
  • and the response is Enum with cardinality > 2, then the family is automatically determined as multinomial.

The following table describes the allowed Family/Link combinations.
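
The two automatic-family rules quoted above amount to a small decision on the response type and its cardinality. The helper below is a hypothetical illustration of that logic only; `infer_family` is not part of the H2O API, and response types not covered by the quoted rules are deliberately left undecided.

```python
def infer_family(response_type, cardinality):
    """Hypothetical helper mirroring the two quoted rules (not an H2O function)."""
    if response_type == "Enum" and cardinality == 2:
        return "binomial"
    if response_type == "Enum" and cardinality > 2:
        return "multinomial"
    # Other response types are not covered by the rules quoted above
    return None

print(infer_family("Enum", 2))
print(infer_family("Enum", 5))
```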

library(h2o)
h2o.init()

# Import the prostate data set and convert the categorical columns to factors
df <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
df$CAPSULE <- as.factor(df$CAPSULE)
df$RACE <- as.factor(df$RACE)
df$DCAPS <- as.factor(df$DCAPS)
df$DPROS <- as.factor(df$DPROS)

predictors <- c("AGE", "RACE", "VOL", "GLEASON")
response <- "CAPSULE"

# Fit an unregularized binomial GLM and request p-values
prostate_glm <- h2o.glm(family = "binomial",
                        x = predictors,
                        y = response,
                        training_frame = df,
                        lambda = 0,
                        compute_p_values = TRUE)

# Coefficients that can be applied to the non-standardized data
h2o.coef(prostate_glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-6.67515539 -0.44278752 -0.58992326 -0.01788870 -0.01278335  1.25035939

# Coefficients fitted on the standardized data (requires standardize = TRUE, which is on by default)
h2o.coef_norm(prostate_glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-0.07610006 -0.44278752 -0.58992326 -0.11676080 -0.23454402  1.36533415

# Print the coefficients table
prostate_glm@model$coefficients_table
Coefficients: glm coefficients
      names coefficients std_error   z_value  p_value standardized_coefficients
1 Intercept    -6.675155  1.931760 -3.455478 0.000549                 -0.076100
2    RACE.1    -0.442788  1.324231 -0.334373 0.738098                 -0.442788
3    RACE.2    -0.589923  1.373466 -0.429514 0.667549                 -0.589923
4       AGE    -0.017889  0.018702 -0.956516 0.338812                 -0.116761
5       VOL    -0.012783  0.007514 -1.701191 0.088907                 -0.234544
6   GLEASON     1.250359  0.156156  8.007103 0.000000                  1.365334

# Print the standard error
prostate_glm@model$coefficients_table$std_error
[1] 1.931760363 1.324230832 1.373465793 0.018701933 0.007514354 0.156156271

# Print the p values
prostate_glm@model$coefficients_table$p_value
[1] 5.493181e-04 7.380978e-01 6.675490e-01 3.388116e-01 8.890718e-02
[6] 1.221245e-15

# Print the z values
prostate_glm@model$coefficients_table$z_value
[1] -3.4554780 -0.3343734 -0.4295143 -0.9565159 -1.7011907  8.0071033
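
The z and p columns in the table above are consistent with the usual Wald construction: z is the coefficient divided by its standard error, and the p-value is the two-sided tail probability under a standard normal. The sketch below assumes that construction (the text does not confirm it is exactly what H2O does internally) and checks it against the Intercept row; `wald_z_and_p` is a name introduced here for illustration.

```python
import math

def wald_z_and_p(coefficient, std_error):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = coefficient / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Intercept row of the coefficients table: coefficient and std_error
z, p = wald_z_and_p(-6.675155, 1.931760)
print(z, p)  # compare with the z_value and p_value columns of that row
```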

# Retrieve a graphical plot of the standardized coefficient magnitudes
h2o.std_coef_plot(prostate_glm)

# Retrieve all model attributes:
prostate_glm@model$model_summary
GLM Model: summary
    family  link regularization number_of_predictors_total
1 binomial logit           None                          5
  number_of_active_predictors number_of_iterations  training_frame
1                           5                    4 RTMP_sid_8b2d_6

# Retrieve a specific model attribute (for example, the number of active predictors):
prostate_glm@model$model_summary['number_of_active_predictors']
  number_of_active_predictors
1                           5

# Retrieve all model attributes (Python):
prostate_glm._model_json["output"]['model_summary']
GLM Model: summary
family    link   regularization  number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
--------  -----  --------------  --------------------------  ---------------------------  --------------------  --------------
binomial  logit  None            5                           5                            4                     py_4_sid_9981

# Retrieve a specific model attribute (for example, the number of active predictors):
prostate_glm._model_json["output"]['model_summary']['number_of_active_predictors']
['5']

solver = "IRLSM"
lambda = 0
remove_collinear_columns = TRUE
compute_p_values = TRUE