I'm not sure why it's implemented this way, but the SO answer above and some Wikipedia reading let me calculate R^2 by hand easily:
import numpy as np

# y: observed response; m1: the fitted model results
sst_val = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
sse_val = np.sum(m1.resid_response ** 2)   # sum of squared response residuals
r2 = 1.0 - sse_val / sst_val
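The same calculation can be sketched without NumPy or a fitted model. This is a minimal illustration, not the statsmodels implementation: `r_squared` is a hypothetical helper name, and the observed and fitted values are made up for the example.

```python
def r_squared(y, y_hat):
    """Return 1 - SSE/SST for observed values y and fitted values y_hat."""
    y_mean = sum(y) / len(y)
    # Total sum of squares: spread of the observations around their mean.
    sst = sum((yi - y_mean) ** 2 for yi in y)
    # Sum of squared errors: spread of the observations around the fit.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - sse / sst


# Made-up data, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
print(round(r2, 4))
```

With these numbers SST is 5.0 and SSE is 0.1, so the result is about 0.98. A perfect fit gives exactly 1.0.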
The formula for Adjusted-R² is:
Using our formula for Deviance:
We will now state the formula for R² in terms of RSS and TSS as follows:
Adjusted-R² has some problems, notably:
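For reference, the two R² formulas mentioned here are usually written as follows. This is the common textbook form, given as an assumption about what was intended: $n$ is the number of observations, $k$ the number of regressors excluding the intercept, RSS the residual sum of squares (`sse_val` in the snippet above), and TSS the total sum of squares (`sst_val`).

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```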
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm

df = pd.read_csv('taiwan_real_estate_valuation_curated.csv', header=0)
y = df['HOUSE_PRICE_PER_UNIT_AREA']
X = df['HOUSE_AGE_YEARS']
X = sm.add_constant(X)
olsr_model = OLS(endog=y, exog=X)
olsr_results = olsr_model.fit()
print(olsr_results.summary())
# Refit with a second regressor added
y = df['HOUSE_PRICE_PER_UNIT_AREA']
X = df[['HOUSE_AGE_YEARS', 'NUM_CONVENIENCE_STORES_IN_AREA']]
X = sm.add_constant(X)
olsr_model = OLS(endog=y, exog=X)
olsr_results = olsr_model.fit()
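Because the second fit adds a regressor, plain R² cannot decrease, which is why adjusted R² is the usual basis for comparing the two models. The sketch below applies the common textbook adjustment 1 - (1 - R²)(n - 1)/(n - k - 1). `adjusted_r_squared` is a hypothetical helper, and the R², n, and k values are invented, not results from the Taiwan data set. On a fitted statsmodels OLS result, the `rsquared` and `rsquared_adj` attributes report these quantities directly.

```python
def adjusted_r_squared(r2, n_obs, n_regressors):
    """Adjusted R-squared; n_regressors excludes the intercept."""
    if n_obs - n_regressors - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Invented numbers: the same R-squared with one versus two regressors.
adj_one = adjusted_r_squared(0.50, n_obs=100, n_regressors=1)
adj_two = adjusted_r_squared(0.50, n_obs=100, n_regressors=2)
print(adj_one, adj_two)
```

At a fixed R², the model with more regressors gets the lower adjusted value, so an extra variable has to raise R² enough to offset the penalty.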
Note: If your response column is binomial, then you must convert that column to a categorical (.asfactor() in Python and as.factor() in R) and set family = binomial. The following configurations can lead to unexpected results.
If the response is Enum with cardinality = 2, then the family is automatically determined as binomial.
If the response is Enum with cardinality > 2, then the family is automatically determined as multinomial.
The following table describes the allowed Family/Link combinations.
library(h2o)
h2o.init()

df <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
df$CAPSULE <- as.factor(df$CAPSULE)
df$RACE <- as.factor(df$RACE)
df$DCAPS <- as.factor(df$DCAPS)
df$DPROS <- as.factor(df$DPROS)

predictors <- c("AGE", "RACE", "VOL", "GLEASON")
response <- "CAPSULE"

prostate_glm <- h2o.glm(family = "binomial",
                        x = predictors,
                        y = response,
                        training_frame = df,
                        lambda = 0,
                        compute_p_values = TRUE)

# Coefficients that can be applied to the non-standardized data
h2o.coef(prostate_glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-6.67515539 -0.44278752 -0.58992326 -0.01788870 -0.01278335  1.25035939

# Coefficients fitted on the standardized data (requires standardize = TRUE, which is on by default)
h2o.coef_norm(prostate_glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-0.07610006 -0.44278752 -0.58992326 -0.11676080 -0.23454402  1.36533415

# Print the coefficients table
prostate_glm@model$coefficients_table
Coefficients: glm coefficients
      names coefficients std_error   z_value  p_value standardized_coefficients
1 Intercept    -6.675155  1.931760 -3.455478 0.000549                 -0.076100
2    RACE.1    -0.442788  1.324231 -0.334373 0.738098                 -0.442788
3    RACE.2    -0.589923  1.373466 -0.429514 0.667549                 -0.589923
4       AGE    -0.017889  0.018702 -0.956516 0.338812                 -0.116761
5       VOL    -0.012783  0.007514 -1.701191 0.088907                 -0.234544
6   GLEASON     1.250359  0.156156  8.007103 0.000000                  1.365334

# Print the standard error
prostate_glm@model$coefficients_table$std_error
[1] 1.931760363 1.324230832 1.373465793 0.018701933 0.007514354 0.156156271

# Print the p values
prostate_glm@model$coefficients_table$p_value
[1] 5.493181e-04 7.380978e-01 6.675490e-01 3.388116e-01 8.890718e-02
[6] 1.221245e-15

# Print the z values
prostate_glm@model$coefficients_table$z_value
[1] -3.4554780 -0.3343734 -0.4295143 -0.9565159 -1.7011907  8.0071033

# Retrieve a graphical plot of the standardized coefficient magnitudes
h2o.std_coef_plot(prostate_glm)
# Retrieve all model attributes:
prostate_glm@model$model_summary
GLM Model: summary
    family  link regularization number_of_predictors_total
1 binomial logit           None                          5
  number_of_active_predictors number_of_iterations  training_frame
1                           5                    4 RTMP_sid_8b2d_6

# Retrieve a specific model attribute (for example, the number of active predictors):
prostate_glm@model$model_summary['number_of_active_predictors']
  number_of_active_predictors
1                           5
# Retrieve all model attributes:
prostate_glm._model_json["output"]['model_summary']
GLM Model: summary
family    link   regularization  number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
--------  -----  --------------  --------------------------  ---------------------------  --------------------  --------------
binomial  logit  None            5                           5                            4                     py_4_sid_9981

# Retrieve a specific model attribute (for example, the number of active predictors):
prostate_glm._model_json["output"]['model_summary']['number_of_active_predictors']
['5']
solver = "IRLSM"
lambda = 0
remove_collinear_columns = TRUE
compute_p_values = TRUE
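When p-values are computed, the z_value and p_value columns of the coefficients table shown earlier can be cross-checked by hand. The sketch below assumes (the text does not state this) that each z-value is the coefficient divided by its standard error and that the p-value is two-sided under a standard normal distribution. The helper names are hypothetical, and the inputs are the rounded VOL row from that table.

```python
import math


def z_value(coef, std_err):
    """Wald z statistic: coefficient divided by its standard error."""
    return coef / std_err


def two_sided_p_value(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# VOL row of the coefficients table (values as printed, so already rounded).
z_vol = z_value(-0.012783, 0.007514)
p_vol = two_sided_p_value(-1.701191)
print(z_vol, p_vol)
```

The results agree with the VOL row (z about -1.70, p about 0.089) up to the rounding of the printed inputs, which is consistent with the assumption but does not confirm how H2O computes the values internally.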