What is the difference between .score() and .predict() in the sklearn library?

Suggestion : 1

Because your y_train has shape (301, 1) rather than (301,), NumPy broadcasts the comparison, so

(y_train == model.predict(X_train)).shape == (301, 301)

which is not what you intended. The correct version of your code would be

np.mean(y_train.ravel() == model.predict(X_train))

which will give the same result as

model.score(X_train, y_train)
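
The pitfall is easy to reproduce. Below is a minimal sketch using synthetic data and a LogisticRegression standing in for the question's SVC (the shapes, not the estimator, are what matter here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's 301-sample training set
rng = np.random.RandomState(0)
X = rng.rand(301, 2)
y = (X[:, 0] > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Column vector vs. flat vector: comparing them broadcasts to a matrix.
y_col = y.reshape(-1, 1)              # shape (301, 1), like the question's y_train
bad = (y_col == model.predict(X))     # broadcasts to shape (301, 301)
print(bad.shape)                      # (301, 301)

# Flattening with ravel() restores the element-wise comparison...
acc = np.mean(y_col.ravel() == model.predict(X))

# ...which matches the accuracy returned by .score()
print(np.isclose(acc, model.score(X, y)))  # True
```

Note that np.mean over the (301, 301) boolean matrix silently averages 90,601 pairwise comparisons, which is why the broken version runs without any error yet returns a misleading number.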

Suggestion : 2

For context, the original question had instantiated an SVC object using the sklearn library; the same broadcasting fix applies.

Suggestion : 3

• model.fit() : fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).

• model.predict() : given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.

• model.transform() : given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.

• model.score() : given data X and true labels y, return a single performance number for the fitted model — the mean accuracy of model.predict(X) for classifiers, or the coefficient of determination R² for regressors.

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data.
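The fit/predict/score cycle above can be sketched end to end on the iris data; a KNeighborsClassifier is used here purely as an example estimator, and a train/test split avoids the scoring-on-training-data mistake just described:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier()
model.fit(X_train, y_train)        # fit: learn parameters from training data

labels = model.predict(X_test)     # predict: one label per held-out sample
print(labels[:5])

# score: mean accuracy of model.predict(X_test) against y_test,
# evaluated on data the model has never seen
print(model.score(X_test, y_test))
```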

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> print(iris.data.shape)
(150, 4)
>>> n_samples, n_features = iris.data.shape
>>> print(n_samples)
150
>>> print(n_features)
4
>>> print(iris.data[0])
[5.1 3.5 1.4 0.2]
>>> print(iris.target.shape)
(150,)
>>> print(iris.target)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
>>> print(iris.target_names)
['setosa' 'versicolor' 'virginica']
>>> from sklearn.linear_model import LinearRegression
>>> model = LinearRegression(normalize=True)
>>> print(model.normalize)
True
>>> print(model)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=True)
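
For a regressor like the LinearRegression above, .score() returns the coefficient of determination R² rather than accuracy. A small sketch illustrating this (the `normalize` argument shown in the transcript was removed in recent scikit-learn versions, so it is omitted here; fitting iris.target as a numeric value is only for demonstration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

iris = load_iris()
X, y = iris.data, iris.target

model = LinearRegression().fit(X, y)

# For regressors, .score() is R^2 — the same value r2_score
# computes from the model's predictions.
print(np.isclose(model.score(X, y), r2_score(y, model.predict(X))))  # True
```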