python - Difference between LinearRegression() and Ridge(alpha=0)
The Tikhonov (ridge) cost becomes equivalent to the least squares cost as the alpha parameter approaches zero. The scikit-learn docs on the subject indicate the same. Therefore I expected

sklearn.linear_model.Ridge(alpha=1e-100).fit(data, target)

to be equivalent to

sklearn.linear_model.LinearRegression().fit(data, target)

But that's not the case. Why?
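For reference, the expected equivalence follows from the closed-form solutions: ridge minimizes ||y - Xw||^2 + alpha*||w||^2, giving w = (X'X + alpha*I)^-1 X'y, which reduces to the ordinary least squares normal equations as alpha goes to zero. A minimal numpy sketch on synthetic data (not the house-price data from the question) confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# OLS: solve the normal equations X'X w = X'y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge closed form: (X'X + alpha*I)^-1 X'y with a tiny alpha
alpha = 1e-10
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

print(np.allclose(w_ols, w_ridge))  # the two solutions agree for tiny alpha
```

On a small, well-conditioned problem like this the two solutions match to machine precision, so the question is why sklearn's estimators diverge.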
Updated code:
import pandas as pd
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
%matplotlib inline

dataset = pd.read_csv('house_price_data.csv')

x = dataset['sqft_living'].values.reshape(-1, 1)
y = dataset['price'].values.reshape(-1, 1)

polyX = PolynomialFeatures(degree=15).fit_transform(x)

model1 = LinearRegression().fit(polyX, y)
model2 = Ridge(alpha=1e-100).fit(polyX, y)

plt.plot(x, y, '.',
         x, model1.predict(polyX), 'g-',
         x, model2.predict(polyX), 'r-')
Note: the plot looks the same whether alpha=1e-8 or alpha=1e-100.
According to the documentation, alpha must be a positive float, but your example has alpha=0, an integer. Using a small positive alpha, the results of Ridge and LinearRegression appear to converge.
from sklearn.linear_model import Ridge, LinearRegression

data = [[0, 0], [1, 1], [2, 2]]
target = [0, 1, 2]

ridge_model = Ridge(alpha=1e-8).fit(data, target)
print("ridge coefs: " + str(ridge_model.coef_))
ols = LinearRegression().fit(data, target)
print("ols coefs: " + str(ols.coef_))

# ridge coefs: [ 0.49999999  0.50000001]
# ols coefs:   [ 0.5  0.5]
#
# vs. alpha=0:
# ridge coefs: [ 1.57009246e-16   1.00000000e+00]
# ols coefs:   [ 0.5  0.5]
Update: The issue with alpha=0 being an int, as above, only seems to matter for a few toy problems like the example above.
For the housing data, the issue is one of scaling. The 15-degree polynomial you invoke is causing numerical overflow. To produce identical results from LinearRegression and Ridge, try scaling your data first:
import pandas as pd
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.preprocessing import PolynomialFeatures, scale

dataset = pd.read_csv('house_price_data.csv')

# Scale the x data to prevent numerical errors.
x = scale(dataset['sqft_living'].values.reshape(-1, 1))
y = dataset['price'].values.reshape(-1, 1)

polyX = PolynomialFeatures(degree=15).fit_transform(x)

model1 = LinearRegression().fit(polyX, y)
model2 = Ridge(alpha=0).fit(polyX, y)

print("ols coefs: " + str(model1.coef_[0]))
print("ridge coefs: " + str(model2.coef_[0]))

#ols coefs: [  0.00000000e+00   2.69625315e+04   3.20058010e+04  -8.23455994e+04
#  -7.67529485e+04   1.27831360e+05   9.61619464e+04  -8.47728622e+04
#  -5.67810971e+04   2.94638384e+04   1.60272961e+04  -5.71555266e+03
#  -2.10880344e+03   5.92090729e+02   1.03986456e+02  -2.55313741e+01]
#ridge coefs: [  0.00000000e+00   2.69625315e+04   3.20058010e+04  -8.23455994e+04
#  -7.67529485e+04   1.27831360e+05   9.61619464e+04  -8.47728622e+04
#  -5.67810971e+04   2.94638384e+04   1.60272961e+04  -5.71555266e+03
#  -2.10880344e+03   5.92090729e+02   1.03986456e+02  -2.55313741e+01]
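The reason scaling helps is conditioning: raising raw square footage (values in the thousands) to the 15th power produces a design matrix whose columns span dozens of orders of magnitude, so the solvers behind LinearRegression and Ridge break down in different ways. A rough illustration using hypothetical square-footage values in place of the actual CSV:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, scale

# Hypothetical values standing in for dataset['sqft_living'].
x = np.linspace(500, 5000, 100).reshape(-1, 1)

poly = PolynomialFeatures(degree=15)
cond_raw = np.linalg.cond(poly.fit_transform(x))
cond_scaled = np.linalg.cond(poly.fit_transform(scale(x)))

print("raw condition number:    %.3e" % cond_raw)
print("scaled condition number: %.3e" % cond_scaled)
# The raw design matrix is vastly more ill-conditioned, which is why the
# two estimators' solvers disagree on it but agree after scaling.
```

The scaled matrix is still ill-conditioned (degree-15 polynomials always are), but enough digits survive for both solvers to land on the same coefficients.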