python - Different results when computing linear regressions with scipy.stats and statsmodels
I'm getting different values of r^2 (coefficient of determination) when I try OLS fits with these two libraries, and I can't quite figure out why. (Some spacing removed for convenience.)
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import statsmodels.api as sm
In [4]: import scipy.stats
In [5]: np.random.seed(100)
In [6]: x = np.linspace(0, 10, 100) + 5*np.random.randn(100)
In [7]: y = np.arange(100)
In [8]: slope, intercept, r, p, std_err = scipy.stats.linregress(x, y)
In [9]: r**2
Out[9]: 0.22045988449873671
In [10]: model = sm.OLS(y, x)
In [11]: est = model.fit()
In [12]: est.rsquared
Out[12]: 0.5327910685035413
What is going on here? I can't figure it out! Is there an error somewhere?
The 0.2205 is coming from a model that also has an intercept term; the 0.5328 value is the result if you remove the intercept.
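If it helps to see where the two numbers come from, here is a rough NumPy-only sketch that fits both models by hand. It assumes (as far as I know this matches what statsmodels does when it finds no constant column) that the no-intercept R^2 is measured against the uncentered total sum of squares, i.e. sum(y^2) rather than sum((y - mean(y))^2). The variable names are just mine for illustration.

```python
import numpy as np

np.random.seed(100)
x = np.linspace(0, 10, 100) + 5 * np.random.randn(100)
y = np.arange(100)

# Model 1: y = b*x (no intercept), closed-form least squares slope.
b = np.dot(x, y) / np.dot(x, x)
ssr_no_const = np.sum((y - b * x) ** 2)
# Assumption: with no constant, total variation is measured about zero.
r2_no_const = 1 - ssr_no_const / np.sum(y ** 2)

# Model 2: y = a + b*x, fitted with an explicit column of ones.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
ssr_const = np.sum((y - A @ coef) ** 2)
# With a constant, total variation is measured about the mean of y.
r2_const = 1 - ssr_const / np.sum((y - y.mean()) ** 2)

print(r2_no_const, r2_const)
```

The two R^2 values use different baselines, so they are not directly comparable even though both come from the same data.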
Basically, one package (statsmodels) is modeling y = bx, whereas the other (scipy.stats, helpfully) assumes there is also an intercept term (i.e. y = a + bx). [Note: the advantage of this assumption is that otherwise you'd have to take x and bind a column of ones to it every time you wanted to run a regression (or else you'd end up with a biased model).]
Check out this post for a longer discussion.
Good luck!