python - Different results when computing linear regressions with scipy.stats and statsmodels


I'm getting different values of R^2 (coefficient of determination) when I try OLS fits with these two libraries, and I can't quite figure out why. (Some spacing removed for convenience.)

    In [1]: import pandas as pd
    In [2]: import numpy as np
    In [3]: import statsmodels.api as sm
    In [4]: import scipy.stats
    In [5]: np.random.seed(100)
    In [6]: x = np.linspace(0, 10, 100) + 5*np.random.randn(100)
    In [7]: y = np.arange(100)
    In [8]: slope, intercept, r, p, std_err = scipy.stats.linregress(x, y)
    In [9]: r**2
    Out[9]: 0.22045988449873671
    In [10]: model = sm.OLS(y, x)
    In [11]: est = model.fit()
    In [12]: est.rsquared
    Out[12]: 0.5327910685035413

What's going on here? I can't figure it out! Is there an error somewhere?

The 0.2205 is coming from a model that also has an intercept term; the 0.5328 value is what you get if you remove the intercept.
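To see where each number comes from, here is a minimal NumPy-only sketch that fits both models to the question's data by least squares. One detail worth knowing: for a model without a constant, statsmodels reports the uncentered R^2 (1 - SSR / sum(y^2)) instead of the centered one (1 - SSR / sum((y - mean(y))^2)), so the sketch uses that definition for the no-intercept fit. The helper name fit_r2 is hypothetical, chosen just for this illustration.

```python
import numpy as np

# Reproduce the data from the question (same seed, same generation order).
np.random.seed(100)
x = np.linspace(0, 10, 100) + 5 * np.random.randn(100)
y = np.arange(100, dtype=float)


def fit_r2(X, y, centered):
    """Hypothetical helper: least-squares fit of y on the columns of X.

    centered=True  -> R^2 = 1 - SSR / sum((y - mean(y))**2)  (model with intercept)
    centered=False -> R^2 = 1 - SSR / sum(y**2)              (model without intercept)
    Returns (coefficients, R^2).
    """
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    ssr = resid @ resid
    if centered:
        tss = ((y - y.mean()) ** 2).sum()
    else:
        tss = (y ** 2).sum()
    return coefs, 1.0 - ssr / tss


# y = a + b*x: design matrix with a column of ones (what linregress fits).
X_with = np.column_stack([np.ones_like(x), x])
coefs_with, r2_with = fit_r2(X_with, y, centered=True)

# y = b*x: x is the only regressor (what sm.OLS(y, x) fits).
X_without = x[:, None]
coefs_without, r2_without = fit_r2(X_without, y, centered=False)

print("with intercept:    R^2 =", r2_with)
print("without intercept: R^2 =", r2_without)
```

With the same seed, the two printed values should line up with the 0.2205 and 0.5328 from the question, assuming NumPy's legacy global random stream produces the same draws on your version.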

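Basically, one package is modeling y = bx, whereas the other (helpfully) assumes you also want an intercept term (i.e. y = a + bx). statsmodels' OLS fits exactly the columns you pass it, so sm.OLS(y, x) is y = bx, while scipy.stats.linregress always includes the intercept. [Note: the advantage of that assumption is that otherwise you'd have to take x and bind a column of ones to it every time you wanted to run a regression (or else you'd end up with a biased model).]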

Check out this post for a longer discussion.

Good luck!

