Excel linear regression r2

#Excel linear regression r2 code

Short answer to the question of when you can drop the intercept: (almost) NEVER. If you set $\alpha=0$, you are saying that you KNOW that the expected value of $y$ given $x=0$ is zero. $R^2$ becomes higher without the intercept not because the model is better, but because a different definition of $R^2$ is being used! $R^2$ compares the estimated model with some standard (baseline) model, expressed as the reduction in the sum of squares relative to the sum of squares under that standard model. In the model with an intercept, the comparison sum of squares is taken around the mean; in the model without an intercept, it is taken around zero. That uncentred sum of squares is never smaller (and is usually much larger), so even a fit with a larger residual sum of squares can report a higher $R^2$.
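Written out, with fitted values $\hat y_i$, the two versions are

$$R^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2} \quad\text{(with intercept)}, \qquad R_0^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i y_i^2} \quad\text{(without intercept)}.$$

A minimal R sketch of the difference, using the built-in `cars` data purely as an example (it is my choice of data, not part of the original question). It computes both versions by hand and checks them against what `summary.lm` reports for each fit.

```r
# Built-in data, used only as an illustration
fit1 <- lm(dist ~ speed, data = cars)      # with intercept
fit0 <- lm(dist ~ speed - 1, data = cars)  # intercept forced to zero
y <- cars$dist

rss1 <- sum(residuals(fit1)^2)
rss0 <- sum(residuals(fit0)^2)

# Baseline for the intercept model: the mean (centred total sum of squares)
r2_with <- 1 - rss1 / sum((y - mean(y))^2)
# Baseline for the no-intercept model: zero (uncentred total sum of squares)
r2_without <- 1 - rss0 / sum(y^2)

# Hand-computed values next to what summary.lm reports
c(r2_with, summary(fit1)$r.squared)
c(r2_without, summary(fit0)$r.squared)

# The no-intercept fit has the larger residual sum of squares,
# yet it reports the higher R^2 because its baseline is different
c(rss_with = rss1, rss_without = rss0)
```

Because the intercept model nests the no-intercept model, `rss1` can never exceed `rss0`; any apparent "improvement" in $R^2$ from dropping the intercept comes entirely from the change of baseline.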


Consider the `cars` data set in R, which records speed and stopping distance (the response). The problem is that if you fit an ordinary linear regression, the fitted intercept is quite a way negative, which makes the fitted values for the smallest speeds negative. One might say "oh, but the distance for speed 0 is guaranteed to be 0, so we should omit the intercept", but the problem with that reasoning is that the model is misspecified in several ways, and that argument only works well enough when the model is not misspecified. A linear model with 0 intercept doesn't fit at all well in this case, while one with an intercept is actually a half-decent approximation even though it's not actually "correct".

In my plot of these data, the blue line is the OLS fit; its fitted values for the smallest x-values in the data set are negative. The red line is a gamma GLM with identity link: while it has a negative intercept, it only has positive fitted values. With such a fit you should not end up with a negative fitted value for any of your x's (though you might have convergence issues in some cases if you force the identity link where it really won't fit). This model has variance proportional to the square of the mean (a constant coefficient of variation), so if you find your data are more spread out as the expected time grows, it may be especially suitable. It's almost as easy as fitting a regression in R, so it's one possible alternative approach that may be worth a try.

Since people usually ask for it, here's the code for my plot:

```r
# Scatterplot with axes extended to show the behaviour near speed = 0
plot(dist ~ speed, data = cars, xlim = c(0, 30), ylim = c(-5, 120))
# Red dashed line: gamma GLM with identity link
abline(glm(dist ~ speed, data = cars, family = Gamma(link = "identity")), col = 2, lty = 2)
# Blue dashed line: ordinary least squares fit
abline(lm(dist ~ speed, data = cars), col = 4, lty = 2)
```

If you don't need the identity link, you might consider other link functions, like the log link and the inverse link, which correspond to the log and reciprocal transformations (log-time, or speed rather than time) but without the need to actually transform the response.
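To check the claims about the plot numerically, here is a short R sketch, again on the built-in `cars` data. It also fits the log-link alternative from the paragraph above; the side-by-side comparison and the speeds 0, 4 and 25 used for prediction are my own illustrative choices, not part of the original answer.

```r
# Fit the two models drawn in the plot
ols    <- lm(dist ~ speed, data = cars)
gam_id <- glm(dist ~ speed, data = cars, family = Gamma(link = "identity"))

# Coefficients side by side; with the identity link both are on the response scale
rbind(ols = coef(ols), gamma_identity = coef(gam_id))

# Smallest fitted value at the observed speeds for each model
c(ols = min(fitted(ols)), gamma_identity = min(fitted(gam_id)))

# Log-link alternative: mu = exp(eta) is positive for every speed,
# including speeds outside the observed range
gam_log <- glm(dist ~ speed, data = cars, family = Gamma(link = "log"))
predict(gam_log, newdata = data.frame(speed = c(0, 4, 25)), type = "response")
```

The identity-link fit only keeps the fitted values positive where it was fitted; the log link guarantees positivity everywhere at the cost of no longer having $E(Y)=X\beta$.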
It's unusual not to fit an intercept, and generally inadvisable: you should only do so if you know it's 0. I think that point (and the fact that you can't compare the $R^2$ for fits with and without an intercept) is well and truly covered already, if possibly a little overstated in the case of the 0 intercept. I want to focus on your main issue, which is that you need the fitted function to be positive, though I do return to the 0-intercept issue in part of my answer.

The best way to get an always-positive fit is to fit something that will always be positive; in part, what that is depends on what functions you need to fit. If your linear model was largely one of convenience (rather than coming from a known functional relationship that might stem from a physical model, say), then you might instead work with log-time; the fitted model is then guaranteed to be positive in $t$, because $\hat t = \exp(x^\top\hat\beta) > 0$. As an alternative, you might work with speed rather than time, but then with linear fits you may get a problem with small speeds (long times) instead.

If you know your response is linear in the predictors, you can attempt to fit a constrained regression, but with multiple regression the exact form you need will depend on your particular x's (there's no one linear constraint that will work for all $x$'s), so it's a bit ad hoc.

You can also look at GLMs, which can be used to fit models that have non-negative fitted values and can (if required) even have $E(Y)=X\beta$. For example, one can fit a gamma GLM with identity link.
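A minimal R sketch of the log-response and constrained-regression options, under two stated assumptions. First, the asker's time data are not available, so `cars$dist` stands in for the positive response. Second, "constrained regression" is read here as least squares subject to non-negative fitted values at the observed rows ($X\beta \ge 0$), solved with base R's `constrOptim`; that is one ad hoc choice among several, and the helper names `rss` and `grad` are just illustrative.

```r
# Stand-in data: cars$dist plays the role of the positive response
dat <- cars

# 1. Model the log of the response; back-transformed fits exp(eta) are > 0
log_fit <- lm(log(dist) ~ speed, data = dat)
new_speeds <- data.frame(speed = c(0, 4, 25))
# exp() of the log-scale mean estimates the median, not the mean, on the
# original scale (assuming roughly symmetric errors on the log scale)
pred_log <- exp(predict(log_fit, newdata = new_speeds))
pred_log

# 2. Least squares subject to X %*% beta >= 0 at the observed rows only
X <- model.matrix(~ speed, data = dat)
y <- dat$dist
rss  <- function(b) sum((y - X %*% b)^2)                        # objective
grad <- function(b) as.vector(-2 * crossprod(X, y - X %*% b))   # its gradient
start <- c(1, 1)  # strictly feasible: 1 + speed > 0 for every observed speed
cfit <- constrOptim(start, rss, grad, ui = X, ci = rep(0, nrow(X)))
cfit$par
range(X %*% cfit$par)  # fitted values at the observed speeds
```

The constraint only covers the x's you actually have: a new x outside the data could still produce a negative prediction, which is part of why this route is a bit ad hoc compared with the log model or a GLM.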