What is a model? How is a linear regression fit and assessed? How do we construct a model and make predictions with it?
In our case, a model is simply a (simplified) representation of reality expressed in mathematical language. This section deals with linear regression models, starting with a single variable (the predictor, or independent variable) used to predict another one (the response, or dependent variable). The relation between the two variables is modeled with a line, and some properties of the normal distribution are used to fit the parameters that define that line. The model is then assessed to measure its predictive power and to check whether any of the assumptions behind the fit are violated. We will review the assumptions of normality of the residuals, linearity, constant variance (homoscedasticity) and independence. From that basis, we will extend the model to more variables, check the effects of multicollinearity and see how to deal with it.
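As a minimal sketch of the fitting step (written in Python as a stand-in for Excel/R), ordinary least squares chooses β0 and β1 to minimize the sum of squared residuals; when the errors are assumed to be Normal, these estimates coincide with the maximum-likelihood ones. The function name `fit_line` is hypothetical, and the closed-form formulas below only cover the single-predictor case. It also returns the residuals, which are what the assumption checks (normality, constant variance, independence) are run on.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r2, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    # Residuals: observed values minus fitted values
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return b0, b1, r2, residuals


# Data lying exactly on y = 2 + 3x: the fit recovers the true line and R^2 = 1
b0, b1, r2, _ = fit_line([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
print(b0, b1, r2)  # 2.0 3.0 1.0
```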
We suggest the following tasks to practice the concepts explained in these lectures:
1. Create a long sequence of numbers following a normal distribution with mean 0 and a chosen standard deviation (σ) using Excel or R. This will be the noise in the model.
2. Create a sequence of numbers, either random, systematic (e.g. 1 to 100) or following a normal distribution. This will be the x in the model.
3. Create a model, for instance y = 2 + 3x. In this case, β0 = 2 and β1 = 3. This is the true model of your data. If you plot it, you will get a perfect line with exactly that formula and R² = 1.
4. Add the noise: that is, add the values from step 1 to y = 2 + 3x.
5. Now check how the model behaves in the figure. Increase the noise (increase the standard deviation σ from step 1). How does R² change? Are you being fooled by randomness? Do you get a "better picture" with a larger sample?
How to do it?
Generate uniform random numbers between 0 and 1 in Excel: =RAND()
Generate numbers following a normal distribution with mean = 100 and standard deviation = 10: =NORMINV(RAND(),100,10)
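The Excel formula works by inverse-transform sampling: a uniform number from RAND() is passed through the inverse of the normal cumulative distribution. A rough Python counterpart, using only the standard library, is sketched below; the helper names `rand` and `norminv_rand` are hypothetical (in R, `runif()` and `rnorm()` play the same roles, and in Python `random.gauss` draws normal values directly).

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(42)  # fixed seed so the sample is reproducible


def rand():
    """Counterpart of Excel's =RAND(): a uniform number in [0, 1)."""
    return rng.random()


def norminv_rand(mu, sigma):
    """Counterpart of =NORMINV(RAND(), mu, sigma) via inverse-transform sampling."""
    p = rand()
    # inv_cdf needs 0 < p < 1, and random() can (very rarely) return exactly 0.0
    while p == 0.0:
        p = rand()
    return NormalDist(mu, sigma).inv_cdf(p)


sample = [norminv_rand(100, 10) for _ in range(10_000)]
# The sample mean and standard deviation should be close to 100 and 10
print(round(mean(sample), 1), round(stdev(sample), 1))
```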