**Models**

## Summary

## Materials

## Tasks


Blas MOLA-YUDEGO

*What is a model? How is a linear regression fit and assessed? How do we construct a model and make predictions with it?*

In our case, a model is simply a (simplified) representation of reality, using mathematical language. This section deals with linear regression models, starting with a single variable (predictor, independent) used to predict another one (response, dependent). The relation between both variables is modeled using a line, and some properties of the normal distribution are used to fit the parameters that define that line. The model is then assessed to measure its predictive power and to check whether we violate any of the premises under which the line was fit. We will review the assumptions of normality of the residuals, linearity, constant variance (*homoscedasticity*) and independence. From that basis, we will expand to add more variables, check the effects of *multi-collinearity* and see how to deal with it.
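
To make the multi-collinearity effect mentioned above a little more concrete, here is a minimal Python sketch. It uses the variance inflation factor (VIF), a diagnostic this summary does not name and which is assumed here purely for illustration. For the special case of two predictors, VIF = 1 / (1 - r²), where r is the correlation between them. The function names `pearson_r` and `vif_two_predictors` are hypothetical, and the data are simulated.

```python
import random

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / (s_aa * s_bb) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

rng = random.Random(1)  # fixed seed so the run is reproducible
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2_independent = [rng.gauss(0.0, 1.0) for _ in range(200)]   # unrelated to x1
x2_collinear = [v + rng.gauss(0.0, 0.1) for v in x1]         # almost a copy of x1

print("VIF, independent predictors:", round(vif_two_predictors(x1, x2_independent), 2))
print("VIF, collinear predictors:  ", round(vif_two_predictors(x1, x2_collinear), 2))
```

A VIF close to 1 suggests the predictors carry separate information. A large value suggests that the coefficients of the two predictors are hard to estimate separately.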

**T3.1 Linear regression model** [lecture]

**T3.2 Model assessment** [lecture]

**T3.3 Multiple regression** [lecture]

Slides from old lectures 2019 [PDF]

We suggest trying the following tasks to practice the concepts explained in those lectures:

- Create a large sequence of numbers following a normal distribution with defined **mean** = 0 and **std deviation (σ)** using Excel/R. That will be the noise in the model.
- Create a sequence of numbers, either random, systematic (e.g. 1 to 100) or following a normal distribution. That will be the **x** in the model.
- Create a model. For instance y = 2 + 3x. In this case, *β*₀ = 2 and *β*₁ = 3. This is the true model of your data. If you try to make a figure, it will look like a perfect line, with that exact formula and R² = 1.
- Add the noise. That is, add the values of step 1 to y = 2 + 3x.
- Now check how the model behaves in the figure. Increase the noise (increase the **std deviation (σ)** of step 1). How is the R² changing? Are you being fooled by randomness? Do you see a "better picture" with a larger sample?
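
The steps above can also be sketched in a few lines of Python. This is an assumption for illustration only, since the course itself points to Excel or R. The helper names `simulate` and `fit_line` are hypothetical. The fit is an ordinary least squares line computed by hand, using the true model y = 2 + 3x from the task.

```python
import random

def simulate(n, sigma, beta0=2.0, beta1=3.0, seed=42):
    """Return x and noisy y for the true model y = beta0 + beta1 * x."""
    rng = random.Random(seed)                       # fixed seed: reproducible noise
    x = [float(i) for i in range(1, n + 1)]         # systematic x: 1, 2, ..., n
    noise = [rng.gauss(0.0, sigma) for _ in x]      # step 1: normal noise, mean 0
    y = [beta0 + beta1 * xi + e for xi, e in zip(x, noise)]
    return x, y

def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Increase the noise and watch the estimates and R2 change.
for sigma in (0.0, 10.0, 50.0, 200.0):
    x, y = simulate(n=100, sigma=sigma)
    b0, b1, r2 = fit_line(x, y)
    print(f"sigma={sigma:6.1f}  b0={b0:8.2f}  b1={b1:5.2f}  R2={r2:.3f}")
```

With sigma = 0 the fit recovers the true line and R² = 1. As sigma grows, R² tends to fall and the estimated coefficients drift away from 2 and 3. Changing `n` is one way to explore the question about larger samples.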

**How to do it?**

In Excel:

Generate random numbers in Excel: =RAND()

Generate numbers following a normal distribution with *mean* = 100 and *st dev* = 10:

In R:

Check here.

For more instructions, **google** (as I do)!
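
Excel's RAND() returns a uniform number between 0 and 1. One way to turn such a number into a normally distributed one is to pass it through the inverse of the normal cumulative distribution. Below is a minimal Python sketch of that idea, assuming mean = 100 and st dev = 10 as in the Excel example. The function name `normal_from_uniform` is hypothetical. In Excel the analogous idiom is commonly written as `=NORM.INV(RAND(), 100, 10)`, though that may not be the exact formula intended in this course.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_from_uniform(u, mu=100.0, sigma=10.0):
    """Map a uniform number u in (0, 1) to a normal draw via the inverse CDF."""
    return NormalDist(mu, sigma).inv_cdf(u)

rng = random.Random(7)            # fixed seed so the run is reproducible
uniforms = []
while len(uniforms) < 10_000:
    u = rng.random()              # uniform in [0, 1), comparable to RAND()
    if u > 0.0:                   # the inverse CDF is undefined at exactly 0
        uniforms.append(u)

draws = [normal_from_uniform(u) for u in uniforms]
print(f"sample mean   = {mean(draws):.2f}")    # should be close to 100
print(f"sample st dev = {stdev(draws):.2f}")   # should be close to 10
```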

## Reflections


*What are the consequences of violating the main assumptions of linear regression models?*

*What is the role of the intercept (β₀)?*

*How do you decide the variables to be included in a model?*

[These questions may help students focus and reflect on the topic contents. They do not need to be submitted as an assignment and will not be evaluated.]

*"All models are wrong. Many are useful. Some are lethal"* (Taleb, echoing George Box)

## Objectives

The objectives of this topic are:

- To understand the main assumptions of linear regression models
- To fit linear regression models
- To assess and evaluate models

## Schedule

**Lecture days**: Third week: 2-6 November

During the lecture we will discuss the topics in the presentations, the exercises and any questions you may have.

## Data and materials

### Datasets

### Simple regression exercises

### Multiple regression exercises

### Videos and tutorials

Excel for task [xls]

Excel for practice [xls]

Exercise pre-exam [PDF]

Exercise height-volume [PDF]

Exercise height-diameter-volume [PDF] [solutions]

Exercise wheat production [PDF]

Exercise height-diameter-value [PDF]

Exercise barley yields [PDF]

Simple linear regression [youtube]

How to make our own *sandbox* model [video]
