Simple Linear Regression
BIOS 7020: Introductory Biostatistics II Fall 2018
Hanwen Huang, Ph.D.
Department of Epidemiology & Biostatistics
College of Public Health, University of Georgia, huanghw@uga.edu
- Data Summaries (Mean, Median, ...)
- Introduction to Probability
- Inference for the one-sample problem: estimates, CIs, and tests for
  - continuous response: mean
  - binary response: proportion
- Inference for the two-sample problem: estimates, CIs, and tests for
  - continuous response: mean
  - binary response: proportion
- Power Analysis and Sample Size Determination
- Introduction to linear regression

More than two groups (variables) are of interest:
- Linear regression
  - Simple linear regression
  - Multiple linear regression
- ANOVA
  - One-way ANOVA
  - Two-way ANOVA
- Categorical data analysis (binary data)
  - Comparisons of proportions or odds
  - Logistic regression
- Survival Analysis
  - Basic concepts
  - Proportional hazards regression
Outline:
- Motivating examples
- Regression model
- Parameter estimation
- Hypothesis testing
- Prediction
Background:
- Time from exposure to AIDS
- Immunologic status, ranging from 0 to 10
Sample: 10 haemophilia patients
Objective: Predict number of exposure months as a function of immunologic status
Reference: Schork, M.A., and Remington, R.D. (2000). Statistics with Applications to the Biological and Health Sciences. Upper Saddle River, NJ: Prentice Hall.
Data
Immunologic Status Exposure Time (months)
8 4
9 2
3 15
7 6
5 7
8 3
4 12
6 9
3 17
4 11
Questions:
1. Does there appear to be any relationship between the two variables?
2. If so, what is the direction of that relationship?
Scatter Plot
Simple Linear Regression is used when:
- We are interested in the relationship between two variables;
- Both variables are continuous;
- We wish to predict the value of one variable from the value of the other.
Two types of variables:
Dependent Variable: sometimes called the "variable of interest" or Y-variable.
Independent Variable: sometimes called the explanatory variable, predictor variable, or X-variable.
Example:
Dependent Variable: Exposure Time
Independent Variable: Immunologic Status
$n$ = sample size
$x_i$ = independent variable for subject $i$ ($i = 1, \dots, n$)
$y_i$ = dependent variable for subject $i$ ($i = 1, \dots, n$)

The model is
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
where
$\alpha$ = y-intercept
$\beta$ = slope
$\varepsilon_i$ = (random) error
Assumptions
1. The data are realized from a linear regression model: $y_i = \alpha + \beta x_i + \varepsilon_i$.
2. The errors have population mean zero: $E(\varepsilon_i) = 0$.
3. Homoskedasticity: the variance of the errors does not depend on $x$: $\mathrm{var}(\varepsilon_i) = \sigma^2$.
4. Independence: the subjects (sample units) are independently sampled.
5. Normality: the errors are sampled from a normal distribution.
6. The independent variable is measured without error.
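To make these assumptions concrete, here is a minimal Python sketch (my own illustration, not from the lecture) that simulates data satisfying them; the values of alpha, beta, and sigma are hypothetical, chosen only for demonstration.

```python
# Simulate data from the simple linear regression model (illustrative values only).
import numpy as np

rng = np.random.default_rng(2018)

n = 10          # sample size
alpha = 21.2    # hypothetical intercept
beta = -2.2     # hypothetical slope
sigma = 1.6     # hypothetical error standard deviation

x = rng.uniform(3, 9, size=n)         # independent variable, measured without error (assumption 6)
eps = rng.normal(0.0, sigma, size=n)  # independent, mean-zero, constant-variance normal errors (2-5)
y = alpha + beta * x + eps            # linear model y_i = alpha + beta * x_i + eps_i (assumption 1)
```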
Parameter Estimation

The parameters $\alpha$ and $\beta$ are estimated using the method of least squares.

Define the estimated regression line
$$\hat{y}_i = a + b x_i$$
where
$a$ = estimated value of the y-intercept $\alpha$
$b$ = estimated value of the slope $\beta$
$\hat{y}_i$ = fitted value of the dependent variable $y_i$ when the independent variable is equal to $x_i$

Residual

Define: $d_i = y_i - \hat{y}_i = y_i - a - b x_i$

Minimize the sum of the squared residuals:
$$S = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Important quantities

Sum of squares for $x$ (with $\bar{x} = \sum_{i=1}^{n} x_i / n$):
$$L_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 / n$$

Sum of squares for $y$ (with $\bar{y} = \sum_{i=1}^{n} y_i / n$):
$$L_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 / n$$

Sum of cross products:
$$L_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) / n$$

Estimated regression coefficients:
$$b = \frac{L_{xy}}{L_{xx}} \qquad \text{and} \qquad a = \bar{y} - b \bar{x}$$
Status ($x_i$)   Time ($y_i$)   $x_i^2$   $y_i^2$   $x_i y_i$
     8                4            64        16        32
     9                2            81         4        18
     3               15             9       225        45
     7                6            49        36        42
     5                7            25        49        35
     8                3            64         9        24
     4               12            16       144        48
     6                9            36        81        54
     3               17             9       289        51
     4               11            16       121        44
Sum: 57              86           369       974       393

Sample means: $\bar{x} = 5.7$ and $\bar{y} = 8.6$
Summary: $L_{xx} = 44.1$, $L_{xy} = -97.2$, $L_{yy} = 234.4$. The estimated regression coefficients are:
$$b = \frac{L_{xy}}{L_{xx}} = \frac{-97.2}{44.1} = -2.204$$
$$a = \bar{y} - b\bar{x} = 8.6 - (-2.204) \times 5.7 = 8.6 + 12.6 = 21.2$$
In summary, we obtain the estimated (fitted) regression line
$$\hat{y} = a + bx = 21.2 - 2.204x$$
Suppose that a subject with unknown exposure time has an immunologic status of x = 5. Then the predicted exposure time for that subject is
$$\hat{y} = 21.2 - 2.204 \times 5 = 10.2 \text{ months}$$
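A minimal sketch of this least-squares computation in Python (numpy only; the array names are my own, not from the lecture):

```python
# Least-squares fit and prediction for the AIDS example.
import numpy as np

status = np.array([8, 9, 3, 7, 5, 8, 4, 6, 3, 4], dtype=float)      # immunologic status (x)
months = np.array([4, 2, 15, 6, 7, 3, 12, 9, 17, 11], dtype=float)  # exposure time (y)

xbar, ybar = status.mean(), months.mean()        # 5.7 and 8.6
Lxx = np.sum((status - xbar) ** 2)               # 44.1
Lxy = np.sum((status - xbar) * (months - ybar))  # -97.2

b = Lxy / Lxx          # slope, about -2.204
a = ybar - b * xbar    # intercept, about 21.2

y_hat = a + b * 5.0    # predicted exposure time at x = 5, about 10.2 months
print(a, b, y_hat)
```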
Reference: Bruce, Kusumi, and Hosmer (1973). Maximal oxygen intake and nomographic assessment of functional aerobic impairment in cardiovascular disease. American Heart Journal 65, 546-562.

The data contain the variables:
- Case
- Duration (seconds)
- VO2Max
- Heart rate (beats per minute)
- Age (years)
- Height (cm)
- Weight (kg)
Objective: Predict VO2Max as a function of duration of exercise.
Question: What is your interpretation of this scatter plot?
Recall: Questions
1. Is exposure time a decreasing function of immunologic status?
2. If so, at what rate does exposure time decrease with increasing immunologic status?

Note: A complete answer requires testing the null hypothesis
$$H_0: \beta = 0$$
against the one-sided alternative hypothesis
$$H_a: \beta < 0$$
Step 1. Construct an Analysis of Variance table
Step 2. Estimate the variance $\sigma^2$ of the errors $\varepsilon_i$ in the model $y_i = \alpha + \beta x_i + \varepsilon_i$
Step 3. Compute the standard error of the estimated slope b
Step 4. Compute the test statistic and carry out the test.
The total variation in the dependent variable is measured by the Total Sum of Squares:
$$\text{Total SS} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = L_{yy}$$
The total sum of squares may be partitioned into two terms:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{Model SS}} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{Residual SS}}$$
The regression sum of squares
$$\text{Model SS} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \frac{L_{xy}^2}{L_{xx}}$$
measures the variation of the estimated regression line about the sample mean.

The residual sum of squares
$$\text{Residual SS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = L_{yy} - \frac{L_{xy}^2}{L_{xx}}$$
measures the variation of the data about the estimated regression line.
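As a quick check for the AIDS example (my own arithmetic, using the summary quantities $L_{xx} = 44.1$, $L_{xy} = -97.2$, $L_{yy} = 234.4$ computed earlier):
$$\text{Model SS} = \frac{(-97.2)^2}{44.1} \approx 214.2, \qquad \text{Residual SS} = 234.4 - 214.2 = 20.2,$$
so the partition gives Total SS $= 234.4 = 214.2 + 20.2$.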
ANOVA

Source     d.f.      SS                              MS                   F
Model      1         $L_{xy}^2 / L_{xx}$             Model SS / 1         MS Model / MS Res
Residual   $n - 2$   $L_{yy} - L_{xy}^2 / L_{xx}$    Res SS / $(n - 2)$
Total      $n - 1$   $L_{yy}$
Here, we have two sources of variation, that due to regression and that due to error.
We have a total of n − 1 degrees of freedom. The regression has only one d.f. This leaves n − 2 d.f. for the residuals.
MS is the Mean Square. Mean Squares are obtained by dividing the sum of squares term by its d.f.
The F-statistic is obtained by dividing the MS Model by MS Residual. This may be used to perform a two-sided test of H0: β = 0.
Step 2: Estimate $\sigma^2$

The variance of the errors, $\sigma^2$, may then be estimated by the Mean Square Residual:
$$\hat{\sigma}^2 = \frac{\text{Res SS}}{n - 2} = \frac{L_{yy} - L_{xy}^2 / L_{xx}}{n - 2}$$

Example: AIDS data
$$\hat{\sigma}^2 = \frac{20.2}{8} = 2.52$$
Step 3: Compute the Standard Error of the Estimated Slope

Note: Since the estimated slope $b$ is a function of our random data, $b$ is also a random variable, so $b$ has a mean and a variance. Under our model assumptions:
$$E(b) = \beta$$
$$\mathrm{var}(b) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{L_{xx}}$$
The variance is estimated by
$$\widehat{\mathrm{var}}(b) = \frac{\hat{\sigma}^2}{L_{xx}}$$
and the standard error of $b$ is
$$SE(b) = \sqrt{\widehat{\mathrm{var}}(b)} = \frac{\hat{\sigma}}{\sqrt{L_{xx}}}$$
Step 4: Compute the Test Statistic and Carry out the Test

Consider the test of $H_0: \beta = 0$ against $H_a: \beta < 0$. Compute the test statistic:
$$t = \frac{b}{SE(b)}$$
Under $H_0$, $t$ is $t$-distributed with $n - 2$ d.f. Since we are carrying out a one-tailed test, reject $H_0$ at level $\alpha$ if
$$|t| > t_{n-2,\,1-\alpha} \quad \text{and} \quad b < 0$$

Notes:
- To test $H_0: \beta = 0$ against $H_a: \beta > 0$, reject $H_0$ at level $\alpha$ if
$$|t| > t_{n-2,\,1-\alpha} \quad \text{and} \quad b > 0$$
- To test $H_0: \beta = 0$ against $H_a: \beta \neq 0$, reject $H_0$ at level $\alpha$ if
$$|t| > t_{n-2,\,1-\alpha/2}$$
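For the AIDS example, a hedged Python sketch of Steps 2-4 (my own code; scipy is used only for the $t$ distribution, and the inputs are the summary quantities computed above):

```python
# Estimate sigma^2, the standard error of b, and carry out the one-sided t test
# of H0: beta = 0 against Ha: beta < 0 for the AIDS data.
import numpy as np
from scipy import stats

n, Lxx, Lxy, Lyy = 10, 44.1, -97.2, 234.4
b = Lxy / Lxx                                  # estimated slope, about -2.204

res_ss = Lyy - Lxy**2 / Lxx                    # residual SS, about 20.2
sigma2_hat = res_ss / (n - 2)                  # estimated error variance, about 2.52
se_b = np.sqrt(sigma2_hat / Lxx)               # standard error of b, about 0.24

t_stat = b / se_b                              # about -9.2 on n - 2 = 8 d.f.
p_one_sided = stats.t.cdf(t_stat, df=n - 2)    # P(T <= t), for Ha: beta < 0
t_crit = stats.t.ppf(0.95, df=n - 2)           # reject H0 at alpha = 0.05 if |t| > t_crit and b < 0
print(t_stat, p_one_sided, t_crit)
```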
Note: A complete conclusion should include:
- An indication of the direction of the effect, when significant.
- Evidence, including the test statistic, d.f., and p-value.
It should also include:
- The magnitude of the effect, using the estimated slope.
- An assessment of the uncertainty of our estimate, by including either the standard error or a 95% confidence interval.
The ANOVA table

ANOVA

Source     d.f.      SS                              MS                   F
Model      1         $L_{xy}^2 / L_{xx}$             Model SS / 1         MS Model / MS Res
Residual   $n - 2$   $L_{yy} - L_{xy}^2 / L_{xx}$    Res SS / $(n - 2)$
Total      $n - 1$   $L_{yy}$
may be used to test
$$H_0: \beta = 0$$
against the two-sided alternative
$$H_a: \beta \neq 0$$
Under $H_0$,
$$F = \frac{\text{MS Model}}{\text{MS Res}}$$
is $F$-distributed with 1 and $n - 2$ degrees of freedom. Here, we have two sets of degrees of freedom:
- The numerator degrees of freedom is equal to 1.
- The denominator degrees of freedom is equal to $n - 2$.
Both of these can be read directly from the ANOVA table. Reject $H_0$ at level $\alpha$ if
$$F > F_{1,\,n-2,\,1-\alpha}$$
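A minimal Python sketch of this F test for the AIDS example (my own code; scipy supplies the F distribution):

```python
# F test of H0: beta = 0 against the two-sided alternative for the AIDS data.
import numpy as np
from scipy import stats

n, Lxx, Lxy, Lyy = 10, 44.1, -97.2, 234.4

model_ss = Lxy**2 / Lxx                   # about 214.2, on 1 d.f.
res_ss = Lyy - model_ss                   # about 20.2, on n - 2 = 8 d.f.

F = (model_ss / 1) / (res_ss / (n - 2))   # about 85; note F = t^2 from the t test
p_value = stats.f.sf(F, dfn=1, dfd=n - 2)        # two-sided p-value
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)     # reject H0 at alpha = 0.05 if F > F_crit
print(F, p_value, F_crit)
```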
Question: How well does the linear regression model fit the data? Consider the Analysis of Variance table:
ANOVA

Source       d.f.      SS              MS              F
Regression   1         SS Regression   MS Regression   F
Residual     $n - 2$   SS Residual     MSE
Total        $n - 1$   Total SS

Definition: The coefficient of determination is
$$r^2 = \frac{\text{SS Regression}}{\text{Total SS}}$$
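For the AIDS example (my own arithmetic, from the quantities computed earlier):
$$r^2 = \frac{L_{xy}^2 / L_{xx}}{L_{yy}} = \frac{214.2}{234.4} \approx 0.91,$$
so roughly 91% of the variation in exposure time is accounted for by immunologic status.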
(1 − α) × 100% Confidence Interval for β
Confidence intervals are often more easily interpreted than standard errors. They give a range of plausible values for the parameter of interest.
A $(1 - \alpha) \times 100\%$ confidence interval for the slope $\beta$ is given by
$$b \pm t_{n-2,\,1-\alpha/2} \times SE(b)$$
(1 − α) × 100% Confidence Interval for α
We can also compute a confidence interval for the intercept α. First, we need an expression for the standard error of the estimated intercept:
$$SE(a) = \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{L_{xx}} \right)}$$
Then a $(1 - \alpha) \times 100\%$ confidence interval for $\alpha$ is given by
$$a \pm t_{n-2,\,1-\alpha/2} \times SE(a)$$
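A sketch of 95% confidence intervals for $\beta$ and $\alpha$ in the AIDS example (my own code, using the standard errors defined above; the printed limits are approximate):

```python
# 95% confidence intervals for the slope and intercept in the AIDS data.
import numpy as np
from scipy import stats

n, Lxx, xbar = 10, 44.1, 5.7
a, b, sigma2_hat = 21.2, -2.204, 2.52

t_crit = stats.t.ppf(0.975, df=n - 2)                 # about 2.306

se_b = np.sqrt(sigma2_hat / Lxx)                      # SE of the slope
ci_beta = (b - t_crit * se_b, b + t_crit * se_b)      # roughly (-2.8, -1.7)

se_a = np.sqrt(sigma2_hat * (1 / n + xbar**2 / Lxx))  # SE of the intercept
ci_alpha = (a - t_crit * se_a, a + t_crit * se_a)     # roughly (17.9, 24.5)
print(ci_beta, ci_alpha)
```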
Prediction
The fitted regression line
$$\hat{y} = a + bx$$
may be used to:

1. Estimate the mean of the dependent variable $y$ for a population of subjects sharing a common level of the independent variable $x$ (public health setting). The standard error is
$$SE(\hat{y}) = \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{L_{xx}} \right)}$$
and the $(1 - \alpha) \times 100\%$ confidence interval for the mean is
$$\hat{y} \pm t_{n-2,\,1-\alpha/2} \times \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{L_{xx}} \right)}$$

2. Estimate the value of the dependent variable $y$ for a single subject whose value of the independent variable is $x$ (medical setting). The standard error is
$$SE(\hat{y}) = \sqrt{\hat{\sigma}^2 \left( 1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{L_{xx}} \right)}$$
and the $(1 - \alpha) \times 100\%$ prediction interval is
$$\hat{y} \pm t_{n-2,\,1-\alpha/2} \times \sqrt{\hat{\sigma}^2 \left( 1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{L_{xx}} \right)}$$
Interpretation and plot
- 95% confidence interval: for the mean
- 95% prediction interval: for an individual
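To contrast the two intervals at $x = 5$ in the AIDS example, a short Python sketch (my own code; the fitted value there is about 10.2 months, and the limits are approximate):

```python
# 95% confidence interval for the mean response vs. 95% prediction interval
# for a new individual at x = 5, AIDS data.
import numpy as np
from scipy import stats

n, Lxx, xbar = 10, 44.1, 5.7
a, b, sigma2_hat = 21.2, -2.204, 2.52

x0 = 5.0
y_hat = a + b * x0                       # about 10.2 months
t_crit = stats.t.ppf(0.975, df=n - 2)

se_mean = np.sqrt(sigma2_hat * (1 / n + (x0 - xbar)**2 / Lxx))      # mean response
se_pred = np.sqrt(sigma2_hat * (1 + 1 / n + (x0 - xbar)**2 / Lxx))  # new individual

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)   # narrower: for the mean
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)   # wider: for an individual
print(ci, pi)
```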