Linear Regression EBook


Title

An interactive e-book for illustrating linear regression

Creator

Autar K Kaw

Subject and Keywords

Linear Regression, Regression, Mathcad, Maple, Mathematica, Matlab, Simulations.

Description

This is an interactive E-book for illustrating linear regression.  It includes links to examples, simulations in Mathcad, Maple, Mathematica, and Matlab for the algorithm, and a PowerPoint presentation.

Publisher

Holistic Numerical Methods Institute,

College of Engineering,

University of South Florida, Tampa, FL 33620-5350.

Contributors

Autar Kaw, Egwu Kalu

Format

Text/HTML

Last Revised

October 3, 2007

Identifier

http://numericalmethods.eng.usf.edu/ebooks/straightline_06reg_ebook.pdf

Language

English

Rights

http://numericalmethods.eng.usf.edu/rights.htm

 

Table of Contents

Background

What is Linear Regression?

Why minimize the sum of the square of the residuals?

Method

Least Squares

Example

Example 1: Finding the torsional stiffness of a mousetrap spring

Example 2: Finding the longitudinal Young’s modulus of a unidirectional composite

Presentation

Power Point Presentation

Simulation

             Simulation of Linear Regression [MAPLE]  [MATHCAD]  [MATHEMATICA]  [MATLAB]

 

What is Linear Regression?

Linear regression is the most popular regression model.  In this model, we wish to predict the response to $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ by a regression model given by

$$ y = a_0 + a_1 x \tag{1} $$

where $a_0$ and $a_1$ are the constants of the regression model.

A measure of goodness of fit, that is, how well $a_0 + a_1 x$ predicts the response variable $y$, is the magnitude of the residual $\varepsilon_i$ at each of the $n$ data points.

$$ \varepsilon_i = y_i - (a_0 + a_1 x_i) \tag{2} $$

Ideally, if all the residuals $\varepsilon_i$ are zero, one has found an equation in which all the points lie on the model.  Thus, minimization of the residuals is an objective in obtaining the regression coefficients.

The most popular method to minimize the residuals is the least squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals is minimized, that is, minimize $\sum_{i=1}^{n} \varepsilon_i^2$.


Why minimize the sum of the square of the residuals?

Why not, for instance, minimize the sum of the residual errors or the sum of the absolute values of the residuals?  Alternatively, constants of the model can be chosen such that the average residual is zero without making individual residuals small.  Will any of these criteria yield unbiased parameters with the smallest variance?  All of these questions will be answered below.  Look at the data in Table 1.

 

Table 1   Data points.

x      y
2.0    4.0
3.0    6.0
2.0    6.0
3.0    8.0

 

To explain this data by a straight-line regression model,

$$ y = a_0 + a_1 x \tag{3} $$

and using minimization of $\sum_{i=1}^{n} \varepsilon_i$ as the criterion to find $a_0$ and $a_1$, we find that for (Figure 1)

$$ y = 4x - 4 \tag{4} $$

the sum of the residuals is $\sum_{i=1}^{4} \varepsilon_i = 0$, as shown in Table 2.

Figure 1 Regression curve y = 4x - 4 for y vs. x data.

Table 2 The residuals at each data point for regression model y = 4x - 4

x      y      ypredicted    ε = y - ypredicted
2.0    4.0    4.0            0.0
3.0    6.0    8.0           -2.0
2.0    6.0    4.0            2.0
3.0    8.0    8.0            0.0

 

 

So does this give us the smallest error? It does, as $\sum_{i=1}^{4} \varepsilon_i = 0$. But it does not give unique values for the parameters of the model. A straight-line model (Figure 2)

$$ y = 6 \tag{5} $$

also makes $\sum_{i=1}^{4} \varepsilon_i = 0$, as shown in Table 3.

Figure 2 Regression curve y = 6 for y vs. x data.

 

Table 3   The residuals at each data point for regression model y = 6

x      y      ypredicted    ε = y - ypredicted
2.0    4.0    6.0           -2.0
3.0    6.0    6.0            0.0
2.0    6.0    6.0            0.0
3.0    8.0    6.0            2.0
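As a quick numerical check of the two residual tables, the following Python sketch evaluates the residuals of both candidate lines, y = 4x - 4 and y = 6, on the Table 1 data. The helper name `residuals` is an illustrative choice and is not part of the e-book's simulations.

```python
# Data from Table 1
x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

def residuals(a0, a1, xs, ys):
    """Return the residuals y_i - (a0 + a1*x_i) for the line y = a0 + a1*x."""
    return [yi - (a0 + a1 * xi) for xi, yi in zip(xs, ys)]

res_line_1 = residuals(-4.0, 4.0, x, y)  # y = 4x - 4
res_line_2 = residuals(6.0, 0.0, x, y)   # y = 6

# Both lines have residuals that sum to zero, although the lines differ.
print(res_line_1, sum(res_line_1))
print(res_line_2, sum(res_line_2))
```

Both sums come out as zero even though the individual residuals are not small, which is the non-uniqueness the text describes.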

 

 

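Since this criterion does not give a unique regression model, it cannot be used for finding the regression coefficients. Let us see why we cannot use this criterion for any general data.  We want to minimize

$$ \sum_{i=1}^{n} \varepsilon_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) \tag{6} $$

Differentiating Equation (6) with respect to $a_0$ and $a_1$, we get

$$ \frac{\partial \sum_{i=1}^{n} \varepsilon_i}{\partial a_0} = -\sum_{i=1}^{n} 1 = -n \tag{7} $$

$$ \frac{\partial \sum_{i=1}^{n} \varepsilon_i}{\partial a_1} = -\sum_{i=1}^{n} x_i = -n\bar{x} \tag{8} $$

Setting these derivatives equal to zero gives $n = 0$, which is not possible.  Therefore, unique values of $a_0$ and $a_1$ do not exist.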
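The non-uniqueness can also be seen with one line of algebra. The step below is standard algebra added for illustration, not part of the original text; $\bar{x}$ and $\bar{y}$ denote the averages of the $x_i$ and $y_i$ values.

```latex
\sum_{i=1}^{n} \varepsilon_i
  = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)
  = n\bar{y} - n a_0 - n a_1 \bar{x}
  = n \left( \bar{y} - a_0 - a_1 \bar{x} \right)
```

The sum of the residuals is therefore zero for every line that passes through the point $(\bar{x}, \bar{y})$, whatever its slope. For the data of Table 1, $\bar{x} = 2.5$ and $\bar{y} = 6$, and both $y = 4x - 4$ and $y = 6$ pass through $(2.5, 6)$.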

You may think that the reason the criterion of minimizing $\sum_{i=1}^{n} \varepsilon_i$ does not work is that negative residuals cancel positive residuals.  So would minimizing $\sum_{i=1}^{n} |\varepsilon_i|$ be a better criterion?  Let us look at the data given in Table 2 for the equation $y = 4x - 4$.  It makes $\sum_{i=1}^{4} |\varepsilon_i| = 4$, as shown in the following table.

Table 4   The absolute residuals at each data point when employing y = 4x - 4

x      y      ypredicted    |ε| = |y - ypredicted|
2.0    4.0    4.0           0.0
3.0    6.0    8.0           2.0
2.0    6.0    4.0           2.0
3.0    8.0    8.0           0.0

 

The value $\sum_{i=1}^{4} |\varepsilon_i| = 4$ is also obtained for the straight-line model $y = 6$. No straight-line model for this data has a sum of absolute residuals smaller than 4.  Again, we find the regression coefficients are not unique, and hence this criterion also cannot be used for finding the regression model.
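The tie between the two lines can be checked numerically. The sketch below computes the sum of absolute residuals on the Table 1 data for y = 4x - 4 and y = 6, and also for a third line, y = 2x + 1, which is an extra illustration chosen here and does not appear in the text above. The function name `sum_abs_residuals` is likewise an illustrative choice.

```python
# Data from Table 1
x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

def sum_abs_residuals(a0, a1, xs, ys):
    """Sum of |y_i - (a0 + a1*x_i)| for the line y = a0 + a1*x."""
    return sum(abs(yi - (a0 + a1 * xi)) for xi, yi in zip(xs, ys))

# (a0, a1) pairs; the third line is an added illustration, not from the text.
candidates = {
    "y = 4x - 4": (-4.0, 4.0),
    "y = 6": (6.0, 0.0),
    "y = 2x + 1": (1.0, 2.0),
}

totals = {label: sum_abs_residuals(a0, a1, x, y)
          for label, (a0, a1) in candidates.items()}

for label, total in totals.items():
    print(label, total)
```

All three lines reach the same sum of absolute residuals, so this criterion does not single out one model for this data either.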


Least Squares

Let us use the least squares criterion, where we minimize

$$ S_r = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{9} $$

$S_r$ is called the sum of the square of the residuals.

 

Figure 3 Linear regression of y vs. x data showing residuals at a typical point, xi.

 

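To find $a_0$ and $a_1$, we minimize $S_r$ with respect to $a_0$ and $a_1$.

$$ \frac{\partial S_r}{\partial a_0} = 2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-1) = 0 \tag{10} $$

$$ \frac{\partial S_r}{\partial a_1} = 2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-x_i) = 0 \tag{11} $$

giving

$$ -\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0 \tag{12} $$

$$ -\sum_{i=1}^{n} y_i x_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0 \tag{13} $$

Noting that $\sum_{i=1}^{n} a_0 = n a_0$,

$$ n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{14} $$

$$ a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \tag{15} $$

Solving Equations (14) and (15) gives

$$ a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \tag{16} $$

$$ a_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \tag{17} $$

Redefining

$$ S_{xy} = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y} \tag{18} $$

$$ S_{xx} = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \tag{19} $$

$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{20} $$

$$ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \tag{21} $$

we can rewrite

$$ a_1 = \frac{S_{xy}}{S_{xx}} \tag{22} $$

$$ a_0 = \bar{y} - a_1 \bar{x} \tag{23} $$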
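A minimal Python sketch of the least squares fit, applied to the data of Table 1, may help make the result concrete. It assumes the usual closed form for a straight-line fit, slope a1 = Sxy/Sxx and intercept a0 = ybar - a1*xbar, with Sxy = sum(x*y) - n*xbar*ybar and Sxx = sum(x^2) - n*xbar^2. The function names `fit_line` and `sum_sq_residuals` are illustrative and are not taken from the e-book's simulations.

```python
def fit_line(xs, ys):
    """Least squares estimates (a0, a1) for the model y = a0 + a1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(xi * yi for xi, yi in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(xi * xi for xi in xs) - n * x_bar ** 2
    if s_xx == 0:
        # All x values identical: the slope cannot be determined.
        raise ValueError("all x values are identical; slope is undefined")
    a1 = s_xy / s_xx
    a0 = y_bar - a1 * x_bar
    return a0, a1

def sum_sq_residuals(a0, a1, xs, ys):
    """Sum of squared residuals S_r for the line y = a0 + a1*x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(xs, ys))

# Data from Table 1
x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

a0, a1 = fit_line(x, y)
sr_fit = sum_sq_residuals(a0, a1, x, y)
sr_line_1 = sum_sq_residuals(-4.0, 4.0, x, y)  # y = 4x - 4
sr_line_2 = sum_sq_residuals(6.0, 0.0, x, y)   # y = 6

print(a0, a1, sr_fit, sr_line_1, sr_line_2)
```

For this data the fit is y = 1 + 2x with a sum of squared residuals of 4, while the two lines considered earlier, y = 4x - 4 and y = 6, each give 8. Unlike the previous two criteria, the least squares criterion separates the candidates.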

 


Example 1

The torque $T$ needed to turn the torsional spring of a mousetrap through an angle $\theta$ is given below.

 

Table 5 Torque versus angle for a torsion spring.

Angle, θ

Torque, T

Radians

N-m

0.698132