Statistical Methods for Economists
Lecture 4: Multiple Linear Regression
David Bartl
Statistical Methods for Economists, INM/BASTE

Outline of the lecture
•Introduction: Simple Linear Regression & Least Squares Method
•Multiple Linear Regression: Introduction
•Multiple Linear Regression: Summary & Background
•The Classical Assumptions
•The Coefficient of Determination (R2)
•Further Theorems, Tests of Hypotheses and Confidence Intervals
•Two-sample t-test for the difference of the population means // σX=σY
•Simple linear regression without the intercept term

Introduction
•Simple Linear Regression
•Motivation
•Example
•Least Squares Method
•Generalization
•Multiple Linear Regression: Introduction
•Multiple Linear Regression: Notation

Simple Linear Regression: Motivation
Simple Linear Regression: Example
[Figure: scatter plot of the example data; both axes range from 0 to 10]
Simple Linear Regression: Least Squares Method (the normal equation)
Simple Linear Regression: Generalization
•We shall now study Multiple Linear Regression
Multiple Linear Regression: Introduction
Multiple Linear Regression: Notation

Random vectors
•Random variable
•Random vector
•Mean value
•Variance-covariance matrix
•Uncorrelated random variables
•Independent random variables

Random variable
Random vector
Random vector: Expected value
Random variables: Variance and Covariance
Random vector: Variance-covariance matrix
Random vector: Uncorrelated random variables
Random vector: Independent events
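The least squares method and the normal equation from the slides above can be sketched numerically. This is a minimal illustration with simulated data (the sample values, the seed, and the use of numpy are my own assumptions, not taken from the lecture): the estimates solve the normal equation X'X b = X'y, which says exactly that the residual vector is orthogonal to the columns of X.

```python
import numpy as np

# Hypothetical data: 10 observations (x_i, y_i), roughly on a line
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=10)

# Design matrix with an intercept column; the least-squares estimates
# (b0, b1) solve the normal equation  X'X b = X'y
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

# The normal equation is equivalent to the residual vector being
# orthogonal to every column of X
residuals = y - X @ b
print("intercept, slope:", b)
print("X' residuals (should be ~0):", X.T @ residuals)
```

The same `np.linalg.solve(X.T @ X, X.T @ y)` call carries over unchanged to the multiple-regression case, where X simply has more columns.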
Random vector: Independent random variables

Multivariate normal distribution
Normal distribution
Variance-covariance matrix
Variance-covariance matrix: A decomposition
Standard multivariate normal distribution (of dim. k ≥ 1)
Multivariate normal distribution
Multivariate normal distribution: Density
Multivariate normal distribution: Another definition
Multivariate normal distribution: Linear transformation
Multivariate normal distribution: Theorem

Multiple Linear Regression: Summary & Background
•Summary
•Terminology
•Assumptions
•Random vectors
•The classical assumptions
•Notation

Multiple Linear Regression: Summary
Multiple Linear Regression: Terminology
•Regressand = predicand = explained variable = dependent variable = endogenous variable = controlled variable = response = outcome = predicted variable = measured variable
•Regressors = predictors = explanatory variables = independent variables = exogenous variables = control variables = stimuli = covariates
•Parameters = regression coefficients
•Deviation = error term = disturbance = noise
•The intercept term
Multiple Linear Regression: Assumptions
Multiple Linear Regression: Random vectors
Multiple Linear Regression: The Classical Assumptions
•homoscedasticity, i.e. the variance is the same
•linearity
Multiple Linear Regression: Notation

Basic Results (Theorems)
Multiple Linear Regression: The Normal Equation
•NO (perfect) multicollinearity
Multiple Linear Regression: the predicted values
Multiple Linear Regression: some properties of H
•(the orthogonal complement = the space of the residuals)
Multiple Linear Regression: Theorem 1
Multiple Linear Regression: Theorem 1: Corollary
Multiple Linear Regression: Theorem 2
Multiple Linear Regression: Theorem 3
Multiple Linear Regression: Theorem 4

Residual Sum of Squares, χ2-test for the variance σ2, and confidence intervals
Multiple Linear Regression: Residual Sum of Squares
Multiple Linear Regression: Mean Square Error
Multiple Linear Regression: Theorem 5
Test of hypothesis about the variance σ2
χ2-test for the variance σ2
Confidence interval for the variance σ2

t-test for a single linear combination of the parameters β0, β1, …, βk — e.g. an individual parameter βj — and confidence interval
Multiple Linear Regression: Theorem 6
Multiple Linear Regression: Prediction (Extrapolation)
Multiple Linear Regression: Theorem 6: Corollary
Tests of hypotheses about the individual parameters βj
t-test for the parameter βj // rank(X)=k+1
Confidence interval for the parameter βj // rank(X)=k+1

¡¡¡ WARNING !!!
•Never use the above t-test for the parameters β0, β1, …, βk consecutively!
•Never use the above construction of the confidence intervals consecutively!
•Use the following result (Theorem 7) instead!
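The χ2-based confidence interval for σ2 and the t-test for an individual parameter βj can both be sketched on simulated data (the sample, the seed, and the true parameter values below are hypothetical choices of mine; numpy and scipy are assumed). RSS/σ2 has the χ2(n-k-1) distribution, and the t statistic for βj uses the j-th diagonal element of (X'X)^{-1}.

```python
import numpy as np
from scipy import stats

# Simulated data under the classical assumptions (hypothetical values):
# n = 50 observations, k = 2 regressors plus the intercept term
rng = np.random.default_rng(1)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 0.0, 2.0])           # beta_1 is truly zero
y = X @ beta + rng.normal(0.0, 1.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                      # least-squares estimates
resid = y - X @ b
rss = resid @ resid                        # residual sum of squares
df = n - k - 1                             # requires rank(X) = k+1
s2 = rss / df                              # unbiased estimator of sigma^2

# chi^2 confidence interval for sigma^2, from RSS/sigma^2 ~ chi^2(df)
ci_sigma2 = (rss / stats.chi2.ppf(0.975, df),
             rss / stats.chi2.ppf(0.025, df))

# t statistic for H0: beta_j = 0, and 95% confidence intervals,
# using the diagonal of (X'X)^{-1}
se = np.sqrt(s2 * np.diag(XtX_inv))
t_stat = b / se
p_val = 2 * stats.t.sf(np.abs(t_stat), df)
t_crit = stats.t.ppf(0.975, df)
ci_beta = np.column_stack([b - t_crit * se, b + t_crit * se])

print("s^2:", s2, "CI for sigma^2:", ci_sigma2)
print("t:", t_stat, "p:", p_val)
print("CIs for beta_j:\n", ci_beta)
```

Consistent with the warning above, these per-parameter tests and intervals are marginal: applying them to several βj consecutively does not give a joint confidence statement, which is what Theorem 7's F-based region is for.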
[Illustration: the true confidence region versus the individual confidence intervals for β0 and β1]

F-test for the significance of the model and confidence region & F-test for a system of linear combinations of the parameters β0, β1, …, βk
•Theorem 7
•F-test for the significance of the model
•Confidence region
•Theorem 8

Multiple Linear Regression: Theorem 7
Multiple Linear Regression: Theorem 7*
Multiple Linear Regression: Theorem 7*: Corollary
•(the orthogonal complement = the space of the residuals)
F-test for the significance of the model // rank(X)=k+1 (degrees of freedom)
Confidence region for the parameters // rank(X)=k+1
Multiple Linear Regression: Theorem 8
Multiple Linear Regression: Theorem 8: Illustration
•(the orthogonal complement = the space of the residuals; an affine subspace)
Multiple Linear Regression: Theorem 8: Remark

The Coefficient of Determination (R2)
The Coefficient of Determination (R2): Assumption
The Coefficient of Determination (R2): Th. 8: Corollary
•(the orthogonal complement = the space of the residuals; the line is a subspace of dimension 1)
The Coefficient of Determination (R2): TSS = RSS + RegSS
•By the Pythagoras Theorem
The Coefficient of Determination (R2): Some facts
F-test for the null hypothesis H0: β1 = … = βk = 0
•population (cf. ANOVA)
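The decomposition TSS = RSS + RegSS, the coefficient of determination R2, and the F-test for H0: β1 = … = βk = 0 fit together in one short numerical sketch (the data, seed, and coefficient values are hypothetical choices of mine; numpy and scipy are assumed). The decomposition holds because, with the intercept term in the model, the residual vector is orthogonal to the centered fitted values (the Pythagoras theorem).

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 40 observations, k = 3 regressors + intercept
rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
rss = np.sum((y - y_hat) ** 2)              # residual sum of squares
reg_ss = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares

# With the intercept in the model, TSS = RSS + RegSS (Pythagoras),
# so R^2 = RegSS/TSS = 1 - RSS/TSS lies in [0, 1]
r2 = reg_ss / tss

# F-test for H0: beta_1 = ... = beta_k = 0; under H0 the statistic
# follows the F(k, n-k-1) distribution
F = (reg_ss / k) / (rss / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)

print("TSS vs RSS + RegSS:", tss, rss + reg_ss)
print("R^2:", r2, "F:", F, "p:", p_value)
```

Note the algebraic link between the two quantities: F = (R2/k) / ((1-R2)/(n-k-1)), so the F-test for the significance of the model is equivalently a test of whether R2 is significantly greater than zero.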