The Multiple Linear Regression Model is a generalization of the Simple Linear Regression Model that was described in the previous sections.
The mathematical treatment of the Simple Linear Regression Model was based on “standard” algebra. Due to the complexity of the Multiple Linear Regression Model, it is necessary to use elementary matrix algebra (cf. Appendix D) so that the formal derivations remain compact and hence more “readable”.
The multiple regression model is explained in the context of two statistical approaches:
the Ordinary Least Squares approach to multiple regression
the Maximum Likelihood Estimation approach to multiple regression
Terminology note: “exogenous/endogenous” in this chapter is equivalent to “predictor/response” used in the later applied chapters.
135.1 Ordinary Least Squares for Multiple Linear Regression
135.1.1 Model
The MLRM can be described in matrix notation as
\[
y = X \beta + e
\]
where \(y\) is a stochastic \(n \times 1\) vector, \(X\) is a deterministic (exogenous) \(n \times k\) matrix, \(\beta\) is a \(k \times 1\) vector of invariant parameters to be estimated by OLS, \(e\) is an \(n \times 1\) disturbance vector, \(n\) is the number of observations in the sample, and \(k\) is the number of exogenous variables on the right-hand side of the equation. Note: the constant term is coded as a column of \(X\) which only contains the value 1.
The disturbance vector is assumed to satisfy \(\text{E}(e) = 0\) and \(\text{E}(ee') = \sigma^2 I_n\), which is equivalent to the assumptions made in the SLRM.
135.1.2 Estimator
The OLS estimator minimizes \(e'e\), which represents the sum of squared residuals (SSR), and seeks the solution \(\hat{\beta} = b\), the estimate of \(\beta\).
Solving the so-called “normal” equations \(X'Xb = X'y\) with respect to \(b\) results in
\[
b = (X'X)^{-1}X'y
\]
where \(X'X\) must be a nonsingular symmetric \(k \times k\) matrix.
The same result can be found without the need for derivatives by applying elementary matrix algebra to the formulation of the MLRM without the error term (premultiplying both sides by \(X'\))
\[
\begin{align*}
y &= X b \\
X'y &= X'X b \\
(X'X)^{-1}X'y &= (X'X)^{-1}X'X b \\
(X'X)^{-1}X'y &= b
\end{align*}
\]
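The closed-form estimator above can be verified numerically. The following is a minimal sketch with simulated data (all dimensions and parameter values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3

# Design matrix with the constant term coded as a column of ones
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])      # true (invariant) parameters, hypothetical
e = rng.normal(scale=0.3, size=n)      # disturbance vector
y = X @ beta + e

# b = (X'X)^{-1} X'y -- solve the normal equations rather than inverting X'X
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))  # True
```

Solving the normal equations directly (rather than forming \((X'X)^{-1}\) explicitly) is the numerically preferred route.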
135.1.3 Unbiasedness of \(b\)
The OLS estimator for the MLRM is unbiased
\[
\text{E}(b) = \beta
\]
since E\((X'e) = 0\) by assumption (\(X\) is exogenous). Note: if \(X\) is not assumed to be exogenous (i.e. \(X\) is of a stochastic/probabilistic nature) then the property of unbiasedness, for small samples, only holds if the exogenous variables are not correlated with the error term.
The covariance matrix of the parameters is obtained from the following derivation. Since \(b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'e\),
\[
\begin{align*}
\Sigma_b = \text{E} \left[ (b - \text{E}(b)) (b - \text{E}(b))' \right] &= \text{E} \left[ (b - \beta)(b - \beta)' \right] \\
&= \text{E} \left[ (X'X)^{-1} X' e e' X (X'X)^{-1} \right] \\
&= (X'X)^{-1} X' \, \text{E}(ee') \, X (X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}
\end{align*}
\]
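In practice \(\sigma^2\) is unknown and must be estimated from the residuals; a minimal sketch with simulated data, using the unbiased estimator \(\hat{e}'\hat{e}/(n-k)\) (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])      # hypothetical true parameters
y = X @ beta + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b

# Sigma_b = sigma^2 (X'X)^{-1}; sigma^2 estimated unbiasedly by SSR / (n - k)
sigma2_hat = resid @ resid / (n - k)
Sigma_b = sigma2_hat * XtX_inv

# Standard errors of the coefficients are the square roots of the diagonal
std_errors = np.sqrt(np.diag(Sigma_b))
print(std_errors)
```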
Any other linear unbiased estimator \(\tilde{\beta}\) has a parameter covariance matrix which is at least as large as the covariance matrix of the OLS parameters, i.e.
\[
\Sigma_{\tilde{\beta}} - \Sigma_b \text{ is positive semi definite}
\]
Therefore, this important theorem proves that the OLS estimator is a Best Linear Unbiased Estimator (BLUE). In other words, no linear unbiased estimator has a lower covariance matrix than OLS (provided all assumptions are satisfied).
If \(D^*\) is a \(k \times n\) matrix which is independent of \(y\) and if
\[
\tilde{b} = D^* y
\]
then \(\tilde{b}\) is by definition a linear estimator. Writing \(D^* = (X'X)^{-1}X' + D\) for some \(k \times n\) matrix \(D\), we have \(\text{E}(\tilde{b}) = D^* X \beta = \beta + DX\beta\). From this result it follows that the estimator can only be unbiased if \(DX = 0\). The condition \(\text{E}(D^*e)=0\) is implied by \(\text{E}(e)=0\) for fixed \(D^*\) and is therefore not specific to OLS.
Moreover, the OLS estimator also has a minimum covariance matrix because, for any linear unbiased estimator (so that \(DX = 0\) and the cross terms vanish),
\[
\Sigma_{\tilde{\beta}} = \sigma^2 D^* (D^*)' = \sigma^2 (X'X)^{-1} + \sigma^2 DD'
\]
and \(DD'\) is a positive semi definite matrix.
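The positive semi definiteness claim can be checked numerically: for an unbiased linear estimator \(\tilde{b} = ((X'X)^{-1}X' + D)y\) with \(DX = 0\), the covariance difference is \(\sigma^2 DD'\), whose eigenvalues are non-negative. A sketch with simulated values (dimensions and \(\sigma^2\) illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T                   # projection onto the column space of X

# Any D with DX = 0: project a random k x n matrix off the column space of X
D = rng.normal(size=(k, n)) @ (np.eye(n) - P)
print(np.allclose(D @ X, 0))            # True: the unbiasedness condition holds

sigma2 = 1.0
Sigma_b = sigma2 * XtX_inv
Sigma_tilde = Sigma_b + sigma2 * (D @ D.T)

# The difference sigma^2 DD' is positive semi definite: eigenvalues >= 0
eigvals = np.linalg.eigvalsh(Sigma_tilde - Sigma_b)
print(np.all(eigvals >= -1e-10))        # True
```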
The parameters of the MLRM and the SLRM are mathematically related to each other. In this context it is important to note that if all exogenous variables are independent (orthogonal or uncorrelated), there is no difference between the multiple and simple regression coefficients.
If the OLS assumptions are satisfied it follows that
\[
\forall j: \text{E}(x_j'e) = 0
\]
Hence, when the exogenous variables are mutually orthogonal, we can combine the above results to obtain
\[
\forall j: b_j = (x_j'x_j)^{-1}x_j'y
\]
In practice, however, the exogenous variables are rarely orthogonal. This implies that, in practice, the multiple regression parameters differ from the simple regression parameters because they represent the partial effect of each exogenous variable on the endogenous variable (see also Partial Pearson Correlation in Chapter 73).
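The orthogonal case can be illustrated numerically. Here the design matrix is made exactly orthogonal via a QR decomposition (an illustrative construction, not the text's setup; the constant column is replaced by an orthonormal basis for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
# QR decomposition yields exactly orthogonal (orthonormal) columns
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
X = Q
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# Multiple-regression coefficients
b_multiple = np.linalg.solve(X.T @ X, X.T @ y)

# With orthogonal regressors each coefficient equals its simple-regression
# slope: b_j = (x_j' x_j)^{-1} x_j' y
b_simple = np.array([(x @ y) / (x @ x) for x in X.T])
print(np.allclose(b_multiple, b_simple))  # True
```

With correlated regressors, by contrast, the two sets of coefficients diverge, which is exactly the "partial effect" point made above.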
135.2 Maximum Likelihood Estimation for Multiple Linear Regression
Maximum Likelihood Estimation (MLE) applies when distributional assumptions are imposed on the disturbances, such as \(e \sim N(0, \sigma^2 I_n)\).
In the Gaussian linear model, the ML estimator of \(\beta\) coincides with OLS and therefore has the same finite-sample BLUE property under Gauss-Markov assumptions. For nonlinear models, MLE is generally justified by consistency and asymptotic efficiency (not by a universal finite-sample “best unbiased” claim).
The ML estimator for the variance parameter, however, is based on a denominator of \(n\) rather than \(n - k\): \(\hat{\sigma}^2_{ML} = \hat{e}'\hat{e} / n\) with \(\hat{e} = y - Xb\), which is biased in finite samples, although the bias vanishes asymptotically.
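The finite-sample bias of the \(n\)-denominator variance estimator can be seen in a small Monte Carlo sketch (simulated data, illustrative sizes; since \(\text{E}(\hat{e}'\hat{e}) = \sigma^2 (n - k)\), the mean of those estimates should sit near \(\sigma^2 (n - k)/n\)):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2, reps = 30, 3, 1.0, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
H = np.linalg.solve(X.T @ X, X.T)       # (X'X)^{-1} X'

ml, unbiased = [], []
for _ in range(reps):
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - X @ (H @ y)
    ssr = resid @ resid
    ml.append(ssr / n)                  # ML-style estimator: divides by n
    unbiased.append(ssr / (n - k))      # unbiased estimator: divides by n - k

# The first mean is pulled below sigma^2 = 1; the second centers on it
print(round(np.mean(ml), 3), round(np.mean(unbiased), 3))
```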
This can be shown to hold under the so-called regularity conditions, which imply that the information matrix times \(1/n\) converges to a positive definite matrix in the limit
\[
\lim\limits_{n \rightarrow \infty} n^{-1} I(\theta) = I_A(\theta)
\]
and from the derivation (i.e. proof) of the Cramér-Rao lower bound (Cramér 1946; Rao 1945), it follows that each of the \(n\) observations’ contributions to the score vector has a zero expected value and a finite variance. Hence,
\[
\text{D} \ln L(\theta) \sim (0,I(\theta))
\]
Due to the regularity conditions and the central limit theorem, for the single-parameter case (\(k=1\)) it can be shown that the score is asymptotically normally distributed:
\[
n^{-1/2} \, \text{D} \ln L(\theta) \xrightarrow{d} N\left(0, I_A(\theta)\right)
\]
Estimate lm(mpg ~ wt + hp + am, data = mtcars) and interpret each coefficient in substantive terms.
Add an interaction term (wt:am) and test whether it improves the model with a nested F-test.
Compare two models using AIC and adjusted \(R^2\). Discuss whether both criteria select the same model.
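The nested F-test in the second exercise is stated for R's lm(); as a language-neutral sketch, the same test can be computed by hand from the restricted and full sums of squared residuals. The data and coefficients below are simulated and hypothetical, since mtcars is an R dataset:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)                     # continuous regressor (like wt)
x2 = rng.integers(0, 2, size=n).astype(float)  # binary regressor (like am)
# Simulated response with a genuine interaction effect (coefficients hypothetical)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

def ssr(X, y):
    """Sum of squared residuals of the OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

X_restricted = np.column_stack([np.ones(n), x1, x2])   # no interaction
X_full = np.column_stack([X_restricted, x1 * x2])      # adds the x1:x2 term

ssr_r, ssr_f = ssr(X_restricted, y), ssr(X_full, y)
q = X_full.shape[1] - X_restricted.shape[1]            # number of restrictions
df = n - X_full.shape[1]                               # residual df, full model
F = ((ssr_r - ssr_f) / q) / (ssr_f / df)
p = stats.f.sf(F, q, df)
print(round(F, 2), bool(p < 0.05))
```

This reproduces what R's anova(restricted, full) reports for nested models: a small p-value indicates the interaction term improves the fit.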
For practical diagnostics (residual analysis, heteroskedasticity testing, multicollinearity/VIF, and interaction effects), see Chapter 143.
Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton: Princeton University Press.
Rao, Calyampudi Radhakrishna. 1945. “Information and the Accuracy Attainable in the Estimation of Statistical Parameters.” Bulletin of the Calcutta Mathematical Society 37: 81–91.