Author Topic: Need help understanding the proof of Ordinary Least Squares (Read 978 times)

engineheat · « **on:** January 04, 2020, 10:06:57 pm »

Hi, In this page:
https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares

section " Derivation directly in terms of matrices",

they differentiated S(beta) with respect to the beta vector to get the normal equation.

I'm really having a hard time following that step since matrix calculus is not my strong suit.

Can someone explain in more detail? thanks

GlennSprigg · « **Reply #1 on:** January 05, 2020, 02:00:24 pm »

Sorry mate... I had a look at your link like a lot of the 65 other readers did...
And had a brain-fart ! sorry... Given time, we may understand!

Nominal Animal · « **Reply #2 on:** January 05, 2020, 05:57:37 pm »

Okay, I'll have a go. I'll use bold for matrices (all except the function to be minimized!), and \$\mathbf{b}\$ instead of \$\mathbf{\beta}\$ for clarity (as \$\mathbf{\beta}\$ does not look boldfaced).

The objective is to minimize
$$S(\mathbf{b}) = \mathbf{y}^T \mathbf{y} - 2 \mathbf{b}^T \mathbf{X}^T \mathbf{y} + \mathbf{b}^T \mathbf{X}^T \mathbf{X} \mathbf{b}$$
where \$\mathbf{b}\$ and \$\mathbf{y}\$ are one-column \$N\$-row matrices (i.e., column vectors) and \$\mathbf{X}\$ is an \$N \times N\$ square matrix. The \${}^T\$ indicates matrix transpose.

Expanding \$S(\mathbf{b})\$ we get
$$S(\mathbf{b}) = \left( \sum_i y_i^2 \right) - 2 \left( \sum_j \sum_i b_i x_{j i} y_j \right) + \left( \sum_j \sum_i \sum_k b_i x_{k i} x_{j k} b_j \right )$$
You can verify this easily by expanding the expressions for \$N=3\$ and \$N=4\$.

Although the argument \$\mathbf{b}\$ is a vector, the value of the function \$S(\mathbf{b})\$ is a scalar. As usual, it reaches an extremum (minimum or maximum) whenever its derivative is zero.

The derivative of scalar \$S\$ by vector \$\mathbf{b}\$ is a vector. Each component of the derivative is the partial derivative of the original scalar with respect to the corresponding vector component. Partial derivative is calculated by treating that particular component as a variable, and all other components constants.

The partial differentials are
$$\frac{\partial S}{\partial b_i} = \left( 0 \right) - 2 \left( \sum_j x_{j i} y_j \right) + \left( 2 \sum_k x_{i k} x_{k i} b_i \right)$$
(Again, \$b_i\$ is the variable, and all other \$b_j, j \ne i\$ are considered constants.)

At the extremum, all \$N\$ partial differentials are zero:
$$- 2 \left( \sum_j x_{i j} y_j \right) + \left( 2 \sum_k x_{i k} x_{k i} b_i \right) = 0$$
We can obviously halve each side, getting
$$- \left( \sum_j x_{i j} y_j \right) + \left( \sum_k x_{i k} x_{k i} b_i \right) = 0$$
Do realize that this is \$N\$ equations.

We can trivially write this in matrix form,
$$- \mathbf{X}^T \mathbf{y} + (\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{0}$$
where one must realize that the right side is a zero matrix, not just a scalar zero.

German_EE · « **Reply #3 on:** January 05, 2020, 07:47:07 pm »

And in other news, we have more pictures of cats:

https://www.eevblog.com/forum/chat/post-a-picture-of-a-cat!/575/

engineheat · « **Reply #4 on:** January 05, 2020, 09:43:47 pm »

Quote from: Nominal Animal on January 05, 2020, 05:57:37 pm

Okay, I'll have a go. I'll use bold for matrices (all except the function to be minimized!), and \$\mathbf{b}\$ instead of \$\mathbf{\beta}\$ for clarity (as \$\mathbf{\beta}\$ does not look boldfaced).

The objective is to minimize
$$S(\mathbf{b}) = \mathbf{y}^T \mathbf{y} - 2 \mathbf{b}^T \mathbf{X}^T \mathbf{y} + \mathbf{b}^T \mathbf{X}^T \mathbf{X} \mathbf{b}$$
where \$\mathbf{b}\$ and \$\mathbf{y}\$ are one-column \$N\$-row matrices (i.e., column vectors) and \$\mathbf{X}\$ is an \$N \times N\$ square matrix. The \${}^T\$ indicates matrix transpose.

Expanding \$S(\mathbf{b})\$ we get
$$S(\mathbf{b}) = \left( \sum_i y_i^2 \right) - 2 \left( \sum_j \sum_i b_i x_{j i} y_j \right) + \left( \sum_j \sum_i \sum_k b_i x_{k i} x_{j k} b_j \right )$$
You can verify this easily by expanding the expressions for \$N=3\$ and \$N=4\$.

Although the argument \$\mathbf{b}\$ is a vector, the value of the function \$S(\mathbf{b})\$ is a scalar. As usual, it reaches an extremum (minimum or maximum) whenever its derivative is zero.

The derivative of scalar \$S\$ by vector \$\mathbf{b}\$ is a vector. Each component of the derivative is the partial derivative of the original scalar with respect to the corresponding vector component. Partial derivative is calculated by treating that particular component as a variable, and all other components constants.

The partial differentials are
$$\frac{\partial S}{\partial b_i} = \left( 0 \right) - 2 \left( \sum_j x_{j i} y_j \right) + \left( 2 \sum_k x_{i k} x_{k i} b_i \right)$$
(Again, \$b_i\$ is the variable, and all other \$b_j, j \ne i\$ are considered constants.)

At the extremum, all \$N\$ partial differentials are zero:
$$- 2 \left( \sum_j x_{i j} y_j \right) + \left( 2 \sum_k x_{i k} x_{k i} b_i \right) = 0$$
We can obviously halve each side, getting
$$- \left( \sum_j x_{i j} y_j \right) + \left( \sum_k x_{i k} x_{k i} b_i \right) = 0$$
Do realize that this is \$N\$ equations.

We can trivially write this in matrix form,
$$- \mathbf{X}^t \mathbf{y} + (\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{0}$$
where one must realize that the right side is a zero matrix, not just a scalar zero.

Wow, thanks.
I printed this out and will digest it slowly...

Nominal Animal · « **Reply #5 on:** January 06, 2020, 04:01:35 am »

Quote from: engineheat on January 05, 2020, 09:43:47 pm

I printed this out and will digest it slowly...

Let me know if you have any questions. (My edit after your response only fixed an insignificant typo, changing the \${}^t\$ to \${}^T\$.)

I used \$X_{j i}\$ for the component of \$X\$ on row \$j\$, column \$i\$, as usual; for the vectors, the only subscript indicates the component; both numbered from \$1\$ to \$N\$, inclusive.

I'm not a mathematician, so the way I usually do matrix calculus is by expanding it to scalar expressions with subscript indexes to denote the components; essentially, I treat the vector/matrix expressions as a set of equations on scalars, because those are so much easier to work with (for me!). After the operations, the set of equations can almost always be expressed in concise vector/matrix form. Those who do this sort of thing often enough know (or memorize, or have references listing the most common ones), so they can skip the entire split to scalar equations, operate, then combine back to vector/matrix form -steps.

It is extremely common for mathematicians to skip these steps, probably because they see it as "mechanical work", but to me, for the "math is a tool for solving problems" aspect, the way to do such steps is much more important than any single particular result (which the mathematicians seem so very excited about, usually).


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

EEVblog Electronics Community Forum

Author Topic: Need help understanding the proof of Ordinary Least Squares (Read 978 times)

engineheat

Need help understanding the proof of Ordinary Least Squares

GlennSprigg

Re: Need help understanding the proof of Ordinary Least Squares

Nominal Animal

Re: Need help understanding the proof of Ordinary Least Squares

German_EE

Re: Need help understanding the proof of Ordinary Least Squares

engineheat

Re: Need help understanding the proof of Ordinary Least Squares

Nominal Animal

Re: Need help understanding the proof of Ordinary Least Squares

Share me