Okay, I'll have a go. I'll use bold for matrices (all except the function to be minimized!), and \$\mathbf{b}\$ instead of \$\mathbf{\beta}\$ for clarity (as \$\mathbf{\beta}\$ does not look boldfaced).
The objective is to minimize
$$S(\mathbf{b}) = \mathbf{y}^T \mathbf{y} - 2 \mathbf{b}^T \mathbf{X}^T \mathbf{y} + \mathbf{b}^T \mathbf{X}^T \mathbf{X} \mathbf{b}$$
where \$\mathbf{b}\$ and \$\mathbf{y}\$ are one-column \$N\$-row matrices (i.e., column vectors) and \$\mathbf{X}\$ is an \$N \times N\$ square matrix. The \${}^T\$ indicates matrix transpose.
Expanding \$S(\mathbf{b})\$ we get
$$S(\mathbf{b}) = \left( \sum_i y_i^2 \right) - 2 \left( \sum_j \sum_i b_i x_{j i} y_j \right) + \left( \sum_j \sum_i \sum_k b_i x_{k i} x_{k j} b_j \right)$$
You can verify this easily by expanding the expressions for \$N=3\$ and \$N=4\$.
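If expanding by hand feels tedious, a numerical spot check does the same job. The following is only a sketch, assuming NumPy is available; the random test values and the names `S_matrix` and `S_sums` are mine, not part of the derivation. It evaluates the matrix form and the explicit sums for \$N=3\$, using \$\sum_k x_{k i} x_{k j}\$ for the \$(i, j)\$ entry of \$\mathbf{X}^T \mathbf{X}\$.

```python
import numpy as np

# Arbitrary test data (assumption: any real values would do)
rng = np.random.default_rng(0)
N = 3
X = rng.normal(size=(N, N))
y = rng.normal(size=N)
b = rng.normal(size=N)

# Matrix form: y^T y - 2 b^T X^T y + b^T X^T X b
S_matrix = y @ y - 2 * (b @ X.T @ y) + b @ X.T @ X @ b

# Component form, written as explicit sums over the indices
term1 = sum(y[i] ** 2 for i in range(N))
term2 = sum(b[i] * X[j, i] * y[j] for j in range(N) for i in range(N))
term3 = sum(b[i] * X[k, i] * X[k, j] * b[j]
            for j in range(N) for i in range(N) for k in range(N))
S_sums = term1 - 2 * term2 + term3

# The two values should agree up to floating-point rounding
print(S_matrix, S_sums, np.isclose(S_matrix, S_sums))
```

Changing `N` to 4 repeats the check for the other case mentioned above.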
Although the argument \$\mathbf{b}\$ is a vector, the value of the function \$S(\mathbf{b})\$ is a scalar. As usual, at an extremum (minimum or maximum) its derivative is zero.
The derivative of the scalar \$S\$ with respect to the vector \$\mathbf{b}\$ is a vector. Each component of the derivative is the partial derivative of \$S\$ with respect to the corresponding component of \$\mathbf{b}\$. A partial derivative is calculated by treating that particular component as the variable, and all other components as constants.
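The partial derivatives are
$$\frac{\partial S}{\partial b_i} = \left( 0 \right) - 2 \left( \sum_j x_{j i} y_j \right) + \left( 2 \sum_j \sum_k x_{k i} x_{k j} b_j \right)$$
(Again, \$b_i\$ is the variable, and all other \$b_j, j \ne i\$ are considered constants. In the third term \$b_i\$ appears twice, once as the left factor and once as the right factor; because the coefficient \$\sum_k x_{k i} x_{k j}\$ is symmetric in \$i\$ and \$j\$, the two contributions are equal, which is where the factor 2 comes from.)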
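One way to gain confidence in the partial derivatives is to compare them against central finite differences. This is a rough sketch under my own assumptions: NumPy, random test data, and the helper names `S`, `grad_S` and `numeric_grad` are all hypothetical. It takes component \$i\$ of the derivative to be \$-2 \sum_j x_{j i} y_j + 2 \sum_j \sum_k x_{k i} x_{k j} b_j\$, i.e. \$-2 \mathbf{X}^T \mathbf{y} + 2 \mathbf{X}^T \mathbf{X} \mathbf{b}\$ in matrix form.

```python
import numpy as np

def S(b, X, y):
    # The function being minimized, in matrix form
    return y @ y - 2 * (b @ X.T @ y) + b @ X.T @ X @ b

def grad_S(b, X, y):
    # Component i: -2 sum_j x_ji y_j + 2 sum_j sum_k x_ki x_kj b_j
    return -2 * (X.T @ y) + 2 * (X.T @ X @ b)

def numeric_grad(f, b, h=1e-6):
    # Vary one component of b at a time, holding the others constant
    g = np.zeros_like(b)
    for i in range(b.size):
        step = np.zeros_like(b)
        step[i] = h
        g[i] = (f(b + step) - f(b - step)) / (2 * h)
    return g

# Arbitrary test data (assumption: any real values would do)
rng = np.random.default_rng(1)
N = 3
X = rng.normal(size=(N, N))
y = rng.normal(size=N)
b = rng.normal(size=N)

analytic = grad_S(b, X, y)
numeric = numeric_grad(lambda v: S(v, X, y), b)
print(np.allclose(analytic, numeric, atol=1e-5))
```

The loop in `numeric_grad` mirrors the definition of a partial derivative: only `b[i]` is nudged, everything else stays fixed.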
At the extremum, all \$N\$ partial derivatives are zero:
$$- 2 \left( \sum_j x_{j i} y_j \right) + \left( 2 \sum_j \sum_k x_{k i} x_{k j} b_j \right) = 0$$
We can obviously halve each side, getting
$$- \left( \sum_j x_{j i} y_j \right) + \left( \sum_j \sum_k x_{k i} x_{k j} b_j \right) = 0$$
Note that these are \$N\$ equations, one for each \$i = 1, \dots, N\$.
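Since \$\sum_j x_{j i} y_j\$ is the \$i\$th component of \$\mathbf{X}^T \mathbf{y}\$, and \$\sum_j \sum_k x_{k i} x_{k j} b_j\$ is the \$i\$th component of \$(\mathbf{X}^T \mathbf{X}) \mathbf{b}\$, we can write this in matrix form,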
$$- \mathbf{X}^T \mathbf{y} + (\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{0}$$
where the right side is a zero matrix with one column and \$N\$ rows (a zero column vector), not just a scalar zero.
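To see the matrix equation at work, it can be solved numerically for \$\mathbf{b}\$ and compared with a library least-squares routine. The sketch below assumes NumPy and a randomly generated, invertible \$\mathbf{X}\$; the variable names are mine. It uses `np.linalg.solve` on \$(\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{X}^T \mathbf{y}\$ and `np.linalg.lstsq` as an independent reference.

```python
import numpy as np

# Arbitrary test data (assumption: X is invertible, which holds
# with probability 1 for a random Gaussian matrix)
rng = np.random.default_rng(2)
N = 4
X = rng.normal(size=(N, N))
y = rng.normal(size=N)

# Solve (X^T X) b = X^T y, the matrix equation derived above
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Independent reference: NumPy's least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Left side of the matrix equation; should be a zero vector, not a scalar
residual = -X.T @ y + X.T @ X @ b_normal

print(np.allclose(b_normal, b_lstsq, atol=1e-6))
print(residual.shape, np.allclose(residual, 0.0, atol=1e-6))
```

Forming \$\mathbf{X}^T \mathbf{X}\$ explicitly is fine for a small illustration like this, but it squares the condition number of \$\mathbf{X}\$, so for badly conditioned data a dedicated least-squares routine is usually the safer choice.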