Let’s take a moment and understand what is happening here. Let’s consider the first element of the derivative vector, $\frac{\partial E}{\partial w_0}$. It has a sign and it has a magnitude!
Let’s look at the magnitude first:
The magnitude of $\frac{\partial E}{\partial w_0}$ shows the rate of change in $E$ with respect to $w_0$, if we increase the current value of $w_0$ a tiny bit! So, the higher this value, the more of a change in $E$ we will observe, given a little increase in the current value of $w_0$.
What about the sign of this gradient?
If $\frac{\partial E}{\partial w_0} > 0$, then it means that if we keep all the other weights constant, we will have to increase $w_0$ to move along the direction of the steepest increase in $E$, as far as $w_0$ is concerned! Similarly, if $\frac{\partial E}{\partial w_0} < 0$, then it means that if we keep all the other weights constant, we will have to decrease $w_0$ to move along the direction of the steepest increase in $E$, as far as $w_0$ is concerned.
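To make the sign and magnitude concrete, here is a minimal sketch in Python. The error function `error`, the helper `partial`, and the starting weights are all hypothetical and chosen only for illustration. The sketch approximates each partial derivative by nudging one weight a tiny bit while keeping the other constant.

```python
def error(w):
    # Hypothetical toy error surface with two weights (an assumption,
    # not a function from this post).
    w0, w1 = w
    return (w0 - 1.0) ** 2 + 3.0 * (w1 + 2.0) ** 2


def partial(f, w, i, h=1e-6):
    # Central finite difference: nudge only weight i, keep the rest constant.
    up = list(w)
    down = list(w)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2.0 * h)


w = [3.0, -2.5]
d0 = partial(error, w, 0)  # positive: increasing w0 increases the error
d1 = partial(error, w, 1)  # negative: decreasing w1 increases the error

# The sign tells us which way to move each weight to go uphill,
# and the magnitude tells us how strongly the error reacts.
print("dE/dw0 =", round(d0, 4))
print("dE/dw1 =", round(d1, 4))
```

For this toy surface the first derivative comes out positive and the second negative, so going uphill means increasing the first weight and decreasing the second.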
So, if we use the gradient to find the direction and magnitude of change for every weight (i.e., whether to increase or decrease it, and by how much) and then change all the weights accordingly, we will have moved in the direction of the steepest ascent on the error surface, as far as (pay attention!!!) ALL OF OUR WEIGHTS are concerned!
So what is the direction of steepest descent then? Of course, the negated direction of the gradient. So, the polar opposite of that!
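As a rough illustration, the sketch below takes one small step along the gradient and one small step against it, then compares the error at both points. The toy error function, its hand-derived gradient, the starting weights, and the step size `eta` are all assumptions made for this example.

```python
def error(w):
    # Hypothetical toy error surface with two weights.
    w0, w1 = w
    return (w0 - 1.0) ** 2 + 3.0 * (w1 + 2.0) ** 2


def gradient(w):
    # Partial derivatives of the toy error above, derived by hand.
    w0, w1 = w
    return [2.0 * (w0 - 1.0), 6.0 * (w1 + 2.0)]


eta = 0.01  # hypothetical small step size
w = [3.0, -2.5]
g = gradient(w)

# Step along the gradient (steepest ascent) and against it (steepest descent).
w_up = [wi + eta * gi for wi, gi in zip(w, g)]
w_down = [wi - eta * gi for wi, gi in zip(w, g)]

print("error at w:            ", error(w))
print("error after ascent:    ", error(w_up))
print("error after descent:   ", error(w_down))
```

With a small enough step, moving along the gradient raises the error and moving against it lowers the error, which is exactly why the descent rule uses the negated gradient.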