The tangent kernel of a function $F : \mathbb{R}^m \to \mathbb{R}^n$, where $w \in \mathbb{R}^m$ is the parameter vector and $F(w) \in \mathbb{R}^n$ is the vector of model outputs on the $n$ training inputs, is the $n \times n$ matrix:
$$K(w) = \nabla F(w)\,\nabla F(w)^{\top}$$
Essentially, it is the gradient (Jacobian) of the function at $w$ multiplied by its transpose, so $K(w)$ is symmetric and positive semi-definite.
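To make this concrete, here is a minimal NumPy sketch (the toy one-hidden-layer model and all names in it are illustrative choices, not from the original) that builds the Jacobian explicitly and forms the tangent kernel $K(w) = \nabla F(w)\,\nabla F(w)^{\top}$:

```python
import numpy as np

WIDTH = 10  # hidden units; the model has m = 2 * WIDTH parameters

def outputs(w, x):
    """F(w): outputs f(w; x_i) = sum_k a_k * tanh(b_k * x_i) on all inputs x."""
    a, b = w[:WIDTH], w[WIDTH:]
    return np.tanh(np.outer(x, b)) @ a                    # shape (n,)

def jacobian(w, x):
    """The n x m Jacobian of the outputs with respect to the weights."""
    a, b = w[:WIDTH], w[WIDTH:]
    h = np.tanh(np.outer(x, b))                           # (n, WIDTH)
    d_a = h                                               # d f / d a_k
    d_b = (1.0 - h ** 2) * x[:, None] * a                 # d f / d b_k
    return np.concatenate([d_a, d_b], axis=1)             # (n, 2 * WIDTH)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                    # n = 4 training inputs
w = rng.normal(size=2 * WIDTH)                            # parameter vector

J = jacobian(w, x)
K = J @ J.T                                               # tangent kernel, n x n
print("K shape:", K.shape)
print("lambda_min(K):", np.linalg.eigvalsh(K).min())
```

Because this toy model has more parameters than training points ($m = 20 > n = 4$), the Jacobian generically has full row rank, so $K(w)$ is strictly positive definite, which is exactly the regime the result below is about.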
One important conclusion is that if the minimum eigenvalue of the tangent kernel satisfies $\lambda_{\min}(K(w)) \ge \mu > 0$, then the loss satisfies the $\mu$-strong Polyak-Lojasiewicz inequality
$$\tfrac{1}{2}\,\|\nabla L(w)\|^2 \ge \mu\,L(w).$$
Proof:
Let us assume that $\lambda_{\min}(K(w)) \ge \mu > 0$ holds on the region of parameter space under consideration.
Let us assume that our loss function is expressed as the squared norm of the difference between the function $F(w)$ and some target $y$:
$$L(w) = \frac{1}{2}\,\|F(w) - y\|^2$$
Let us consider the gradient of the loss function (its derivative with respect to the weights):
$$\nabla L(w) = \nabla F(w)^{\top}\,(F(w) - y)$$
We know that:
$$\|\nabla L(w)\|^2 = (F(w)-y)^{\top}\,\nabla F(w)\,\nabla F(w)^{\top}\,(F(w)-y),$$
and that $\nabla F(w)\,\nabla F(w)^{\top} = K(w)$ by definition. So, we can rewrite the above equation as:
$$\|\nabla L(w)\|^2 = (F(w)-y)^{\top}\,K(w)\,(F(w)-y)$$
Therefore, the squared norm of the gradient of the loss is a quadratic form of the residual $F(w) - y$ in the tangent kernel.
If we use the assumption stated at the beginning of the proof, together with the fact that a quadratic form is bounded below by the smallest eigenvalue of its matrix, we can bound the above equation as:
$$\|\nabla L(w)\|^2 \;\ge\; \lambda_{\min}(K(w))\,\|F(w)-y\|^2 \;=\; 2\,\lambda_{\min}(K(w))\,L(w) \;\ge\; 2\mu\,L(w).$$
In other words, if $\lambda_{\min}(K(w)) \ge \mu > 0$, then we have the $\mu$-strong Polyak-Lojasiewicz inequality $\tfrac{1}{2}\|\nabla L(w)\|^2 \ge \mu\,L(w)$. This translates into fast convergence of the optimization algorithm.
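As a sanity check on the derivation, the following sketch (same illustrative toy model as above, repeated so the snippet runs on its own) evaluates both sides of $\tfrac{1}{2}\|\nabla L(w)\|^2 \ge \lambda_{\min}(K(w))\,L(w)$ at a few random parameter vectors:

```python
import numpy as np

WIDTH = 10

def outputs(w, x):
    a, b = w[:WIDTH], w[WIDTH:]
    return np.tanh(np.outer(x, b)) @ a

def jacobian(w, x):
    a, b = w[:WIDTH], w[WIDTH:]
    h = np.tanh(np.outer(x, b))
    return np.concatenate([h, (1.0 - h ** 2) * x[:, None] * a], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # training inputs
y = rng.normal(size=4)                       # training targets

for _ in range(5):
    w = rng.normal(size=2 * WIDTH)
    r = outputs(w, x) - y                    # residual F(w) - y
    loss = 0.5 * r @ r                       # L(w)
    J = jacobian(w, x)
    grad = J.T @ r                           # gradient of the loss
    lam_min = np.linalg.eigvalsh(J @ J.T).min()
    lhs = 0.5 * grad @ grad                  # equals 0.5 * r^T K(w) r
    print(f"0.5*||grad L||^2 = {lhs:8.4f}  >=  lambda_min * L = {lam_min * loss:8.4f}")
```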
This is an important result: the minimum eigenvalue of the tangent kernel is a key factor in the convergence of the optimization algorithm. If it stays above a positive threshold, the algorithm converges exponentially fast. This is useful in practice, as it lets us analyze the convergence properties of optimization algorithms and design algorithms that converge faster.
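As an illustration of that behaviour, a short gradient-descent run on the same illustrative toy model (step size and iteration count are arbitrary choices) prints the loss along the way; when $\lambda_{\min}(K(w))$ stays bounded away from zero along the trajectory, the PL inequality predicts a geometric decay:

```python
import numpy as np

WIDTH = 10

def outputs(w, x):
    a, b = w[:WIDTH], w[WIDTH:]
    return np.tanh(np.outer(x, b)) @ a

def jacobian(w, x):
    a, b = w[:WIDTH], w[WIDTH:]
    h = np.tanh(np.outer(x, b))
    return np.concatenate([h, (1.0 - h ** 2) * x[:, None] * a], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = rng.normal(size=4)
w = rng.normal(size=2 * WIDTH)

eta = 0.01                                   # step size (arbitrary small choice)
for t in range(401):
    r = outputs(w, x) - y
    J = jacobian(w, x)
    if t % 80 == 0:
        loss = 0.5 * r @ r
        lam_min = np.linalg.eigvalsh(J @ J.T).min()
        print(f"step {t:3d}  loss = {loss:.3e}  lambda_min(K) = {lam_min:.3f}")
    w = w - eta * (J.T @ r)                  # gradient step on L(w)
```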
Now a natural question is how we measure this rate of convergence. Read more about this in condition number.