A tangent kernel is defined for a function $F: \mathbb{R}^m \to \mathbb{R}^n$, where $w \in \mathbb{R}^m$ are the parameters and $F(w) \in \mathbb{R}^n$ are the outputs. The tangent kernel is the $n \times n$ matrix

$$K(w) = DF(w)\, DF(w)^T.$$

Essentially, it is the Jacobian (gradient) of the function at $w$ multiplied by its transpose.

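As a concrete illustration (not from the original text), here is a minimal NumPy sketch that forms the tangent kernel from a finite-difference Jacobian. The toy two-layer model, the shapes, and names such as `tangent_kernel` are assumptions made for this example; only the formula $K(w) = DF(w)\, DF(w)^T$ comes from the definition above.

```python
import numpy as np

def tangent_kernel(F, w, eps=1e-6):
    """Tangent kernel K(w) = DF(w) DF(w)^T, with DF(w) estimated by finite differences.

    F : callable mapping parameters w (shape (m,)) to outputs (shape (n,)).
    Returns an (n, n) positive semi-definite matrix.
    """
    w = np.asarray(w, dtype=float)
    F0 = np.asarray(F(w), dtype=float)
    n, m = F0.size, w.size
    J = np.zeros((n, m))                       # Jacobian DF(w), one column per parameter
    for j in range(m):
        w_pert = w.copy()
        w_pert[j] += eps
        J[:, j] = (np.asarray(F(w_pert)) - F0) / eps
    return J @ J.T                             # K(w) = DF(w) DF(w)^T

# Hypothetical toy model: a tiny one-hidden-layer network evaluated on fixed inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                    # 5 data points, 3 features

def model(w):
    W1 = w[:6].reshape(3, 2)                   # first-layer weights
    w2 = w[6:]                                 # second-layer weights
    return np.tanh(X @ W1) @ w2                # outputs F(w) in R^5

w0 = rng.normal(size=8)
K = tangent_kernel(model, w0)
print("lambda_min(K) =", np.linalg.eigvalsh(K).min())
```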
One important conclusion: if the minimum eigenvalue of the tangent kernel satisfies $\lambda_{\min}(K(w)) \geq \mu$ for some $\mu > 0$, then the loss satisfies the $\mu$-strong Polyak-Lojasiewicz (PL) inequality

$$\frac{1}{2}\, \|\nabla L(w)\|^2 \geq \mu\, L(w).$$

Proof:

Let us assume that $\lambda_{\min}(K(w)) \geq \mu$ on the set of parameters $w$ we are considering.

Let us assume that our loss function is expressed as the squared norm of the difference between the function $F(w)$ and some target $y$ (with the usual factor of one half):

$$L(w) = \frac{1}{2}\, \|F(w) - y\|^2.$$

Let us consider the gradient of the loss function (the derivative of the loss with respect to the weights $w$). By the chain rule,

$$\nabla L(w) = DF(w)^T \left(F(w) - y\right),$$

so its squared norm is

$$\|\nabla L(w)\|^2 = (F(w) - y)^T\, DF(w)\, DF(w)^T\, (F(w) - y).$$

We know that the tangent kernel is

$$K(w) = DF(w)\, DF(w)^T.$$

So, we can rewrite the above equation as:

$$\|\nabla L(w)\|^2 = (F(w) - y)^T\, K(w)\, (F(w) - y).$$

Therefore the squared norm of the gradient of the loss is a quadratic form of the tangent kernel applied to the residual $F(w) - y$. A quadratic form is bounded below by the smallest eigenvalue of its matrix, so

$$\|\nabla L(w)\|^2 \geq \lambda_{\min}(K(w))\, \|F(w) - y\|^2 = 2\, \lambda_{\min}(K(w))\, L(w).$$

If we use our initial assumption that $\lambda_{\min}(K(w)) \geq \mu$, we can rewrite the above as:

$$\frac{1}{2}\, \|\nabla L(w)\|^2 \geq \mu\, L(w),$$

which is exactly the $\mu$-strong PL inequality.

Looking at the kernel, the key quantity is its smallest eigenvalue $\lambda_{\min}(K(w))$. If $\lambda_{\min}(K(w)) \geq \mu > 0$, then we have the $\mu$-strong Polyak-Lojasiewicz inequality, which translates to fast convergence of the optimization algorithm.

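To sanity-check the derivation numerically, here is a small self-contained sketch. The linear model $F(w) = Aw$, the sizes, and all names are illustrative assumptions; with this choice the Jacobian is exactly $A$, so the kernel, the gradient formula, and the PL inequality can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumption: a linear model F(w) = A w, so DF(w) = A exactly.
n, m = 5, 8                                    # n outputs, m parameters (m > n, so K is full rank generically)
A = rng.normal(size=(n, m))
y = rng.normal(size=n)                         # target

def F(w):
    return A @ w

def loss(w):
    # L(w) = 1/2 ||F(w) - y||^2
    return 0.5 * np.sum((F(w) - y) ** 2)

def grad_loss(w):
    # grad L(w) = DF(w)^T (F(w) - y); for the linear model, DF(w) = A
    return A.T @ (F(w) - y)

K = A @ A.T                                    # tangent kernel K(w) = DF(w) DF(w)^T (constant here)
mu = np.linalg.eigvalsh(K).min()               # smallest eigenvalue of the kernel

# Check the PL inequality  1/2 ||grad L(w)||^2 >= mu * L(w)  at random parameter points.
for _ in range(5):
    w = rng.normal(size=m)
    lhs = 0.5 * np.sum(grad_loss(w) ** 2)
    rhs = mu * loss(w)
    print(f"1/2 ||grad L||^2 = {lhs:10.4f}  >=  mu * L = {rhs:10.4f}  ->  {lhs >= rhs}")
```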
This is an important result: it shows that the minimum eigenvalue of the tangent kernel is a key factor in the convergence of the optimization algorithm. If the minimum eigenvalue stays above a positive threshold $\mu$ along the optimization path, then gradient descent on the squared loss converges exponentially fast (the loss decays geometrically). This is useful in practice, as it allows us to analyze the convergence properties of optimization algorithms and to design algorithms that converge faster.

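To see what "exponentially fast" means concretely, here is a hedged sketch of plain gradient descent on the same kind of toy linear setup as above. The step size $1/\lambda_{\max}(K)$ and the predicted per-step contraction factor $1 - \mu/\lambda_{\max}(K)$ come from the standard smooth-plus-PL analysis; the model, sizes, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative linear setup: F(w) = A w with m > n, so lambda_min(K) > 0 generically.
n, m = 5, 8
A = rng.normal(size=(n, m))
y = rng.normal(size=n)

K = A @ A.T                                    # tangent kernel (constant for a linear model)
evals = np.linalg.eigvalsh(K)
mu, lip = evals.min(), evals.max()             # PL constant and smoothness constant of the loss

def loss(w):
    return 0.5 * np.sum((A @ w - y) ** 2)

w = rng.normal(size=m)
eta = 1.0 / lip                                # standard step size 1/L for an L-smooth loss

# Under the mu-PL condition, gradient descent satisfies L(w_{t+1}) <= (1 - mu/L) L(w_t),
# i.e. the loss decays geometrically (exponentially fast).
print(f"predicted contraction factor per step: {1 - mu / lip:.4f}")
for t in range(6):
    print(f"step {t}: loss = {loss(w):.6e}")
    w = w - eta * A.T @ (A @ w - y)            # gradient step: w <- w - eta * grad L(w)
```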
Now a natural question is how we measure the speed of convergence. Read more about this in condition number.