# [Unfinished] Gradient-based optimization algorithms for training in machine learning and EDA global placement process

I am reading the paper: Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. “On the importance of initialization and momentum in deep learning.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1139-1147. 2013.

It redirected me to Prof. Yann LeCun’s paper [1], which has several pictures that illustrate the intuition behind the optimization process. The basic gradient descent update is:

$W(t+1) = W(t) - \eta \frac{dE(W)}{dW}$
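As a minimal sketch of the update above, the following Python applies $W(t+1) = W(t) - \eta \frac{dE(W)}{dW}$ repeatedly, reading $W$ as the weights, $\eta$ as the learning rate, and $E$ as the error function. The quadratic error surface, the function names, and the values of `eta` and `steps` are assumptions chosen for illustration, not taken from either paper.

```python
def gradient_descent(grad, w0, eta, steps):
    """Iterate W(t+1) = W(t) - eta * dE/dW, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        # Move each weight against its partial derivative.
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


def quadratic_grad(w):
    """Gradient of the hypothetical error E(W) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]


# The steep direction (w[1]) shrinks quickly; the shallow one (w[0]) shrinks slowly.
w_final = gradient_descent(quadratic_grad, [1.0, 1.0], eta=0.05, steps=100)
print(w_final)
```

On this surface the two coordinates have curvatures 1 and 10, so a single learning rate that is safe for the steep direction makes slow progress along the shallow one. That mismatch is the kind of behavior the pictures in [1] illustrate.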

(Figures from [1].)

[2], [3], and [4] are also good references for understanding the Nesterov accelerated gradient (NAG) method.
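For orientation, here is a minimal sketch of the NAG update in the form used by Sutskever et al.: the velocity is updated as $v(t+1) = \mu v(t) - \eta \nabla E(W(t) + \mu v(t))$ and the weights as $W(t+1) = W(t) + v(t+1)$, so the gradient is evaluated at a look-ahead point rather than at $W(t)$. The test function, the function names, and the values of `eta`, `mu`, and `steps` are assumptions chosen for illustration.

```python
def nesterov_momentum(grad, w0, eta, mu, steps):
    """NAG: take the gradient at the look-ahead point W + mu * v."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        lookahead = [wi + mu * vi for wi, vi in zip(w, v)]
        g = grad(lookahead)
        # v(t+1) = mu * v(t) - eta * grad E(W(t) + mu * v(t))
        v = [mu * vi - eta * gi for vi, gi in zip(v, g)]
        # W(t+1) = W(t) + v(t+1)
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


def quadratic_grad(w):
    """Gradient of the hypothetical error E(W) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]


w_nag = nesterov_momentum(quadratic_grad, [1.0, 1.0], eta=0.05, mu=0.9, steps=200)
print(w_nag)
```

Replacing `grad(lookahead)` with `grad(w)` turns this into classical momentum, which is the comparison the Sutskever et al. paper draws.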

References:

[1] Y. LeCun, L. Bottou, G. Orr, and K. Müller. “Efficient BackProp.” In G. Orr and K. Müller (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.

[2] Sébastien Bubeck. “Nesterov’s Accelerated Gradient Descent for Smooth and Strongly Convex Optimization.” https://blogs.princeton.edu/imabandit/2014/03/06/nesterovs-accelerated-gradient-descent-for-smooth-and-strongly-convex-optimization/

[3] Zeyuan Allen-Zhu and Lorenzo Orecchia. “A Novel, Simple Interpretation of Nesterov’s Accelerated Method as a Combination of Gradient and Mirror Descent.” arXiv preprint arXiv:1407.1537 (2014). http://arxiv.org/abs/1407.1537