[Unfinished] Gradient-based optimization algorithms for machine learning training and the EDA global placement process

I am reading the paper: Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1139-1147, 2013.

It redirected me to Prof. Yann LeCun's paper [1], which contains several figures that illustrate the intuition behind the optimization process.

Gradient-based method

W(t+1) = W(t) - \eta \frac{dE(W)}{dW}
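
To make the update rule concrete, here is a minimal NumPy sketch of plain gradient descent on a toy quadratic error E(W) = 0.5 * ||A W - b||^2. The matrix A, target b, learning rate, and iteration count below are illustrative choices of mine, not values taken from the papers.

import numpy as np

# Toy quadratic error E(W) = 0.5 * ||A W - b||^2 (illustrative problem, not from the papers)
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad_E(W):
    # dE/dW = A^T (A W - b)
    return A.T @ (A @ W - b)

eta = 0.1          # learning rate \eta (arbitrary example value)
W = np.zeros(2)    # initial weights W(0)

for t in range(100):
    W = W - eta * grad_E(W)    # W(t+1) = W(t) - eta * dE(W)/dW

print(W)   # approaches the minimizer, i.e. the solution of A W = b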

[Figures: yann_cg_illustration1 and yann_cg_illustration, optimization illustrations from LeCun et al. [1]]

[2], [3], and [4] are also good references for understanding the NAG (Nesterov accelerated gradient) process.
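
The two updates that Sutskever et al. contrast are classical momentum, which evaluates the gradient at the current point W, and NAG, which evaluates it at the look-ahead point W + mu * v. Below is a small sketch of both, reusing the toy grad_E defined above; the momentum coefficient mu and learning rate eta are again arbitrary example values, not the ones tuned in the paper.

# Classical momentum:
#   v(t+1) = mu * v(t) - eta * grad_E(W(t))
#   W(t+1) = W(t) + v(t+1)
# Nesterov accelerated gradient (NAG):
#   v(t+1) = mu * v(t) - eta * grad_E(W(t) + mu * v(t))   # look-ahead gradient
#   W(t+1) = W(t) + v(t+1)
mu, eta = 0.9, 0.05   # illustrative values only

W_cm, v_cm = np.zeros(2), np.zeros(2)     # classical momentum state
W_nag, v_nag = np.zeros(2), np.zeros(2)   # NAG state

for t in range(100):
    v_cm = mu * v_cm - eta * grad_E(W_cm)
    W_cm = W_cm + v_cm

    v_nag = mu * v_nag - eta * grad_E(W_nag + mu * v_nag)
    W_nag = W_nag + v_nag

print(W_cm, W_nag)   # both should approach the same minimizer as plain gradient descent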

 

References:

[1] Y. LeCun, L. Bottou, G. Orr, and K. Müller, "Efficient BackProp," in G. Orr and K. Müller (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.

[2] Sébastien Bubeck, "Nesterov's Accelerated Gradient Descent for Smooth and Strongly Convex Optimization," https://blogs.princeton.edu/imabandit/2014/03/06/nesterovs-accelerated-gradient-descent-for-smooth-and-strongly-convex-optimization/

[3] Zeyuan Allen-Zhu and Lorenzo Orecchia, "A Novel, Simple Interpretation of Nesterov's Accelerated Method as a Combination of Gradient and Mirror Descent," arXiv preprint arXiv:1407.1537, 2014. http://arxiv.org/abs/1407.1537