What machine learning methods make use of differential equations?

What machine learning methods make use of differential equations? by Lei Zhang

Answer by Lei Zhang:

For a specific example, to back propagate errors in a feed forward perceptron, you would generally differentiate one of the three activation functions:  Step, Tanh or Sigmoid.  (To clarify for those who don't know, a perception is a neural network, generally with a feed-forward, back propagating iteration; which means that the input and information to derive the final results only go forward, while the error is figured out after you arrive at your result, then goes backwards to each weights to determine which should change and by how much to reduce future errors).

Sigmoid function being

 and its derivative being

assuming beta is not one.  You can find the functions and their derivatives in wikipedia.

Artificial Neural Networks/Activation Functions

So basically if you want to use your own activation functions, you will need to find out your own derivatives.  These 3 work well for single hidden layer (or no hidden layer) perceptrons, but for what people are calling "deep learning", or more than one hidden layers, the error propagation can get complex.

You need to know why you are using these activation functions, and sometimes, you would choose one function over another purely for its derivative properties.  Error correction is half of the AI (some people will argue it's all the AI), if it does not correctly find which nodes/weights are responsible for the error, your AI will never learn (which is really not AI), so choosing a good function/functional derivative is very important, and this is where math is more important than programming.  Seriously, you can build an AI in less than 15,000 lines of code (C++).

Have fun

What machine learning methods make use of differential equations?


Not Attend DAC 2016

After two times of research paper presentation at DAC (DAC 2014 and DAC 2015), I will not attend DAC 2016 this year. The main reason is I did not submit any paper there. I did not submit to ICCAD 2016 either. But I do volunteer to review the papers for those two conferences.

The reason I stop submitting papers to conferences for a while is I have two journals (the journal version of MATEX and SPICE_Diego for Prof. Ernest Kuh) published in this half year, with another one and my thesis in preparation. Moreover, I have a software engineer job at ANSYS Apache. The technologies and real impact attract me more recently. (Maybe I will focus academic research sometime later.) Besides, I feel the privilege to work with the teams consist of legends in the area of design automation algorithms, e.g., the forerunner and researchers of AWE, the creators of MIT FastCap, CMU PRIMA, UT RICE, Synopsys PrimeTime.

I am kind of laid-back compared to previous years in VLSI CAD and EDA area, because I have been going for another path of adventure, which excites me quite a bit, more than traditional EDA. You might find some clues among the lines. I will keep low-key as usual (Am I 🙂 ). Wish the results turn out to be good in the future, so that I can feel proud to update just like the products I help to build and follow, such as the industrial golden chip sign-on software – Redhawk, the first machine learning and big data platform in EDA – Seascape and the in-design power simulation tool – Seahawk.

My friends, have a good time at DAC 2016 and Austin.




My manager at ANSYS Apache Dr. Steven P. McCormick was a MIT grad working with Professor Jonathan Allen. How awesome was that the lab built a speech machine, which was used by physicist Stephen Hawking! Then, they also developed the technique and inspired the well-known AWE in EDA area. Check the references and acknowledgement from the papers or not. I do not want to emphasize this, they were quite laid-back. So went from computational linguist, speech processing smoothly to VLSI and EDA.






What are some good follow-up courses to CS 224 (Advanced Algorithms) at Harvard?


Answer by Jelani Nelson:

I would check out the courses at http://toc.seas.harvard.edu/cour… and http://toc.csail.mit.edu/Who%20i…. Both only show up to Fall 2014 though. For Spring 2015, here are some courses I'm aware of at Harvard which are related to the Theory of Computation (TOC) that a CS 224 alum might enjoy:

Salil Vadhan: CS 225 (Pseudorandomness)
Michael Mitzenmacher: CS 223 (Probabilistic analysis and algorithms)
Les Valiant: CS 229r (Special topics course: biology and complexity)
Yaron Singer: CS 284r (Topics on computation in network and crowds)
Vahid Tarokh: AM 106/206 (Applied algebra)
David Parkes: CS 186 (Econ & computation)

What are some good follow-up courses to CS 224 (Advanced Algorithms) at Harvard?

For how many years did you study advanced topics in algorithms before you really became productive as a researcher?


Answer by Jelani Nelson:

Here's an answer that is perhaps too long-winded…
I'm sure it differs from person to person, but my experience was as follows. First off, I entered MIT freshman year not knowing much about computer science, and also not knowing what a mathematical proof was. I knew some C/C++ from teaching myself in my senior year of high school, and some HTML from teaching myself in 8th grade, but that was it. I also had no idea what research was.
My first exposure to proofs was Freshman year, taking the theory track for single/multi-variable calculus and differential equations (18.014, 18.024, and 18.034 — see Jelani Nelson – CLASSES ). My first exposure to algorithms was in my sophomore Spring algorithms course (6.046). I liked the class, but at that point is was just one class of many and I didn't think much more about algorithms for at least another year. My next course in theoretical computer science was then a graduate cryptography course (6.875) in my junior Spring, which I really enjoyed. At that point, I thought I wanted to do cryptography.

At the same time, I signed up for a TopCoder account near the end of junior year (in May). I also signed up for an account on the USACO training gateway, and on the websites of several other online judges. A couple friends of mine at MIT, namely Hubert Hwang (TopCoder: antimatter) and David Pritchard (TopCoder: daveagp), were really into TopCoder at that time and I got into it due to their influence. It is quite easy for me to get addicted to computer/video games, and I viewed solving problems on these websites as a game. Summer 2004 I probably spent over 8 hours a day solving algorithmic exercises on various websites. I would say this phase of my life was quite pivotal. After about 2 years on TopCoder, algorithmic thinking became second nature.

I also took a few graduate courses in algorithms my senior year: Advanced Algorithms (6.854), Randomized Algorithms (6.856), and Advanced Data Structures (6.897 at the time, now 6.851). They all had final projects, but I was quite busy taking 6 classes per semester so I didn't have much time to launch any serious research efforts. Luckily Erik Demaine as part of 6.897 had some "open problem solving sessions" which I attended. These were quite fun, and as part of one, a team of 7 of us solved some data structure problem on dynamic ham sandwich cuts which we published that summer in CCCG. It was not a major problem, but I believe one of the goals of these sessions was to introduce youngsters to research in theoretical computer science. It worked on me; I think it was both a good confidence booster, and simply a good way to get my feet wet. (It was also my first co-authored paper with Daniel Kane, who I ended up working with later on quite a few projects during graduate school.)
My senior Fall I also applied to (two) graduate schools (saying I wanted to do cryptography). I had a 5.0 GPA at that point but zero theory research experience, and I got rejected from both. (I also can't really even remember why I applied, considering that at that point I didn't really understand what a PhD was …). After senior year I did a one-year Masters supervised by Bradley Kuszmaul and Charles Leiserson, focused on external memory and cache oblivious data structures. Since I was only taking two classes per semester, I had much more time to focus on research. At some point Bradley and I came up with an update/query tradeoff for external memory predecessor, but then Jeremy Fineman pointed out it had been done in a SODA several years prior. We worked on other stuff though, which led to a SPAA paper a year later on two cache-oblivious solutions that aimed to get much better update performance while still having good query. My main contribution was in deamortizing the "cache-oblivious lookahead array" (COLA) in that work. So, I think by this point, I had the ability to do some theory work.

I didn't really find my current area though (streaming/sketching/dimensionality reduction/etc.)  until my second year of PhD. My first year I tried to do some more cache-oblivious stuff but didn't really get anywhere. My second year, in the Fall, I took a course by Piotr Indyk on streaming and sketching. I did a final project with Krzysztof Onak and Nick Harvey on entropy estimation in data streams. We had some non-trivial progress by the end of the course, and we kept working on it intensely in the Spring. We managed to get a pretty good algorithm by the FOCS deadline, so we submitted and it got accepted. That was my first paper in the area that became the focus of my thesis.

For how many years did you study advanced topics in algorithms before you really became productive as a researcher?

sort list via selection sort

struct ListNode {
    int val;
    ListNode *next;
    ListNode(int x) : val(x), next(NULL) {}

class Solution {
    ListNode *insertionSortList(ListNode *head){
      if (head == NULL || head->next == NULL) return head;

      ListNode *cur = head->next;
      head->next = NULL; // set up the termination condition
      while (cur){
          ListNode *nxt = cur->next;
          ListNode * brk = head;

          if (cur->val < brk->val) {
              cur->next = brk;
              head = cur;
          else {
               while (brk && brk->next && brk->next->val <= cur->val){
                         brk = brk->next;
               cur->next = brk->next; // propagate the NULL, a smart move!
               brk->next = cur;
          cur = nxt;
      return head;



What innovations in CS theory from the last 10 years have had an impact outside of academia?

Answer by Jelani Nelson:

I''ll add two more:

  • Algorithmic Game Theory studies optimization problems in which there are strategic agents whose rationality you have to take into account. For example, you're Google and want to design an ad auction system that optimizes your profit, taking into account that advertisers will try their best when using your platform to game the system to maximize their utility. A founding paper in this area of "algorithmic mechanism design" is [1].
  • Faster Numerical Linear Algebra via Randomization has been a recent development that's spread fast. Imagine you have a huge matrix, e.g. rows are users and columns are products, and an entry is that user's rating for that product. Most entries are empty, since most users don't rate most products; e.g. the Netflix matrix is only ~1% filled in [2]. Predicting unfilled entries is known as the "matrix completion" problem and is at the heart of some recommendation systems. Other problems people care about for large matrices include regression and principal component analysis (PCA). Earlier research showed how to use non-uniform sampling to get some speedup, but Sarlos showed [3] how to use fast random projections (a concept mentioned in another answer by Ghalib) to obtain speedup, which I think really changed the game. Since then, Clarkson and Woodruff showed [4] how to solve these problems even faster using ideas related to Sarlos (in time dominated by the number of nonzeroes in the matrix, which as mentioned before is often much smaller than the size of the matrix), and IBM has been putting together a linear algebra library based on these ideas [5].

[1] Noam Nisan, Amir Ronen: Algorithmic Mechanism Design. Games and Economic Behavior 35(1-2): 166-196 (2001)
[2] Yunhong Zhou, Dennis M. Wilkinson, Robert Schreiber, and Rong Pan. Large-scale parallel collaborative filtering for the Netflix prize. In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management (AAIM), pages 337–348, 2008.
[3] Tamás Sarlós: Improved Approximation Algorithms for Large Matrices via Random Projections. FOCS 2006: 143-152
[4] Kenneth L. Clarkson, David P. Woodruff: Low rank approximation and regression in input sparsity time. STOC 2013: 81-90
[5] Sketching Linear Algebra Kernel

What innovations in CS theory from the last 10 years have had an impact outside of academia?