Machine learning uses kernels, at least when it can. A kernel is an operator on a Hilbert space; or we might say that it defines the Hilbert space. It depends on our point of view. Typical…
Question on @Quora: Are Differential Equations relevant to Machine Learning?
What machine learning methods make use of differential equations? by Lei Zhang
Answer by Lei Zhang:
For a specific example, to back-propagate errors in a feed-forward perceptron, you would generally differentiate one of three activation functions: step, tanh, or sigmoid. (To clarify for those who don't know: a perceptron is a neural network, generally with a feed-forward, back-propagating iteration, which means that the input and the information used to derive the final result only go forward, while the error is computed after you arrive at your result and then flows backwards to each weight to determine which weights should change, and by how much, to reduce future errors.)
The sigmoid function being

σ(x) = 1 / (1 + e^(−βx))

and its derivative being

σ'(x) = β · σ(x) · (1 − σ(x)),

for general β (the standard logistic function takes β = 1). You can find the functions and their derivatives on Wikipedia.
So basically, if you want to use your own activation functions, you will need to work out your own derivatives. These three work well for single-hidden-layer (or no-hidden-layer) perceptrons, but for what people are calling "deep learning", i.e., more than one hidden layer, the error propagation can get complex.
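As a rough sketch of what "working out your own derivatives" looks like in practice, here is the sigmoid (with the slope parameter β) and tanh alongside their derivatives, with a finite-difference check. This is illustrative code of my own, not from the answer; the function names and the choice of NumPy are assumptions.

```python
import numpy as np

def sigmoid(x, beta=1.0):
    """Logistic sigmoid with slope parameter beta."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def sigmoid_prime(x, beta=1.0):
    # The derivative reuses the forward value: beta * s * (1 - s).
    s = sigmoid(x, beta)
    return beta * s * (1.0 - s)

def tanh_prime(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, again reusing the forward value.
    return 1.0 - np.tanh(x) ** 2

# Sanity-check the closed-form derivative against a central finite difference.
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(sigmoid_prime(x) - numeric) < 1e-8)  # True
```

Writing the derivative in terms of the forward value is why the sigmoid is convenient for backpropagation: the activations computed in the forward pass can be reused in the backward pass.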
You need to know why you are using these activation functions, and sometimes you would choose one function over another purely for its derivative properties. Error correction is half of the AI (some people will argue it's all of it): if it does not correctly find which nodes/weights are responsible for the error, your AI will never learn (which really isn't AI). So choosing a good function and derivative is very important, and this is where math is more important than programming. Seriously, you can build an AI in less than 15,000 lines of C++.
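The forward/backward loop described above can be sketched end to end in a few dozen lines. Below is a minimal one-hidden-layer perceptron trained by backpropagation on XOR; this is my own illustrative sketch, not the author's code, and the layer sizes, learning rate, and mean-squared-error loss are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR truth table

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

def forward(X):
    # Feed-forward: information only goes forward.
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, out0 = forward(X)
initial_loss = np.mean((out0 - y) ** 2)

for _ in range(5000):
    h, out = forward(X)
    # Backward pass: the error is computed at the output, then flows back
    # through the derivative sigma'(z) = sigma(z) * (1 - sigma(z)) at each layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

_, out = forward(X)
final_loss = np.mean((out - y) ** 2)
print(initial_loss, final_loss)  # the loss should drop substantially
```

Note how the backward pass is nothing but the chain rule applied layer by layer, which is why the choice of activation derivative matters so much for whether the error signal reaches the right weights.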
After presenting research papers at DAC twice (DAC 2014 and DAC 2015), I will not attend DAC 2016 this year. The main reason is that I did not submit any paper there, and I did not submit to ICCAD 2016 either. But I did volunteer to review papers for both conferences.
The reason I have stopped submitting papers to conferences for a while is that I had two journal papers (the journal version of MATEX, and SPICE_Diego for Prof. Ernest Kuh) published in the past half year, with another one and my thesis in preparation. Moreover, I have a software engineer job at ANSYS Apache; the technologies and real-world impact attract me more these days. (Maybe I will focus on academic research sometime later.) Besides, I feel privileged to work with teams consisting of legends in the area of design automation algorithms, e.g., the forerunners and researchers of AWE, and the creators of MIT FastCap, CMU PRIMA, UT RICE, and Synopsys PrimeTime.
I have been kind of laid-back in the VLSI CAD and EDA area compared to previous years, because I have been going down another path of adventure, which excites me quite a bit, more than traditional EDA. You might find some clues between the lines. I will keep a low profile as usual (am I? 🙂). I hope the results turn out well in the future, so that I can feel proud of the updates, just like the products I help build and follow: the industrial golden chip sign-off software, Redhawk; the first machine learning and big data platform in EDA, Seascape; and the in-design power simulation tool, Seahawk.
My friends, have a good time at DAC 2016 and in Austin.
My manager at ANSYS Apache, Dr. Steven P. McCormick, was an MIT grad who worked with Professor Jonathan Allen. How awesome is it that their lab built a speech machine that was used by the physicist Stephen Hawking! They also developed techniques that inspired the well-known AWE in the EDA area; check the references and acknowledgements in the papers if you like. I do not want to overstate this, as they are quite laid-back. So they went smoothly from computational linguistics and speech processing to VLSI and EDA.
Answer by Jelani Nelson:
I would check out the courses at [link] and [link]. Both only show up to Fall 2014 though. For Spring 2015, here are some courses I'm aware of at Harvard which are related to the Theory of Computation (TOC) that a CS 224 alum might enjoy:
Salil Vadhan: CS 225 (Pseudorandomness)
Michael Mitzenmacher: CS 223 (Probabilistic analysis and algorithms)
Les Valiant: CS 229r (Special topics course: biology and complexity)
Yaron Singer: CS 284r (Topics on computation in networks and crowds)
Vahid Tarokh: AM 106/206 (Applied algebra)
David Parkes: CS 186 (Econ & computation)
Answer by Jelani Nelson:
Here's an answer that is perhaps too long-winded… I'm sure it differs from person to person, but my experience was as follows. First off, I entered MIT freshman year not knowing much about computer science, and also not knowing what a mathematical proof was. I knew some C/C++ from teaching myself in my senior year of high school, and some HTML from teaching myself in 8th grade, but that was it. I also had no idea what research was.

My first exposure to proofs was freshman year, taking the theory track for single/multi-variable calculus and differential equations (18.014, 18.024, and 18.034). My first exposure to algorithms was in my sophomore Spring algorithms course (6.046). I liked the class, but at that point it was just one class of many, and I didn't think much more about algorithms for at least another year. My next course in theoretical computer science was a graduate cryptography course (6.875) in my junior Spring, which I really enjoyed. At that point, I thought I wanted to do cryptography.
At the same time, I signed up for a TopCoder account near the end of junior year (in May). I also signed up for an account on the USACO training gateway, and on the websites of several other online judges. A couple of friends of mine at MIT, namely Hubert Hwang (TopCoder: antimatter) and David Pritchard (TopCoder: daveagp), were really into TopCoder at that time, and I got into it due to their influence. It is quite easy for me to get addicted to computer/video games, and I viewed solving problems on these websites as a game. In Summer 2004 I probably spent over 8 hours a day solving algorithmic exercises on various websites. I would say this phase of my life was quite pivotal. After about 2 years on TopCoder, algorithmic thinking became second nature.

I also took a few graduate courses in algorithms my senior year: Advanced Algorithms (6.854), Randomized Algorithms (6.856), and Advanced Data Structures (6.897 at the time, now 6.851). They all had final projects, but I was quite busy taking 6 classes per semester, so I didn't have much time to launch any serious research efforts. Luckily, Erik Demaine held some "open problem solving sessions" as part of 6.897, which I attended. These were quite fun, and as part of one, a team of seven of us solved a data structure problem on dynamic ham-sandwich cuts, which we published that summer in CCCG. It was not a major problem, but I believe one of the goals of these sessions was to introduce youngsters to research in theoretical computer science. It worked on me; I think it was both a good confidence booster and simply a good way to get my feet wet. (It was also my first co-authored paper with Daniel Kane, with whom I ended up working on quite a few projects during graduate school.)

My senior Fall I also applied to (two) graduate schools (saying I wanted to do cryptography). I had a 5.0 GPA at that point but zero theory research experience, and I got rejected from both.
(I also can't really even remember why I applied, considering that at that point I didn't really understand what a PhD was…) After senior year I did a one-year Masters supervised by Bradley Kuszmaul and Charles Leiserson, focused on external-memory and cache-oblivious data structures. Since I was only taking two classes per semester, I had much more time to focus on research. At some point Bradley and I came up with an update/query tradeoff for external-memory predecessor, but then Jeremy Fineman pointed out it had been done in a SODA paper several years prior. We worked on other things though, which led to a SPAA paper a year later on two cache-oblivious solutions that aimed to get much better update performance while still having good query performance. My main contribution was deamortizing the "cache-oblivious lookahead array" (COLA) in that work. So I think that by this point, I had the ability to do some theory work.
I didn't really find my current area though (streaming/sketching/dimensionality reduction/etc.) until my second year of PhD. My first year I tried to do some more cache-oblivious stuff but didn't really get anywhere. My second year, in the Fall, I took a course by Piotr Indyk on streaming and sketching. I did a final project with Krzysztof Onak and Nick Harvey on entropy estimation in data streams. We had some non-trivial progress by the end of the course, and we kept working on it intensely in the Spring. We managed to get a pretty good algorithm by the FOCS deadline, so we submitted and it got accepted. That was my first paper in the area that became the focus of my thesis.