What machine learning methods make use of differential equations? by Lei Zhang
Answer by Lei Zhang:
For a specific example: to backpropagate errors in a feed-forward perceptron, you generally differentiate the activation function, typically one of three: Step, Tanh or Sigmoid. (The step function has a zero derivative everywhere except at its jump, so in practice it is the smooth Tanh and Sigmoid that get differentiated.) To clarify for those who don't know: a perceptron is a neural network, generally feed-forward and trained by backpropagation. That means the inputs and the information used to derive the final result only move forward; the error is computed after you arrive at the result, and it then travels backward to each weight to determine which weights should change, and by how much, to reduce future errors.
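To make the forward/backward idea concrete, here is a minimal sketch of a single Tanh neuron trained by gradient descent. The dataset (logical OR with targets of -1 and +1), the learning rate, the epoch count and the names `train_neuron` and `OR_DATA` are all assumptions picked for illustration, not part of any particular library.

```python
import math

def tanh_prime(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t

def train_neuron(samples, epochs=500, lr=0.1):
    """Train one tanh neuron with plain gradient descent on squared error."""
    w = [0.0, 0.0]
    b = 0.0
    history = []  # summed squared error per epoch
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Forward pass: the inputs only flow toward the output.
            z = w[0] * x[0] + w[1] * x[1] + b
            y = math.tanh(z)
            err = y - target
            total += 0.5 * err * err
            # Backward pass: the chain rule sends the error back to each
            # weight, scaled by the derivative of the activation.
            delta = err * tanh_prime(z)
            w[0] -= lr * delta * x[0]
            w[1] -= lr * delta * x[1]
            b -= lr * delta
        history.append(total)
    return w, b, history

# Logical OR, with targets in {-1, +1} to match the range of tanh.
OR_DATA = [((0, 0), -1.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 1.0)]
weights, bias, losses = train_neuron(OR_DATA)
print("first epoch error:", losses[0], "last epoch error:", losses[-1])
```

The `delta` line is where the derivative enters: without `tanh_prime`, the neuron would have no way of telling how much each weight contributed to the error.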
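The Sigmoid function is
$$f(x) = \frac{1}{1 + e^{-\beta x}}$$
and its derivative is
$$f'(x) = \beta \, f(x) \, \bigl(1 - f(x)\bigr)$$
assuming beta is not one (when beta is one, the leading factor simply drops out). You can find these functions and their derivatives on Wikipedia.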
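As a quick sanity check, here is a small sketch of the Sigmoid and its derivative, assuming the usual logistic form with a steepness parameter `beta` (which is what the mention of beta suggests). The helper `numeric_derivative` is a hypothetical name used only to compare the hand-derived formula against a finite difference.

```python
import math

def sigmoid(x, beta=1.0):
    """Logistic sigmoid with steepness beta: 1 / (1 + exp(-beta * x))."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def sigmoid_prime(x, beta=1.0):
    """Analytic derivative: beta * f(x) * (1 - f(x))."""
    s = sigmoid(x, beta)
    return beta * s * (1.0 - s)

def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to cross-check the formula."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

analytic = sigmoid_prime(0.3, beta=2.0)
numeric = numeric_derivative(lambda v: sigmoid(v, beta=2.0), 0.3)
print("analytic:", analytic, "numeric:", numeric)
```

The two numbers should agree to several decimal places; if they don't, the derivative was written down wrong.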
So if you want to use your own activation function, you will need to work out its derivative yourself. These three work well for perceptrons with a single hidden layer (or no hidden layer), but for what people call "deep learning", meaning more than one hidden layer, the error propagation can get complex.
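If you do roll your own activation, it is worth checking the derivative numerically before training with it. The sketch below uses softplus, log(1 + e^x), purely as a stand-in for "your own function"; that choice and the helper name `max_derivative_error` are assumptions made for illustration.

```python
import math

def softplus(x):
    """Example custom activation: log(1 + exp(x))."""
    return math.log1p(math.exp(x))

def softplus_prime(x):
    """Hand-derived derivative of softplus: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def max_derivative_error(f, f_prime, points, h=1e-6):
    """Largest gap between f_prime and a central finite difference of f."""
    worst = 0.0
    for x in points:
        numeric = (f(x + h) - f(x - h)) / (2.0 * h)
        worst = max(worst, abs(numeric - f_prime(x)))
    return worst

worst_gap = max_derivative_error(softplus, softplus_prime, [-3.0, -0.5, 0.0, 0.5, 3.0])
print("largest gap between analytic and numeric derivative:", worst_gap)
```

A tiny gap means the hand-derived formula matches the function; a large one means the backward pass would be pushing weights in the wrong direction.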
You need to know why you are using a given activation function; sometimes you would choose one function over another purely for the properties of its derivative. Error correction is half of the AI (some people will argue it is all of the AI): if it does not correctly find which nodes and weights are responsible for the error, your AI will never learn (which is really not AI). Choosing a good function and derivative is therefore very important, and this is where math matters more than programming. Seriously, you can build an AI in less than 15,000 lines of C++.
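To see what "derivative properties" means in practice, here is a rough sketch that tabulates the derivatives of Step, Tanh and Sigmoid at a few inputs. The sample points, the name `gradient_table`, and the convention of returning 0 for the step function at its jump are assumptions made for this illustration.

```python
import math

def step_prime(x):
    # The step function is flat on both sides of its jump at 0, so its
    # derivative is 0 wherever it is defined. Returning 0 at the jump too
    # is just a convention for this sketch.
    return 0.0

def tanh_prime(x):
    t = math.tanh(x)
    return 1.0 - t * t

def sigmoid_prime(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def gradient_table(points):
    """Map each x to (step', tanh', sigmoid') evaluated at x."""
    return {x: (step_prime(x), tanh_prime(x), sigmoid_prime(x)) for x in points}

table = gradient_table([0.0, 2.0, 5.0])
for x, (d_step, d_tanh, d_sig) in table.items():
    print(f"x={x:4.1f}  step'={d_step:.4f}  tanh'={d_tanh:.4f}  sigmoid'={d_sig:.4f}")
```

The step function passes no gradient back at all, so a gradient-based update cannot tell which weights to blame. Tanh and Sigmoid pass the most gradient near zero and very little once the input is large, which is one reason the choice of function matters for learning.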