
Why ReLU is not differentiable at x=0 (zero)?

ReLU is one of the most widely used activation functions. For any \(x > 0\), the output of ReLU is \(x\), and \(0\) otherwise. So,
\[
ReLU(x) = \left\{\begin{matrix}
x & \textrm{if}\ x > 0, \\ 
0 & \textrm{otherwise}
\end{matrix}\right.
\]

We can also write it as \(ReLU(x) = max(0, x)\). For the rest of the post, let’s say \(f(x) = ReLU(x)\). So, for \(x = 0\) the value of ReLU is \(f(0) = 0\). From the graph of ReLU we can clearly see that the function is continuous at \(x = 0\): approaching from either side, the value tends to \(f(0) = 0\).
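As a quick illustration, here is a minimal NumPy sketch of \(max(0, x)\); the function name relu and the sample inputs are just for demonstration:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# [0.  0.  0.  0.5 2. ]
```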

So, why is it not differentiable at \(x = 0\)? To be differentiable at a point, a function must meet the following criteria:

1. The function is continuous at that point.
2. The left-hand limit of the difference quotient exists at that point.
3. The right-hand limit of the difference quotient exists at that point.
4. The left-hand and right-hand limits are equal.

We have already seen that \(f(x)\) is continuous at \(x = 0\). Let’s investigate the remaining criteria. But before that, let’s have a quick recap of how to obtain the derivative from the fundamental concepts.

Derivative of a function

Say we have a function \(f(x)\). We can get its derivative, \(f'(x)\), from the basic definition of the derivative with the following formula: \[ f'(x) = \lim_{\Delta x\rightarrow 0}\frac{f(x+\Delta x) - f(x)}{\Delta x} \]

Here \(\Delta x\) is an infinitesimally small change in the input \(x\). The derivative is sometimes referred to as “rise over run”, i.e. how much the output changes over a tiny change in the input.
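To make this concrete, here is a small sketch that approximates \(f'(x)\) with a finite \(\Delta x\); the helper name numerical_derivative and the test function are illustrative:

```python
def numerical_derivative(f, x, dx=1e-6):
    # Forward difference quotient (f(x + dx) - f(x)) / dx,
    # an approximation of f'(x) for a small but finite dx.
    return (f(x + dx) - f(x)) / dx

# Example with f(x) = x**2, whose exact derivative at x = 3 is 6.
print(numerical_derivative(lambda x: x**2, 3.0))  # ~6.000001
```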

Derivative of ReLU

For \(x \neq 0\) we can calculate the derivative of ReLU using the fundamental formula. So, the derivative will be \[f'(x) = \lim_{\Delta x\rightarrow 0}\frac{max(0, x+\Delta x) - max(0,x)}{\Delta x}\] For \(x>0\) (with \(\Delta x\) small enough that \(x + \Delta x > 0\)) we obtain: \[f'(x) = \frac{x+\Delta x - x}{\Delta x} = \frac{\Delta x}{\Delta x} = 1\] For \(x<0\) it would be: \[f'(x) = \frac{0 - 0}{\Delta x} = 0\] So, what about \(x = 0\)? Let’s figure it out.
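We can check both cases numerically with the same forward difference quotient (a sketch; relu and numerical_derivative are the illustrative helpers from above, repeated here so the snippet is self-contained):

```python
def relu(x):
    return max(0.0, x)

def numerical_derivative(f, x, dx=1e-6):
    # Forward difference quotient as an approximation of f'(x).
    return (f(x + dx) - f(x)) / dx

print(numerical_derivative(relu,  2.0))  # ~1.0 for x > 0
print(numerical_derivative(relu, -2.0))  #  0.0 for x < 0
```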

Why the derivative does not exist

We already know that a function must satisfy a particular set of conditions to be differentiable. We also figured out that ReLU satisfies the first condition (continuity). Let’s investigate the other conditions at \(x = 0\).

The left-hand limit

To get the left-hand limit, let’s approach \(0\) from the left-hand side, i.e. take \(\Delta x < 0\), so that \(max(0, 0+\Delta x) = 0\). Then \[\lim_{\Delta x\rightarrow 0^-}\frac{max(0, 0+\Delta x) - max(0,0)}{\Delta x} = \lim_{\Delta x\rightarrow 0^-}\frac{0 - 0}{\Delta x} = 0\]

The right-hand limit

And for the right-hand limit we take \(\Delta x > 0\), so that \(max(0, 0+\Delta x) = \Delta x\): \[\lim_{\Delta x\rightarrow 0^+}\frac{max(0, 0+\Delta x) - max(0,0)}{\Delta x} = \lim_{\Delta x\rightarrow 0^+}\frac{\Delta x - 0}{\Delta x} = 1\]

So ReLU clearly satisfies conditions 2 and 3 as well: both one-sided limits exist. But the left-hand limit and the right-hand limit are not equal (\(0 \neq 1\)), so it fails to satisfy the last condition.
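A quick numerical check makes the disagreement visible (a sketch; the backward and forward difference quotients stand in for the left- and right-hand limits):

```python
def relu(x):
    return max(0.0, x)

for dx in (1e-2, 1e-4, 1e-6):
    # Backward difference probes the left-hand limit,
    # forward difference probes the right-hand limit.
    left = (relu(0.0) - relu(0.0 - dx)) / dx
    right = (relu(0.0 + dx) - relu(0.0)) / dx
    print(f"dx={dx:g}  left={left}  right={right}")
# left stays at 0.0 and right stays at 1.0 for every dx,
# so the two one-sided limits disagree at x = 0.
```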

And thus the derivative of ReLU Does Not Exist (DNE) at \(x=0\). In machine learning practice it is rare to land exactly on \(x=0\). If we do encounter \(x=0\), the derivative there is conventionally set to either \(0\), \(1\), or \(0.5\).
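As an illustration of that convention, a hand-rolled gradient might look like the sketch below; relu_grad and its value_at_zero parameter are hypothetical names, not any particular framework’s API:

```python
def relu_grad(x, value_at_zero=0.0):
    # Derivative of ReLU: 1 for x > 0, 0 for x < 0.
    # At the non-differentiable point x = 0 we simply pick a convention
    # (commonly 0, 1, or 0.5).
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return value_at_zero

print(relu_grad(3.0), relu_grad(-3.0), relu_grad(0.0))  # 1.0 0.0 0.0
print(relu_grad(0.0, value_at_zero=0.5))                # 0.5
```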
