Deep Learning from Scratch to GPU, Part 2: Bias and Activation Function

语言: CN / TW / HK

In the current state, the network combines all layers into a single linear transformation . We can introduce basic decision-making capability by adding a cutoff to the output of each neuron. When the weighted sums of its inputs are below that threshold, the output is zero and when they are above, the output is one.

\begin{equation} output = \left\{ \begin{array}{ll} 0 & W\mathbf{x} \leq threshold \\ 1 & W\mathbf{x} > threshold \\ \end{array} \right. \end{equation}

Since we keep the current outputs in a (potentially) huge vector, it would be inconvenient to write a scalar-based logic for that. I prefer to use a vectorized function, or create one if there is not exactly what we need.

Neanderthal does not have the exact cutoff function, but we can create one by subtracting threshold from the maximum of each threshold and the signal value and then mapping the signum function to the result. There are simpler ways to compute this, but I wanted to use the existing functions, and do the computation in-place. It is of purely educational value, anyway. We will see soon that there are better things to use for transforming the output than the vanilla step function.

(defn step! [threshold x]
  (fmap! signum (axpy! -1.0 threshold (fmax! threshold x x))))
(let [threshold (dv [1 2 3])
      x (dv [0 2 7])]
  (step! threshold x))
nil#RealBlockVector[double, n:3, offset: 0, stride:1]
[   0.00    0.00    1.00 ]

I'm going to show you a few steps in the evolution of the code, so I will reuse weights and x. To simplify the example, we will use global def and not care about properly releasing the memory. It will not matter in a REPL session, but not forget to do it in the real code. Continuing the example fromPart 1:

(def x (dv 0.3 0.9))
(def w1 (dge 4 2 [0.3 0.6
                  0.1 2.0
                  0.9 3.7
                  0.0 1.0]
             {:layout :row}))
(def threshold (dv 0.7 0.2 1.1 2))

Since we do not care about extra instances at the moment, we'll use the pure mv function instead of mv! for convenience. mv creates the resulting vector y , instead of mutating the one that has to be provided as an argument.

(step! threshold (mv w1 x))
nil#RealBlockVector[double, n:4, offset: 0, stride:1]
[   0.00    1.00    1.00    0.00 ]

The bias is simply the threshold moved to the left side of the equation:

\begin{equation} output = \left\{ \begin{array}{ll} 0 & W\mathbf{x} - bias \leq 0 \\ 1 & W\mathbf{x} - bias > 0 \\ \end{array} \right. \end{equation}

(def bias (dv 0.7 0.2 1.1 2))
(def zero (dv 4))
(step! zero (axpy! -1.0 bias (mv w1 x)))
nil#RealBlockVector[double, n:4, offset: 0, stride:1]
[   0.00    1.00    1.00    0.00 ]

Remember that bias is the same as threshold. There is no need for the extra zero vector.

(step! bias (mv w1 x))
nil#RealBlockVector[double, n:4, offset: 0, stride:1]
[   0.00    1.00    1.00    0.00 ]