Technical Neural Nets 2
From Perceptron to MLP
Why a Nonlinear Transfer Function?
- smooth, continuous, differentiable → the backpropagation algorithm can be applied
- monotonically increasing
- bounded output (two such functions are sketched after this list)
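The logistic sigmoid and the hyperbolic tangent are the classic transfer functions satisfying all three properties. A minimal sketch (Python/NumPy is an assumption; the notes do not prescribe a language):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: smooth, monotonically increasing, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative f'(z) = f(z) * (1 - f(z)), used later by backpropagation."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_prime(z):
    """Derivative of tanh: f'(z) = 1 - tanh(z)**2; tanh is bounded in (-1, 1)."""
    return 1.0 - np.tanh(z) ** 2

z = np.linspace(-4.0, 4.0, 9)
print(sigmoid(z))        # stays strictly between 0 and 1 (bounded)
print(sigmoid_prime(z))  # strictly positive everywhere (monotonic increase)
```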
MLP Structure
- one input layer, one or more hidden layers, one output layer
- no weights in input layer
- each neuron has a set of weights
- forward unidirectional connections
- fully connected (a forward pass through such a network is sketched after this list)
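To make the structure concrete, a forward pass through a one-hidden-layer MLP can be sketched as follows (layer sizes, random weights, and the sigmoid transfer function are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

N, H, M = 3, 4, 2                 # input, hidden, and output neuron counts
W1 = rng.normal(size=(H, N + 1))  # input -> hidden weights, +1 column for the bias
W2 = rng.normal(size=(M, H + 1))  # hidden -> output weights, +1 column for the bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Fully connected, strictly feed-forward pass: input -> hidden -> output."""
    x = np.append(x, 1.0)          # bias input (the input layer itself has no weights)
    out_h = sigmoid(W1 @ x)        # hidden layer: weighted sum, then transfer function
    out_h = np.append(out_h, 1.0)  # bias input for the output layer
    return sigmoid(W2 @ out_h)     # output layer: network output y

print(forward(np.array([0.2, -0.5, 1.0])))
```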
MLP Capability
Universal function approximators
It has been shown that an MLP with one hidden layer of a finite number of neurons with nonlinear transfer function is capable of approximating any continuous mapping from an N-dimensional input space to an M-dimensional output space with arbitrary accuracy.
Learning
Assessment of the Model
- Loss function (statistics, pattern recognition, …)
- Error function (function approximation, pattern recognition, …)
Supervised Learning Scheme
- Choose a model and obtain some teacher data.
- Initialize the model by setting its parameters.
- Pick some data X and pass it to the model.
- Apply the model to the data to produce the output Y(X).
- Compare the output Y to the teacher output Ŷ.
- Apply the learning algorithm to change the parameters.
- Decide whether to stop or continue.
- Post-process if necessary (a minimal loop following these steps is sketched after this list).
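To make the scheme concrete, here is a minimal loop that follows these steps for a deliberately tiny model, a single weight w fitted to teacher data y = 2x by gradient descent (the model, data, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Choose a model (Y = w * X) and get some teacher data.
X = rng.uniform(-1.0, 1.0, size=100)
Y_teacher = 2.0 * X

# Initialize the model by setting the parameter.
w = 0.0
eta = 0.5  # learning rate

for epoch in range(200):
    # Pick the data and apply the model to produce the output Y(X).
    Y = w * X
    # Compare the output Y to the teacher (half-squared error, averaged).
    E = 0.5 * np.mean((Y_teacher - Y) ** 2)
    # Apply the learning algorithm (gradient descent) to change the parameter.
    grad = np.mean((Y - Y_teacher) * X)   # dE/dw
    w -= eta * grad
    # Decide whether to stop or continue.
    if E < 1e-6:
        break

# Post-processing: report the learned parameter.
print(f"learned w = {w:.4f}, final error = {E:.2e}")
```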
Error Function
[
E_p = \frac{1}{2}\sum_{m=1}^{M}(\hat{y}_m - y_m)^2
]
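A quick numerical check of the formula (values chosen only for illustration): with M = 2 outputs, teacher ŷ = (1, 0) and network output y = (0.8, 0.2),
[
E_p = \frac{1}{2}\big((1 - 0.8)^2 + (0 - 0.2)^2\big) = \frac{1}{2}(0.04 + 0.04) = 0.04
]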
Backpropagation
Table of Symbols and Meanings
| Symbol | Meaning | Explanation |
|---|---|---|
| p | Training pattern index | Identifies the current sample in the training set |
| Eₚ | Error for pattern p | (E_p = \frac{1}{2}\sum_m (\hat{y}_m - y_m)^2); half-squared error for one sample |
| η (eta) | Learning rate | Controls the step size for each weight update |
| wₙₕ, wₕₘ | Weights | (w_{nh}): from input neuron n → hidden neuron h; (w_{hm}): from hidden h → output m |
| Δw | Weight change | Amount each weight is updated: (-\eta \frac{\partial E_p}{\partial w}) |
| xₙ | Input neuron output | The _n_th input value (also written as (out_i) for previous layer output) |
| ~outₕ | Hidden neuron output | The output value from hidden neuron h, used as input to the next layer |
| yₘ | Actual output | The network’s prediction for output neuron m |
| ŷₘ | Target output | The desired (teacher) output for neuron m |
| netₕ, netₘ | Net input | Weighted sum before activation: (net_j = \sum_i w_{ij} out_i) |
| f(net) | Activation function | Nonlinear function applied to net (e.g., sigmoid or tanh) |
| f′(net) | Derivative of activation | Needed for gradient computation in BP |
| δₘ | Delta for output neuron m | ((y_m - \hat{y}_m) f′(net_m)); represents output layer error signal |
| δₕ | Delta for hidden neuron h | (f′(net_h)\sum_m w_{hm}δ_m); backpropagated error signal |
| ∇_W E | Gradient of error w.r.t. weights | Vector of all partial derivatives (\frac{\partial E}{\partial w}) |
| ΔW | Vector of all weight updates | (\Delta W = -\eta \, \nabla_W E) |
| K, M, H, N | Counts of neurons | (N)=input, (H)=hidden, (M)=output, (K)=next layer neurons |
| Bias | Constant input (usually = 1) | Allows shifting the activation threshold |
Mathematical Derivation
0. Setup and Goal
For a single training pattern (p):
[
E_p = \frac{1}{2}\sum_{m=1}^{M}(\hat{y}_m - y_m)^2
]
(half the sum of squared errors over the M output neurons for pattern p; the factor ½ cancels the 2 from the exponent when differentiating)
Weight update rule (gradient descent):
[
\Delta w = -\eta \frac{\partial E_p}{\partial w}
]
Each neuron’s net input and output:
[
net_m = \sum_{h=0}^{H} \tilde{out}_h \, w_{hm}, \quad y_m = f(net_m), \quad \tilde{out}_0 = 1 \text{ (bias input)}.
]
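For example (numbers purely illustrative), with H = 2 hidden neurons, outputs \tilde{out} = (1, 0.6, 0.3) including the bias, and weights w_{0m} = 0.1, w_{1m} = 0.4, w_{2m} = -0.2:
[
net_m = 0.1 \cdot 1 + 0.4 \cdot 0.6 + (-0.2) \cdot 0.3 = 0.28, \quad y_m = f(0.28)
]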
Typical activation derivatives:
- Sigmoid: (f'(z) = f(z)(1 - f(z)))
- Tanh: (f'(z) = 1 - \tanh^2(z))
Goal: compute (\frac{\partial E_p}{\partial w}) for every weight in the network.
1. Output Layer Weights (w_{hm})
(Connection from hidden neuron (h) → output neuron (m))
Apply the chain rule:
[
\frac{\partial E_p}{\partial w_{hm}} = \frac{\partial E_p}{\partial y_m} \cdot \frac{\partial y_m}{\partial net_m} \cdot \frac{\partial net_m}{\partial w_{hm}}
]
Step by step:
[
\frac{\partial E_p}{\partial y_m} = -(\hat{y}_m - y_m) = (y_m - \hat{y}_m), \quad
\frac{\partial y_m}{\partial net_m} = f'(net_m), \quad
\frac{\partial net_m}{\partial w_{hm}} = \tilde{out}_h
]
Combine:
[
\frac{\partial E_p}{\partial w_{hm}} = (y_m - \hat{y}_m) f'(net_m) \tilde{out}_h
]
Define the error signal (delta) for the output neuron:
[
\boxed{\delta_m = (y_m - \hat{y}_m) f'(net_m)}
]
Then:
[
\frac{\partial E_p}{\partial w_{hm}} = \delta_m \tilde{out}_h, \quad \boxed{\Delta w_{hm} = -\eta \, \delta_m \, \tilde{out}_h}
]
This is the delta rule for output weights.
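A direct transcription of the two boxed formulas into code (a self-contained sketch with made-up numbers; sigmoid output units are assumed, so f'(net_m) = y_m(1 - y_m)):

```python
import numpy as np

eta = 0.5

# Illustrative values for one training pattern.
out_h = np.array([1.0, 0.6, 0.3])   # hidden outputs, with the bias \tilde{out}_0 = 1
y     = np.array([0.8, 0.2])        # network outputs y_m = f(net_m)
y_hat = np.array([1.0, 0.0])        # teacher outputs

f_prime = y * (1.0 - y)             # sigmoid: f'(net_m) = y_m (1 - y_m)
delta_m = (y - y_hat) * f_prime     # boxed delta of the output layer

# The outer product gives dE_p/dw_{hm} for every (m, h) pair at once.
dE_dW2 = np.outer(delta_m, out_h)
dW2    = -eta * dE_dW2              # boxed update \Delta w_{hm}
print(dW2)
```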
2. Hidden Layer Weights (w_{nh})
(Connection from input neuron (n) → hidden neuron (h))
The hidden layer error depends on all output neurons, so again apply the chain rule:
[
\frac{\partial E_p}{\partial w_{nh}} = \sum_{m=1}^{M} \frac{\partial E_p}{\partial y_m} \frac{\partial y_m}{\partial net_m} \frac{\partial net_m}{\partial \tilde{out}_h} \frac{\partial \tilde{out}_h}{\partial net_h} \frac{\partial net_h}{\partial w_{nh}}
]
Substitute each term:
[
\frac{\partial net_m}{\partial \tilde{out}_h} = w_{hm}, \quad \frac{\partial \tilde{out}_h}{\partial net_h} = f'(net_h), \quad \frac{\partial net_h}{\partial w_{nh}} = x_n
]
and note that:
[
\frac{\partial E_p}{\partial y_m} \frac{\partial y_m}{\partial net_m} = \delta_m
]
So:
[
\frac{\partial E_p}{\partial w_{nh}} = \Big(\sum_{m=1}^{M} \delta_m \, w_{hm}\Big) f'(net_h) x_n
]
Define the error signal (delta) for the hidden neuron:
[
\boxed{\delta_h = f'(net_h) \sum_{m=1}^{M} w_{hm} \delta_m}
]
Thus:
[
\frac{\partial E_p}{\partial w_{nh}} = \delta_h x_n, \quad \boxed{\Delta w_{nh} = -\eta \, \delta_h \, x_n}
]
(Bias weights use the same rule, with (x_0 = 1)).
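The corresponding sketch for the hidden-layer weights (again with made-up numbers; W2 holds the weights w_{hm} with one row per output neuron, and sigmoid hidden units are assumed, so f'(net_h) = out_h(1 - out_h)):

```python
import numpy as np

eta = 0.5

x       = np.array([1.0, 0.2, -0.5])   # inputs x_n, with the bias x_0 = 1
out_h   = np.array([0.6, 0.3])         # hidden outputs \tilde{out}_h (bias excluded)
delta_m = np.array([-0.032, 0.032])    # output-layer deltas from the previous sketch
W2      = np.array([[0.4, -0.1],       # w_{hm}: row index = m, column index = h
                    [0.2,  0.7]])

f_prime_h = out_h * (1.0 - out_h)        # sigmoid: f'(net_h) = out_h (1 - out_h)
delta_h   = f_prime_h * (W2.T @ delta_m) # boxed delta of the hidden layer

dE_dW1 = np.outer(delta_h, x)            # dE_p/dw_{nh} = delta_h * x_n
dW1    = -eta * dE_dW1                   # boxed update \Delta w_{nh}
print(dW1)
```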
BP Conclusion
General weight update rule (using the same sign convention as in the derivation above):
[
\Delta w_{ij} = -\eta \, \delta_j \, out_i
]
Each weight change = learning rate × neuron j's delta × neuron i's output.
Output neuron:
[
\delta_m = (y_m - \hat{y}_m) \, f'(net_m)
]
The error comes directly from comparing the network output with the target.
Hidden neuron:
[
\delta_h = \Big(\sum_{k=1}^{K} \delta_k \, w_{hk}\Big) f'(net_h)
]
The error is backpropagated from the following layer (with K neurons).
In short:
BP adjusts each weight by how much that neuron contributed to the total error, propagating δ backward layer by layer.
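One way to convince yourself that the δ recursion really computes ∂E_p/∂w is a finite-difference check on a small network (a self-contained sketch, not part of the original notes; sigmoid units and random weights are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, H, M = 3, 4, 2
W1 = rng.normal(scale=0.5, size=(H, N + 1))   # input -> hidden (last column = bias)
W2 = rng.normal(scale=0.5, size=(M, H + 1))   # hidden -> output (last column = bias)
x     = rng.normal(size=N)
y_hat = rng.uniform(size=M)

def forward(W1, W2):
    xb    = np.append(x, 1.0)
    out_h = sigmoid(W1 @ xb)
    hb    = np.append(out_h, 1.0)
    return xb, out_h, hb, sigmoid(W2 @ hb)

def error(W1, W2):
    *_, y = forward(W1, W2)
    return 0.5 * np.sum((y_hat - y) ** 2)

# Backpropagation gradients for one pattern.
xb, out_h, hb, y = forward(W1, W2)
delta_m = (y - y_hat) * y * (1.0 - y)                      # output-layer deltas
dE_dW2  = np.outer(delta_m, hb)
delta_h = out_h * (1.0 - out_h) * (W2[:, :H].T @ delta_m)  # hidden deltas (bias column excluded)
dE_dW1  = np.outer(delta_h, xb)

# Finite-difference check on one weight of each layer.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W2p = W2.copy(); W2p[0, 0] += eps
print(dE_dW1[0, 0], (error(W1p, W2) - error(W1, W2)) / eps)  # the two numbers should agree closely
print(dE_dW2[0, 0], (error(W1, W2p) - error(W1, W2)) / eps)  # the two numbers should agree closely
```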