What is Backpropagation
Backpropagation (BP) is short for "backward propagation of errors". The BP algorithm consists of two processes, forward propagation and backpropagation:
- Forward propagation: Input data enters at the input layer, passes through one or more hidden layers, and produces a result at the output layer.
- Backpropagation: Compare the output value with the actual value and propagate the error from the output layer back to the input layer through the hidden layers; during this process, adjust the weights of the neurons using gradient descent.
This repeated adjustment of the weights is the training process of the neural network.
Construct Neural Network
Neural Network Structure
A neural network normally consists of an input layer, one or more hidden layers, and an output layer. We use the simple network below as an example; a sigmoid activation function is applied in the output layer:
Activation functions enable the network to model non-linear relationships; without them, the network can only express linear mappings. There are many activation functions; the sigmoid function used here is defined as:

sigmoid(x) = 1 / (1 + e^(-x))
Initial weights are randomly generated when the algorithm begins. We assume the initial weights are:

- Input layer → hidden layer: v11 = 0.15, v12 = 0.2, v21 = 0.6, v22 = 0.3, v31 = 0.3, v32 = 0.5
- Hidden layer → output layer: w1 = 0.8, w2 = 0.2
Suppose we have the 3 samples below, where the superscript denotes the index of the sample:
- Inputs: x(1) = (2,3,1), x(2) = (1,0,2), x(3) = (3,1,1)
- Outputs: t(1) = 0.64, t(2) = 0.52, t(3) = 0.36
The goal of training is to make the output of the model (y) as close as possible to the actual output (t) when the input (x) is given.
Input Layer → Hidden Layer
Neurons h1 and h2 are calculated by:

h1 = v11·x1 + v21·x2 + v31·x3
h2 = v12·x1 + v22·x2 + v32·x3
Hidden Layer → Output Layer
Output value y is calculated by:

s = w1·h1 + w2·h2
y = sigmoid(s)
Below is the calculation of the 3 samples:
| Input x | h1 | h2 | s | Output y | Actual t |
|---|---|---|---|---|---|
| x(1) = (2,3,1) | 2.4 | 1.8 | 2.28 | 0.907 | 0.64 |
| x(2) = (1,0,2) | 0.75 | 1.2 | 0.84 | 0.698 | 0.52 |
| x(3) = (3,1,1) | 1.35 | 1.4 | 1.36 | 0.796 | 0.36 |
The actual output values are also listed in the table. Notice that the outputs for the 3 samples differ greatly from the expected values.
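The forward pass just described can be sketched in a few lines of Python (the list layout and function names here are my own, not from the original):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights: v[i][j] connects input x(i+1) to hidden neuron h(j+1)
v = [[0.15, 0.2],
     [0.6,  0.3],
     [0.3,  0.5]]
w = [0.8, 0.2]  # weights from h1, h2 to the output neuron

def forward(x):
    h1 = sum(xi * vi[0] for xi, vi in zip(x, v))
    h2 = sum(xi * vi[1] for xi, vi in zip(x, v))
    s = w[0] * h1 + w[1] * h2
    return h1, h2, s, sigmoid(s)

# Reproduces the rows of the table above
for x in [(2, 3, 1), (1, 0, 2), (3, 1, 1)]:
    h1, h2, s, y = forward(x)
    print(x, round(h1, 2), round(h2, 2), round(s, 2), round(y, 3))
```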
A loss function is used to calculate the error between the output of the model and the expected output; it is also known as the objective function or cost function. A commonly used loss function is Mean Squared Error (MSE):

E = Σ (y(i) − t(i))² / (2m)
where m is the number of samples. The error of this forward propagation is:
E = [(0.64−0.907)² + (0.52−0.698)² + (0.36−0.796)²] / (2·3) ≈ 0.049
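The MSE computation can be checked with a few lines of Python:

```python
y = [0.907, 0.698, 0.796]  # model outputs from the table above
t = [0.64, 0.52, 0.36]     # actual outputs
m = len(t)

# E = sum((y - t)^2) / (2m)
E = sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / (2 * m)
print(round(E, 3))  # -> 0.049
```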
The loss function measures the accuracy of the model: the smaller its value, the higher the model's accuracy, and the purpose of training is to reduce it as far as possible. Think of the inputs and outputs as constants and the loss function as a function of the weights. A good way to find the weights that minimize the loss is gradient descent.
Batch gradient descent (BGD) is adopted to update the weights, i.e., all samples are involved in each update. The learning rate is set to η = 0.5.
If readers are not familiar with gradient descent, please read: Gradient Descent.
Output Layer → Hidden Layer
There are two weights, w1 and w2, between the output layer and the hidden layer; we adjust each in turn.
When adjusting w1, we need to determine how much influence w1 has on the error E, i.e., the partial derivative of E with respect to w1, obtained via the chain rule:

∂E/∂w1 = ∂E/∂y · ∂y/∂s · ∂s/∂w1
Then calculate each gradient respectively:

∂E/∂y = Σ (y(i) − t(i)) / m
∂y/∂s = Σ y(i)·(1 − y(i)) / m
∂s/∂w1 = Σ h1(i) / m

where y·(1 − y) is the derivative of the sigmoid function.
Calculate with values:
∂E/∂y = [(0.907-0.64) + (0.698-0.52) + (0.796-0.36)] / 3 = 0.294
∂y/∂s = [0.907*(1-0.907) + 0.698*(1-0.698) + 0.796*(1-0.796)] / 3 = 0.152
∂s/∂w1 = (2.4 + 0.75 + 1.35) / 3 = 1.5
The final result is: ∂E/∂w1 = 0.294*0.152*1.5 = 0.067
Because all 3 samples participate in the calculation, each of these partial derivatives is summed over the samples and averaged.
New w1 is w1 := w1 − η · ∂E/∂w1 = 0.8 − 0.5·0.067 ≈ 0.766
The method for adjusting w2 is similar; we give the result directly: w2 is adjusted from 0.2 to 0.167.
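Following the text's convention of averaging each chain-rule factor over the samples, the w1 and w2 updates can be reproduced in Python (variable names are mine):

```python
y = [0.907, 0.698, 0.796]  # model outputs
t = [0.64, 0.52, 0.36]     # actual outputs
h1 = [2.4, 0.75, 1.35]     # hidden-neuron values per sample
h2 = [1.8, 1.2, 1.4]
m, eta = 3, 0.5            # sample count and learning rate

# Averaged chain-rule factors: dE/dw = dE/dy * dy/ds * ds/dw
dE_dy = sum(yi - ti for yi, ti in zip(y, t)) / m
dy_ds = sum(yi * (1 - yi) for yi in y) / m
w1 = 0.8 - eta * dE_dy * dy_ds * (sum(h1) / m)
w2 = 0.2 - eta * dE_dy * dy_ds * (sum(h2) / m)
print(round(w1, 3), round(w2, 3))  # -> 0.766 0.167
```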
Hidden Layer → Input Layer
There are 6 weights (v11, v12, v21, v22, v31, v32) between the hidden layer and the input layer; we adjust each of them in turn.
When adjusting v11, we need to calculate how much influence v11 has on the error E, i.e., the partial derivative of E with respect to v11, again via the chain rule:

∂E/∂v11 = ∂E/∂y · ∂y/∂s · ∂s/∂h1 · ∂h1/∂v11

We already obtained the first two gradients when adjusting w1 and w2; we only need to calculate the latter two:

∂s/∂h1 = w1
∂h1/∂v11 = Σ x1(i) / m
Calculate with values:
∂E/∂y = 0.294
∂y/∂s = 0.152
∂s/∂h1 = 0.8
∂h1/∂v11 = (2 + 1 + 3) / 3 = 2
The final result is: ∂E/∂v11 = 0.294*0.152*0.8*2 = 0.072
New v11 is v11 := v11 - η ⋅ ∂E/∂v11 = 0.15 - 0.5*0.072 = 0.114
The adjustment of the other 5 weights is similar to v11; the results are:
- v12 is adjusted from 0.2 to 0.191
- v21 is adjusted from 0.6 to 0.576
- v22 is adjusted from 0.3 to 0.294
- v31 is adjusted from 0.3 to 0.276
- v32 is adjusted from 0.5 to 0.494
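The same averaged chain-rule update can be applied to all six input→hidden weights at once (as before, the list layout and names are mine):

```python
# dE/dv[i][j] = dE/dy * dy/ds * ds/dh(j) * dh(j)/dv[i][j]
y = [0.907, 0.698, 0.796]
t = [0.64, 0.52, 0.36]
x = [(2, 3, 1), (1, 0, 2), (3, 1, 1)]
w = [0.8, 0.2]  # hidden->output weights; ds/dh(j) = w[j]
v = [[0.15, 0.2], [0.6, 0.3], [0.3, 0.5]]
m, eta = 3, 0.5

dE_dy = sum(yi - ti for yi, ti in zip(y, t)) / m
dy_ds = sum(yi * (1 - yi) for yi in y) / m
for i in range(3):      # input index
    x_avg = sum(sample[i] for sample in x) / m  # dh/dv for input i
    for j in range(2):  # hidden-neuron index
        v[i][j] -= eta * dE_dy * dy_ds * w[j] * x_avg
print([[round(vij, 3) for vij in row] for row in v])
# -> [[0.114, 0.191], [0.576, 0.294], [0.276, 0.494]]
```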
Applying the adjusted weights to the model and running forward propagation again on the same 3 samples gives an error of E ≈ 0.040, a clear improvement over the first forward propagation's error of E ≈ 0.049.
The BP algorithm repeats forward propagation and backpropagation to train the model iteratively, until a preset number of iterations or amount of training time is reached, or the error falls below a set threshold.
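Putting the pieces together, one full iteration (and a loop around it) can be sketched as follows, again using the averaged-factor variant of batch gradient descent used throughout this example; the function names are mine:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss(samples, targets, v, w):
    # Mean squared error: E = sum((y - t)^2) / (2m)
    m = len(samples)
    total = 0.0
    for x, t in zip(samples, targets):
        h = [sum(xi * v[i][j] for i, xi in enumerate(x)) for j in range(2)]
        y = sigmoid(w[0] * h[0] + w[1] * h[1])
        total += (y - t) ** 2
    return total / (2 * m)

def train(samples, targets, v, w, eta=0.5, epochs=100):
    m = len(samples)
    for _ in range(epochs):
        # Forward propagation for all samples
        h = [[sum(xi * v[i][j] for i, xi in enumerate(x)) for j in range(2)]
             for x in samples]
        y = [sigmoid(w[0] * hs[0] + w[1] * hs[1]) for hs in h]
        # Averaged chain-rule factors, as in the worked example
        dE_dy = sum(yi - ti for yi, ti in zip(y, targets)) / m
        dy_ds = sum(yi * (1 - yi) for yi in y) / m
        # Gradients are computed with the old weights, then applied together
        grad_w = [dE_dy * dy_ds * sum(hs[j] for hs in h) / m for j in range(2)]
        grad_v = [[dE_dy * dy_ds * w[j] * sum(x[i] for x in samples) / m
                   for j in range(2)] for i in range(3)]
        w = [w[j] - eta * grad_w[j] for j in range(2)]
        v = [[v[i][j] - eta * grad_v[i][j] for j in range(2)] for i in range(3)]
    return v, w

samples = [(2, 3, 1), (1, 0, 2), (3, 1, 1)]
targets = [0.64, 0.52, 0.36]
v0, w0 = [[0.15, 0.2], [0.6, 0.3], [0.3, 0.5]], [0.8, 0.2]
print("E before:", round(loss(samples, targets, v0, w0), 3))
v1, w1 = train(samples, targets, v0, w0, epochs=1)
# The error drops from about 0.049 to about 0.040 after one update
print("E after one update:", round(loss(samples, targets, v1, w1), 3))
```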