
import numpy as np
from tensorflow import keras as kr
from tensorflow.keras import layers
# HyperConv2D is the quaternion convolution layer of the hypercomplex
# Keras package used in this work (its import path is assumed here).

# Sign table of the quaternion algebra, passed to each HyperConv2D layer;
# it must be defined before the model that uses it.
quat = np.array([[-1, +1, -1],
                 [-1, -1, +1],
                 [+1, -1, -1]])

model_hyper = kr.Sequential(
    [
        layers.Input(shape=Xtr.shape[1:]),  # Xtr: training image tensor
        HyperConv2D(2, kernel_size=(3, 3), activation="relu", algebra=quat),
        layers.MaxPooling2D(pool_size=(2, 2)),
        HyperConv2D(4, kernel_size=(3, 3), activation="relu", algebra=quat),
        layers.MaxPooling2D(pool_size=(2, 2)),
        HyperConv2D(4, kernel_size=(3, 3), activation="relu", algebra=quat),
        layers.MaxPooling2D(pool_size=(2, 2)),
        HyperConv2D(8, kernel_size=(3, 3), activation="relu", algebra=quat),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(1, activation=None),  # single raw (logit) output
    ]
)
Figure 4: Applied architecture of the quaternion network; quat = quaternions (sign table of the quaternion algebra).
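The quat array reproduces the sign pattern of the quaternion multiplication table: its (a, b) entry is the sign of the product of the a-th and b-th imaginary units (how HyperConv2D consumes this table internally is a detail of the library),

$$i^2 = j^2 = k^2 = -1, \qquad ij = +k,\; jk = +i,\; ki = +j, \qquad ji = -k,\; kj = -i,\; ik = -j.$$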
5 EXPERIMENTAL PART AND RESULTS
In the deep neural network classification experiments, we divided the image sets into a training subset and a validation (test) subset with a 60/40 split. To estimate the quality of the classification, we used the Monte Carlo cross-validation technique (Xu, 2001; Goodfellow, 2016), here MCCV5, i.e., five independent train/test repetitions, and present the averaged results. In each iteration, the validation set is applied to the model to check the final efficiency and to observe the level of overfitting. Because every learning iteration is evaluated on an independent validation set that does not affect the network's training, we can determine the degree to which the model generalizes. For evaluating such experiments, the balanced version of accuracy is often recommended, i.e., the average accuracy over all classified classes (Brodersen, 2010). In our experiments, we use the cross-entropy loss, which can exceed a value of 1 and therefore clearly indicates where the model is malfunctioning.
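A minimal sketch of the MCCV5 protocol with balanced accuracy, assuming images X and binary labels y as NumPy arrays and a hypothetical build_model() factory that returns a freshly compiled network:

import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import balanced_accuracy_score

# MCCV5: five independent random 60/40 train/validation splits.
mccv = ShuffleSplit(n_splits=5, train_size=0.6, test_size=0.4, random_state=0)
scores = []
for train_idx, val_idx in mccv.split(X):
    model = build_model()  # hypothetical factory building the compiled network
    model.fit(X[train_idx], y[train_idx], epochs=50, batch_size=16, verbose=0)
    logits = model.predict(X[val_idx]).ravel()
    preds = (logits > 0).astype(int)  # threshold the raw logit output
    scores.append(balanced_accuracy_score(y[val_idx], preds))
print("MCCV5 mean balanced accuracy:", np.mean(scores))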
We scaled the images to 100 × 100 pixels to ensure the same input size for the network. The data passed through four alternating convolutional and max-pooling stages before being fed into the dense part of the network. We used max-pooling because it is an effective technique for reducing the spatial size of feature maps that works well with convolutional models; in practice, this approach turned out to be better than average pooling (Brownlee, 2019). The convolutional layers extract features from the images before they reach the dense layers. The activation function of the hidden layers was ReLU, and the output layer produced raw values. The loss function took the form of categorical cross-entropy and could therefore be higher than one. To train the neural network, we used the RGB color channels and applied the Adam optimizer (Kingma, 2015). For the quaternion networks, we used the HSV color model.
We carried out the training over 50 epochs (for Spot) and 100 epochs (for Unitree). The batch size was 16. We fitted the above parameters experimentally.
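A minimal sketch of the corresponding training call; since the output layer of Fig. 4 is a single raw unit, the from-logits binary cross-entropy is the matching Keras loss for the two-class case (Xtr/ytr and Xval/yval denote the 60/40 split):

model_hyper.compile(
    optimizer=kr.optimizers.Adam(),
    loss=kr.losses.BinaryCrossentropy(from_logits=True),  # raw-logit output
    metrics=["accuracy"],
)
history = model_hyper.fit(
    Xtr, ytr,
    validation_data=(Xval, yval),
    epochs=50,   # 100 for the Unitree experiments
    batch_size=16,
)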
In the experimental session, we learned to distinguish Spot behaviours (and, for the most challenging poses, also Unitree Go2 Pro poses). The full list of experimental variants is given in Table 1, and samples of each class are shown in Fig. 5.
Figure 5: Examples of Spot poses: Top left = Rotation on the shorter side, Top right = Sitting, Middle left = Standing, Middle right = Turning left, Bottom left = Turning right, Bottom right = Rotation on the longer side.
Table 1: Experimental variants.

Pose 1        Pose 2                        Result

Spot case:
Standing      Sitting robot                 Fig. 6
Standing      Rotation on the longer side   Fig. 7
Standing      Rotation on the shorter side  Fig. 8
Turning left  Turning right                 Fig. 9
Standing      Turning right                 Fig. 10

Unitree case:
Turning left  Turning right                 Fig. 12
We performed all the experiments in a similar way; thus, our results show how the MCCV5 method behaves in each learning epoch, presenting the results of the five internal tests together with their average. Various poses of the robot were analyzed, including standing vs. sitting, standing vs. rotation on the longer side, standing vs. rotation on the shorter side, turning left vs. turning right, and standing vs. turning