KANs can adjust adaptively to the training data, thereby better capturing nonlinear relationships in the data (Helbing & Molnar, 1995).
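The adaptivity comes from the fact that a KAN places a learnable univariate function on every edge rather than using fixed activations. A minimal numpy sketch of one such layer follows; the RBF parameterization, basis count, and dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf_basis(x, centers, width=0.5):
    """Evaluate Gaussian radial basis functions at x (shape: [n_in])."""
    # Result shape: [n_in, n_basis]
    return np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)

class KANLayer:
    """Toy KAN layer: each edge (i, j) carries a learnable univariate
    function phi_ij, parameterized here as a weighted sum of RBFs."""
    def __init__(self, n_in, n_out, n_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(-2.0, 2.0, n_basis)
        # One coefficient vector per edge: shape [n_out, n_in, n_basis]
        self.coef = 0.1 * rng.standard_normal((n_out, n_in, n_basis))

    def forward(self, x):
        # basis: [n_in, n_basis]
        basis = rbf_basis(x, self.centers)
        # y_j = sum_i phi_ij(x_i) = sum_i sum_k coef[j, i, k] * basis[i, k]
        return np.einsum("jik,ik->j", self.coef, basis)

layer = KANLayer(n_in=3, n_out=4)
y = layer.forward(np.array([0.1, -0.5, 1.2]))
```

Because the coefficients of each edge function are trained, the shape of every activation adapts to the data, which is the property the text appeals to.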
The study also employs the Transformer model, which consists of encoders and decoders, to process local and global features. Each Transformer contains three layers and eight attention heads, and the policy network parameters are shared. KANs are introduced into the Transformer encoders and decoders to enhance the model's ability to capture complex dynamic interactions. In addition, the study adjusts the model's dimensions to fit a specific dataset, the Wusi dataset proposed by Zhu et al. in 2024, which includes historical motion sequences of multiple participants.
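Under the configuration above, each layer splits its attention across eight heads; the following numpy sketch shows one such multi-head self-attention sublayer. In the full model, the position-wise feed-forward sublayer that normally follows attention is where a KAN block would be substituted. The sequence length, model width, and random stand-in weights are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, n_heads=8, rng=None):
    """Toy multi-head self-attention over a sequence x of shape
    [seq_len, d_model]; d_model must be divisible by n_heads."""
    rng = rng or np.random.default_rng(0)
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Random projections stand in for learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (0.1 * rng.standard_normal((d_model, d_model))
                      for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split into heads: [n_heads, seq_len, d_head]
    split = lambda t: t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    # Merge heads back to [seq_len, d_model]
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

x = np.zeros((16, 64))          # 16 time steps, d_model = 64
y = multi_head_self_attention(x)
```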
By combining KANs and Transformers, the study aims to generate more natural and accurate future motion predictions while improving the model's adaptability to unseen complex patterns (Alahi, Goel, Ramanathan, et al., 2016). The new algorithm is proposed to address the limitations of existing algorithms in handling highly nonlinear and periodic data, as well as the problem of overfitting to the training data (Guo & Bennewitz, 2019).
2 METHOD
2.1 Dataset
This study uses the first large-scale multi-person 3D sports dataset, Wusi (the Wusi Basketball Dataset), proposed in 2024 by Zhu et al. (Zhu, Qin, Lou, et al., 2024). The Wusi dataset shows advantages for the task of multi-person motion prediction, outperforming other datasets in size (duration and number of people) and intensity of interactions.
The input data is composed of historical movement sequences from multiple participants. Given P participants, the movement history of each participant p may be represented as a time series of length T, where each time step t records the body posture of the participant in 3D space: 𝒳ₜᵖ, 1 ≤ t ≤ T, 1 ≤ p ≤ P. The output data consists of motion predictions for all participants in the future time period, and the goal is to predict the sequence of postures from time T to T + ΔT. For each participant p, the sequence of predicted future poses is represented as 𝒳ₜᵖ, T ≤ t ≤ T + ΔT, 1 ≤ p ≤ P, where 𝒳ₜᵖ denotes the 3D pose of participant p at time step t.
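Concretely, the input and output can be laid out as tensors indexed by participant, time step, and joint. The participant count, sequence lengths, and joint count below are placeholder assumptions, not the Wusi dataset's actual sizes:

```python
import numpy as np

# Illustrative dimensions: P participants, T history steps,
# DT prediction-horizon steps, J body joints.
P, T, DT, J = 5, 50, 25, 17

# Full motion sequences: one 3D pose (J joints x 3 coords) per person per step.
sequences = np.zeros((P, T + DT, J, 3))

# Split into model input (history) and prediction target (future poses).
history = sequences[:, :T]        # poses for 1 <= t <= T
future = sequences[:, T:]         # poses for T <  t <= T + DT
```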
2.2 Existing Algorithm
The motion prediction model in the study is formulated as a Markov Decision Process (MDP). Behavioural cloning uses expert demonstration data to train the model by supervised learning, minimizing the discrepancy between model-generated actions and the expert's behaviours. Behavioural cloning methods excel in computational and sample efficiency (Caude, Behavioural, 2010), but they have a known weakness: the learned policies tend to overfit the expert demonstrations in the regions of the state space they cover, limiting the ability to generalize (Borui, Ehsan, Hsu, 2019). To address these problems, Generative Adversarial Imitation Learning (GAIL) (Ho and Ermon, 2016) was introduced. The policy network is regularized by adversarial training so that its distribution of state-action pairs matches that of the expert policy, while a cognitive hierarchy model is used to express the recursive reasoning process (Camerer, Ho, and Chong, 2004).
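The adversarial part of GAIL can be sketched as follows: a discriminator scores state-action pairs, and the policy receives a surrogate reward for pairs the discriminator mistakes for expert behaviour. This is a minimal sketch assuming a fixed logistic discriminator; in GAIL proper, the discriminator and policy networks are trained jointly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_score(sa, w, b):
    """Logistic discriminator D(s, a): probability that the state-action
    pair came from the expert rather than the current policy."""
    return sigmoid(sa @ w + b)

def gail_reward(sa, w, b, eps=1e-8):
    """Surrogate reward for the policy optimizer: large when the
    discriminator assigns high expert-probability to policy samples."""
    return -np.log(1.0 - discriminator_score(sa, w, b) + eps)

# Toy state-action feature vectors (dimensions are illustrative).
rng = np.random.default_rng(0)
w, b = rng.standard_normal(6), 0.0
policy_sa = rng.standard_normal((4, 6))
r = gail_reward(policy_sa, w, b)
```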
2.3 Limitations of Baseline
During the algorithmic implementation of the
baseline, multiple performance bottlenecks limit the
accuracy and training efficiency of the model.
In multi-agent motion prediction tasks, where the input data contains complex dynamic interactions and potentially nonlinear features, the model's nonlinear feature extraction capability is crucial for prediction accuracy. Transformer models are known for their strength in identifying semantic correlations (Alharthi & Mahmood, 2024), but Transformer-based deep learning models show clear limitations when dealing with highly nonlinear and periodic data (Nie et al., 2022; Zeng et al., 2023). This study argues that when the input data contains complex dynamic interactions, the prediction accuracy of the model decreases significantly because it cannot effectively capture these nonlinear relationships. Meanwhile, because the model cannot adequately capture the complex interaction characteristics between participants, the generated motion sequences do not behave naturally enough in certain scenarios. This phenomenon not only affects the generative ability of the model but also reduces its credibility in practical applications.
When faced with such complex patterns, Transformer-based deep learning models may also overfit the data: the model performs well on the training data but generalizes poorly on the validation or test dataset.
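A standard way to detect this divergence between training and validation performance is early stopping on the validation loss. The sketch below is a generic remedy for illustration, not a procedure from the baseline described here:

```python
def early_stopping(val_losses, patience=3):
    """Flag overfitting: report the epoch at which the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best >= patience:
            return epoch  # stop training here
    return None  # no overfitting signal within the recorded epochs

# Validation loss improves, then turns upward while training loss keeps falling:
stop = early_stopping([1.0, 0.8, 0.7, 0.72, 0.75, 0.9])  # stops at epoch 5
```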