Large Language Model-Informed Geometric Trajectory Embedding for
Driving Scenario Retrieval
Tin Stribor Sohn¹,*, Maximilian Dillitzer¹,²,*, Tim Brühl¹, Robin Schwager¹, Tim Dieter Eberhardt¹, Michael Auerbach² and Eric Sax³
*Equal contribution
¹Dr. Ing. h.c. F. Porsche AG, Weissach, Germany
²Hochschule Esslingen, Esslingen, Germany
³Karlsruher Institut für Technologie, Karlsruhe, Germany
Keywords:
Geometric Trajectory Embedding, Large Language Model, Scenario Retrieval, Behavioural Scenario.
Abstract: This paper introduces a Large Language Model-informed geometric embedding for retrieving behavioural driving scenarios from unlabelled trajectory data, aimed at improving the search of real driving data for scenario-based testing. A Variational Recurrent Autoencoder with a Hausdorff Distance-based loss generates trajectory embeddings that capture detailed spatial patterns and interactions, offering enhanced interpretability over traditional mean squared error-based models. The embeddings are further organised through unsupervised clustering using HDBSCAN, grouping scenarios by similarities at the scene, infrastructure, behaviour, and interaction levels. Using GPT-4o to describe scenarios, clusters, and inter-cluster relationships, the approach enables targeted scenario retrieval via a Graph Retrieval-Augmented Generation pipeline, allowing natural language search of unlabelled trajectories. Evaluation demonstrates a retrieval precision of 80.2% for behavioural queries involving infrastructure, multi-agent interactions, and diverse traffic conditions.
1 INTRODUCTION
The Safety of the Intended Functionality (SOTIF)
standard provides guidance for the validation and ver-
ification (V&V) process of automated driving systems
(ADS). According to SOTIF, driving functions must
not only ensure safety under typical operating condi-
tions, but must also be able to deal with hazardous
and unforeseen scenarios that pose the highest safety
risks (International Organization for Standardization,
2022). This standard requires an ADS not only to behave safely on its own, but also to anticipate and handle potential faults and misbehaviour of other road users, emphasising that driving functions need to be evaluated at a behavioural level to ensure robust fault anticipation.
In real-world driving data, road user behaviour is
often represented by object lists with associated tra-
jectories that serve as indicators of driving scenar-
ios. However, mapping scenarios and their associ-
ated trajectories to each other remains a significant
challenge for V&V tasks. Existing manoeuvre-based approaches often rely on pre-defined rules, which may not be sufficient to capture the full range of behavioural complexity.

Figure 1: Method for generating geometric trajectory embeddings and mapping them to textual descriptions, enabling scenario retrieval and searchability at a behavioural level through natural language queries.
This paper proposes a novel method to enhance
the searchability of unlabelled trajectory-based sce-
narios found in naturalistic driving data through the
integration of geometric trajectory embeddings and
Large Language Models (LLMs) (Fig. 1). By ad-
dressing existing limitations in the searchability of be-
havioural scenarios in scenario-based testing (SBT),
the proposed method contributes to the state of the art
in five key ways:
Fine-Grained Clustering of Unlabelled Trajec-
tories: Using a variational recurrent autoencoder
and a Hausdorff loss function, geometric embed-
dings are generated for unlabelled trajectories that
capture fine-grained attributes rather than aver-
age metrics such as mean squared error (MSE).
Cluster analysis within this embedding space is
achieved by Hierarchical Density-Based Spatial
Clustering of Applications with Noise (HDB-
SCAN) to reveal attribute patterns.
Integrated Embedding of Scene Elements and
Interactions: The joint geometric embedding en-
compasses infrastructure, general scene context,
manoeuvres, and object interactions, reflecting
multi-object scenarios.
Behavioural Scenario Description and Search-
ability: By describing and comparing trajec-
tory clusters in the geometric embedding through
prompting an LLM for natural language descrip-
tions (Wei et al., 2022), the method improves in-
terpretability and enables behaviour-based search-
ability across clusters, allowing natural language
queries to be mapped to trajectory data.
Indexing Through the GraphRAG Pipeline:
Embeddings are indexed within a Graph
Retrieval-Augmented Generation (RAG) pipeline
(Edge et al., 2024), enabling natural language re-
trieval and open vocabulary manoeuvre mapping,
along with combinatorial behaviour scenario
search.
Demonstrated Retrieval Performance: Evalua-
tion results indicate an average retrieval precision
of 80.2%, with promising results in scenario retrieval involving infrastructure and object interac-
tions, as well as in the combinatorial manoeuvre
space.
2 RELATED WORK
Current behavioural scenario extraction methods can
be classified into learning-based embeddings, trajec-
tory clustering and rule-based manoeuvre extraction.
Learning-based trajectory embedding approaches
use deep learning techniques to generate embeddings
of driving scenarios.
Sonntag et al. introduced a method to minimise
the test space by identifying edge cases using deep
learning based outlier detection (Sonntag et al.,
2024). Their approach used autoencoders to gener-
ate trajectory embeddings and calculated MSE recon-
struction errors to identify unusual scenarios. How-
ever, this method has limitations as it focuses primar-
ily on spatio-temporal features and relies on an overall
scenario embedding, which limits its applicability for
detailed scenario analysis, particularly when address-
ing SOTIF. The use of MSE often fails to preserve
the finer geometric properties of the trajectories, re-
sulting in a more generalised representation that may
miss critical behavioural nuances such as local lateral
driving behaviour. In addition, the interpretability and
searchability of the embedding is challenging as the
properties of the clusters are not described.
Hoseini et al. presented a non-parametric trajectory
clustering framework using Generative Adversarial
Networks (GANs) to generate realistic synthetic data
(Hoseini et al., 2021). Their approach incorporates
geometric relations through Dynamic Time Warping
(DTW) and models transitive relations within an em-
bedding space. However, its applicability to behavioural scenario search is constrained by the limited interpretability of the embedding space. Furthermore,
it only considers lateral and longitudinal positioning,
omitting key parameters essential for dynamic sce-
nario characterisation, and focuses on only three spe-
cific scenarios: cut-in and left or right pass-by ma-
noeuvres.
Clustering-based methods have also been ex-
plored to group similar driving scenarios, often focus-
ing on the temporal and dynamic aspects of trajecto-
ries.
Ries et al. introduced a method for trajectory-
based clustering using DTW, which identifies similar
driving scenarios by comparing the temporal align-
ment of dynamic object trajectories (Ries et al., 2021).
While this method is effective for querying similar
trajectories, it does not capture the full complexity of
driving scenarios, particularly in terms of abstraction
layers that involve multiple interacting entities and re-
lated infrastructure.
Similarly, Watanabe et al. proposed a sce-
nario clustering method based on predefined features
(Watanabe et al., 2019). Although this approach al-
lows for the organisation of scenario data, it lacks the
ability to adaptively discover relevant features from
the trajectory data itself, limiting its applicability to
behavioural scenario search.
Rule-based manoeuvre extraction approaches for
driving scenario retrieval are based on pre-defined
rules and patterns.
Montanari et al. introduced a pattern cluster-
ing method that identifies recurring scenario patterns
from time series data (Montanari et al., 2020). While
this method is useful for recognising common scenar-
ios and corner cases, it does not explicitly consider
the geometric properties of the trajectories in relation
to the infrastructure and lacks interpretability due to
missing pattern descriptions.
Elspas et al. introduced a more sophisticated
rule-based system using regular expressions to extract
driving scenarios (Elspas et al., 2020). Their method
focuses on detecting specific manoeuvres, such as
cut-ins and lane changes, by applying pre-defined
rules to time series data. While this allows for a
functional interpretation of scenario patterns, the rule-
based nature of the system requires significant manual
effort to define appropriate patterns, which limits its
scalability and adaptability to new scenarios, and may
also lead to missing specifications of certain edge and
corner cases.
Later, Elspas et al. used Fully Convolutional Networks (FCNs) to extract scenarios from time
series data (Elspas et al., 2021). However, their ap-
proach relies on labelled datasets with ground truth
annotations, which presents challenges in terms of
complexity and scalability, as manually annotating
large and diverse datasets is labour intensive and
may not cover all relevant aspects of real-world be-
havioural driving scenarios.
Sohn et al. investigated natural language sce-
nario retrieval, assessing the ability of different LLMs
to capture scenario information (Sohn et al., 2024).
However, as this work focused on the description and
retrieval of perceptual data within the 6-Layer Model
(6LM) framework (Scholtes et al., 2021), behavioural
scenario retrieval was not a primary focus.
The methods reviewed address different aspects of
trajectory embedding and scenario retrieval, but tend
to either over-generalise or focus narrowly on specific
aspects of driving behaviour. Learning-based meth-
ods tend to oversimplify trajectory embeddings by re-
lying on MSE and overall scenario representations,
which may obscure critical geometric and relational
properties. Clustering methods focus primarily on
temporal alignment and pre-defined features, but fail
to capture the full complexity of multi-agent interac-
tions. Rule-based methods, while useful for detecting
specific manoeuvres, require significant manual effort
and may struggle to adapt to new or complex scenar-
ios. All presented methods lack searchability and in-
terpretability of the derived clusters, leaving out the
mapping of user queries or behavioural scenario de-
scriptions to the trajectory space.
3 METHOD
To overcome these limitations, a novel method is
proposed that aims to enhance the retrieval of driv-
ing scenarios by combining geometry-aware trajec-
tory embeddings with word embeddings from LLMs,
enabling behavioural-level searchability through nat-
ural language queries (Fig. 1). This approach is in-
tended to support V&V processes for ADS, ensuring
that behavioural patterns in unlabelled driving data
can be effectively analysed and retrieved through nat-
ural language or behavioural scenario descriptions.
3.1 Fine-Grained Trajectory
Embedding Using Geometric Loss
To generate a trajectory embedding that models the
fine-grained geometric aspects of behavioural scenar-
ios, a Variational Recurrent Autoencoder (VRAE) is
employed using Long Short-Term Memory (LSTM)
cells with self-attention to capture sequential pat-
terns. The model is trained using the Hausdorff Dis-
tance (HD) as the loss function. Unlike the MSE,
which tends to generalise trajectory features, the HD
is able to preserve spatial properties by quantifying
the largest deviation between trajectory points. This
property is necessary to preserve the geometric at-
tributes required for fine-grained analysis of driving
behaviour, particularly in scenarios involving multi-
ple agents and dynamic interactions such as cut-ins,
overtaking manoeuvres or multiple lane changes.
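As an illustration of this architecture, a minimal encoder sketch is shown below; the layer sizes, pooling choice, and number of attention heads are illustrative assumptions rather than the authors' configuration. The HD loss that drives training is defined next.

```python
# Minimal sketch (assumed dimensions, not the paper's model) of a variational
# recurrent encoder: an LSTM over the trajectory sequence, self-attention
# pooling, and a reparameterised latent vector.
import torch
import torch.nn as nn

class VRAEEncoder(nn.Module):
    def __init__(self, n_features: int = 2, hidden: int = 64, latent: int = 16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)

    def forward(self, x):                     # x: (batch, time, n_features)
        h, _ = self.lstm(x)                   # sequential encoding of the trajectory
        h, _ = self.attn(h, h, h)             # self-attention over the time steps
        pooled = h.mean(dim=1)                # summarise the sequence
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return z, mu, logvar
```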
The HD \(d_H\) between two trajectories \(A = \{a_1, a_2, \ldots, a_m\}\) and \(B = \{b_1, b_2, \ldots, b_n\}\) in a metric space with the Manhattan (L1) distance \(d_M(a, b) = \sum_{i=1}^{d} |a_i - b_i|\) is defined as follows:

\[
d_H(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} d_M(a, b),\ \sup_{b \in B} \inf_{a \in A} d_M(a, b) \right\}
\]
This metric ensures that each point in A is close to
at least one point in B and vice versa, capturing tra-
jectory similarity based on maximum deviation. Us-
ing HD as a loss function within the VRAE allows for
a more accurate representation of trajectory structure,
which is required for behavioural scenario analysis.
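As a concrete illustration, the following is a minimal sketch of such an HD reconstruction loss in PyTorch, using the Manhattan metric from the definition above; it is not the authors' implementation, and the trajectory length and feature dimension are placeholder values.

```python
import torch

def hausdorff_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hausdorff Distance between two trajectories of shape (T, d), using the L1 metric."""
    pairwise = torch.cdist(pred, target, p=1)          # all pairwise Manhattan distances
    pred_to_target = pairwise.min(dim=1).values.max()  # sup_a inf_b d_M(a, b)
    target_to_pred = pairwise.min(dim=0).values.max()  # sup_b inf_a d_M(a, b)
    return torch.max(pred_to_target, target_to_pred)

# Example: penalise the largest deviation between reconstruction and ground truth.
reconstruction = torch.randn(375, 2, requires_grad=True)  # placeholder 15 s trajectory, 2 features
ground_truth = torch.randn(375, 2)
loss = hausdorff_loss(reconstruction, ground_truth)
loss.backward()  # subgradients flow through the max/min selections
```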
3.2 Clustering of Trajectory
Embeddings with HDBSCAN
Following embedding generation, trajectories are
clustered using Hierarchical Density-Based Spatial
Clustering of Applications with Noise (HDBSCAN)
(McInnes et al., 2017). It is well suited to dealing
with the distinctive features of unlabelled, naturalis-
tic trajectory data by identifying clusters of different
densities based on spatial and behavioural character-
istics from coarse to fine-grained properties. HDB-
SCAN helps to isolate clusters that reflect different
driving behaviours, allowing further analysis at a sce-
nario level. By explicitly accounting for noise in the
data, it also helps to identify unique patterns that rep-
resent outliers. This clustering step thus organises
the trajectory embeddings according to fine-grained
attributes, preserving the distinctions between the dif-
ferent driving behaviours observed.
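A minimal clustering sketch with the hdbscan library is shown below; the input file and min_cluster_size are illustrative assumptions, not the settings reported in this paper.

```python
import numpy as np
import hdbscan

# Hypothetical file holding one latent vector per 15-second trajectory segment.
latent_vectors = np.load("trajectory_embeddings.npy")   # shape (N, latent_dim)

clusterer = hdbscan.HDBSCAN(min_cluster_size=50, metric="euclidean")
labels = clusterer.fit_predict(latent_vectors)

# Label -1 marks noise: segments in no dense region, i.e. candidate outlier behaviours.
n_clusters = labels.max() + 1
n_noise = int((labels == -1).sum())
print(f"{n_clusters} clusters, {n_noise} noise samples")
```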
3.3 Semantic Interpretation of
Scenarios Using LLMs
To make the generated embedding interpretable, each
trajectory cluster and individual scenarios within the
clusters are described using an LLM, in this approach
GPT-4o (OpenAI, 2024). This is achieved using a
prompt pipeline consisting of three main steps for
each sample in the dataset:
1. A single scenario is given as context with the task
to analyse overall properties such as traffic density
and each of the object trajectories in relation to
the infrastructure (road properties such as lanes)
as well as the interaction with other objects.
2. The corresponding cluster in the embedding space
is used to sample the five nearest neighbours of the
sample within its cluster to describe the intrinsic
attributes of the cluster by comparing the samples
to each other.
3. Each of the other clusters is considered by sam-
pling randomly five samples and describing them
as well as comparing them with the previously de-
scribed cluster.
Each description is referenced to the correspond-
ing data samples and associated clusters. This en-
sures interpretability at sample level, within clusters
and between clusters of behavioural scenarios, and
allows user queries to search from coarse to fine-
grained behavioural similarities and differences. This
step facilitates the translation of raw geometric data
into semantically meaningful interpretable represen-
tations, bridging the gap between abstract trajectory
data and functional driving behaviour. The integration
of LLMs improves searchability by linking trajectory
embeddings to a descriptive language.
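The sketch below illustrates how the three prompting steps could be wired against the GPT-4o API; the prompt wording and helper functions are hypothetical and only mirror the steps listed above, not the authors' exact prompts.

```python
from openai import OpenAI

client = OpenAI()

def describe(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def describe_sample(sample_json: str) -> str:
    # Step 1: analyse one scenario (traffic density, each trajectory in relation
    # to the infrastructure, interactions with other objects).
    return describe(
        "Analyse the following driving scenario. Describe traffic density, each "
        "object trajectory in relation to the road infrastructure, and the "
        f"interactions between objects:\n{sample_json}"
    )

def describe_cluster(sample_description: str, neighbour_descriptions: list[str]) -> str:
    # Step 2: compare the sample with its five nearest neighbours in the same
    # cluster to characterise the cluster's intrinsic attributes.
    return describe(
        "Compare the scenario below with its nearest neighbours from the same "
        "cluster and summarise the shared behavioural attributes:\n"
        f"Scenario: {sample_description}\nNeighbours: {neighbour_descriptions}"
    )

def compare_clusters(cluster_description: str, other_cluster_samples: list[str]) -> str:
    # Step 3: contrast the cluster with randomly drawn samples of the other clusters.
    return describe(
        "Contrast the following cluster description with random samples drawn "
        f"from other clusters:\nCluster: {cluster_description}\nOthers: {other_cluster_samples}"
    )
```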
3.4 Natural Language Scenario
Retrieval via GraphRAG Pipeline
The generated descriptions, combined with the corre-
sponding references in the dataset and the clustered
embedding, are indexed using a GraphRAG pipeline.
The idea of RAG is to provide a queryable external
context source for LLMs to answer targeted questions
about specific data not previously used to train the
LLM. GraphRAG implements this idea by modelling
the data into a knowledge graph, addressing relation-
ships between entities as well as general concepts by
clustering the knowledge graph (Edge et al., 2024).
By embedding the knowledge graph in the same em-
bedding as the LLM queries, natural language search
can be performed by referencing the previously gener-
ated descriptions as well as their associated data sam-
ples and trajectory embedding clusters.
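As a simplified stand-in for this indexing step (it does not use the GraphRAG library itself), the sketch below stores descriptions as nodes of a small knowledge graph with networkx and retrieves them for a query by cosine similarity of text embeddings; the node names, descriptions, and embedding model are illustrative assumptions.

```python
import networkx as nx
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    out = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(out.data[0].embedding)

# Link trajectory samples to their clusters and attach the generated descriptions.
graph = nx.DiGraph()
graph.add_node("cluster_7", kind="cluster", description="Dense, low-speed traffic with frequent following behaviour ...")
graph.add_node("sample_300", kind="sample", description="Vehicle follows lane 0 at around 9 m/s without lane changes ...")
graph.add_edge("sample_300", "cluster_7", relation="member_of")

# Pre-compute description embeddings for retrieval.
for _, data in graph.nodes(data=True):
    data["embedding"] = embed(data["description"])

def retrieve(query: str, k: int = 5):
    q = embed(query)
    scored = []
    for node, data in graph.nodes(data=True):
        e = data["embedding"]
        similarity = float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))
        scored.append((similarity, node))
    return sorted(scored, reverse=True)[:k]

print(retrieve("lane change in dense traffic"))
```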
4 EXPERIMENTS
4.1 Dataset
In this paper, a naturalistic vehicle trajectory dataset is
used to support trajectory embedding generation and
retrieval tasks.
The highD dataset consists of more than 110,500
vehicles covering approximately 44,500 kilometres of
driving over 147 hours. The data, focusing on high-
way scenarios, was collected with a drone at six dif-
ferent German highway locations, overcoming com-
mon limitations of traditional traffic data collection
such as occlusion. The trajectory of each vehicle, in-
cluding manoeuvres and dimensions, was extracted
with a positioning error of less than ten centimetres
(Krajewski et al., 2018). For analysis, within the
frame of this research, the dataset is segmented into
15-second sequences to allow for the analysis of ve-
hicle interactions and manoeuvres at a higher granu-
larity.
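A minimal segmentation sketch is given below, assuming the per-frame track CSV layout of highD and its recording rate of 25 frames per second; the file path is a placeholder.

```python
import pandas as pd

FRAME_RATE_HZ = 25                 # highD recordings are sampled at 25 frames per second
SEGMENT_SECONDS = 15
FRAMES_PER_SEGMENT = FRAME_RATE_HZ * SEGMENT_SECONDS

tracks = pd.read_csv("01_tracks.csv")   # placeholder path to one highD recording

# Assign every frame to a 15-second window and group the trajectories accordingly.
tracks["segment"] = tracks["frame"] // FRAMES_PER_SEGMENT
segments = [group for _, group in tracks.groupby("segment")]
print(f"{len(segments)} segments of up to {SEGMENT_SECONDS} s each")
```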
4.2 Embedding Generation and Cluster
Analysis of Driving Scenarios
To analyse the embedding, both MSE and HD are
used as loss functions to generate the trajectory em-
bedding with VRAE. The differences identified are
compared by analysing the HDBSCAN clusters on
a t-distributed Stochastic Neighbour Embedding (t-
SNE) projection and mapping general properties such
as number of lanes, lane change, speed, acceleration,
and traffic density to the embedding.
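The following sketch shows one way such a projection and feature mapping could be produced with scikit-learn's t-SNE; the loaded arrays (latent vectors, cluster labels, lane counts) are assumed to be pre-computed and the file names are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

latent_vectors = np.load("trajectory_embeddings.npy")  # placeholder, shape (N, latent_dim)
cluster_labels = np.load("hdbscan_labels.npy")         # placeholder, shape (N,)
num_lanes = np.load("num_lanes.npy")                   # placeholder, shape (N,)

# Project the latent space to 2-D and colour the points by cluster or feature.
projection = TSNE(n_components=2, perplexity=30).fit_transform(latent_vectors)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(projection[:, 0], projection[:, 1], c=cluster_labels, s=2, cmap="tab20")
ax1.set_title("HDBSCAN clusters")
ax2.scatter(projection[:, 0], projection[:, 1], c=num_lanes, s=2, cmap="viridis")
ax2.set_title("Number of lanes")
plt.show()
```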
These metrics not only provide insight into the
specific behavioural characteristics of each cluster,
but also help validate the plausibility of behavioural
scenario representations, as clusters are expected to
reflect infrastructure characteristics, ego-object dy-
namics, as well as manoeuvres and interaction pat-
terns.
Both the description and the retrieval performance
of the proposed approach are validated using a set of
33 different user queries related to behavioural sce-
nario analysis (Table 1). Specifically, these queries
are categorised as follows:
General Scene: Retrieval of scenarios based on
general properties such as traffic density or basic infrastructure attributes such as the number of lanes.
Infrastructure Manoeuvres: Focused retrieval
based on manoeuvre interactions with infrastruc-
ture elements such as lane markings (e.g. a lane
change).
Ego Manoeuvres: Retrieval based on trajectory
characteristics and behaviour of the trajectory it-
self, which can be broken down into motion prim-
itives such as speed and acceleration.
Object Interaction: Retrieval focused on inter-
actions between road users, capturing multi-agent
dynamics.
Combinatorial Scenarios: Retrieval based on
combinations of general, infrastructure, ego ma-
noeuvres and interactions, such as simultaneous
high density traffic and lane change manoeuvres.
Scenario retrieval performance is assessed using
the Precision@k (prec@k) metric, measured across
different retrieval categories (General, Infrastructure,
Ego, Interaction and Combinatorics). Specifically, the
prec@k metric is used to determine the relevance of
the top five retrieved samples for each query. The av-
erage prec@k value for all queries belonging to the
same category is used for the analysis.
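For clarity, a minimal sketch of this evaluation metric is shown below; the query results and relevance judgements are placeholder values.

```python
from collections import defaultdict

def precision_at_k(relevant_flags: list[bool], k: int = 5) -> float:
    """Fraction of the top-k retrieved samples judged relevant to the query."""
    return sum(relevant_flags[:k]) / k

# Placeholder results: category of each query and relevance of its top five samples.
results = [
    ("Infrastructure", [True, True, True, True, False]),
    ("Combinatorics", [True, False, True, False, False]),
]

per_category = defaultdict(list)
for category, flags in results:
    per_category[category].append(precision_at_k(flags, k=5))

# Average prec@5 per category, as reported in the evaluation.
for category, scores in per_category.items():
    print(category, sum(scores) / len(scores))
```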
5 RESULTS
5.1 Analysis of Geometric Embedding
In order to evaluate the properties and feasibility of
geometric embedding using HD, a comparison with
an MSE-based embedding is performed.
Figure 2: Trajectory embeddings using MSE (a) and HD (b), along with their corresponding embeddings of scenario features such as the number of lanes (c, d) and lane changes (e, f).
5.1.1 General Structure of the Embedding
Space and Resulting HDBSCAN Clusters
The MSE and HD embeddings, along with their re-
spective clustering results, show differences in cluster
granularity, as seen in the clustering results produced
by the HDBSCAN algorithm (Fig. 2a, 2b). The iden-
tified clusters show differences in coarseness between
the embedding methods, with a notable difference in
the number of clusters observed in each embedding
type, which is 8 for MSE and 49 for HD.
5.1.2 Mapping of Behavioural Scenario Features
to the Embedding Space
To illustrate the separability of features within clus-
ters, two features are mapped: one general (number
of lanes) and one trajectory-specific (lane change). In
both embeddings, clusters representing different lane
numbers are visually distinct (Fig. 2c, 2d). The map-
ping of lane changes shows different distributions,
with the MSE embedding showing a randomly dis-
tributed pattern of samples, whereas the HD embed-
ding shows these changes along the edges of visu-
ally separated clusters, as seen in the grey HDBSCAN
cluster (Fig. 2e, 2f).
Figure 3: HD embeddings of driving scenario features, including mean velocity in [m/s] (a), number of road users (b), lateral velocity change in [m/s²] (c) and longitudinal velocity change in [m/s²] (d).
Given these observations, while the MSE embed-
dings can be used to identify general attributes, the
ability to identify fine-grained details of the trajecto-
ries is significantly compromised. Consequently, only
the HD embedding is considered for the subsequent
analysis.
5.2 Driving Scenario Features
To analyse the interpretability and searchability of
the HD embedding, additional previously extracted
features such as mean velocity, number of objects,
lateral velocity change and acceleration are mapped
onto the embedding visualisation. These features are
also cross-referenced with the descriptions generated
by the LLM for the corresponding clusters.
The mean velocity evaluation shows a clear gradi-
ent across the clusters, with low speed scenarios con-
centrated on the left and high speed conditions on the
right (Fig. 3a).
The number of road users within a scenario fur-
ther supports this division, as a higher density of road
users appears on the left side of the visualisation (Fig.
3b). In particular, a dense cluster at the leftmost tip
of the embedding, characterised by a high number of
road users and low speed, suggests traffic congestion.
This observation is consistent with the LLM descrip-
tions, which highlight congestion and disrupted flow
in this area. Conversely, the right side of the visual-
isation, characterised by fewer road users, correlates
with high speed scenarios and continuous traffic flow.
Changes in lateral velocity are more pronounced
at the periphery of the clusters (Fig. 3c), consis-
tent with the HD embedding’s identification of lane
changes, particularly at the edges (Fig. 2f). The LLM
descriptions support this by noting lane changes and
multi-lane adjustments within the edges of the respec-
tive main clusters.
Figure 4: Retrieval performance of the LLM for behavioural queries and their combinatorics with three levels of detail (average prec@5 over several queries).

The longitudinal speed changes show the dynamics of acceleration and deceleration across the scenarios (Fig. 3d). Acceleration values are concentrated on the left side of the embedding, where lower
speeds dominate, allowing greater acceleration poten-
tial. Conversely, deceleration values are concentrated
in the right-hand clusters, corresponding to the high-
speed conditions. This inversion, where lower speeds
allow greater acceleration and higher speeds involve
greater deceleration, is consistent with typical driving
behaviour, as confirmed by the LLM scenario descrip-
tions, which capture this contrasting pattern between
the low-speed, acceleration-rich areas and the high-
speed, deceleration-dominant clusters.
5.3 Retrieval Analysis
For retrieval analysis, the LLM is queried at three lev-
els of detail: L1, L2 and L3. Example queries are
listed in Table 1. The query results are shown in Fig-
ure 4, where darker tiles indicate better model per-
formance, with 0% being the lowest score and 100%
being the best possible score. As the prec@k is calcu-
lated based on the top five retrieved samples, values
up to 100% are possible for the average prec@k value
of several queries.
Table 1: Overview of query types for trajectory retrieval: Categorisation of General Scene, Infrastructure Manoeuvre, Ego Manoeuvre, Object Interaction and Combination queries by level of complexity (L1, L2, L3). Text in angle brackets denotes placeholders for specific values (e.g., <direction> = left, <speed value> = 20 km/h).

General Scene
  L1: Retrieve trajectories of vehicles on a two-lane highway; three-lane highway scenario
  L2: Retrieve trajectories in moderate traffic density; congested traffic conditions
  L3: Retrieve trajectories with high traffic density in <lane number>; heavy traffic in middle and left lanes with free traffic flow in right lane

Infrastructure Manoeuvre
  L1: Retrieve trajectories of vehicles maintaining <lane number>
  L2: Retrieve trajectories involving lane changes at speeds above <speed value>; performing aggressive lane changes
  L3: Retrieve trajectories showing lane change to the <direction>; multi-lane changes of different road users at the same time

Ego Manoeuvre
  L1: Retrieve trajectories of vehicles travelling at a constant speed of <speed value>
  L2: Retrieve trajectories with acceleration; deceleration; lateral movement without lane change
  L3: Retrieve stop-and-go trajectories; acceleration up to <speed value>; braking to below <speed value>

Object Interaction
  L1: Retrieve trajectories following an object at <speed value>; approaching an object at <speed value>
  L2: Retrieve trajectories with pass-by manoeuvre from the <direction>; cut-in at speeds above <speed value>
  L3: Retrieve trajectories with evasive manoeuvre; overtaking; merging into highway traffic in high density

Combination
  L1: Retrieve trajectories following a lane on a two-lane highway with high acceleration
  L2: Retrieve trajectories showing lane change after pass-by in moderate traffic
  L3: Retrieve trajectories with evasive manoeuvre to the right to lane <lane number> at high speeds; multi-lane changes of one road user in heavy traffic

The performance of the model varies with the complexity of the queries. At the lowest level of detail (L1), simple queries, such as keeping the speed of the ego trajectory constant, give an average prec@k of 98%. At the medium complexity level (L2), queries such as "pass-by from left" result in an average
prec@k of 80.2%, indicating the ability to cover fine
granularities in the description of behavioural sce-
narios. At the highest level of complexity (L3),
queries such as "evasive manoeuvre to the right at high speed" are challenging and result in an average
prec@k of 62.4%.
Overall, the Infrastructure Manoeuvre category
achieves the highest performance across all levels,
with an average prec@k of 88.7%. Conversely, the
Combination category shows the lowest performance,
with an average prec@k of 65.7%.
An extract of the detailed output from the LLM in
response to a natural language query is shown in Table
2, which includes the exact query, the corresponding
retrieved trajectory, and the response from the LLM, condensed for length.
Qualitative analysis of these responses reveals the
model’s deep overall understanding of the objects in-
volved and their relationships by querying the knowl-
edge graph. The LLM provides detailed references
within the knowledge graph, specifying information
such as concrete object IDs and precise speed and ac-
celeration values relevant to the queries. In partic-
ular, the model demonstrates high accuracy by pro-
viding the vehicle ID and specific time frames for
each queried manoeuvre. In addition, the LLM out-
put includes direct indices to data sources relevant to
the query context, supporting traceability across data
layers. Responses to complex queries remain clear
and comprehensive, providing descriptions that cap-
ture both the sample retrieved and its contextual pa-
rameters.
6 EVALUATION
The embedding analysis results indicate that the HD
embedding is highly effective for trajectory data.
Compared to the MSE embedding, the HD embed-
ding shows better performance in modelling and clus-
tering trajectories, allowing for fine-grained differen-
tiation of attributes such as lane change behaviour and
specific ego manoeuvres. In contrast, MSE embed-
ding primarily captures broader scenario characteris-
tics and lacks the resolution to distinguish between
detailed manoeuvres. Beyond the more distinct clus-
ters, the applicability of the HD embedding is re-
inforced by consistent performance across different
visualisations, such as the clear correlation between
high-speed scenarios and low object counts, and vice
versa. This geometric embedding approach facili-
tates the targeted search for behavioural scenarios and
supports nuanced trajectory analysis. The generated
LLM descriptions include the previously extracted
features in their descriptions at the same locations in
the same clusters as visualised in the embedding. This
further demonstrates the feasibility of the three-stage
cluster description by prompting an LLM for the pur-
poses of interpretability and searchability.
Evaluating the retrieval performance, with an av-
erage prec@k of 80.2%, the presented approach
shows that even higher levels of detail of the general
scene, infrastructure, ego manoeuvres as well as ob-
ject interactions can be described and retrieved. On
the other hand, the high degree of combinatorics in
the queries can be a challenge, with performance de-
grading towards higher levels of detail, affecting the
overall performance.
Table 2: Detailed overview of trajectory retrieval queries: Examples of query types, corresponding top retrieved trajectories and shortened responses from the LLM, highlighting different driving manoeuvres and traffic conditions in real-world highway scenarios.

Query: Retrieve trajectories in dense traffic on the right lane with low speed. (General Scene, L3)
Top retrieved trajectory: [trajectory plot, colour-coded by xVelocity]
Shortened answer of LLM: In several highway scenarios, Lane 0, which is the rightmost lane, exhibits dense traffic with vehicles maintaining low speeds. Vehicles in Lane 0 generally show consistent following behavior, maintaining steady speeds and positions without significant lane changes or aggressive maneuvers. For instance, Vehicle ID 300 travels at below 9.3 m/s, maintaining a consistent lane-following behavior without significant speed variation [Data: Entities (981)]. Similarly, Vehicle 369 travels slower at around 15.3 m/s, slightly decelerating over time and maintaining its position in Lane 0, indicating moderate traffic density [Data: Entities (963)]. Additionally, Vehicle 678 follows a steady path with incremental velocities, further indicating a following behavior in Lane 0 [Data: Entities (193)]. These observations highlight the dense and low-speed traffic conditions in Lane 0, contributing to the overall traffic flow on the highway [Data: Reports (226); Entities (1480, 198, 347); Relationships (697, 743, 741, 1839, 2039)].

Query: Retrieve trajectories of a lane change to the left. (Infrastructure Manoeuvre, L2)
Top retrieved trajectory: [trajectory plot, colour-coded by xVelocity]
Shortened answer of LLM: Several vehicles in the dataset exhibit lane changes to the left, indicating maneuvers from a higher-numbered lane to a lower-numbered lane. Notable examples include: Vehicle ID 273: Changes from Lane 1 to Lane 2 at frame 2990, showing a leftward shift in trajectory [Data: Reports (147); Entities (1956); Relationships (663, 1583, 3782)]. Vehicle ID 2286: Moves from Lane 2 to Lane 1 at frame 27404, indicating a smooth, controlled transition [Data: Reports (147); Entities (304, 309); Relationships (2672, 128)]. Vehicle ID 2295: Lane change from Lane 1 to Lane 2 at frame 27488, demonstrating a smooth high-speed maneuver [Data: Reports (147); Entities (307, 310); Relationships (2674, 129)]. Vehicle ID 1383: Leftward shift from Lane 1 to Lane 2 around frame 16407 [Data: Entities (630); Relationships (240)]. Vehicle ID 1176: Changes from Lane 1 to Lane 2 at frame 14070, possibly indicating an overtaking maneuver or adaptive lane positioning [Data: Entities (90); Relationships (1099)].

Query: Retrieve trajectories of a multi-lane change to the left. (Infrastructure Manoeuvre, L3)
Top retrieved trajectory: [trajectory plot, colour-coded by xVelocity]
Shortened answer of LLM: Vehicle 210 initiates in Lane 2 and shifts to Lane 0 around frame 2321, with slight speed changes before and after the lane change, indicating a maneuver to overtake or evade slower traffic [Data: Entities (4); Relationships (15, 17, 663, 1583, 3782)]. Vehicle 2186 starts in lane 0 with a velocity around 31.9 m/s initially and changes to lane 2 around frame 2165, demonstrating a lane change maneuver without any significant speed variation [Data: Entities (1414); Relationships (478, 2763, 3955, 3962, 3969)]. Vehicle 273 changes lanes from Lane 2 to Lane 0 around frame 2990, indicated by a laneId change and a minor deviation in y-velocity [Data: Entities (1956); Relationships (663, 1583, 3782)]. Vehicle 1994 starts in lane 0 and transitions to lane 1 with a higher xVelocity around 33.58 and noticeable x-Velocity, indicating a lane change from 0 to 2, possibly for evasive maneuver [Data: Entities (1946); Relationships (4371, 2107, 2109)].

Query: Retrieve trajectories of an aggressive lane change over multiple lanes to the right. (Infrastructure Manoeuvre, L3)
Top retrieved trajectory: [trajectory plot, colour-coded by xVelocity]
Shortened answer of LLM: In the provided data, Vehicle 214 exhibits aggressive lane-changing behavior over multiple lanes to the right. Vehicle 214 maintains a high speed ranging from 35.65 to 37.17 m/s and shifts between Lanes 0, 1, and 2 over various frames, indicating multiple lane changes. These maneuvers suggest attempts to overtake other vehicles or adjust to varying traffic densities, reflecting a responsive and adaptive driving pattern [Data: Reports (261); Entities (6); Relationships (21, 20, 22, 24, 19, +more)].

Query: Retrieve trajectories of an evasive maneuver to the right with a speed over 70 km/h. (Combination, L3)
Top retrieved trajectory: [trajectory plot, colour-coded by xVelocity]
Shortened answer of LLM: The data provided indicates the presence of evasive maneuvers to the right by vehicles with speeds over 20 m/s. Within the highway scene, several vehicles exhibit high-speed driving with specific instances of abrupt lane changes or swerving actions that qualify as evasive maneuvers. For instance: Vehicle 214 and Vehicle 221 are noted for high-speed driving and dynamic lane-changing behaviors, which include evasive actions intended to avoid potential collisions. These maneuvers go beyond strategic overtaking or adaptive driving, as they involve rapid adjustments in trajectory to evade other vehicles. Additionally, the entity labeled "EVADE" confirms instances of evading within the dataset, specifically highlighting scenarios where evasive actions were observed. This entity references particular relationships and data points (Entities 32 and 867; Relationships 1810) that document recorded evasive maneuvers to the right by vehicles traveling over 20 m/s.
At the lowest levels of detail (L1 and L2), the
LLM retrieves scenarios with high accuracy, and the
descriptions it generates closely match the queried
behaviours. This high accuracy indicates that the
LLM effectively understands simple and moderately
detailed scenarios, capturing key trajectory features
and behavioural cues with little difficulty.
However, performance drops for the most com-
plex queries (L3). This drop is most noticeable
for queries that capture combined manoeuvres over
longer periods of time, such as stop-and-go scenarios
or long overtaking manoeuvres. One reason for this
performance drop could be the segmentation of the
data into 15 second sequences. As a result, these sce-
narios cannot be adequately queried as they are not
properly embedded within this short time window. In
these cases, the LLM reports that there are few or no
such instances in the dataset, again demonstrating a
good understanding but limiting its retrieval perfor-
mance score for these queries. Choosing variable time
windows to capture both short and long term manoeu-
vre development could help to capture these manoeu-
vres with the presented method. Additional perfor-
mance challenges are particularly notable within the
Combination category, which captures combinatorial
queries of multiple behavioural elements. In many
cases, the response contains specific subsets of the
combinatorics, but often struggles to capture all re-
lated aspects for highly combinatorial queries at L2
and L3.
Qualitative analysis of the model responses shows
a comprehensive understanding of the LLM in terms
of the data samples and the associated clusters in the
geometric embedding space. By providing both data
and knowledge graph references, coupled with a full
text answer and detailed reasoning about the retrieved
scenarios, interpretability is facilitated.
7 CONCLUSION AND FUTURE
WORK
This paper presents a comprehensive approach for re-
trieving behavioural scenarios on unlabelled trajecto-
ries from real driving data. The geometric embed-
ding based on HD is able to capture detailed scenario
attributes such as infrastructure, specific features of
object trajectories as well as their interactions. The
guided description of the trajectory space by GPT-4o
combined with the indexing by a GraphRAG pipeline
allows users to query and analyse the generated repre-
sentation with natural language and behavioural sce-
nario descriptions without prior annotation and addi-
tional information extraction. The open context na-
ture of the LLM, by providing an open vocabulary,
allows queries that do not necessarily need to be con-
sidered prior to data extraction.
Future research could focus on improving the pro-
posed approach in several ways. First, the introduc-
tion of different segment lengths can be considered
to accommodate different granularities in manoeuvre
combinations, which would address potential performance issues, such as the observed challenges with stop-and-go scenarios and longer overtaking manoeuvres. Subse-
quently, the use of alternative models to VRAE, such
as Transformers, may also prove beneficial for ex-
tended context representation. This can be coupled
with the inclusion of additional scenario parameters
based on the 6LM for enhanced context representa-
tion and search. The database should be extended
beyond the highway scenarios in the highD dataset,
such as intersections, urban scenarios and other real-
world use cases. To evaluate the scenario retrieval
performance, ground truth datasets should be created
that allow for additional performance metrics such as
Recall@k. This can be combined with the evalua-
tion of different alternative prompting and RAG ap-
proaches to improve retrieval specifically for combi-
natorial queries. Besides the HD which introduces a
spatial distance, additional metrics should be evalu-
ated with respect to the resulting embedding space for
driving scenarios, specifically including temporal and
spatial distances. Finally, the method should be eval-
uated and improved on the basis of user studies with
the involvement of V&V engineers, investigating the most important types of queries and relating the method to applications within the scope of the SOTIF standard.
REFERENCES
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody,
A., Truitt, S., and Larson, J. (2024). From local to
global: A graph rag approach to query-focused sum-
marization. arXiv preprint arXiv:2404.16130.
Elspas, P., Klose, Y., Isele, S. T., Bach, J., and Sax, E.
(2021). Time series segmentation for driving scenario
detection with fully convolutional networks. In VE-
HITS, pages 56–64.
Elspas, P., Langner, J., Aydinbas, M., Bach, J., and Sax,
E. (2020). Leveraging regular expressions for flexible
scenario detection in recorded driving data. In 2020
IEEE International Symposium on Systems Engineer-
ing (ISSE), pages 1–8. IEEE.
Hoseini, F., Rahrovani, S., and Chehreghani, M. H. (2021).
Vehicle motion trajectories clustering via embedding
transitive relations. In 2021 IEEE International In-
telligent Transportation Systems Conference (ITSC),
pages 1314–1321.
International Organization for Standardization (2022).
Road vehicles - Safety of the intended functionality. ISO 21448:2022(E). ICS: 43.040.10.
Krajewski, R., Bock, J., Kloeker, L., and Eckstein, L.
(2018). The highd dataset: A drone dataset of natural-
istic vehicle trajectories on german highways for val-
idation of highly automated driving systems. In 2018
21st International Conference on Intelligent Trans-
portation Systems (ITSC), pages 2118–2125.
McInnes, L., Healy, J., and Astels, S. (2017). hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2(11):205.
Montanari, F., German, R., and Djanatliev, A. (2020). Pat-
tern recognition for driving scenario detection in real
driving data. In 2020 IEEE Intelligent Vehicles Sym-
posium (IV), pages 590–597. IEEE.
OpenAI (2024). Hello GPT-4o. https://openai.com/index/hello-gpt-4o/. Accessed: 2024-08-29.
Ries, L., Rigoll, P., Braun, T., Schulik, T., Daube, J., and
Sax, E. (2021). Trajectory-based clustering of real-
world urban driving sequences with multiple traffic
objects. In 2021 IEEE International Intelligent Trans-
portation Systems Conference (ITSC), pages 1251–
1258.
Scholtes, M., Westhofen, L., Turner, L. R., Lotto, K.,
Schuldes, M., Weber, H., Wagener, N., Neurohr, C.,
Bollmann, M. H., Körtke, F., et al. (2021). 6-layer
model for a structured description and categoriza-
tion of urban traffic and environment. IEEE Access,
9:59131–59147.
Sohn, T. S., Dillitzer, M., Ewecker, L., Brühl, T., Schwa-
ger, R., Dalke, L., Elspas, P., Oechsle, F., and Sax, E.
(2024). Towards scenario retrieval of real driving data
with large vision-language models. In 10th Interna-
tional Conference on Vehicle Technology and Intelli-
gent Transport Systems (VEHITS 2024), pages 496–
505.
Sonntag, M., Vater, L., Vuskov, R., and Eckstein, L.
(2024). Detecting edge cases from trajectory datasets
using deep learning based outlier detection. In Pro-
ceedings of the 10th International Conference on Ve-
hicle Technology and Intelligent Transport Systems -
VEHITS, pages 31–39. INSTICC, SciTePress.
Watanabe, H., Tobisch, L., Rost, J., Wallner, J., and Prokop,
G. (2019). Scenario mining for development of
predictive safety functions. In 2019 IEEE Interna-
tional Conference on Vehicular Electronics and Safety
(ICVES), pages 1–7. IEEE.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., ichter,
b., Xia, F., Chi, E., Le, Q. V., and Zhou, D. (2022).
Chain-of-thought prompting elicits reasoning in large
language models. In Koyejo, S., Mohamed, S., Agar-
wal, A., Belgrave, D., Cho, K., and Oh, A., editors,
Advances in Neural Information Processing Systems,
volume 35, pages 24824–24837. Curran Associates,
Inc.