Abstract
Trajectory prediction for moving objects is a critical task for intelligent transportation with numerous applications, such as route planning, traffic management, congestion alleviation, etc. In this paper, we propose a novel framework that integrates sequence modeling, trajectory clustering and topology extraction to improve the accuracy of trajectory prediction. By incorporating self-attention for sequence modeling, we are able to effectively capture the temporal dependencies in trajectory data. Additionally, by taking into account the clustering information via a variational autoencoder and the topological information based on a graph neural network (GNN), we can further improve the accuracy of trajectory prediction. Furthermore, integrating a GNN enables our framework to handle diverse characteristics of road networks, such as road distance and traffic status, thereby making the proposed approach adaptive to different practical scenarios. As demonstrated by the experimental results on two publicly available datasets, our proposed method improves the accuracy by up to 0.5% and 3.8% for 1-step and 15-step predictions respectively, compared to the state-of-the-art method.
1 Introduction
Intelligent transportation system (ITS) is a popular topic of broad interest that aims to create safe, efficient and smart transportation networks [1,2,3,4,5]. As an essential element of ITS, trajectory prediction estimates the future locations of moving objects based on their historical trajectories [6, 7], and it has been extensively applied to a variety of location-based applications, including traffic management, route planning, location-based advertising, etc. [8,9,10,11,12].
Due to its importance, numerous methods have been proposed for trajectory prediction in the literature [10, 13,14,15,16,17,18]. These traditional approaches can be broadly classified into two categories: (i) Markov-chain-based methods [10, 13, 14], and (ii) deep-learning-based methods [15, 16, 19]. In Markov-chain-based methods, each trajectory is represented by a Markov process, and the next location is predicted by maximizing the transition probability from the current location to the destination. Despite its success in short-term predictions, a low-order Markov chain cannot accurately capture the long-term dependence of different locations in a trajectory [10]. On the other hand, a high-order Markov model often requires a large volume of historical data to learn its transition probabilities, and collecting these training data can be prohibitively expensive or even impossible in practice. In deep-learning-based methods, sequential networks, including the recurrent neural network (RNN) [15] and the long short-term memory (LSTM) network [16, 20, 21], have been adopted to predict the future location of interest based on all historical locations in a trajectory. As such, the issue of long-term dependence can be appropriately addressed. To enhance the representation of trajectories in numeric space, methodologies such as the variational autoencoder (VAE) are employed to extract trajectory embeddings [11, 18, 22].
Two important aspects must be carefully taken into account to facilitate accurate trajectory prediction in practice: (i) trajectory clustering, and (ii) topological information. First, different moving objects may carry different characteristics and, hence, they must be accurately captured by different prediction models [23]. Motivated by this observation, a number of clustering methods, including k-means [24], density-based spatial clustering of applications with noise (DBSCAN) [25], scalable clustering [26], etc., have been proposed to partition all trajectories into multiple clusters. These clustering methods must appropriately define a similarity measure to quantitatively assess the distance between two given trajectories. To this end, a variety of techniques for distance evaluation have been developed, including Bhattacharyya distance [27], Euclidean distance [28], Hausdorff distance [25], longest common subsequence (LCSS) [29], dynamic time warping (DTW) [30, 31], etc.
Most traditional clustering methods treat the overall trajectory as a single sample and they cannot appropriately capture the complex “sub-patterns” that are formed by one or multiple subsequences of transitions within a trajectory. For example, consider two different routes between two locations, referred to as “Route A” and “Route B” respectively. A vehicle may travel between these two locations via either of the two routes. If a vehicle first travels on “Route A” and then switches to “Route B”, its trajectory, referred to as “Route C”, is different from either “Route A” or “Route B”, even though it shares similar sub-patterns with them. In this case, “Route C” may be assigned to a separate cluster and, consequently, we cannot take advantage of its similarity with “Route A” or “Route B” for trajectory prediction.
In addition, most traditional methods make a “hard” decision on their clustering results. Namely, each sample (i.e., trajectory) is exclusively assigned to a single cluster and does not belong to any other cluster. In many practical applications, a set of samples, especially those on cluster boundaries, may carry intermediate characteristics over multiple clusters and cannot be definitively assigned to one cluster [32]. Furthermore, the aforementioned “hard” decision often results in extremely imbalanced clusters, where several “minority” clusters may be composed of very few samples. It is non-trivial to learn an accurate prediction model for these minority clusters without a sufficient number of training samples [25].
Second, it is crucial to incorporate topological information for trajectory prediction in order to enhance the prediction accuracy [15]. In practice, the locations of moving objects are constrained by road networks. Furthermore, the important characteristics of road segments (e.g., connectivity, distance, traffic, etc.) often play a critical role in optimal routing and, consequently, in the optimal trajectories for movement. In the literature, an RNN with a constrained state space has been proposed to exploit topological information by restricting the next road segment to the neighbors of the current segment [15]. Such a method, however, only takes into account the road connectivity and cannot be easily generalized to accommodate other important topological characteristics such as road lengths.
However, integrating the aforementioned information into a cohesive framework is not a trivial task. Several significant challenges must be carefully addressed. First, it is paramount to represent trajectory data in a format conducive to clustering, topological analysis, and sequence modeling. Such a representation should effectively capture the inherent characteristics of trajectories while being compatible with the specific requirements of individual algorithms. Second, managing the complexity of integrating diverse techniques is another challenge. Balancing the intricacies of each methodology while maintaining overall coherence and efficiency requires careful consideration.
In this paper, we propose a novel integrated framework for accurate trajectory prediction. In particular, we utilize a unified attention model to extract representations for both trajectories and road segments and adopt a simple-yet-effective strategy of division and integration. Our proposed approach is composed of three major components: (i) trajectory clustering, (ii) topology extraction, and (iii) sequence modeling. We first represent the past transitions ending at each road segment by an embedding vector that is derived by self-attention [33, 34]. Next, we adopt a VAE, whose prior distribution of latent variables is approximated by a Gaussian mixture model (GMM), to cluster these embeddings [35, 36]. Unlike the traditional clustering methods that focus on complete trajectories, our VAE clusters the past transitions ending at individual road segments. In addition, the VAE yields a “soft” decision on its clustering results, where each sample is assigned to multiple clusters with different probabilities.
Furthermore, the road network of interest is modeled as a graph, and a graph neural network (GNN) is subsequently adopted to extract the corresponding topological information [37,38,39]. Unlike the traditional RNN approach that is specifically designed to consider road connectivity only, our GNN method is general-purpose and can incorporate all topological knowledge (e.g., road connectivity, road lengths, traffic status, etc.) into our proposed model. Finally, the clustering results and the topological information are seamlessly combined together to learn a sequence model for trajectory prediction.
In summary, our major contributions in this paper are as follows. All steps of our proposed approach are based on deep learning, enabling rapid execution on GPUs, and are scalable to large-scale datasets.

1. We propose a unified framework for accurate trajectory prediction by seamlessly integrating trajectory clustering, topology extraction and sequence modeling.

2. We develop a novel trajectory clustering approach based on a VAE, which yields soft cluster assignments and can efficiently take into account sub-patterns within trajectories.

3. We derive an efficient, general-purpose approach to embed highly diversified topological information based on a GNN.
The remainder of this paper is organized as follows. We briefly review several related works in Section 2, and describe the relevant preliminaries in Section 3. The proposed method is developed in Section 4. The efficacy of our proposed method is demonstrated on two real-world datasets in Section 5. Finally, we conclude in Section 6.
2 Related works
In this section, we review existing methodologies in the literature in greater depth and, consequently, motivate our proposed work.
2.1 Trajectory prediction
Most traditional approaches can be broadly classified into two categories: (i) Markov-chain-based methods, and (ii) deep-learning-based methods. A Markov chain model is composed of a set of discrete states and the transition probabilities between these states. In the context of trajectory prediction, each discrete state represents a candidate location, and the transition probability between two states stands for the likelihood of a vehicle moving from one location to the other. The transition probabilities are learned by counting the numbers of historical transitions [10, 13, 14]. Given the Markov chain model, the future location is predicted by maximizing the transition probability of the next movement.
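As a concrete illustration of the counting step described above, a first-order Markov predictor can be sketched in a few lines (a minimal sketch; the function names and the add-one smoothing are our own assumptions, not part of the cited methods):

```python
import numpy as np

def fit_markov(trajectories, num_states):
    # Count historical transitions between consecutive road segments;
    # add-one smoothing (our assumption) avoids zero probabilities.
    counts = np.ones((num_states, num_states))
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(P, current_state):
    # Predict by maximizing the transition probability of the next move.
    return int(np.argmax(P[current_state]))

# Toy data: segment 1 is followed by segment 2 twice and segment 3 once.
P = fit_markov([[0, 1, 2], [0, 1, 3], [0, 1, 2]], num_states=4)
print(predict_next(P, 1))  # -> 2
```

Each row of `P` is a probability distribution over successor segments, so a higher-order model would simply enlarge the state space, at the cost of many more rows to estimate.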
On the other hand, RNN [15] and LSTM [16, 40, 41] have been adopted for trajectory prediction based on deep learning. In these deep learning models, each location is represented by an embedding vector. The embedding vectors of historical locations are integrated together to yield a processed vector, representing the encoded information of the historical trajectory. Such a vector is further fed into a classifier to predict the next location.
Despite their great success in trajectory prediction, a single model is often insufficient to capture all possible scenarios in practice, as it learns common patterns among similar trajectories but ignores the individual characteristics of different trajectories. This, in turn, motivates the idea of trajectory clustering, where similar trajectories are grouped together to form a cluster and a unique prediction model is learned for each cluster [10].
2.2 Trajectory clustering
A variety of methods have been proposed for trajectory clustering based on different distance metrics [23]. In [27], each trajectory is represented by a distribution related to the movement directions, and Bhattacharyya distance is calculated for two trajectories based on their corresponding distributions. In [28], Euclidean distance is defined for two trajectories as the average of the pairwise distances between the corresponding segments. These methods assume that all trajectories should have equal lengths, which is usually impractical. To address this issue, Hausdorff distance is proposed in [25]. It is defined as the maximum distance between any two segments of two trajectories. LCSS distance is further proposed in [29], where the similarity between two trajectories is measured by the ratio of the length of their longest common subsequence over the length of the shorter trajectory. DTW [30, 31] is an alternative approach, where an optimal match between two trajectories is first derived and then the distance is measured between the two optimally matched sequences.
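Among the metrics above, DTW admits a compact dynamic-programming implementation; the sketch below assumes a Euclidean ground distance between 2-D trajectory points (the function name and inputs are illustrative, not taken from the cited works):

```python
import numpy as np

def dtw_distance(a, b):
    """DTW between two point sequences of possibly different lengths:
    find the optimal monotone match, then accumulate pairwise costs."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean ground distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The middle point of the first trajectory is matched to its nearest
# counterpart, so the total distance is 1.
print(dtw_distance([[0, 0], [1, 0], [2, 0]], [[0, 0], [2, 0]]))  # -> 1.0
```

Unlike the plain Euclidean distance of [28], this handles trajectories of unequal lengths, which is why DTW and LCSS are preferred for raw GPS traces.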
Once the distance metric is appropriately defined, most traditional clustering methods can be straightforwardly applied. In [24], k-means is adopted based on LCSS distance. In [25], DBSCAN is applied based on Hausdorff distance. These clustering methods are not scalable to large-scale datasets. For this reason, a scalable framework is proposed in [26], where the trajectories are first undersampled to form a small dataset, and then trajectory clustering is implemented on the small dataset by using the visual assessment of tendency (VAT) method. Next, a representative trajectory is selected from each cluster, and each trajectory in the original dataset is assigned to its optimal cluster by minimizing the distance between the given trajectory and the representative trajectory of that cluster.
The traditional methods for trajectory clustering assign each trajectory to a single cluster. Such a “hard” decision often results in non-optimal assignments for those trajectories sitting on the cluster boundaries. In addition, it may lead to extremely imbalanced clusters, where trajectory prediction becomes non-trivial for the “minority” clusters as they are composed of very few trajectories.
2.3 Topological extraction
The authors of [15] have taken topological information into account by restricting the next road segment to the neighbors of the current segment. However, other important topological characteristics such as road lengths are ignored. Alternatively, GNNs have been adopted for topological extraction [37]. By its physical nature, a road network can be naturally modeled as a graph with nodes representing road segments and edges representing road connections. To incorporate other topological information, each node or edge can be assigned a vector composed of its features, such as the degree of connectivity and edge length. These feature vectors can be aggregated to generate high-level representations by using graph convolution operations [42]. A comprehensive survey on GNNs for topological extraction can be found in [37].
In summary, conventional methods for trajectory prediction face significant challenges in achieving optimal trajectory clustering and effectively handling diverse topological information. A prominent limitation arises from the conventional approach to trajectory clustering, which often overlooks sub-trajectories within clusters, leading to coarse-grained groups. Additionally, the adoption of “hard clustering” methods often produces imbalanced trajectory groups, especially to the detriment of minority clusters. Furthermore, the deficiency of conventional methods in integrating comprehensive topological information hampers their ability to fully utilize available data, including crucial factors such as road lengths and traffic conditions that are critical for accurate trajectory prediction. These observations motivate us to propose a novel method to appropriately address the aforementioned issues.
3 Preliminaries
In this section, we describe the relevant preliminaries that are of importance to our proposed approach.
3.1 Attention-based sequence modeling
An attention-based model is formed by stacking an embedding layer, M encoding blocks and an output layer [33]. Denote the l-th state of the n-th sequence as \(s_{n,l}\). The embedding layer converts a given \(s_{n,l}\) to a D-dimensional embedding vector \(\textbf{e}_{n,l}^{\left( 1\right) }\in \mathbb {R}^D\):

$$\textbf{e}_{n,l}^{\left( 1\right) }=\textbf{W}_\text {EMB}\,\textbf{s}_{n,l},$$

where \(\textbf{s}_{n,l}\in \mathbb {R}^Q\) is a one-hot encoded vector with the \(s_{n,l}\)-th element equal to 1 and all other elements equal to 0, and \(\textbf{W}_\text {EMB}\) is the embedding matrix that should be learned during the training process. In (1), \(\textbf{e}_{n,l}^{\left( 1\right) }\) is essentially equal to the \(s_{n,l}\)-th column of \(\textbf{W}_\text {EMB}\).
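As a quick sanity check on (1), multiplying a one-hot vector by the embedding matrix simply selects the corresponding column (the shapes and values below are illustrative, not from the paper):

```python
import numpy as np

Q, D = 6, 3                        # number of road segments, embedding size
W_emb = np.arange(Q * D, dtype=float).reshape(D, Q)  # D x Q embedding matrix

s = 4                              # index of the current road segment
one_hot = np.zeros(Q)
one_hot[s] = 1.0
e = W_emb @ one_hot                # equals the s-th column of W_emb
print(e)                           # -> [ 4. 10. 16.]
```

In practice, deep-learning frameworks implement this as a direct table lookup rather than an explicit matrix-vector product.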
Each encoding block is composed of an attention layer and a linear layer. An attention layer consists of H attention heads. Note that the value of H must be appropriately chosen so that D is a multiple of H [33]. For an attention head, each element in the output sequence is a weighted summation of the elements in its input sequence, where the weights determine the significance of each element in the input sequence.
Figure 1 shows the network structure for the h-th head of the m-th attention layer. Its input sequence is composed of \(L_n\) embedding vectors \(\left\{ \textbf{e}_{n,1}^{(m)},\textbf{e}_{n,2}^{(m)},\cdots ,\textbf{e}_{n{,L}_n}^{(m)}\right\} \). Each vector \(\textbf{e}_{n,l}^{(m)}\) is linearly transformed into a key vector \(\textbf{k}_{n,l}^{(m,h)}\in \mathbb {R}^{D_A}\), a query vector \(\textbf{q}_{n,l}^{(m,h)}\in \mathbb {R}^{D_A}\) and a value vector \(\textbf{v}_{n,l}^{(m,h)}\in \mathbb {R}^{{D}/{H}}\):

$$\textbf{k}_{n,l}^{(m,h)}=\textbf{W}_K^{\left( m,h\right) }\textbf{e}_{n,l}^{(m)},\quad \textbf{q}_{n,l}^{(m,h)}=\textbf{W}_Q^{\left( m,h\right) }\textbf{e}_{n,l}^{(m)},\quad \textbf{v}_{n,l}^{(m,h)}=\textbf{W}_V^{\left( m,h\right) }\textbf{e}_{n,l}^{(m)},$$

where \(\textbf{W}_K^{\left( m,h\right) }\), \(\textbf{W}_Q^{\left( m,h\right) }\) and \(\textbf{W}_V^{\left( m,h\right) }\) are the transformation matrices that should be learned during the training process. Note that both \(\textbf{k}_{n,l}^{(m,h)}\) and \(\textbf{q}_{n,l}^{(m,h)}\) have the dimension of \(D_A\), where the subscript A implies that \(D_A\) is an attention-related quantity.
Since the l-th element \(\textbf{o}_{n,l}^{\left( m,h\right) }\) in the output sequence corresponds to the prediction at the l-th step, it should only depend on the previous l elements in the input sequence. By adopting the attention mechanism, \(\textbf{o}_{n,l}^{\left( m,h\right) }\) can be represented as:

$$\textbf{o}_{n,l}^{\left( m,h\right) }=\sum _{i=1}^{l}\alpha _{n,l,i}^{\left( m,h\right) }\,\textbf{v}_{n,i}^{(m,h)}.$$

In (3), each weight \(\alpha _{n,l,\ i}^{\left( m,h\right) }\) is computed by:

$$\alpha _{n,l,i}^{\left( m,h\right) }=\frac{\exp \left( \left\langle \textbf{q}_{n,l}^{(m,h)},\textbf{k}_{n,i}^{(m,h)}\right\rangle /\sqrt{D_A}\right) }{\sum _{j=1}^{l}\exp \left( \left\langle \textbf{q}_{n,l}^{(m,h)},\textbf{k}_{n,j}^{(m,h)}\right\rangle /\sqrt{D_A}\right) },$$

where \(\langle \bullet , \bullet \rangle \) denotes the inner product of two vectors.
Such an attention mechanism is repeated H times, where each attention head attends to the input sequence with different transformation matrices \(\textbf{W}_K^{\left( m,h\right) }\), \(\textbf{W}_Q^{\left( m,h\right) }\) and \(\textbf{W}_V^{\left( m,h\right) }\). The outputs from all attention heads are concatenated to generate the integrated information \(\left\{ \textbf{a}_{n,1}^{(m)},\textbf{a}_{n,2}^{(m)},\cdots ,\textbf{a}_{n,L_n}^{(m)}\right\} \), where each element \(\textbf{a}_{n,l}^{(m)}\) is expressed as:

$$\textbf{a}_{n,l}^{(m)}=\left[ \textbf{o}_{n,l}^{\left( m,1\right) };\textbf{o}_{n,l}^{\left( m,2\right) };\cdots ;\textbf{o}_{n,l}^{\left( m,H\right) }\right] .$$
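The masked attention computation in (3) can be sketched for a single head as follows (a simplified stand-in, not the authors' implementation; weight shapes and names are our own assumptions):

```python
import numpy as np

def causal_attention_head(E, Wk, Wq, Wv):
    """One causally masked attention head over a sequence of embeddings
    E of shape (L, D); each output attends only to positions i <= l."""
    K, Q, V = E @ Wk, E @ Wq, E @ Wv
    d_a = K.shape[1]
    scores = Q @ K.T / np.sqrt(d_a)            # inner products <q_l, k_i>
    L = len(E)
    mask = np.tril(np.ones((L, L), dtype=bool))
    scores = np.where(mask, scores, -np.inf)   # hide future positions
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # softmax over the past
    return w @ V                               # weighted sum of values

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))                    # L = 4 steps, D = 8
Wk, Wq = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 4))                   # D/H = 4, e.g., for H = 2 heads
O = causal_attention_head(E, Wk, Wq, Wv)
print(O.shape)  # -> (4, 4)
```

Because of the causal mask, the first output row can only attend to the first input, so it equals that input's value vector; a full layer would run H such heads and concatenate their outputs.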
Figure 2 shows the network structure of the m-th encoding block. After the input sequence \(\left\{ \textbf{e}_{n,1}^{(m)},\textbf{e}_{n,2}^{(m)},\cdots ,\textbf{e}_{n{,L}_n}^{(m)}\right\} \) is processed by the attention layer and the linear layer, the output of the m-th encoding block is:

where \(\textbf{W}_\text {LIN}^{\left( m,i\right) }\) and \(\textbf{b}_\text {LIN}^{\left( m,i\right) }\) \((i=1\ \text {or}\ 2)\) denote the transformation matrices and bias vectors, respectively.
Finally, the output layer transforms each output vector \(\textbf{e}_{n,l}^{(M+1)}\) of the last encoding block to a Q-dimensional vector, and then applies a softmax activation function to generate a probability distribution \({\hat{\textbf{s}}}_{n,l}\in \mathbb {R}^Q\) over all possible road segments:

$${\hat{\textbf{s}}}_{n,l}=\text {softmax}\left( \textbf{W}_\text {OUT}\,\textbf{e}_{n,l}^{(M+1)}+\textbf{b}_\text {OUT}\right) ,$$

where \(\textbf{W}_\text {OUT}\) and \(\textbf{b}_\text {OUT}\) are the transformation matrix and bias vector, respectively.
3.2 Clustering based on variational autoencoder
Consider clustering the set \(\mathcal {E}=\left\{ \textbf{e}_{n};1\le n\le N\right\} \). Assume that each sample \(\textbf{e}\) in the set \(\mathcal {E}\) is generated by a latent vector \(\textbf{z}\) [37]:

$$p\left( \textbf{e}\mid \textbf{z}\right) =\mathcal {N}\left( \varvec{\upmu }_\textbf{e}\left( \textbf{z}\right) ,\text {diag}\left( \varvec{\upsigma }_\textbf{e}\left( \textbf{z}\right) \right) \right) ,$$

where \(p\left( \bullet \right) \) stands for the probability density function (PDF) of a random variable, \(\mathcal {N}\left( \bullet ,\bullet \right) \) represents a multivariate normal distribution, \(\varvec{\upmu }_\textbf{e}\left( \textbf{z}\right) \) and \(\varvec{\upsigma }_\textbf{e}\left( \textbf{z}\right) \) are the mean and variance vectors respectively, and \(\text {diag}\left( \bullet \right) \) denotes a diagonal matrix with its diagonal elements specified by the given vector. To formulate the dependence of \(\varvec{\upmu }_\textbf{e}\left( \textbf{z}\right) \) and \(\varvec{\upsigma }_\textbf{e}\left( \textbf{z}\right) \) on the latent vector \(\textbf{z}\), a multi-layer perceptron (MLP) model \(f_{\varvec{\uptheta }}\) parameterized by \(\varvec{\uptheta }\) is used [37]:

$$\left[ \varvec{\upmu }_\textbf{e}\left( \textbf{z}\right) ;\varvec{\upsigma }_\textbf{e}\left( \textbf{z}\right) \right] =f_{\varvec{\uptheta }}\left( \textbf{z}\right) ,$$
where \(\varvec{\uptheta }\) is determined by maximum likelihood estimation (MLE), as will be discussed at the end of this subsection.
To capture the intrinsic patterns for clustering, we further assume that \(\textbf{z}\) is generated from a C-component GMM [37], where C is the predefined number of clusters. Let \(c\in \left\{ 1,2,\cdots ,C\right\} \) denote the component from which \(\textbf{z}\) is selected. The variable c follows a C-dimensional categorical distribution:

$$p\left( c\right) =\pi _c,\quad c\in \left\{ 1,2,\cdots ,C\right\} ,$$
where \(\sum _{c=1}^{C}\pi _c=1\) and \(\varvec{\uppi }=\left[ \pi _1,\pi _2,\cdots ,\pi _C\right] \) should be determined by MLE. Consequently, we have:

$$p\left( \textbf{z}\mid c\right) =\mathcal {N}\left( \varvec{\upmu }_\textbf{z}\left( c\right) ,\text {diag}\left( \varvec{\upsigma }_\textbf{z}\left( c\right) \right) \right) ,$$
where \(\varvec{\upmu }_\textbf{z}\left( c\right) \) and \(\varvec{\upsigma }_\textbf{z}\left( c\right) \) are the corresponding mean and variance vectors respectively. By applying the chain rule on (8) and (10)-(11), the joint PDF can be written as:

$$p\left( \textbf{e},\textbf{z},c\right) =p\left( \textbf{e}\mid \textbf{z}\right) p\left( \textbf{z}\mid c\right) p\left( c\right) .$$
In order to determine the unknown parameters, including \(\varvec{\uptheta }\) and \(\varvec{\uppi }\), we apply MLE by maximizing the following log-likelihood over the dataset \(\mathcal {E}\):

$$\mathbb {E}_{\mathcal {E}}\left( \log {p\left( \textbf{e}\right) }\right) =\mathbb {E}_{\mathcal {E}}\left( \log \int \sum _{c=1}^{C}p\left( \textbf{e},\textbf{z},c\right) \,d\textbf{z}\right) ,$$
where \(\mathbb {E}\left( \bullet \right) \) represents the expectation. In practice, maximizing the log-likelihood in (13) is often intractable, because it involves an integration over \(\textbf{z}\) that is difficult to compute. Alternatively, we can maximize the evidence lower bound (ELBO) \(\mathcal {L}_\text {ELBO}\) of \(\mathbb {E}_{\mathcal {E}}\left( \log {p\left( \textbf{e}\right) }\right) \) [37]:

$$\mathcal {L}_\text {ELBO}=\mathbb {E}_{\mathcal {E}}\left( \mathbb {E}_{q\left( \textbf{z},c\mid \textbf{e}\right) }\left[ \log \frac{p\left( \textbf{e},\textbf{z},c\right) }{q\left( \textbf{z},c\mid \textbf{e}\right) }\right] \right) ,$$
where \(q\left( \textbf{z},c\mid \textbf{e}\right) \) is a variational approximation of the posterior \(p\left( \textbf{z},c\mid \textbf{e}\right) \) [37].
By adopting the mean field assumption [37], \(q\left( \textbf{z},c\mid \textbf{e}\right) \) can be further factorized as:

$$q\left( \textbf{z},c\mid \textbf{e}\right) =q\left( \textbf{z}\mid \textbf{e}\right) q\left( c\mid \textbf{e}\right) .$$
In (15), \(q\left( \textbf{z}\mid \textbf{e}\right) \) is assumed to follow a normal distribution:

$$q\left( \textbf{z}\mid \textbf{e}\right) =\mathcal {N}\left( \varvec{\upmu }_\textbf{z}\left( \textbf{e}\right) ,\text {diag}\left( \varvec{\upsigma }_\textbf{z}\left( \textbf{e}\right) \right) \right) ,$$
and \(q\left( c\mid \textbf{e}\right) \) represents the probability that a sample \(\textbf{e}\) belongs to the c-th cluster. Similar to \(\varvec{\upmu }_\textbf{e}\left( \textbf{z}\right) \) and \(\varvec{\upsigma }_\textbf{e}\left( \textbf{z}\right) \), we use another MLP \(g_{\varvec{\upphi }}\) parameterized by \(\varvec{\upphi }\) to generate \(\varvec{\upmu }_\textbf{z}\left( \textbf{e}\right) \) and \(\varvec{\upsigma }_\textbf{z}\left( \textbf{e}\right) \) from \(\textbf{e}\):

$$\left[ \varvec{\upmu }_\textbf{z}\left( \textbf{e}\right) ;\varvec{\upsigma }_\textbf{z}\left( \textbf{e}\right) \right] =g_{\varvec{\upphi }}\left( \textbf{e}\right) .$$
Remember that \(\textbf{e}\) is generated from a latent vector \(\textbf{z}\). Hence, \(q\left( c\mid \textbf{e}\right) \) and \(p\left( c\mid \textbf{z}\right) \) should be identical. Based on the Bayes theorem, we have:

$$q\left( c\mid \textbf{e}\right) =p\left( c\mid \textbf{z}\right) =\frac{\pi _c\,p\left( \textbf{z}\mid c\right) }{\sum _{c'=1}^{C}\pi _{c'}\,p\left( \textbf{z}\mid c'\right) }.$$
Substituting (8)-(12) and (15)-(18) into (14), \(\mathcal {L}_\text {ELBO}\) can be expressed as a function of the following parameters [37]:

$$\Omega =\left\{ \varvec{\uptheta },\varvec{\upphi },\varvec{\uppi },\varvec{\upmu }_\textbf{z}\left( c\right) ,\varvec{\upsigma }_\textbf{z}\left( c\right) ;1\le c\le C\right\} .$$
By maximizing \(\mathcal {L}_\text {ELBO}\) with stochastic gradient descent, these unknown parameters can be efficiently determined. Note that the expectation over the dataset \(\mathcal {E}\) in (14) is computed by the empirical summation over all samples in \(\mathcal {E}\). More details about the aforementioned MLE method can be found in [37].
Once \(\Omega \) is known, the clustering results can be computed by using (18).
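The soft assignment in (18) is exactly the responsibility computation of a GMM. A minimal sketch with diagonal covariances follows (the array layouts, names, and toy values are our own assumptions):

```python
import numpy as np

def soft_assignments(Z, pi, mu, sigma2):
    """GMM responsibilities p(c|z) used as soft cluster assignments.
    Z: (N, D) latent vectors; pi: (C,) mixture weights;
    mu, sigma2: (C, D) per-cluster means and diagonal variances."""
    diff = Z[:, None, :] - mu[None, :, :]                  # (N, C, D)
    # log N(z; mu_c, diag(sigma2_c)) for every sample/cluster pair
    log_norm = -0.5 * (np.log(2 * np.pi * sigma2)[None] + diff**2 / sigma2[None]).sum(-1)
    log_w = np.log(pi)[None] + log_norm
    log_w -= log_w.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)                # rows sum to 1

Z = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])         # two clear samples + one boundary sample
pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
sigma2 = np.ones((2, 2))
q = soft_assignments(Z, pi, mu, sigma2)
print(np.round(q, 3))
```

The third sample sits exactly between the two cluster centers and receives a 50/50 assignment, which is the kind of boundary case that motivates soft clustering in the first place.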
3.3 Graph neural network
Consider a network that contains Q nodes \(\{s_q;1\le q\le Q\}\). GNN targets learning a representative vector for each node \(s_q\) that captures the topological information associated with it. Such a goal is achieved by iteratively aggregating the information from the neighbors of each node [43]. Specifically, suppose that a GNN is composed of R layers. The r-th layer takes the representation vector \(\textbf{x}_q^{(r)}\) of each node \(s_q\) as its input, and outputs the updated representation \(\textbf{x}_q^{(r+1)}\) by applying a set of transformations. Formally, a linear transformation is first applied to \(\textbf{x}_i^{\left( r\right) }\). Next, the representations of the neighborhood nodes of each node \(s_q\) are linearly combined for information aggregation. Finally, an element-wise sigmoid function is applied to the derived vector:

$$\textbf{x}_q^{(r+1)}=\sigma \left( \sum _{i\in \mathcal {N}_q}a_{qi}^{(r)}\,\textbf{W}_\text {LIN}^{(r)}\textbf{x}_i^{(r)}\right) ,$$

where \(\mathcal {N}_q\) denotes the neighbor set of the node \(s_q\), \(\sigma \left( \bullet \right) \) denotes the element-wise sigmoid function, \(\textbf{W}_\text {LIN}^{(r)}\) is the transformation matrix to be learned, and \(a_{qi}^{(r)}\) is the weight determined by the attention mechanism:

$$a_{qi}^{(r)}=\frac{\exp \left( \ell \left( \textbf{w}_A^{(r)\top }\left[ \textbf{W}_\text {LIN}^{(r)}\textbf{x}_q^{(r)};\textbf{W}_\text {LIN}^{(r)}\textbf{x}_i^{(r)}\right] \right) \right) }{\sum _{j\in \mathcal {N}_q}\exp \left( \ell \left( \textbf{w}_A^{(r)\top }\left[ \textbf{W}_\text {LIN}^{(r)}\textbf{x}_q^{(r)};\textbf{W}_\text {LIN}^{(r)}\textbf{x}_j^{(r)}\right] \right) \right) },$$

In (23), \(\ell \left( \bullet \right) \) is the leaky ReLU function [43], and \(\textbf{w}_A^{\left( r\right) }\) is a vector that should be learned.
The final output vectors \(\{\textbf{x}_q=\textbf{x}_q^{(R+1)};1\le q\le Q\}\) of the GNN are expected to capture the topological information of the road network. To this end, a link prediction task is carefully designed to guide GNN training [44]. Let \(\textbf{G}\in \mathbb {R}^{Q\times Q}\) represent the adjacency matrix of the road segments. Its element \(g_{ij}\in \left\{ 0,1\right\} \) indicates whether \(s_j\) is a successor of \(s_i\). Note that \(\textbf{G}\) is not necessarily symmetric, because the existence of a path \(s_i\rightarrow s_j\) (i.e., \(g_{ij}=1\)) does not necessarily imply that the opposite route \(s_j\rightarrow s_i\) must exist, considering the possibility that the road may be one-way. If the road representations \(\{\textbf{x}_q;1\le q\le Q\}\) are appropriately learned, we should be able to accurately reconstruct \(\textbf{G}\) from \(\{\textbf{x}_q;1\le q\le Q\}\). We estimate the element \({\hat{g}}_{ij}\) in the reconstructed adjacency matrix \(\hat{\textbf{G}}\) by [44]:

where \({\widetilde{x}}_i\) and \({\hat{\textbf{x}}}_i\) denote the last element and the sub-vector composed of all other elements of \(\textbf{x}_i\) respectively, and \(\Vert \bullet \Vert _2\) stands for the L2 norm of a vector. Note that the matrix \(\hat{\textbf{G}}\) defined by (22) is asymmetric in general.
Our proposed GNN is trained by minimizing the cross-entropy loss between \(\textbf{G}\) and \(\hat{\textbf{G}}\) [44]:

$$\mathcal {L}_\text {GNN}=-\sum _{i=1}^{Q}\sum _{j=1}^{Q}\left[ g_{ij}\log {\hat{g}}_{ij}+\left( 1-g_{ij}\right) \log \left( 1-{\hat{g}}_{ij}\right) \right] .$$

Once the training process is complete, the topological information is encoded into the derived vectors \(\{\textbf{x}_q;1\le q\le Q\}\).
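To make the link-prediction training signal concrete, the sketch below reconstructs a directed adjacency matrix from node vectors and computes the cross-entropy loss. The asymmetric scoring function (inner product of the sub-vectors plus per-node source/target biases taken from the last coordinate) is a simplified stand-in for the decoder of [44], not its exact form:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def reconstruct_adjacency(X):
    """Score every directed pair (i, j) from node representations X (Q, D).
    The bias difference makes the score asymmetric, so one-way roads
    (g_ij != g_ji) can be modeled."""
    body, bias = X[:, :-1], X[:, -1]
    scores = body @ body.T + bias[:, None] - bias[None, :]
    return sigmoid(scores)

def bce_loss(G, G_hat, eps=1e-9):
    # Cross-entropy between the true and reconstructed adjacency.
    return -np.mean(G * np.log(G_hat + eps) + (1 - G) * np.log(1 - G_hat + eps))

X = np.random.default_rng(1).normal(size=(5, 4))   # toy representations, Q = 5
G_hat = reconstruct_adjacency(X)
print(G_hat.shape)  # -> (5, 5)
```

Minimizing `bce_loss` with respect to the GNN parameters pushes the node vectors to encode which directed road connections exist.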
4 Proposed method
Consider a road set S composed of Q road segments \(\{s_1,s_2,\cdots ,s_Q \}\) and a trajectory set \(\{T_1,T_2,\cdots ,T_N \}\) where each trajectory \(T_n\) is composed of \(L_n\) timeordered road segments \(\{s_{n,1},s_{n,2},\cdots ,s_{n,L_n } \}\) and each road segment \(s_{n,l}\) is represented by its index in the road set S. For instance, a trajectory \(T_n=\{3,2,5\}\) denotes the timeordered sequence \(s_3\rightarrow s_2\rightarrow s_5\) including three road segments.
Our goal is to predict the next K segments \(\left\{ s_{n{,L}_n+1},s_{n,L_n+2},\right. \) \(\left. \cdots ,s_{n{,L}_n+K}\right\} \) that \(T_n\) will reach in the future. Towards this goal, we first adopt an attentionbased model to learn the sequential patterns from historical trajectories. With the proposed model, we can represent the past transitions ending at each road segment by an embedding vector. Based upon these derived vectors, a VAE is employed to cluster historical trajectories with soft decisions. In addition, we use a GNN to extract topological information from road networks. Finally, the clustering outcomes and topological information are seamlessly integrated together to learn a refined prediction model. In this section, we describe the proposed algorithm for trajectory prediction and highlight its novelties.
4.1 Sequence modeling
It has been demonstrated in the literature that attention-based methods can achieve high computational efficiency and superior modeling accuracy in long-term dependence [33]. For this reason, we adopt an attention-based model in this paper to learn the sequential patterns and predict the future road segments for a given trajectory \(T_n\).
Figure 3 shows the network structure of our proposed attentionbased model. Given a trajectory \(T_n\) as the input, its past road segments \(\left\{ s_{n,1},s_{n,2},\cdots ,s_{n{,L}_n}\right\} \) are transformed into the corresponding embedding vectors \(\left\{ \textbf{e}_{n,1}^{\left( 1\right) },\textbf{e}_{n,2}^{\left( 1\right) },\cdots , \right. \) \( \left. \textbf{e}_{n{,L}_n}^{\left( 1\right) }\right\} \) by the embedding layer, and then propagated through M encoding blocks and the output layer, finally generating the output vectors \(\left\{ {\hat{\textbf{s}}}_{n,1},{\hat{\textbf{s}}}_{n,2},\cdots ,{\hat{\textbf{s}}}_{n{,L}_n}\right\} \).
Three important clarifications should be made for the network structure in Fig. 3. First, the first \(L_n-1\) output vectors \(\left\{ {\hat{\textbf{s}}}_{n,1},{\hat{\textbf{s}}}_{n,2},\cdots ,{\hat{\textbf{s}}}_{n,L_n-1}\right\} \) are used for model training, and the last output vector \({\hat{\textbf{s}}}_{n{,L}_n}\) is used to predict the next road segment \(s_{n,{\ L}_n+1}\). Remember that each vector \({\hat{\textbf{s}}}_{n,l}\) depends on the first l road segments \(T_{n,l}=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n,l}\right\} \), and it represents the likelihood for the next road segment \(s_{n,l+1}\) to occur. In the ideal case, the \(s_{n,l+1}\)-th element of \({\hat{\textbf{s}}}_{n,l}\) should be equal to 1 and all other elements should be equal to 0, representing a one-hot encoded vector. Our training goal is to maximize the overall likelihood of these predictions over all trajectories \(\left\{ T_1,T_2,{\cdots ,T}_N\right\} \), which is equivalent to minimizing the negative log-likelihood:

$$\mathcal {L}=-\sum _{n=1}^{N}\sum _{l=1}^{L_n-1}\log {\hat{\textbf{s}}}_{n,l}\left[ s_{n,l+1}\right] ,$$
where \({\hat{\textbf{s}}}_{n,l\ }\left[ s_{n,l+1}\right] \) represents the \(s_{n,l+1}\)th element of \({\hat{\textbf{s}}}_{n,l\ }\). Note that \({\hat{\textbf{s}}}_{n,l\ }\) is generated by a softmax function, as shown in (7). Maximizing \({\hat{\textbf{s}}}_{n,l\ }\left[ s_{n,l+1}\right] \) in (24) would simultaneously minimize all other elements in \({\hat{\textbf{s}}}_{n,l\ }\).
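A small numeric check of this negative log-likelihood, where each probability vector plays the role of \({\hat{\textbf{s}}}_{n,l}\) (the helper function is our own sketch):

```python
import numpy as np

def nll_loss(probs_seq, next_segments):
    # Sum of -log(probability assigned to the observed next segment)
    # over the prediction steps of one trajectory.
    return -sum(np.log(p[s]) for p, s in zip(probs_seq, next_segments))

# Two prediction steps: the observed next segments are 1 and then 0.
probs = [np.array([0.1, 0.9]), np.array([0.8, 0.2])]
print(round(nll_loss(probs, [1, 0]), 4))  # -(ln 0.9 + ln 0.8) -> 0.3285
```

Because each probability vector comes from a softmax, increasing the probability of the observed segment necessarily decreases all the others, which is why minimizing this loss suffices.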
With the loss function in (24), the sequence model is learned by backpropagation, where the gradients are computed to update all model parameters based on an iterative algorithm such as stochastic gradient descent [35]. Once the model is learned, the optimal prediction for \(s_{n,{\ L}_n+1}\) is determined as the road segment associated with the maximum element in \({\hat{\textbf{s}}}_{n{,L}_n}\).
Second, the network structure in Fig. 3 does not have a “fixed” size. Instead, the numbers of its inputs and outputs both depend on \(L_n\). If a long trajectory with a large value of \(L_n\) is considered, the network will process a large number of inputs and outputs.
Third, if we need to predict K road segments \(\left\{ s_{n,L_n+1},s_{n,L_n+2},\cdots ,s_{n,L_n+K}\right\} \) based on \(T_n=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n{,L}_n}\right\} \), we should first predict \(s_{n,{\ L}_n+1}\) by using the proposed model. Next, we add the predicted value \(s_{n,{\ L}_n+1}\) into \(T_n\), resulting in \(T_n=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n{,L}_n+1}\right\} \), and the updated \(T_n\) with \(L_n+1\) road segments is used to further predict \(s_{n,{\ L}_n+2}\). The aforementioned process is repeated until all K road segments are predicted.
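This autoregressive rollout can be sketched as follows, where `model` stands for any trained predictor that returns a probability vector over road segments for a given prefix (the interface and the toy model are our own assumptions):

```python
import numpy as np

def predict_k_steps(model, trajectory, k):
    # Predict the next segment, append it to the prefix, and repeat
    # until K future segments have been generated.
    traj = list(trajectory)
    for _ in range(k):
        probs = model(traj)
        traj.append(int(np.argmax(probs)))
    return traj[len(trajectory):]

# Toy model: always moves from segment q to segment (q + 1) mod 4.
def toy_model(prefix, num_segments=4):
    p = np.zeros(num_segments)
    p[(prefix[-1] + 1) % num_segments] = 1.0
    return p

print(predict_k_steps(toy_model, [0, 1], k=3))  # -> [2, 3, 0]
```

Note that prediction errors can compound across the K steps, since each step conditions on earlier predicted segments rather than ground truth.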
4.2 Trajectory clustering
As discussed in the previous subsection, the embedding vector \(\textbf{e}_{n,l}^{\left( m\right) }\) captures the information carried by the first l elements \(T_{n,l}=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n,l}\right\} \) of a trajectory \(T_n\). Hence, \(\textbf{e}_{n,l}^{\left( m\right) }\) can be used as a representation of the sub-trajectory \(T_{n,l}\). In this section, we will use a VAE integrated with GMM [37] to cluster trajectories based on the embedding vector set \(\mathcal {E}=\left\{ \textbf{e}_{n,l}^{\left( m_\text {CLU}\right) };1\le n\le N;1\le l\le L_n\right\} \), where \(m_\text {CLU}\) is a user-defined parameter indicating the specific layer from which the required embedding vectors are extracted for trajectory clustering.
Two important issues must be carefully considered for the proposed clustering method. First, before the neural network in Section 4.1 is trained, the embedding vectors in \(\mathcal {E}\) are unknown. To mitigate this issue, we propose to first train an initial attention-based model by using all trajectories where no clustering is applied. The resulting embedding vectors from the initial model are used for trajectory clustering. Afterwards, the attention-based model is retrained based on the clustered trajectories, as will be discussed in Section 4.4.
Second, the value of \(m_\text {CLU}\) must be appropriately chosen. If it is overly small, \(\textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\) does not attend sufficiently to the previous road segments and, hence, mostly carries the information of the current road segment \(s_{n,l}\), instead of the sub-trajectory \(T_{n,l}\). On the other hand, if \(m_\text {CLU}\) is prohibitively large, the information encoded by \(\textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\) is mostly tailored to trajectory prediction, rather than trajectory clustering.
By applying the VAE-based clustering method discussed in Section 3.2, the probability for an embedding vector \(\textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\) to belong to the cth cluster, namely \(q\left( c\mid \textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\right) \), can be calculated as:
$$\begin{aligned} q\left( c\mid \textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\right) =\frac{\pi _c\,\mathcal {N}\left( \textbf{e}_{n,l}^{\left( m_\text {CLU}\right) };\varvec{\mu }_c,\varvec{\Sigma }_c\right) }{\sum _{c'=1}^{C}\pi _{c'}\,\mathcal {N}\left( \textbf{e}_{n,l}^{\left( m_\text {CLU}\right) };\varvec{\mu }_{c'},\varvec{\Sigma }_{c'}\right) } \end{aligned}$$ (25)
where \(\pi _c\), \(\varvec{\mu }_c\) and \(\varvec{\Sigma }_c\) denote the prior weight, mean and covariance of the cth Gaussian component.
In this paper, instead of assigning \(\textbf{e}_{n,l}^{\left( m_\text {CLU}\right) }\) to a single cluster, we use the above probabilities to represent the soft decision for trajectory clustering.
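As a minimal illustration of such soft assignments, the snippet below computes Gaussian-mixture posterior probabilities assuming diagonal covariances. It is a simplified stand-in for the VAE-based formulation of Section 3.2: `weights`, `means`, and `variances` correspond to the mixture parameters, which in the paper are learned jointly with the VAE.

```python
import math

def gmm_soft_assign(e, weights, means, variances):
    """Posterior q(c | e) for a diagonal-covariance GMM.

    Returns probabilities proportional to pi_c * N(e; mu_c, Sigma_c),
    normalized over clusters -- a soft decision rather than a hard label.
    """
    def log_gauss(x, mu, var):
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, mu, var))

    logits = [math.log(w) + log_gauss(e, m, v)
              for w, m, v in zip(weights, means, variances)]
    mx = max(logits)                           # log-sum-exp for numerical stability
    exps = [math.exp(t - mx) for t in logits]
    z = sum(exps)
    return [x / z for x in exps]
```

An embedding close to one component's mean receives nearly all of the probability mass, while an embedding between components is split across them, which is exactly the behavior the soft decision is meant to preserve.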
4.3 Topology extraction
The trajectory of a moving object is restricted by the road network S, and the next location along a trajectory must be a neighbor of the last road segment. Therefore, incorporating topological information into our trajectory prediction model can significantly improve its accuracy, as it adds extra constraints and can filter out a large number of “illegal” predictions.
To extract the topological information, we aim to learn a set of road representation vectors \(\{\textbf{x}_q;1\le q\le Q\}\) that encode the connections of the road segments in S. Towards this goal, we start with the embedding vectors \(\left\{ \textbf{e}_q^{\left( 1\right) };1\le q\le Q\right\} \). Each vector \(\textbf{e}_q^{\left( 1\right) }\) corresponds to the qth road segment and is the qth column of \(\textbf{W}_\text {EMB}\), as shown in (1). We adopt a GNN-based neural network, as discussed in Section 3.3, to derive \(\{\textbf{x}_q;1\le q\le Q\}\).
Note that the embedding vectors \(\left\{ \textbf{e}_q^{\left( 1\right) };1\le q\le Q\right\} \) are adopted as the initial node features of GNN, because they are learned from a trajectory prediction task and implicitly carry the knowledge on road connectivity. Hence, these vectors can serve as a good starting point to learn \(\{\textbf{x}_q;1\le q\le Q\}\).
Once \(\{\textbf{x}_q;1\le q\le Q\}\) are derived, they can be further utilized for trajectory prediction, as will be discussed in the next subsection.
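To give a flavor of how road connectivity shapes the learned representations, the snippet below performs one toy message-passing step over the road graph: each segment's feature is mixed with the mean of its neighbors' features. This is a deliberately simplified stand-in for the GNN of Section 3.3, which uses learned weights and attention rather than a fixed 0.5 mixing coefficient.

```python
def propagate(features, adjacency):
    """One message-passing step over a road graph.

    features[q] is the feature vector of road segment q (initially the
    embedding e_q from the sequence model); adjacency[q] lists the indices
    of segments connected to segment q.
    """
    out = []
    for q, feat in enumerate(features):
        nbrs = adjacency[q]
        if not nbrs:                       # isolated segment: keep feature as-is
            out.append(list(feat))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(feat))]
        out.append([0.5 * (f + m) for f, m in zip(feat, mean)])
    return out
```

After a few such steps, connected segments end up with similar representations, which is what lets the downstream predictor filter out transitions to non-adjacent segments.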
4.4 Integrated model
By processing a given trajectory \(T_n=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n,L_n}\right\} \) based on the algorithms in Sections 4.1-4.3, the following information can be extracted for each road segment \(s_{n,l}\): (i) the embedding vector \(\textbf{e}_{n,l}^{\left( 1\right) }\) in (1) associated with the road segment \(s_{n,l}\), (ii) the clustering outcome \(\textbf{q}_{n,l}\) in (25) associated with the first l road segments \(T_{n,l}=\left\{ s_{n,1},s_{n,2},\cdots ,s_{n,l}\right\} \), and (iii) the topological information \(\textbf{x}_{n,l}\) learned from a GNN for the road segment \(s_{n,l}\). Instead of directly adopting the network output in Fig. 3 for trajectory prediction, we propose to combine \(\textbf{e}_{n,l}^{\left( 1\right) }\), \(\textbf{q}_{n,l}\) and \(\textbf{x}_{n,l}\) together by an integration layer to generate a fused feature vector \(\textbf{o}_{n,l}\):
$$\begin{aligned} \textbf{o}_{n,l}=\textbf{W}_\text {TRA}\textbf{e}_{n,l}^{\left( 1\right) }+\textbf{W}_\text {CLU}\textbf{q}_{n,l}+\textbf{W}_\text {TOP}\textbf{x}_{n,l} \end{aligned}$$
where \(\textbf{W}_\text {TRA}\), \(\textbf{W}_\text {CLU}\) and \(\textbf{W}_\text {TOP}\) are the transformation matrices to be learned.
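A bare-bones sketch of such an integration layer is shown below. The additive combination of three learned linear maps is an assumption made for illustration; the paper's exact fusion may differ, and in practice the matrices would be trained end-to-end with the rest of the network.

```python
def integrate(W_tra, W_clu, W_top, e, q, x):
    """Fuse the segment embedding e, cluster probabilities q, and topology
    vector x into one feature o via three linear transformations."""
    def matvec(W, v):
        # plain matrix-vector product: W is a list of rows
        return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

    parts = (matvec(W_tra, e), matvec(W_clu, q), matvec(W_top, x))
    return [a + b + c for a, b, c in zip(*parts)]
```

Because each matrix projects its input to a common output dimension, the three information sources can have different sizes (e.g. the cluster vector has one entry per cluster, the topology vector one entry per GNN feature) yet still be fused into a single vector.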
The integrated model for trajectory prediction is shown in Fig. 4. It is composed of an integration layer, M encoding blocks and an output layer. Note that its network structure is similar to the attention-based model in Fig. 3, except that the embedding layer is replaced with the integration layer. This integrated model can be trained by following the same method in Section 4.1.
4.5 Algorithm summary
Algorithm 1 summarizes the major steps of our proposed trajectory prediction algorithm. First, we train an attention-based model for sequence modeling, allowing us to capture the temporal dependencies associated with all given trajectories. Second, embedding vectors are extracted by the sequence model to represent the sub-trajectories. Based on these embedding vectors, a VAE is trained to cluster sub-trajectories, which, in turn, allows us to identify similar trajectories and group them together. Third, the embedding vectors learned from the sequence model are adopted to extract topological information by using a GNN. As a result, we can identify the topological structure of the road network. Such information is highly important for accurately predicting the trajectory of a moving object. Finally, the derived information, including the embedding vectors corresponding to road segments, the probabilities for trajectory clustering, and the road representation vectors capturing topology information, is integrated to train a new attention-based sequence model. The overall flow is shown in Fig. 5. Note that the proposed algorithm effectively combines sequence modeling, clustering and GNN to achieve accurate trajectory prediction. As will be demonstrated by our experimental results in Section 5, the proposed approach can achieve superior prediction accuracy over other traditional methods.
Given that our framework integrates various components for sequence prediction, trajectory clustering, and topological information extraction, it is able to incorporate diverse information sources, such as road lengths and traffic conditions. For example, the traffic status on each road segment can be represented as a road feature and seamlessly integrated through our GNN component. This adaptability extends the applicability of the proposed framework beyond trajectory prediction to other domains, such as traffic prediction and demand analysis.
5 Experimental results
In this section, we present the details of our datasets, experimental setups and important findings based on two public datasets.
5.1 Datasets
To evaluate the efficacy of our proposed method, numerical experiments are performed on two publicly available datasets: (i) the Porto dataset [45], and (ii) the Chengdu dataset [46]. Both datasets are composed of trajectories collected from taxi trips, where each trip is defined by a sequence of GPS coordinates. In our preprocessing, the GPS locations in each trip are mapped to the corresponding road segments by using a hidden Markov model, and then each trajectory is further cleaned by removing its adjacent duplicated road segments [10]. Each dataset is randomly split into a training set (80%), a validation set (10%), and a testing set (10%).
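The last cleaning step mentioned above, removing adjacent duplicated road segments, can be sketched in a few lines. The function below is a straightforward illustration of that step; it collapses the runs of repeated segments that map matching typically produces when a vehicle lingers on one road.

```python
def dedup_adjacent(segments):
    """Collapse runs of repeated road segments, keeping one copy per run.

    E.g. a map-matched sequence [5, 5, 7, 7, 7, 5] becomes [5, 7, 5];
    a later revisit of the same segment is intentionally preserved.
    """
    cleaned = []
    for s in segments:
        if not cleaned or cleaned[-1] != s:
            cleaned.append(s)
    return cleaned
```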
The Porto dataset is a collection of GPS trajectories from 442 taxis in Porto, Portugal [45]. The dataset is composed of over 1.7 million trips where GPS coordinates are recorded every 15 seconds for each trip. We select 15,216 trips that are located around the city center, involving 1,040 road segments. On the other hand, the Chengdu dataset contains 1.4 billion trips that are collected from 14 thousand taxis in Chengdu, China [46]. The GPS coordinates in each trip are recorded every 10 seconds. In this example, we select 31,655 trips that involve 10,130 road segments. Figure 6 shows a portion of the road network in each dataset.
These two datasets are chosen for our numerical experiments due to two reasons:

1.
Accessibility: Both datasets are readily available and free to use. Accessing other datasets, such as the Singapore dataset [10], requires an authorization process that is difficult for us to implement.

2.
Quantity and quality: Both datasets are composed of a substantial number of trajectories, which is crucial for training deep-learning-based models. Additionally, the trajectories in both datasets are recorded at short intervals of tens of seconds, ensuring continuity and accuracy. In contrast, other datasets such as the Beijing dataset in T-drive [10] are often recorded with longer intervals, leading to discontinuous trajectory sequences.
It is crucial to acknowledge the impact of several important aspects, including urban development, traffic regulations, seasonal variations, etc., in shaping trajectory patterns, which are not fully captured in these two datasets. Nevertheless, our focus in this work is primarily on topological information and these two datasets are used to validate the proposed approach. In our future research, we will further test additional datasets once they become available.
5.2 Experimental settings
To implement the proposed method, an Adam optimizer is utilized to train the sequence model, GNN, and VAE, with learning rates set as 0.0001, 0.001, and 0.0001, respectively. Additionally, the maximum number of training iterations is capped at 100, with early stopping implemented upon convergence of loss on the validation set. We use the testing set to estimate the final prediction accuracy. Table 1 summarizes the model parameters in our experiments.
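The training schedule described above, at most 100 iterations with early stopping on the validation loss, can be sketched generically as follows. The `patience` value is an assumption for illustration (the text only states that training stops upon convergence), and `train_step`/`val_loss_fn` are hypothetical callables standing in for one optimizer update and one validation evaluation.

```python
def early_stop_train(train_step, val_loss_fn, max_iters=100, patience=5):
    """Run up to max_iters iterations, stopping once the validation loss
    has not improved for `patience` consecutive iterations."""
    best, since_best = float("inf"), 0
    for it in range(max_iters):
        train_step(it)              # one pass of (e.g.) Adam updates
        loss = val_loss_fn()        # loss on the held-out validation set
        if loss < best - 1e-9:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break               # converged: stop early
    return best
```

The final prediction accuracy is then measured on the separate testing set, so the early-stopping criterion never sees the data used for evaluation.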
For testing and comparison purposes, several conventional methods are implemented:

1.
Markov model with clustering (ClusMM) [10]: Trajectories are clustered by using VAT [26], and then a first-order Markov model is trained for each cluster.

2.
Constrained state space RNN (CSSRNN) [15]: LSTM is adopted for trajectory prediction. In addition, the next road segment must be one of the neighbors of the current segment.

3.
Network-based trajectory prediction (NetTraj) [40]: The network structure is incorporated into a bidirectional LSTM for trajectory prediction. In the original method, trajectories are represented as sequences of road intersections and directions. To align with our context, the representation is adapted to sequences of road segments.

4.
Importance-based trajectory prediction (ImpRNN) [41]: Road importance is evaluated using a PageRank method and further integrated into LSTM for trajectory prediction. To accommodate our applications, trajectories are represented as sequences of road segments.

5.
Attention-based sequence model (ASM): The sequence model in Section 4.1 is adopted, without trajectory clustering and topology extraction.
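For reference, the first-order Markov component underlying the ClusMM baseline can be sketched as below. This is a simplified illustration on a single trajectory set; the actual baseline fits one such model per VAT-derived cluster.

```python
from collections import defaultdict

def fit_markov(trajectories):
    """Count first-order transitions between consecutive road segments."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return counts

def markov_predict(counts, current):
    """Most likely next segment given the current one (None if unseen)."""
    nxt = counts.get(current)
    if not nxt:
        return None
    return max(nxt, key=nxt.get)
```

The prediction depends only on the current segment, which is exactly the long-term-dependence limitation of low-order Markov models discussed in the introduction.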
5.3 Results and discussions
Figure 7 shows the prediction accuracy as a function of the prediction step \(N_{pred}\) for different methods on each dataset. Here, the prediction step \(N_{pred}\) refers to the number of future road segments that should be predicted. Table 2 presents the prediction accuracies when \(N_{pred}=1\) and \(N_{pred}=15\). Studying Fig. 7 and Table 2, one can make several important observations. First, our proposed approach outperforms ClusMM, CSSRNN, NetTraj, ImpRNN and ASM consistently. Its efficacy is most pronounced when the prediction step is large. On the Porto dataset, our proposed method improves the accuracy by up to 3.8% for 15-step prediction, compared to the other conventional methods. Second, the proposed trajectory clustering and topology extraction can effectively improve the prediction accuracy, as our proposed method outperforms ASM in both examples. Third, the scale of the road network presents a significant challenge, as the performance of each method on the large-scale Chengdu dataset is inferior to that on the Porto dataset.
Figure 8 presents the average distance error, measured by the distance between the centers of the predicted road segment and the true one. Due to the variations of road lengths, the distance error exhibits significant fluctuations. Nevertheless, it is observed that the proposed method achieves the lowest error, indicating its robust capability in continuous trajectory prediction.
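The distance error above is computed between segment centers expressed as GPS coordinates, for which the haversine formula is the natural choice. The sketch below assumes each center is given as (latitude, longitude) in degrees; how the paper represents centers internally is not specified, so this is an illustrative measure rather than the exact one used in Fig. 8.

```python
import math

def center_distance_m(pred_center, true_center):
    """Haversine distance (in meters) between the center coordinates of the
    predicted and true road segments, each given as (lat, lon) in degrees."""
    lat1, lon1 = map(math.radians, pred_center)
    lat2, lon2 = map(math.radians, true_center)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))  # mean Earth radius ~6371 km
```

Averaging this quantity over all test predictions yields an error in meters, which is why variations in road-segment length translate directly into the fluctuations observed in Fig. 8.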
To further illustrate the insights for trajectory clustering, a pairwise distance matrix of 500 randomly selected training trajectories is shown in Fig. 9. The distance values are normalized over [0, 1], and the trajectory indices are reordered so that trajectories with small distances are close to each other. In Fig. 9, the trajectories in the same dark block along the matrix diagonal are assigned to the same cluster. In these two examples, there are a large number of clusters. If the training data were explicitly partitioned over so many clusters, each cluster might contain an insufficient number of training trajectories, thereby resulting in poor prediction accuracy. This, in turn, explains why our proposed clustering approach with soft decisions is preferred over the conventional ClusMM method based on hard decisions. Comparing the two distance matrices, one can find that the large number of road segments in the Chengdu dataset imposes great challenges on trajectory clustering. As shown in Fig. 9(b), only small clusters can be identified along the diagonal of the distance matrix. This explains why our proposed method achieves accuracy similar to ASM without trajectory clustering in this example.
Figure 9 additionally depicts three trajectories belonging to the same cluster for each example. It is evident that these trajectories are concentrated within a confined region, indicating that the clustering results effectively capture the spatial proximity of trajectories. Moreover, substantial overlap among these trajectories suggests that the clustering results also capture their movement patterns, thereby facilitating trajectory prediction.
In this paper, as the true cluster labels are unknown, it is difficult to define a quantitative measure to assess clustering accuracy. In our future research, we will further study efficient ways to evaluate clustering accuracy and improve our proposed clustering algorithm.
6 Conclusions
In this paper, we present an integrated approach for accurate trajectory prediction by leveraging an attention-based model for sequence modeling, a VAE for trajectory clustering, and a GNN for topological information extraction. Our proposed self-attention model enables us to accurately capture the complex temporal dependencies in trajectory data. By incorporating the clustering information via a VAE and the topological information with a GNN, we can further improve the underlying representation and understanding when learning the trajectory patterns of interest. As demonstrated by the experimental results on two public datasets, our approach achieves superior accuracy over other conventional methods. This, in turn, highlights the importance of considering diverse knowledge and information when solving the problem of trajectory prediction.
In our future work, we will study efficient ways to evaluate trajectory clustering accuracy, and further scale and generalize the proposed method across large and diverse datasets. Furthermore, exploring additional information sources, such as road lengths and traffic information, may further improve the prediction accuracy. Overall, our proposed approach provides a scalable framework for future research and practical implementation of trajectory prediction.
Data Availability
The datasets used in our study are available at the following links: Porto dataset: https://www.kaggle.com/datasets/crailtap/taxi-trajectory. Chengdu dataset: https://pan.baidu.com/s/1OeNs36fZHEon2yNA2bhs9A with access code hqen.
References
Lamssaggad A, Benamar N, Hafid AS, Msahli M (2021) A survey on the current security landscape of intelligent transportation systems. IEEE Access 9:9180–9208
Kaffash S, Nguyen AT, Zhu J (2021) Big data algorithms and applications in intelligent transportation system: A review and bibliometric analysis. Int J Prod Econ 231:107868
Fatemidokht H, Rafsanjani MK, Gupta BB, Hsu CH (2021) Efficient and secure routing protocol based on artificial intelligence algorithms with uav-assisted for vehicular ad hoc networks in intelligent transportation systems. IEEE Trans Intell Transp Syst 22(7):4757–4769
Veres M, Moussa M (2019) Deep learning for intelligent transportation systems: A survey of emerging trends. IEEE Trans Intell Transp Syst 21(8):3152–3168
Zhang J, Wang FY, Wang K, Lin WH, Xu X, Chen C (2011) Datadriven intelligent transportation systems: A survey. IEEE Trans Intell Transp Syst 12(4):1624–1639
Huang Y, Du J, Yang Z, Zhou Z, Zhang L, Chen H (2022) A survey on trajectoryprediction methods for autonomous driving. IEEE Trans Intell Veh 7(3):652–674
Tan M, Shen H, Xi K, Chai B (2023) Trajectory prediction of flying vehicles based on deep learning methods. Appl Intell 53(11):13621–13642
Zhang X, Fu X, Xiao Z, Xu H, Qin Z (2022) Vessel trajectory prediction in maritime transportation: Current approaches and beyond. IEEE Trans Intell Transp Syst
Zhong G, Zhang H, Zhou J, Zhou J, Liu H (2022) Shortterm 4d trajectory prediction for uav based on spatiotemporal trajectory clustering. IEEE Access 10:93362–93380
Rathore P, Kumar D, Rajasegarar S, Palaniswami M, Bezdek JC (2019) A scalable framework for trajectory prediction. IEEE Trans Intell Transp Syst 20(10):3860–3874
Neumeier M, Botsch M, Tollkühn A, Berberich T (2021) Variational autoencoder-based vehicle trajectory prediction with an interpretable latent space. In: IEEE Int. Intell. Transp. Syst. Conf. IEEE, p 820–827
Bharilya V, Kumar N (2024) Machine learning for autonomous vehicle’s trajectory prediction: A comprehensive survey, challenges, and future research directions. Vehic Comm 100733
Asahara A, Maruyama K, Sato A, Seto K (2011) Pedestrian-movement prediction based on mixed markov-chain model. In: Proc. ACM SIGSPATIAL. p 25–33
Wang B, Hu Y, Shou G, Guo Z (2016) Trajectory prediction in campus based on markov chains. In: Proc. BigCom. Springer, p 145–154
Wu H, Chen Z, Sun W, Zheng B, Wang W (2017) Modeling trajectories with recurrent neural networks. In: Proc. 26th Int. Joint Conf. Artif. Intell., vol. 25. p 3083–3090
Ip A, Irio L, Oliveira R (2021) Vehicle trajectory prediction based on lstm recurrent neural networks. In: Proc. IEEE 93rd Veh. Technol. Conf. IEEE, p 1–5
Yan M, Li S, Chan CA, Shen Y, Yu Y (2021) Mobility prediction using a weighted Markov model based on mobile user classification. Sensors 21(5):1740
Chen X, Zhang H, Hu Y, Liang J, Wang H (2023) Vnagt: Variational nonautoregressive graph transformer network for multiagent trajectory prediction. IEEE Trans Vehic Technol
Yang C, Pei Z (2023) Long-short term spatio-temporal aggregation for trajectory prediction. IEEE Trans Intell Transp Syst 24(4):4114–4126
Sahadevan D, Ponnusamy P, Gopi VP, Nelli MK (2022) Ground-based 4d trajectory prediction using bi-directional lstm networks. Appl Intell 52(14):16417–16434
Hasan F, Huang H (2023) Mals-net: A multi-head attention-based lstm sequence-to-sequence network for socio-temporal interaction modelling and trajectory prediction. Sensors 23(1):530
Xu P, Hayet JB, Karamouzas I (2023) Context-aware timewise vaes for real-time vehicle trajectory prediction. IEEE Robot Automat Lett
Besse PC, Guillouet B, Loubes JM, Royer F (2016) Review and perspective for distancebased clustering of vehicle trajectories. IEEE Trans Intell Transp Syst 17(11):3306–3317
Choong MY, Angeline L, Chin RKY, Yeo KB, Teo KTK (2016) Vehicle trajectory clustering for traffic intersection surveillance. In: Proc. IEEE Int. Conf. Consum. Electron.-Asia. IEEE, p 1–4
Liu LX, Song JT, Guan B, Wu ZX, He KJ (2012) Tra-dbscan: an algorithm of clustering trajectories. Appl Mech Mater 121:4875–4879
Kumar D, Wu H, Rajasegarar S, Leckie C, Krishnaswamy S, Palaniswami M (2018) Fast and scalable big data trajectory clustering for understanding urban mobility. IEEE Trans Intell Transp Syst 19(11):3709–3722
Li X, Hu W, Hu W (2006) A coarse-to-fine strategy for vehicle motion trajectory clustering. In: Proc. 18th Int. Conf. Pattern Recognit., vol. 1. IEEE, p 591–594
Nanni M, Pedreschi D (2006) Timefocused clustering of trajectories of moving objects. J Intell Inf Syst 27:267–289
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proc. 18th Int. Conf. Data Eng. IEEE, p 673–684
Sousa RSD, Boukerche A, Loureiro AA (2020) Vehicle trajectory similarity: models, methods, and applications. ACM Comput Surv 53(5):1–32
Wang W, Xia F, Nie H, Chen Z, Gong Z, Kong X, Wei W (2020) Vehicle trajectory clustering based on dynamic representation learning of internet of vehicles. IEEE Trans Intell Transp Syst 22(6):3567–3576
Ferraro MB, Giordani P (2020) Soft clustering. Wiley Interdiscip Rev Comput Stat 12(1):1480
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Proc. Adv. Neural Inf. Process. Syst. 30:5998–6008
Zhang A, Zhang B, Bi W, Mao Z (2022) Attention based trajectory prediction method under the air combat environment. Appl Intell 52(15):17341–17355
Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT press, Cambridge
Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2016) Variational deep embedding: An unsupervised and generative approach to clustering. arXiv:1611.05148
Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2016) Variational deep embedding: A generative approach to clustering. CoRR. abs/1611.05148, p 1–8
Cheng H, Liu M, Chen L, Broszio H, Sester M, Yang MY (2023) Gatraj: A graph- and attention-based multi-agent trajectory prediction model. ISPRS J Photogramm Remote Sens 205:163–175
Westny T, Oskarsson J, Olofsson B, Frisk E (2023) Evaluation of differentially constrained motion models for graphbased trajectory prediction. In: 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, p 1–8
Liang Y, Zhao Z (2021) Nettraj: A network-based vehicle trajectory prediction model with directional representation and spatiotemporal attention mechanisms. IEEE Trans Intell Transp Syst 23(9):14470–14481
Guan L, Shi J, Wang D, Shao H, Chen Z, Chu D (2023) A trajectory prediction method based on bayonet importance encoding and bidirectional lstm. Expert Syst Appl 223:119888
Jiang W, Luo J (2022) Graph neural network for traffic forecasting: A survey. Expert Syst Appl 207:117921
Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y, et al (2018) Graph attention networks. In: Proc. Int. Conf. Learn. Representations. p 1–12
Salha G, Limnios S, Hennequin R, Tran VA, Vazirgiannis M (2019) Gravity-inspired graph autoencoders for directed link prediction. In: Proc. 28th ACM Int. Conf. Inf. Knowl. Manage. p 589–598
Moreira-Matias L, Gama J, Ferreira M, Mendes-Moreira J, Damas L (2013) Predicting taxi-passenger demand using streaming data. IEEE Trans Intell Transp Syst 14(3):1393–1402
Liu H, Jin S, Yan Y, Tao Y, Lin H (2019) Visual analytics of taxi trajectory data via topical sub-trajectories. Vis Inform 3(3):140–149
Author information
Authors and Affiliations
Contributions
Shuo Zhao: Methodology, Software, Writing-Original draft. Zhaozhi Li: Software, Writing-Review. Zikun Zhu: Software, Writing-Review. Charles Chang: Conceptualization, Methodology, Writing-Review. Xin Li: Conceptualization, Methodology, Writing-Original draft, Review & Editing. Ying-Chi Chen: Conceptualization, Methodology, Writing-Review. Bo Yang: Conceptualization, Methodology, Writing-Review.
Corresponding author
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical and informed consent for data used
The authors state that this research complies with ethical standards.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author selfarchiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, S., Li, Z., Zhu, Z. et al. An integrated framework for accurate trajectory prediction based on deep learning. Appl Intell 54, 10161–10175 (2024). https://doi.org/10.1007/s10489-024-05724-3