Abstract
Dynamic Graph Neural Network (DGNN) models have been widely used for modelling, prediction and recommendation tasks in domains such as ecommerce and social networks, due to their ability to capture node interaction features and temporal features. Current methods for dynamic graph representation learning mainly depend on querying Khop neighbors and the triadic closure law to derive node representations. However, as the number of layers in neural networks increases, this can cause problems with oversmoothing and overly complex calculations. Additionally, current models cannot ensure that events arrive at adjacent nodes in chronological order according to timestamps. To address these problems, we propose a Dynamic Graph Neural Network via Memory Regenerate and Neighbor Propagation(DGNNMN) model. The model presents a memory regeneration strategy for obtaining node time information features and a time edgepropagating method for obtaining neighbour information. By combining these two methods to fuse output vectors, it captures node feature representations. In addition, we present a strategy for the timestamp encoding of node messages, which effectively ensures that node messages propagate to neighboring nodes in an ordered manner according to timestamps, thereby better capturing the temporal characteristics of events in dynamic graphs. Extensive experiments conducted on five public datasets demonstrate the effectiveness of DGNNMN for link prediction and node classification task. Furthermore, the method outperforms other stateoftheart methods. The data and code are available on GitHub.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
A dynamic graph can be described as a temporal graph that is based on timestamped interactions. The key elements of a dynamic graph include the source node, target node, timestamp, and edge features. The primary task of dynamic graph embedding is to incorporate learning of node representations [1,2,3,4] that encode temporal and structural dependencies which leads to various downstream tasks. Traditional graph representation learning methods mostly rely on static graphs to solve problems. Dealing with nodes and edges in realworld scenarios may seem straightforward. However, it is important to recognise that they are dynamic and evolving. As a result, understanding the dynamic evolution patterns of nodes with graphs has emerged as an important area in the field of graph data analysis and mining [5].
With the advancement of deep learning and graph neural network research, the study of dynamic graphs has experienced rapid development. The primary structure of dynamic graphs is shown in Fig. 1(a). Existing research has typically used static graph models to analyse dynamic graphs, but this approach only produces static representations of nodes, missing a considerable amount of valuable dynamic information. In particular, as shown in Fig. 1(b), when time information is ignored and dynamic graphs are represented as static graphs, the sequence \(U2\rightarrow I1\rightarrow U5\) would be considered a valid interaction path. However, it is ineffective in terms of capturing temporal dynamics, because nodes have only one embedding and cannot represent temporal information. Therefore, timeaware prediction tasks in realworld scenarios cannot be effectively accomplished using static graph neural network models.
In realworld scenarios, the evolution of graphs is constantly changing over time, with nodes and edges appearing or disappearing as time progresses. For example, in recommendation systems, if a user shows significant interest in a particular item at time t and clicks on it (creating an edge), similar items (nodes) may appear at \(t+1\). To address such realworld problems, a significant body of research has used discrete dynamic graph approaches to capture episodic temporal information [6]. These methods represent dynamic graphs as a series of discrete static snapshots, and are often combined with RNN structures to model temporal sequences, capture dynamic properties on the graph, and learn temporal representations.
However, the episodic temporal information inevitably leads to the loss of node connectivity details, as shown in Fig. 1(c). Due to the segmentation of snapshots, certain effective pathways (e.g. \(U5\rightarrow I1\leftarrow U2\)) are not considered.
In order to solve the above problems, researchers have proposed the concept of continuous time dynamic graph [2, 3, 7, 8]. Chang et al. [9] proposed a selection mechanism to obtain neighbours of interacting nodes, which uses the information of interacting edges and dependent edges to capture local information while combining the information of nodes in different time periods to capture global information. Zhang et al. [4] proposed a stochastic wander strategy based on a temporal network to capture the dynamically changing information of the temporal network ( motifs) dynamically changing representations. Huang et al. [10] modelled the edge formation process using Hawkes process and predicted the links by combining dynamic events, dynamic nodes.
Historical data and past events are essential for future forecasting. Continuoustime dynamic graph models significantly enhance predictive accuracy capturing temporal dependencies among data [11]. Therefore, various domains, including weather forecasting and stock market prediction, benefit significantly from those models [4, 12]. The memory module, a crucial element in the majority of continuoustime dynamic graph models, is used to maintain past node state representations. The dynamic graph is modelled as a sequence of timestamped events, with techniques for updating and storing temporal information. For example, Rossi et al. [2] proposed a strategy for memory units to update the storage of temporal information, which fuses temporal information into the embedding of nodes; Xu et al. [3] proposed Khopping temporal neighbours for sampling strategy and learned temporal encoding to preserve temporal information in message passing; Wang et al. [13] proposed asynchronous propagation of attention graph neural network algorithm, which decouples model inference and graph computation, allowing graph computation to be executed independently without affecting the speed of model inference.
Despite the progress made by the aforementioned studies in advancing this field, two significant challenges still persist:
Challenge 1: Complex Dependencies in Dynamic Graph Models. Models such as TGN [2] and TGAT [3] suffer from the drawbacks of excessive smoothing and time consumption when the number of layers is too high during the graph query operations, which leads to models with complex dependencies.
Challenge 2: Stability problems in Dynamic Graph Models. Models such as EvolveGCN [14] and MERIT [15], which utilise the RNN structure [16, 17], are usually used to solve the longterm dependency problem prevalent in general recurrent neural networks, and cannot ensure that events arrive in timestamped order when applied to dynamic graphical neural networks, resulting in unstable model performance.
To address the aforementioned problems, we propose a novel Dynamic Graph Neural Network via Memory regenerate and Neighbor propagation model(DGNNMN). In this model, we first input the dynamic graph into the memory regeneration module to achieve the generation of memory vectors that fuse temporal information. Then, we innovatively propose a propagation strategy along the temporal edge to achieve the acquisition of neighbour information and generate node vectors that fuse neighbour information. The model is based on the multihead attention module to achieve the fusion of node vectors containing temporal information and node vectors containing neighbour information. Finally, the feedback strategy is used to achieve iterative updating of the node representation.
The contributions of the present work are summarized as follows.

We propose a model for gathering node information based on a memory update strategy. Then, we optimize node representations via a memory update strategy. To collect the original information, we use an information aggregation module to combine it with data from neighbour nodes.

We propose a method for retrieving neighbour information using a strategy called Neighbour Propagation. Our method stores vast amounts of highorder neighbour data through an independent neighbour information propagation module. This results in realtime model inference, achieved by directly accessing vector lists, thereby optimising execution efficiency.

We conducted extensive experiments on five realworld dynamic graph datasets, comparing DGNNMN with seven base methods on link prediction and node classification. The results demonstrate that DGNNMN outperforms the existing baseline models. In addition, the effectiveness of using the memory regenerate and neighbor propagation strategies are demonstrated.
The rest of the paper is organized as follows. Section 2 briefly introduces related work. Section 3 provides the basic definition of the problem. In Section 4, we describe the overall framework of DGNNMN and the introduction of each component. Section 5 presents the experiments and analysis. Finally, Section 6 summarizes the conclusions of this study and discusses the future work.
2 Related work
In this section, we provide an overview of some researches relevant to this paper, including Graph Neural Networks (GNNs) and Dynamic Graph Neural Networks(DGNNs).
2.1 Graph neural networks
Graph Neural Networks (GNNs), as discussed in reference [18], represent a class of deep learning models designed for graph data. They have the capability to learn intricate nonlinear relationships within graphs and effectively integrate both structural and featurebased information of nodes.
Graph neural networks were first proposed by Gori et al. [19] and aim to map nodes, edges and even the whole to a lowdimensional vector space and maintain the structure and properties of the graph itself, using deep learning methods. Later Thomas Kipf et al. [20] first proposed graph convolutional neural networks to learn node representations through the manipulation of convolutional layers, an approach that has been remarkably successful in handling tasks such as node classification and link prediction. Hamilton et al. [21] introduced the GraphSAGE model, which performs information aggregation directly on fixedsize node neighborhoods. Veličković et al. [22] proposed using a combination of node features and attention mechanisms to assign different weights to different nodes, with the learned representation vectors being more discriminative. Bhatti et al. [23] summarised the applications of graph neural networks in various research areas, analysed the current state of research and gave future research directions. Representative work related to graph neural also includes LEFTNet [24] etc.
Graph neural networks have been studied in heterogeneous graphs and dynamic graphs, etc., and have shown strong academic and application potentials in recent years [25, 26]. Wang et al. [27] proposed to learn heterogeneous graph node representations by aggregating the feature information of neighbour nodes under different metapaths according to the importance coefficients. Fu et al. [28] proposed to capture heterogeneous node information in the middle of metapaths using a relationshipaware approach, and proposed a knowledge distillation method to capturing relevant semantics. Li et al. [29] devised attributecompletion and attributeenhancement methods to learn information in heterogeneous graphs and used Transformer’s approach instead of the original graphattention approach. Yu et al. [30] proposed a new transformerbased architecture for dynamic graph learning, which enables the model to effectively benefit from a longer history. Saxena et al. [31] proposed a method based on network centrality and combined it with Graph Convolutional Networks (GCN) to predict the connections between network nodes.
As graph neural networks mature, they are beginning to be applied to a wider range of domains, e.g., Chen et al. [32] applied graph neural networks to predicting ocean surface temperatures by proposing a spatiotemporal attention graph convolutional network and introducing an attention mechanism to capture spatial dependence and temporal correlation. Such applications extend the domain of graph neural networks and make them a versatile tool for multiple domains.
2.2 Dynamic graph neural networks
Dynamic Graph Neural Networks(DGNNs) models allow us to model the evolution of nodes and edges in timeseries data, enabling a better understanding and prediction of temporal phenomena [33, 34].
Nguyen et al. [35] proposed a method to generate node embeddings by simulating temporal random wandering. Zhang et al. [6, 36, 37] modelled temporal dependence by extracting nodeinduced sequences using a coupled prediction network. However, these methods cannot be generalised to new nodes and edges and do not depict the evolving nature of the dynamical graph. Trivedi et al. [38] introduced an attention layer to aggregate node neighbourhood information for node embeddings. Rossi et al. [2] proposed a generalised induction framework which operates on a continuous time dynamic graph represented as a sequence of events. Xu et al. [3] proposed a memoryless approach where temporal information is incorporated into the Transformer model through Bochner’s theorem to perform selfattention operations to obtain node embeddings. Wang et al. [13] proposed an asynchronous propagation attention mechanism , so that the time graph aggregation operations can be done locally during online inference. Li et al. [39] proposed the use of parametric metrics to estimate the impact scores of nodes in an evolving graph, and further developed an efficient singlescan algorithm to ensure the validity of graph query results.
Although various models have developed embedding methods that effectively learn the dynamic behaviour of nodes, current research is focused on the design of appropriate models for the propagation and interaction of temporal information.
3 Preliminaries
In this section, we formally define several essential concepts and problem definitions related to Dynamic Graph Neural Networks.
Definition 1
Static Graph. A static graph is composed of a vertex set V(G) and an edge set E(G), denoted as \(G = (V, E)\). Here, E(G) is a finite set of edges, where edges are unordered pairs (for undirected graphs) or ordered pairs (for directed graphs) of vertices.
Definition 2
Continuous Time Dynamic Graph. A continuoustime dynamic graph can capture the complete temporal information of a dynamic network and is represented as \(G=\{(v_i,v_j,t,\Delta t,e_{ij} (t));i,j=1,2,\dots ,n\}\), where \(v_i\),\(v_j\) represent nodes in the network, t and \(\Delta \) \(t\) denote the appearance time and duration of a connection, and \(e_{ij} (t)\) represents an edge.
Neural networks used for supervised learning in dynamic graph scenarios can be viewed as a combination of an encoder and a decoder. Solving dynamic graph problems can be considered as finding an appropriate function that maps dynamic graph information into node embeddings, which is the role of the encoder. The decoder’s function is to take one or more node embeddings as input and make predictions for specific tasks, such as link prediction or node classification. The problem of Dynamic Graph Embedding was defined as follows.
Problem 1
Dynamic Graph Embedding. Given a dynamic graph G, there exists a mapping function \(f:V\rightarrow R^d\). Dynamic graph embedding aims to learn a mapping function f that projects nodes from V into ddimensional vectors. These lowdimensional vectors should capture heterogeneous information and highorder information within G.
The key notations used in this study are listed in Table 1.
4 DGNNMN model
In this section, we introduce a novel DGNNMN model designed for representation learning on continuoustime dynamic graphs with timestamped event sequences. For each time t, it generates graph node embeddings \(Z(t)=(z_1 (t),\dots ,z_{n(t)} (t))\).
The general framework (Fig. 2) consists of three key modules: Memory Regenerate Module (A), MultiHead Attention Module (B), and Neighbor Information Propagation Module (C). Specifically, in Module (A) captures initial information of nodes through a node information acquisition strategy and subsequently captures temporal information of nodes via a node information aggregation and update method. The Module (C) initially acquires neighbor information based on a timeedge propagation strategy and then stores neighbor information in a parallelized manner. The Module (B) builds current node embeddings on top of the Module (A) and Module (C) using multihead attention operations. Finally, the obtained node embeddings are decoded by a multilayer perceptron for downstream tasks, such as link prediction and node classification.
4.1 Memory regenerate module
An essential component of DGNNMN is the Memory Regenerate Module, which plays a crucial role in the acquisition of temporal interaction information related to nodes. The module consists of three main units: Node Information Capture, Information Aggregation and Memory Regenerate. The model efficiently performs parallel processing while learning node representations from the temporal characteristics of the data. The specific workflow is illustrated in Fig. 3.
4.1.1 Node information capture
When an event occurs, accompanied by an interaction between two nodes, for each node i involved in the event, messages need to be computed and their memories updated. At time t, when the original node i interacts with the target node j through event \(e_{ij}(t)\), two messages can be computed:
Similarly, if it is an event at the node level, denoted as \(v_i (t)\), a single message can be computed for the nodes involved in the event:
where \(s_i(t^ )\) represents the memory of node i before time t (containing all the information about node i before the last interaction). msg is a learnable message function implemented using MLPs. This module also supports edge deletion events.
In the edge deletion event \((i,j,t',t)\), the edge created between nodes i and j at time \(t'\) is deleted at time t. To handle this, the node and its incoming and outgoing edges are simply removed from the spatiotemporal graph. This ensures that the node is not utilized in the graph while calculating embeddings for other nodes.
4.1.2 Node information aggregation
Due to the possibility of a single node having multiple interactions with other nodes before time t, meaning a node may generate multiple events before time t, for computational efficiency, this module uses an average aggregation mechanism to aggregate messages \( U_i(t_1),..., U_i(t_n)\), where \(t_1,...,t_n \le t\) when generating messages for each event.
where agg represents an average aggregation function.
This function can also be implemented in a learnable manner, for instance using RNNs or graph attention mechanisms, but doing so would increase computational complexity.
4.1.3 Node memory update
The memory of a node is updated for every event involving the node itself. For interaction events involving two nodes, i and j, the memory of both nodes is updated after the event occurs. For nodelevel events, only the memories of the relevant nodes are updated. The specific implementation process is as follows:
where mem is a learnable memory regenerate function, in this case, we use a recurrent neural network, specifically LSTM.
4.2 Neighbour messaging module
The Memory Regenerate Module only considers the temporal dependencies between nodes. To address the issue of incomplete higher order information, we suggest implementing the Neighbour Information Propagation Module. In addition to storing a considerable amount of highlevel adjacent information, this module enables selection of important adjacent information. The process of twohop neighbour information propagation is illustrated in Fig. 4. When the Memory Regenerate Module generates a vector for realtime interaction between two nodes, a propagator first generates an interaction vector. This vector is then propagated through feedback operations to the vectors of the khop neighbours. DGNNMN does not need to query the information of neighbouring nodes in the graph, as this module independently stores neighbour information during model inference. Instead, it reads the stored information from the vector lists of neighbouring nodes for realtime model inference.
4.2.1 Neighbour messaging module
In Fig. 4, the red and blue nodes interact at time t. The information of the two nodes, as well as the features of their edges, can be jointly represented as an interaction tuple \((z_{i} (t),z_{j} (t),e_{ij} (t)\). Here, \(z_i (t)\) and \(z_j (t)\) are created by the attention encoder. The neighbor propagation process can be described using the following two mathematical equations:
where \(\varphi (\cdot )\) is referred to as the neighbor generation function, responsible for inductively summarizing neighbor information; \(N_{ij}^k\) represents the transmission of neighbor information between the interacting nodes; \(f(\cdot )\) is used for reasonable attenuation or mapping of messages; \(\rho (\cdot )\) is a dimension reduction function, used to aggregate multiple incoming neighbor information into one; \(\psi (\cdot )\) updates the information of past nodes \(N(t^)\). The specific explanations for the equations are as follows:

1)
The neighbor information is generated through the generation function \(\varvec{\varphi }\). To record what happens when a node is involved in an interaction, the elements of the interaction tuple are summed when a node is involved in an interaction, i.e. \(neighbour(t)=z_i(t)+e_{ij}(t)+z_i(t)\), and timestamped. Note that instead of cascading, direct summation is used, which limits vector dimensionality and saves vector list memory.

2)
Sampling subgraphs follows a temporal pattern for neighbor information propagation \(\varvec{N_{ij}^k}\). After the generation of neighbour information in the first step, there is a need to disseminate this information to other nodes. However, propagating a single neighbour’s information to all neighbours is inefficient. Therefore, to model rapidly changing trends and update node embeddings, we employ a newer neighbour sampling strategy. This strategy is more suitable for capturing timevarying information.

3)
The propagation function f applies appropriate attenuation or mapping to the messages. After sampling the neighboring nodes, the information of the neighboring nodes is strengthened and weakened by means of an identity mapping, which ensures that the propagation of the information of the nodes is strictly related to the structure of the original node and its neighboring nodes.

4)
At a particular time step, a node may receive numerous neighbor messages. We use an averaging strategy, called \(\rho \), to deal with the imbalance and maintain fairness. Nodes typically receive multiple vectors from their neighbours during the information propagation phase.To address the imbalance between active (high degree) and inactive (low degree) nodes nodes, we use a reduction function that consolidates multiple messages into one, each node receives only one message per batch.

5)
After receiving information, nodes summarize the historical states of their neighbors using a selfupdate function \(\varvec{\psi }\). To update the vector list, we use a queue data structure with a First Input First Output(FIFO) mechanism. This queue structure ensures that the latest information remains in the vectorlist and obsolete information gets discarded.
4.2.2 Vector list storage strategy
Considering that the arrival order of each neighbor’s information varies, it is necessary to apply positional encoding to each neighbor’s information, while simultaneously setting the number of neighboring nodes. The positional information is converted into a onehot format and then inputted into the embedding lookup layer using the absolute positional encoding method.
For a given input sequence of length n, let t represent the order of arrival of neighbors, and \(p_n\subseteq R^d \) represent the vector corresponding to position t, where d is the dimension of the vector. Function \(f:N \rightarrow R^d\) is used to generate the positional vector \(p_n\) and is defined as follows:
The frequency \(\omega _k\) is defined as follows:
From the function definition, it can be observed that the frequency decreases along the vector dimension. As a result, it forms a geometric series in terms of wavelength, ranging from \(2\pi \) to \(1000 \cdot 2\pi \). The positional encoding \(p_n\) is a combination of cosine and sine functions for each frequency, and d is divisible by 2.
For a node’s information \(N(t)=(neighbor_1 , neighbor_2,\cdots ,\)\( neighbor_m)\), the positional encoding layer combines the positional information into the original vector list as follows:
where \(\widehat{N(t)}, N(t) ,P(t) \in R_n \times d\), with n being the maximum length of the vector list and d representing the vector dimension.
4.3 Multihead attention process
We employ a multihead attention mechanism to compel the model to learn different aspects of knowledge. The multihead attention achieves this by utilizing multiple independent attention heads, which compute attention weights separately, and then concatenate or weightsum their results to obtain richer representations, as illustrated in Fig. 5.
The primary computational process is represented through the following equations:
where Q represents Query, which can be interpreted as whether there is an association between the current node and other nodes; K represents Key, understood as the key information of the node when matched with Q; V represents Value, interpretable as the node’s important features. The dot product value between ‘Q’ and ‘K’ determines the contribution to the final output of ‘V’; the larger the dot product, the greater its contribution to V. \(W_Q, W_K, W_v \in R^{d \times d_h}\) are learnable weight matrices used to capture the interaction between node information and the information of neighboring nodes after positional encoding, where \(d_h\) is the output dimension. \(\sqrt{d}\) serves as a scaling factor for data normalization. Through this mechanism, the attention module can determine how to update node embeddings based on both the node itself and its neighboring information.
It is necessary to use a normalisation strategy to limit the mean and variance of the outputs, as attentional outputs may differ between nodes. As complex attention mechanisms can disrupt the statistical distribution within piles, this paper uses layer normalisation in its calculations. Layer normalization achieves this goal by computing the mean and variance as follows:
where d represents the dimension of the input to layer normalization, \(\mu \) and \(\sigma \) represent the mean and variance shared by all hidden units in the layer. \(\odot \) denotes elementwise multiplication between two vectors. The learnable parameters b and g are defined as bias and gain to ensure that the normalization operation does not affect the original information.
4.4 Model training
For various downstream tasks, it is imperative to choose diverse loss functions and metrics to appraise the performance of different models. A timevarying negative sampling strategy has been proposed by us to construct positivenegative sample pairs for training using crossentropy loss function.
where, the summation occurs when nodes i and j interact during training. \(\sigma \) represents the sigmoid activation function, and \(p_n (v)\) is the negative sampling distribution. The overall flow of DGNNMN is presented in Algorithm 1.
5 Experiments
In this section, we introduce the datasets used for the experiment and thecomparison algorithm and evaluation index. Then the effectiveness of the proposed model is verifed by edge prediction, node classification, ablation experiment, hyperparameter analysis and time complexity analysis.
5.1 Datasets
In this paper, we utilize five realworld dynamic graph datasets to evaluate the effectiveness of the model. Table 2 presents the detailed information about these datasets.

Reddit^{Footnote 1}: This dataset represents data about posts made by users on subreddits within a month. Its nodes correspond to users and posts, and its edges indicate posting requests with timestamps.

Wikipedia^{Footnote 2}: This dataset covers edits made to Wikipedia pages within a month. Its nodes represent human editors and Wikipedia pages, with edges representing edits with timestamps..

Enron^{Footnote 3}: This dataset is a communication network, with edges representing email communication between core employees over several years.

MOOC^{Footnote 4}: This dataset involves online courses, where nodes denote students and course content units, such as videos and problem sets, and edges indicate students’ access to specific units.

Meituan^{Footnote 5}: This dataset is a collection of food preference data from the Meituan platform, with nodes representing users and dishes, and edges representing user click behavior.
5.2 Baselines
We compare the proposed method with nine dynamic graph representation learning models, categorized based on how they handle the data: discrete dynamic graph representation learning models: EvolveGCN [14], and continuoustime dynamic graph representation learning models: JODIE [1], TGAT [3], Dyrep [38], TGN [2], APAN [13], MERIT [15], EdgeBank [40], DyGformer [30].

JODIE(2019) [1]: Two recurrent neural networks are used to update the embedding of users and items at each interaction. A novel projection operator is introduced that learns and estimates the user’s embedding at any time in the future, and therefore models the future embedding trajectories of the user and item.

Dyrep(2019) [38]: Considering the changes in network topology and node interactions, the dynamic network can be divided into two processes: the association process and the communication process. This suggests that learning can act as a mediator that effectively connects the two processes.

EvolveGCN(2020) [14]: A graph convolutional network model is employed in the time dimension without the aid of node embedding. The method evolves the GCN parameters by using RNN to capture the dynamics of the graph sequence.

TGAT(2020) [3]: Node embeddings should incorporate both static node features and dynamic topological features. Based on the selfattention mechanism and leveraging the classic Bochner theorem from harmonic analysis, we have developed a novel temporal encoding technique.

TGN(2020) [2]: A dynamic graph model that is both general and efficient has been proposed. The model incorporates memory modules and graph convolution operations, enabling it to learn from the data order while maintaining efficient parallel processing.

APAN(2021) [13]: It introduces an asynchronous propagation attention graph neural network algorithm that decouples model inference from graph computations, enabling independent execution of graph computations without impeding the speed of model inference.

MERIT(2022) [15]: By employing a contextaware attention mechanism to sequentially locate the most relevant neighbors in a temporal order, we jointly capture contextual information at multiple levels on the temporal graph. Ultimately, multiple aggregation and propagation steps are carried out to explore and leverage highlevel structural information for downstream tasks.

EdgeBank(2022) [40]: Two visualisation techniques are proposed to understand recurring patterns of edges over time, and a more rigorous evaluation procedure is proposed to predict specific dynamic maps.

DyGformer(2023) [30]: Extensible coding interfaces and comprehensive evaluation protocols are proposed, demonstrating their effectiveness in capturing node correlation and longterm temporal dependencies.

OUR : We introduce an approach that employs memory regeneration alongside neighbour propagation. Initially, the memory regenerate mechanism effectively captures node temporal features. Then, neighbour propagation independently stores highlevel neighbour information. Finally, multihead attention selects critical information to improve node representations.
5.3 Evaluation metrics
In this paper, we employ Average Precision (AP) and Area Under Curve (AUC) as the evaluation metrics.
Among these, AP can be utilized to assess the detection performance of a system, specifically its ability to correctly predict targets across different thresholds within the test sample. It results in the PrecisionRecall (PR) curve, computed based on the following formula:
By integrating the area under the PR curve, one can obtain the AP value for that particular category, as expressed by the following formula:
In this context, TP stands for true positives, FP represents false positives, FN denotes false negatives, P corresponds to precision, and R signifies recall.
AUC is employed to assess the performance of binary classification models. It results in the Receiver Operating Characteristic (ROC) curve, calculated based on the following formula:
By integrating the area under the Receiver Operating Characteristic (ROC) curve, one can obtain the AUC value for that particular category, as expressed by the following formula:
In this context, FP represents false positives, TN denotes true negatives, TP stands for true positives, FN signifies false negatives, F corresponds to false positive rate (FPR), and T represents true positive rate (TPR).
5.4 Experimental settings
For the five datasets mentioned above, we employed the Adam optimizer with a learning rate set to 0.0001 for training. The data was split into training, validation, and testing sets with a ratio of 70\(\%\), 15\(\%\), and 15\(\%\), respectively. A batch size of 200 was used during training. To prevent overfitting, a dropout rate of 0.1 was applied, and early stopping was set with a patience of 5. The number of attention heads was configured as 2, and the messagepassing layers were set to 2. In both the encoder and decoder, a twolayer feedforward neural network (MLP) was used, and no extensive hyperparameter tuning was conducted in the experimental results.
5.5 Link prediction
We employed AP and AUC as evaluation metrics to assess the model’s performance in link prediction tasks. In this task, we estimated the standard deviation of the samples (across more than 10 random seeds) and reported the average AP and AUC results as percentages.
We treat the observed node future link prediction as a transductive learning task and the future link prediction on unseen nodes as an inductive learning task. The transductive learning results are presented in Table 3, and the inductive learning results are shown in Table 4. The best results are indicated in bold, while the secondbest results are underscored.Since the source code for the MERIT model is not provided, only a portion of the experimental results are shown.
The results in Tables 3 and 4 demonstrate the superior performance of our approach in inductive and transductive learning tasks. In the transductive learning task, the proposed DGNNMN model consistently outperforms all baseline models in link prediction. Compared to models like APAN that do not require memory regenerate, DGNNMN achieves a maximum improvement of 6.63\(\%\) and 4.01\(\%\) in AP and AUC metrics, respectively. This result underscores the significant enhancement that the construction of memory regenerate units brings to DGNNMN by effectively preserving node information.
In the inductive learning task, DGNNMN demonstrates improvements in both AP and AUC, surpassing most models and significantly outperforming classical models like TGAT and TGN. For these two metrics, it shows improvements of at least around 5\(\%\), highlighting the effectiveness of the position encoding in the neighbor information propagation module and the design of vector list queries in compensating for the drawback of limited neighbor information acquired through memory regenerate. We compared DGNNMN with DyGformer(2023) and EdgeBank(2022), and we can find that our model overall outperforms the newer model, obtaining good performance results. For the MOOC and Enron datasets, DGNNMN achieves a maximum improvement of 8.59\(\%\) in AP and 14.08\(\%\) in AUC. This underscores the significant advantages of our model in scenarios with fewer nodes but complex interactions among them, validating the effectiveness of DGNNMN.
In summary, the DGNNMN model demonstrates outstanding performance in both inductive and transductive learning tasks, specifically in managing node information through memory regeneration and in situations with fewer nodes but intricate connections. These findings validate the effectiveness of the model and its potential for a broad range of applications.
5.6 Node classification
We also conducted specific experiments on the downstream task of node classification, where the dataset was divided into training, validation, and test sets with a ratio of 70\(\%\), 15\(\%\), and 15\(\%\). For the node classification task, we estimated the standard deviation of the samples (across more than 10 random seeds) and reported the AUC results as percentages. The results are presented in Table 5, with the best results indicated in bold and the secondbest results underscored.
As we can see from Table 5, it is evident that DGNNMN performs well in node classification tasks. For the Reddit dataset, which has a larger dataset and more edge features, the MERIT model uses the Coco attention mechanism to capture multilevel context effectively, resulting in better node classification performance, about 5.34\(\%\) higher than DGNNMN that employs only multihead attention. However, for the Wikipedia and Meituan datasets, DGNNMN achieves a maximum improvement of 1.93\(\%\) and 4.08\(\%\) in AUC, respectively. This suggests that the embeddings learned after memory regenerate and neighbor propagation are capable of performing well in node classification tasks.
To summarise, the DGNNMN model performs adeptly in classification tasks for nodes, particularly on specific datasets where its value is increased through learning features for embedding via memory regeneration and neighbour propagation, resulting in its superiority over other models. Nevertheless, on occasions such as the Reddit dataset, other models may outperform it based on criteria associated with the dataset and task.
5.7 Ablation experiments
To assess the contribution of each module used in the model to the final recommendation performance, we conducted ablation experiments by removing specific modules. In this section, we designed three variants to evaluate the extent of performance improvement from different modules in DGNNMN. The symbols used in these experiments are detailed in Table 6, and the results are presented in Fig. 6.
As shown in Fig. 6, the performance advantage of the DGNNMN model across four different datasets, especially in comparison to its variant models, is evident. This indicates that different components play varying roles in improving the overall model performance. It also implies the need to comprehensively consider the effects of each component when constructing and optimizing dynamic graph models to better leverage information within dynamic graphs.
Due to the unique structure of the MOOC dataset, which has fewer nodes but extensive interactions between them, the neighbor propagation module plays a crucial role in improving the performance on this dataset. This underscores the idea that the model may exhibit different advantages on different datasets due to variations in data structure.
In conclusion, the results of the ablation experiments highlight the excellent performance of the DGNNMN model across multiple datasets, while emphasizing the importance of considering dataset characteristics and the interplay of different components in the model’s design and optimization.
5.8 Parameter analysis
In this section, we investigated the hyperparameter of neighbor sampling. We set the neighbor sampling size to 0, 10, 20, and 30 to explore the specific impact of this hyperparameter on the model. The results are presented in Fig. 7.
As is shown in the Fig. 7, we can observe that as the number of neighbors increases beyond 10, the experimental results on the Wikipedia and MOOC datasets remain almost constant, with the curve in a very stable state, indicating minimal impact on the model’s robustness. However, for the Enron dataset, the experimental results clearly show an upward trend. This suggests that the setting of the neighbor sampling size is of great significance to the overall model.
5.9 Model efficiency analysis
In the link prediction task, we computed the memory and training time for each epoch in the Wikipedia dataset. Each batch contained 200 interactions, and the model utilized two network layers. Models with lower memory consumption are positioned closer to the lowerleft corner. The horizontal axis represents the average time required for model training epochs, and the vertical axis indicates the memory required for training the model. To ensure the fairness of the experiments, all experiments were conducted on a Windows system with an RTX3060 GPU. The experimental results are depicted in the Fig. 8.
As shown in Fig. 8, it is evident that among all models, DGNNMN has the shortest training time and relatively lower memory consumption. This is because DGNNMN does not need to query neighboring node information in the graph; it simply reads information from the stored vector lists, significantly reducing computational overhead. Additionally, TGAT is the most competitive model among those based on continuous dynamic graphs, but due to its extensive attention coefficient calculations between nodes, the graph querying process is more timeconsuming.
6 Conclusion
In this paper, we present the DGNNMN framework, a dynamic graph neural network approach that addresses the challenges of the intricate dependencies and longterm dependencies of dynamic graph data. In response to the first chalenge, We propose a model for gathering node information based on a memory update strategy and optimize node representations via a memory update strategy. For the second challenge, We propose a method for retrieving neighbour information using a strategy called Neighbour Propagation, this module stores vast amounts of highorder neighbour data through an independent neighbour information propagation module. Finally, through a multihead attention mechanism, we achieve fast learning of dynamic graphs. Comprehensive experiments on real datasets show that DGNNMN outperforms current stateoftheart methods.
In our future work we will conduct indepth research and exploration of the scalability of the model to provide more comprehensive and enhanced evidence to support the scalability of the proposed method, and also explore more practical application scenarios, especially in the areas of Dynamic Network Analysis, Social Network Mining, and Recommender Systems, to validate the applicability and scalability of the method in different scenarios.In addition, investigating the development of visualisation tools and interpretability measures to help endusers understand dynamic graph models is an important focus for further research and applications.
Availability of supporting data
The datasets used in the experiments are publicly available in the online repository.
References
Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. pp 1269–1278
Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M (2020) Temporal graph networks for deep learning on dynamic graphs. arXiv:2006.10637
Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K (2020) Inductive representation learning on temporal graphs. arXiv:2002.07962
Zhang Y, Xiong Y, Li D, Shan C, Ren K, Zhu Y (2021) Cope: modeling continuous propagation and evolution on interaction graph. In: Proceedings of the 30th ACM international conference on information & knowledge management. pp 2627–2636
Ma Y, Guo Z, Ren Z, Tang J, Yin D (2020) Streaming graph neural networks. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. pp 719–728
Zhang Z, Bu J, Ester M, Zhang J, Yao C, Li Z, Wang C (2020) Learning temporal interaction graph embedding via coupled memory networks. Proc Web Conf 2020:3049–3055
Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K (2019) Selfattention with functional time representation learning. Adv Neur Inf Process Syst 32
Hu W, Yang Y, Cheng Z, Yang C, Ren X (2021) Timeseries event prediction with evolutionary state graph. In: Proceedings of the 14th ACM international conference on web search and data mining. pp 580–588
Chang X, Liu X, Wen J, Li S, Fang Y, Song L, Qi Y (2020) Continuoustime dynamic graph learning via neural interaction processes. In: Proceedings of the 29th ACM international conference on information & knowledge management. pp 145–154
Huang W, Zhang T, Rong Y, Huang J (2018) Adaptive sampling towards fast graph representation learning. Adv Neur Inf Process Syst 31
Kang H, Ho JH, Mesquita D, Pérez J, Souza AH (2021) Online graph nets
Zhang M, Wu S, Yu X, Liu Q, Wang L (2022) Dynamic graph neural networks for sequential recommendation. IEEE Trans Knowl Data Eng 35(5):4741–4753
Wang X, Lyu D, Li M, Xia Y, Yang Q, Wang X, Wang X, Cui P, Yang Y, Sun B, et al. (2021) Apan: Asynchronous propagation attention network for realtime temporal graph embedding. In: Proceedings of the 2021 international conference on management of data. pp 2628–2638
Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, Kaler T, Schardl T, Leiserson C (2020) Evolvegcn: Evolving graph convolutional networks for dynamic graphs. Proc AAAI Conf Artif Intell 34:5363–5370
Hu B, Wu Z, Zhou J, Liu Z, Huangfu Z, Zhang Z, Chen C (2022) Merit: Learning multilevel representations on temporal graphs. IJCAI
Hochreiter S, Schmidhuber J (1997) Long shortterm memory. Neural Comput 9(8):1735–1780
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2020) Graph neural networks: A review of methods and applications. AI open. 1:57–81
Gori M, Monfardini G, Scarselli F (2005) A new model for learning in graph domains. In: Proceedings. 2005 IEEE international joint conference on neural networks, 2005., vol. 2. IEEE, pp 729–734
Kipf TN, Welling M (2016) Semisupervised classification with graph convolutional networks. arXiv:1609.02907
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neur Inf Process Syst 30
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio, Y (2017) Graph attention networks. arXiv:1710.10903
Bhatti UA, Tang H, Wu G, Marjan S, Hussain A (2023) Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. Int J Intell Syst 2023:1–28
Du Y, Wang L, Feng D, Wang G, Ji S, Gomes CP, Ma ZM et al (2024) A new perspective on building efficient and expressive 3d equivariant graph neural networks. Adv Neur Inf Process Syst 36
Veličković P, Fedus W, Hamilton WL, Liò P, Bengio Y, Hjelm RD (2018) Deep graph infomax. arXiv:1809.10341
Xia W, Li Y, Tian J, Li S (2021) Forecasting interaction order on temporal graphs. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. pp 1884–1893
Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS (2019) Heterogeneous graph attention network. In: The world wide web conference. pp 2022–2032
Fu J, Li C, Zhao Z, Zeng Q (2023) Heterogeneous graph knowledge distillation neural network incorporating multiple relations and crosssemantic interactions. Inf Sci 658:120004
Li C, Fu J, Yan Y, Zhao Z, Zeng Q (2024) Higher order heterogeneous graph neural network based on node attribute enhancement. Expert Syst Appl 238:122404
Yu L, Sun L, Du B, Lv W (2024) Towards better dynamic graph learning: New architecture and unified library. Adv Neur Inf Process Syst 36
Saxena R, Pati SP, Kumar A, Jadeja M, Vyas P, Bhateja V, Lin JCW (2023) An efficient betgcn approach for link prediction. IJIMAI. 8(1):38–52
Chen D, Wen J, Lv C (2023) A spatiotemporal attention graph convolutional networks for sea surface temperature prediction
Zhang Y, Xiong Y, Liao Y, Sun Y, Jin Y, Zheng X, Zhu Y (2023) Tiger: temporal interaction graph embedding with restarts. Proc ACM Web Conf 2023:478–488
Luo L, Haffari G, Pan S (2023) Graph sequential neural ode process for link prediction on dynamic and sparse graphs. In: Proceedings of the sixteenth ACM international conference on web search and data mining. pp. 778–786
Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2019) Continuoustime dynamic network embeddings. Comp Proc Web Conf 2018:969–976
Zhang Y, Xiong Y, Kong X, Niu Z, Zhu Y (2019) Ige+: A framework for learning node embeddings in interaction graphs. IEEE Trans Knowl Data Eng 33(3):1032–1044
Zhang Y, Xiong Y, Kong X, Zhu Y (2017) Learning node embeddings in interaction graphs. In: Proceedings of the 2017 ACM on conference on information and knowledge management. pp 397–406
Trivedi R, Farajtabar M, Biswal P, Zha H (2019) Dyrep: Learning representations over dynamic graphs. In: International conference on learning representations
Li Y, Shen Y, Chen L, Yuan M (2023) Zebra: When temporal graph neural networks meet temporal personalized pagerank. Proc VLDB Endow 16(6):1332–1345
Poursafaei F, Huang S, Pelrine K, Rabbany R (2022) Towards better evaluation for dynamic link prediction. Adv Neural Inf Process Syst 35:32928–32941
Funding
This work is supported by National Key R&D Program of China(Grant No.2022ZD0119501); National Natural Science Foundation of China (Grant No.62072288, 52374221); the Natural Science Foundation of Shandong Province (Grant No.ZR2022MF268, ZR2021QG038); ShandongYouth Innovation Team; the Taishan Scholar Program of Shandong Province (Grant No.tsqn202211154, ts20190936), the ‘Qunxing Plan’ project of educational and teaching research of Shandong University of Science and Technology(Grant No. QX2020Z12).
Author information
Authors and Affiliations
Contributions
Chao Li, Runshuo Liu, Jinhu Fu, Zhongying Zhao, Hua Duan, Qingtian Zeng wrote the main manuscript text; Runshuo Liu and Jinhu Fu prepared the result of our experiments; All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Ethical Approval
Not applicable.
Consent to participate
There is the consent of all authors.
Human and Animal Ethics
Not applicable.
Consent for publication
There is the consent of all authors.
Competing interests
The authors declare that there is no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author selfarchiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, C., Liu, R., Fu, J. et al. DGNNMN: Dynamic Graph Neural Network via memory regenerate and neighbor propagation. Appl Intell 54, 9253–9268 (2024). https://doi.org/10.1007/s10489024055003
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489024055003