Abstract
Given the accuracy and high efficiency of machine learning methods compared with classical numerical methods, the cell-average based neural network (CANN) method is proposed, opening up a new avenue for numerically solving and simulating nonlinear Schrödinger (NLS) equations. Inspired by the finite volume method for fluid flow problems, the CANN method explores shallow and fast neural network solvers to approximate the difference of solution averages between two consecutive time levels. It can be viewed as a time discretization scheme, namely an explicit one-step method that evolves the solution forward in time. The CANN method is a network method of the finite volume type, which means that it is mesh dependent and acts as a local solver. The experimental results demonstrate that the CANN method yields satisfactory outcomes for different time step sizes within the specified time window. Once the neural network has been effectively trained, it can be applied to solve the NLS equation with different initial conditions. Furthermore, the CANN method demonstrates strong generalization ability in processing low-quality data with noise. To broaden the applicability of the CANN method to partial differential equations, we carry out numerical experiments on the NLS equation, which demonstrate the high practicality and accuracy of this method.
1 Introduction
In recent years, increasing attention has been devoted to the study of nonlinear models in optical fiber communication, plasma, condensed matter physics, and fluid mechanics [1,2,3,4]. It is well known that the nonlinear Schrödinger (NLS) equation is widely used to describe the quantum behavior of microscopic particles in quantum mechanics [5,6,7]. Furthermore, exact solutions of this equation can be used to describe nonlinear phenomena in other physical fields, such as optics, plasma, Bose–Einstein condensates, fluid mechanics and Heisenberg ferromagnets [8,9,10,11,12], for example the transmission of light pulses in dispersive and nonlinear media, the motion of superconducting electrons in electromagnetic fields, and the propagation of thermal pulses. In all of these studies, the NLS equation has important mathematical and physical significance for the simulation of various physical phenomena [13], driving research forward in both depth and breadth.
It is not surprising that certain numerical methods have been employed in the development of relatively accurate models for the NLS equation. Eight finite difference schemes for the NLS equation are compared and analyzed in [14], and the physical properties of solitary-wave solutions are studied in [15]. Reference [16] generalizes the split-step finite difference method to solve the scalar NLS equation and the coupled NLS equations. A local meshless method for spatial discretization of NLS equations is proposed in [17], while integrators are used in temporal discretization. Additional finite difference methods are utilized in the works [14, 16, 18]. In contrast to the finite difference methods, the Refs. [19,20,21,22] employ the Galerkin method, which directly addresses the original governing equation in integral form. Moreover, some comprehensive methods and other numerical methods can be found in [13, 23,24,25] and [26,27,28,29,30,31,32,33], respectively. While classical numerical methods have yielded abundant results, there remains a need to develop a neural network approach for studying the NLS equation that excels in stability, accuracy, and efficiency when compared to established classical numerical methods.
As one of the principal models for realizing machine learning, neural networks have been widely used to solve different kinds of PDEs in recent years. The physics-informed neural networks (PINNs) have been applied to approximate solutions of the NLS equation [34,35,36,37]. The PINNs method has been employed for simulating various phenomena, including rogue wave solutions, soliton solutions of higher-order NLS equations, data-driven solutions of the Sasa–Satsuma equation, and solutions of the logarithmic NLS equation featuring \(\mathcal{P}\mathcal{T}\)-symmetric potential, respectively [38,39,40,41,42,43]. In addition, there are also some other methods which are combined with neural networks [44,45,46,47,48,49,50,51]. Chen et al. enhanced the training accuracy of PINNs for NLS equations through the innovative application of multi-view transfer learning [52]. Jaganathan et al. demonstrated the capability of PINNs to accurately solve coupled NLS equations [53]. Furthermore, Bogdanov et al. introduced a novel approach to phase computation for finite-genus solutions to the NLS equation, employing convolutional neural networks [54]. Although the PINN method has the advantages of being mesh-free and using automatic differentiation, it can only approximate the solution of the differential equation within a fixed time window, which entails certain inefficiencies and limitations. Therefore, we intend to further improve the neural network method so that it can be widely applied to various cases.
In this paper, we consider the following (1+1)-dimensional nonlinear Schrödinger equation:
[32] where u is a complex-valued function of x and t, \(\alpha , \beta \) and \(\gamma \) are real coefficients, and \(i=\sqrt{-1}\).
Based on the finite volume method for solving fluid flow problems, we propose the cell-average based neural network (CANN) method as a novel approach for solving the NLS equation. The idea of the CANN method [55, 56] is to explore a network that approximates the difference of solution averages between two neighboring time steps. Once well trained, the CANN method is implemented as a conventional explicit one-step finite volume scheme. Training the CANN amounts to searching for a suitable scheme for the given PDEs. Unlike large global computational networks, the CANN method establishes a local, network-based solver, which constructs a network between adjacent time levels and transfers the well-trained network parameters to the next time interval. The key advantage of the CANN method is its explicit scheme, which, once well trained, evolves the solution forward in time explicitly. After training, the CANN method can accommodate larger time step sizes (e.g. \(\Delta t = 4 \Delta x\)), rendering it exceptionally fast and efficient, especially when applied to higher-order PDEs.
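To make the explicit one-step character of the scheme concrete, the following minimal sketch (Python with NumPy; all names are illustrative and not taken from the authors' implementation) marches the cell averages forward in time with an already trained network, assuming periodic boundary conditions and a stencil of \(p+q+1\) neighboring cell averages as input:

```python
import numpy as np

def cann_march(u_bar0, network, n_steps, p=3, q=3):
    """March cell averages forward in time with a trained CANN solver.

    u_bar0  : array of J initial cell averages
    network : trained map from a stencil of p + q + 1 averages to the
              average difference between two consecutive time levels
    """
    u_bar = np.asarray(u_bar0, dtype=float).copy()
    J = u_bar.size
    for _ in range(n_steps):
        diff = np.empty_like(u_bar)
        for j in range(J):
            # periodic ghost cells: wrap the stencil indices around the domain
            stencil = u_bar[np.arange(j - p, j + q + 1) % J]
            diff[j] = network(stencil)      # N(V_j^in; Theta)
        u_bar = u_bar + diff                # explicit one-step update
    return u_bar
```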
Following a series of numerical experiments, our observations indicate that a well-trained neural network exhibits the capability to accurately predict the solution function’s trend for time intervals beyond those included in the training set. Remarkably, even when changing the initial condition u(x, 0) or adjusting the time increment, favorable outcomes can be achieved without necessitating a retraining of the neural network. To further explore the generalization ability of the CANN method, we apply it to deal with some low-quality initial training set with noise. Notably, the CANN method effectively captures the trend of the main wave propagation, demonstrating its robust performance within a specified margin of error.
This paper is organized as follows. In Sect. 2, we detail the CANN method and how it is specifically applied to NLS equation. In Sect. 3, we clarify the setup of the neural network structure, and the algorithm for training the network. In Sect. 4, we simulate several different wave solutions of the NLS equation using the CANN method. Finally, we conclude in Sect. 5.
2 CANN method
In this section, we detail the principles and mechanisms of the CANN method and clarify the connection between the Schrödinger equation solver and the CANN method.
2.1 Problem setup and motivation
We consider solving a class of partial differential equations
where x and t denote the spatial and temporal variables, respectively, and \(\Omega \) represents the spatial domain. The operator \({\mathcal {L}}\) symbolizes a generic higher-order differential operator encompassing nonlinear terms. The presence of these nonlinear terms often makes it challenging to determine exact solutions for such partial differential equations. Consequently, deep neural networks consistently emerge as a viable approach for finding solutions. In this paper, we focus on the widely recognized nonlinear one-dimensional time-dependent Schrödinger equation, which is defined as
Here T represents the final time, \(i=\sqrt{-1}\), and u(x, t) is a complex-valued function. It is notable that when \(\beta \) equals zero, Eq. (2.2) reduces to the linear Schrödinger equation.
The CANN method is driven by the finite volume scheme. Our construction of the neural network is guided by traditional numerical methods, which build schemes according to the characteristics of PDE solutions. Suppose that the one-dimensional spatial domain \(\Omega =\left[ a,b \right] \) is divided into J cells; then each cell has size \(\Delta x=\frac{b-a}{J}\). Let the jth cell be \(I_{j}=[x_{j-1/2},x_{j+1/2}]\), so that \(\left[ a,b \right] =\bigcup _{j=1}^{J} I_{j}\). It is also worth noting that \(x_{1/2} =a, \dots , x_{i+1/2}=a+i\Delta x, \dots , x_{J+1/2}=b \). Similarly, we divide the temporal domain into \(N_t\) parts and denote the time step size by \(\Delta t\), so that \(t_{n} =n\Delta t\).
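For illustration, the uniform mesh and the cell averages of an initial condition can be set up as follows (a sketch with illustrative values; the midpoint rule is only one possible quadrature and is our assumption, not the paper's):

```python
import numpy as np

a, b, J = -5.0, 5.0, 200             # illustrative domain and number of cells
dx = (b - a) / J                      # uniform cell size
dt = 2 * dx                           # e.g. Delta t = 2 * Delta x

x_half = a + dx * np.arange(J + 1)            # cell interfaces x_{1/2}, ..., x_{J+1/2}
x_mid = 0.5 * (x_half[:-1] + x_half[1:])      # cell centers

u0 = np.sin                           # example initial condition u(x, 0) = sin(x)
u_bar0 = u0(x_mid)                    # midpoint-rule approximation of the cell averages
```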
Next, we integrate both sides of Eq. (2.1) over the computational cell \(I_{j}\) and the time interval \([t_{n},t_{n+1}]\) simultaneously to get
According to the definition of the cell average
we have the following form of Eq. (2.3)
As the main idea of the CANN method, here we design a fully connected neural network \({\mathcal {N}}(\cdot ;\Theta )\) to approximate the average difference between two adjacent time steps, i.e. the right-hand side of Eq. (2.5)
We know that \(\bar{u} _{j}(t_n)\) is the mean value of the solution on the jth cell at time level \(t_n\). The cell average \(\bar{u} _{j}(t_{n+1})\) at the next time level \(t_{n+1}\) can then be obtained by adding the output of the neural network to \(\bar{u} _{j}(t_n)\). That is, from Eq. (2.5) we get
where \(\Theta \) can be interpreted more precisely as a parameter set containing the weights and biases used by the network. More detailed descriptions of the input vectors, output vectors, network structure, and the updating of the parameters are provided below.
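In summary, the essential relations described above can be written as (a restatement assembled from the surrounding text, not a verbatim copy of the original displays)
\[
\bar{u}_{j}(t_{n+1})-\bar{u}_{j}(t_{n})\approx {\mathcal {N}}\big (\overrightarrow{V}_{j}^{in};\Theta \big ),\qquad \bar{u}_{j}(t_{n+1})\approx \bar{u}_{j}(t_{n})+{\mathcal {N}}\big (\overrightarrow{V}_{j}^{in};\Theta \big ),
\]
with the cell average defined as \(\bar{u}_{j}(t)=\frac{1}{\Delta x}\int _{I_{j}}u(x,t)\,dx\).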
2.2 Network architecture
We denote the solution average at time level \(t_n\) as \(\bar{u} _{j}^{n}\). With \(\bar{v}_j^{out}\approx \bar{u} _{j}^{n+1}\) and \(\bar{v}_j^{in}=\bar{u} _{j}^{n}\) in (2.7), we establish the format of the CANN method for solving PDEs
As the input vector of neural network, the general format of \(\overrightarrow{V}_{j}^{in}\) is
where p, q, \(n_p\) and \(n_q\) are all positive integers. When we solve nonlinear PDEs, we always choose the input vector with \(p\ge 1\) and \(q\ge 1\).
In particular, if the equation satisfies periodic boundary conditions, it is important to note that when \(j=1\), the ghost cell averages \(\bar{u}_{J-p+1}^{n}, \dots , \bar{u}_{J}^{n}\) to the left of \(\bar{u}_{1}^{n}\) are required to assemble the input vector \(\overrightarrow{V}_{1}^{in}\). Similarly, the ghost cell averages \(\bar{u}_{1}^{n}, \dots , \bar{u}_{q}^{n}\) positioned to the right of \(\bar{u}_{J}^{n}\) are incorporated into the input vector \(\overrightarrow{V}_{J}^{in}\). For example, consider the case of \(p=q=3\), resulting in the following input vector
When \(j=1\), the input vector is \(\overrightarrow{V}_{1}^{in}=\left[ \bar{u}_{J-2}^{n}, \bar{u}_{J-1}^{n},\right. \left. \bar{u}_{J}^{n},\bar{u}_{1}^{n},\bar{u}_{2}^{n},\bar{u}_{3}^{n},\bar{u}_{4}^{n} \right] ^T\); when \(j=J\), the input vector is \(\overrightarrow{V}_{J}^{in}=\left[ \bar{u}_{J-3}^{n},\bar{u}_{J-2}^{n},\bar{u}_{J-1}^{n},\bar{u}_{J}^{n},\bar{u}_{1}^{n},\bar{u}_{2}^{n},\bar{u}_{3}^{n}\right] ^T \). The remaining vectors are assembled analogously.
For non-periodic boundary conditions, we utilize the initial evolution averages for the ghost cells outside the domain to enforce the boundary conditions. For example, \(\overrightarrow{V}_{1}^{in}=\left[ \bar{u}_{-2}^{n}, \bar{u}_{-1}^{n},\bar{u}_{0}^{n},\bar{u}_{1}^{n},\bar{u}_{2}^{n},\bar{u}_{3}^{n},\bar{u}_{4}^{n} \right] ^T\), \(\overrightarrow{V}_{J}^{in}=\left[ \bar{u}_{J-3}^{n}, \bar{u}_{J-2}^{n},\bar{u}_{J-1}^{n},\bar{u}_{J}^{n},\bar{u}_{J+1}^{n},\bar{u}_{J+2}^{n},\bar{u}_{J+3}^{n} \right] ^T \). Here, \(\bar{u}_{-2}^{n}, \bar{u}_{-1}^{n},\bar{u}_{0}^{n}\) and \(\bar{u}_{J+1}^{n},\bar{u}_{J+2}^{n},\bar{u}_{J+3}^{n}\) are computed by the cell average formula in the ghost cells. The choice of the input vector's form can directly impact the accuracy of predicting the solution average at the next time level.
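As an illustration, a small helper assembling \(\overrightarrow{V}_{j}^{in}\) for \(p=q=3\) might look as follows; the periodic branch wraps indices around the domain, while the non-periodic branch reads precomputed ghost-cell averages passed in through the hypothetical arguments `left_ghost` and `right_ghost` (a sketch, not the authors' code):

```python
import numpy as np

def assemble_input(u_bar, j, p=3, q=3, periodic=True,
                   left_ghost=None, right_ghost=None):
    """Build the input vector V_j^in from the cell averages at time t_n.

    u_bar is indexed 0..J-1, so cell j in the text corresponds to index j - 1.
    """
    J = u_bar.size
    if periodic:
        idx = np.arange(j - p, j + q + 1) % J      # wrap around the domain
        return u_bar[idx]
    # non-periodic: pad with ghost-cell averages computed outside the domain
    padded = np.concatenate([left_ghost, u_bar, right_ghost])   # length J + p + q
    return padded[j: j + p + q + 1]
```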
In this paper, we propose the design of an M-layer fully connected neural network, in which \((M-2)\) layers, excluding the input and output layers, are designated as hidden layers. Therefore, the minimum structure of the network is a three-layer fully connected neural network. Typically, the number of hidden layers and neurons in each layer increases with the complexity of solving the partial differential equations. Here we let \(n_{i}\ (i=1,\dots ,M)\) denote the number of neurons per layer. The dimension of the input vector is \(n_1=p+q+1\), and the dimension of the output vector is \(n_M=1\). In addition, the neural network function \({\mathcal {N}}(\overrightarrow{V}_{j}^{in};\Theta )\) is represented as a composite of multilayer operations
where \(\circ \) denotes the composite operation, W and b are the weights and bias terms of the network, respectively. If we define \(z^{l-1}\) \((l=1,\dots ,M-2)\) as the input of the lth hidden layer, the output expression for the lth hidden layer is given by:
where \(z^0=\overrightarrow{V}_{j}^{in}\), and \(\sigma \) is the hyperbolic tangent activation function. The optimization process for the parameter set \(\Theta \) that we are concerned with will be elucidated in the next section.
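A minimal realization of such a network, written here in PyTorch purely for illustration (the paper does not specify its software framework, and the linear output layer is our assumption), is:

```python
import torch
import torch.nn as nn

class CANN(nn.Module):
    """Fully connected network N(V_j^in; Theta) with tanh hidden layers."""

    def __init__(self, p=3, q=3, hidden_sizes=(8,)):
        super().__init__()
        sizes = [p + q + 1, *hidden_sizes, 1]     # n_1 = p + q + 1, ..., n_M = 1
        layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers.append(nn.Linear(n_in, n_out))
            layers.append(nn.Tanh())
        layers.pop()                              # linear output layer (our assumption)
        self.net = nn.Sequential(*layers)

    def forward(self, v_in):
        # v_in has shape (..., p + q + 1); the output has shape (..., 1)
        return self.net(v_in)
```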
3 Training process
In this section, we explore the training methodology utilized to obtain the optimal parameter set \(\Theta ^{*}\), ensuring that the neural network effectively approximates the average evolution of the solution \(\bar{u} _{j}^{n} \rightarrow \bar{u} _{j}^{n+1}\). The incorporation of learning datasets, encompassing both training and target data, plays a pivotal role throughout the entire training process.
3.1 Generation of datasets
Learning data is gathered in pairs, each corresponding to the solution averages at two adjacent time levels. This collection of learning data is denoted as
where \(\overrightarrow{V}_{j}^{n}\) is shaped like the vector described in (2.9), and \(N_t\) represents the number of time units. The elements within the vector \(\overrightarrow{V}_{j}^{n}\) are derived from solution averages obtained from provided data (or the initial condition \(u(x,0)=u_0(x)\)). The target data \(\bar{u} _{j}^{n+1}\) represents the solution averages obtained through alternative numerical methods or observations derived from realistic physical–mathematical experiments. Our goal is to minimize the mean squared error between the network output and the target data \(\bar{u} _{j}^{n+1}\) by using an appropriate optimizer.
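Concretely, assuming the trajectory data is stored as an array of cell averages at the time levels \(t_0,\dots ,t_{N_t}\), the learning pairs could be collected as follows (an illustrative sketch with periodic stencils; names are not from the original implementation):

```python
import numpy as np

def build_dataset(u_bar, p=3, q=3):
    """Collect (input vector, target average) pairs from trajectory data.

    u_bar has shape (N_t + 1, J): cell averages at times t_0, ..., t_{N_t}.
    """
    n_levels, J = u_bar.shape
    inputs, targets = [], []
    for n in range(n_levels - 1):
        for j in range(J):
            idx = np.arange(j - p, j + q + 1) % J     # periodic stencil
            inputs.append(u_bar[n, idx])
            targets.append(u_bar[n + 1, j])           # solution average at t_{n+1}
    return np.array(inputs), np.array(targets)
```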
Remark 1
Compared to the CANN method in [55, 56], we improve the assembly of the input vector \(\overrightarrow{V}_{j}^{n}\) by providing two different approaches for the NLS equation in this research. Approach (I) involves assembling the input vector directly using the complex values u(x, t), while Approach (II) involves assembling the input vector using the envelope or modulus \(\vert u\vert =\sqrt{Re(u)^2+Im(u)^2}\).
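As a small illustration of the two assemblies (the file name and array layout are hypothetical; splitting complex values into real and imaginary parts is one common way, assumed here, to feed them into a real-valued network):

```python
import numpy as np

u = np.load("trajectory.npy")   # hypothetical complex-valued cell averages, shape (N_t + 1, J)

# Approach (I): complex values, represented here by stacking real and imaginary parts
data_complex = np.stack([u.real, u.imag], axis=-1)

# Approach (II): the modulus |u| = sqrt(Re(u)^2 + Im(u)^2)
data_modulus = np.abs(u)
```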
To assess the efficacy of the CANN method in handling corrupted or low-quality data, the following noisy data is generated as learning dataset
Here \(\omega _j\) and \(\xi _j\) represent Gaussian white noise, both drawn from a standard normal distribution. In this paper, we consider two cases with \(\eta = 0.02\) and \(\eta = 0.05\) (e.g. Fig. 1), representing \(\pm 2\%\) and \(\pm 5\%\) relative noises in the entire dataset, respectively.
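The precise form of the noisy dataset is given in the original display (3.2); a plausible sketch, assuming relative multiplicative Gaussian noise applied independently to inputs and targets (reusing the inputs and targets arrays from the dataset sketch above), is:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.02                      # noise level, e.g. 0.02 or 0.05

def add_relative_noise(data, eta, rng):
    """Perturb data with relative Gaussian noise (one plausible model, an assumption)."""
    return data * (1.0 + eta * rng.standard_normal(data.shape))

inputs_noisy = add_relative_noise(inputs, eta, rng)     # omega_j acting on the inputs
targets_noisy = add_relative_noise(targets, eta, rng)   # xi_j acting on the targets
```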
Neural networks are inherently resistant to input noise owing to their ability to generalize from noisy data. This resistance is crucial for solving differential equations whose input data may be contaminated with noise, as it ensures that the noise does not disproportionately affect the accuracy of the solution as the algorithm progresses. Through iterative training, the network learns to filter out noise and focus on the underlying patterns and structures within the data, so that the solution remains stable and accurate even when starting from noisy inputs.
Remark 2
According to Morozov’s principle [57, 58], under optimal conditions, the accuracy of the results obtained using neural network-based methods should fall within the range of input noise. This principle aligns with the observed behavior of neural networks in handling noise. The network’s final output is expected to have an error margin that is consistent with the level of noise present in the input data, ensuring that the solutions are not disproportionately affected by noise. This work will be subjected to further detailed analysis in the future.
3.2 Training methodology
In this section, we detail the training process to obtain the optimal parameter set \(\Theta ^{*}\), which can be used to accurately approximate the solution average evolution \(\bar{u} _{j}^{n} \rightarrow \bar{u} _{j}^{n+1}\). As discussed in Remark 1, two distinct perspectives are considered regarding the generation of the learning dataset S in (3.1) (or \(S_N\) in (3.2)). The following squared loss function (3.3) is used in the training process for all \(j=1,\dots ,J\) and all \(n=0,\dots ,N_t-1\),
The training process of the CANN method is summarized in Algorithm 1, and the overall training process and output module are shown in Fig. 2. As in Remark 1 of [56], we also use multiple time levels of solution averages in the training process. It is worth noting that the same network is used in all steps. The training process updates the network parameters to \(\Theta =\Theta ^{*}\) over a given time step interval, and \(\Theta ^{*}\) then serves as the initial parameters for training the same neural network on the next time step interval, up to the training data corresponding to \(t_{N_t}\).
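A compact sketch of this sequential training strategy in PyTorch is given below (the use of Adam, the learning rate, and the per-interval iteration count are our assumptions, roughly consistent with the \(K=10^{4}\) iterations mentioned in Sect. 4, and not a transcription of Algorithm 1):

```python
import torch

def train_cann(model, segments, p=3, epochs=10_000, lr=1e-3):
    """Train one network across consecutive time-step intervals.

    segments : list of (V_in, target) tensor pairs, one per interval
               [t_n, t_{n+1}]; V_in has shape (J, p+q+1), target shape (J, 1).
    The parameters learned on one interval warm-start the next interval.
    """
    loss_fn = torch.nn.MSELoss()
    for V_in, target in segments:                  # t_0 -> t_1, t_1 -> t_2, ...
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        u_bar_n = V_in[:, p:p + 1]                 # center entry of the stencil: u_bar_j^n
        for _ in range(epochs):
            optimizer.zero_grad()
            v_out = u_bar_n + model(V_in)          # predicted u_bar_j^{n+1}
            loss = loss_fn(v_out, target)          # squared loss against the target averages
            loss.backward()
            optimizer.step()
    return model
```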
Referring to Remark 2 of [56], it is feasible to employ the proficiently trained CANN model using a single trajectory dataset to address the problem (2.1) under varied initial values \(u^i(x,0)=u_0^i(x)\), where i denotes distinct initial conditions. Once the network is trained, it will behave as an explicit one-step finite volume scheme to solve problems with different initial conditions without retraining the network. For example, if trajectory data of \(u^1(x,0)=u_0^1(x)\) is gathered for network training, the well-trained CANN remains applicable for solving different initial value problems with trajectories \(u^i(x,0)=u_0^i(x)\) \((i=2,3,\dots )\).
Throughout the training process of the CANN method, its effectiveness is significantly influenced by four key elements: (1) the selection of the spatial mesh size \(\Delta x\); (2) the determination of the time step \(\Delta t\); (3) the specification of the length and structure of the input vector \(\overrightarrow{V}_{j}^{in}\); (4) the determination of the number of hidden layers in the fully connected neural network and the number of neurons in each layer. It is noteworthy that the primary advantage of our CANN method, in comparison to traditional numerical approaches, lies in its ability to accommodate large time steps (e.g. \(\Delta t=2\Delta x\) with different \(\Delta x\)). When confronted with high-dimensional partial differential equations, traditional methods tend to be costly and algorithmically intricate, whereas the CANN method allows for a quicker and more efficient calculation of the evolution process.
4 Numerical results
In this section, a series of numerical experiments is presented for various types of linear and nonlinear Schrödinger equations. According to Remark 1, both Approach I and Approach II are employed in Example 4.1.1 and Example 4.2.1, whereas Example 4.1.2 and Example 4.2.2 exclusively utilize Approach II. Subsequently, we apply the CANN method to the accuracy tests and evaluate the network's ability to handle different initial conditions. In contrast to the previous numerical tests in [55, 56], we also extend the CANN method in this section to solve the NLS equation with different initial conditions as well as with different parameters c or \(\omega \). Furthermore, the CANN solver is also tested on lower-quality datasets that include noise to validate its stability and feasibility.
In order to enhance the comprehensibility of the numerical results, we provide the error expressions for both the \(L_2\) and \(L_\infty \) norms,
where the symbol T represents the final time, J represents the total number of spatial domain cells, \(\bar{v}_j^{out}\) stands for the average of the predicted solution within the jth cell, and \(\bar{u}_j\) corresponds to the average of the exact solution within the same jth cell.
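A standard discrete form of these norms, consistent with the symbol definitions above (the exact expressions appear in the original displays), is
\[
\Vert e\Vert _{L_2}=\Big (\Delta x\sum _{j=1}^{J}\big \vert \bar{v}_j^{out}-\bar{u}_j\big \vert ^2\Big )^{1/2},\qquad \Vert e\Vert _{L_\infty }=\max _{1\le j\le J}\big \vert \bar{v}_j^{out}-\bar{u}_j\big \vert ,
\]
evaluated at the final time T.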
4.1 Linear Schrödinger equation
Example 4.1.1
When \(\beta =\gamma =0\), \(\alpha =-1\), Eq. (1.1) becomes the linear case of Schrödinger equation in [32]
The exact solution is given by \(u(x,t)=e^{ict}\sin (x)\), satisfying periodic boundary conditions within the spatio-temporal domain \( [-5,5]\times [0,1]\). In this example, we employ Approach I and Approach II to generate input vectors with the complex values of u and with the modulus \(|u|\), respectively.
Accuracy tests: When \(c=1\), we select the spatial mesh size \(\Delta x=0.05\) and temporal mesh size \(\Delta t=2\Delta x\). The size of the training data set is only 200 in each time segment, and the total size is 2000. The training set is generated from the solution with initial condition \(u(x,0)=\sin (x)\). The neural network contains only one hidden layer with eight neurons. The specific structure of the input vector is given below
Upon successful training of this neural network, it can accurately predict the wave behavior of the solution up to time \(T = 2\). The time slices simulated by our CANN method and the overall fluctuation trend are shown in Fig. 3.
Furthermore, in order to test the stability of the trained neural network, we keep the spatial mesh size fixed at \(\Delta x=0.05\) and test four cases of different time steps, \(\Delta t= 4\Delta x\), \(2\Delta x\), \(\Delta x\), \((1/2)\Delta x\). The \(L_2\) and \(L_{\infty }\) errors at \(T=2\) are reported in Table 1. Through the experiments, we find that the error results of simulations with different \(\Delta t\) remain stable.
Complex value test: Following the same setting of the CANN solver, we change the input vectors in the learning datasets to complex values. The input vector is shown below
Once the accumulation error is under control and the network training process is successfully completed, we obtain an optimized CANN solver capable of predicting the solution’s behavior up to time \(T = 2\). A graphical representation of the trajectory prediction, encompassing the real and imaginary components of u(x, t) as well as the modulus at time \(T=2\), is presented in Fig. 4. The corresponding \(L_2\) and \(L_\infty \) errors are listed in Table 2.
Different c and different initial conditions: In this part, we directly use the neural network trained in the accuracy tests to simulate u(x, t) while altering the parameter c of u(x, t) to \(-5\) and \(5\). The resulting approximations of the solution for these different values of c are presented in Figs. 5 and 6. Importantly, the initial condition of the solution remains consistent throughout these experiments. To assess the generalization capability of our CANN model across varying initial conditions, we conduct experiments involving \(u(x,0)=\sin (x)-\cos (x)\) and \(u(x,0)=\cos (x)\), and the outcomes are presented in Figs. 7 and 8.
Low quality data: Finally, we apply the CANN method to deal with low-quality data. Specifically, we incorporate noise levels of \(\eta =0.02\), \(\eta =0.05\), and \(\eta =0.1\) representing the introduction of approximately \(\pm 2\%\), \(\pm 5\%\) and \(\pm 10\%\) noise within the dataset during accuracy tests, respectively. The results, illustrated in Figs. 9, 10 and 11, substantiate the robustness of the CANN method in effectively handling data noise.
Example 4.1.2
When \(\beta =0\), \(\alpha =1\), \(\gamma =c-(2/x^2)\), Eq. (1.1) becomes the linear case of Schrödinger equation
The exact solution is given by
and its spatio-temporal domain is \([1,11]\times [0,1]\).
Accuracy tests: \(c=1\). First, we choose the spatial mesh size \(\Delta x=0.05\) and temporal mesh size \(\Delta t=\Delta x\) for generating the training set from \(u(x,0)=x^2\). Similarly, the size of the training data set is 200 in each time segment, and the total size is 4000. The network has only one hidden layer containing eight neurons. Its input vector has the following form
Once the network is trained, it can be used to predict the behavior of the solution up to time \(T = 2\). Snapshots of the simulation at different moments are shown in Fig. 12.
Additionally, to evaluate the accuracy in relation to various time step sizes, we maintain a fixed spatial mesh size of \(\Delta x=0.05\) while considering four different time step sizes: \(\Delta t= 2\Delta x\), \(\Delta x\), \((1/2)\Delta x\), \((1/4)\Delta x\). The \(L_2\) and \(L_{\infty }\) errors, calculated at the final time point \(T=2\), are presented in Table 3. Notably, all simulations conducted with varying \(\Delta t\) values are stable.
Different c and different initial conditions: In this part, we employ the well-trained CANN solver from the accuracy tests to test different parameters in the problem, namely \(c=0.5\) and \(c=2\). The simulations for these different c values are displayed in Figs. 13 and 14. It is worth noting that the performance for different parameters is good without retraining the neural network throughout these experiments. Furthermore, we also carry out experiments involving different initial conditions, \(u_0(x)=\frac{1}{x}+\frac{x^2}{100}\) from the solution \(u(x,t)=(\frac{1}{x}+\frac{x^2}{100})e^{it}\) and \(u_0(x)=\frac{1}{x}\) from the solution \(u(x,t)=\frac{1}{x}e^{it}\), to evaluate the generalization capability of our method. The results illustrated in Figs. 15 and 16 are good.
Low quality data: Finally, two cases are considered by taking \(\eta =0.02\) and \(\eta =0.05\), corresponding to \(\pm 2\%\) and \(\pm 5\%\) relative noise in all data. In order to have a clearer view of how the noise-added data points are handled, the figures are restricted to the spatial interval [1, 2]. From Figs. 17 and 18 we observe that the CANN model predictions are quite robust to data noise.
4.2 Nonlinear Schrödinger equation
Example 4.2.1
Firstly, we show an accuracy test for the NLS equation
which admits a progressive plane wave solution
where \(c=\omega ^2+0.5\vert A \vert ^2\omega \), A and \(\omega \) are constants. The exact solution satisfies periodic boundary conditions in the spatio-temporal domain \( [0,2\pi ]\times [0,1] \).
Accuracy tests: When \(c=1.5\), we select a spatial mesh size of \(\Delta x=2\pi /160\) and a temporal mesh size of \(\Delta t=(8/\pi )\Delta x\). Here the size of the training data set is only 160 in each time segment, and the total size is 1600. The training dataset is generated from the trajectory of the solution described in Eq. (4.10). Our shallow neural network is designed with a single hidden layer comprising eight neurons. It is important to highlight that in this case, the input vector
is constructed from modulus values. The evolution process over adjacent time intervals involves approximately \(K = 10^{4}\) iterations. Upon the successful completion of network training, the solution’s wave behavior in Fig. 19 can be predicted up to \(T = 2\).
Additionally, to assess the stability of the trained neural network, we apply the well-trained CANN solver to test different temporal mesh sizes of \(\Delta t= (4/\pi )\Delta x\), \((8/\pi )\Delta x\), \((32/3\pi )\Delta x\) and \((16/\pi )\Delta x\) with fixed \(\Delta x={2\pi }/{160}\). We calculate and report the \(L_2\) and \(L_{\infty }\) errors at \(T=2\) in Table 4. The findings indicate favorable error values across simulations with different \(\Delta t\), implying that a larger time step size can be employed to efficiently simulate the equations.
Complex value test: Additionally, we also address the equation using complex-valued input vectors (Approach I in Remark 1) for our neural network, without changing the network architecture. Subsequently, the following vectors are assembled and fed into the neural network,
After training, we can apply the well-trained CANN solver to predict the approximation up to time \(T=2\). The results for the real and imaginary components along with \(\vert u \vert \) are displayed in Fig. 20 and Table 5.
Next, we aim to assess the potential for high generalization of our method by applying it to various initial conditions and different c and \(\omega \) for the NLS equation without network retraining.
Different initial conditions: First, we employ the CANN method to address a distinct initial condition \(u(x,0)=1.5e^{ix}\). As shown in Fig. 21, the CANN method performs well in dealing with this different initial condition.
Different c and \(\omega \): Second, to validate the different parameters c and \(\omega \) in the equation, we systematically present the following two cases.
(1) In this case, we consider \(c=5\) and \(\omega =2\) with
which follows periodic boundary conditions in the spatio-temporal domain \( [0,1]\times [0,1] \). The previously well-trained CANN solver, which uses the modulus as input vector, is employed to address this case. Significantly, while the architecture of the neural network remains consistent, the crucial consideration lies in selecting an appropriate time step during the simulation process. Hence, we opt for a spatial mesh size of \(\Delta x=0.005\) and a corresponding temporal mesh size of \(\Delta t=5\Delta x\). The results simulated using the CANN method, along with the overall fluctuation pattern, are illustrated in Fig. 22.
(2) In this case, we consider \(c=0.5\) and \(\omega =-1\) with
With the similar setting for the neural network, we only need to change \(\Delta x=0.05\) and \(\Delta t=2\Delta x\) for the different parameters. The predicted results are displayed in Fig. 23.
Low quality data: At last, we apply the CANN method to deal with low-quality data. We introduce noise levels of \(\