1 Introduction

Testing independence is a fundamental problem in statistics and arises in many other fields, such as finance (Olagunju 2011), environmental engineering (Kim et al. 2020), meteorology (Kulkarni et al. 2021), and medicine (Ma et al. 2021). Various approaches for testing independence rely on correlation coefficients, such as Pearson's correlation coefficient, Kendall's tau, and Spearman's rank correlation coefficient, which serve as measures of dependence. However, these conventional methods can only detect linear or monotonic relationships and, therefore, may not be suitable for general independence tests. To detect more complex nonlinear associations, researchers have introduced numerous alternative measures. For instance, the maximal information coefficient (MIC), introduced by Reshef et al. (2011), provides a score that is approximately equal to the coefficient of determination (\(R^2\)) with respect to the regression function. The distance correlation (dCor), proposed by Székely et al. (2007), does not involve nonparametric estimation and is free of tuning parameters. Meanwhile, a rank-based distance measure (HHG) proposed by Heller et al. (2013) demonstrates robust numerical performance. These measures have been further explored in various contexts, including MIC estimation based on the BackMIC algorithm (Cao et al. 2021), matrix multivariate auto-distance covariance and correlation functions for time series (Fokianos and Pitsillou 2018), and tests based on rank-based indices (Zhou et al. 2024). Additionally, there exist other methods for testing independence, such as the Hilbert-Schmidt independence criterion (Gretton et al. 2007), the maximum mean discrepancy (Gretton et al. 2012), rank correlation-based statistics (Leung and Drton 2018), data-driven representation (Gonzalez et al. 2021), multivariate ranks defined by transportation measures (Deb and Sen 2023), and receiver operating characteristic analysis (Limnios and Clémençon 2024).

Previous studies on independence tests assumed that observations were collected without measurement errors, but this assumption may fail in many scientific domains. Therefore, in this paper, we address the problem of testing independence between two random variables, X and Y, where X is subject to measurement errors. Specifically, we observe only the surrogate variable W, where \(W=X+\epsilon \) and \(\epsilon \) represents the measurement error. Such considerations are motivated by the study of low-resolution observations of source stars (LROSS), from which a sample of 660 observations was collected to unveil physical processes that could influence both galaxy evolution and cosmic expansion. The LROSS dataset, which contains redshift, metallicity, and radial velocity values, was downloaded from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) at the National Astronomical Science Data Center of China, and was publicly released on September 28, 2023. In astronomy, exploring the associations between redshift and metallicity or radial velocity is crucial for understanding the formation of cosmic structures and the evolution of galaxies. Hence, our goal is to examine whether nonlinear associations exist between redshift and metallicity or radial velocity. If such associations are present, it is also worthwhile to investigate the distributions of metallicity and radial velocity given redshift, respectively. However, redshift is subject to substantial measurement errors, and addressing these issues simultaneously within this framework presents a considerable challenge. Although Fan et al. (2022) proposed a modified distance correlation method to test independence in the presence of measurement errors, their approach is only applicable under the repeated measurement design; without repeated measurements it may fail, as it cannot be used to estimate the conditional distribution directly.

Mutual information (MI, Shannon (1948)) can be viewed as the expectation of the log-likelihood ratio, which has the following form,

$$\begin{aligned} MI(X, Y)=E\bigg \{\log \frac{f_{X,Y}(X, Y)}{f_X(X)f_Y(Y)}\bigg \}, \end{aligned}$$
(1.1)

where \(f_X\), \(f_Y\) and \(f_{X,Y}\) are the density functions of X, Y and (X, Y), respectively. The Neyman-Pearson lemma (Neyman and Pearson 1933) guarantees that the independence test based on MI is most powerful in an averaging sense. MI is sensitive to nonlinear associations. It is a symmetric and nonnegative measure, which equals zero if and only if the variables X and Y are independent. As a powerful nonlinear dependence measure, MI plays a pivotal role in data analysis and has garnered considerable attention in the literature. For instance, Pethel and Hahs (2014) proposed an exact null hypothesis significance test using MI. Zeng et al. (2018) introduced the jackknife kernel estimation for MI, which enjoys desirable statistical properties, such as automatic bias correction and high local power in independence tests. Runge (2018) proposed a fully non-parametric test for continuous data based on the conditional MI combined with a local permutation scheme. Berrett and Samworth (2019) devised an MI-based test for assessing the independence between two multivariate random vectors. Additionally, Ai et al. (2022) developed an MI-based test for independence. Their proposed test is simple to implement, easy to compute, and consistent against all departures from independence, with only a slight loss of local power. Notably, this loss is independent of the data dimension.

However, the MI defined in (1.1) cannot be directly estimated when the variable X contains measurement errors. To address this issue, we rewrite MI as follows,

$$\begin{aligned} MI(X, Y)=E\big [\log \big \{f_{Y|X}(Y\mid X)\big /f_Y(Y)\big \}\big ], \end{aligned}$$

where \(f_{Y|X}\) is the conditional density function (CDF) of Y given X. Therefore, it is a crucial and challenging task to estimate \(f_{Y|X}(Y\mid X)\), as it affects the estimation accuracy of MI(X, Y), especially when X contains measurement errors. In this paper, we utilize the deconvolution method to estimate \(f_{Y|X}\). This approach circumvents the need for repeated measurements, making it particularly suitable for our purpose. Deconvolution techniques have been widely acknowledged in the literature. For instance, Scott and Terrell (1987) used them to improve the accuracy of cross-validation, while Carroll and Hall (1988) used them to estimate density functions in the presence of known distributional errors. Stefanski and Carroll (1990) applied deconvolution to improve kernel density estimators. Marron and Wand (1992) utilized it to address the variance term in density estimation, thereby improving performance assessment. Additionally, Fan and Truong (1993) employed the deconvolution method to estimate the probability density function and investigated the asymptotic normality of the deconvolution kernel density estimators. Wang and Wang (2011) introduced an R package, "decon", which contains a collection of functions that apply the deconvolution kernel methods to address challenges caused by measurement errors. Furthermore, Huang and Zhang (2023) developed a deconvolution kernel estimator for average dose-response functions and derived its asymptotic properties, including bias, variance, and linear expansion.

In this paper, our aim is to test the independence between X and Y and to estimate the CDF \(f_{Y|X}\) when X is subject to measurement errors. Our work makes several contributions to this field. First, we introduce a novel double kernel CDF estimator based on the deconvolution method, and we investigate the impact of different error distributions on its convergence rates. Second, we propose an MI estimator for testing independence between the two variables and establish its asymptotic properties under both the null and the alternative hypotheses for different error distributions. Third, our analysis of the LROSS dataset not only tests the independence between redshift and metallicity or radial velocity, but also reveals the CDFs of metallicity and radial velocity given redshift, respectively.

The remainder of the paper is organized as follows. Section 2 develops the CDF estimator and discusses its convergence rates. Section 3 introduces the MI estimator and its asymptotic properties. Section 4 reports simulation studies and real data analysis. Some concluding remarks are given in Sect. 5. All technical proofs are provided in the supplemental material.

2 Conditional density estimation

Suppose that \((W_i, Y_i)\) for \(i\in \{1,\cdots ,n\}\) is a random sample from the population (W, Y), where \(W_i=X_i+\epsilon _i\), with \(W_i\) representing the observable surrogate of \(X_i\) and \(\epsilon _i\) being the measurement error. In Sect. 2.1, we adopt deconvolution techniques and propose the double kernel estimator of the CDF in the presence of measurement errors. The convergence rate of this estimator depends on the smoothness of the error distribution, which is characterized by the decay rate of its characteristic function in the tails. Drawing on the terminology introduced by Fan and Truong (1993) and Delaigle (2021), we distinguish two classes of errors: super smooth and ordinary smooth.

1. Super smooth with order \(\beta \): The characteristic function \(\phi _\epsilon (\cdot )\) satisfies

$$\begin{aligned} d_0|t|^{\beta _0}e^{-|t|^\beta /\gamma }\le |\phi _\epsilon (t)|\le d_1|t|^{\beta _1}e^{-|t|^\beta /\gamma }\quad as~t\rightarrow \infty , \end{aligned}$$
(2.1)

where \(d_0\), \(d_1\), \(\beta \) and \(\gamma \) are positive constants, and \(\beta _0\) and \(\beta _1\) are real constants.

2. Ordinary smooth with order \(\beta \): The characteristic function \(\phi _\epsilon (\cdot )\) satisfies

$$\begin{aligned} d_0|t|^{-\beta }\le |\phi _\epsilon (t)|\le d_1|t|^{-\beta }\quad as~t\rightarrow \infty , \end{aligned}$$
(2.2)

for positive constants \(d_0\), \(d_1\) and \(\beta \).

For example, the Gaussian and Cauchy distributions are super smooth, while the gamma and double exponential distributions are ordinary smooth. The order \(\beta \) represents the decay rate of \(\phi _\epsilon (t)\) as \(t\rightarrow \infty \), which reflects the smoothness of the error distribution. For instance, \(\beta =1\) corresponds to the Cauchy distribution; \(\beta =2\) applies to both the Gaussian (super smooth) and double exponential (ordinary smooth) distributions. For the gamma distribution with shape parameter k and scale parameter \(\theta \), \(|\phi _\epsilon (t)|=(1+\theta ^2t^2)^{-k/2}\sim (\theta |t|)^{-k}\), so the order \(\beta \) equals the shape parameter k, while the scale parameter only affects the constants \(d_0\) and \(d_1\).
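To make the distinction concrete, the following minimal Python sketch checks the tail behavior numerically (the evaluation points and the gamma shape parameter \(k=3\) are our illustrative choices): for ordinary smooth errors, \(|t|^\beta |\phi _\epsilon (t)|\) approaches a nonzero constant, whereas for super smooth errors it vanishes.

```python
import numpy as np

# Moduli of characteristic functions for common error distributions
# (unit scale parameters; evaluation points chosen for illustration).
t = np.array([10.0, 20.0, 40.0, 80.0])

phi_gauss = np.exp(-0.5 * t**2)          # Gaussian: super smooth, beta = 2
phi_cauchy = np.exp(-t)                  # Cauchy: super smooth, beta = 1
phi_laplace = 1.0 / (1.0 + t**2 / 2.0)   # double exponential: ordinary smooth, beta = 2
k = 3.0                                  # gamma shape parameter (hypothetical choice)
phi_gamma = (1.0 + t**2) ** (-k / 2.0)   # gamma: ordinary smooth, beta = k

# Ordinary smooth: t**beta * |phi(t)| tends to a nonzero constant.
print(t**2 * phi_laplace)   # -> approaches 2
print(t**k * phi_gamma)     # -> approaches 1
# Super smooth: the same product vanishes for every polynomial power.
print(t**2 * phi_gauss)     # -> 0 (numerically)
```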

2.1 Double kernel estimate

Let \(K_{h_2}(\cdot )=K(\cdot /h_2)/h_2\), where \(K(\cdot )\) is a kernel function with bandwidth \(h_2\). When \(X_i\) are observable, according to De Gooijer and Zerom (2003), the estimator of \(f_{Y|X}(y\mid x)\) is given by

$$\begin{aligned} \tilde{f}_{Y|X}(y\mid x) =\frac{1}{nh_1}\sum _j{G\Big (\frac{x-X_j}{h_1}\Big )K_{h_2}(Y_j-y)}\Big /{\hat{f}(x)}, \end{aligned}$$
(2.3)

where \(G(\cdot )\) is also a kernel function with bandwidth \(h_1\), and

$$\begin{aligned} \hat{f}(x)=\frac{1}{nh_1}\sum _i G\Big (\frac{x-X_i}{h_1}\Big ) \end{aligned}$$

is the Parzen-Rosenblatt density estimator of X (Parzen 1962; Rosenblatt 1956).
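For reference, a minimal sketch of the error-free estimator (2.3) might look as follows; the Gaussian kernel for both \(G(\cdot )\) and \(K(\cdot )\), the bandwidths, and the simulated data (matching Example 4.1) are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

def gauss(u):
    """Gaussian kernel (used here for both G and K, purely for illustration)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def f_cond_error_free(x, y, X, Y, h1, h2):
    """Double kernel estimator (2.3) of f_{Y|X}(y | x) when X is observable."""
    wx = gauss((x - X) / h1)                       # kernel weights in the x direction
    num = np.sum(wx * gauss((Y - y) / h2) / h2) / (len(X) * h1)
    den = np.sum(wx) / (len(X) * h1)               # Parzen-Rosenblatt estimate of f_X(x)
    return num / den

# Toy usage: bivariate normal with correlation 0.5, as in Example 4.1.
rng = np.random.default_rng(1)
X = rng.normal(size=500)
Y = 0.5 * X + rng.normal(scale=np.sqrt(0.75), size=500)
print(f_cond_error_free(1.0, 0.5, X, Y, h1=0.3, h2=0.3))
```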

When the variables \(X_1,\cdots ,X_n\) are unobservable, the estimator \(\tilde{f}_{Y|X}(y\mid x)\) cannot be directly calculated. Suppose that one can only observe the surrogate variables \(W_i\) through \(W_i=X_i+\epsilon _i\) for \(i\in \{1,\cdots ,n\}\). Denote the density functions of X and W as \(f_X(\cdot )\) and \(f_W(\cdot )\), respectively, and the distribution function of \(\epsilon \) as \(F_\epsilon (\cdot )\). Then \(f_W(w)=\int _{-\infty }^{\infty }f_X(w-x)\,dF_\epsilon (x)\). By the deconvolution method described in Stefanski and Carroll (1990), the marginal density function \(f_X(\cdot )\) can be estimated as

$$\begin{aligned} \hat{f}_n(x)=\frac{1}{2\pi }\int _{-\infty }^{\infty }e^{-itx}\phi _G(th_1)\frac{\hat{\phi }_n(t)}{\phi _\epsilon (t)}\,dt, \end{aligned}$$
(2.4)

where \(\phi _G(\cdot )\) is the Fourier transform of the kernel function \(G(\cdot )\), \(\phi _\epsilon (\cdot )\) is the characteristic function of the error term \(\epsilon \), and \(\hat{\phi }_n(t)=n^{-1}\sum _{j=1}^n{e^{itW_j}}\) is the empirical characteristic function of \(\{W_j,\ j=1,\cdots ,n\}\). Then, (2.4) can be rewritten in the following kernel form,

$$\begin{aligned}&\hat{f}_n(x)=\frac{1}{nh_1}\sum _{j=1}^n{G_n\Big (\frac{x-W_j}{h_1}}\Big )\nonumber \\&\quad ~~\text{ with }~~ G_n(x)=\frac{1}{2\pi }\int _{-\infty }^{\infty }e^{-itx}\frac{\phi _G(t)}{\phi _\epsilon (t/h_1)}\,dt. \end{aligned}$$
(2.5)

According to (2.3) and (2.5), the double kernel CDF estimator of \(f_{Y|X}(y\mid x)\) based on the surrogate variables \(W_i\) is given by

$$\begin{aligned} \hat{f}_{Y|X}(y\mid x)=&\sum _j{G_n\Big (\frac{x-W_j}{h_1}\Big )K_{h_2}(Y_j-y)}\Big /\sum _i{G_n\Big (\frac{x-W_i}{h_1}\Big )}\nonumber \\ =&\frac{1}{nh_1}\sum _j{G_n\Big (\frac{x-W_j}{h_1}\Big )K_{h_2}(Y_j-y)}\Big /\hat{f}_n(x), \end{aligned}$$
(2.6)

where \(\hat{f}_n(x)\) and \(G_n(x)\) are defined by (2.5).
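A direct numerical implementation of (2.5) and (2.6) is sketched below, assuming \(\phi _G\) vanishes outside \([-1,1]\) so that the Fourier integral reduces to a finite range; the quadrature grid size, the Gaussian kernel for \(K(\cdot )\), and the example characteristic functions are our own choices.

```python
import numpy as np

def deconv_kernel(x, h1, phi_G, phi_eps, m=2001):
    """G_n(x) in (2.5) by quadrature, assuming phi_G vanishes outside [-1, 1].
    The integrand is even in t, so only the cosine part contributes."""
    t = np.linspace(-1.0, 1.0, m)
    integrand = np.cos(t * x) * phi_G(t) / phi_eps(t / h1)
    return np.sum(integrand) * (t[1] - t[0]) / (2.0 * np.pi)

def f_cond_deconv(x, y, W, Y, h1, h2, phi_G, phi_eps):
    """Double kernel CDF estimator (2.6) based on surrogates W = X + eps
    (a Gaussian kernel K is assumed for illustration)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    g = np.array([deconv_kernel((x - w) / h1, h1, phi_G, phi_eps) for w in W])
    return np.sum(g * K((Y - y) / h2) / h2) / np.sum(g)

# Example inputs: N(0, 0.25) measurement error and phi_G(t) = (1 - t^2)^3 on [-1, 1].
phi_G = lambda t: (1.0 - t**2) ** 3
phi_eps = lambda t: np.exp(-0.5 * 0.25 * t**2)
```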

2.2 Performance of double kernel estimators

To establish our main results, we require the following conditions.

  1. (C1)

    The characteristic function of the error distribution \(\phi _\epsilon (\cdot )\) does not vanish.

  2. (C2)

    Let \(a<b\). The marginal density \(f_X(\cdot )\) of X is bounded away from zero on the interval [a, b] and has a bounded \(k_1\)th derivative. The density function \(f_Y(\cdot )\) of the observed Y is bounded away from zero and infinity, and has a bounded \(k_3\)th derivative.

  3. (C3)

    The CDF \(f_{Y|X}(y\mid x)\) has a continuous \(k_2\)th derivative with respect to x on [a, b].

  4. (C4)

    The kernel function \(G(\cdot )\) is a \(k_1\)th-order kernel. Namely, \(\int _{-\infty }^{\infty }G(y)\,dy=1\), \(\int _{-\infty }^{\infty }y^{k_1}G(y)\,dy \ne 0\) and \(\int _{-\infty }^{\infty }y^jG(y)\,dy=0\) for \(j=1, \cdots , k_1-1\). Similarly, \(K(\cdot )\) and \(L(\cdot )\) are \(k_2\)th-order and \(k_3\)th-order kernels, respectively.

  5. (C5)

    The bandwidths \(h_1\), \(h_2\) and \(h_3\) satisfy \((nh_3)^{-1}\log n\rightarrow 0\), \(nh_1^{1/2}h_2^{1/2}h_3^{1/2}(h_1^{2k_1-1/2}+h_2^{2k_2-1/2}+h_3^{2k_3-1/2})(h_1^{1/2}+h_2^{1/2}+h_3^{1/2})\rightarrow 0\), \(h_1^{k_1-1/2}h_2^{-1/2}h_3^{1/2}\rightarrow 0\), \(h_1^{-1/2}h_2^{k_2-1/2}h_3^{1/2} \rightarrow 0\), \(h_1^{k_1-1/2}h_2^{1/2}h_3^{1/2}\rightarrow 0\) and \(h_1^{1/2}h_2^{1/2}h_3^{k_3-1/2}\rightarrow 0\).

Remark 1

These conditions are generally regarded as mild. In particular, condition (C1) ensures that the estimator (2.6) is well-defined (Stefanski and Carroll 1990). Conditions (C2) and (C3) are analogous to those typically required for ordinary nonparametric regression, assuming bounded support and imposing smoothness on the density function and the CDF (Fan and Truong 1993; De Gooijer and Zerom 2003; Fan and Jiang 2005; Su and White 2014). Condition (C4) specifies the orders of the kernel functions (Fan and Truong 1993). Commonly used kernel functions, such as the Gaussian and Epanechnikov kernels, are of order 2. Condition (C5) on the bandwidths is necessary for deriving the asymptotic properties of the independence test statistic and can be easily satisfied.

The following subsections present two sets of results under super smooth and ordinary smooth error distributions, respectively. The first set discusses the local and global convergence rates, while the second focuses on uniform results. The global rates are described by the \(L_p\)-norms, defined as follows. Let \(g(\cdot )\) denote a real-valued function on the real line and \(\omega (\cdot )\) be a non-negative weight function. Define \(\Vert g(\cdot )\Vert _{wp}=\big \{\int |g(x)|^p\omega (x)\,dx\big \}^{1/p}\) for \(1\le p<\infty \), and \(\Vert g(\cdot )\Vert _{w\infty }=\sup _x|\omega (x)g(x)|\).

To express the consistency results, a class of joint density functions of (X, Y) needs to be introduced. In this class, conditions (C1)-(C3) should hold uniformly. More precisely, let B be a positive constant and [a, b] denote a compact interval. Define \(m(x, y)=E[K_{h_2}(Y-y)\mid X=x]\) and

$$\begin{aligned} \mathscr {F}_{k_1,B}&=\Big \{f(x,y): |m^{(j)}(\cdot ,\cdot )|\le B, \\&\quad \ j=1,\cdots ,k_1,~|f_X^{(k_1)}(\cdot )|\le B, \\&\quad \ \min _{a\le x\le b}f_X(x)\ge B^{-1}\Big \}. \end{aligned}$$

Note that this class \(\mathscr {F}_{k_1,B}\) reformulates the standard conditions so that they hold uniformly.

2.2.1 Super smooth error distributions

We first present the local and global rates under super smooth error distributions. Let

$$\begin{aligned} b_{k_1}(x,y)&=(-1)^{k_1}\bigg [\frac{[m(x,y)f_X(x)]^{(k_1)}}{k_1!}\\&\quad -m(x,y)\frac{f_X^{(k_1)}(x)}{k_1!}\bigg ]f_X^{-1}(x)\int _{-\infty }^{\infty }u^{k_1}G(u)\,du. \end{aligned}$$

Theorem 2.1

Suppose that conditions (C1)-(C4) are satisfied and the first inequality in (2.1) holds. Assume that \(\phi _G(t)\) is supported on \(|t|\le C_0\). Then, for \(h_1=c_1(\log n)^{-1/\beta }\) with \(c_1>C_0(2/\gamma )^{1/\beta }\) and \(h_2=O((\log n)^{-1/\beta })\), we have

$$\begin{aligned}&E\big |\big (\hat{f}_{Y|X}(y\mid x)-f_{Y|X}(y\mid x)\big )\hat{f}_n(x)/f_X(x)\big |^2\\&\quad =[c_1^{k_1}b_{k_1}(x,y)]^2(\log n)^{-2{k_1}/\beta }(1+o(1)) ~~~\text{ and }\\&E\int _a^b\big |\big (\hat{f}_{Y|X}(y\mid x)-f_{Y|X}(y\mid x)\big )\hat{f}_n(x)/f_X(x)\big |^2dx\\&\quad =\int _a^b[c_1^{k_1}b_{k_1}(x,y)]^2dx(\log n)^{-2k_1/\beta }(1+o(1)). \end{aligned}$$

The factor \(\hat{f}_n(x)/f_X(x)\), which converges to 1 in probability, is introduced to avoid the case where the denominator of \(\hat{f}_{Y|X}(y\mid x)\) might degenerate to 0. It does not significantly affect the statistical properties of the proposed estimator, as demonstrated in the subsequent proofs. Specifically, \(E\big [\big (\hat{f}_{Y|X}(y\mid x)-f_{Y|X}(y\mid x)\big )^2|W_1,\cdots ,W_n\big ]=[c_1^{k_1}b_{k_1}(x,y)]^2(\log n)^{-2k_1/\beta }(1+o(1))\). The rates above also hold uniformly over \(\mathscr {F}_{k_1,B}\).

Theorem 2.2

Assume that \(\phi _\epsilon (\cdot )\) and \(G(\cdot )\) satisfy the conditions of Theorem 2.1. If the weight function \(\omega (x)\) has support [a, b], then

$$\begin{aligned}&\lim _{d\rightarrow \infty }\limsup _{n\rightarrow \infty }\sup _{f\in \mathscr {F}_{k_1,B}}P_f\big \{|\hat{f}_{Y|X}(y\mid x)\\&\quad -f_{Y|X}(y\mid x)|\ge d(\log n)^{-k_1/\beta }\big \}=0 ~~~~~and\\&\lim _{d\rightarrow \infty }\limsup _{n\rightarrow \infty }\sup _{f\in \mathscr {F}_{k_1,B}}P_f\big \{\Vert \hat{f}_{Y|X}(y\mid \cdot )-f_{Y|X}(y\mid \cdot )\Vert _{wp}\\&\quad \ge d(\log n)^{-k_1/\beta }\big \}=0, ~1\le p\le \infty . \end{aligned}$$

Theorem 2.2 reveals an interesting phenomenon: the convergence rate of \(\hat{f}_{Y\mid X}\) remains the same under the weighted \(L_p\)-loss \((1\le p<\infty )\) and the \(L_\infty \)-loss. This conclusion does not hold for ordinary nonparametric regression, where the global rates of convergence tend to be slower under \(L_\infty \)-loss (Stone 1982).

2.2.2 Ordinary smooth error distributions

In order to explicitly compute the rate of the mean squared error (MSE) of the CDF estimator, a condition on the tail behavior of \(\phi _\epsilon (t)\) is required. This condition, a refinement of (2.2), is given by

$$\begin{aligned} t^\beta \phi _\epsilon (t)\rightarrow c,~~ |t^{\beta +1}\phi _\epsilon '(t)|=O(1)\quad as~t\rightarrow \infty , \end{aligned}$$
(2.7)

where c is a non-zero constant.

Theorem 2.3

Assume that conditions (C1)-(C4) are satisfied, and \(\int _{-\infty }^{\infty }|t^{\beta +1}|\big (|\phi _G(t)|+|\phi _G'(t)|\big )\,dt<\infty \), \(\int _{-\infty }^{\infty }|t^{\beta +1}\phi _G(t)|^2\,dt<\infty \). Then, under the ordinary smooth error distribution condition (2.7) and assuming \(h_1=c_2n^{-1/\{2(k_1+\beta )+1\}}\) with \(c_2>0\), \(h_2=O\big (n^{-1/\{2(k_1+\beta )+1\}}\big )\) and \(n^{1-1/\{2(k_1+\beta )+1\}}h_2\rightarrow \infty \), we have

$$\begin{aligned}&E\big |\big (\hat{f}_{Y|X}(y\mid x)-f_{Y|X}(y\mid x)\big )\hat{f}_n(x)/f_X(x)\big |^2\\&\quad =\bigg [b_{k_1}^2(x, y)h_1^{2k_1}+\frac{1}{nh_1^{1+2\beta }}v(x,y)\bigg ](1+o(1))=O\big (n^{-2k_1/\{2(k_1+\beta )+1\}}\big )\\&\text{ and }~~E\int _a^b\big |\big (\hat{f}_{Y|X}(y\mid x)-f_{Y|X}(y\mid x)\big )\hat{f}_n(x)/f_X(x)\big |^2\,dx=O\big (n^{-2k_1/\{2(k_1+\beta )+1\}}\big ), \end{aligned}$$

where \( v(x,y)=1\big /\big \{2\pi f_X^2(x)\big \}{\int }_{-\infty }^{\infty }|t^\beta /c|^2|\phi _G(t)|^2\,dt {\int }_{-\infty }^{\infty }\tau ^2(x-v)f_X(x-v)\,dF_\epsilon (v), \) with \(\tau ^2(\cdot )=E\big [(K_{h_2}(Y-y)-m(x,y))^2|X=\cdot \big ]\).

Remark 2

(1) According to Theorems 2.1 and 2.3, the convergence rate of the MSE under the super smooth error distribution condition is \(O((\log n)^{-2{k_1}/\beta })\), which is significantly slower than the rate \(O(n^{-2k_1/\{2(k_1+\beta )+1\}})\) under the ordinary smooth error distribution. Both convergence rates are influenced by the measurement errors through the parameter \(\beta \). Furthermore, when there is no measurement error in the data, Tsybakov (2011) gave the convergence rate of the kernel density estimator, which is \(O(n^{-2k_1/(2k_1+1)})\) for \(h=c_3n^{-1/(2k_1+1)}\) with a constant \(c_3\). Obviously, the convergence rate under measurement error with the ordinary smooth error distribution condition is slightly slower than the rate given by Tsybakov (2011).

(2) According to Theorems 2.1 and 2.3, the convergence rate of the MSE improves when the smoothness parameter \(\beta \) decreases or the order \(k_1\) of the kernel function increases. Commonly used kernel functions, such as the Epanechnikov kernel and the Gaussian kernel, are of order 2. From such a second-order kernel G(u), we can construct a fourth-order kernel given by \(G^*(u)=3G(u)/2+uG'(u)/2\), where \(G'(u)\) is the first derivative of G(u). Similarly, it is possible to construct sixth-order and higher-order kernel functions, which can further improve the convergence rate of the MSE.
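As a concrete check of this construction, the sketch below builds \(G^*\) from the Gaussian kernel, for which \(G'(u)=-uG(u)\) and hence \(G^*(u)=(3-u^2)G(u)/2\), and verifies its moments numerically.

```python
import numpy as np

def G2(u):
    """Second-order Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def G4(u):
    """Fourth-order kernel G*(u) = 3G(u)/2 + uG'(u)/2; for the Gaussian
    kernel G'(u) = -uG(u), so G*(u) = (3 - u^2)G(u)/2."""
    return 0.5 * (3.0 - u**2) * G2(u)

# Moment check: integral 1, vanishing moments up to order 3, nonzero 4th moment.
u = np.linspace(-10.0, 10.0, 100001)
du = u[1] - u[0]
for j in (0, 1, 2, 3, 4):
    print(j, round(np.sum(u**j * G4(u)) * du, 6))
# -> 1.0, 0.0, 0.0, 0.0, -3.0 (odd moments vanish by symmetry)
```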

The next theorem shows that the previous results hold uniformly for a class of densities.

Theorem 2.4

If \(\phi _\epsilon (\cdot )\), \(h_1\) and \(G(\cdot )\) satisfy the conditions of Theorem 2.3, and the weight function has bounded support [a, b], then

$$\begin{aligned}&\lim _{d\rightarrow \infty }\limsup _{n\rightarrow \infty }\sup _{f\in \mathscr {F}_{k_1,B}}P_f\big \{|\hat{f}_{Y|X}(y\mid x)\\&\quad -f_{Y|X}(y\mid x)|\ge dn^{-k_1/\{2(k_1+\beta )+1\}}\big \}=0 ~~~~~and\\&\lim _{d\rightarrow \infty }\limsup _{n\rightarrow \infty }\sup _{f\in \mathscr {F}_{k_1,B}}P_f\big \{\Vert \hat{f}_{Y|X}(y\mid \cdot )-f_{Y|X}(y\mid \cdot )\Vert _{wp}\\&\quad \ge dn^{-k_1/\{2(k_1+\beta )+1\}}\big \}=0,~1\le p<\infty . \end{aligned}$$

3 Independence test

Section 2 applies the deconvolution techniques to estimate the CDF in the presence of measurement errors. Based on this, we use the proposed CDF estimator to construct an MI estimator for testing the independence between X and Y when X is measured with errors. The asymptotic properties of the MI estimator are investigated under both super smooth and ordinary smooth error distributions.

Independence is one of the fundamental concepts in data analysis and statistical inference. The null hypothesis states that two random variables are independent, i.e., \(H_0:~X\perp \!\!\!\perp Y.\) Under the alternative hypothesis, the MI is bounded away from 0, i.e., \( H_1: MI(X,Y)\ge c_0 > 0. \) To test the independence between X and Y when X is subject to measurement errors, we first give the estimator of MI as follows,

$$\begin{aligned} \widehat{MI}(X, Y)=\frac{1}{n}\sum _{i=1}^n\bigg \{\log \frac{\hat{f}_{Y|X}(Y_i\mid W_i)}{\hat{f}_Y(Y_i)}\bigg \}, \end{aligned}$$

where \(\hat{f}_{Y|X}(Y_i\mid W_i)\) and \(\hat{f}_Y(Y_i)\) are the leave-one-out kernel density estimators of \(f_{Y|X}(Y_i\mid W_i)\) and \(f_Y(Y_i)\), respectively. Specifically,

$$\begin{aligned}&\hat{f}_{Y|X}(Y_i\mid W_i)=\frac{\sum _{j\ne i}G_{n}^*(W_i-W_j)K_{h_2}(Y_i-Y_j)}{\sum _{j\ne i}G_{n}^*(W_i-W_j)} ~~~\text{ and }\\&\hat{f}_Y(Y_i)=\frac{1}{n-1}\sum _{j\ne i}L_{h_3}(Y_i-Y_j), \end{aligned}$$

where \(G_{n}^*(\cdot )=G_n(\cdot /h_1)/h_1\), \(L_{h_3}(\cdot )=L(\cdot /h_3)/h_3\), and \(L(\cdot )\) is a kernel function.
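Putting the pieces together, a minimal sketch of \(\widehat{MI}(X, Y)\) might look as follows; the Gaussian kernels standing in for \(K(\cdot )\) and \(L(\cdot )\) and the generic vectorized deconvolution kernel Gn are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def mi_estimate(W, Y, h1, h2, h3, Gn):
    """Leave-one-out MI estimator of Sect. 3. Gn is a vectorized deconvolution
    kernel G_n(.); Gaussian kernels stand in for K and L (illustrative choice).
    W and Y are numpy arrays of equal length."""
    n = len(W)
    gauss = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                    # leave observation i out
        gw = Gn((W[i] - W[keep]) / h1) / h1         # G_n^*(W_i - W_j)
        ky = gauss((Y[i] - Y[keep]) / h2) / h2      # K_{h2}(Y_i - Y_j)
        f_cond = np.sum(gw * ky) / np.sum(gw)       # \hat f_{Y|X}(Y_i | W_i)
        f_marg = np.mean(gauss((Y[i] - Y[keep]) / h3) / h3)  # \hat f_Y(Y_i)
        total += np.log(f_cond / f_marg)
    return total / n
```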

The following two subsections discuss the asymptotic properties of the proposed MI estimator under the null and alternative hypotheses, assuming that the measurement errors are either super smooth or ordinary smooth.

3.1 Super smooth error distributions

The following two theorems present the asymptotic properties of \(\widehat{MI}(X, Y)\) under the null and alternative hypotheses in the case of super smooth error distributions. To derive the asymptotic normality of \(\widehat{MI}(X, Y)\) under the null hypothesis, we define \(D_{1n}=-\nu _1\nu _2d_{1n}/(2nh_1h_2)+\nu _1d_{2n}/(2nh_1)+\nu _3d_{3n}/(2nh_3)\), where \(\nu _1=\int G_n^2(u)\,du\), \(\nu _2=\int K^2(u)\,du\), \(\nu _3=\int L^2(u)\,du\), \(d_{1n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_{X, Y}(W_i, Y_i)\), \(d_{2n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_{X}(W_i)\) and \(d_{3n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_Y(Y_i)\). We also write \(d_{4n}=\int _{-\infty }^\infty \big [m(u,y)-m(x,y)\big ]G_n^*(x-u)f_X(u)\,du\). Here \(\hat{f}_{X, Y}(W_i, Y_i)=(n-1)^{-1}\sum _{j\ne i}G_{n}^*(W_i-W_j)K_{h_2}(Y_i-Y_j)\) and \(\hat{f}_X(W_i)=(n-1)^{-1}\sum _{j\ne i}G_{n}^*(W_i-W_j)\) are the leave-one-out kernel density estimators.

Theorem 3.1

Suppose that conditions (C1)-(C5) are satisfied, \(\phi _G(t)\) has bounded support on \(|t|\le C_0\), and \(h_1=c_1(\log n)^{-1/\beta }\) with \(c_1>C_0(6\gamma )^{1/\beta }\), \(h_2=O((\log n)^{-1/\beta })\), where \(\gamma \) and \(\beta \) are defined in (2.1). Under the null hypothesis that X and Y are independent, we have

$$\begin{aligned} \frac{n\big \{\widehat{MI} (X, Y)-D_{1n}\big \}}{\sqrt{h_1^{-1}h_2^{-1}\sigma _n^2+h_3^{-1}\sigma _1^2}}\xrightarrow {d} N(0,1), \end{aligned}$$

where \(\sigma _1^2=2E\{f_Y^{-1}(Y)\}{\int }\big \{2^{-1}{\iint } L(v)L(v+z)\,dv-L(z)\big \}^2\,dz \ \text {and} \ \sigma _n^2=2E\{f_{X,Y}^{-1}(W,Y)\}{\iint }\big \{2^{-1}{\iint } G_n(u)K(v)G_n(u+t)K(v+z)dudv-G_n(t)K(z)\big \}^2 dtdz. \)

Theorem 3.2

Suppose that the conditions of Theorem 3.1 are satisfied. Under the alternative hypothesis that X and Y are dependent, we have \(n\big (h_1^{-1}h_2^{-1}\sigma _n^2+h_3^{-1}\sigma _1^2\big )^{-1/2}\big \{\widehat{MI}(X, Y)-D_{1n}\big \}\rightarrow \infty \).

3.2 Ordinary smooth error distributions

In this subsection, we discuss the asymptotic properties of \(\widehat{MI}(X, Y)\) under the null and alternative hypotheses for ordinary smooth error distributions. To derive the asymptotic normality of \(\widehat{MI}(X, Y)\) under the null hypothesis, we define \(D_{2n}=-\nu _1\nu _2d_{1n}/(2nh_1h_2)+\nu _1d_{2n}/(2nh_1)+\nu _3d_{3n}/(2nh_3)\), where \(\nu _1\), \(\nu _2\), \(\nu _3\), \(d_{1n}\), \(d_{2n}\) and \(d_{3n}\) are defined in Sect. 3.1.

Theorem 3.3

Suppose that conditions (C1)-(C5) are satisfied and \(h_1=c_2n^{-1/\{2(k_1+\beta )+1\}}\) with \(c_2>0\), \(h_2=O\big (n^{-1/\{2(k_1+\beta )+1\}}\big )\) and \(n^{1-1/\{2(k_1+\beta )+1\}}h_2\rightarrow \infty \). Under the null hypothesis that X and Y are independent, we have

$$\begin{aligned} \frac{n\big \{\widehat{MI} (X, Y)-D_{2n}\big \}}{\sqrt{h_1^{-2}\sigma _n^2+h_3^{-1}\sigma _1^2}}\xrightarrow {d} N(0,1). \end{aligned}$$

Theorem 3.4

Suppose that the conditions of Theorem 3.3 are satisfied. Under the alternative hypothesis that X and Y are dependent, we have \(n(h_1^{-2}\sigma _n^2+h_3^{-1}\sigma _1^2)^{-1/2}\big \{\widehat{MI}(X, Y)-D_{2n}\big \}\rightarrow \infty \).

The proposed MI estimator serves as a natural test statistic for independence. Statistical tests that rely on asymptotic distributions usually require a large sample size, whereas the permutation technique yields an accurate null distribution even for small sample sizes (Mariano and Manuel 2008). Let \(\Omega =\big \{(W_i, Y_i), i = 1, \cdots , n\big \}\) denote a random sample drawn from (W, Y), and let \(\{\delta _1, \delta _2, \cdots , \delta _n\}\) be a random permutation of \(\{1, 2, \cdots , n\}\). Based on the dataset \(\Omega _1 = \big \{(W_i, Y_{\delta _i}), i = 1, \cdots , n\big \}\), we calculate the MI estimator, denoted by \(\widehat{MI}_1\). Repeating this procedure M times yields \(T_M=\{\widehat{MI}_k, k=1,\cdots ,M\}\). The distribution of \(\widehat{MI}\) under the null hypothesis can be approximated by the empirical distribution of \(T_M\). We then reject the null hypothesis if \(\widehat{MI}\) computed from the original data is greater than the \((1-\alpha )\)-th quantile of \(T_M\), where \(\alpha \) is a prespecified significance level.
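The permutation scheme just described is straightforward to code; a sketch is given below, where mi_fn stands for any MI estimator with fixed bandwidths, such as the one outlined in the previous sketch.

```python
import numpy as np

def permutation_test(W, Y, mi_fn, M=1000, alpha=0.05, seed=0):
    """Permutation calibration of the MI independence test: permuting Y
    breaks any dependence on W, so the permuted statistics approximate
    the null distribution of the estimator."""
    rng = np.random.default_rng(seed)
    mi_obs = mi_fn(W, Y)
    mi_null = np.array([mi_fn(W, rng.permutation(Y)) for _ in range(M)])
    threshold = np.quantile(mi_null, 1.0 - alpha)
    return mi_obs, threshold, mi_obs > threshold    # True -> reject H0
```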

4 Numerical study

In this section, we conduct extensive numerical simulations to evaluate the performance of the proposed methods for CDF estimation and independence inference. Furthermore, we apply our methods to the LROSS dataset to explore the associations between redshift and metallicity or radial velocity.

4.1 Estimation performance of CDF

To assess the effectiveness of our proposed deconvolution double kernel (DC for short) method for CDF estimation, we compare it with the Nadaraya-Watson (NW for short) and local linear (LL for short) methods, both of which simply ignore the measurement errors, in the following examples. The CDF estimators based on the NW and LL methods are given by

$$\begin{aligned} \hat{f}_\mathrm{{NW}}(y\mid x)=\sum _j{G\Big (\frac{x-W_j}{h_1}\Big )K_{h_2}(Y_j-y)}\Big /\sum _j{G\Big (\frac{x-W_j}{h_1}\Big )}, \end{aligned}$$
$$\begin{aligned} \hat{f}_\mathrm{{LL}}(y\mid x)&= \sum _jG\Big (\frac{x-W_j}{h_1}\Big )K_{h_2}(Y_j-y)\{T_{n,2}-(W_j-x)T_{n,1}\}\\&\quad \Big /(T_{n,0}T_{n,2}-T_{n,1}^2), \end{aligned}$$

where \(T_{n,k}=\sum _{i=1}^n{G\big \{(x-W_i)/h_1\big \}(W_i-x)^k}\), \(k=0,1,2\). The optimal bandwidths \(h_1\) and \(h_2\) are selected to minimize the root mean squared error (RMSE), where \(\mathrm{{RMSE}}=\big \{(Nm)^{-1}\sum _{j=1}^N\sum _{i=1}^m\big (\hat{f}^{(j)}_{Y|X}(y_{i}\mid x)-f^{(j)}_{Y|X}(y_{i}\mid x)\big )^2\big \}^{1/2}\), with \(y_i=l_0+0.1(i-1)\), \(m=10(u_0-l_0)+1\), and N the number of repeated experiments. \(K(\cdot )\) is chosen to be the Epanechnikov kernel. The values of the upper and lower bounds \(u_0\) and \(l_0\) are given in the following examples.

Fig. 1

Estimated CDF curves in Example 4.1 based on the DC (dashed line), NW (dotted line), and LL (dot dash line) methods, along with the true CDF curves (solid line) for \(n = 200\) (left) and \(n = 500\) (right), with \(\sigma _0^2=0.25\) (top) and \(\sigma _0^2=0.1\) (bottom) when \(\epsilon \) follows a normal distribution

Fig. 2

Estimated CDF curves in Example 4.1 based on the DC (dashed line), NW (dotted line), and LL (dot dash line) methods, along with the true CDF curves (solid line) for \(n = 200\) (left) and \(n = 500\) (right), with \(\sigma _0^2=0.25\) (top) and \(\sigma _0^2=0.1\) (bottom) when \(\epsilon \) follows a double exponential distribution

Example 4.1

Let (X, Y) be a pair of random variables that follow a bivariate normal distribution with zero mean and covariance matrix with entries \(0.5^{|k-l|}\). Suppose \(W=X+\epsilon \), where \(\epsilon \) represents the measurement error. The true CDF of Y given \(X=x\) can be easily derived as

$$\begin{aligned} f_{Y|X}(y\mid x)=\frac{1}{\sqrt{2\pi (1-0.5^2)}}\exp \Big \{{-\frac{(y-0.5x)^2}{2(1-0.5^2)}}\Big \}. \end{aligned}$$

The measurement error \(\epsilon \) is generated from the following two cases.

  1. (I)

    The measurement error \(\epsilon \) is generated from a normal distribution \( N(0,\sigma _0^2)\), where \(\sigma _0^2=0.25\) or 0.1. In this case, \(\phi _\epsilon (t)=\exp (-\frac{1}{2}\sigma _0^2t^2)\). Suppose the kernel function \(G(\cdot )\) has a Fourier transform given by \(\phi _G(t)=(1-t^2)^3_+\). According to (2.5),

    $$\begin{aligned} G_n(x)=\frac{1}{\pi }\int _0^1\cos (tx)(1-t^2)^3\exp \Big (\frac{\sigma _0^2t^2}{2h_1^2}\Big )\,dt. \end{aligned}$$
  2. (II)

    The measurement error \(\epsilon \) is generated from a double exponential distribution with the density function \( f_\epsilon (z)=\exp \big (-\sqrt{2}|z|/\sigma _0\big )/(\sqrt{2}\sigma _0) \), where \(\sigma _0^2=0.25\) or 0.1. The characteristic function of \(\epsilon \) is \( \phi _\epsilon (t)=1/(1+\sigma _0^2t^2/2) \). According to (2.5),

    $$\begin{aligned} G_n(x)&= \frac{1}{2\pi }\int _{-\infty }^\infty \exp (-itx)\phi _G(t)\Big (1+\frac{\sigma _0^2t^2}{2h_1^2}\Big )\,dt\\&= G(x)+\frac{\sigma _0^2}{2h_1^2}\cdot \frac{1}{2\pi }\int _{-\infty }^\infty \exp (-itx)t^2\phi _G(t)\,dt\\&= G(x)-\frac{\sigma _0^2}{2h_1^2}G''(x). \end{aligned}$$

    If \(G(\cdot )\) is further chosen to be the Gaussian kernel \(G(x)=(\sqrt{2\pi })^{-1}\exp (-x^2/2)\), then \(G_n(x)=(2\pi )^{-1/2}\exp (-x^2/2)\big \{1-\sigma _0^2(x^2-1)\big /(2h_1^2)\big \}\). Both deconvolution kernels are implemented in the sketch below.
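The two deconvolution kernels derived above admit direct implementations; in the following sketch, only the quadrature grid size is an added choice of ours.

```python
import numpy as np

def Gn_normal_error(x, h1, sigma0, m=2001):
    """Case (I): N(0, sigma0^2) error with phi_G(t) = (1 - t^2)^3 on [-1, 1];
    G_n(x) = (1/pi) * int_0^1 cos(tx)(1-t^2)^3 exp(sigma0^2 t^2/(2 h1^2)) dt."""
    t = np.linspace(0.0, 1.0, m)
    f = np.cos(t * x) * (1.0 - t**2) ** 3 * np.exp(sigma0**2 * t**2 / (2.0 * h1**2))
    return np.sum(f) * (t[1] - t[0]) / np.pi

def Gn_laplace_error(x, h1, sigma0):
    """Case (II): double exponential error with a Gaussian kernel G;
    G_n(x) = G(x){1 - sigma0^2 (x^2 - 1)/(2 h1^2)} in closed form."""
    G = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return G * (1.0 - sigma0**2 * (x**2 - 1.0) / (2.0 * h1**2))
```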

We apply the aforementioned bandwidth selection method to determine the optimal bandwidths \(h_1\) and \(h_2\), and set \([l_0, u_0]\) to \([-1,2]\). The estimated CDF curves based on the DC, NW and LL methods, as well as the true CDF curves for \(x = 1\) when \(\sigma _0^2=0.25\) and 0.1, are shown in Figs. 1 and 2 for \(n=200\) and 500, respectively. The RMSEs for \(n=50\), 100 and 200 are presented in Table 1.

Example 4.2

Let \(X\sim N(1,1)\), \(W=X+\epsilon \), and let Y follow the model \(Y=0.5e^X+\sin (\pi X)+0.6e\), where \(e\sim N(0,1)\). The measurement error \(\epsilon \) is generated from normal and double exponential distributions, respectively, as described in Example 4.1. The true CDF is

$$\begin{aligned} f_{Y|X}(y\mid x)=\frac{1}{0.6\sqrt{2\pi }}\exp \Big \{\frac{-\big (y-0.5\exp {(x)}-\sin (\pi x)\big )^2}{2\times 0.6^2}\Big \}. \end{aligned}$$

We select the optimal bandwidths \(h_1\) and \(h_2\) in the same way as in Example 4.1, and set \([l_0, u_0]=[-0.5, 3]\). Figure 3 presents the true and estimated curves of \(f_{Y|X}(y\mid x)\) for \(x=1\) based on the DC, NW and LL methods, under the normal error case for different sample sizes and variances of measurement error. The RMSEs of \(\hat{f}_{Y|X}(y\mid x)\) for different error distributions and sample sizes are presented in Table 2.

Table 1 RMSEs of CDF estimators \(\hat{f}_{Y|X}(y\mid x)\) for \(x=1\) based on the DC, NW and LL methods under different error distributions
Fig. 3

Estimated CDF curves in Example 4.2 based on the DC (dashed line), NW (dotted line), and LL (dot dash line) methods, along with the true CDF curves (solid line) for \(n = 200\) (left) and \(n = 500\) (right), with \(\sigma _0^2=0.25\) (top) and \(\sigma _0^2=0.1\) (bottom) when \(\epsilon \) follows a normal distribution

Example 4.3

Let \(Y\mid X\sim N(X,0.5^2)\), \(W=X+\epsilon \), where \(X\sim N(0,1)\), and \(\epsilon \) follows normal and double exponential distributions, respectively, as described in Example 4.1. The true CDF is

$$\begin{aligned} f_{Y|X}(y\mid x)=\frac{1}{0.5\sqrt{2\pi }}\exp \Big \{-\frac{(y-x)^2}{2\times 0.5^2}\Big \}. \end{aligned}$$

Here, \([l_0, u_0]\) is set to \([-0.5, 2.5]\). Figure 4 illustrates the true and estimated curves of \(f_{Y|X}(y\mid x)\) for \(x=1\) based on the DC, NW and LL methods under the double exponential error distribution for different sample sizes and variances of measurement error. Table 3 presents the RMSEs for \(\hat{f}_{Y|X}(y\mid x)\) under different error distributions and sample sizes.

It can be seen from Figs. 1, 2, 3 and 4 and Tables 1, 2, 3 that our proposed DC method outperforms the NW and LL methods. As the sample size n increases, the estimated CDF curve based on the DC method approaches the true CDF curve, and the RMSE gradually decreases. Moreover, the estimation results of the DC method improve as the variance of the measurement error decreases, as evidenced by closer curves and lower RMSE values. Additionally, as shown in Tables 1, 2 and 3, as n increases, the RMSE of the CDF estimator with double exponential errors decreases faster than that with normal errors. This finding is consistent with the theoretical result in Remark 2(1) of Sect. 2.2, which indicates that the MSE converges much faster under the ordinary smooth error distribution than under the super smooth error distribution. Furthermore, the RMSE of the estimator with double exponential errors is smaller than that with normal errors.

4.2 Independence test

In this section, we perform numerical analyses to demonstrate the performance of our proposed independence test. Specifically, we conduct simulations to compare our proposed test (DCT for short) with tests in which the CDF is estimated using the NW and LL methods, denoted as NWT and LLT, respectively. Additionally, we compare the performance of our proposed test with the MIC, dCor and HHG approaches described in Sect. 1.

We apply four models to assess the proposed DCT method: linear, quadratic, circle and spiral circle. We compare our proposed method with five other independence tests, using empirical size and power as the evaluation criteria. In the following examples, we use the likelihood cross-validation (LCV) method to select the bandwidths \(h_1\) and \(h_2\). While this method is widely recognized in standard kernel density estimation (Silverman 1986), it has not yet been used for conditional density estimation. Specifically, we define the cross-validated likelihood for the CDF estimator as follows,

$$\begin{aligned} \mathscr {L}(h_1,h_2)&=\prod _i{\hat{f}_{Y|X}^{-i}(Y_i\mid W_i)\hat{f}_n^{-i}(W_i)}\\ &=\prod _i{\frac{1}{(n-1)h_1}\sum _{j\ne i}{G_n\Big (\frac{W_i-W_j}{h_1}\Big ) K_{h_2}(Y_i-Y_j)}}, \end{aligned}$$

where \(\hat{f}_{Y|X}^{-i}\) denotes \(\hat{f}_{Y|X}\) evaluated with \((W_i, Y_i)\) left out and \(\hat{f}_n^{-i}\) denotes \(\hat{f}_n\) evaluated with \(W_i\) left out. The optimal bandwidths \(h_1\) and \(h_2\) are selected to maximize \(\mathscr {L}\).
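A grid-search sketch of the LCV criterion is given below; the candidate grids, the Gaussian kernel standing in for \(K(\cdot )\), and the floor guarding the logarithm against non-positive values of the oscillating deconvolution kernel are our illustrative choices.

```python
import numpy as np
from itertools import product

def lcv_bandwidths(W, Y, Gn, h1_grid, h2_grid):
    """Select (h1, h2) by maximizing the leave-one-out log-likelihood of the
    joint estimate \hat f^{-i}_{Y|X}(Y_i | W_i) \hat f_n^{-i}(W_i).
    Gn is a vectorized deconvolution kernel G_n(.)."""
    n = len(W)
    gauss = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    best, best_ll = None, -np.inf
    for h1, h2 in product(h1_grid, h2_grid):
        ll = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            joint = np.sum(Gn((W[i] - W[keep]) / h1) / h1
                           * gauss((Y[i] - Y[keep]) / h2) / h2) / (n - 1)
            ll += np.log(max(joint, 1e-300))   # guard: G_n may oscillate below 0
        if ll > best_ll:
            best, best_ll = (h1, h2), ll
    return best
```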

Example 4.4

(Linear model) Let \(X\sim N(0,1)\), \(W=X+\epsilon \), and let Y follow the linear model \(Y=aX+e\), where \(e\sim N(0,1)\), and X and e are independent. The measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The larger the value of a, the stronger the correlation between X and Y; when \(a = 0\), X and Y are independent. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of a are shown in Table 4, and the results of the mean squared deviation (MSD) are shown in Table 5 and Fig. 5, where \(\mathrm{{MSD}}=N^{-1}\sum _{j=1}^N\big \{\widehat{MI}_j(X,Y)-\overline{MI}(X,Y)\big \}^2\), and \(\overline{MI}(X,Y)\) is the mean of \(\widehat{MI}_j(X,Y)\) for \(j=1, \cdots , N\).

Example 4.5

(Quadratic model) Let \(X\sim N(0,1)\), \(W=X+\epsilon \), and let Y follow the quadratic model \(Y=aX^2+e\), where \(e\sim N(0,1)\), and X and e are independent. The measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The correlation between X and Y strengthens as a increases. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of a are shown in Table 6. To present the results graphically, we display the rejection rates against a and the significance level \(\alpha \) in Figs. 6 and 7, respectively.

Example 4.6

(Circle model) Let \(\theta \sim U(0, 1)\) and e, \(\xi \sim N(0,1)\), where \(\theta \), e and \(\xi \) are independent. (X, Y) is generated from the following circle model with noise contamination (NC):

$$\begin{aligned}&X=10\cos (2\pi \theta )+NC\times e ~~~~~and \\&Y=10\sin (2\pi \theta )+NC\times \xi , \end{aligned}$$

where \(W=X+\epsilon \), and the measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The more noise is added, the weaker the correlation between X and Y becomes. Additionally, we take \(X = e\) and \(Y = \xi \) to simulate the independent case. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of NC are shown in Table 7.

Table 2 RMSEs of the estimated function \(\hat{f}_{Y|X}(y\mid x)\) for \(x=1\) based on DC, NW and LL methods under different error distributions
Fig. 4

Estimated CDF curves in Example 4.3 based on the DC (dashed line), NW (dotted line) and LL (dot dash line) methods, along with the true CDF curves (solid line) for \(n = 200\) (left) and \(n = 500\) (right), with \(\sigma _0^2=0.25\) (top) and \(\sigma _0^2=0.1\) (bottom) when \(\epsilon \) follows a double exponential distribution

Table 3 RMSEs of \(\hat{f}_{Y|X}(y\mid x)\) for \(x=1\) based on DC, NW and LL methods under different error distributions
Table 4 Empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG for different values of a for the linear model in Example 4.4, with significance level \(\alpha =0.05\) and sample sizes \(n=50\) and 100
Table 5 MSDs of \(\widehat{MI}(X, Y)\) based on DCT, NWT and LLT methods across different sample sizes for linear model in Example 4.4, where \(\sigma _0^2=0.25\) and \(\epsilon \) follows the double exponential distribution
Fig. 5

MSDs of \(\widehat{MI}(X, Y)\) based on DCT (solid line), NWT (dashed line), and LLT (dotted line) methods across different sample sizes for the linear model in Example 4.4, with X and Y independent (left) and correlated (right), where \(\sigma _0^2=0.25\) and \(\epsilon \) follows the double exponential distribution

Table 6 Empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG for different values of a in the quadratic model of Example 4.5, with significance level \(\alpha =0.05\) and sample sizes \(n=50\) and 100
Fig. 6

Rejection rates based on DCT (solid line), NWT (dashed line), LLT (dotted line), MIC (long dash line), dCor (dot dash line), and HHG (two dash line) under the null and alternative hypotheses for different values of a in the quadratic model of Example 4.5, with significance level \(\alpha =0.05\) and sample size n=100, where \(\epsilon \) follows normal (left) and double exponential (right) distributions

Fig. 7

Rejection rates based on DCT (solid line), NWT (dashed line), and LLT (dotted line) methods under the null (left) and alternative (right) hypotheses for different significance levels \(\alpha \) in the quadratic model of Example 4.5, where \(\sigma _0^2=0.25\) and \(\epsilon \) follows the double exponential distribution. The dot-dash curve in the left figure represents the significance level control

Table 7 Empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG for different noise amplitudes in the circle model of Example 4.6, with significance level \(\alpha =0.05\) and sample sizes \(n=50\) and 100

Example 4.7

(Spiral circle model) Let \(u\sim U(0, 4)\) and e, \(\xi \sim N(0,1)\), where u, e and \(\xi \) are independent. (X, Y) is generated from the following spiral circle model with NC:

$$\begin{aligned}&X=7u\sin (u\pi )+NC\times e ~~~~~and\\&Y=7u\cos (u\pi )+NC\times \xi , \end{aligned}$$

where \(W=X+\epsilon \), and the measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions respectively, as described in Example 4.1. An increase in NC leads to a decrease in the correlation between X and Y. To simulate the independent case, we take \(X = e\) and \(Y = \xi \) again. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of NC are shown in Table 8. To present the results graphically, we display the rejection rates against NC and the significance level \(\alpha \) in Figs. 8 and 9, respectively.

Table 8 Empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG for different noise amplitudes in the spiral circle model of Example 4.7, with significance level \(\alpha =0.05\) and sample sizes \(n=50\) and 100
Fig. 8

Rejection rates based on DCT (solid line), NWT (dashed line), LLT (dotted line), MIC (long dash line), dCor (dot dash line) and HHG (two dash line) under the null and alternative hypotheses for the spiral circle model with different noise amplitudes in Example 4.7, with significance level \(\alpha \) = 0.05, sample size n=100, and \(\epsilon \) following normal (left) and double exponential (right) distributions

Fig. 9

Rejection rates based on DCT (solid line), NWT (dashed line) and LLT (dotted line) methods under the null (left) and alternative (right) hypotheses for different significance levels \(\alpha \) in the spiral circle model of Example 4.7, where \(\sigma _0^2=0.1\) and \(\epsilon \) follows the double exponential distribution. The dot dash line in the left panel represents the significance level control curve

Fig. 10

CDF estimators \(\hat{f}_{Y|X}(y\mid x)\) for metallicity based on the DC (solid line), NW (dotted line) and LL (dot dash line) methods at redshift values of 0.02 (upper left), 0.66 (upper right), 1.95 (lower left) and 2.59 (lower right), respectively

From Tables 4-8 and Figs. 6 and 8, we observe that when X and Y are independent, the empirical sizes of our proposed test are closest to the significance level of 0.05. When X and Y are correlated, the test power gradually approaches 1 as the correlation strengthens. Comparing the performance of different methods across various models and error distributions, we find that our proposed DCT method is the most stable. The dCor method performs well only in linear models, but poorly in the other models. HHG and NWT perform better than MIC and LLT, but not as well as the DCT method in all scenarios. Furthermore, from Table 5 and Fig. 5, we find that the MSD of \(\widehat{MI}(X,Y)\) decreases as the sample size increases in both the independent and correlated cases. Additionally, as shown in Figs. 7 and 9, for different significance levels, the empirical sizes based on DCT stay close to the prespecified significance level when X and Y are independent, implying that the type I error is generally controlled. When X and Y are correlated, the test powers based on DCT are consistently higher than those based on NWT and LLT across different significance levels. Overall, our proposed DCT method is simple to implement, robust thanks to the permutation procedure, and demonstrates superior performance.

4.3 Analysis of LROSS data

In the analysis of the LROSS data, we use the proposed deconvolution double kernel CDF and MI estimators. Here, metallicity or radial velocity is denoted as Y, and redshift is denoted as X. The observed redshift, denoted by W, is obtained from multiple observations and contains measurement errors. We remove invalid values and standardize the data before analysis. The measurement error is assumed to follow a normal distribution with mean 0 and variance \(\sigma _0^2\). The standard deviation \(\sigma _0\) is estimated to be 0.28973 using the partial replication method, following Fan et al. (2016). We separately investigate the conditional density estimation and the independence of redshift with respect to metallicity or radial velocity.

4.3.1 Conditional density estimation

We first study the relationship between redshift and metallicity. The optimal bandwidths, \(h_1 = 0.16\) and \(h_2 = 0.58\), are obtained using the LCV method mentioned in Sect. 4.2. The CDF estimators \(\hat{f}_{Y|X}(y\mid x)\) of celestial metallicity with respect to redshift are presented in Fig. 10 for redshift values of 0.02, 0.66, 1.95 and 2.59, respectively. From Fig. 10, we observe that the CDFs of celestial metallicity approximately follow the Gaussian distribution at redshifts 0.02, 0.66 and 1.95. However, when the redshift is 2.59, the conditional distribution of metallicity exhibits a bimodal pattern. Additionally, at different redshifts, the conditional density peaks around a metallicity value of \(-\)0.09, with an associated \(y=0.2\). These findings help scientists infer changes in the metallicity of stars and galaxies across various epochs and environments, providing deeper insight into the formation and evolution of celestial bodies over cosmic history. Furthermore, Fig. 10 indicates that extremely metal-poor and extremely metal-rich objects constitute only a very small fraction of the universe at all redshifts. Previous studies have confirmed that the proportion of extremely metal-poor objects, such as stars in ancient globular clusters or some early-forming stars, is relatively small (Howes et al. 2015). Stars with a metallicity up to 0.52 and \(y=2\) are extremely metal-rich and relatively rare. These stars may have formed or evolved in heavy-element-rich nebulae (Cinquegrana and Karakas 2022). Figure 11 illustrates the estimated curves of conditional expectation for metallicity with respect to redshift. From Fig. 11, it is evident that the average metallicity of galaxies decreases as redshift increases, which aligns with the predictions of stellar metallicity chemical evolution models (Yabe et al. 2014). Additionally, within the lower redshift range, the average metallicity decreases only gradually, indicating that at low redshifts the average metallicity remains relatively stable compared with higher redshifts. Similar conclusions have been reported in previous studies (Kulkarni et al. 2005).

Fig. 11

The estimated curves of conditional expectation \(E(Y\mid X=x)\) for metallicity based on the DC (solid line), NW (dashed line) and LL (dot dash line) methods, with the redshift value ranging from 0.02 to 2.59

Next, we investigate the relationship between redshift and radial velocity. The optimal bandwidths, \(h_1 = 0.18\) and \(h_2 = 1.97\), are obtained using the LCV method. The CDF estimates of celestial radial velocity for redshift values of 0.02, 0.66, 1.95 and 2.59 are presented in Fig. 12. As shown in Fig. 12, the conditional densities approximately follow the Gaussian distribution, which is consistent with the findings of Strauss and Willick (1995). The figure also shows that at redshifts of 0.02, 0.66, 1.95 and 2.59, the density peaks of radial velocity occur at 0.01 (\(y=0\)), 0.16 (\(y=0.4\)), 0.51 (\(y=1.3\)) and 0.70 (\(y=1.8\)), respectively. This suggests that as the redshift increases, the peak of the radial velocity distribution gradually rises. These findings are consistent with Hubble's Law (Hubble 1929). Furthermore, the estimated curves of conditional expectation for radial velocity with respect to redshift are shown in Fig. 13. From Fig. 13, we observe that the average radial velocity of galaxies also increases as redshift increases, reflecting the phenomenon of cosmic expansion. By measuring the velocity distribution of galaxies at various redshifts, we can confirm Hubble's Law and accurately calibrate the Hubble constant.

Fig. 12

CDF estimators \(\hat{f}_{Y|X}(y\mid x)\) for radial velocity based on the DC (solid line), NW (dotted line) and LL (dot dash line) methods at redshift values of 0.02 (upper left), 0.66 (upper right), 1.95 (lower left) and 2.59 (lower right), respectively

4.3.2 Independence test

In this subsection, we apply our proposed test to assess independence in the LROSS data. We use the deconvolution double kernel estimator to evaluate MI and perform 1000 permutations to obtain the 95% quantile for the test. First, we test the independence between redshift and metallicity. Using the proposed deconvolution double kernel MI estimator, we find that \(\widehat{MI}=0.0450\), which is significantly larger than the 95% quantile of 0.0003 calculated through the permutation technique. Thus, we reject the null hypothesis, indicating that dependence exists between redshift and metallicity. This conclusion aligns with previous findings in Pilyugin et al. (2013). Subsequently, we test the independence between redshift and radial velocity. The statistic \(\widehat{MI}=0.4172\) is also significantly higher than the 95% quantile of 0.0018. Therefore, we reject the null hypothesis, suggesting a strong relationship between redshift and radial velocity. This result is consistent with the associations described by Hubble's law.

Fig. 13

The estimated curves of conditional expectation \(E(Y\mid X=x)\) for radial velocity based on the DC (solid line), NW (dashed line) and LL (dot dash line) methods, with the redshift value ranging from 0.02 to 2.59

5 Concluding remarks

Testing independence has gained increasing attention in the statistical literature. In many scientific fields, observations are often collected with measurement errors. We focus on testing statistical independence when such measurement errors are substantial. Specifically, we rewrite MI in terms of the CDF, which we estimate using a novel deconvolution double kernel method. For both super smooth and ordinary smooth measurement errors, we establish the convergence rates of the CDF estimator and analyze the asymptotic behavior of the MI estimator. Furthermore, our proposed asymptotic theories can be extended to tests for conditional independence, which we will explore in future research.