Abstract
Among existing methods for testing independence, mutual information (MI) is particularly popular as it is invariant to monotone transformations and enjoys higher power in detecting nonlinear associations. In this paper, we propose a novel MI-based independence test in the presence of measurement errors. The conditional density functions involved in MI are estimated using a novel deconvolution double kernel method. The convergence rates of these estimates are derived under the assumption that the measurement errors are either ordinary smooth or super smooth. In addition, the asymptotic behaviors of the resultant estimate of MI are established under both the null and alternative hypotheses. Extensive simulation studies and an application to the low-resolution observations of source stars dataset confirm the superior numerical performance of the proposed methods.
1 Introduction
Testing independence is an important and fundamental problem in statistics and is also popular in other fields, such as finance (Olagunju 2011), environmental engineering (Kim et al. 2020), meteorology (Kulkarni et al. 2021), and medicine (Ma et al. 2021). There are various approaches for testing independence that rely on correlation coefficients, such as Pearson’s correlation coefficient, Kendall’s tau, and Spearman’s rank correlation coefficient. These coefficients serve as measures of dependence. However, these conventional methods are limited to detecting linear or monotonic relationships and, therefore, may not be applicable for general independence tests. To detect more complex nonlinear associations, researchers have introduced numerous alternative measures. For instance, the maximal information coefficient (MIC), introduced by Reshef et al. (2011), provides a score that is approximately equal to the coefficient of determination (\(R^2\)) with respect to the regression function. The distance correlation (dCor), proposed by Székely et al. (2007), does not involve nonparametric estimation and is free of tuning parameters. Meanwhile, a rank-based distance measure (HHG) proposed by Heller et al. (2013) demonstrates robust numerical performance. These measures have been further explored in various contexts, including MIC estimation based on the BackMIC algorithm (Cao et al. 2021), matrix multivariate auto-distance covariance and correlation functions for time series (Fokianos and Pitsillou 2018), and tests based on rank-based indices (Zhou et al. 2024). Additionally, there exist other methods to test independence, such as the Hilbert-Schmidt independence criterion (Gretton et al. 2007), maximum mean discrepancy (Gretton et al. 2012), rank correlation-based statistics (Leung and Drton 2018), data-driven representation (Gonzalez et al. 2021), multivariate ranks defined by transportation measures (Deb and Sen 2023), and receiver operating characteristic analysis (Limnios and Clémençon 2024).
Previous studies on independence tests assumed that observations were directly collected without measurement errors, but such an assumption might not always hold in numerous scientific domains. Therefore, in this paper, we address the problem of independence tests between two random variables, X and Y, where X is subject to measurement errors. Specifically, we observe only the surrogate variable W, where \(W=X+\epsilon \) and \(\epsilon \) represents the measurement error. Such considerations are motivated by the study of low-resolution observations of source stars (LROSS), from which a sample of 660 observations was collected to unveil physical processes that could influence both galaxy evolution and cosmic expansion. The LROSS dataset, which contains redshift, metallicity, and radial velocity values, was downloaded from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) at the National Astronomical Science Data Center of China, and was publicly released on September 28, 2023. In astronomy, it is crucial to understand the formation of cosmic structures and the evolution of galaxies by exploring the associations between redshift and metallicity or radial velocity. Hence, our goal is to examine whether nonlinear associations exist between redshift and metallicity or radial velocity. If such associations are present, it is also worthwhile to investigate the distributions of metallicity and radial velocity given redshift, respectively. However, redshift is subject to substantial measurement errors, and it presents a considerable challenge to address these issues simultaneously within this framework. Although Fan et al. (2022) proposed a modified distance correlation method to test independence in the presence of measurement errors, their approach is only applicable under the repeated measurement design. It might fail without repeated measurements as it cannot be used to estimate the conditional distribution directly.
Mutual information (MI, Shannon (1948)) can be viewed as the expectation of the log-likelihood ratio, which has the following form,
$$ MI(X,Y)=E\left[\log \frac{f_{X,Y}(X,Y)}{f_X(X)f_Y(Y)}\right]=\iint f_{X,Y}(x,y)\log \frac{f_{X,Y}(x,y)}{f_X(x)f_Y(y)}\,dx\,dy, \tag{1.1} $$
where \(f_X\), \(f_Y\) and \(f_{X,Y}\) are the density functions of X, Y and (X, Y), respectively. The Neyman-Pearson lemma (Neyman and Pearson 1933) guarantees that the independence test based on MI is most powerful in an averaging sense. MI is sensitive in detecting nonlinear associations. It is a symmetric and nonnegative measure, which equals zero if and only if the variables X and Y are independent. As a powerful nonlinear dependence measure, MI plays a pivotal role in data analysis and has garnered considerable attention in the literature. For instance, Pethel and Hahs (2014) proposed an exact null hypothesis significance test using MI. Zeng et al. (2018) introduced the jackknife kernel estimation for MI, which has very good statistical properties, such as automatic bias correction and high local power in independence tests. Runge (2018) proposed a fully nonparametric test for continuous data based on the conditional MI combined with a local permutation scheme. Berrett and Samworth (2019) devised an MI-based test for assessing the independence between two multivariate random vectors. Additionally, Ai et al. (2022) developed an MI-based test for independence. Their proposed test is simple to implement, easy to compute, and consistent against all departures from independence, with only a slight loss of local power. Notably, this loss is independent of the data dimension.
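To fix ideas, for a bivariate normal pair with correlation \(\rho \), MI has the closed form \(-\frac{1}{2}\log (1-\rho ^2)\), which is zero exactly when \(\rho =0\). A small numerical sketch (ours, not from the paper) recovers this value by direct grid integration:

```python
import numpy as np

# Illustration (ours): for a bivariate normal pair with correlation rho,
# MI(X, Y) = -0.5 * log(1 - rho^2), so MI = 0 exactly when rho = 0.
# We recover this by integrating f_XY * log{f_XY / (f_X * f_Y)} on a grid.

def bivariate_normal_mi(rho, lim=6.0, n=801):
    x = np.linspace(-lim, lim, n)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x)                   # rows index y, columns index x
    det = 1.0 - rho**2
    fxy = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*det)) / (2*np.pi*np.sqrt(det))
    fx = np.exp(-x**2 / 2) / np.sqrt(2*np.pi)  # standard normal marginals
    prod = fx[:, None] * fx[None, :]           # f_Y(y) * f_X(x) on the grid
    integrand = fxy * np.log(fxy / prod)
    return integrand.sum() * dx * dx

mi_dep = bivariate_normal_mi(0.5)   # close to -0.5*log(0.75), about 0.144
mi_ind = bivariate_normal_mi(0.0)   # close to 0
```

The independent case returns essentially zero, matching the characterization of MI as a dependence measure.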
However, the MI defined in (1.1) cannot be directly estimated when the variable X contains measurement errors. To address this issue, we rewrite MI as follows,
$$ MI(X,Y)=E\left[\log \frac{f_{Y\mid X}(Y\mid X)}{f_Y(Y)}\right], \tag{1.2} $$
where \(f_{Y\mid X}\) is the conditional density function (CDF) of Y given X. Therefore, it is a crucial and challenging task to estimate \(f_{Y\mid X}(Y\mid X)\), as it could affect the estimation accuracy of MI(X, Y), especially when X contains measurement errors. In this paper, we will utilize the deconvolution method to estimate \(f_{Y\mid X}\). This approach circumvents the need for repeated measurements, making it particularly suitable for our purpose. Deconvolution techniques have been widely acknowledged in the literature. For instance, Scott and Terrell (1987) used them to improve the accuracy of cross-validation, while Carroll and Hall (1988) used them to estimate density functions in the presence of known distributional errors. Stefanski and Carroll (1990) applied deconvolution to improve kernel density estimators. Marron and Wand (1992) utilized it to address the variance term in density estimation, thereby improving performance assessment. Additionally, Fan and Truong (1993) employed the deconvolution method to estimate the probability density function and investigated the asymptotic normality of the deconvolution kernel density estimators. Wang and Wang (2011) introduced an R software package, “decon”, which contains a collection of functions that apply the deconvolution kernel methods to address challenges caused by measurement errors. Furthermore, Huang and Zhang (2023) developed a deconvolution kernel estimator for average dose-response functions and derived its asymptotic properties, including bias, variance, and linear expansion.
In this paper, our aim is to test the independence between X and Y and estimate the CDF \(f_{Y\mid X}\) when X is subject to measurement errors. Our work makes several innovative contributions to this field. First, we introduce a novel double kernel CDF estimator using the deconvolution method. We also investigate the impact of different error distributions on the convergence rates of the CDF estimator. Second, we propose an estimator of MI for testing independence between two variables, and establish its asymptotic properties under both the null and the alternative hypotheses assuming different error distributions. Third, our analysis of the LROSS dataset not only tests the independence between redshift and metallicity or radial velocity, but also reveals the CDFs of metallicity and radial velocity given redshift, respectively.
The remainder of the paper is organized as follows. Section 2 develops the CDF estimator and discusses its convergence rates. Section 3 introduces the MI estimator and its asymptotic properties. Section 4 reports simulation studies and real data analysis. Some concluding remarks are given in Sect. 5. All technical proofs are provided in the supplemental material.
2 Conditional density estimation
Suppose that \((W_i, Y_i)\) for \(i\in \{1,\cdots ,n\}\) is a random sample from the population (W, Y), where \(W_i=X_i+\epsilon _i\), with \(W_i\) representing the observable surrogate of \(X_i\) and \(\epsilon _i\) being the measurement error. In Sect. 2.1, we adopt deconvolution techniques and propose the double kernel estimator of the CDF in the presence of measurement errors. The convergence rate of this estimator depends on the smoothness of the error distribution, which is characterized by the decay rate of its characteristic function in the tails. Drawing on the terminology introduced by Fan and Truong (1993) and Delaigle (2021), we distinguish two classes of errors: super smooth and ordinary smooth.
1. Super smooth with order \(\beta \): The characteristic function \(\phi _\epsilon (\cdot )\) satisfies
$$ d_0|t|^{\beta _0}\exp \big (-|t|^{\beta }/\gamma \big )\le |\phi _\epsilon (t)|\le d_1|t|^{\beta _1}\exp \big (-|t|^{\beta }/\gamma \big ), \quad \text {as } |t|\rightarrow \infty , \tag{2.1} $$
where \(d_0\), \(d_1\), \(\beta \) and \(\gamma \) are positive constants, and \(\beta _0\) and \(\beta _1\) are constants.
2. Ordinary smooth with order \(\beta \): The characteristic function \(\phi _\epsilon (\cdot )\) satisfies
$$ d_0|t|^{-\beta }\le |\phi _\epsilon (t)|\le d_1|t|^{-\beta }, \quad \text {as } |t|\rightarrow \infty , \tag{2.2} $$
for positive constants \(d_0\), \(d_1\) and \(\beta \).
For example, Gaussian and Cauchy distributions are super smooth, while gamma and double exponential distributions are ordinary smooth. The order \(\beta \) represents the decay rate of \(\phi _\epsilon (t)\) as \(t\rightarrow \infty \), which reflects the smoothness of the error distribution. For instance, \(\beta =1\) corresponds to the Cauchy distribution, and \(\beta =2\) applies to both the Gaussian and double exponential distributions. For the gamma distribution, the order \(\beta \) equals the shape parameter.
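The two decay regimes are easy to verify numerically. The toy check below (with \(\sigma =1\), our illustrative choice) contrasts the exponential tail decay of the Gaussian characteristic function with the polynomial decay of the double exponential one:

```python
import numpy as np

# A toy check (sigma = 1, our choice) of the two decay regimes: the
# Gaussian characteristic function exp(-t^2/2) is super smooth with
# beta = 2 (exponential decay), while the double exponential one,
# 1/(1 + t^2/2), is ordinary smooth with beta = 2: t^2 * phi(t) tends
# to the finite nonzero limit 2, so (2.2) holds with d_0, d_1 near 2.

t = np.linspace(10.0, 200.0, 2000)

phi_gauss = np.exp(-t**2 / 2)            # super smooth
phi_laplace = 1.0 / (1.0 + t**2 / 2)     # ordinary smooth

scaled = t**2 * phi_laplace              # approaches 2 as t grows
poly_vs_gauss = t**4 * phi_gauss         # approaches 0: exponential decay
                                         # beats any polynomial power
```

This is exactly the distinction that drives the logarithmic versus polynomial convergence rates in Sect. 2.2.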
2.1 Double kernel estimate
Let \(K_{h_2}(\cdot )=K(\cdot /h_2)/h_2\), where \(K(\cdot )\) is a kernel function with bandwidth \(h_2\). When the \(X_i\) are observable, according to De Gooijer and Zerom (2003), the estimator of \(f_{Y\mid X}(y\mid x)\) is given by
$$ \tilde{f}_{Y\mid X}(y\mid x)=\frac{(nh_1)^{-1}\sum _{i=1}^n G\big \{(x-X_i)/h_1\big \}K_{h_2}(y-Y_i)}{\tilde{f}_n(x)}, \tag{2.3} $$
where \(G(\cdot )\) is also a kernel function with bandwidth \(h_1\), and
$$ \tilde{f}_n(x)=\frac{1}{nh_1}\sum _{i=1}^n G\Big (\frac{x-X_i}{h_1}\Big ) $$
is the Parzen-Rosenblatt density estimator of X (Parzen 1962; Rosenblatt 1956).
When the variables \(X_1,\cdots ,X_n\) are unobservable, the estimator \(\tilde{f}_{Y\mid X}(y\mid x)\) cannot be directly calculated. Suppose that one can only observe the surrogate variables \(W_i\) through \(W_i=X_i+\epsilon _i\) for \(i\in \{1,\cdots ,n\}\). Denote the density functions of X and W as \(f_X(\cdot )\) and \(f_W(\cdot )\), respectively, and the distribution function of \(\epsilon \) as \(F_\epsilon (\cdot )\). Then \(f_W(w)=\int _{-\infty }^{\infty }f_X(w-x)\,dF_\epsilon (x)\). By the deconvolution method described in Stefanski and Carroll (1990), the marginal density function \(f_X(\cdot )\) can be estimated as
$$ \hat{f}_X(x)=\frac{1}{2\pi }\int _{-\infty }^{\infty }e^{-itx}\,\frac{\phi _G(th_1)\hat{\phi }_n(t)}{\phi _\epsilon (t)}\,dt, \tag{2.4} $$
where \(\phi _G(\cdot )\) is the Fourier transform of the kernel function \(G(\cdot )\), \(\phi _\epsilon (\cdot )\) is the characteristic function of the error term \(\epsilon \), and \(\hat{\phi }_n(t)=n^{-1}\sum _{j=1}^n{e^{itW_j}}\) is the empirical characteristic function of \(\{W_j,~j=1,\cdots ,n\}\). Then, (2.4) can be rewritten in the following kernel form,
$$ \hat{f}_n(x)=\frac{1}{nh_1}\sum _{j=1}^n G_n\Big (\frac{x-W_j}{h_1}\Big ),\qquad G_n(x)=\frac{1}{2\pi }\int _{-\infty }^{\infty }e^{-itx}\,\frac{\phi _G(t)}{\phi _\epsilon (t/h_1)}\,dt. \tag{2.5} $$
According to (2.3) and (2.5), the double kernel CDF estimator of \(f_{Y\mid X}(y\mid x)\) based on the surrogate variables \(W_i\) is given by
$$ \hat{f}_{Y\mid X}(y\mid x)=\frac{(nh_1)^{-1}\sum _{i=1}^n G_n\big \{(x-W_i)/h_1\big \}K_{h_2}(y-Y_i)}{\hat{f}_n(x)}, \tag{2.6} $$
where \(\hat{f}_n(x)\) and \(G_n(x)\) are defined by (2.5).
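To make the construction concrete, here is a minimal numerical sketch (ours, not the paper's implementation). It assumes double exponential (Laplace) measurement error, for which the deconvolution kernel built from a Gaussian G has the closed form \(G_n(u)=G(u)-\sigma _0^2G''(u)/(2h_1^2)\); all sample sizes and bandwidths are illustrative choices:

```python
import numpy as np

# A minimal sketch (ours) of the double kernel estimator of f_{Y|X}(y|x)
# from surrogate data W = X + eps. We assume double exponential (Laplace)
# measurement error with variance sigma0^2, for which the deconvolution
# kernel based on a Gaussian G has the closed form
#   G_n(u) = G(u) - sigma0^2 * G''(u) / (2 h1^2),  G''(u) = (u^2 - 1) G(u).

def gauss(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def decon_kernel(u, sigma0, h1):
    return gauss(u) * (1.0 - sigma0**2 * (u**2 - 1.0) / (2 * h1**2))

def fhat_cond(y, x, W, Y, sigma0, h1, h2):
    # numerator: (n h1)^{-1} sum_i G_n((x - W_i)/h1) K_{h2}(y - Y_i)
    # denominator: deconvolution density estimate of f_X at x
    gn = decon_kernel((x - W) / h1, sigma0, h1) / h1
    ky = gauss((y - Y) / h2) / h2
    return np.sum(gn * ky) / np.sum(gn)

# The deconvolution kernel still integrates to one: the correction term
# is a second derivative, whose integral vanishes.
u = np.linspace(-10, 10, 40001)
mass = decon_kernel(u, sigma0=0.25, h1=0.3).sum() * (u[1] - u[0])

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(0, 1, n)
Y = X + rng.normal(0, 0.5, n)                   # Y | X = x ~ N(x, 0.25)
W = X + rng.laplace(0, 0.25 / np.sqrt(2), n)    # Laplace error, variance 0.0625
est = fhat_cond(1.0, 1.0, W, Y, sigma0=0.25, h1=0.3, h2=0.3)
```

Here `est` approximates \(f_{Y\mid X}(1\mid 1)\), the N(1, 0.25) density at 1 (about 0.80), up to smoothing bias and sampling noise.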
2.2 Performance of double kernel estimators
To establish our main results, we require the following conditions.

(C1)
The characteristic function of the error distribution \(\phi _\epsilon (\cdot )\) does not vanish.

(C2)
Let \(a<b\). The marginal density \(f_X(\cdot )\) of X is bounded away from zero on the interval [a, b] and has a bounded \(k_1\)th derivative. The density function \(f_Y(\cdot )\) of the observed Y is bounded away from zero and infinity, and has a bounded \(k_3\)th derivative.

(C3)
The CDF \(f_{YX}(y\mid x)\) has a continuous \(k_2\)th derivative with respect to x on [a, b].

(C4)
The kernel function \(G(\cdot )\) is a \(k_1\)th-order kernel. Namely, \(\int _{-\infty }^{\infty }G(y)\,dy=1\), \(\int _{-\infty }^{\infty }y^{k_1}G(y)\,dy \ne 0\) and \(\int _{-\infty }^{\infty }y^jG(y)\,dy=0\) for \(j=1, \cdots , k_1-1\). Similarly, \(K(\cdot )\) and \(L(\cdot )\) are \(k_2\)th-order and \(k_3\)th-order kernels, respectively.

(C5)
The bandwidths \(h_1\), \(h_2\) and \(h_3\) satisfy \((nh_3)^{-1}\log n\rightarrow 0\), \(nh_1^{1/2}h_2^{1/2}h_3^{1/2}(h_1^{2k_1-1/2}+h_2^{2k_2-1/2}+h_3^{2k_3-1/2})(h_1^{1/2}+h_2^{1/2}+h_3^{1/2})\rightarrow 0\), \(h_1^{k_1-1/2}h_2^{-1/2}h_3^{-1/2}\rightarrow 0\), \(h_1^{-1/2}h_2^{k_2-1/2}h_3^{-1/2} \rightarrow 0\) and \(h_1^{-1/2}h_2^{-1/2}h_3^{k_3-1/2}\rightarrow 0\).
Remark 1
These conditions are generally regarded as mild. In particular, condition (C1) ensures that the estimator (2.6) is well-defined (Stefanski and Carroll 1990). Conditions (C2) and (C3) are analogous to those typically required for ordinary nonparametric regression, assuming bounded support and imposing smoothness on the density function and CDF (Fan and Truong 1993; De Gooijer and Zerom 2003; Fan and Jiang 2005; Su and White 2014). Condition (C4) specifies the orders of the kernel functions (Fan and Truong 1993). The order of common kernel functions, such as the Gaussian and Epanechnikov kernels, is typically 2. Condition (C5) on the bandwidths is necessary for deriving the asymptotic properties of the independence test statistic and can be easily satisfied.
The following subsections present two sets of results under super smooth and ordinary smooth error distributions, respectively. The first set discusses the local and global convergence rates, while the second focuses on the uniform results. The global rates are described by the \(L_p\)-norms, which are defined as follows. Let \(g(\cdot )\) denote a real-valued function on the real line and \(\omega (\cdot )\) be a nonnegative weight function. Define \(\Vert g(\cdot )\Vert _{wp}=\big \{\int |g(x)|^p\omega (x)\,dx\big \}^{1/p}\), for \(1\le p<\infty \), and \(\Vert g(\cdot )\Vert _{w\infty }=\sup _x\omega (x)|g(x)|\).
To express the consistency results, a class of joint density functions of (X, Y) needs to be introduced. In this class, conditions (C1)-(C3) should hold uniformly. More precisely, let B be a positive constant and [a, b] denote a compact interval. Define \(m(x, y)=E[K_{h_2}(Y-y)\mid X=x]\) and
$$ \mathscr {F}_{k_1,B}=\Big \{f_{X,Y}:\ \inf _{x\in [a,b]}f_X(x)\ge B^{-1},\ \sup _x\big |f_X^{(k_1)}(x)\big |\le B,\ \sup _{x\in [a,b],\,y}\Big |\frac{\partial ^{k_1}}{\partial x^{k_1}}m(x,y)\Big |\le B\Big \}. $$
Note that this class \(\mathscr {F}_{k_1,B}\) reformulates the standard conditions so that they hold uniformly.
2.2.1 Super smooth error distributions
We first present the local and global rates under super smooth error distributions. Let
$$ b_{k_1}(x,y)=\frac{\int u^{k_1}G(u)\,du}{k_1!}\cdot \frac{\partial ^{k_1}\big \{m(x,y)f_X(x)\big \}\big /\partial x^{k_1}-m(x,y)f_X^{(k_1)}(x)}{f_X(x)}. $$
Theorem 2.1
Suppose that conditions (C1)-(C4) are satisfied and the first half inequality of (2.1) holds. Assume that \(\phi _G(t)\) is supported on \(|t|\le C_0\). Then, for \(h_1=c_1(\log n)^{-1/\beta }\) with \(c_1>C_0(2/\gamma )^{1/\beta }\) and \(h_2=O((\log n)^{-1/\beta })\), we have
The factor \(\hat{f}_n(x)/f_X(x)\ (\rightarrow _P 1)\) is used to avoid the case where the denominator of \(\hat{f}_{Y\mid X}(y\mid x)-f_{Y\mid X}(y\mid x)\) might degenerate to 0. It does not significantly affect the statistical properties of the proposed estimator, as demonstrated in the subsequent proofs. Specifically, \(E\big [\big (\hat{f}_{Y\mid X}(y\mid x)-f_{Y\mid X}(y\mid x)\big )^2\mid W_1,\cdots ,W_n\big ]=[c_1^{k_1}b_{k_1}(x,y)]^2(\log n)^{-2k_1/\beta }(1+o(1))\). The rates mentioned earlier also hold uniformly over \(\mathscr {F}_{k_1,B}\).
Theorem 2.2
Assume that \(\phi _\epsilon (\cdot )\) and \(G(\cdot )\) satisfy the conditions of Theorem 2.1. If the weight function \(\omega (x)\) has support [a, b], then
Theorem 2.2 reveals an interesting phenomenon: the convergence rate of \(\hat{f}_{Y\mid X}\) remains the same under the weighted \(L_p\)-loss \((1\le p<\infty )\) and the \(L_\infty \)-loss. This is not the case for ordinary nonparametric regression, where the global rate of convergence tends to be slower under the \(L_\infty \)-loss (Stone 1982).
2.2.2 Ordinary smooth error distributions
In order to explicitly compute the rate of the mean squared error (MSE) of the CDF estimator, a condition on the tail behavior of \(\phi _\epsilon (t)\) is required. This condition, a refinement of (2.2), is given by
$$ t^{\beta }\phi _\epsilon (t)\rightarrow c \quad \text {as } t\rightarrow \infty , \tag{2.7} $$
where c is a nonzero constant.
Theorem 2.3
Assume that conditions (C1)-(C4) are satisfied, and \(\int _{-\infty }^{\infty }|t|^{\beta +1}\big (|\phi _G(t)|+|\phi _G'(t)|\big )\,dt<\infty \), \(\int _{-\infty }^{\infty }|t|^{\beta +1}|\phi _G(t)|^2\,dt<\infty \). Then, under the ordinary smooth error distribution condition (2.7) and assuming \(h_1=c_2n^{-1/\{2(k_1+\beta )+1\}}\) with \(c_2>0\), \(h_2=O\big (n^{-1/\{2(k_1+\beta )+1\}}\big )\) and \(n^{1-1/\{2(k_1+\beta )+1\}}h_2\rightarrow \infty \), we have
where \( v(x,y)=1\big /\big \{2\pi f_X^2(x)\big \}{\int }_{-\infty }^{\infty }|t^\beta /c|^2|\phi _G(t)|^2\,dt \,{\int }_{-\infty }^{\infty }\tau ^2(x-v)f_X(x-v)\,dF_\epsilon (v), \) with \(\tau ^2(\cdot )=E\big [(K_{h_2}(Y-y)-m(x,y))^2\mid X=\cdot \big ]\).
Remark 2
(1) According to Theorems 2.1 and 2.3, the convergence rate of the MSE under the super smooth error distribution condition is \(O((\log n)^{-2{k_1}/\beta })\), which is significantly slower than that under the ordinary smooth error distribution, \(O(n^{-2k_1/\{2(k_1+\beta )+1\}})\). These two convergence rates are influenced by measurement errors through the parameter \(\beta \). Furthermore, when there is no measurement error in the data, Tsybakov (2011) gave the convergence rate of the kernel density estimator, which is \(O(n^{-2k_1/(2k_1+1)})\) for \(h=c_3n^{-1/(2k_1+1)}\) with a constant \(c_3\). Obviously, the convergence rate under measurement error with the ordinary smooth error distribution condition is slightly slower than the rate given by Tsybakov (2011).
(2) According to Theorems 2.1 and 2.3, the convergence rate of the MSE improves when the smoothness parameter \(\beta \) decreases or the order \(k_1\) of the kernel function increases. Commonly used kernel functions, such as the Epanechnikov and Gaussian kernels, are of order 2. From such a second-order kernel, denoted as G(u), we can construct a fourth-order kernel given by \(G^*(u)=3G(u)/2+uG'(u)/2\), where \(G'(u)\) is the first derivative of G(u). Similarly, it is possible to construct sixth- and higher-order kernel functions, which can further enhance the convergence rate of the MSE.
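The fourth-order construction is easy to verify. With the Gaussian kernel, \(G'(u)=-uG(u)\), so \(G^*(u)=G(u)(3-u^2)/2\); the following check (ours) confirms the moment conditions of (C4) numerically:

```python
import numpy as np

# Numerical check (ours) of the fourth-order construction: for the
# Gaussian kernel G, G'(u) = -u G(u), so
#   G*(u) = 3 G(u)/2 + u G'(u)/2 = G(u) (3 - u^2) / 2.
# Per condition (C4) with k_1 = 4, G* must have unit mass, vanishing
# moments of orders 1-3 (odd ones vanish by symmetry), and a nonzero
# fourth moment.

u = np.linspace(-12, 12, 200001)
du = u[1] - u[0]
G = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
Gstar = G * (3 - u**2) / 2

m0 = Gstar.sum() * du              # unit mass
m2 = (Gstar * u**2).sum() * du     # vanishing second moment
m4 = (Gstar * u**4).sum() * du     # nonzero fourth moment (equals -3)
```

The same integration-by-parts argument that makes these moments work extends to the sixth- and higher-order constructions mentioned above.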
The next theorem shows that the previous results hold uniformly for a class of densities.
Theorem 2.4
If \(\phi _\epsilon (\cdot )\), \(h_1\) and \(G(\cdot )\) satisfy the conditions of Theorem 2.3, and the weight function has bounded support [a, b], then
3 Independence test
Section 2 applies the deconvolution techniques to estimate the CDF in the presence of measurement errors. Based on this, we use the proposed CDF estimator to construct an MI estimator for testing the independence between X and Y when X is measured with errors. The asymptotic properties of the MI estimator are investigated under both super smooth and ordinary smooth error distributions.
Independence is one of the fundamental concepts in data analysis and statistical inference. The null hypothesis states that two random variables are independent, i.e., \(H_0:~X\perp \!\!\!\perp Y.\) Under the alternative hypothesis, the MI is bounded away from 0, i.e., \( H_1: MI(X,Y)\ge c_0 > 0. \) To test the independence between X and Y when X is subject to measurement errors, we first give the estimator of MI as follows,
$$ \widehat{MI}(X, Y)=\frac{1}{n}\sum _{i=1}^n\log \frac{\hat{f}_{Y\mid X}(Y_i\mid W_i)}{\hat{f}_Y(Y_i)}, \tag{3.1} $$
where \(\hat{f}_{Y\mid X}(Y_i\mid W_i)\) and \(\hat{f}_Y(Y_i)\) are the leave-one-out kernel density estimators of \(f_{Y\mid X}(Y_i\mid W_i)\) and \(f_Y(Y_i)\), respectively. Specifically,
$$ \hat{f}_{Y\mid X}(Y_i\mid W_i)=\frac{\sum _{j\ne i}G_{n}^*(W_i-W_j)K_{h_2}(Y_i-Y_j)}{\sum _{j\ne i}G_{n}^*(W_i-W_j)},\qquad \hat{f}_Y(Y_i)=\frac{1}{n-1}\sum _{j\ne i}L_{h_3}(Y_i-Y_j), $$
where \(G_{n}^*(\cdot )=G_n(\cdot /h_1)/h_1\), \(L_{h_3}(\cdot )=L(\cdot /h_3)/h_3\), and \(L(\cdot )\) is a kernel function.
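The estimator can be sketched in a few lines. The code below (ours, not the paper's implementation) assumes double exponential (Laplace) measurement error so that \(G_n\) has the closed form derived in Example 4.1; the bandwidths are arbitrary illustrative choices, and the centering term \(D_{1n}\) from the asymptotic theory is omitted:

```python
import numpy as np

# A sketch (ours) of the leave-one-out MI estimator:
# MI_hat = n^{-1} sum_i log{ f_hat(Y_i | W_i) / f_hat(Y_i) }.
# Laplace measurement error is assumed, so the deconvolution kernel has
# the closed form G_n(u) = G(u){1 - sigma0^2 (u^2 - 1)/(2 h1^2)} for a
# Gaussian G. Bandwidths below are illustrative, not data-driven.

def gauss(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def mi_hat(W, Y, sigma0, h1, h2, h3):
    n = len(W)
    Du = (W[:, None] - W[None, :]) / h1
    Gn = gauss(Du) * (1 - sigma0**2 * (Du**2 - 1) / (2 * h1**2)) / h1
    Ky = gauss((Y[:, None] - Y[None, :]) / h2) / h2
    Ly = gauss((Y[:, None] - Y[None, :]) / h3) / h3
    np.fill_diagonal(Gn, 0.0)            # leave observation i out
    np.fill_diagonal(Ly, 0.0)
    f_cond = (Gn * Ky).sum(axis=1) / Gn.sum(axis=1)    # f_hat(Y_i | W_i)
    f_cond = np.maximum(f_cond, 1e-10)   # guard: G_n can dip negative in tails
    f_y = Ly.sum(axis=1) / (n - 1)                     # f_hat(Y_i)
    return float(np.mean(np.log(f_cond / f_y)))

rng = np.random.default_rng(1)
n = 500
X = rng.normal(0, 1, n)
W = X + rng.laplace(0, 0.2 / np.sqrt(2), n)    # Laplace error, variance 0.04
mi_dep = mi_hat(W, X + rng.normal(0, 0.5, n), 0.2, 0.5, 0.4, 0.4)
mi_ind = mi_hat(W, rng.normal(0, 1, n), 0.2, 0.5, 0.4, 0.4)
```

As expected, the statistic is clearly positive under dependence and hovers near zero under independence, which is what the permutation calibration of this section exploits.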
The following two subsections discuss the asymptotic properties of the proposed MI estimator under the null and alternative hypotheses, assuming that the measurement errors are either super smooth or ordinary smooth.
3.1 Super smooth error distributions
The following two theorems present the asymptotic properties of \(\widehat{MI}(X, Y)\) under the null and alternative hypotheses in the case of super smooth error distributions. To derive the asymptotic normality of \(\widehat{MI}(X, Y)\) under the null hypothesis, we define \(D_{1n}=\nu _1\nu _2d_{1n}/(2nh_1h_2)+\nu _1d_{2n}/(2nh_1)+\nu _3d_{3n}/(2nh_3)\), where \(\nu _1=\int G_n^2(u)\,du\), \(\nu _2=\int K^2(u)\,du\), \(\nu _3=\int L^2(u)\,du\), \(d_{1n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_{X, Y}(W_i, Y_i)\), \(d_{2n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_{X}(W_i)\), \(d_{3n}=n^{-1}\sum _{i=1}^n 1/\hat{f}_Y(Y_i)\), \(d_{4n}=\int _{-\infty }^\infty \big [m(u,y)-m(x,y)\big ]G_n^*(x-u)f_X(u)\,du\), and \(\hat{f}_{X, Y}(W_i, Y_i)\) and \(\hat{f}_X(W_i)\) are the leave-one-out kernel density estimators. Specifically, \(\hat{f}_{X, Y}(W_i, Y_i)=(n-1)^{-1}\sum _{j\ne i}G_{n}^*(W_i-W_j)K_{h_2}(Y_i-Y_j)\) and \( \hat{f}_X(W_i)=(n-1)^{-1}\sum _{j\ne i}G_{n}^*(W_i-W_j)\).
Theorem 3.1
Suppose that conditions (C1)-(C5) are satisfied, \(\phi _G(t)\) is supported on \(|t|\le C_0\), and \(h_1=c_1(\log n)^{-1/\beta }\) with \(c_1>C_0(6/\gamma )^{1/\beta }\), \(h_2=O((\log n)^{-1/\beta })\), where \(\gamma \) and \(\beta \) are defined in (2.1). Under the null hypothesis that X and Y are independent, we have
$$ n\big (h_1^{-1}h_2^{-1}\sigma _n^2+h_3^{-1}\sigma _1^2\big )^{-1/2}\big \{\widehat{MI}(X, Y)-D_{1n}\big \}\rightarrow N(0,1) $$
in distribution,
where \(\sigma _1^2=2E\{f_Y^{-1}(Y)\}{\int }\big \{2^{-1}{\int } L(v)L(v+z)\,dv-L(z)\big \}^2\,dz \ \text {and} \ \sigma _n^2=2E\{f_{X,Y}^{-1}(W,Y)\}{\iint }\big \{2^{-1}{\iint } G_n(u)K(v)G_n(u+t)K(v+z)\,du\,dv-G_n(t)K(z)\big \}^2 \,dt\,dz. \)
Theorem 3.2
Suppose that the conditions of Theorem 3.1 are satisfied. Under the alternative hypothesis that X and Y are dependent, we have \(n\big (h_1^{-1}h_2^{-1}\sigma _n^2+h_3^{-1}\sigma _1^2\big )^{-1/2}\big \{\widehat{MI}(X, Y)-D_{1n}\big \}\rightarrow \infty \).
3.2 Ordinary smooth error distributions
In this subsection, we discuss the asymptotic properties of \(\widehat{MI}(X, Y)\) under the null and alternative hypotheses for ordinary smooth error distributions. To derive the asymptotic normality of \(\widehat{MI}(X, Y)\) under the null hypothesis, we define \(D_{2n}=\nu _1\nu _2d_{1n}/(2nh_1h_2)+\nu _1d_{2n}/(2nh_1)+\nu _3d_{3n}/(2nh_3)\), where \(\nu _1\), \(\nu _2\), \(\nu _3\), \(d_{1n}\), \(d_{2n}\) and \(d_{3n}\) are defined in Sect. 3.1.
Theorem 3.3
Suppose that conditions (C1)-(C5) are satisfied and \(h_1=c_2n^{-1/\{2(k_1+\beta )+1\}}\) with \(c_2>0\), \(h_2=O\big (n^{-1/\{2(k_1+\beta )+1\}}\big )\) and \(n^{1-1/\{2(k_1+\beta )+1\}}h_2\rightarrow \infty \). Under the null hypothesis that X and Y are independent, we have
$$ n\big (h_1^{-2}\sigma _n^2+h_3^{-1}\sigma _1^2\big )^{-1/2}\big \{\widehat{MI}(X, Y)-D_{2n}\big \}\rightarrow N(0,1) $$
in distribution.
Theorem 3.4
Suppose that the conditions of Theorem 3.3 are satisfied. Under the alternative hypothesis that X and Y are dependent, we have \(n\big (h_1^{-2}\sigma _n^2+h_3^{-1}\sigma _1^2\big )^{-1/2}\big \{\widehat{MI}(X, Y)-D_{2n}\big \}\rightarrow \infty \).
The proposed MI estimator in Sect. 3 serves as a natural independence test statistic. Statistical tests that depend on asymptotic distributions usually require a large sample size, whereas the permutation technique provides an exact null distribution even for small sample sizes (Mariano and Manuel 2008). Let \(\Omega =\big \{(W_i, Y_i), i = 1, \cdots , n\big \}\) denote a random sample drawn from (W, Y), and let \(\{\delta _1, \delta _2, \cdots , \delta _n\}\) be a random permutation of \(\{1, 2, \cdots , n\}\). Based on the dataset \(\Omega _1 = \big \{(W_i, Y_{\delta _i}), i = 1, \cdots , n\big \}\), we calculate the MI estimator, denoted by \(\widehat{MI_1}\). Repeating this procedure M times yields \(T_M=\{\widehat{MI_k}, k=1,\cdots ,M\}\). The distribution of \(\widehat{MI}\) under the null hypothesis can then be approximated by the empirical distribution of \(T_M\). We reject the null hypothesis if \(\widehat{MI}\) computed from the original data is greater than the \((1-\alpha )\)th quantile of \(T_M\), where \(\alpha \) is a prespecified significance level.
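The permutation calibration above can be sketched as follows. For brevity, the statistic in this sketch (ours) is a simple histogram plug-in MI standing in for the deconvolution estimator of Sect. 3; the bin count, sample sizes, and M are illustrative choices:

```python
import numpy as np

# A sketch of the permutation calibration: recompute the MI statistic on
# M permuted pairings (W_i, Y_{delta_i}) and reject H0 when the original
# statistic exceeds the (1 - alpha) quantile of the permutation values.
# The statistic here is a simple 2-D histogram plug-in MI (our stand-in
# for the deconvolution estimator, to keep the example short).

def hist_mi(w, y, bins=8):
    pxy, _, _ = np.histogram2d(w, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of W
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def perm_test(w, y, M=199, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    stat = hist_mi(w, y)
    null = np.array([hist_mi(w, rng.permutation(y)) for _ in range(M)])
    return bool(stat > np.quantile(null, 1 - alpha))

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 300)
w = x + rng.laplace(0, 0.2, 300)                    # surrogate with error
y_dep = np.sin(np.pi * x) + 0.3 * rng.normal(0, 1, 300)
```

With a strong nonlinear signal such as `y_dep`, the test rejects even though only the error-contaminated surrogate `w` is observed.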
4 Numerical study
In this section, we conduct extensive numerical simulations to evaluate the performance of the proposed methods for CDF estimation and independence inference. Furthermore, we apply our methods to the LROSS dataset to explore the associations between redshift and metallicity or radial velocity.
4.1 Estimation performance of CDF
To test the effectiveness of our proposed deconvolution double kernel estimation (DC for short) method for the CDF, we compare it with the Nadaraya-Watson (NW for short) and local linear (LL for short) methods, both of which simply ignore the measurement errors, in the following examples. The CDF estimators based on the NW and LL methods are given by
$$\begin{aligned} \hat{f}^{NW}_{Y\mid X}(y\mid x)&=\frac{\sum _{i=1}^n G\big \{(x-W_i)/h_1\big \}K_{h_2}(y-Y_i)}{\sum _{i=1}^n G\big \{(x-W_i)/h_1\big \}},\\ \hat{f}^{LL}_{Y\mid X}(y\mid x)&=\frac{\sum _{i=1}^n G\big \{(x-W_i)/h_1\big \}\big \{T_{n,2}-(W_i-x)T_{n,1}\big \}K_{h_2}(y-Y_i)}{T_{n,0}T_{n,2}-T_{n,1}^2}, \end{aligned}$$
where \(T_{n,k}=\sum _{i=1}^n{G\big \{(x-W_i)/h_1\big \}(W_i-x)^k}\), \(k=0,1,2\). The optimal bandwidths \(h_1\) and \(h_2\) are selected to minimize the root mean squared error (RMSE), where \(\mathrm{{RMSE}}=\big \{(Nm)^{-1}\sum _{j=1}^N\sum _{i=1}^m\big (\hat{f}^{(j)}_{Y\mid X}(y_{i}\mid x)-f^{(j)}_{Y\mid X}(y_{i}\mid x)\big )^2\big \}^{1/2}\), with \(y_i=l_0+0.1\times (i-1)\), \(m=10(u_0-l_0)+1\), and N the number of repeated experiments. \(K(\cdot )\) is chosen to be the Epanechnikov kernel. The values of the upper and lower bounds \(u_0\) and \(l_0\) are given in the following examples.
Example 4.1
Let (X, Y) be a pair of random variables that follow a bivariate normal distribution with zero mean and covariance matrix \((0.5^{|k-l|})\), \(k,l\in \{1,2\}\). Suppose \(W=X+\epsilon \), where \(\epsilon \) represents the measurement error. The true CDF of Y given \(X=x\) can be easily derived as
$$ f_{Y\mid X}(y\mid x)=\frac{1}{\sqrt{1.5\pi }}\exp \Big \{-\frac{(y-0.5x)^2}{1.5}\Big \}. $$
The measurement error \(\epsilon \) is generated from the following two cases.

(I)
The measurement error \(\epsilon \) is generated from a normal distribution \( N(0,\sigma _0^2)\), where \(\sigma _0^2=0.25\) or 0.1. In this case, \(\Phi _\epsilon (t)=\exp (-\frac{1}{2}\sigma _0^2t^2)\). Suppose the kernel function \(G(\cdot )\) has a Fourier transform given by \(\Phi _G(t)=(1-t^2)^3_+\). According to (2.5),
$$\begin{aligned} G_n(x)=\frac{1}{\pi }\int _0^1\cos (tx)(1-t^2)^3\exp \Big (\frac{\sigma _0^2t^2}{2h_1^2}\Big )\,dt. \end{aligned}$$ 
(II)
The measurement error \(\epsilon \) is generated from a double exponential distribution with the density function \( f_\epsilon (z)=\exp \big (-\sqrt{2}|z|/\sigma _0\big )/(\sqrt{2}\sigma _0). \) The characteristic function of \(\epsilon \) is \( \Phi _\epsilon (t)=1/(1+\sigma _0^2t^2/2), \) where \(\sigma _0^2=0.25\) and 0.1. According to (2.5),
$$\begin{aligned} G_n(x)&= \frac{1}{2\pi }\int _{-\infty }^\infty \exp (-itx)\Phi _G(t)\Big (1+\frac{\sigma _0^2t^2}{2h_1^2}\Big )\,dt\\&= G(x)+\frac{\sigma _0^2}{2h_1^2}\cdot \frac{1}{2\pi }\int _{-\infty }^\infty \exp (-itx)t^2\Phi _G(t)\,dt\\&= G(x)-\frac{\sigma _0^2}{2h_1^2}G^{''}(x). \end{aligned}$$If \(G(\cdot )\) is further chosen to be the Gaussian kernel \(G(x)=(\sqrt{2\pi })^{-1}\exp (-x^2/2)\), then \(G_n(x)=(2\pi )^{-1/2}\exp (-x^2\big /2)\big \{1-\sigma _0^2(x^2-1)\big /(2h_1^2)\big \}\).
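As a sanity check on the derivation in case (II), the closed form can be compared with direct numerical Fourier inversion (\(\sigma _0\) and \(h_1\) below are our illustrative choices):

```python
import numpy as np

# Numerical check (ours) of the closed form in case (II): with the
# Gaussian kernel, Phi_G(t) = exp(-t^2/2), Fourier inversion of
# Phi_G(t) * {1 + sigma0^2 t^2 / (2 h1^2)} should reproduce
# G_n(x) = (2*pi)^{-1/2} exp(-x^2/2) {1 - sigma0^2 (x^2 - 1)/(2 h1^2)}.

sigma0, h1 = 0.5, 0.4
c = sigma0**2 / (2 * h1**2)

t = np.linspace(0.0, 40.0, 400001)
dt = t[1] - t[0]

def gn_fourier(x):
    # real inversion: (1/pi) * int_0^inf cos(tx) Phi_G(t) (1 + c t^2) dt
    integrand = np.cos(t * x) * np.exp(-t**2 / 2) * (1 + c * t**2)
    return integrand.sum() * dt / np.pi

def gn_closed(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * (1 - c * (x**2 - 1))

xs = np.array([0.0, 0.7, 1.5])
vals_fourier = np.array([gn_fourier(x) for x in xs])
vals_closed = gn_closed(xs)
```

The two computations agree to within quadrature error, confirming the sign of the \(G''\) correction term.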
We apply the aforementioned bandwidth selection method to determine the optimal bandwidths \(h_1\) and \(h_2\), and set \([l_0, u_0]\) to \([-1,2]\). The estimated CDF curves based on the DC, NW and LL methods, as well as the true CDF curves for \(x = 1\) when \(\sigma _0^2=0.25\) and 0.1, are shown in Figs. 1 and 2 for \(n=200\) and 500, respectively. The RMSEs for \(n=50\), 100 and 200 are presented in Table 1.
Example 4.2
Let \(X\sim N(1,1)\), \(W=X+\epsilon \), and let Y follow the model \(Y=0.5e^X+\sin (\pi X)+0.6e\), where \(e\sim N(0,1)\). The measurement error \(\epsilon \) is generated from normal and double exponential distributions, respectively, as described in Example 4.1. The true CDF is
$$ f_{Y\mid X}(y\mid x)=\frac{1}{0.6\sqrt{2\pi }}\exp \Big \{-\frac{\big (y-0.5e^x-\sin (\pi x)\big )^2}{0.72}\Big \}. $$
We select the optimal bandwidths \(h_1\) and \(h_2\) in the same way as in Example 4.1, and set \([l_0, u_0]=[-0.5, 3]\). Figure 3 presents the true and estimated curves of \(f_{Y\mid X}(y\mid x)\) for \(x=1\) based on the DC, NW and LL methods, under the normal error case for different sample sizes and variances of measurement error. The RMSEs of \(\hat{f}_{Y\mid X}(y\mid x)\) for different error distributions and sample sizes are presented in Table 2.
Example 4.3
Let \(Y\mid X\sim N(X,0.5)\), \(W=X+\epsilon \), where \(X\sim N(0,1)\), and \(\epsilon \) follows normal and double exponential distributions, respectively, as described in Example 4.1. The true CDF is
$$ f_{Y\mid X}(y\mid x)=\frac{1}{\sqrt{\pi }}\exp \big \{-(y-x)^2\big \}. $$
Here, \([l_0, u_0]\) is set to \([-0.5, 2.5]\). Figure 4 illustrates the true and estimated curves of \(f_{Y\mid X}(y\mid x)\) for \(x=1\) based on the DC, NW and LL methods under the double exponential error distribution for different sample sizes and variances of measurement error. Table 3 presents the RMSEs of \(\hat{f}_{Y\mid X}(y\mid x)\) under different error distributions and sample sizes.
It can be seen from Figs. 1, 2, 3 and 4 and Tables 1, 2, 3 that our proposed DC method outperforms the NW and LL methods. As the sample size n increases, the estimated CDF curve based on the DC method approaches the true CDF curve, and the RMSE gradually decreases. Moreover, the estimation results of the DC method improve as the variance of the measurement error decreases, as evidenced by closer curves and lower RMSE values. Additionally, as shown in Tables 1, 2 and 3, as n increases, the RMSE of the CDF estimator with double exponential error decreases faster than that with normal error. This finding is consistent with the theoretical result in Remark 2(1) of Sect. 2.2, which indicates that the MSE converges much faster under the ordinary smooth error distribution than under the super smooth error distribution. Furthermore, the RMSE of the estimator with double exponential distributed errors is smaller than that with normally distributed errors.
4.2 Independence test
In this section, we perform numerical analysis to demonstrate the performance of our proposed independence test approach. Specifically, we conduct simulations to compare our proposed test (DCT for short) with tests where the CDF is estimated using the NW and LL methods, denoted as NWT and LLT, respectively. Additionally, we compare the performance of our proposed test with the MIC, dCor and HHG approaches described in Section 1.
We apply four models to assess the proposed DCT method: linear, quadratic, circle and spiral circle. We compare our proposed method with five other independence test methods, using empirical size and power as the evaluation criteria. In the following examples, we use the likelihood cross-validation (LCV) method to select the bandwidths \(h_1\) and \(h_2\). While this method is widely recognized in standard kernel density estimation (Silverman 1986), it has not yet been used for conditional density estimation. Specifically, we define the cross-validated likelihood for the CDF estimator as follows,
$$ \mathscr {L}(h_1,h_2)=\sum _{i=1}^n\log \big \{\hat{f}_{Y\mid X}^{-i}(Y_i\mid W_i)\,\hat{f}_n^{-i}(W_i)\big \}, $$
where \(\hat{f}_{Y\mid X}^{-i}\) denotes \(\hat{f}_{Y\mid X}\) evaluated with \((W_i, Y_i)\) left out and \(\hat{f}_n^{-i}\) denotes \(\hat{f}_n\) evaluated with \(W_i\) left out. The optimal bandwidths \(h_1\) and \(h_2\) are selected to maximize \(\mathscr {L}\).
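The LCV selection can be sketched as a simple grid search over the leave-one-out conditional log-likelihood. In the sketch below, ordinary Gaussian kernels stand in for the deconvoluting kernel purely for brevity, and the function names, grid and sample settings are our illustrative choices:

```python
import numpy as np

def loo_cond_loglik(W, Y, h1, h2):
    """Leave-one-out log-likelihood of a kernel conditional-density
    estimate f(y | w); ordinary Gaussian kernels replace the
    deconvoluting kernel to keep the sketch short."""
    Kw = np.exp(-0.5 * ((W[:, None] - W[None, :]) / h1) ** 2)
    Ky = np.exp(-0.5 * ((Y[:, None] - Y[None, :]) / h2) ** 2) / (h2 * np.sqrt(2 * np.pi))
    np.fill_diagonal(Kw, 0.0)                 # leave observation i out
    f_cond = (Kw * Ky).sum(axis=1) / Kw.sum(axis=1)
    return np.log(np.clip(f_cond, 1e-300, None)).sum()

def select_bandwidths(W, Y, grid):
    """Grid search for (h1, h2) maximizing the cross-validated likelihood."""
    return max(((h1, h2) for h1 in grid for h2 in grid),
               key=lambda h: loo_cond_loglik(W, Y, *h))

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=n)
W = X + rng.normal(scale=0.3, size=n)
Y = 0.8 * X + rng.normal(size=n)
h1, h2 = select_bandwidths(W, Y, np.array([0.2, 0.4, 0.8, 1.6]))
```

In practice, a finer grid or a numerical optimizer can replace the coarse grid used here.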
Example 4.4
(Linear model) Let \(X\sim N(0,1)\), \(W=X+\epsilon \) and Y follows a linear model \(Y=aX+e\), where \(e\sim N(0,1)\), and X and e are independent. The measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The larger the value of a, the stronger the correlation between X and Y. When \(a = 0\), X and Y are independent. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of a are shown in Table 4, and the results of the mean squared deviation (MSD) are shown in Table 5 and Fig. 5, where \(\mathrm{MSD}=N^{-1}\sum _{j=1}^N\big \{\widehat{MI}_j(X,Y)-\overline{MI}(X,Y)\big \}^2\), and \(\overline{MI}(X,Y)\) is the mean of \(\widehat{MI}_j(X,Y)\) for \(j=1, \cdots , N\).
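The data generation for this example and the MSD criterion can be sketched as follows. The error standard deviation `sigma` is our illustrative choice; Example 4.1's exact variance settings are not repeated here:

```python
import numpy as np

def gen_linear(n, a, err="normal", sigma=0.5, rng=None):
    """Simulate the linear model: Y = a*X + e with contaminated X.
    `err` selects normal or double exponential measurement error,
    both scaled to variance sigma^2 (illustrative value)."""
    rng = rng or np.random.default_rng()
    X = rng.normal(size=n)
    e = rng.normal(size=n)
    if err == "normal":
        eps = rng.normal(scale=sigma, size=n)
    else:                                     # Laplace with variance sigma^2
        eps = rng.laplace(scale=sigma / np.sqrt(2), size=n)
    return X + eps, a * X + e                 # observed (W, Y)

def msd(mi_hat):
    """MSD = N^{-1} * sum_j { MI_j - mean(MI) }^2 over N replications."""
    mi_hat = np.asarray(mi_hat, float)
    return float(np.mean((mi_hat - mi_hat.mean()) ** 2))
```

Under this design, \(a = 0\) gives the null (independent) case and increasing \(a\) strengthens the linear signal, so the empirical size and power columns of Table 4 correspond to sweeping `a` over a grid.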
Example 4.5
(Quadratic model) Let \(X\sim N(0,1)\), \(W=X+\epsilon \) and Y follows a quadratic model \(Y=aX^2+e\), where \(e\sim N(0,1)\), and X and e are independent. The measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The correlation between X and Y strengthens as a increases. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of a are shown in Table 6. To present the results graphically, we display the rejection rates against a and the significance level \(\alpha \) in Figs. 6 and 7, respectively.
Example 4.6
(Circle model) Let \(\theta \sim U(0, 1)\), e, \(\xi \sim N(0,1)\), where \(\theta \), e and \(\xi \) are independent. (X, Y) is generated from the following circle model with noise contamination (NC):
where \(W=X+\epsilon \), and the measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions, as described in Example 4.1. The more noise is added, the weaker the correlation between X and Y becomes. Additionally, we take \(X = e\) and \(Y = \xi \) to simulate the independent case. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of NC are shown in Table 7.
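The displayed equation for the circle model is not reproduced above. A common parameterization consistent with the description — our hypothetical reconstruction, not necessarily the paper's exact form — places \((X, Y)\) on the unit circle and perturbs both coordinates by NC-scaled Gaussian noise:

```python
import numpy as np

def gen_circle(n, nc, rng=None):
    """Noisy circle model (hypothetical parameterization; the paper's
    display is not reproduced here):
        X = cos(2*pi*theta) + NC * e,   Y = sin(2*pi*theta) + NC * xi,
    with theta ~ U(0, 1) and e, xi ~ N(0, 1) independent."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(size=n)
    e, xi = rng.normal(size=n), rng.normal(size=n)
    X = np.cos(2 * np.pi * theta) + nc * e
    Y = np.sin(2 * np.pi * theta) + nc * xi
    return X, Y
```

This is a useful benchmark precisely because X and Y are strongly dependent yet have (population) Pearson correlation zero, so correlation-based tests should fail while MI-based tests should not.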
Example 4.7
(Spiral circle model) Let \(u\sim U(0, 4)\), e, \(\xi \sim N(0,1)\), where u, e and \(\xi \) are independent. (X, Y) is generated from a spiral circle model with NC:
where \(W=X+\epsilon \), and the measurement error \(\epsilon \) is chosen to follow normal and double exponential distributions respectively, as described in Example 4.1. An increase in NC leads to a decrease in the correlation between X and Y. To simulate the independent case, we take \(X = e\) and \(Y = \xi \) again. The simulation results of empirical sizes and powers based on DCT, NWT, LLT, MIC, dCor and HHG with different values of NC are shown in Table 8. To present the results graphically, we display the rejection rates against NC and the significance level \(\alpha \) in Figs. 8 and 9, respectively.
From Tables 4–8 and Figs. 6 and 8, we observe that when X and Y are independent, the empirical sizes of our proposed test are close to the significance level of 0.05. When X and Y are correlated, the test power gradually approaches 1 as the correlation strengthens. Comparing the performance of different methods across various models and error distributions, we find that our proposed DCT method is the most stable. The dCor method performs well only in linear models, but poorly in the others. HHG and NWT perform better than MIC and LLT, but not as well as the DCT method in all scenarios. Furthermore, from Table 5 and Fig. 5, we find that a smaller sample size results in a larger MSD of \(\widehat{MI}(X,Y)\), and the MSD decreases as the sample size increases in both the independent and correlated cases. Additionally, as shown in Figs. 7 and 9, for different significance levels, the empirical sizes based on DCT are closer to the pre-specified significance level than those of the competing tests when X and Y are independent. This implies that the type I error is generally controlled. When X and Y are correlated, the test powers based on DCT are consistently higher than those based on NWT and LLT across different significance levels. Overall, our proposed DCT method is simple to implement, robust under the permutation procedure, and demonstrates superior performance.
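The permutation procedure underlying all of the tests above can be sketched generically. In the sketch, a simple histogram plug-in MI estimate replaces the deconvolution double kernel estimator purely for brevity, and all names and defaults are our illustrative choices:

```python
import numpy as np

def mi_plugin(x, y, bins=10):
    """Plug-in MI estimate from a 2-D histogram; a simple stand-in for
    the deconvolution double kernel MI estimator."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask])).sum())

def perm_test(x, y, n_perm=500, alpha=0.05, rng=None):
    """Permutation test: reject independence when MI(x, y) exceeds the
    (1 - alpha) quantile of MI over random permutations of y."""
    rng = rng or np.random.default_rng()
    stat = mi_plugin(x, y)
    null = [mi_plugin(x, rng.permutation(y)) for _ in range(n_perm)]
    crit = float(np.quantile(null, 1 - alpha))
    return stat, crit, stat > crit
```

Permuting y breaks any dependence on x while preserving both marginals, so the permuted MI values approximate the null distribution of the statistic regardless of the underlying marginal shapes.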
4.3 Analysis of LROSS data
In the analysis of the LROSS data, we use the proposed deconvolution double kernel CDF and MI estimators. Here, metallicity or radial velocity is denoted as Y, and redshift is denoted as X. The observed redshift, denoted by W, is obtained from multiple observations and contains measurement errors. We remove invalid values and standardize the data before analysis. The measurement error is assumed to follow a normal distribution with mean 0 and variance \(\sigma _0^2\). The standard deviation \(\sigma _0\) is estimated to be 0.28973 using the partial replication method, as mentioned in Fan et al. (2016). We will separately investigate the conditional density estimation and the independence of redshift with respect to metallicity or radial velocity.
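The partial replication idea can be sketched as follows (an illustrative implementation of the general idea; see Fan et al. (2016) for the method actually used). With two replicate observations of each object, the within-pair differences involve only the measurement errors, so half their mean square estimates \(\sigma _0^2\):

```python
import numpy as np

def replication_var(w1, w2):
    """Estimate the measurement-error variance from paired replicates:
    W_i1 - W_i2 = eps_i1 - eps_i2 has variance 2 * sigma0^2, so
    sigma0^2 is estimated by half the mean squared difference."""
    d = np.asarray(w1, float) - np.asarray(w2, float)
    return 0.5 * float(np.mean(d ** 2))

# synthetic check with known sigma0 (illustrative values)
rng = np.random.default_rng(4)
X = rng.normal(size=5000)
sigma0 = 0.3
w1 = X + rng.normal(scale=sigma0, size=5000)
w2 = X + rng.normal(scale=sigma0, size=5000)
sigma0_hat = np.sqrt(replication_var(w1, w2))
```

Note that the unknown X cancels in the difference, so no model for X itself is needed to estimate the error variance.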
4.3.1 Conditional density estimation
We first study the relationship between redshift and metallicity. The optimal bandwidths, \(h_1 = 0.16\) and \(h_2 = 0.58\), are obtained using the LCV method mentioned in Sect. 4.2. The CDF estimators \(\hat{f}_{Y\mid X}(y\mid x)\) of celestial metallicity with respect to redshift are presented in Fig. 10 for redshift values of 0.02, 0.66, 1.95 and 2.59, respectively. From Fig. 10, we observe that the conditional densities of celestial metallicity approximately follow the Gaussian distribution at redshifts 0.02, 0.66 and 1.95. However, when the redshift is 2.59, the conditional distribution of metallicity exhibits a bimodal pattern. Additionally, at different redshifts, the conditional density peak is around a metallicity value of \(-0.09\), with an associated \(y=0.2\). These findings help scientists infer changes in the metallicity of stars and galaxies across various epochs and environments, providing deeper insights into the formation and evolution of celestial bodies over cosmic history. Furthermore, Fig. 10 indicates that extremely metal-poor and extremely metal-rich objects make up only a very small fraction of the universe at different redshifts. Previous studies have confirmed that the proportion of extremely metal-poor objects, such as stars in ancient globular clusters or some early-forming stars, is relatively small (Howes et al. 2015). Stars with a metallicity up to 0.52 and \(y=2\) are extremely metal-rich and relatively rare. These stars may have formed or evolved in heavy-element-rich nebulae (Cinquegrana and Karakas 2022). Figure 11 illustrates the estimated curves of conditional expectation for metallicity with respect to redshift. From Fig. 11, it is evident that the average metallicity of galaxies decreases as redshift increases, which aligns with the predictions of stellar metallicity chemical evolution models (Yabe et al. 2014). Additionally, within the lower redshift range, the average metallicity decreases gradually. However, this decrease is moderate, indicating that at low redshifts the average metallicity remains relatively stable compared to higher redshifts. Similar conclusions have been reported in previous studies (Kulkarni et al. 2005).
Next, we investigate the relationship between redshift and radial velocity. The optimal bandwidths, \(h_1 = 0.18\) and \(h_2 = 1.97\), are obtained using the LCV method. The CDF estimates of celestial radial velocity for redshift values of 0.02, 0.66, 1.95 and 2.59 are presented in Fig. 12. As shown in Fig. 12, the conditional densities approximately follow the Gaussian distribution, which is consistent with the findings of Strauss and Willick (1995). The figure also shows that at redshifts of 0.02, 0.66, 1.95 and 2.59, the density peaks of radial velocity are 0.01 (\(y=0\)), 0.16 (\(y=0.4\)), 0.51 (\(y=1.3\)) and 0.70 (\(y=1.8\)), respectively. This suggests that as the redshift increases, the density peaks of radial velocity gradually rise. These findings are consistent with Hubble’s Law (Hubble 1929). Furthermore, the estimated curves of conditional expectation for radial velocity with respect to redshift are shown in Fig. 13. From Fig. 13, we observe that the average radial velocity of galaxies also increases as redshift increases, reflecting the phenomenon of cosmic expansion. By measuring the velocity distribution of galaxies at various redshifts, we can confirm Hubble’s Law and accurately calibrate the Hubble constant.
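The conditional-expectation curves in Figs. 11 and 13 follow from the density estimates by numerical integration, \(E(Y\mid X=x)=\int y\, f(y\mid x)\,dy\). A minimal sketch on a grid (names and grid values are ours):

```python
import numpy as np

def cond_mean_from_density(y_grid, f_y_given_x):
    """E(Y | X = x) = integral of y * f(y|x) dy, approximated by a
    Riemann sum; the weights are renormalized so the discretized
    density sums to one, absorbing truncation error at the grid ends."""
    w = f_y_given_x / f_y_given_x.sum()
    return float((y_grid * w).sum())

# sanity check against a known conditional density: N(0.2, 1) has mean 0.2
y = np.linspace(-6, 6, 1201)
f = np.exp(-0.5 * (y - 0.2) ** 2) / np.sqrt(2 * np.pi)
```

Evaluating this at each x on a redshift grid traces out the estimated conditional-expectation curve.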
4.3.2 Independence test
In this subsection, we apply our proposed test method to assess independence in the LROSS data. We use a deconvolution double kernel estimator to evaluate MI and perform 1000 permutations to obtain the 95% quantile for the test. First, we test the independence between metallicity and redshift. By using the proposed deconvolution double kernel MI estimator, we find that \(\widehat{MI}=0.0450\), which is significantly larger than the 95% quantile of 0.0003 calculated through the permutation technique. Thus, we reject the null hypothesis, indicating dependence exists between redshift and metallicity. This conclusion aligns with previous findings in Pilyugin et al. (2013). Subsequently, we test the independence between redshift and radial velocity. The statistic \(\widehat{MI}=0.4172\) is also significantly higher than the 95% quantile of 0.0018. Therefore, we reject the null hypothesis, suggesting a strong relationship between redshift and radial velocity. This result is consistent with the associations described by Hubble’s law.
5 Concluding remarks
Testing independence has gained increasing attention in the statistical literature. In many scientific fields, observations are often collected with measurement errors. We focus on testing statistical independence when these measurement errors are substantial. Specifically, we estimate MI by transforming the CDF using a novel deconvolution double kernel method. For both super smooth and ordinary smooth measurement errors, we establish the convergence rates of the CDF estimator and analyze the asymptotic behavior of the MI estimator. Furthermore, our proposed asymptotic theories can be extended to test for conditional independence, which will be explored in our future research.
Data availability
Data is provided within the manuscript.
References
Ai, C., Sun, L.H., Zhang, Z., Zhu, L.: Testing unconditional and conditional independence via mutual information. J. Econom. 39, 105335 (2022)
Berrett, T.B., Samworth, R.J.: Nonparametric independence testing via mutual information. Biometrika 106(3), 547–566 (2019)
Carroll, R.J., Hall, P.: Optimal rates of convergence for deconvolving a density. J. Am. Stat. Assoc. 83(404), 1184–1186 (1988)
Cao, D., Chen, Y., Chen, J., Zhang, H., Yuan, Z.: An improved algorithm for the maximal information coefficient and its application. Royal Soc. Open Sci. 8(2), 201424 (2021)
Cinquegrana, G.C., Karakas, A.I.: The most metal-rich stars in the universe: chemical contributions of low- and intermediate-mass asymptotic giant branch stars with metallicities within 0.04 \(\le z \le \) 0.10. Mon. Not. Royal Astron. Soc. 510(2), 1557–1576 (2022)
De Gooijer, J.G., Zerom, D.: On conditional density estimation. Stat. Neerl. 57(2), 159–176 (2003)
Delaigle, A.: Deconvolution kernel density estimation. In: Handbook of Measurement Error Models, pp. 185–220. Chapman and Hall/CRC, Boca Raton (2021)
Deb, N., Sen, B.: Multivariate rankbased distributionfree nonparametric testing using measure transportation. J. Am. Stat. Assoc. 118(541), 192–207 (2023)
Fan, J., Truong, Y.K.: Nonparametric regression with errors in variables. Ann. Stat. 21(4), 1900–1925 (1993)
Fan, J., Jiang, J.: Nonparametric inferences for additive models. J. Am. Stat. Assoc. 100(471), 890–907 (2005)
Fan, G., Liang, H., Shen, Y.: Penalized empirical likelihood for highdimensional partially linear varying coefficient model with measurement errors. J. Multivar. Anal. 147, 183–201 (2016)
Fan, J., Zhang, Y., Zhu, L.: Independence tests in the presence of measurement errors: an invariance law. J. Multivar. Anal. 188(C), 104818 (2022)
Fokianos, K., Pitsillou, M.: Testing independence for multivariate time series via the autodistance correlation matrix. Biometrika 105(2), 337–352 (2018)
Gretton, A., Fukumizu, K., Teo, C., et al.: A kernel statistical test of independence. Adv. Neural. Inf. Process. Syst. 20, 585–592 (2007)
Gretton, A., Borgwardt, K.M., Rasch, M.J., et al.: A kernel twosample test. J. Mach. Learn. Res. 13(1), 723–773 (2012)
Gonzalez, M.E., Silva, J.F., Videla, M., Orchard, M.E.: Datadriven representations for testing independence: modeling, analysis and connection with mutual information estimation. IEEE Trans. Signal Process. 70, 158–173 (2021)
Heller, R., Heller, Y., Gorfine, M.: A consistent multivariate test of association based on ranks of distances. Biometrika 100(2), 503–510 (2013)
Howes, L.M., Casey, A.R., Asplund, M., et al.: Extremely metalpoor stars from the cosmic dawn in the bulge of the Milky Way. Nature 527(7579), 484–487 (2015)
Hubble, E.: A relation between distance and radial velocity among extragalactic nebulae. Proc. Natl. Acad. Sci. 15(3), 168–173 (1929)
Huang, W., Zhang, Z.: Nonparametric estimation of the continuous treatment effect with measurement error. J. R. Stat. Soc. Ser. B Stat Methodol. 85, 474–496 (2023)
Kim, T.W., Park, J.Y., Shin, J.Y.: Determining proper threshold levels for hydrological drought analysis based on independent tests. J. Korea Water Resour. Assoc. 53(3), 193–200 (2020)
Kulkarni, V.P., Fall, S.M., Lauroesch, J.T., et al.: Hubble space telescope observations of element abundances in lowredshift damped Ly\(\alpha \) galaxies and implications for the global metallicityredshift relation. Astrophys. J. 618(1), 68–90 (2005)
Kulkarni, H., Khandait, H., Narlawar, U.W., Rathod, P., Mamtani, M.: Independent association of meteorological characteristics with initial spread of Covid19 in India. Sci. Total Environ. 764, 142801 (2021)
Leung, D., Drton, M.: Testing independence in high dimensions with sums of rank correlations. Ann. Stat. 46(1), 280–307 (2018)
Limnios, M., Clémençon, S.: On ranking-based tests of independence. In: International Conference on Artificial Intelligence and Statistics, pp. 577–585 (2024)
Marron, J.S., Wand, M.P.: Exact mean integrated squared error. Ann. Stat. 20(2), 712–736 (1992)
Matilla-García, M., Ruiz Marín, M.: A nonparametric independence test using permutation entropy. J. Econom. 144(1), 139–155 (2008)
Ma, L., Wu, X., Li, Z.: Highprecision medicine bottles vision online inspection system and classification based on multifeatures and ensemble learning via independence test. IEEE Trans. Instrum. Meas. 70, 1–12 (2021)
Neyman, J., Pearson, E.S.: IX. On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Royal Soc. London Series A 231, 289–337 (1933)
Olagunju, A.: An empirical analysis of the impact of auditors independence on the credibility of financial statement in Nigeria. Res. J. Finance Account. 2(3), 82–99 (2011)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Pethel, S.D., Hahs, D.W.: Exact test of independence using mutual information. Entropy 16(5), 2839–2849 (2014)
Pilyugin, L.S., LaraLópez, M.A., Grebel, E.K., et al.: The metallicityredshift relations for emissionline SDSS galaxies: examination of the dependence on the star formation rate. Mon. Not. R. Astron. Soc. 432(2), 1217–1230 (2013)
Reshef, D.N., Reshef, Y.A., Finucane, H.K., et al.: Detecting novel associations in large data sets. Science 334(6062), 1518–1524 (2011)
Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)
Runge, J.: Conditional independence testing based on a nearestneighbor estimator of conditional mutual information. PMLR 84, 938–947 (2018)
Scott, D.W., Terrell, G.R.: Biased and unbiased crossvalidation in density estimation. J. Am. Stat. Assoc. 82(400), 1131–1146 (1987)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, London (1986)
Stefanski, L.A., Carroll, R.J.: Deconvolving kernel density estimators. Statistics 21(2), 169–184 (1990)
Stone, C.J.: Optimal global rates of convergence for nonparametric regression. Ann. Stat. 10(4), 1040–1053 (1982)
Strauss, M.A., Willick, J.A.: The density and peculiar velocity fields of nearby galaxies. Phys. Rep. 261(5–6), 271–431 (1995)
Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35(6), 2769–2794 (2007)
Su, L., White, H.: Testing conditional independence via empirical likelihood. J. Econom. 182(1), 27–44 (2014)
Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer, New York (2011)
Wang, X.F., Wang, B.: Deconvolution estimation in measurement error models: the R package decon. J. Stat. Softw. 39(10), 1–24 (2011)
Yabe, K., Ohta, K., Iwamuro, F., et al.: The massmetallicity relation at z\(\sim \)1.4 revealed with Subaru/FMOS. Mon. Not. Royal Astron. Soc. 437(4), 3647–3663 (2014)
Zeng, X., Xia, Y., Tong, H.: Jackknife approach to the estimation of mutual information. Proc. Natl. Acad. Sci. 115(40), 9956–9961 (2018)
Zhou, Y., Xu, K., Zhu, L., Li, R.: Rankbased indices for testing independence between two highdimensional vectors. Ann. Stat. 52(1), 184–206 (2024)
Acknowledgements
The authors thank the Associate Editor and two anonymous referees for constructive comments and helpful suggestions, which led to substantial improvements of this paper. They also thank Yingxing Li from Xiamen University and Yuexiao Dong from Temple University for their valuable comments and suggestions on improving the manuscript presentation. This research was supported by the National Social Science Fund of China (22BTJ018), Renmin University of China (22XNA026), and the National Natural Science Foundation of China (12225113, 12171477).
Author information
Authors and Affiliations
Contributions
Guoliang Fan: first author, conceived of the presented idea, methodology, computation and writing. Xinlin Zhang: cofirst author, performed the computations, methodology and writing. Liping Zhu: corresponding author, conceived of the presented idea, developed the theory and writing. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author selfarchiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, G., Zhang, X. & Zhu, L. Independence test via mutual information in the presence of measurement errors. Stat Comput 34, 192 (2024). https://doi.org/10.1007/s11222-024-10502-9