1 Introduction and Main Result

In [8], the authors resolved in the affirmative the long-standing square root problem of Kato for divergence form complex-elliptic operators in \(\mathbb {R}^n\). This was the culmination of a series of previous results: [16] (treating the 1-d case); [15] and [26, 27] (treating small perturbations of the constant coefficient case); [37] (the 2-d case); [9] (small perturbations of the real-symmetric case, sometimes referred to as the restricted Kato problem); and [33] (the case that the heat kernel satisfies a Gaussian upper bound and Nash-type Hölder continuity estimates). The solution of the Kato problem, and the circle of ideas involved in its proof, led to subsequent breakthroughs in the theory of elliptic boundary value problems, see, e.g., [1, 2, 3, 4, 5, 10, 12, 18, 25, 30, 31, 32, 34, 35, 36]. See also the significant related ground-breaking work in the parabolic setting: [6, 7, 43].

In this note, we initiate the study of the square root problem in the non-divergence setting in dimensions greater than 1, with the eventual goal of developing applications to the theory of boundary value problems, as has been done in the aforementioned divergence form case. Previously, the non-divergence problem had been treated only in the 1-dimensional setting, in [39]; in fact, in that paper the authors treat the more general class of operators of the form \(L=-aDbD\), where D denotes the ordinary differentiation operator on the line, and a, b are arbitrary bounded accretive complex-valued functions on the line.

At present, we are able to treat only the case of real coefficients. On the other hand, we point out that in the divergence form case, there is no known proof for real, non-symmetric coefficients that is fundamentally easier than the proof in the general case, owing to the non-selfadjointness of non-symmetric divergence form operators. Moreover, it is the real, non-symmetric case that underlies the breakthrough in the study of the Dirichlet problem obtained in [31]. We observe that in the non-divergence setting, we may assume without loss of generality that the coefficient matrix is symmetric, but in contrast to the divergence form case, operators of non-divergence type are inherently non-selfadjoint, even with symmetric coefficients.

A fundamental difficulty that one encounters in the non-divergence setting is that the Kato problem for non-divergence elliptic operators seems to be most naturally formulated in a weighted \(L^2\) space, and in general, the weight need not belong to the Muckenhoupt \(A_2\) class. Another difficulty inherent to the non-divergence setting is the lack of uniqueness (see the work of Nadirashvili [42]). On the other hand, working with real coefficients allows us to make use of the pioneering work of Krylov and Safonov [40], as well as the important ideas of Baumann [11] and Escauriaza [24]. We shall return to these matters in the sequel. First, let us set notation and definitions.

We will say that the operator L is a second-order elliptic operator in non-divergence form on \(\mathbb {R}^n\) if

$$\begin{aligned} Lu = - \sum _{i, j = 1}^n a_{ij} D_i D_j u, \end{aligned}$$

where \(A = (a_{ij}(\cdot ))\) is a real and measurable coefficient matrix which (without loss of generality) we can take to be symmetric, and for which we also assume, for some \(\lambda > 0\),

$$\begin{aligned} A(x) \xi \cdot \xi \ge \lambda |\xi |^2 ~\text { and }~ |A(x) \xi \cdot \zeta | \le \lambda ^{-1} |\xi | |\zeta | \quad \text {for all } \xi , \zeta \in \mathbb {R}^n \text { and a.e. } x \in \mathbb {R}^n. \end{aligned}$$
(1.1)

For such L, we have also its adjoint operator

$$\begin{aligned} L^*u = - \sum _{i, j = 1}^n D_i D_j (a_{ij} u). \end{aligned}$$

Following [24], we say that the function \(u \in L^1_{\textrm{loc}}(\mathbb {R}^n)\) is a solution of the adjoint equation \(L^*u = 0\) if for every \(\varphi \in \mathscr {C}_c^\infty (\mathbb {R}^n)\) we have

$$\begin{aligned} \int _{\mathbb {R}^n} u(x) L\varphi (x) dx = 0. \end{aligned}$$

Let us recall also the definition of Muckenhoupt weights, that will be used throughout the text because of the properties of some particularly relevant adjoint solution, as we shall see shortly.

Definition 1.2

(Muckenhoupt weights) We say that the function w belongs to the Muckenhoupt class of weights \(A_p\) for some \(1< p < \infty \) if \(w(x) > 0\) a.e. \(x \in \mathbb {R}^n\) and

$$\begin{aligned}{}[w]_{A_p} := \sup _B \left( \frac{1}{|B|} \int _B w(x) dx \right) \left( \frac{1}{|B|} \int _B w(x)^{1-p'} dx\right) ^{p-1} < \infty , \end{aligned}$$

where the supremum is taken over all the balls \(B \subset \mathbb {R}^n\). We also denote \(A_\infty := \bigcup _{1< p < \infty } A_p\).

We recall the well-known fact that the \(A_\infty \) property is equivalent to the Reverse Hölder property, i.e., that there is an exponent \(q>1\) and a uniform constant C such that for every ball B,

$$ \left( \frac{1}{|B|} \int _B w^q(x) dx \right) ^{1/q} \le C \frac{1}{|B|} \int _B w(x) dx. \qquad (RH_q) $$

For details of the theory of Muckenhoupt weights, the reader may consult, e.g. [20, Chapter 7].
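As a brief illustration of Definition 1.2 (a standard example, included only for orientation and not used in the sequel), note that for \(p = 2\) we have \(1-p' = -1\), so that the \(A_2\) condition reads

$$\begin{aligned} {}[w]_{A_2} = \sup _B \left( \frac{1}{|B|} \int _B w(x) dx \right) \left( \frac{1}{|B|} \int _B w(x)^{-1} dx\right) < \infty . \end{aligned}$$

For the power weights \(w_a(x) := |x|^a\) (the notation \(w_a\) is introduced here only for this example), the classical characterization is

$$\begin{aligned} w_a \in A_p \iff -n< a< n(p-1), \qquad \text {in particular} \qquad w_a \in A_2 \iff |a| < n. \end{aligned}$$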

With this definition in mind, we recall a fundamentally important property of equations in non-divergence form.

Lemma 1.3

([24, Theorem 1.1]) Let L be a second-order elliptic operator in non-divergence form, and \(L^*\) its adjoint. Then there exists a non-negative solution W of the adjoint equation \(L^*W = 0\) in \(\mathbb {R}^n\), satisfying \(W(B_1(0)) = |B_1(0)|\), which we call the global non-negative adjoint solution. Furthermore, W satisfies a Reverse Hölder property with exponent \(\frac{n}{n-1}\), so \(W \in A_\infty \) (cf. Definition 1.2). Moreover, the \(RH_{n/(n-1)}\) constants depend only on dimension and ellipticity.

If the coefficients of L are smooth, or even belong to VMO, then W (with the stated normalization) is unique. In general, it need not be unique. On the other hand, for any given L, any choice of such a W will enjoy the same quantitative estimates, with uniform control of all relevant constants. In the case that W is not unique, we may therefore simply fix an arbitrary choice of W.

It is a well known fact that \(A_\infty \) weights are doubling. In our case, this means that there exists a constant \(C_D = C_D([W]_{A_\infty }) \ge 1\) such that \(W(2B) \le C_D W(B)\) for every ball B.
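For the reader's convenience, we sketch one standard route to the doubling property. Here p denotes any exponent for which \(W\in A_p\), and the constant obtained is not claimed to be optimal. For a ball B and a measurable set \(E\subset B\), Hölder's inequality gives

$$\begin{aligned} |E| = \int _E W^{1/p}\, W^{-1/p} dx \le W(E)^{1/p} \left( \int _B W^{1-p'} dx \right) ^{1/p'}, \end{aligned}$$

since \(-p'/p = 1-p'\). Raising this to the power p (note that \(p/p' = p-1\)), dividing by \(|B|^p\), and using the definition of \([W]_{A_p}\), we obtain

$$\begin{aligned} \left( \frac{|E|}{|B|}\right) ^p \le \frac{W(E)}{|B|} \left( \frac{1}{|B|} \int _B W^{1-p'} dx \right) ^{p-1} \le [W]_{A_p}\, \frac{W(E)}{W(B)}. \end{aligned}$$

Applying the last estimate with 2B in place of B, and with \(E = B\), yields

$$\begin{aligned} W(2B) \le 2^{np}\, [W]_{A_p}\, W(B). \end{aligned}$$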

From now on we will work most of the time in the weighted Hilbert space

$$ L^2_W := L^2(\mathbb {R}^n, W(x)dx). $$

In the non-divergence setting, this space is more natural in many ways than unweighted \(L^2\). In particular, the following identity holds, as may be seen formally by using \(L^*W = 0\) and integrating by parts:

$$\begin{aligned} \int _{\mathbb {R}^n} u Lu W dx = \int _{\mathbb {R}^n} A \nabla u \cdot \nabla u W dx. \end{aligned}$$
(1.4)
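The formal computation behind (1.4) can be sketched as follows, under the simplifying assumption (made only for this illustration) that u is real-valued and belongs to \(\mathscr {C}_c^\infty (\mathbb {R}^n)\). By the product rule,

$$\begin{aligned} L(u^2) = - \sum _{i, j = 1}^n a_{ij} D_i D_j (u^2) = -2 \sum _{i, j = 1}^n a_{ij} D_i u\, D_j u - 2 u \sum _{i, j = 1}^n a_{ij} D_i D_j u = -2 A \nabla u \cdot \nabla u + 2 u Lu, \end{aligned}$$

that is, \(u Lu = A \nabla u \cdot \nabla u + \frac{1}{2} L(u^2)\). Since \(u^2 \in \mathscr {C}_c^\infty (\mathbb {R}^n)\), the definition of a solution of the adjoint equation, applied with \(\varphi = u^2\), gives \(\int _{\mathbb {R}^n} W L(u^2) dx = 0\), and therefore

$$\begin{aligned} \int _{\mathbb {R}^n} u Lu W dx = \int _{\mathbb {R}^n} A \nabla u \cdot \nabla u W dx + \frac{1}{2} \int _{\mathbb {R}^n} L(u^2) W dx = \int _{\mathbb {R}^n} A \nabla u \cdot \nabla u W dx. \end{aligned}$$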

In fact, one may readily deduce that (1.4) holds when the coefficients are smooth, and more generally, it also holds at least when the coefficients have sufficiently small BMO norm (depending only on dimension and ellipticity), for all \(u\in \mathcal {D}(L)\) (the domain of L), defined by

$$ \mathcal {D}(L):= \left\{ u\in L^2_W: Lu \in L^2_W\right\} . $$

Indeed, if the coefficients have sufficiently small BMO norm, then \(W\in A_2\) (see [23, Theorem 1.2] and its proof; see also Footnote 1). In turn, using this fact, one may prove (see Footnote 2) the regularity estimate

$$\begin{aligned} \Vert \nabla ^2 u\Vert _{L^2_W} \lesssim \Vert f\Vert _{L^2_W} \end{aligned}$$
(1.5)

for solutions of the Poisson problem \(Lu=f\in L^2_W\), and hence, that

$$\begin{aligned} \mathcal {D}(L) = H_W^{2}(\mathbb {R}^n) := \left\{ u\in L^2_W: \nabla u, \nabla ^2 u \in L^2_W\right\} . \end{aligned}$$
(1.6)

The identity (1.4) then follows readily for all \(u\in \mathcal {D}(L)\).

Remark 1.7

We observe that \(H_W^{2}(\mathbb {R}^n)\) is dense in \(L^2_W\) when \(W\in A_2\) (indeed, even \(\mathscr {C}_c^\infty \) is dense in that case), and therefore L is densely defined when the coefficients have sufficiently small BMO norm.

We shall also consider the normalized adjoint of L, which we denote by \(\widetilde{L}\), and which we define to be the adjoint of L with respect to the space \(L^2_W\). Thus, \(\widetilde{L}\) is given, at least for smooth coefficients, by the formula

$$\begin{aligned} \widetilde{L} u := - \frac{1}{W} \sum _{i, j = 1}^n D_i D_j (a_{ij} u W) = \frac{1}{W}\, L^*(uW). \end{aligned}$$

If the coefficients are merely measurable, then we interpret \(\widetilde{L}\) in the weak sense: we say that \(u\in L^2_W\) belongs to \(\mathcal {D}(\widetilde{L})\), the domain of \(\widetilde{L}\), provided that there is a function \(g\in L^2_W\) such that for every \(v\in \mathcal {D}(L)\),

$$ \int _{\mathbb {R}^n} Lv \, u\, W dx = \int _{\mathbb {R}^n} v g W dx, $$

and in this case we set \(\widetilde{L} u = g\). Just as for (1.4), integrating by parts and using \(L^*W = 0\), we obtain (at least in the case of smooth coefficients, and for \(u\in H^2_W\))

$$\begin{aligned} \int _{\mathbb {R}^n} u \widetilde{L} u W dx = \int _{\mathbb {R}^n} A \nabla u \cdot \nabla u W dx. \end{aligned}$$
(1.8)

We shall henceforth make the qualitative assumption that the coefficients \(a_{ij}\) are smooth, with qualitative \(L^\infty \) bounds on \(\nabla a_{ij}\) and \(\nabla ^2 a_{ij}\). Thus, (1.8) will be valid in the sequel, for \(u\in H^2_W\). On the other hand, we emphasize that our quantitative bounds will never depend on smoothness, nor on estimates for the derivatives of \(a_{ij}\).

Remark 1.9

Using (1.4), (1.8), (1.5), and (1.6) (see Footnote 3), one may then show that L, and \(\widetilde{L}\), viewed as unbounded operators on \(L^2_W\), are each closed, sectorial and m-accretive, and hence each has an m-accretive square root (see [38, Theorem 3.35, p. 281], or [28, Sections 3 and 7]). Moreover, L generates a heat semigroup \(z\mapsto e^{-zL}\), which is well-defined and analytic in a sector containing the positive real axis (hence, the analogous statement is also true for \(\widetilde{L}\)).

Let us now sketch the proofs of the functional analytic facts listed in Remark 1.9.

L and \(\widetilde{L}\) are closed operators. For \(\widetilde{L}\), this follows immediately from the fact that L is densely defined (see [38, p. 168]). Thus we consider L. Suppose that \(\{u_n\}_n \subset \mathcal {D}(L)\), that \(u_n \rightarrow u\) in \(L^2_W\), and that

$$ f_n:= L u_n \rightarrow f ~\text { in } L^2_W. $$

We need to verify that \(u\in \mathcal {D}(L)\), and that \(L u=f\), i.e., that the graph \(\{(u,Lu):u \in \mathcal {D}(L)\}\) is a closed set in \(L^2_W\times L^2_W\). Applying (1.4) and (1.5) to \(u_n-u_m\), we see that \(\{\nabla u_n\}_n\) and \(\{\nabla ^2 u_n\}_n\) are each convergent in \(L^2_W\), thus, \(\{u_n\}_n\) is convergent in \(H^2_W\), and since \(u_n\rightarrow u\) in \(L^2_W\), we see that \(u\in H^2_W =\mathcal {D}(L)\), and that \(u_n \rightarrow u\) in \(H^2_W\). In particular, \(D_iD_j u_n \rightarrow D_iD_j u\) in \(L^2_W\) for each \(i,j = 1,2,\dots ,n\), hence \(Lu_n \rightarrow Lu\), so that \(Lu=f\), as desired.

L and \(\widetilde{L}\) are sectorial. It follows readily from (1.4) (respectively, (1.8)) that the numerical ranges

$$ \varTheta :=\left\{ \langle Lu,u\rangle \in \mathbb {C}: \Vert u\Vert _{L^2_W} =1\right\} , \quad \widetilde{\varTheta }:= \left\{ \langle \widetilde{L} u,u\rangle \in \mathbb {C}: \Vert u\Vert _{L^2_W} =1\right\} $$

are each contained in a sector \(S_\omega := \{z\in \mathbb {C}: |\textrm{arg}\,z|\le \omega \} \cup \{0\}\), with \(0<\omega <\pi /2\), depending only on ellipticity. We omit the standard argument.

L and \(\widetilde{L}\) are m-accretive. By [38, Problem 3.31, p. 279], it suffices to show that L is m-accretive. To this end, set

$$ \varDelta := \mathbb {C}\setminus S_\omega , $$

and for \(\zeta \in \varDelta \), set \(\delta =\delta (\zeta ):= \textrm{dist}(\zeta ,S_\omega )\). By symmetry, we also have \(\delta = \textrm{dist}(\overline{\zeta },S_\omega )\). We now claim that

$$\begin{aligned} \Vert (L-\zeta )u\Vert _{L^2_W} \ge \delta \Vert u\Vert _{L^2_W},\quad u \in \mathcal {D}(L),~\zeta \in \varDelta . \end{aligned}$$
(1.10)

Indeed, to verify the claim, we may assume without loss of generality that \(\Vert u\Vert _{L^2_W} = 1\), in which case

$$ \Vert (L-\zeta )u\Vert _{L^2_W} \ge \left| \langle (L-\zeta )u,u\rangle \right| = \left| \langle Lu,u\rangle - \zeta \right| \ge \delta , $$

since \(\langle Lu,u\rangle \in \varTheta \subset S_\omega \).

Fix \(\zeta \in \varDelta \). Then \(L-\zeta \) is 1-1 on \(\mathcal {D}(L)\), and has closed range (since L is a closed operator). Similarly, \(\widetilde{L}-\overline{\zeta }\) is 1-1 on \(\mathcal {D}(\widetilde{L})\). Since L is densely defined,

$$ \mathcal {N}(\widetilde{L}-\overline{\zeta }) = \mathcal {R}(L-\zeta )^\perp , $$

i.e., the null space of \(\widetilde{L}-\overline{\zeta }\) is the orthogonal complement of the range of \(L-\zeta \). Thus, \(L-\zeta \) has dense range, since \(\widetilde{L}-\overline{\zeta }\) is 1-1. Hence, \(L-\zeta \) is invertible as a mapping from \(\mathcal {D}(L)\) onto \(L^2_W\). Combined with the estimate (1.10), this shows that L is m-accretive (see [38, p. 279]).

The Heat Semi-Group

Given the preceding properties of L, we have existence, uniqueness, and analyticity of a contraction semigroup \(e^{-zL}\), for z in the open sector \(S^0_\alpha := \{z\in \mathbb {C}: |\textrm{arg}\,z| < \alpha \}\), provided \(0<\alpha < \pi /2-\omega \). See, e.g., [38, pp. 480–493, especially Theorem 1.24, p. 492].

Our main result is the following:

Theorem 1.11

Let L be a second-order elliptic operator in non-divergence form with smooth real coefficients satisfying (1.1), and let W be the associated global non-negative adjoint solution provided by Lemma 1.3. If \(W \in A_2\) (see Definition 1.2), then we have

$$\begin{aligned} \left\| \sqrt{L}f\right\| _{L^2_W} \approx \Vert \nabla f\Vert _{L^2_W} \approx \left\| \sqrt{\widetilde{L}}f\right\| _{L^2_W}, \end{aligned}$$

where the implicit constants depend only on n, \(\lambda \) and \([W]_{A_2}\).

The main goal of this paper is to prove this theorem. Hence, from here on we will always impose the extra assumption that \(W \in A_2\), along with the qualitative assumption that the coefficients are smooth. The result will follow at once from Theorems 3.1 and 4.1.

Some additional remarks are in order.

Remark 1.12

As mentioned above, in general W belongs to the class \(A_\infty \), and thus \(W\in A_p\) for some p depending on dimension and ellipticity, but p may be strictly greater than 2, and in fact in the general case we have no precise upper bound on p. Thus, our result is only a partial one, and does not address the fundamental challenge of treating the non-\(A_2\) case. On the other hand, as noted above, if the coefficients have sufficiently small BMO norm, then \(W\in A_2\), and thus our result does apply in that setting.

Remark 1.13

In the case that the coefficient matrix has sufficiently small BMO norm, then as also noted above, we may identify the domain \(\mathcal {D}(L)\) as the weighted Sobolev space \(H^2_W(\mathbb {R}^n)\) (see (1.6)). Hence, by combining several known (or at least implicit) results, we may identify the domain of \(\sqrt{L}\) as the Sobolev space \(H_W^1(\mathbb {R}^n):=\{u\in L^2_W(\mathbb {R}^n): \nabla u \in L^2_W(\mathbb {R}^n)\}\); this corresponds to the estimate \(\Vert \sqrt{L}f\Vert _{L^2_W} \lesssim \Vert \nabla f\Vert _{L^2_W}\). Indeed, in [22] it is shown that the operator L has a bounded holomorphic functional calculus in (unweighted) \(L^2\) (even in \(L^p\)), provided that the BMO norm of the coefficients is sufficiently small. Under the same smallness assumption, the arguments of [22] may be extended to the weighted case considered here, to deduce that L has a bounded holomorphic functional calculus in \(L_W^2\). Combining the results of [45] and [41], we find that \(\mathcal {D}(\sqrt{L})\) is the complex interpolation space mid-way between \(L^2_W\) and \(\mathcal {D}(L) = H^2_W\), i.e., \(\mathcal {D}(\sqrt{L}) = H^1_W\).

The analogous strategy fails for \(\widetilde{L}\), as we have no idea how to identify \(\mathcal {D}(\widetilde{L})\) (similarly, the square root problem in the divergence form case entailed the same difficulty).

Remark 1.14

The assumption of smoothness of the coefficients is purely qualitative, and our quantitative estimates will not depend on smoothness, but only on the stated parameters n, \(\lambda \) and \([W]_{A_2}\). However, it is not clear at present how to make sense of the identity (1.8) for non-smooth coefficients, and as a consequence, in the absence of smoothness, we do not know how to prove certain estimates which rely on (1.8), such as Lemma 2.11(viii) (in the case of measurable coefficients, we know how to give only a formal proof of the latter, assuming a priori finiteness of \(\Vert \nabla e^{-t^2\widetilde{L}} f\Vert _{L^2_W}\)).

On the other hand, as noted above, identity (1.4) holds without smoothness, in the case that the coefficients have sufficiently small BMO norm. Under the latter scenario, we require identity (1.8) and its consequences (and thus, the qualitative, a priori assumption of smoothness of the coefficients) in two places: 1) in the proof of Theorem 4.1 (the square root problem for \(\widetilde{L}\)), where estimate (1.8) is heavily used, and 2) in the proof of the m-accretivity of L given above, where we used (1.8) to establish density of the range of \(L-\zeta \). Otherwise, (1.8) is not used in the proof of Theorem 3.1 (the square root problem of L).

Although our operators L and \(\widetilde{L}\) are not of divergence form, there is a nice identity relating these two non-divergence operators with another one which is in divergence form, but degenerate elliptic (see Footnote 4). Indeed, if we let \(\widetilde{\textrm{div}}\) denote the normalized divergence, defined for an \(\mathbb {R}^n\)-valued function \(\textbf{v}\) by \(\widetilde{\textrm{div}}\,\textbf{v}:= \frac{1}{W} \textrm{div}(W\textbf{v})\), then \(\widetilde{\textrm{div}}\) is precisely the adjoint operator to \(-\nabla \) inside \(L^2_W\), and we also have, using \(L^*W=0\),

$$\begin{aligned} Lu + \widetilde{L} u = -2 \widetilde{\textrm{div}}(A \nabla u). \end{aligned}$$
(1.15)

In the case of non-smooth coefficients, we interpret the latter identity in the weak sense described above: for \(u, \varphi \in H_W^2\),

$$ \int _{\mathbb {R}^n} \big (\varphi Lu \,+\, u L\varphi \big ) W dx = 2\int _{\mathbb {R}^n} A\nabla u\cdot \nabla \varphi \,W dx\,. $$

The identity (1.15) will be of great use to us in the sequel.
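For smooth coefficients, (1.15) can be verified by a direct computation, which we sketch here under the assumption that W is positive and regular enough for the derivatives below to make sense. Expanding by the product rule, and using the symmetry of A,

$$\begin{aligned} \sum _{i, j = 1}^n D_i D_j (a_{ij} W u) = u \sum _{i, j = 1}^n D_i D_j (a_{ij} W) + 2 \sum _{i, j = 1}^n D_i (a_{ij} W)\, D_j u + W \sum _{i, j = 1}^n a_{ij} D_i D_j u. \end{aligned}$$

The first sum on the right equals \(-L^*W = 0\), so that, dividing by \(-W\),

$$\begin{aligned} \widetilde{L} u = Lu - \frac{2}{W} \sum _{i, j = 1}^n D_i (a_{ij} W)\, D_j u. \end{aligned}$$

On the other hand,

$$\begin{aligned} -2\, \widetilde{\textrm{div}}(A \nabla u) = - \frac{2}{W} \sum _{i, j = 1}^n D_i (W a_{ij} D_j u) = - \frac{2}{W} \sum _{i, j = 1}^n D_i (a_{ij} W)\, D_j u + 2 Lu. \end{aligned}$$

Comparing the last two displays gives \(Lu + \widetilde{L} u = -2\, \widetilde{\textrm{div}}(A \nabla u)\), which is (1.15). Similarly, for a vector field \(\textbf{v}\) and a test function \(\varphi \in \mathscr {C}_c^\infty (\mathbb {R}^n)\), the adjoint relation between \(\widetilde{\textrm{div}}\) and \(-\nabla \) is the integration by parts

$$\begin{aligned} \int _{\mathbb {R}^n} (\widetilde{\textrm{div}}\, \textbf{v})\, \varphi \, W dx = \int _{\mathbb {R}^n} \textrm{div}(W \textbf{v})\, \varphi \, dx = - \int _{\mathbb {R}^n} \textbf{v} \cdot \nabla \varphi \, W dx. \end{aligned}$$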

The paper is organized as follows:

  • In Section 2 we give some definitions and estimates for some of the operators, which will appear repeatedly across the paper.

  • In Section 3 we prove \(\Vert \sqrt{L}f\Vert _{L^2_W} \lesssim \Vert \nabla f\Vert _{L^2_W}\), which turns out to be a relatively easy consequence of Littlewood–Paley theory because of the form of L (since L annihilates not only constants but also first degree monomials).

  • In Section 4 we prove \(\Vert \sqrt{\widetilde{L}}f\Vert _{L^2_W} \lesssim \Vert \nabla f\Vert _{L^2_W}\), which is in fact the more difficult result in the paper. To treat \(\widetilde{L}\), we follow broadly the scheme provided by [8], first reducing the problem to some square function estimates, which are handled using a T1-like argument and then a local Tb argument. Of course, some significant modifications of the arguments in [8] are needed; the identity (1.15) will be useful in this case.

We remark that the square root problem for \(\widetilde{L}\) is significantly more difficult than its analogue for L.

1.1 Notation

  • We use the notation \(a \lesssim b\) to denote that there exists a positive harmless constant C (which can vary from line to line) such that \(a \le Cb\). We will also denote \(a \approx b\) whenever \(a \lesssim b\) and \(b \lesssim a\).

  • In the proofs of the results from now on, we will omit dependencies of constants on n, \(\lambda \), \([W]_{A_\infty }\) and \([W]_{A_2}\) – treating them as harmless constants – although we will make these dependencies explicit in the statements.

  • Euclidean balls are denoted by \(B_t(x) := \{y \in \mathbb {R}^n : |y-x| < t \}\).

  • If \(B = B_t(x) \subset \mathbb {R}^n\) is a ball and \(\kappa > 0\), by \(\kappa B\) we denote the ball with the same center and with radius scaled by a factor of \(\kappa \), i.e., \(B_{\kappa t}(x)\). The same applies to cubes.

  • For \(E \subset \mathbb {R}^n\), |E| denotes the Lebesgue measure of E.

  • If \(E, F \subset \mathbb {R}^n\) are arbitrary subsets, we write \(\textrm{dist}(E, F) := \inf \{|x-y| : x \in E, y \in F\}\).

  • For any subset \(E \subset \mathbb {R}^n\), we denote by \(\textbf{1}_E\) the characteristic function of E (i.e. \(\textbf{1}_E(x) = 1\) if \(x \in E\) and 0 otherwise). In particular, we write \(\textbf{1} := \textbf{1}_{\mathbb {R}^n}\), the function constantly 1.

  • We will denote vector-valued functions with boldface letters, e.g., \(\textbf{v} := (v_1,\dots ,v_n)\).

  • \(D_j\) denotes the differentiation operator in the direction of \(x_j\), i.e., \(D_j = \frac{\partial }{\partial x_j}\).

  • We denote averages with respect to a measure \(\nu \) by \(⨏_E f d\nu := \nu (E)^{-1} \int _E f d\nu \). Often the measure with respect to which we take averages will be the weighted measure W(x)dx: it will be clear by the context. For the latter measure, we write \(W(E) := \int _E W(x) dx\).

  • We will frequently use cubes in our proofs: every time we cover \(\mathbb {R}^n\) (or some portion of it) by cubes, we mean we are using a covering by cubes of the dyadic grid \(\{2^j \textbf{k} + [0, 2^{j})^n : j \in \mathbb {Z}, \textbf{k} \in \mathbb {Z}^n\}\). Anytime we use the letter Q, we will be referring to a dyadic cube. For such a cube Q, we let \(\ell (Q)\) denote its sidelength.

  • We let \(\mathcal {M}\) and \(\mathcal {M}_W\) denote, respectively, the classical Hardy–Littlewood maximal operator, and the Hardy–Littlewood maximal function with respect to the measure W(x)dx, that is,

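    $$ \mathcal {M} f(x) := \sup _{B \ni x} \frac{1}{|B|} \int _B |f(y)| dy $$

    and

    $$ \mathcal {M}_W f(x) := \sup _{B \ni x} \frac{1}{W(B)} \int _B |f(y)| W(y) dy. $$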

    Since W is doubling (because \(W \in A_\infty \)), \(\mathcal {M}_W\) is bounded on \(L^p_W\) for every \(1< p < \infty \). We will use this fact in the sequel.

  • As explained before, we set \(L^2_W := L^2(\mathbb {R}^n, W(x)dx)\), and we define the weighted Sobolev space \(H_W^2:=\{u\in L^2_W: \nabla u\in L_W^2, \, \nabla ^2 u \in L_W^2\}\). We will also write \(L_W^2(E) := L^2(E, W(x)dx)\) for any subset \(E \subset \mathbb {R}^n\).

  • We denote the composition of two operators U and V by \(UV(f) := U(V(f))\).

  • For a function \(f\in L^2(\mathbb {R}^n)\), we denote its Fourier transform by \(\hat{f}\).

  • \(\mathcal {S}\) will denote the usual Schwartz class of smooth, rapidly decaying functions on \(\mathbb {R}^n\).

2 Preliminaries

2.1 Gaussian Bounds for Kernels of Semigroups

From now on, we will use many times the parabolic semigroup (with elliptic homogeneity) \(e^{-t^2L}\), whose kernel is the fundamental solution \({\varGamma }_{t^2}(x, y)\); i.e. we have \(e^{-t^2L}f(x) = \int _{\mathbb {R}^n} {\varGamma }_{t^2}(x, y)f(y)dy\) for sufficiently regular f. The fundamental solution satisfies the following Gaussian estimate:

Lemma 2.1

([24, Theorem 1.2]) The kernel \({\varGamma }_{t^2}(\cdot , \cdot )\) of \(e^{-t^2L}\) satisfies the Gaussian bounds

$$\begin{aligned} {\varGamma }_{t^2}(x, y) \lesssim \min \left\{ \frac{1}{W(B_t(x))}, \frac{1}{W(B_t(y))} \right\} e^{-c\frac{|x-y|^2}{t^2}} W(y) \end{aligned}$$
(2.2)

and

$$\begin{aligned} \max \left\{ \frac{1}{W(B_t(x))}, \frac{1}{W(B_t(y))} \right\} e^{-\frac{|x-y|^2}{ct^2}} W(y) \lesssim {\varGamma }_{t^2}(x, y), \end{aligned}$$
(2.3)

where the implicit constants and c depend on n and \(\lambda \).

Remark 2.4

The results stated above as Lemmas 1.3 and 2.1 are stated in [24] explicitly for smooth coefficients, but as the author points out, “the usual compactness arguments”, and the uniformity of the estimates depending only on n and \(\lambda \), allow one to deduce the existence of (non-unique) W and \({\varGamma }\) verifying the same bounds, in the general case of bounded measurable coefficients.

Remark 2.5

The doubling property of W, combined with the exponential decay factor, allows us to interchange “min” and “max” in (2.2) and (2.3), modulo an adjustment of the constants.
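The observation in Remark 2.5 can be sketched as follows; the auxiliary symbols r, k and N are introduced only for this illustration. Set \(r := |x-y|/t\), and let k be the smallest non-negative integer with \(2^k \ge 1 + r\). Then \(B_t(x) \subset B_{t+|x-y|}(y) \subset B_{2^k t}(y)\), so that, by the doubling property and the fact that \(2^{k-1} < 1+r\) when \(k \ge 1\),

$$\begin{aligned} W(B_t(x)) \le C_D^{\,k}\, W(B_t(y)) \le C_D\, (1+r)^{N}\, W(B_t(y)), \qquad N := \log _2 C_D. \end{aligned}$$

By symmetry the same bound holds with the roles of x and y reversed, and hence

$$\begin{aligned} \max \left\{ \frac{1}{W(B_t(x))}, \frac{1}{W(B_t(y))} \right\} \le C_D\, (1+r)^{N} \min \left\{ \frac{1}{W(B_t(x))}, \frac{1}{W(B_t(y))} \right\} . \end{aligned}$$

Since \((1+r)^{N} e^{-c r^2} \lesssim e^{-c r^2/2}\), with implicit constant depending only on N and c, the polynomial factor is absorbed by the Gaussian at the cost of replacing c by c/2.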

Remark 2.6

The (absolute values of) the kernels of the operators \(t^2Le^{-t^2L}\) and \(t^4L^2e^{-t^2L}\) also satisfy the upper bound (2.2), by analyticity of the semigroup \(z\mapsto e^{-zL}\) in a sector.

2.2 Weighted Littlewood–Paley Theory

The following results are standard. We recall them here for the sake of convenience of exposition.

Lemma 2.7

Let \(W\in A_2\), and let \(K_t f := k_t * f\), with k defined on \(\mathbb {R}^n\) satisfying \(|k(x)| \le (1+|x|)^{-n-1}\), where \(k_t(x) := t^{-n}k(x/t)\). Then

$$\begin{aligned} \left\| \sup _{t>0} |K_t f|\right\| _{L_W^2} \lesssim \Vert \mathcal {M} f\Vert _{L_W^2} \lesssim \Vert f\Vert _{L_W^2}, \end{aligned}$$

where \(\mathcal {M}\) is the classical Hardy–Littlewood maximal operator, and the implicit constant depends on n and \([W]_{A_2}\).

Lemma 2.8

Let \(W\in A_2\), and let \(Q_s f := \psi _s * f\), where \(\psi \in \mathcal {S}\) and satisfies \(\int _{\mathbb {R}^n} \psi = 0\). Then

$$\begin{aligned} \int _0^\infty \Vert Q_s f\Vert _{L^2_W}^2 \frac{ds}{s} \lesssim \Vert f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\psi \), and \([W]_{A_2}\). Moreover, if in addition \(\psi \) is radial and non-trivial, then using a slight abuse of notation and then normalizing, we may assume that \( \int _0^\infty \hat{\psi }(s)^2 \,\frac{ds}{s} =1\), in which case we have the Calderón reproducing formula

$$ \int _0^\infty Q_s^2 f\, \frac{ds}{s} = f \in L^2_W. $$

Remark 2.9

Regarding the last pair of lemmata:

  • We will often denote by \(Q_t\) the operators satisfying the hypotheses in Lemma 2.8, while we use \(P_t\) for “nice” approximate identities (i.e. \(P_t f := \varphi _t * f\), with \(\varphi \in \mathcal {S}\) radial, and \(\int \varphi = 1\)).

  • It is easy to check that if \(P_t\) is a nice approximate identity, then \(Q_t := t D_i P_t\) satisfies the hypotheses of the first part of Lemma 2.8 (where \(D_i\) denotes the partial derivative in any direction \(x_i\)).

  • We will frequently further assume that the kernel k in Lemma 2.7 (in particular, \(\varphi \) and \(\psi \) as above) satisfies \(\textrm{supp}\,k \subset B_1(0)\); in this case, we shall refer to \(K_t\) (in particular, \(P_t\) or \(Q_t\)) as having a “compactly supported kernel”.

  • We will use repeatedly the fact that \(P_t\) and \(Q_s\) commute with derivatives, for they are convolution operators.
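To make the second item of Remark 2.9 explicit, we record the short computation behind it, with \(\varphi _t(x) = t^{-n}\varphi (x/t)\) as in Lemma 2.7. By the chain rule,

$$\begin{aligned} t D_i \varphi _t (x) = t \cdot t^{-n-1} (D_i \varphi )(x/t) = (D_i \varphi )_t (x), \qquad \text {hence} \qquad t D_i P_t f = (D_i \varphi )_t * f. \end{aligned}$$

Moreover, \(D_i \varphi \in \mathcal {S}\), and \(\int _{\mathbb {R}^n} D_i \varphi \, dx = 0\) because \(\varphi \) decays at infinity. Thus \(\psi := D_i \varphi \) satisfies the hypotheses of the first part of Lemma 2.8.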

The following is an easy consequence of Lemma 2.8, by standard “almost-orthogonality” arguments. We omit the well-known proof.

Lemma 2.10

Let \(\{Q_s\}_{s>0}\) be a family of operators satisfying the conditions in Lemma 2.8, and \(R_t\) be a family of operators, bounded on \(L^2_W(\mathbb {R}^n)\) for each fixed \(t>0\), and satisfying, for some \(\alpha > 0\), the almost-orthogonality condition

$$\begin{aligned} \Vert R_tQ_s\Vert _{L^2_W \rightarrow L^2_W} \le C_1 \min \left\{ \frac{t}{s}, \frac{s}{t} \right\} ^\alpha , \end{aligned}$$

where \(C_1\) is a uniform constant which does not depend on t, s. Then \(R_t\) satisfies the square function estimate

$$\begin{aligned} \int _0^\infty \Vert R_t f\Vert _{L^2_W}^2 \frac{dt}{t} \lesssim \Vert f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(C_1\), \(\alpha \) and \([W]_{A_2}\).
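Although the proof is omitted, the usual argument can be sketched as follows. We assume here that \(\psi \) is radial and normalized so that the Calderón reproducing formula of Lemma 2.8 holds, and that \(R_t\) may be moved inside the integral (which is where the boundedness of \(R_t\) for each fixed t enters). Writing \(m(t,s) := \min \{t/s, s/t\}\), a shorthand used only in this sketch, we have

$$\begin{aligned} \Vert R_t f\Vert _{L^2_W} = \left\| \int _0^\infty R_t Q_s (Q_s f) \frac{ds}{s} \right\| _{L^2_W} \le C_1 \int _0^\infty m(t,s)^\alpha \, \Vert Q_s f\Vert _{L^2_W} \frac{ds}{s}. \end{aligned}$$

Since \(\int _0^\infty m(t,s)^\alpha \frac{ds}{s} = \frac{2}{\alpha }\), the Cauchy–Schwarz inequality yields

$$\begin{aligned} \Vert R_t f\Vert _{L^2_W}^2 \le \frac{2 C_1^2}{\alpha } \int _0^\infty m(t,s)^\alpha \, \Vert Q_s f\Vert _{L^2_W}^2 \frac{ds}{s}. \end{aligned}$$

Integrating in t against \(dt/t\), interchanging the order of integration, and using \(\int _0^\infty m(t,s)^\alpha \frac{dt}{t} = \frac{2}{\alpha }\) together with Lemma 2.8, we arrive at

$$\begin{aligned} \int _0^\infty \Vert R_t f\Vert _{L^2_W}^2 \frac{dt}{t} \le \frac{4 C_1^2}{\alpha ^2} \int _0^\infty \Vert Q_s f\Vert _{L^2_W}^2 \frac{ds}{s} \lesssim \Vert f\Vert _{L^2_W}^2. \end{aligned}$$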

2.3 Uniform Bounds for Some Operators

Let us show that some operators related with the semigroups are uniformly bounded, which we will use throughout the text.

Lemma 2.11

The following operators are \(L^2_W \rightarrow L^2_W\) bounded, uniformly in t, with norm depending on n, \(\lambda \) and the doubling constant for W:

(i)      \(e^{-t^2L}\),

(ii)      \(t^2Le^{-t^2L}\),

(iii)      \(t^4L^2e^{-t^2L}\),

(iv)      \(t\nabla e^{-t^2L}\),

(v)      \(e^{-t^2\widetilde{L}}\),

(vi)      \(t^2\widetilde{L} e^{-t^2\widetilde{L}}\),

(vii)      \(t^4\widetilde{L}^2e^{-t^2\widetilde{L}}\),

(viii)      \(t\nabla e^{-t^2\widetilde{L}}\),

(ix)      \(t e^{-t^2L} \widetilde{\textrm{div}}\),

(x)      \(te^{-t^2\widetilde{L}}\widetilde{\textrm{div}}\),

(xi)      \(t^3 L e^{-t^2L} \widetilde{\textrm{div}}\),

(xii)      \(t^3 \widetilde{L} e^{-t^2\widetilde{L}} \widetilde{\textrm{div}}\).

Remark 2.12

Some of the operators are in fact defined for functions \(\textbf{f} \in (L^2_W)^n\). In this case, we obviously mean that each of their components is bounded.

Proof

(i)–(iii). Recall that \(\mathcal {M}_W\) denotes the Hardy–Littlewood maximal operator with respect to the weighted measure W(x)dx. If we let \(T_t\) denote the operator under consideration in (i), (ii) or (iii), it suffices to observe that in each case we have the pointwise bound

$$ \sup _t |T_t f| \lesssim \mathcal {M}_Wf, $$

using only the Gaussian bounds in Lemma 2.1 (and Remark 2.6 in the case of (ii) and (iii)), and the doubling property of W. We omit the routine details.
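For orientation, the routine details may be sketched as follows, say for (i), taking \(\mathcal {M}_W f(x)\) to dominate the W-averages of |f| over balls centered at x. By (2.2),

$$\begin{aligned} |e^{-t^2L} f(x)| \lesssim \frac{1}{W(B_t(x))} \int _{\mathbb {R}^n} e^{-c\frac{|x-y|^2}{t^2}} |f(y)| W(y) dy. \end{aligned}$$

We split the integral over the ball \(B_t(x)\) and the annuli \(\{2^{k-1}t \le |x-y| < 2^{k}t\}\), \(k \ge 1\). On the k-th annulus the exponential is at most \(e^{-c4^{k-1}}\), and \(W(B_{2^k t}(x)) \le C_D^{\,k}\, W(B_t(x))\) by doubling, so that

$$\begin{aligned} |e^{-t^2L} f(x)| \lesssim \mathcal {M}_W f(x) + \sum _{k \ge 1} e^{-c4^{k-1}} \frac{W(B_{2^k t}(x))}{W(B_t(x))}\, \frac{1}{W(B_{2^k t}(x))} \int _{B_{2^k t}(x)} |f| W dy \le \Big ( 1 + \sum _{k \ge 1} C_D^{\,k} e^{-c4^{k-1}} \Big ) \mathcal {M}_W f(x). \end{aligned}$$

The series converges, and the bound is uniform in t. The \(L^2_W\) bound then follows from the boundedness of \(\mathcal {M}_W\) on \(L^2_W\).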

To prove (iv), we set \(u := e^{-t^2L} f\). Note that by the identity (1.4), we can “interpolate” the estimates in (i) and (ii) using Cauchy–Schwarz as follows:

$$\begin{aligned} \int _{\mathbb {R}^n} |t \nabla u|^2 W dx\lesssim & {} \int _{\mathbb {R}^n} t^2 A \nabla u \cdot \nabla u W dx = \int _{\mathbb {R}^n} t^2 Lu u W dx\\\le & {} \left( \int _{\mathbb {R}^n} |t^2 Lu|^2 W dx \right) ^{1/2} \left( \int _{\mathbb {R}^n} |u|^2 W dx \right) ^{1/2} \lesssim \int _{\mathbb {R}^n} |f|^2 W dx, \end{aligned}$$

which shows (iv).

In turn, (v)–(vii) follow by duality from (i)–(iii), while (viii) follows by “interpolating” (v) and (vi) in the same way as we did for (iv), this time using (1.8) instead of (1.4).

Then, (ix) and (x) follow by duality from (viii), and (iv), respectively.

Lastly, (xi) (resp. (xii)) follows by first dualizing, and then “interpolating” (vi) and (vii) using (1.8) (resp. (ii) and (iii) using (1.4)). \(\square \)
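To illustrate the duality steps, we spell out the case of (ix) as a sketch, for suitably nice \(\textbf{f} \in (L^2_W)^n\) and \(g \in L^2_W\), writing \(\langle \cdot , \cdot \rangle \) for the inner product of \(L^2_W\). Since \(\widetilde{L}\) is the adjoint of L in \(L^2_W\), the adjoint of \(e^{-t^2L}\) is \(e^{-t^2\widetilde{L}}\), and since \(\widetilde{\textrm{div}}\) is adjoint to \(-\nabla \),

$$\begin{aligned} \left\langle t e^{-t^2L}\, \widetilde{\textrm{div}}\, \textbf{f}, g \right\rangle = \left\langle \widetilde{\textrm{div}}\, \textbf{f}, t e^{-t^2\widetilde{L}} g \right\rangle = - \left\langle \textbf{f}, t \nabla e^{-t^2\widetilde{L}} g \right\rangle . \end{aligned}$$

Consequently, by the Cauchy–Schwarz inequality and (viii),

$$\begin{aligned} \left| \left\langle t e^{-t^2L}\, \widetilde{\textrm{div}}\, \textbf{f}, g \right\rangle \right| \le \Vert \textbf{f}\Vert _{L^2_W} \Vert t \nabla e^{-t^2\widetilde{L}} g\Vert _{L^2_W} \lesssim \Vert \textbf{f}\Vert _{L^2_W} \Vert g\Vert _{L^2_W}. \end{aligned}$$

The case of (x) is identical, with the roles of L and \(\widetilde{L}\) interchanged and (iv) in place of (viii).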

2.4 Off-diagonal Estimates

Definition 2.13

We say that the operators \(S_t\) satisfy off-diagonal estimates (aka Gaffney estimates), if there exist \(c, C > 0\) (independent of t) such that it holds for every \(t > 0\), and every pair of measurable sets E and F,

$$\begin{aligned} \int _F |S_t f(x)|^2 W(x) dx \le C e^{-c\frac{\textrm{dist}(E, F)^2}{t^2}} \int _E |f(x)|^2 W(x) dx \qquad \text { if } \textrm{supp}\,f \subset E. \end{aligned}$$

Lemma 2.20

The following operators satisfy off-diagonal estimates, with constants depending only on n, \(\lambda \) and the doubling constant of W:

(i)      \(e^{-t^2L}\),

(ii)      \(t^2Le^{-t^2L}\),

(iii)      \(t^4L^2e^{-t^2L}\),

(iv)      \(t\nabla e^{-t^2L}\),

(v)      \(e^{-t^2\widetilde{L}}\),

(vi)      \(t^2\widetilde{L} e^{-t^2\widetilde{L}}\),

(vii)      \(t^4\widetilde{L}^2e^{-t^2\widetilde{L}}\),

(viii)      \(t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}}\),

(ix)      \(t\nabla e^{-t^2\widetilde{L}}\),

(x)      \(t e^{-t^2L} \widetilde{\textrm{div}}\).

  

Proof

Fix sets \(E, F \subset \mathbb {R}^n\). We may assume that \(d := \textrm{dist}(E, F) > 5t\), as otherwise we may invoke Lemma 2.11.

The bound for (i) is a straightforward consequence of the pointwise bounds in Lemma 2.1. We omit the routine details. The off-diagonal estimates for (ii)–(iii) follow in the same way as for (i), using Remark 2.6.
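For orientation, here is a sketch of the routine argument for (i), with \(d = \textrm{dist}(E, F)\) and f supported in E. If \(x \in F\) and \(y \in E\), then \(|x-y| \ge d\), so that

$$\begin{aligned} e^{-c\frac{|x-y|^2}{t^2}} \le e^{-c\frac{d^2}{2t^2}}\, e^{-c\frac{|x-y|^2}{2t^2}}. \end{aligned}$$

Inserting this into (2.2), and arguing as for the pointwise bound in the proof of Lemma 2.11 (i) (with c/2 in place of c), we find for \(x \in F\)

$$\begin{aligned} |e^{-t^2L} f(x)| \lesssim e^{-c\frac{d^2}{2t^2}}\, \frac{1}{W(B_t(x))} \int _{E} e^{-c\frac{|x-y|^2}{2t^2}} |f(y)| W(y) dy \lesssim e^{-c\frac{d^2}{2t^2}}\, \mathcal {M}_W f(x). \end{aligned}$$

Squaring, integrating over F with respect to W(x)dx, and using the boundedness of \(\mathcal {M}_W\) on \(L^2_W\), we obtain

$$\begin{aligned} \int _F |e^{-t^2L} f|^2 W dx \lesssim e^{-c\frac{d^2}{t^2}} \int _{\mathbb {R}^n} |\mathcal {M}_W f|^2 W dx \lesssim e^{-c\frac{d^2}{t^2}} \int _E |f|^2 W dx. \end{aligned}$$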

Let us now treat (iv). We argue as in the proof of Caccioppoli’s inequality. Let \(u := e^{-t^2L} f\), with f supported in E. Choose \(\psi \in \mathscr {C}^\infty (\mathbb {R}^n)\), where \(0 \le \psi \le 1\) satisfies \(\psi \equiv 1\) on F, \(\textrm{dist}(\textrm{supp}\,\psi , E) \ge d/2\) (denoting \(d := \textrm{dist}(E, F)\)), and \(\Vert \psi \Vert _\infty + d \Vert \nabla \psi \Vert _\infty + d^2 \Vert \nabla ^2 \psi \Vert _\infty \le C\). For future reference, we note that

$$\begin{aligned} |\nabla \psi |^2 + |\nabla ^2\psi | \lesssim d^{-2} \,\textbf{1}_{\textrm{supp}\,\psi }. \end{aligned}$$
(2.15)

Clearly, we have

$$\begin{aligned} \int _F |t\nabla u|^2 W dx \le t^2 \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx. \end{aligned}$$
(2.16)

Compute, using (1.4) and the symmetry of A:

$$\begin{aligned}{} & {} \int _{\mathbb {R}^n} A \nabla (u \psi ) \cdot \nabla (u \psi ) W dx = \int _{\mathbb {R}^n} L(u\psi ) u\psi W = - \sum _{i, j = 1}^n \int _{\mathbb {R}^n} a_{ij} D_i D_j (u \psi ) u \psi W dx\nonumber \\{} & {} = \sum _{i, j = 1}^n\!\!\left( \!\!-\!\!\int _{\mathbb {R}^n}\!\! a_{ij} D_i D_j u u \psi ^2 W dx - 2\!\! \int _{\mathbb {R}^n}\!\! a_{ij} D_i u D_j \psi u \psi W dx -\!\! \int _{\mathbb {R}^n}\!\! a_{ij} D_i D_j \psi u^2 \psi W dx \right) \nonumber \\{} & {} = \int _{\mathbb {R}^n} Lu u \psi ^2 W dx - 2 \int _{\mathbb {R}^n} A \nabla u \cdot \nabla \psi u \psi W dx + \int _{\mathbb {R}^n} L \psi u^2 \psi W dx. \end{aligned}$$
(2.17)

Also,

$$\begin{aligned}{} & {} \int _{\mathbb {R}^n} A \nabla (u \psi ) \cdot \nabla (u \psi ) W dx \ge \lambda \int _{\mathbb {R}^n} |\nabla (u \psi )|^2 W dx\nonumber \\{} & {} = \lambda \left( \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx + 2 \int _{\mathbb {R}^n} \nabla u \cdot \nabla \psi u \psi W dx + \int _{\mathbb {R}^n} |\nabla \psi |^2 u^2 W dx \right) \\{} & {} \ge \lambda \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx + 2 \lambda \int _{\mathbb {R}^n} \nabla u \cdot \nabla \psi u \psi W dx.\nonumber \end{aligned}$$
(2.18)

Combining (2.17) and (2.18), we may dominate the right hand side of (2.16) by

$$\begin{aligned}{} & {} t^2 \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx\nonumber \\{} & {} \lesssim t^2 \int _{\mathbb {R}^n} |\nabla u| |\nabla \psi | |u \psi | W dx + t^2 \int _{\mathbb {R}^n} |Lu u \psi ^2| W + t^2 \int _{\mathbb {R}^n} |L \psi u^2 \psi | W\nonumber \\{} & {} =: I + II + III. \end{aligned}$$
(2.19)
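To see how (2.19) arises, one may rearrange (2.18) and substitute (2.17); the following is only a sketch of the bookkeeping:

$$\begin{aligned} \lambda \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx\le & {} \int _{\mathbb {R}^n} A \nabla (u \psi ) \cdot \nabla (u \psi ) W dx - 2 \lambda \int _{\mathbb {R}^n} \nabla u \cdot \nabla \psi \, u \psi W dx\\= & {} \int _{\mathbb {R}^n} Lu \, u \psi ^2 W dx - 2 \int _{\mathbb {R}^n} A \nabla u \cdot \nabla \psi \, u \psi W dx + \int _{\mathbb {R}^n} L \psi \, u^2 \psi W dx - 2 \lambda \int _{\mathbb {R}^n} \nabla u \cdot \nabla \psi \, u \psi W dx. \end{aligned}$$

Taking absolute values, using that A is bounded (so that \(|A \nabla u \cdot \nabla \psi | \lesssim |\nabla u| |\nabla \psi |\)), and multiplying by \(t^2/\lambda \) then yields (2.19).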

For \(\varepsilon > 0\) to be chosen momentarily, we have

$$ I \lesssim \varepsilon t^2 \int _{\mathbb {R}^n} |\nabla u|^2 \psi ^2 W dx + \varepsilon ^{-1} \left( \frac{t}{d}\right) ^2 \int _{\textrm{supp}\,\psi } u^2 W dx, $$

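where we have used “Cauchy’s inequality with \(\varepsilon \)” in term I, and then (2.15). Choosing \(\varepsilon \) small enough, we may hide the first term, and use the off-diagonal bounds for (i) to obtain the desired bound for the second term, since \(t\le d\). We estimate term II using the Cauchy–Schwarz inequality, along with the off-diagonal bounds for (i) and (ii). Term III is handled like the second term in the bound for I: by (2.15) and the boundedness of A, we have \(|L\psi | \lesssim d^{-2} \,\textbf{1}_{\textrm{supp}\,\psi }\), so the off-diagonal bounds for (i) and the fact that \(t\le d\) again suffice.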
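In more detail, one possible way to carry out the estimate for term I is the following sketch, in which the factor \(1/4\) is simply one admissible constant in Cauchy's inequality. Pointwise, by (2.15),

$$\begin{aligned} t^2 |\nabla u| |\nabla \psi | |u \psi | \le \varepsilon t^2 |\nabla u|^2 \psi ^2 + \frac{t^2}{4\varepsilon } |\nabla \psi |^2 u^2 \lesssim \varepsilon t^2 |\nabla u|^2 \psi ^2 + \varepsilon ^{-1} \left( \frac{t}{d}\right) ^2 u^2 \,\textbf{1}_{\textrm{supp}\,\psi }. \end{aligned}$$

Moreover, since \(\textrm{dist}(\textrm{supp}\,\psi , E) \ge d/2\), the off-diagonal bounds for (i) give

$$\begin{aligned} \int _{\textrm{supp}\,\psi } u^2 W dx \le C e^{-c\frac{d^2}{4t^2}} \int _E |f|^2 W dx, \end{aligned}$$

which is of the required form after relabeling the constant c.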

The operators (v)–(viii) are dual to the first four, and (x) is dual to (ix). It therefore remains to treat (ix). To this end, we now set \(u:= e^{-t^2\widetilde{L}} f\), with f supported in E, so with \(\psi \) as above, again using (2.16), we have

$$\begin{aligned} \int _F |t\nabla u|^2 W dx\le & {} \int _{\mathbb {R}^n} |t\nabla u|^2 \psi ^2 W dx \lesssim t^2 \int _{\mathbb {R}^n} a_{ij} D_j u D_i u \,\psi ^2 W dx\\= & {} t^2 \int _{\mathbb {R}^n} D_j u \,D_i (a_{ij} W u) \,\psi ^2 dx - t^2 \int _{\mathbb {R}^n} u D_j u \,D_i (a_{ij} W) \,\psi ^2 dx =: \widetilde{I} + \widetilde{II}, \end{aligned}$$

where summation over i, j is understood. We first note that

$$\begin{aligned} \widetilde{I}= & {} -t^2 \int _{\mathbb {R}^n} u D_j D_i (a_{ij} W u) \,\psi ^2 dx - 2 t^2 \int _{\mathbb {R}^n} u D_i (a_{ij} W u) \psi D_j\psi dx \\= & {} t^2 \int _{\textrm{supp}\,\psi } u \widetilde{L} u \psi ^2 W dx - 2 t^2 \int _{\mathbb {R}^n} u D_i (a_{ij} W u) \psi D_j\psi dx =: \widetilde{I}_1+ \widetilde{I}_2. \end{aligned}$$

We obtain the desired estimate for \(\widetilde{I}_1\) by the off-diagonal bounds for (v) and (vi) like for term II in (2.19). Also, integrating by parts,

$$\begin{aligned} \widetilde{I}_2= & {} 2 t^2 \int _{\mathbb {R}^n} u D_i u a_{ij} \psi D_j\psi W dx + 2 t^2 \int _{\mathbb {R}^n} u^2 a_{ij} (D_i \psi D_j\psi + \psi D_iD_j \psi ) W dx\\=: & {} \widetilde{I}_{2,1} + \widetilde{I}_{2,2}. \end{aligned}$$

We will cancel term \(\widetilde{I}_{2,1}\) momentarily, but it may also be handled directly, exactly like term I in (2.19), using Cauchy’s inequality with \(\varepsilon \), hiding the small term, and bounding the other term using the off-diagonal bounds for (v). We estimate term \(\widetilde{I}_{2,2}\) like terms I and III in (2.19), using (2.15), the fact that \(t\le d\), and the off-diagonal bounds for (v).

Since W is an adjoint solution,

$$\begin{aligned} \widetilde{II}= & {} t^2 \int u^2 \psi D_j \psi D_i (a_{ij} W) dx \\= & {} -2t^2 \int u D_i u \psi D_j \psi a_{ij} W dx - t^2 \int u^2 (D_i\psi D_j \psi + \psi D_iD_j \psi )\, a_{ij} W dx\\=: & {} \widetilde{II}_1 + \widetilde{II}_2. \end{aligned}$$

Observe that \(\widetilde{II}_1\equiv -\widetilde{I}_{2,1}\), and \(\widetilde{II}_2 \equiv -\frac{1}{2} \widetilde{I}_{2,2} \). \(\square \)
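Collecting terms, the cancellations just observed may be summarized, as a bookkeeping sketch, by

$$\begin{aligned} \widetilde{I} + \widetilde{II} = \widetilde{I}_1 + \widetilde{I}_{2,1} + \widetilde{I}_{2,2} + \widetilde{II}_1 + \widetilde{II}_2 = \widetilde{I}_1 + \frac{1}{2} \widetilde{I}_{2,2}, \end{aligned}$$

so that \(\int _F |t\nabla u|^2 W dx \lesssim |\widetilde{I}_1| + |\widetilde{I}_{2,2}|\), and both terms on the right-hand side were estimated above.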

Lemma 2.20

Let \(K_t\) be a convolution type operator as in Lemma 2.7, with a compactly supported kernel. If \(h=h(x,t)\) is a function such that \(f\mapsto h(\cdot ,t) K_t f\) is bounded on \(L^2_W\), uniformly in t (i.e. \(\Vert h(\cdot ,t) K_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1\) uniformly in t), then \(h(\cdot ,t) K_t\) satisfies off-diagonal estimates.

We omit the trivial proof.

Lemma 2.21

Let \(\{U_t\}_{t>0}\) and \(\{U'_t\}_{t>0}\) be two families of operators, each satisfying off-diagonal estimates. Then the family of compositions \(\{U_t U'_t\}_{t>0}\) also satisfies off-diagonal estimates.

We omit the routine proof.

2.5 Estimates for Differences and Gradients

Lemma 2.22

For \(f \in \textrm{Lip}\) and \(t \le \ell (Q)\), we have

$$\begin{aligned} \int _Q \left| e^{-t^2L}f(x) - f(x)\right| ^2 W(x) dx \lesssim t^2 \Vert \nabla f\Vert _\infty ^2 W(Q) \end{aligned}$$

and

$$\begin{aligned} \int _Q \left| e^{-t^2\widetilde{L}}f(x) - f(x)\right| ^2 W(x) dx \lesssim t^2 \Vert \nabla f\Vert _\infty ^2 W(Q), \end{aligned}$$

where the implicit constants depend on n, \(\lambda \) and \([W]_{A_2}\).

Proof

Let us show the first estimate. Cover Q by non-overlapping cubes \(Q_k\) with sidelength \(t/2 < \ell (Q_k) \le t\). Note that \(e^{-t^2L} \textbf{1} = \textbf{1}\), since \(L\textbf{1}=0\). Letting \([f]_E\) denote the average of f over the set E,

$$\begin{aligned} \left\| e^{-t^2L} f - f\right\| _{L^2_W(Q)}^2 = \sum _k \left\| e^{-t^2L} (f - [f]_{2Q_k}) - (f - [f]_{2Q_k})\right\| _{L^2_W(Q_k)}^2 =: \sum _k A_k^2. \end{aligned}$$
(2.23)

We then have

$$\begin{aligned} A_k\le & {} \left\| e^{-t^2L} \left( (f - [f]_{2Q_k}) \textbf{1}_{2Q_k} \right) \right\| _{L^2_W(Q_k)} + \left\| f - [f]_{2Q_k}\right\| _{L^2_W(Q_k)} \nonumber \\{} & {} + \sum _{j=1}^\infty \left\| e^{-t^2L} \left( (f - [f]_{2Q_k}) \textbf{1}_{2^{j+1}Q_k \setminus 2^jQ_k} \right) \right\| _{L^2_W(Q_k)}\nonumber \\=: & {} I^{(k)} + \sum _{j=1}^\infty II^{(k)}_j. \end{aligned}$$
(2.24)

Using the boundedness of \(e^{-t^2L}\) from Lemma 2.11 and the weighted version of Poincaré’s inequality [29, 15.26], we deduce

$$\begin{aligned} I^{(k)}\lesssim & {} \Vert f - [f]_{2Q_k}\Vert _{L^2_W(2Q_k)} \lesssim \ell (Q_k) \Vert \nabla f\Vert _{L^2_W(2Q_k)} \le t \Vert \nabla f\Vert _{L^2_W(2Q_k)}\\\lesssim & {} t \Vert \nabla f\Vert _\infty \sqrt{W(Q_k)}. \end{aligned}$$

For convenience of notation in the rest of this argument, we replace the constant c by 4c in the off-diagonal estimates for \(e^{-t^2L}\) from Lemma 2.14. Thus,

$$\begin{aligned} II^{(k)}_j\lesssim & {} e^{-c 4^{j+1} \frac{\ell (Q_k)^2}{t^2}} \Vert f - [f]_{2Q_k}\Vert _{L^2_W(2^{j+1}Q_k)}\\\le & {} e^{-c 4^{j}} \left\| f - [f]_{2^{j+1}Q_k}\right\| _{L^2_W(2^{j+1}Q_k)} + \sum _{i=1}^j e^{-c 4^j} \left\| [f]_{2^{i+1}Q_k} - [f]_{2^iQ_k}\right\| _{L^2_W(2^{j+1}Q_k)}\\\lesssim & {} e^{-c 4^{j}} \left\| f - [f]_{2^{j+1}Q_k}\right\| _{L^2_W(2^{j+1}Q_k)} + \sum _{i=1}^j e^{-c 4^j} C_D^{(j-i)/2} \left\| f - [f]_{2^{i+1}Q_k}\right\| _{L^2_W(2^{i+1}Q_k)}. \end{aligned}$$

Hence, summing on j and interchanging the order of summation we obtain

$$\begin{aligned} \sum _{j=1}^\infty \sum _{i=1}^j e^{-c 4^j} C_D^{(j-i)/2} \left\| f - [f]_{2^{i+1}Q_k}\right\| _{L^2_W(2^{i+1}Q_k)} \lesssim \sum _{i=1}^\infty e^{-\frac{c}{2} 4^i} \left\| f - [f]_{2^{i+1}Q_k}\right\| _{L^2_W(2^{i+1}Q_k)}, \end{aligned}$$

and therefore we have, by Poincaré’s inequality,

$$\begin{aligned} \sum _{j=1}^\infty II^{(k)}_j\lesssim & {} \sum _{j=1}^\infty e^{-\frac{c}{2} 4^j} \left\| f - [f]_{2^{j+1}Q_k}\right\| _{L^2_W(2^{j+1}Q_k)} \lesssim \sum _{j=1}^\infty e^{-\frac{c}{2} 4^j} 2^j \ell (Q_k) \Vert \nabla f\Vert _{L^2_W(2^{j+1}Q_k)}\\\lesssim & {} t \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla f\Vert _\infty \sqrt{W(2^{j+1}Q_k)}\\\lesssim & {} t \Vert \nabla f\Vert _\infty \sum _{j=1}^\infty e^{-\frac{c}{5} 4^j} \sqrt{W(Q_k)} \lesssim t \Vert \nabla f\Vert _\infty \sqrt{W(Q_k)}. \end{aligned}$$
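The interchange of summation used above rests on the following elementary bound, recorded here as a sketch; the only inputs are \(i \le j\) and the super-exponential decay of \(e^{-c4^j}\). For \(1 \le i \le j\), since \(4^j \ge 4^i\) and \(4^j \ge 4^{j-i}\),

$$\begin{aligned} e^{-c 4^j} C_D^{(j-i)/2} = e^{-\frac{c}{2} 4^j} e^{-\frac{c}{2} 4^j} C_D^{(j-i)/2} \le e^{-\frac{c}{2} 4^i} e^{-\frac{c}{2} 4^{j-i}} C_D^{(j-i)/2}, \end{aligned}$$

and hence, for each fixed i,

$$\begin{aligned} \sum _{j=i}^\infty e^{-c 4^j} C_D^{(j-i)/2} \le e^{-\frac{c}{2} 4^i} \sum _{m=0}^\infty e^{-\frac{c}{2} 4^m} C_D^{m/2} \lesssim e^{-\frac{c}{2} 4^i}, \end{aligned}$$

with an implicit constant depending only on c and \(C_D\).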

Thus, going back to \(A_k\) (see (2.23)–(2.24)) we obtain

$$\begin{aligned} A_k \lesssim t \Vert \nabla f\Vert _\infty \sqrt{W(Q_k)}, \end{aligned}$$

and therefore, since the cubes \(Q_k\) are non-overlapping,

$$\begin{aligned} \left\| e^{-t^2L} f - f\right\| _{L^2_W(Q)}^2 \lesssim t^2 \Vert \nabla f\Vert _\infty ^2 \sum _k W(Q_k) = t^2 \Vert \nabla f\Vert _\infty ^2 W(Q). \end{aligned}$$

The corresponding estimate for \(e^{-t^2\widetilde{L}}\) holds with exactly the same proof, for the operator is also bounded and satisfies off-diagonal estimates (see Lemmas 2.11 and 2.14), and \(e^{-t^2\widetilde{L}} \textbf{1} = \textbf{1}\) since \(\widetilde{L}\textbf{1} = \frac{1}{W} L^*W = 0\). \(\square \)

Lemma 2.25

For \(f \in \textrm{Lip}\) and \(t \le \ell (Q)\), we have

$$\begin{aligned} \int _Q \left| \nabla e^{-t^2L}f(x)\right| ^2 W(x) dx \lesssim \Vert \nabla f\Vert _\infty ^2 W(Q) \end{aligned}$$

and

$$\begin{aligned} \int _Q \left| \nabla e^{-t^2\widetilde{L}}f(x)\right| ^2 W(x) dx \lesssim \Vert \nabla f\Vert _\infty ^2 W(Q), \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

The reader can check that the proof is the same as for Lemma 2.22, this time using the operators \(t\nabla e^{-t^2L}\) and \(t\nabla e^{-t^2\widetilde{L}}\). Indeed, the proof of the preceding lemma used only the boundedness of the operator (Lemma 2.11), the off-diagonal estimates (Lemma 2.14), and the conservation property which allows us to subtract constants. In this case,

$$\begin{aligned} \left\| \nabla e^{-t^2L} f\right\| _{L^2_W(Q)}^2 = \sum _k \left\| \nabla e^{-t^2L} (f - [f]_{2Q_k})\right\| _{L^2_W(Q_k)}^2, \end{aligned}$$

and similarly for \(\widetilde{L}\), because \(e^{-t^2L} \textbf{1} = \textbf{1} =e^{-t^2\widetilde{L}}\textbf{1}\), as before. After this, the proof is the same as that of Lemma 2.22. \(\square \)

3 The Kato Problem for L

In this section our goal is to prove the following result:

Theorem 3.1

It holds

$$\begin{aligned} \left\| \sqrt{L} f\right\| _{L^2_W} \lesssim \Vert \nabla f\Vert _{L^2_W}, \end{aligned}$$

where the hidden constant depends on n, \(\lambda \) and \([W]_{A_2}\).

As noted above (see Remark 1.13), in the case that the coefficient matrix has small enough BMO norm, we could deduce Theorem 3.1 as an easy consequence of certain known results. Instead, following the easier part of our proof of Theorem 4.1 below, we shall give a self-contained, direct argument, which does not rely on an explicit assumption of smallness in BMO, but only on the validity of (1.4), and the assumption that \(W\in A_2\).

To prove the theorem, let us use the representation of the square root operator via the Functional Calculus formula

$$\begin{aligned} \sqrt{L} f = a \int _0^\infty t^3L^2e^{-2t^2L}f \frac{dt}{t}, \end{aligned}$$

where \(a = (\int _0^\infty t^3e^{-2t^2} \frac{dt}{t})^{-1} = \sqrt{\frac{128}{\pi }}\) is just a normalizing constant, to estimate, using duality and later Cauchy–Schwarz,

$$\begin{aligned} \left| \langle \sqrt{L}f,g\rangle _{L^2_W}\right| ^2 \le a^2 \left( \int _0^\infty \left\| tLe^{-t^2L}f(x)\right\| ^2_{L^2_W} \frac{dt}{t} \right) \left( \int _0^\infty \left\| t^2\widetilde{L} e^{-t^2\widetilde{L}}g(x)\right\| ^2_{L^2_W} \frac{dt}{t} \right) . \end{aligned}$$
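The value of the normalizing constant and the splitting behind the last display can be checked as follows; this is a sketch, in which r is an auxiliary integration variable. Substituting \(r = 2t^2\),

$$\begin{aligned} \int _0^\infty t^3 e^{-2t^2} \frac{dt}{t} = \int _0^\infty t^2 e^{-2t^2} dt = \frac{1}{4\sqrt{2}} \int _0^\infty r^{1/2} e^{-r} dr = \frac{\Gamma (3/2)}{4\sqrt{2}} = \frac{\sqrt{\pi }}{8\sqrt{2}} = \sqrt{\frac{\pi }{128}}, \end{aligned}$$

which gives \(a = \sqrt{128/\pi }\). Moreover, writing \(t^3L^2e^{-2t^2L} = (t^2Le^{-t^2L})(tLe^{-t^2L})\) and using that the adjoint of \(t^2Le^{-t^2L}\) in \(L^2_W\) is \(t^2\widetilde{L} e^{-t^2\widetilde{L}}\), we have

$$\begin{aligned} \langle \sqrt{L}f,g\rangle _{L^2_W} = a \int _0^\infty \left\langle tLe^{-t^2L}f, \, t^2\widetilde{L} e^{-t^2\widetilde{L}}g \right\rangle _{L^2_W} \frac{dt}{t}, \end{aligned}$$

and the Cauchy–Schwarz inequality, applied first in x and then in t, gives the stated bound.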

With this decomposition, we will finish the proof of Theorem 3.1 by duality once we prove the following Lemmas 3.2 and 3.5.

The desired bound for the second factor is the following.

Lemma 3.2

It holds

$$\begin{aligned} \int _0^\infty \int _{\mathbb {R}^n} \left| t^2 \widetilde{L} e^{-t^2\widetilde{L}}g(x)\right| ^2 W(x) dx \frac{dt}{t} \lesssim \Vert g\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends only on n, \(\lambda \) and the doubling constant for W.

We could obtain the conclusion of the lemma by invoking the abstract McIntosh and Yagi theorem [41, 45], but instead, we will give a self-contained and more elementary proof, using quasi-orthogonality arguments.

Proof

We abbreviate \(V_t := t^2 L e^{-t^2L}\). Its adjoint within \(L^2_W\) is \(\widetilde{V}_t := t^2 \widetilde{L} e^{-t^2\widetilde{L}}\). To make the argument rigorous, given a small positive \(\varepsilon \), we set \(V_t \equiv 0 \equiv \widetilde{V}_t\) whenever \(t\le \varepsilon \) or \(t\ge 1/\varepsilon \), and we obtain quantitative bounds that are uniform in \(\varepsilon \). We compute, using duality, Fubini and Cauchy–Schwarz,

$$\begin{aligned}{} & {} \int _0^\infty \int _{\mathbb {R}^n} \left| \widetilde{V}_t g(x)\right| ^2 W(x) dx \frac{dt}{t} = \int _0^\infty \int _{\mathbb {R}^n} \widetilde{V}_t g(x) \widetilde{V}_t g(x) W(x) dx \frac{dt}{t}\nonumber \\{} & {} = \int _0^\infty \int _{\mathbb {R}^n} g(x) V_t \widetilde{V}_t g(x) W(x) dx \frac{dt}{t} = \int _{\mathbb {R}^n} \int _0^\infty V_t \widetilde{V}_t g(x) \frac{dt}{t} g(x) W(x) dx \nonumber \\{} & {} \le \Vert g\Vert _{L^2_W} \left\| \int _0^\infty V_t \widetilde{V}_t g \frac{dt}{t}\right\| _{L^2_W}. \end{aligned}$$
(3.3)

To deal with the second term, let us first establish a useful fact:

Claim 3.4

It holds, for any \(t, s > 0\),

$$\begin{aligned} \Vert \widetilde{V}_t V_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} . \end{aligned}$$

Proof of the claim We may assume that \(\varepsilon< s,t <1/\varepsilon \). If \(s \le t\), we can compute using (1.15) and the uniform bounds from Lemma 2.11,

$$\begin{aligned} \Vert \widetilde{V}_t V_s\Vert= & {} \left\| t^2 s^2 \widetilde{L} e^{-t^2\widetilde{L}} L e^{-s^2L}\right\| \lesssim \left\| t^2 s^2 \widetilde{L} e^{-t^2\widetilde{L}} \widetilde{L} e^{-s^2L}\right\| + \left\| t^2 s^2 \widetilde{L} e^{-t^2\widetilde{L}} \textrm{div}\,A \nabla e^{-s^2L}\right\| \\\le & {} \frac{s^2}{t^2} \left\| t^4 \widetilde{L}^2 e^{-t^2\widetilde{L}}\right\| \left\| e^{-s^2L}\right\| + \frac{s}{t} \left\| t^3 \widetilde{L} e^{-t^2\widetilde{L}} \textrm{div}\right\| \Vert A\Vert \left\| s \nabla e^{-s^2L}\right\| \lesssim \frac{s^2}{t^2} + \frac{s}{t} \lesssim \frac{s}{t}. \end{aligned}$$

In a similar fashion we can compute, when \(s > t\), using again (1.15) and Lemma 2.11,

$$\begin{aligned} \Vert \widetilde{V}_t V_s\Vert= & {} \left\| t^2 s^2 e^{-t^2\widetilde{L}} \widetilde{L} L e^{-s^2L}\right\| \lesssim \left\| t^2 s^2 e^{-t^2\widetilde{L}} L L e^{-s^2L}\right\| + \left\| t^2 s^2 e^{-t^2\widetilde{L}} \textrm{div}\,A \nabla L e^{-s^2L}\right\| \\\le & {} \frac{t^2}{s^2} \left\| e^{-t^2\widetilde{L}}\right\| \left\| s^4 L^2 e^{-s^2L}\right\| + \frac{t}{s} \left\| t e^{-t^2\widetilde{L}} \textrm{div}\right\| \Vert A\Vert \left\| s^3 \nabla L e^{-s^2L}\right\| \lesssim \frac{t^2}{s^2} + \frac{t}{s}\lesssim \frac{t}{s}. \end{aligned}$$

\(\square \)

With this almost-orthogonality result, we can estimate the last term in (3.3) as follows:

$$\begin{aligned} \left\| \int _0^\infty V_t \widetilde{V}_t g \frac{dt}{t}\right\| _{L^2_W}^2= & {} \int _{\mathbb {R}^n} \left( \int _0^\infty V_t \widetilde{V}_t g(x) \frac{dt}{t}\right) \left( \int _0^\infty V_s \widetilde{V}_s g(x) \frac{ds}{s}\right) W(x) dx\\= & {} \int _0^\infty \int _0^\infty \int _{\mathbb {R}^n} V_t \widetilde{V}_t g(x) V_s \widetilde{V}_s g(x) W(x) dx \frac{dt}{t} \frac{ds}{s}\\= & {} \int _0^\infty \int _0^\infty \int _{\mathbb {R}^n} \widetilde{V}_t g(x) \widetilde{V}_t V_s \widetilde{V}_s g(x) W(x) dx \frac{dt}{t} \frac{ds}{s}\\\le & {} \int _0^\infty \int _0^\infty \left( \int _{\mathbb {R}^n} |\widetilde{V}_t g|^2 W dx \right) ^{1/2} \left( \int _{\mathbb {R}^n} |\widetilde{V}_t V_s \widetilde{V}_s g|^2 Wdx \right) ^{1/2} \frac{dt}{t} \frac{ds}{s}\\\le & {} \int _0^\infty \!\!\! \int _0^\infty \!\!\min \left\{ \frac{s}{t}, \frac{t}{s} \right\} \!\! \left( \int _{\mathbb {R}^n} |\widetilde{V}_t g|^2 W dx \right) ^{1/2}\!\!\! \left( \int _{\mathbb {R}^n} |\widetilde{V}_s g|^2 Wdx \right) ^{1/2} \frac{dt}{t} \frac{ds}{s}\\\lesssim & {} \int _0^\infty \int _0^\infty \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} \int _{\mathbb {R}^n} |\widetilde{V}_t g(x)|^2 W(x) dx \frac{dt}{t} \frac{ds}{s}\\\lesssim & {} \int _0^\infty \int _{\mathbb {R}^n} |\widetilde{V}_t g(x)|^2 W(x) dx \frac{dt}{t}. \end{aligned}$$
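The last two steps in the preceding chain may be unpacked as follows; this is a sketch, with the shorthand \(F(t) := \Vert \widetilde{V}_t g\Vert _{L^2_W}\) introduced only for this remark. Since \(F(t)F(s) \le \frac{1}{2}(F(t)^2 + F(s)^2)\) and the kernel \(\min \{s/t, t/s\}\) is symmetric in s and t,

$$\begin{aligned} \int _0^\infty \int _0^\infty \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} F(t) F(s) \frac{dt}{t} \frac{ds}{s} \le \int _0^\infty \int _0^\infty \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} F(t)^2 \frac{dt}{t} \frac{ds}{s} = 2 \int _0^\infty F(t)^2 \frac{dt}{t}, \end{aligned}$$

because, for each fixed \(t > 0\),

$$\begin{aligned} \int _0^\infty \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} \frac{ds}{s} = \int _0^t \frac{s}{t} \frac{ds}{s} + \int _t^\infty \frac{t}{s} \frac{ds}{s} = 1 + 1 = 2. \end{aligned}$$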

Plugging this estimate into (3.3), we have

$$\begin{aligned} \int _0^\infty \int _{\mathbb {R}^n} |\widetilde{V}_t g(x)|^2 W(x) dx \frac{dt}{t} \lesssim \Vert g\Vert _{L^2_W} \left( \int _0^\infty \int _{\mathbb {R}^n} |\widetilde{V}_t g(x)|^2 W(x) dx \frac{dt}{t} \right) ^{1/2}, \end{aligned}$$

from which the result readily follows (recall that we have effectively truncated so \(\varepsilon< t <1/\varepsilon \), hence the integrals are finite).\(\square \)
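Explicitly, writing \(X := \int _0^\infty \int _{\mathbb {R}^n} |\widetilde{V}_t g(x)|^2 W(x) dx \frac{dt}{t}\) (a shorthand used only in this remark), the last display reads \(X \le C \Vert g\Vert _{L^2_W} X^{1/2}\), with C independent of \(\varepsilon \). Since \(X < \infty \), we may divide by \(X^{1/2}\) (the case \(X = 0\) being trivial) and square, to obtain

$$\begin{aligned} X \le C^2 \Vert g\Vert _{L^2_W}^2, \end{aligned}$$

and letting \(\varepsilon \rightarrow 0\), by monotone convergence, removes the truncation.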

Let us turn our attention to the other square function estimate.

Lemma 3.5

It holds

$$\begin{aligned} \int _0^\infty \int _{\mathbb {R}^n} \left| tLe^{-t^2L}f(x)\right| ^2 W(x) dx \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

We will devote the rest of the section to the proof of Lemma 3.5. As above, set \(V_t := t^2Le^{-t^2L}\) and decompose, with the help of an approximate identity \(P_t\),

$$\begin{aligned} tLe^{-t^2L} = t^{-1}V_t(I-P_t) + t^{-1}V_tP_t =: R_t + T_t. \end{aligned}$$
(3.6)

The proof of Lemma 3.5, and hence of Theorem 3.1, will come immediately from the next two lemmas.

Lemma 3.7

With the notations of (3.6), we have

$$\begin{aligned} \int _0^\infty \Vert R_t f\Vert _{L^2_W}^2 \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

Choose the approximate identity \(P_t := e^{-t^2(-\varDelta )}\). With it, we can compute, using the Fundamental Theorem of Calculus,

$$\begin{aligned} R_t f= & {} t^{-1}V_t (I-P_t) f = V_t \left( \frac{1}{t} (I-P_t) f \right) = V_t \left( - \frac{1}{t} \int _0^t \frac{\partial }{\partial s} P_s f ds \right) \\= & {} -\!2V_t \left( \frac{1}{t} \int _0^t s P_s \varDelta f ds \right) \!=\! -\!2V_t \left( \frac{1}{t} \int _0^t s P_s \; \text {div}\; \nabla f ds \right) \!=: \! 2V_t \left( \frac{1}{t} \int _0^t \! \textbf{Q}_s \nabla f ds \right) \! . \end{aligned}$$

Now, using the boundedness on \(L^2_W\) of \(V_t = t^2Le^{-t^2L}\) (see Lemma 2.11), Hardy’s inequality and the fact that \(\textbf{Q}_s\) satisfies the square function estimate of Lemma 2.8 (see Remark 2.9), we obtain the desired estimate:

$$\begin{aligned} \int _0^\infty \Vert R_t f\Vert _{L^2_W}^2 \frac{dt}{t}= & {} \int _0^\infty \int _{\mathbb {R}^n} \left| 2V_t \left( \frac{1}{t} \int _0^t \textbf{Q}_s \nabla f ds \right) (x)\right| ^2 W(x) dx \frac{dt}{t}\\\lesssim & {} \int _0^\infty \int _{\mathbb {R}^n} \left| \frac{1}{t} \int _0^t \textbf{Q}_s \nabla f(x) ds\right| ^2 W(x) dx \frac{dt}{t}\\\lesssim & {} \int _0^\infty \int _{\mathbb {R}^n} |\textbf{Q}_t \nabla f(x)|^2 W(x) dx \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2. \end{aligned}$$

\(\square \)
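For completeness, the form of Hardy's inequality used in the preceding proof can be verified directly; the following is a sketch, stated for a generic function F of \(s > 0\), and applied above with \(F(s) = \textbf{Q}_s \nabla f(x)\) for each fixed x, before integrating against \(W(x) dx\). By the Cauchy–Schwarz inequality, \(\left| \frac{1}{t} \int _0^t F(s) ds\right| ^2 \le \frac{1}{t} \int _0^t |F(s)|^2 ds\), so that, by Fubini's theorem,

$$\begin{aligned} \int _0^\infty \left| \frac{1}{t} \int _0^t F(s) ds\right| ^2 \frac{dt}{t} \le \int _0^\infty \int _0^t |F(s)|^2 ds \frac{dt}{t^2} = \int _0^\infty |F(s)|^2 \left( \int _s^\infty \frac{dt}{t^2} \right) ds = \int _0^\infty |F(s)|^2 \frac{ds}{s}. \end{aligned}$$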

Lemma 3.8

With the notations of (3.6), we have

$$\begin{aligned} \int _0^\infty \Vert T_t f\Vert _{L^2_W}^2 \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$

where the hidden constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

Simply compute, using the boundedness of \(e^{-t^2L}\) from Lemma 2.11, and the square function bounds of Lemma 2.8 (see Remark 2.9),

$$\begin{aligned} \int _0^\infty \Vert T_t f\Vert _{L^2_W}^2 \frac{dt}{t}\lesssim & {} \int _0^\infty \Vert t L P_t f\Vert _{L^2_W}^2 \frac{dt}{t} \lesssim \sum _{i, j = 1}^n \int _0^\infty \Vert t D_i D_j P_t f\Vert _{L^2_W}^2 \frac{dt}{t}\\= & {} \sum _{i, j = 1}^n \int _0^\infty \Vert t D_i P_t D_j f\Vert _{L^2_W}^2 \frac{dt}{t} =: \sum _{i, j = 1}^n \int _0^\infty \left\| Q_t^{(i)} D_j f\right\| _{L^2_W}^2 \frac{dt}{t} \\\lesssim & {} \sum _{j = 1}^n \Vert D_j f\Vert _{L^2_W}^2 = \Vert \nabla f\Vert _{L^2_W}^2. \end{aligned}$$

\(\square \)

4 The Kato Problem for \(\widetilde{L}\)

In this section our goal is to prove the following result, which is really the main result in this paper:

Theorem 4.1

It holds

$$\begin{aligned} \left\| \sqrt{\widetilde{L}} f\right\| _{L^2_W} \lesssim \Vert \nabla f\Vert _{L^2_W}, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

4.1 Reduction to a Quadratic Estimate

To prove Theorem 4.1, let us again use the representation of the square root operator via the formula

$$\begin{aligned} \sqrt{\widetilde{L}} f = a \int _0^\infty t^3\widetilde{L}^2e^{-2t^2\widetilde{L}}f \frac{dt}{t}, \end{aligned}$$

so that

$$\begin{aligned} \left| \langle \sqrt{\widetilde{L}}f,g\rangle _{L^2_W}\right| ^2 \le a^2 \left( \int _0^\infty \left\| t\widetilde{L} e^{-t^2\widetilde{L}}f(x)\right\| ^2_{L^2_W} \frac{dt}{t} \right) \left( \int _0^\infty \left\| t^2L e^{-t^2L}g(x)\right\| ^2_{L^2_W} \frac{dt}{t} \right) . \end{aligned}$$

Theorem 4.1 then follows immediately from Lemma 4.2 and Theorem 4.3 below.

We estimate the second square function via the following lemma.

Lemma 4.2

It holds

$$\begin{aligned} \int _0^\infty \int _{\mathbb {R}^n} \left| t^2 L e^{-t^2L}g(x)\right| ^2 W(x) dx \frac{dt}{t} \lesssim \Vert g\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_\infty }\).

Proof

The lemma can be proved either by invoking the McIntosh and Yagi theorem [41, 45], or via a self-contained elementary proof using quasi-orthogonality. For the latter path, the proof follows that of Lemma 3.2 mutatis mutandis, simply reversing the roles of \(V_t\) and \(\widetilde{V}_t\). We omit the details. \(\square \)

Let us now turn our attention to the other square function estimate, which is in fact the core of this paper.

Theorem 4.3

It holds

$$\begin{aligned} \int _0^\infty \int _{\mathbb {R}^n} \left| t\widetilde{L} e^{-t^2\widetilde{L}}f(x)\right| ^2 W(x) dx \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

The rest of this section is devoted to the proof of Theorem 4.3. We start by splitting

$$\begin{aligned} t\widetilde{L} e^{-t^2\widetilde{L}} = t\widetilde{L} e^{-t^2\widetilde{L}}(I-P_t^2) + t\widetilde{L} e^{-t^2\widetilde{L}}P_t^2 =: \widetilde{R}_t + \widetilde{T}_t, \end{aligned}$$
(4.4)

where \(P_t\) is a nice approximate identity with a smooth compactly supported convolution kernel \(\varphi _t(x)= t^{-n} \varphi (x/t)\), which we take to be even. For future reference, let us record the following well-known observation:

$$ t \partial _t \big (\widehat{\varphi }(t\xi )\big )^2 = 2 (\nabla \widehat{\varphi })(t\xi ) \cdot t\xi \widehat{\varphi }(t\xi ) =c \widehat{(x\varphi (x))}(t\xi ) \cdot \widehat{(\nabla \varphi )}(t\xi ) =: c \widehat{\psi ^{(1)}}(t\xi ) \widehat{\psi ^{(2)}}(t\xi ), $$

where c is a harmless constant, and \(\psi ^{(1)} (x):= x\varphi (x)\) and \(\psi ^{(2)}:= \nabla \varphi \) are both \(\mathscr {C}_c^\infty \) functions with mean value zero (here we are using that \(\varphi \) is even, in the case of \(\psi ^{(1)}\)). Hence,

$$\begin{aligned} t \partial _t P_t^2 = c Q_t^{(1)} Q_t^{(2)}, \end{aligned}$$

where \(Q_t^{(k)}\) is the convolution operator with kernel \(\psi _t^{(k)}(x):=t^{-n}\psi ^{(k)}(x/t)\), \(k=1,2\), and therefore each of \(Q_t^{(1)}\), \(Q_t^{(2)}\) satisfies the square function bound of Lemma 2.8 (and each is bounded on \(L^2_W\), uniformly in t).

Lemma 4.5
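To make the harmless constant concrete, here is a sketch of the computation under one particular normalization of the Fourier transform, namely \(\widehat{\varphi }(\xi ) = \int _{\mathbb {R}^n} \varphi (x) e^{-i x \cdot \xi } dx\); this convention is an assumption made only for this remark, and other normalizations change the value of c but not the structure. With it, \(\nabla \widehat{\varphi }(\xi ) = -i \,\widehat{(x\varphi )}(\xi )\) and \(\xi \,\widehat{\varphi }(\xi ) = -i \,\widehat{(\nabla \varphi )}(\xi )\), so that

$$\begin{aligned} t \partial _t \big (\widehat{\varphi }(t\xi )\big )^2 = 2 (\nabla \widehat{\varphi })(t\xi ) \cdot t\xi \, \widehat{\varphi }(t\xi ) = 2 (-i)^2 \,\widehat{(x\varphi )}(t\xi ) \cdot \widehat{(\nabla \varphi )}(t\xi ) = -2 \,\widehat{\psi ^{(1)}}(t\xi ) \cdot \widehat{\psi ^{(2)}}(t\xi ), \end{aligned}$$

that is, \(c = -2\) in this normalization. The mean value zero property amounts to \(\widehat{\psi ^{(1)}}(0) = \int x \varphi (x) dx = 0\), since \(\varphi \) is even, and \(\widehat{\psi ^{(2)}}(0) = \int \nabla \varphi (x) dx = 0\), since \(\varphi \) has compact support.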

With the notations of (4.4), we have

$$\begin{aligned} \int _0^\infty \Vert \widetilde{R}_t f\Vert _{L^2_W}^2 \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$
(4.6)

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

Using the preceding observations, we may follow the proof of Lemma 3.7, invoking Lemma 2.11 for the \(L^2_W\) boundedness of \(t^2 \widetilde{L} e^{-t^2\widetilde{L}}\), to obtain (4.6). \(\square \)

Applying now (1.15) to \(u = P_t^2 f\) we obtain

$$\begin{aligned} \widetilde{T}_t f = t e^{-t^2\widetilde{L}} \widetilde{L} P_t^2 f = - t e^{-t^2\widetilde{L}} L P_t^2 f - 2 t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}} (A \nabla (P_t^2 f)). \end{aligned}$$
(4.7)

Lemma 4.8

We have

$$\begin{aligned} \int _0^\infty \left\| t e^{-t^2\widetilde{L}} L P_t^2 f\right\| ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

The proof is the same as that in Lemma 3.8 once we use that \(e^{-t^2\widetilde{L}} : L^2_W \rightarrow L^2_W\) is uniformly bounded by Lemma 2.11, and that \(P_t\) are uniformly bounded on \(L^2_W\) by Lemma 2.7. \(\square \)

Therefore, to finish the proof of Theorem 4.3 (and hence of Theorem 4.1), it remains to show

$$\begin{aligned} \int _0^\infty \left\| t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}} (A \nabla (P_t^2 f))\right\| ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2. \end{aligned}$$
(4.9)

4.2 Reduction to a Carleson Measure Estimate

For \(\textbf{g} = (g_1,g_2,\dots ,g_n)\), write

$$\begin{aligned} \theta _t \textbf{g} := t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}} (A \textbf{g})\left( = t e^{-t^2\widetilde{L}} \frac{1}{W} \textrm{div}(W A \textbf{g})\right) . \end{aligned}$$
(4.10)

With this notation, the remaining estimate (4.9) becomes

$$\begin{aligned} \int _0^\infty \Vert \theta _t \nabla (P_t^2 f)\Vert ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2. \end{aligned}$$
(4.11)

Let us also define the operator

$$\begin{aligned} \widetilde{\theta }_t \textbf{g} := t e^{-t^2\widetilde{L}} \left( \widetilde{\textrm{div}} (A \textbf{g}) - \frac{1}{2} \sum _{i, j = 1}^n a_{ij} D_i \textbf{g}_j \right) , \end{aligned}$$

so that, taking \(\textbf{g} = \nabla u\), and using (1.15),

$$\begin{aligned} \widetilde{\theta }_t \nabla u = -\frac{1}{2} te^{-t^2\widetilde{L}} \widetilde{L} u. \end{aligned}$$
(4.12)
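The identity (4.12) can be checked in one line. The following sketch reads (1.15) in the form in which it is applied in (4.7), namely \(\widetilde{L} u = -Lu - 2\, \widetilde{\textrm{div}} (A \nabla u)\), and recalls that \(Lu = -\sum _{i,j} a_{ij} D_i D_j u\), as in (2.17). Then

$$\begin{aligned} \widetilde{\textrm{div}} (A \nabla u) - \frac{1}{2} \sum _{i, j = 1}^n a_{ij} D_i D_j u = \widetilde{\textrm{div}} (A \nabla u) + \frac{1}{2} Lu = -\frac{1}{2} \left( \widetilde{L} u + Lu \right) + \frac{1}{2} Lu = -\frac{1}{2} \widetilde{L} u, \end{aligned}$$

and applying \(t e^{-t^2\widetilde{L}}\) to both sides gives (4.12).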

It will be convenient to use both operators at different stages of the proof. Note that trivially, \(\widetilde{\theta }_t \textbf{e} = \theta _t \textbf{e}\), for any constant vector \(\textbf{e}\). In particular, if \(\mathbbm {1}\) denotes the \(n\times n\) identity matrix, then

$$\begin{aligned} \widetilde{\theta }_t \mathbbm {1} = \theta _t \mathbbm {1}, \end{aligned}$$
(4.13)

where we naturally define \(\widetilde{\theta }_t \mathbbm {1}= \theta _t \mathbbm {1}\) as a vector-valued function whose \(k^{th}\) entry is \(\widetilde{\theta }_t \textbf{e}^k = \theta _t \textbf{e}^k\), with \(\textbf{e}^k \) equal to the standard unit basis vector in the \(x_k\) direction.

To prove (4.9), as in the divergence form case treated in [8], we begin with a “T1” reduction.

Lemma 4.14

We have

$$\begin{aligned} \int _0^\infty \Vert \theta _t P_t^2 \textbf{g} - (\theta _t \mathbbm {1}) \cdot (P_t^2 \textbf{g})\Vert ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \textbf{g}\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

Write \(U_t \textbf{g} := \theta _t P_t^2 \textbf{g} - (\theta _t \mathbbm {1}) \cdot (P_t^2 \textbf{g})\). By Lemma 2.10, it suffices to show that

$$\begin{aligned} \Vert U_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1 \end{aligned}$$

uniformly in t, and that, for some \(\alpha > 0\) and for any nice operator \(Q_s\) as in Lemma 2.8 with a compactly supported kernel,

$$\begin{aligned} \Vert U_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \min \left\{ \frac{s}{t}, \frac{t}{s} \right\} ^\alpha . \end{aligned}$$

These two estimates, and hence the conclusion of Lemma 4.14, will follow at once from the next claims and Lemma 2.7.

Claim 4.15

We have, uniformly in t,

$$\begin{aligned} \Vert \theta _t P_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof of the claim Just compute, with the aid of Lemmas 2.7 and 2.11,

$$\begin{aligned} \Vert \theta _t P_t\Vert _{L^2_W \rightarrow L^2_W} \le \left\| t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}}\right\| _{L^2_W \rightarrow L^2_W} \Vert A\Vert _\infty \Vert P_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1. \end{aligned}$$

\(\square \)

Claim 4.16

We have, uniformly in t,

$$\begin{aligned} \Vert (\theta _t \mathbbm {1}) \cdot P_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof of the claim This proof will follow that of [18, (4.10)]. Let us cover \(\mathbb {R}^n\) by non-overlapping cubes \(Q_k\) satisfying \(t/2 < \ell (Q_k) \le t\). In this way, we obtain

$$\begin{aligned} \int _{\mathbb {R}^n} |(\theta _t \mathbbm {1}) (x) \cdot P_t \textbf{g}(x)|^2 W(x) dx = \sum _k \int _{Q_k} |(\theta _t \mathbbm {1}) (x) \cdot P_t \textbf{g}(x)|^2 W(x) dx. \end{aligned}$$
(4.17)

We first establish an \(L^\infty \) bound for \(P_t \textbf{g}(x)\), in the cube \(Q_k\). Note that for \(x \in Q_k\) we have that \(P_t\textbf{g}(x) = P_t(\textbf{g} \textbf{1}_{3Q_k})(x)\) because \(t \le 2 \ell (Q_k)\) (and \(\textrm{supp}\,\varphi \subset B(0, 1)\)).

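$$\begin{aligned} |P_t \textbf{g}(x)| \lesssim t^{-n} \int _{3Q_k} |\textbf{g}(y)| dy \le t^{-n} \left( \int _{3Q_k} |\textbf{g}|^2 W dy \right) ^{1/2} \left( \int _{3Q_k} W^{-1} dy \right) ^{1/2} \lesssim \left( \frac{1}{W(Q_k)} \int _{3Q_k} |\textbf{g}|^2 W dy \right) ^{1/2}, \end{aligned}$$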

where we used the definition of an \(A_2\) weight in the last step (see Definition 1.2).

We next claim that

$$\begin{aligned} \frac{1}{W(Q_k)} \int _{Q_k} |(\theta _t \textbf{b}) (x)|^2 W(x) dx \lesssim \Vert \textbf{b}\Vert _\infty ^2. \end{aligned}$$
(4.18)

Taking this claim for granted momentarily, we obtain

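$$\begin{aligned} \sum _k \int _{Q_k} |(\theta _t \mathbbm {1}) (x) \cdot P_t \textbf{g}(x)|^2 W(x) dx\le & {} \sum _k \Vert P_t \textbf{g}\Vert _{L^\infty (Q_k)}^2 \int _{Q_k} |(\theta _t \mathbbm {1}) (x)|^2 W(x) dx\\\lesssim & {} \sum _k \left( \int _{3Q_k} |\textbf{g}|^2 W dx \right) \frac{1}{W(Q_k)} \int _{Q_k} |(\theta _t \mathbbm {1}) (x)|^2 W(x) dx\\\lesssim & {} \sum _k \int _{3Q_k} |\textbf{g}|^2 W dx \lesssim \Vert \textbf{g}\Vert _{L^2_W}^2, \end{aligned}$$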

where in the last two steps we used first (4.18), and then the bounded overlap property of the cubes \(3Q_k\).

It remains to verify (4.18). We dualize: choose \(\textbf{h}=(h_1,h_2,\dots ,h_n) \in L^2_W\), with \(\textrm{supp}\,\textbf{h} \subset Q_k\), and write

$$\begin{aligned}{} & {} \int _{\mathbb {R}^n} \theta _t \textbf{b} (x)\cdot \textbf{h}(x) W(x) dx = \int _{\mathbb {R}^n} te^{-t^2 \widetilde{L}} \widetilde{\textrm{div}} A \textbf{b} (x) \cdot \textbf{h}(x) W(x) dx \nonumber \\{} & {} \qquad = \int _{\mathbb {R}^n} A \textbf{b} (x) \cdot t \nabla e^{-t^2 L} \textbf{h}(x) W(x) dx \lesssim \Vert \textbf{b}\Vert _\infty \int _{\mathbb {R}^n} \left| t \nabla e^{-t^2 L} \textbf{h}(x)\right| W(x) dx\nonumber \\{} & {} \qquad \le \!\Vert \textbf{b}\Vert _\infty \left( \int _{2Q_k} \left| t \nabla e^{-t^2 L} \textbf{h}(x)\right| W(x) dx \!+ \sum _{j = 2}^\infty \int _{2^jQ_k \setminus 2^{j-1}Q_k} \left| t \nabla e^{-t^2 L} \textbf{h}(x)\right| W(x) dx\right) \nonumber \\{} & {} \qquad =: \Vert \textbf{b}\Vert _\infty \left( I^{(k)} + II^{(k)} \right) . \end{aligned}$$
(4.19)

For the first term we may simply compute, using Jensen’s inequality and the boundedness of \(t \nabla e^{-t^2L}\) from Lemma 2.11,

$$\begin{aligned} I^{(k)} \lesssim \left( W(Q_k) \int _{2Q_k} \left| t \nabla e^{-t^2 L} \textbf{h}(x)\right| ^2 W(x) dx \right) ^{1/2} \lesssim \sqrt{W(Q_k)} \Vert \textbf{h}\Vert _{L^2_W}. \end{aligned}$$

For the second term we use Jensen again, and later the off-diagonal estimates from Lemma 2.14 (taking advantage of \(\ell (Q_k) \approx t\)) to obtain

$$\begin{aligned} II^{(k)}\lesssim & {} \sum _{j = 2}^\infty \left( W(2^jQ_k) \int _{2^jQ_k \setminus 2^{j-1}Q_k} \left| t \nabla e^{-t^2 L} \textbf{h}(x)\right| ^2 W(x) dx \right) ^{1/2}\\\lesssim & {} \sum _{j = 2}^\infty \left( C_D^j W(Q_k) e^{-c4^j} \int _{Q_k} |\textbf{h}(x)|^2 W(x) dx \right) ^{1/2} \lesssim \sqrt{W(Q_k)} \Vert \textbf{h}\Vert _{L^2_W}. \end{aligned}$$

With the estimates for \(I^{(k)}\) and \(II^{(k)}\), we can substitute back in (4.19) and obtain

$$\begin{aligned} \int _{\mathbb {R}^n} \theta _t \textbf{b} (x) \cdot \textbf{h}(x) W(x) dx \lesssim \sqrt{W(Q_k)} \Vert \textbf{b}\Vert _\infty \Vert \textbf{h}\Vert _{L^2_W}, \end{aligned}$$

which after squaring gives (4.18) by duality, as desired. This completes the proof of Claim 4.16. \(\square \)

Claim 4.20

Suppose \(s\le t\). Then

$$\begin{aligned} \Vert U_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{s}{t}. \end{aligned}$$

Proof of the claim. Note that we have the pointwise estimate

$$\begin{aligned} |P_tQ_s f| \lesssim \frac{s}{t} \mathcal {M}^2 f, \end{aligned}$$
(4.21)

where \(\mathcal {M}^2 := \mathcal {M}\circ \mathcal M\) is the iterated Hardy–Littlewood maximal operator (with respect to Lebesgue measure). One may verify (4.21) by a standard argument using the size estimates and compact support of the kernels of \(P_t\) and \(Q_s\), along with the smoothness of the former, and the cancellation property of the latter. We omit the well-known details. Since \(\mathcal {M}\) is bounded on \(L^2_W\) (recall that \(W\in A_2\)), we find, with the aid of Claims 4.15 and 4.16, and (4.21):

$$\begin{aligned} \Vert U_t Q_s\Vert _{L^2_W \rightarrow L^2_W}\le & {} \Vert \theta _t P_t\Vert _{L^2_W \rightarrow L^2_W} \Vert P_t Q_s\Vert _{L^2_W \rightarrow L^2_W} + \Vert (\theta _t \mathbbm {1}) \cdot P_t\Vert _{L^2_W \rightarrow L^2_W} \Vert P_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \\\lesssim & {} \Vert P_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{s}{t} . \end{aligned}$$
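For the reader's convenience, we spell out how the last bound follows from (4.21). Here \(\Vert \mathcal {M}\Vert _{L^2_W \rightarrow L^2_W}\) denotes the operator norm of the Hardy–Littlewood maximal operator on \(L^2_W\), which is finite since \(W \in A_2\):

$$\begin{aligned} \Vert P_t Q_s f\Vert _{L^2_W} \lesssim \frac{s}{t} \Vert \mathcal {M}(\mathcal {M} f)\Vert _{L^2_W} \le \frac{s}{t} \Vert \mathcal {M}\Vert _{L^2_W \rightarrow L^2_W}^2 \Vert f\Vert _{L^2_W} \lesssim \frac{s}{t} \Vert f\Vert _{L^2_W}. \end{aligned}$$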

\(\square \)

Claim 4.22

We have, uniformly in t,

$$\begin{aligned} \Vert U_t \textbf{g}\Vert _{L^2_W} \lesssim t \Vert \nabla \textbf{g}\Vert _{L^2_W}. \end{aligned}$$

Proof

The proof is inspired by [1, Lemma 3.5], and is in fact similar in spirit to that of Lemma 2.22: it relies on a decomposition into subcubes of the appropriate size, so that Poincaré’s inequality can be applied, together with boundedness and off-diagonal estimates. Nevertheless, we give the details, because parts of the argument will be reused later. Cover \(\mathbb {R}^n\) by a grid of non-overlapping dyadic cubes \(Q_k\) with sidelength \(t/2 < \ell (Q_k) \le t\). Using the easy fact that \(U_t \mathbbm {1} = 0\), we compute

$$\begin{aligned}{} & {} \Vert U_t \textbf{g}\Vert _{L^2_W}^2 = \sum _k \Vert U_t(\textbf{g} - [\textbf{g}]_{2Q_k})\Vert _{L^2_W(Q_k)}^2\nonumber \\{} & {} \lesssim \sum _k \Vert \theta _t P_t^2(\textbf{g} - [\textbf{g}]_{2Q_k})\Vert _{L^2_W(Q_k)}^2 + \sum _k \Vert (\theta _t \mathbbm {1}) \cdot P_t^2 (\textbf{g} - [\textbf{g}]_{2Q_k})\Vert _{L^2_W(Q_k)}^2 =: A + B. \end{aligned}$$
(4.23)

Let us first deal with A. We set \(S_t := \theta _t P_t^2\), because we intend to reuse some of the computations later on. For each term in the series, linearity and the triangle inequality give

$$\begin{aligned} \Vert S_t (\textbf{g} - [\textbf{g}]_{2Q_k})\Vert _{L^2_W(Q_k)}\le & {} \Vert S_t ((\textbf{g} - [\textbf{g}]_{2Q_k})\textbf{1}_{2Q_k})\Vert _{L^2_W(Q_k)}\nonumber \\{} & {} + \sum _{j=1}^\infty \left\| S_t \left( \left( \textbf{g} - [\textbf{g}]_{2Q_k} \right) \textbf{1}_{2^{j+1}Q_k \setminus 2^jQ_k} \right) \right\| _{L^2_W(Q_k)}\nonumber \\=: & {} I^{(k)} + \sum _{j=1}^\infty II^{(k)}_j. \end{aligned}$$
(4.24)

Using the boundedness of \(S_t\) (in this case, this follows from Lemmas 2.7 and 2.11) and Poincaré’s inequality, we deduce

$$\begin{aligned} I^{(k)} \lesssim \Vert \textbf{g} - [\textbf{g}]_{2Q_k}\Vert _{L^2_W(2Q_k)} \lesssim \ell (Q_k) \Vert \nabla \textbf{g}\Vert _{L^2_W(2Q_k)} \le t \Vert \nabla \textbf{g}\Vert _{L^2_W(2Q_k)}. \end{aligned}$$
(4.25)

And for the other terms, we can use the off-diagonal estimates for \(S_t\) (in this case, this follows from Lemmas 2.14 and 2.21), and taking advantage of \(\ell (Q_k)\approx t\) and Poincaré, we obtain, similarly to the situation in Lemma 2.22,

$$\begin{aligned}{} & {} \sum _{j=1}^\infty II^{(k)}_j \lesssim \sum _{j=1}^\infty e^{-c 4^j} \Vert \textbf{g} - [\textbf{g}]_{2Q_k}\Vert _{L^2_W(2^{j+1}Q_k)} \\{} & {} \lesssim \sum _{j=1}^\infty e^{-c 4^j} \left\| \textbf{g} \!-\! [\textbf{g}]_{2^{j+1}Q_k}\right\| _{L^2_W(2^{j+1}Q_k)} \!+\! \sum _{j=1}^\infty \sum _{i=1}^j e^{-c 4^j} C_D^{(j-i)/2} \left\| \textbf{g} \!-\! [\textbf{g}]_{2^{i+1}Q_k}\right\| _{L^2_W(2^{i+1}Q_k)}\\{} & {} \lesssim \sum _{j=1}^\infty e^{-\frac{c}{2} 4^j} 2^j \ell (Q_k) \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)} \lesssim t \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)}. \end{aligned}$$

Thus, going back to (4.24) we obtain

$$\begin{aligned} \Vert S_t (\textbf{g} - [\textbf{g}]_{2Q_k})\Vert _{L^2_W(Q_k)} \lesssim t\Vert \nabla \textbf{g}\Vert _{L^2_W(2Q_k)} + t\sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)}, \end{aligned}$$

and hence

$$\begin{aligned} A \lesssim t^2 \sum _k \Vert \nabla \textbf{g}\Vert _{L^2_W(2Q_k)}^2 + t^2 \sum _k \left( \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)} \right) ^2 =: t^2 (A_1 + A_2). \end{aligned}$$

By bounded overlap of the cubes \(2Q_k\) we easily get

$$\begin{aligned} A_1 \lesssim \Vert \nabla \textbf{g}\Vert _{L^2_W}^2. \end{aligned}$$

For the other term, we note that \(|x-y|\lesssim 2^j \ell (Q_k) \approx 2^j t\), whenever \(x\in Q_k\), and \(y\in 2^{j+1}Q_k\). We further note that \(W(Q_k) \approx W(B_t(x))\), for \(x\in Q_k\), and that for all \(y\in {\mathbb {R}^n}\),

$$ e^{-\frac{c}{4} 4^j} \int _{|x-y|\lesssim 2^j t} W(B_t(x))^{-1} W(x) dx \lesssim e^{-\frac{c}{8} 4^j}, $$

by the doubling property of W. We now use these observations, along with Cauchy–Schwarz, Fubini’s theorem, and the fact that the cubes \(Q_k\) are non-overlapping, to obtain

$$\begin{aligned} A_2\le & {} \sum _k \left( \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \right) \left( \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)}^2 \right) \lesssim \sum _k \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \Vert \nabla \textbf{g}\Vert _{L^2_W(2^{j+1}Q_k)}^2 \\\lesssim & {} \sum _k \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \int _{Q_k} W(B_t(x))^{-1}W(x) \int _{|x-y|\lesssim 2^j t} |\nabla \textbf{g}(y)|^2 W(y)\, dy dx \\= & {} \sum _{j=1}^\infty e^{-\frac{c}{4} 4^j} \int _{\mathbb {R}^n} W(B_t(x))^{-1}W(x) \int _{|x-y|\lesssim 2^j t} |\nabla \textbf{g}(y)|^2 W(y)\, dy dx \\\lesssim & {} \sum _{j=1}^\infty e^{-\frac{c}{8} 4^j} \int _{\mathbb {R}^n} |\nabla \textbf{g}(y)|^2 W(y)\, dy \lesssim \Vert \nabla \textbf{g}\Vert _{L^2_W}^2. \end{aligned}$$

Consequently, we have shown that

$$\begin{aligned} A \lesssim t^2 \Vert \nabla \textbf{g}\Vert _{L^2_W}^2. \end{aligned}$$
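The doubling estimate invoked in the treatment of \(A_2\) can be checked as follows. This is only a sketch, in which \(C \ge 1\) stands for the implicit constant in the condition \(|x-y|\lesssim 2^j t\), and \(c_0\) is a constant depending only on C; both are introduced here for illustration. For fixed y and \(|x-y| \le C 2^j t\), we have \(B_{C2^jt}(y) \subset B_{2C2^jt}(x)\), so the doubling property gives \(W(B_{C2^jt}(y)) \le W(B_{2C2^jt}(x)) \le C_D^{j+c_0}\, W(B_t(x))\). Hence

$$\begin{aligned} \int _{|x-y|\le C2^j t} \frac{W(x)}{W(B_t(x))}\, dx \le C_D^{j+c_0} \int _{B_{C2^jt}(y)} \frac{W(x)}{W(B_{C2^jt}(y))}\, dx = C_D^{j+c_0}, \end{aligned}$$

and the geometric factor is absorbed by the exponential decay, since \(e^{-\frac{c}{4} 4^j} C_D^{j+c_0} \lesssim e^{-\frac{c}{8} 4^j}\) uniformly in j.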

We can apply a similar, but simpler argument to handle term B in (4.23). We now set \(S_t := (\theta _t \mathbbm {1}) \cdot P_t^2\), and note that \(S_t\) is uniformly bounded on \(L^2_W\), by Claim 4.16 and Lemma 2.7. Moreover, the kernel of \(P^2_t\) is compactly supported in the ball of radius 2t, so the same is true for \(S_t\). Hence, for the current version of \(S_t\), we obtain a simplified variant of (4.24), in which only the term \(I^{(k)}\) appears, enjoying the same bound as in (4.25). Thus,

$$\begin{aligned} B \lesssim t^2 \Vert \nabla \textbf{g}\Vert _{L^2_W}^2. \end{aligned}$$

The proof of Claim 4.22 is now complete. \(\square \)

Claim 4.26

For \(t\le s\), we have

$$\begin{aligned} \Vert U_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{t}{s}. \end{aligned}$$

Proof of the claim. Using Claim 4.22 and Lemma 2.7, we have

$$\begin{aligned} \Vert U_t Q_s \textbf{g}\Vert _{L^2_W} \lesssim t \Vert \nabla Q_s \textbf{g}\Vert _{L^2_W} = \frac{t}{s} \Vert s\nabla Q_s\textbf{g}\Vert _{L^2_W} \lesssim \frac{t}{s} \Vert \textbf{g}\Vert _{L^2_W}, \end{aligned}$$

as desired. \(\square \)

As noted above, the preceding claims conclude the proof of Lemma 4.14. \(\square \)

We are now ready to reduce matters to a Carleson measure estimate. Recall that to prove Theorem 4.3 (and hence Theorem 4.1), it suffices to verify estimate (4.11) (equivalently, (4.9)).

Lemma 4.27

Theorem 4.3 (and hence Theorem 4.1) follows from the Carleson measure estimate

$$\begin{aligned} \sup _Q \frac{1}{W(Q)} \int _0^{\ell (Q)} \int _Q \left| \widetilde{\theta }_t \mathbbm {1}(x)\right| ^2 W(x) \frac{dxdt}{t} < \infty . \end{aligned}$$
(4.28)

Proof

Recalling that \(\widetilde{\theta }_t \mathbbm {1} = \theta _t \mathbbm {1}\), we see that by Lemma 4.14 and a weighted version of Carleson’s embedding inequality (see [18, Lemma 2.2]), the estimate (4.28) implies (4.11). \(\square \)

Our goal, then, is to prove (4.28). To this end, let us first establish a few more estimates to be used in the sequel. We define the dyadic averaging operator by

figure e

where \(Q_{x, t}\) is the half-open dyadic cube containing x for which \(t/2 < \ell (Q_{x, t}) \le t\).
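For orientation, we record the standard form of such an operator. This is a sketch under the assumption that the average is taken with respect to Lebesgue measure, as in the unweighted setting of [8]; the kernel notation \(a_t(x,y)\) is introduced here only for illustration:

$$\begin{aligned} A_t \textbf{g}(x) = \frac{1}{|Q_{x,t}|} \int _{Q_{x,t}} \textbf{g}(y)\, dy = \int _{\mathbb {R}^n} a_t(x,y)\, \textbf{g}(y)\, dy, \qquad a_t(x,y) := \frac{1}{|Q_{x,t}|} \textbf{1}_{Q_{x,t}}(y). \end{aligned}$$

With this form, \(0 \le a_t(x,y) \lesssim t^{-n}\), the kernel vanishes unless \(|x-y| \le \sqrt{n}\, t\), and \(A_t \mathbbm {1} = \mathbbm {1}\). Moreover, \(A_t \textbf{g}\) is constant on each dyadic cube of the relevant generation, so that \(A_t^2 = A_t\).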

Lemma 4.29

We have

$$\begin{aligned} \int _0^\infty \Vert (\theta _t \mathbbm {1}) \cdot (P_t^2 - A_t) \textbf{g}\Vert ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \textbf{g}\Vert _{L^2_W}^2, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

The proof of this estimate will be very similar to that of Lemma 4.14. We set

$$ \widetilde{U}_t := (\theta _t \mathbbm {1}) \cdot (P_t^2 - A_t), $$

and note that it is enough to show that \(\widetilde{U}_t\) satisfies the hypotheses of the weighted Littlewood–Paley almost-orthogonality result Lemma 2.10. The uniform boundedness of \(\widetilde{U}_t\) arises immediately from that of \((\theta _t \mathbbm {1}) \cdot P_t^2\) (see Claim 4.16 and Lemma 2.7), along with the following result:

Claim 4.30

We have, uniformly in t,

$$\begin{aligned} \Vert (\theta _t \mathbbm {1}) \cdot A_t\Vert _{L^2_W \rightarrow L^2_W} \lesssim 1. \end{aligned}$$

Proof of the claim. The proof is the same as that of Claim 4.16, which treated \((\theta _t \mathbbm {1}) \cdot P_t\). Indeed, the only properties of \(P_t\) that were used in that argument were the size and support conditions of its kernel. The kernel of \(A_t\) enjoys similar properties; in fact,

figure f

hence, the same proof may be repeated. \(\square \)

To prove the quasi-orthogonality with the \(Q_s\) operators, the next result will be useful.

Claim 4.31

We have, uniformly in t,

$$\begin{aligned} \Vert \widetilde{U}_t \textbf{g}\Vert _{L^2_W} \lesssim t \Vert \nabla \textbf{g}\Vert _{L^2_W}. \end{aligned}$$

Proof

The proof is similar to that of Claim 4.22, but simpler: now one has to deal only with terms like “B” associated to \(S_t = (\theta _t \mathbbm {1}) \cdot A_t\) in (4.23), so that there is no “tail” as in (4.24), but rather only a local term analogous to \(I^{(k)}\). We omit the routine details. \(\square \)

The following two claims finish the proof of Lemma 4.29, and are analogous to those in the proof of Lemma 4.14.

Claim 4.32

We have, uniformly for \(t\le s\),

$$\begin{aligned} \Vert \widetilde{U}_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{t}{s}. \end{aligned}$$

Proof

In view of Claim 4.31, repeating the proof of Claim 4.26, we simply write

$$\begin{aligned} \Vert \widetilde{U}_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim t \Vert \nabla Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{t}{s}. \end{aligned}$$

\(\square \)

Claim 4.33

We have, uniformly in \(s\le t\), and for some fixed \(\alpha > 0\),

$$\begin{aligned} \Vert \widetilde{U}_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \left( \frac{s}{t} \right) ^\alpha . \end{aligned}$$

Proof

On the one hand, as in Claim 4.20, and using the boundedness of Claim 4.16,

$$\begin{aligned} \Vert (\theta _t \mathbbm {1}) \cdot P_t^2 Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \Vert P_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{s}{t}. \end{aligned}$$

On the other hand, by [8, Lemma 4.7 and its proof], we have the unweighted quasi-orthogonality estimate

$$ \Vert A_t Q_s\Vert _{L^2 \rightarrow L^2} \lesssim \left( \frac{s}{t}\right) ^\alpha , $$

for some exponent \(\alpha >0\), uniformly for \(s\le t\). Consequently, we may use the technique of Duoandikoetxea and Rubio de Francia [21], in which one first self-improves the weight W, and then uses Stein–Weiss interpolation with change of measure [44], to deduce the weighted quasi-orthogonality estimate

$$ \Vert A_t Q_s\Vert _{L_W^2 \rightarrow L_W^2} \lesssim \left( \frac{s}{t}\right) ^{\beta }, $$

for some positive \(\beta < \alpha \) (see Lemma 2.5 in [18] for more details). Hence, by Claim 4.30,

$$\begin{aligned} \Vert (\theta _t \mathbbm {1}) \cdot A_t Q_s\Vert _{L^2_W \rightarrow L^2_W} = \Vert (\theta _t \mathbbm {1}) \cdot A_t^2 Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \Vert A_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \left( \frac{s}{t} \right) ^{\beta }. \end{aligned}$$

\(\square \)

Collecting all the above claims, the proof of Lemma 4.29 is completed. \(\square \)
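In more detail, and assuming that the quasi-orthogonality hypothesis of Lemma 2.10 takes the usual form of a power decay in \(\min (s/t, t/s)\), the claims combine as follows; the exponent \(\gamma \) is introduced here only for illustration. For \(s \le t\), the two estimates in the proof of Claim 4.33 give

$$\begin{aligned} \Vert \widetilde{U}_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \le \Vert (\theta _t \mathbbm {1}) \cdot P_t^2 Q_s\Vert _{L^2_W \rightarrow L^2_W} + \Vert (\theta _t \mathbbm {1}) \cdot A_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \frac{s}{t} + \left( \frac{s}{t} \right) ^{\beta } \lesssim \left( \frac{s}{t} \right) ^{\gamma }, \qquad \gamma := \min \{1, \beta \}. \end{aligned}$$

Together with Claim 4.32, which covers the range \(t \le s\), this yields \(\Vert \widetilde{U}_t Q_s\Vert _{L^2_W \rightarrow L^2_W} \lesssim \min (s/t, t/s)^{\gamma }\) for all \(s, t > 0\).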

Corollary 4.34

We have the square function bound

$$ \int _0^\infty \left\| \widetilde{\theta }_t \nabla f - (\widetilde{\theta }_t \mathbbm {1}) \cdot A_t \nabla f\right\| ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2, $$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

With Lemma 4.29 in hand, since \(\widetilde{\theta }_t \mathbbm {1} = \theta _t \mathbbm {1}\), it is enough to prove the following:

$$\begin{aligned} \int _0^\infty \left\| \widetilde{\theta }_t \nabla f - (\theta _t \mathbbm {1}) \cdot P^2_t \nabla f\right\| ^2_{L^2_W} \frac{dt}{t} \lesssim \Vert \nabla f\Vert _{L^2_W}^2. \end{aligned}$$

To this end, we write

$$ \widetilde{\theta }_t \nabla f - (\theta _t \mathbbm {1}) \cdot P^2_t \nabla f = \widetilde{\theta }_t \nabla (I-P^2_t) f + \left[ \widetilde{\theta }_t \nabla P^2_t f - (\theta _t \mathbbm {1}) \cdot P^2_t \nabla f\right] =: Y_t f + Z_t f. $$

By (4.12),

$$ -2 Y_t = t\widetilde{L} e^{-t^2\widetilde{L}}(I-P_t^2) =:\widetilde{R}_t, $$

where \(\widetilde{R}_t\) is precisely the same operator defined in (4.4), enjoying the square function bound established in Lemma 4.5. In addition, again using (4.12),

$$ -2\widetilde{\theta }_t \nabla P^2_t = t\widetilde{L} e^{-t^2\widetilde{L}} P_t^2 =: \widetilde{T}_t, $$

where \(\widetilde{T}_t\) is precisely the same operator defined in (4.4). We now repeat the splitting of \(\widetilde{T}_t\), exactly as in (4.7):

$$ \widetilde{T}_t f= - t e^{-t^2\widetilde{L}} L P_t^2 f - 2 t e^{-t^2\widetilde{L}} \widetilde{\textrm{div}} (A \nabla (P_t^2 f)). $$

Note that the second term equals \(-2 \theta _t \nabla P^2_t f\) (see (4.10)). Combining these observations, we see that

$$\begin{aligned} Z_tf ={} & {} -\frac{1}{2} \widetilde{T}_t f - (\theta _t \mathbbm {1}) \cdot P^2_t \nabla f = \frac{1}{2} t e^{-t^2\widetilde{L}} L P_t^2 f \\{} & {} + \left[ \theta _t P^2_t \nabla f - (\theta _t \mathbbm {1}) \cdot P^2_t \nabla f\right] =: E_tf + U_tf, \end{aligned}$$

where the term \(E_tf\) (which is actually the error \((\widetilde{\theta }_t -\theta _t)\nabla P_t^2 f\)) satisfies the desired square function bound, by Lemma 4.8. The last term, \(U_tf\), also enjoys the desired square function bound, by Lemma 4.14. This concludes the proof of the corollary. \(\square \)

4.3 The T(b) Argument

Recall that our goal is to prove the Carleson measure estimate (4.28). We now turn to this task, which will finish the proof of Theorem 4.1 (and therefore also the proof of Theorem 1.11). Our arguments here will be an adaptation of the proof of the Kato conjecture in the divergence form setting, see [8], and in particular, the extension of that proof to the degenerate elliptic case in [18].

We note that by the doubling property of W, we may assume that the supremum in (4.28) is taken over dyadic cubes Q. Given any such cube Q, a sufficiently small number \(\varepsilon \in (0, 1)\) to be chosen, and \(v \in {\mathbb {R}^n}\) with \(|v| = 1\), we define

$$\begin{aligned} f_{Q, v}^\varepsilon := e^{-(\varepsilon \ell (Q))^2 \widetilde{L}} (\varvec{\varPhi }_Q \chi _Q \cdot v), \end{aligned}$$

where \({\varvec{\varPhi }}_Q(x) = x - x_Q\), \(x_Q\) denotes the center of Q, and \(\chi _Q \in \mathscr {C}_0^\infty \) is a cut-off function such that \(\chi _Q \equiv 1\) in 2Q, \(\textrm{supp}\,\chi _Q \subset 4Q\) and \(\Vert \chi _Q\Vert _\infty + \ell (Q) \Vert \nabla \chi _Q\Vert _\infty + \ell (Q)^2 \Vert \nabla ^2 \chi _Q\Vert _\infty \lesssim 1\). Clearly,

$$\begin{aligned} \Vert \nabla (\varvec{\varPhi }_Q \chi _Q \cdot v)\Vert _\infty \lesssim 1, \end{aligned}$$
(4.35)

and also

$$\begin{aligned} \int _{\mathbb {R}^n} |\varvec{\varPhi }_Q \chi _Q \cdot v|^2 W(x) dx \lesssim \ell (Q)^2 W(Q). \end{aligned}$$
(4.36)
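Both bounds are elementary. For the reader's convenience, we sketch the computation, using only the product rule, the stated properties of \(\chi _Q\), the fact that \(|x - x_Q| \lesssim \ell (Q)\) on \(\textrm{supp}\,\chi _Q \subset 4Q\), and the doubling property of W:

$$\begin{aligned} \nabla (\varvec{\varPhi }_Q \chi _Q \cdot v)(x)= & {} \chi _Q(x)\, v + \left( (x - x_Q)\cdot v\right) \nabla \chi _Q(x), \qquad |\nabla (\varvec{\varPhi }_Q \chi _Q \cdot v)| \lesssim \Vert \chi _Q\Vert _\infty + \ell (Q) \Vert \nabla \chi _Q\Vert _\infty \lesssim 1,\\ \int _{\mathbb {R}^n} |\varvec{\varPhi }_Q \chi _Q \cdot v|^2 W(x) dx\le & {} \sup _{x \in 4Q} |x - x_Q|^2\, \Vert \chi _Q\Vert _\infty ^2\, W(4Q) \lesssim \ell (Q)^2 W(Q). \end{aligned}$$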

The following estimates hold for \(f_{Q, v}^\varepsilon \), with constants that are uniform in Q, v and \(\varepsilon \):

$$\begin{aligned} \int _{5Q} |f_{Q, v}^\varepsilon - \varvec{\varPhi }_Q \chi _Q \cdot v|^2 W dx\lesssim & {} \varepsilon ^2 \ell (Q)^2 W(Q),\end{aligned}$$
(4.37)
$$\begin{aligned} \int _{5Q} |\nabla f_{Q, v}^\varepsilon |^2 W dx + \int _{5Q} \left| \nabla \left( f_{Q, v}^\varepsilon - \varvec{\varPhi }_Q \chi _Q \cdot v\right) \right| ^2 W dx\lesssim & {} W(Q). \end{aligned}$$
(4.38)

These estimates follow at once from (4.35), Lemmas 2.22 and 2.25 (with \(t=\varepsilon \ell (Q)\)), and the doubling property of W.

The proof of (4.28) (and hence of Theorem 4.1 by Lemma 4.27) follows from the next two lemmas.

Lemma 4.39

There exists \(0 < \varepsilon = \varepsilon (\lambda , n, [W]_{A_2}) \ll 1\) and a finite set V of unit vectors in \({\mathbb {R}^n}\), whose cardinality depends only on \(\varepsilon \) and n, such that

$$\begin{aligned}{} & {} \sup _Q \frac{1}{W(Q)} \int _0^{\ell (Q)} \int _Q \left| (\widetilde{\theta }_t \mathbbm {1}) (x)\right| ^2 W(x) \frac{dxdt}{t} \\{} & {} \lesssim \sum _{v \in V} \sup _Q \frac{1}{W(Q)} \int _0^{\ell (Q)} \int _Q \left| (\widetilde{\theta }_t \mathbbm {1}) (x) \cdot \left( A_t \nabla f_{Q, v}^\varepsilon \right) (x)\right| ^2 W(x) \frac{dxdt}{t}, \end{aligned}$$

where the implicit constant depends on n, \(\lambda \) and \([W]_{A_2}\).

Proof

The reader may check that the proof of [18, Lemma 5.1] (which in turn is an adaptation to the weighted case of [8, Lemma 5.4]) works perfectly well in our situation: as long as \(W \in A_2\) and \(f_{Q, v}^\varepsilon \) satisfies the estimates (4.37) and (4.38), the proof in [18] goes through (see Footnote 5). \(\square \)

With Lemma 4.39 in hand, estimate (4.28) will follow immediately from the next lemma.

Lemma 4.40

For every cube Q and unit vector v, we have

$$\begin{aligned} \int _0^{\ell (Q)} \int _Q \left| (\widetilde{\theta }_t \mathbbm {1})(x) \cdot \left( A_t \nabla f_{Q, v}^\varepsilon \right) (x)\right| ^2 W(x) \frac{dxdt}{t} \lesssim W(Q), \end{aligned}$$

where the implicit constant depends on n, \(\lambda \), \([W]_{A_2}\) and \(\varepsilon \), but is uniform in Q and v.

Proof

Fix Q and v, and abbreviate \(f := f_{Q, v}^\varepsilon \). By Corollary 4.34, we have

$$\begin{aligned}{} & {} \int _0^{\ell (Q)} \int _Q \left| (\widetilde{\theta }_t \mathbbm {1})(x) \cdot \left( A_t \nabla f \right) (x)\right| ^2 W(x) \frac{dxdt}{t}\\{} & {} \lesssim \Vert \nabla f\Vert _{L^2_W}^2 + \int _0^{\ell (Q)} \int _Q \left| (\widetilde{\theta }_t \nabla f) (x)\right| ^2 W(x) \frac{dxdt}{t} =: I + II \lesssim W(Q) + II, \end{aligned}$$

where in the last step, we have used (4.38) to obtain the desired bound for term I.

Term II can be treated as follows, using (4.12), Lemma 2.11, and the definition of \(f =f_{Q,v}^{\varepsilon }\),

$$\begin{aligned} II\approx & {} \int _0^{\ell (Q)} \int _Q \left| t e^{-t^2 \widetilde{L}} \widetilde{L} f (x)\right| ^2 W(x) \frac{dxdt}{t}\\\lesssim & {} \int _0^{\ell (Q)} t dt \int _{\mathbb {R}^n} \left| \widetilde{L} e^{-(\varepsilon \ell (Q))^2\widetilde{L}} (\varvec{\varPhi }_Q \chi _Q \cdot v)(x)\right| ^2 W(x) dx\\\approx & {} \ell (Q)^{2} \int _{\mathbb {R}^n} \left| \widetilde{L} e^{-(\varepsilon \ell (Q))^2\widetilde{L}} (\varvec{\varPhi }_Q \chi _Q \cdot v)(x)\right| ^2 W(x) dx\\\lesssim & {} \varepsilon ^{-4} \ell (Q)^{-2}\int _{\mathbb {R}^n} |\varvec{\varPhi }_Q \chi _Q \cdot v|^2 W(x) dx \lesssim \varepsilon ^{-4} W(Q), \end{aligned}$$

where in the last two steps we have first used Lemma 2.11 (vi) with \(t =\varepsilon \ell (Q)\), and then (4.36). Since \(\varepsilon \) has been fixed depending only on allowable parameters, the dependence on \(\varepsilon \) is harmless.
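The bookkeeping of the powers of \(\varepsilon \) and \(\ell (Q)\) in the last three steps may be summarized as follows. This is a sketch which assumes that the bound from Lemma 2.11 (vi) has the form \(\Vert \widetilde{L} e^{-\tau ^2 \widetilde{L}} g\Vert _{L^2_W} \lesssim \tau ^{-2} \Vert g\Vert _{L^2_W}\); the parameter \(\tau \) and the function g are notation introduced here. With \(\tau = \varepsilon \ell (Q)\) and \(g = \varvec{\varPhi }_Q \chi _Q \cdot v\),

$$\begin{aligned} \int _0^{\ell (Q)} t\, dt = \frac{\ell (Q)^2}{2}, \qquad \ell (Q)^2 \left( \varepsilon \ell (Q)\right) ^{-4} \Vert g\Vert _{L^2_W}^2 = \varepsilon ^{-4} \ell (Q)^{-2} \Vert g\Vert _{L^2_W}^2 \lesssim \varepsilon ^{-4} \ell (Q)^{-2}\, \ell (Q)^2\, W(Q) = \varepsilon ^{-4} W(Q), \end{aligned}$$

where the inequality is (4.36).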

Collecting all the preceding estimates, we have finished the proof. \(\square \)