## Abstract

In this note, we show that the squared Wasserstein distance can be expressed as the average over the sphere of one-dimensional Wasserstein distances. We name this new expression for the Wasserstein distance the *Radiant formula*. Using this formula, we highlight new connections between the Wasserstein distances and the Sliced Wasserstein distance, an alternative Wasserstein-like distance that is cheaper to compute.

## 1 Introduction and basic notation

In modern mathematical language, the *p*-th power of the Wasserstein distance between two probability measures over \({\mathbb {R}}^d\), namely \(\mu \) and \(\nu \), is defined as
$$W_p^p(\mu ,\nu ):=\min _{\pi \in \Pi (\mu ,\nu )}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d}l_p^p(x,y)\,d\pi ,\qquad \qquad (1)$$
where \(l_p^p(x,y)=\sum _{i=1}^d \mid x_i-y_i\mid ^p\) and \(\Pi (\mu ,\nu )\) is the set of measures over \({\mathbb {R}}^{d}\times {\mathbb {R}}^d\) whose marginals are \(\mu \) and \(\nu \), [1].

Due to its ability to capture the weak topology of the space of probability measures, the family of \(W_p^p\) distances has found a natural home in many applied fields, such as Computer Vision [2, 3], generative models [4,5,6], and clustering [7, 8]. For this reason, much effort has been spent on finding cheap ways to compute the value of \(W_p^p\) for two given measures. When \(\mu \) and \(\nu \) are discrete measures, the minimization problem (1) can be cast as a Linear Programming (LP) problem. Owing to the separability of the \(l_p^p\) cost functions, it is possible to lower the complexity of these LP problems, [9, 10]; however, for many applied tasks, this is still not enough to make \(W_p^p\) an efficient alternative to other metrics. Therefore, several cheap-to-compute alternatives to the Wasserstein distance have been proposed: some approaches add an entropy regularization term [11,12,13] to the objective of (1), while other approaches consider topologically equivalent alternatives, such as the Fourier-based metrics [14, 15]. Another successful alternative is the *Sliced Wasserstein Distance* (SWD) [16, 17]. The SWD computes the distance between two measures by comparing their projections on all possible affine 1-dimensional sub-spaces of \({\mathbb {R}}^d\). Since the Wasserstein distance between measures supported over a line can be computed through an explicit formula, the SWD can be evaluated without solving a minimization problem.
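The explicit one-dimensional formula mentioned above reduces, for empirical measures with the same number of equally weighted atoms, to matching order statistics. A minimal sketch (the function name is ours, not from the paper):

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p^p between two empirical measures on the line with the same
    number of equally weighted atoms.

    In one dimension the optimal transport plan is monotone, so sorting
    both samples and matching order statistics is optimal -- no linear
    program has to be solved.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

For instance, `wasserstein_1d([0, 1], [1, 2], p=2)` returns `1.0`, since the sorted samples are matched pairwise.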

In this note, we propose a general methodology to relate the original Wasserstein distances to the SWDs. First, we show that, when both the probability measures are supported over \({\mathbb {R}}^2\), the \(W_2^2\) distance can be represented as follows
$$W_2^2(\mu ,\nu )=\frac{1}{\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x_2^{(\theta )}},\zeta ^{(\theta )}_{\mid x_2^{(\theta )}}\big )\,d\mu _2^{(\theta )}\,d\theta ,$$
where \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) is a suitable family of measures on \({\mathbb {R}}^2\) (see Theorem 2). We call this identity Radiant formula and use it to find equivalence bounds between the classic Wasserstein distances and their sliced counterparts. We then extend these results to the case \(p\ne 2\) and use the Knothe-Rosenblatt heuristic transportation plan [18, 19] to provide an upper bound on the absolute error between the SWD and \(W^p_p\). Finally, we extend our results to any \({\mathbb {R}}^d\), with \(d\ge 2\).

## 2 Our contribution

For the sake of clarity, we first introduce our results for measures supported over \({\mathbb {R}}^2\) and then extend our findings to the higher dimensional setting in a dedicated subsection. In what follows, we denote with \({\mathcal {P}}({\mathbb {R}}^d)\) the set of Borel probability measures over \({\mathbb {R}}^d\).

*The Radiant formula*

As a starting point of our discussion, we show that any \(W_p\) distance can be computed by summing the averages of two families of one-dimensional Wasserstein distances between the conditionals of \(\mu \) and those of two suitable probability measures.

### Proposition 1

Let \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\) and \(p\ge 1\). Then, there exists a couple of measures \((\zeta ,\eta )\),^{Footnote 1} such that \(\zeta \in \Pi (\nu _1,\mu _2)\), \(\eta \in \Pi (\mu _1,\nu _2)\), and
$$W_p^p(\mu ,\nu )=\int _{{\mathbb {R}}}W_p^p\big (\mu _{\mid x_2},\zeta _{\mid x_2}\big )\,d\mu _2+\int _{{\mathbb {R}}}W_p^p\big (\mu _{\mid x_1},\eta _{\mid x_1}\big )\,d\mu _1,$$
where \(\lambda _i\) is the marginal of \(\lambda \) on the *i*-th coordinate and \(\lambda _{\mid x_i}\) is the conditional law of \(\lambda \) given \(x_i\).

### Proof

Let \(\gamma \in {\mathcal {P}}({\mathbb {R}}^2\times {\mathbb {R}}^2)\) be an optimal transportation plan between \(\mu \) and \(\nu \). We then have that
$$W_p^p(\mu ,\nu )=\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}l_p^p(x,y)\,d\gamma =\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_1-y_1\mid ^p\,df+\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_2-y_2\mid ^p\,dg,$$
where *f* is the marginal of \(\gamma \) on \((x_1,x_2,y_1)\) and *g* is the marginal of \(\gamma \) over \((x_1,x_2,y_2)\). Finally, let \(\eta \) and \(\zeta \) be the marginals of \(\gamma \) over \((x_1,y_2)\) and \((y_1,x_2)\), respectively. Since \(\gamma \) is optimal, from [20], we have that

$$\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_1-y_1\mid ^p\,df=\int _{{\mathbb {R}}}W_p^p\big (\mu _{\mid x_2},\zeta _{\mid x_2}\big )\,d\mu _2.$$
Similarly, we have that \(\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_2-y_2 \mid ^p dg=\int _{{\mathbb {R}}}W_{p}^p(\mu _{\mid x_1},\eta _{\mid x_1})d\mu _1\), which concludes the proof. \(\square \)

Let us set \(V_\theta =\{v_\theta ,v_{\theta ^\perp }\}\), where \(v_{\theta }=(\cos (\theta ),\sin (\theta ))\) and \(v_{\theta ^\perp }=(-\sin (\theta ),\cos (\theta ))\). We notice that, for every \(\theta \in [0,2\pi ]\), \(V_\theta \) is the basis of \({\mathbb {R}}^2\) obtained by applying a \(\theta \)-counterclockwise rotation to the canonical basis \(V=\{e_1,e_2\}\). In what follows, we denote with \((x^{(\theta )}_1,x^{(\theta )}_2)\) the coordinates of \({\mathbb {R}}^2\) with respect to the basis \(V_\theta \). Moreover, we denote with \(\mu _1^{(\theta )}\) and \(\mu _2^{(\theta )}\) the marginals of \(\mu \) on \(x_1^{(\theta )}\) and \(x_2^{(\theta )}\), respectively. In this framework, given \(p\in [1,\infty )\), the Sliced Wasserstein Distance is defined as follows
$$SW_p^p(\mu ,\nu ):=\frac{1}{2\pi }\int _0^{2\pi }W_p^p\big (\mu _1^{(\theta )},\nu _1^{(\theta )}\big )\,d\theta .$$
Finally, we denote with \(R_\theta :{\mathbb {R}}^2\rightarrow {\mathbb {R}}^2\) the rotation that satisfies \(R_\theta (e_1)=v_\theta \) and \(R_\theta (e_2)=v_{\theta ^\perp }\) and with \(\mu ^{(\theta )}:=(R_\theta )_\#\mu \) the push-forward of \(\mu \) through \(R_\theta \).^{Footnote 2} Notice that, according to our notation, the marginal of \(\mu ^{(\theta )}\) over the first coordinate coincides with \(\mu ^{(\theta )}_1\).
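The definition above averages one-dimensional distances between the projected marginals over \(\theta \in [0,2\pi ]\), which lends itself to a simple quadrature for empirical measures in \({\mathbb {R}}^2\). A sketch (names ours):

```python
import math

def sliced_w2_squared(xs, ys, n_angles=360):
    """Estimate SW_2^2 between two empirical measures on R^2 by projecting
    both point clouds on the direction v_theta for a uniform grid of angles
    and averaging the resulting one-dimensional squared W_2 distances."""
    total = 0.0
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        # marginals of mu and nu on the line spanned by v_theta
        px = sorted(c * x1 + s * x2 for x1, x2 in xs)
        py = sorted(c * y1 + s * y2 for y1, y2 in ys)
        # in one dimension, matching order statistics is optimal
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(xs)
    return total / n_angles
```

For two Dirac deltas at \((0,0)\) and \((1,1)\), the projected atoms sit at \(0\) and \(\cos (\theta )+\sin (\theta )\), and the estimator returns the exact average \(\frac{1}{2\pi }\int _0^{2\pi }(1+\sin (2\theta ))\,d\theta =1\).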

### Theorem 2

(The Radiant Formula) Let \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\). Then there exists a family of measures \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) such that, for every \(\theta \in [0,2\pi ]\), it holds \(\zeta ^{(\theta )}\in \Pi (\mu _2^{(\theta )},\nu _1^{(\theta )})\) and
$$W_2^2(\mu ,\nu )=\frac{1}{\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x_2^{(\theta )}},\zeta ^{(\theta )}_{\mid x_2^{(\theta )}}\big )\,d\mu _2^{(\theta )}\,d\theta ,\qquad \qquad (5)$$
where \(\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\) is the conditional law of \(\zeta ^{(\theta )}\) given \(x^{(\theta )}_2\). Moreover, if both \(\mu \) and \(\nu \) are absolutely continuous, the family \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) is unique.

### Proof

First, we notice that the \(W_2^2\) distance between two measures \(\mu \) and \(\nu \) is preserved if we apply a rotation to \({\mathbb {R}}^2\). Indeed, if \(\gamma \) is an optimal transportation plan between \(\mu \) and \(\nu \), the plan \((R_\theta ,R_\theta )_{\#}\gamma \) is optimal between \((R_\theta )_{\#}\mu \) and \((R_\theta )_{\#}\nu \). This is due to the fact that \(l_2^2(x,y)=l_2^2(R_\theta (x),R_\theta (y))\) for every \(x,y\in {\mathbb {R}}^2\) and for every \(\theta \in [0,2\pi ]\).

Given \(\theta \in [0,2\pi ]\), Proposition 1 gives us a couple of measures, namely \(\eta ^{(\theta )}\) and \(\zeta ^{(\theta )}\) such that
$$W_2^2(\mu ,\nu )=\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\big )\,d\mu ^{(\theta )}_2+\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1}\big )\,d\mu ^{(\theta )}_1.\qquad \qquad (6)$$
Since a \(\frac{\pi }{2}\)-counterclockwise rotation swaps the basis vectors in any \(V_\theta \) base (i.e. \(v_\theta \) and \(v_{\theta ^\perp }\)), we have that \(\zeta ^{(\theta +\frac{\pi }{2})}=\eta ^{(\theta )}\). Thus, we have
$$\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1}\big )\,d\mu ^{(\theta )}_1=\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta +\frac{\pi }{2})}_{\mid x^{(\theta +\frac{\pi }{2})}_2},\zeta ^{(\theta +\frac{\pi }{2})}_{\mid x^{(\theta +\frac{\pi }{2})}_2}\big )\,d\mu ^{(\theta +\frac{\pi }{2})}_2\qquad \qquad (7)$$
for each \(\theta \in (0,2\pi ]\). By substituting (7) in (6), we find
$$W_2^2(\mu ,\nu )=\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\big )\,d\mu ^{(\theta )}_2+\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta +\frac{\pi }{2})}_{\mid x^{(\theta +\frac{\pi }{2})}_2},\zeta ^{(\theta +\frac{\pi }{2})}_{\mid x^{(\theta +\frac{\pi }{2})}_2}\big )\,d\mu ^{(\theta +\frac{\pi }{2})}_2.\qquad \qquad (8)$$
Since (8) holds true for every \(\theta \in [0,2\pi ]\), we can take the average over \(\theta \) and retrieve the Radiant formula

$$W_2^2(\mu ,\nu )=\frac{1}{\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\big )\,d\mu ^{(\theta )}_2\,d\theta .$$
Finally, the uniqueness result follows from the uniqueness of the transportation plan between absolutely continuous probability measures, [21]. \(\square \)

In Fig. 1, we give a visual example of the family \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) for two Dirac deltas.

To prove Theorem 2, we made use of the rotation invariance property of \(W_2^2\). This property, however, does not hold for \(W_p^p\), which prevents us from expressing \(W_p^p\) using a radiant formula. However, we bypass this issue by defining a rotation-averaged version of the \(W_p\) distance as follows
$$RW_p^p(\mu ,\nu ):=\frac{1}{2\pi }\int _0^{2\pi }W_p^p\big (\mu ^{(\theta )},\nu ^{(\theta )}\big )\,d\theta .\qquad \qquad (9)$$
We notice that \(RW_2(\mu ,\nu )=W_2(\mu ,\nu )\) for every \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\).
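The identity \(RW_2=W_2\) follows directly from the rotational invariance of \(W_2^2\) used in the proof of Theorem 2; assuming the rotation-averaged definition of \(RW_p^p\), the one-line computation reads:

```latex
RW_2^2(\mu,\nu)
  = \frac{1}{2\pi}\int_0^{2\pi} W_2^2\bigl(\mu^{(\theta)},\nu^{(\theta)}\bigr)\,d\theta
  = \frac{1}{2\pi}\int_0^{2\pi} W_2^2(\mu,\nu)\,d\theta
  = W_2^2(\mu,\nu),
```

since \(W_2^2(\mu ^{(\theta )},\nu ^{(\theta )})=W_2^2(\mu ,\nu )\) for every \(\theta \), so the integrand is constant.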

### Proposition 3

The function \(RW_p\) defined in (9) is a distance over \({\mathcal {P}}({\mathbb {R}}^2)\), and it is invariant under rotation of the coordinates. Furthermore, it holds

where \((\circ )_+\) is the positive part function, \(K_p:=2\int _0^{2\pi }\mid \cos (x)\mid ^p dx\) and

Thus, up to a constant, \(RW_p^p(\mu ,\nu )\) is dominated by \(W_p^p\) and \(W_{1,p}\). Moreover, the first upper bound is tight.

### Proof

We divide the proof of the proposition into three parts.

\(RW_p\) **is invariant under rotations** It follows from the fact that \(RW_p\) is defined as the average of the costs with respect to all the possible choices of coordinates. Indeed, given \(\phi \in [0,2\pi ]\), let \(\mu ^{(\phi )}=(R_\phi )_\#\mu \) and \(\nu ^{(\phi )}=(R_\phi )_\#\nu \). Then, it holds \((\mu ^{(\phi )})^{(\theta )}=\mu ^{(\theta +\phi )}\); thus
$$RW_p^p(\mu ^{(\phi )},\nu ^{(\phi )})=\frac{1}{2\pi }\int _0^{2\pi }W_p^p\big (\mu ^{(\theta +\phi )},\nu ^{(\theta +\phi )}\big )\,d\theta =\frac{1}{2\pi }\int _{\phi }^{2\pi +\phi }W_p^p\big (\mu ^{(\theta ')},\nu ^{(\theta ')}\big )\,d\theta '=RW_p^p(\mu ,\nu ),$$
where we used the change of variable \(\theta '=\theta +\phi \).

\(RW_p\) **is a distance** First, notice that \(RW_p\) is symmetric since \(W_p\) is symmetric. Similarly, if \(\mu =\nu \), we have that \(W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=0\) for every \(\theta \), thus \(RW_p^p(\mu ,\nu )=0\). Conversely, since \(W_p^p(\mu ,\nu )\ge 0\), we have that \(RW_p^p(\mu ,\nu )=0\) if and only if \(W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=0\) for almost every \(\theta \in [0,2\pi ]\), hence \(\mu =\nu \). To conclude, we prove the triangle inequality. Let \(\mu ,\nu \), and \(\zeta \) be elements of \({\mathcal {P}}({\mathbb {R}}^2)\). From Minkowski's inequality [21], we have that

which concludes the second part of the proof.

\(RW_p\) **is dominated by** \(W_p\) **and** \(W_{1,p}\) Let us consider \(x,y\in {\mathbb {R}}^2\). Let \(\rho \) and \(\phi \) be the polar coordinates of \(x-y\), so that \(x-y=\rho (\cos (\phi ),\sin (\phi ))\). We then have

We thus infer

Let \(\gamma \in \Pi (\mu ,\nu )\) be an optimal transportation plan between \(\mu \) and \(\nu \) with respect to the *p*-th power of the Euclidean metric, that is \(d(x,y)=\mid \mid x-y \mid \mid ^p_2\), so that

Finally, we have

where

To conclude the proof, we recall the classic inequality

The tightness of the first inequality in (10) follows by considering two Dirac deltas. \(\square \)
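The constant \(K_p=2\int _0^{2\pi }\mid \cos (x)\mid ^p dx\) appearing in Proposition 3 is easy to evaluate numerically; a quick sketch (for \(p=2\) one has \(\int _0^{2\pi }\cos ^2(x)\,dx=\pi \), so \(K_2=2\pi \)):

```python
import math

def K(p, n=100000):
    """Rectangle-rule approximation of K_p = 2 * int_0^{2pi} |cos(x)|^p dx.
    The integrand is periodic, so a uniform grid converges very quickly."""
    h = 2 * math.pi / n
    return 2 * h * sum(abs(math.cos(k * h)) ** p for k in range(n))
```

For example, `K(2)` agrees with \(2\pi \approx 6.283\), and `K(1)` with \(2\cdot 4=8\).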

Since \(RW_p^p\) is a rotation invariant distance, we are able to express it through a radiant formula.

### Theorem 4

Let \(p\ge 1\). Let \(\mu \) and \(\nu \) be two measures supported over \({\mathbb {R}}^2\).

Then, there exists a family of measures \(\{\zeta ^{(\theta )}\}_{\theta \in [0,\pi ]}\) such that

### Proof

Given any \(\theta \in [0,2\pi ]\), from Proposition 1, we have that there exists a couple of measures \(\zeta ^{(\theta )}\) and \(\eta ^{(\theta )}\) such that
$$W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=\int _{{\mathbb {R}}}W_p^p\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\big )\,d\mu ^{(\theta )}_2+\int _{{\mathbb {R}}}W_p^p\big (\mu ^{(\theta )}_{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1}\big )\,d\mu ^{(\theta )}_1.$$
By taking the average over \(\theta \), we conclude the thesis. \(\square \)

*Relation with the sliced Wasserstein distance*

We now highlight how the Radiant Formula allows us to retrieve bounds on the Sliced Wasserstein distance in terms of the classic Wasserstein distance.

### Theorem 5

Given two probability measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\), we have
$$W_2^2(\mu ,\nu )\ge 2\,SW_2^2(\mu ,\nu ).$$
Moreover, the bound is tight. Similarly, it holds
$$RW_p^p(\mu ,\nu )\ge 2\,SW_p^p(\mu ,\nu ).$$
### Proof

It follows from the convexity of the \(W_2^2\) distance [21, Theorem 4.8]. Indeed, from the Radiant Formula (5) we know that

then, following the notation in [21], if we set \(\lambda =\mu ^{(\theta )}_2\), \(\mu =\mu _{\mid x_2^{(\theta )}}\) and \(\nu =\zeta _{\mid x_2^{(\theta )}}^{(\theta )}\), we conclude

where the equality (12) comes from the fact that each \(\zeta ^{(\theta )}\in \Pi (\mu ^{(\theta )}_2,\nu ^{(\theta )}_1)\). To prove the tightness, it suffices to consider the measures \(\mu =\delta _{(0,0)}\) and \(\nu =\delta _{(1,1)}\).

By the same argument, we infer the bound on \(SW_p^p\). \(\square \)
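For the pair used in the tightness argument, \(\mu =\delta _{(0,0)}\) and \(\nu =\delta _{(1,1)}\), the computation can be carried out by hand and checked numerically: \(W_2^2(\mu ,\nu )=2\), while the projection on \(v_\theta \) sends the two atoms to \(0\) and \(\cos (\theta )+\sin (\theta )\). A small sketch verifying \(W_2^2=2\,SW_2^2\) (the two-dimensional case of the bound \(\frac{1}{d}W_2^2\ge SW_2^2\) retrieved in the proof of Theorem 7):

```python
import math

# mu = delta_(0,0), nu = delta_(1,1): the plan moves a single atom, so
# W_2^2(mu, nu) = 1**2 + 1**2 = 2.
w2_sq = 2.0

# The projection on v_theta sends the atoms to 0 and cos(theta)+sin(theta);
# averaging (cos t + sin t)**2 = 1 + sin(2t) over [0, 2pi] gives SW_2^2 = 1.
n = 100000
sw2_sq = sum((math.cos(t) + math.sin(t)) ** 2
             for t in (2 * math.pi * k / n for k in range(n))) / n

print(w2_sq, 2 * sw2_sq)  # both equal 2 up to quadrature error
```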

Finally, we use the Knothe-Rosenblatt transportation plan [19, 22] to bound the absolute error we commit by using the Sliced Wasserstein Distance over the Wasserstein Distance.

### Theorem 6

Given \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\), it holds

where \(\zeta _{KR}^{(\theta )}\) is the marginal of Knothe-Rosenblatt transportation plan on the coordinates \((x^{(\theta )}_2,y^{(\theta )}_1)\). Furthermore, the upper bound in (13) is tight.

### Proof

Let \(V^{(\theta )}\) be a base of the space and let \(\gamma _{KR}^{(\theta )}\) and \(\zeta _{KR}^{(\theta )}\) be the Knothe-Rosenblatt transportation plan and its projection over \((x_2^{(\theta )},y_1^{(\theta )})\), respectively.^{Footnote 3} Then, by definition of the Knothe-Rosenblatt transportation plan, we have

where \(W^2_{KR}(\mu ^{(\theta )},\nu ^{(\theta )})\) is the cost of the Knothe-Rosenblatt transportation plan according to the squared Euclidean distance. Since for every \(\theta \in [0,2\pi ]\) it holds \(W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})\le W_2^2(\mu ,\nu )\), we infer

for any \(\theta \in [0,2\pi )\). Finally, by taking the average over \([0,2\pi )\), we find

and, hence

which concludes the first part of the proof.

To prove the tightness, it suffices to consider \(\mu =\delta _{(0,0)}\) and any measure \(\nu \). In this case, the Knothe-Rosenblatt transportation plan between \(\mu \) and \(\nu \) is optimal, thus the inequality in (15) is an equality for every \(\theta \in [0,2\pi ]\). \(\square \)
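For empirical measures whose atoms have pairwise distinct first coordinates, the Knothe-Rosenblatt plan used above reduces to lexicographic sorting: the marginals on \(x_1\) are matched monotonically, and then each (atomic) conditional given \(x_1\) is matched monotonically as well. A minimal sketch of its cost (names ours; the KR plan is feasible but generally suboptimal, so this is an upper bound on \(W_2^2\)):

```python
def knothe_rosenblatt_cost_2d(xs, ys):
    """Squared-Euclidean cost of the Knothe-Rosenblatt coupling between two
    empirical measures on R^2 with the same number of equally weighted atoms.

    Assumes distinct first coordinates, so sorting lexicographically matches
    the x_1-marginals monotonically and then the conditionals given x_1.
    """
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum((a1 - b1) ** 2 + (a2 - b2) ** 2
               for (a1, a2), (b1, b2) in pairs) / len(xs)
```

For example, `knothe_rosenblatt_cost_2d([(0, 0), (1, 0)], [(0, 1), (1, 1)])` returns `1.0`: each atom is lifted vertically by one unit.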

*The extension to higher dimensional spaces*

To conclude, we extend our results to the case in which the measures are supported over a higher dimensional space. We denote with *d* the dimension of the space, so that \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\). Moreover, let \({\mathcal {R}}_d\) be the set of all the rotations from \({\mathbb {R}}^d\) to \({\mathbb {R}}^d\). Given \(R\in {\mathcal {R}}_d\), let us define \(e^{(R)}_i=R(e_i)\), where \(e_i\) is the *i*-th vector of the canonical basis of \({\mathbb {R}}^d\). It is easy to see that \(\{e_i^{(R)}\}_{i=1,\dots ,d}\) is an orthonormal basis of \({\mathbb {R}}^d\); we use \(x^{(R)}_1,x_2^{(R)},\dots ,x^{(R)}_d\) to denote the coordinates of \({\mathbb {R}}^d\) with respect to the basis \(\{e^{(R)}_i\}\). Finally, we set \(\rho \) to be the uniform probability distribution over \({\mathcal {R}}_d\); since the set of rotations can be identified with a compact subset of the orthogonal matrices, this measure is well-defined.

Given a couple of measures \(\mu \) and \(\nu \), the SWD is defined as follows
$$SW_p^p(\mu ,\nu ):=\int _{{\mathbb {S}}^{(d-1)}}W_p^p\big (\mu _v,\nu _v\big )\,d\sigma (v),$$

where \({\mathbb {S}}^{(d-1)}\) is the set of all directions in \({\mathbb {R}}^d\), \(\sigma \) is the uniform probability distribution over \({\mathbb {S}}^{(d-1)}\), and \(\mu _v\) is the marginal of \(\mu \) over the span of \(v\in {\mathbb {S}}^{(d-1)}\). In what follows, we consider the equivalent formulation
$$SW_p^p(\mu ,\nu )=\int _{{\mathcal {R}}_d}W_p^p\big (\mu ^{(R)}_1,\nu ^{(R)}_1\big )\,d\rho (R),\qquad \qquad (17)$$
Indeed, given \(v,v'\in {\mathbb {S}}^{(d-1)}\), let us define \(T_v=\{R\in {\mathcal {R}}_d\;\;\text {s.t.}\;\; R(e_1)=v\}\) and, similarly, \(T_{v'}=\{R\in {\mathcal {R}}_d\;\;\text {s.t.}\;\; R(e_1)=v'\}\). Then, it holds

In fact, let \(S\in {\mathcal {R}}_d\) be such that \(S(v)=v'\); we then define \({\mathcal {S}}:T_v\rightarrow T_{v'}\) as follows

It is easy to see that \({\mathcal {S}}\) is a bijection. Moreover, since it is induced by a rotation, its determinant is equal to 1; this, combined with the fact that \(\rho \) is a uniform distribution, allows us to retrieve (17). Thus, in the following, when we refer to the SWD, we will refer to the one defined in (17).
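The formulation over rotations can be turned into a Monte-Carlo estimator in any dimension: sampling \(R\) from \(\rho \) and keeping only \(e^{(R)}_1=R(e_1)\) amounts to sampling a uniform direction on \({\mathbb {S}}^{(d-1)}\), which is conveniently done by normalizing a standard Gaussian vector. A sketch (names ours):

```python
import math
import random

def random_direction(d, rng):
    """Uniform direction on S^{d-1}: normalize a standard Gaussian vector
    (the Gaussian law is rotation invariant, so the direction is uniform)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]

def sliced_w2_squared_mc(xs, ys, n_dirs=4000, seed=0):
    """Monte-Carlo estimate of SW_2^2 between two empirical measures on R^d:
    project both point clouds on random lines and average the squared
    one-dimensional W_2 distances (sorting is optimal in one dimension)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_dirs):
        v = random_direction(len(xs[0]), rng)
        px = sorted(sum(a * b for a, b in zip(x, v)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, v)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(xs)
    return total / n_dirs
```

For \(\mu =\delta _0\) and \(\nu =\delta _y\) one has \(SW_2^2={\mathbb {E}}_v[(v\cdot y)^2]=\Vert y\Vert _2^2/d\), which the estimator reproduces up to sampling noise.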

First, it is easy to see that Proposition 1 and its proof can be adapted to the \({\mathbb {R}}^d\) case. In particular, we have the following.

### Corollary 1

For every \(p\ge 1\) and for every \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\) there exists a family of *d* measures \(\{\zeta _i\}_{i=1,\dots ,d}\in {\mathcal {P}}({\mathbb {R}}^d)\) such that
$$W_p^p(\mu ,\nu )=\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_p^p\big (\mu _{\mid x_{-i}},(\zeta _i)_{\mid x_{-i}}\big )\,d\mu _{-i},\qquad \qquad (18)$$
and \(\mu _{-i}=(\zeta _{i})_{-i}\) for every \(i=1,\dots ,d\), where \(x_{-i}\in {\mathbb {R}}^{d-1}\) is the vector *x* without the *i*-th coordinate and \(\mu _{-i}\) is the marginal of the measure \(\mu \) on all the coordinates but the *i*-th one, i.e. on \((x_1,\dots ,x_{i-1},x_{i+1},\dots ,x_d)\).

### Proof

It follows from the same argument used in the proof of Proposition 1.

Indeed, let \(\gamma \) be the optimal transportation plan between \(\mu \) and \(\nu \).

We then define \(\zeta _i\) as the projection of \(\gamma \) on the coordinates \((x_1,\dots ,x_{i-1},y_i,x_{i+1},\dots ,x_d)\) for every \(i\in \{1,2,\dots ,d\}\).

Then, using again the characterization shown in [20], we infer (18) and thus the thesis. \(\square \)

Using the characterization of Corollary 1, we are able to extend the Radiant Formula to the higher dimensional case. The same goes for the bounds presented in Theorems 5 and 6.

### Theorem 7

Given any couple of measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), it holds

Furthermore, for every couple of measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), it holds \(\frac{1}{d}W_2^2(\mu ,\nu )\ge SW_2^2(\mu ,\nu )\) and

where \(\zeta ^{(R)}_{KR}\) is the marginal of the Knothe-Rosenblatt transportation plan over the coordinates \((y^{(R)}_1,x^{(R)}_2\dots ,x^{(R)}_d)\) and \(\nu _{(x_2^{(R)},\dots ,x_d^{(R)})}\) is the marginal of \(\nu \) over \((x_2^{(R)},\dots ,x_d^{(R)})\).

Both bounds (19) and (20) are tight.

Moreover the same bounds can be inferred for \(RW_p^p\).

### Proof

For the sake of simplicity, we just show how to extend the proof of Theorem 2 to prove identity (19). Indeed, the proof of (20) follows by applying the same argument to the proof of Theorem 6. The same goes for the bounds on \(RW_p^p\).

For every \(R\in {\mathcal {R}}_d\), Corollary 1 allows us to find a set of *d* measures \(\{\zeta _i^{(R)}\}_{i=1,\dots ,d}\) such that

By taking the average with respect to \(\rho \) of both sides of the equation we get

Let \(S_i\in {\mathcal {R}}_d\) be a rotation such that \(S_i(e_1)=e_i\) holds. Then we have \(e^{(R)}_i=R(S_i(e_1))\) and, therefore,

where the last equality comes from the fact that every \(S_i\) induces a bijective map from \({\mathcal {R}}_d\) to \({\mathcal {R}}_d\) whose determinant is equal to 1 and \(\rho \) is a uniform distribution.

Using convexity, we retrieve the bound \(\frac{1}{d}W_2^2(\mu ,\nu )\ge SW_2^2(\mu ,\nu )\). \(\square \)
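The tightness of the bound \(\frac{1}{d}W_2^2\ge SW_2^2\) can be seen on two Dirac deltas, \(\mu =\delta _0\) and \(\nu =\delta _y\): using \({\mathbb {E}}_v[v_iv_j]=\delta _{ij}/d\) for \(v\) uniform on \({\mathbb {S}}^{(d-1)}\), one computes

```latex
SW_2^2(\delta_0,\delta_y)
  = \mathbb{E}_{v}\bigl[(v\cdot y)^2\bigr]
  = \sum_{i,j=1}^d y_i y_j\,\mathbb{E}_v[v_i v_j]
  = \frac{\|y\|_2^2}{d}
  = \frac{1}{d}\,W_2^2(\delta_0,\delta_y),
```

so the inequality is attained for every \(d\ge 2\).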

Finally, we study a more general class of Sliced Wasserstein distances. Let \({\mathcal {H}}_k\) be the set of all the *k*-dimensional hyperplanes in \({\mathbb {R}}^d\). Given \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), we define the *k*-Sliced Wasserstein Distance as
$$SW_{p,k}^p(\mu ,\nu ):=\int _{{\mathcal {H}}_k}W_p^p\big (\mu _H,\nu _H\big )\,d{\textbf{H}}(H),$$
where \(\mu _H\) (\(\nu _H\)) is the marginal of \(\mu \) (\(\nu \), respectively) over the hyperplane \(H\in {\mathcal {H}}_k\) and \({\textbf{H}}\) is the uniform probability distribution over the space of *k*-dimensional hyperplanes. Roughly speaking, the *k*-Sliced Wasserstein Distance is a variant of *SW* that projects the two measures over all the *k*-dimensional sub-spaces of \({\mathbb {R}}^d\) rather than on all the 1-dimensional sub-spaces. This class of metrics is a natural extension of the \({\mathcal {S}}_k\) metrics introduced in [23].

### Theorem 8

Let \(\mu \) and \(\nu \) be two probability measures over \({\mathbb {R}}^d\) and let *k* be an integer such that \(k<d\).

Then, it holds

Moreover, the bound is tight.

### Proof

Let us consider *H*, a *k*-dimensional hyperplane.

Moreover, let \({\mathcal {T}}:{\mathcal {R}}_d\rightarrow {\mathcal {H}}_k\) be defined as
$${\mathcal {T}}(R):=R(H_0),$$
where \(H_0\in {\mathcal {H}}_k\) is the *k*-dimensional hyperplane generated by \(\{e_1,\dots ,e_k\}\), i.e.
$$H_0=\{x\in {\mathbb {R}}^d\;\;\text {s.t.}\;\; x_{k+1}=\dots =x_d=0\}.$$
Given \(H,H'\in {\mathcal {H}}_k\), let \(S\in {\mathcal {R}}_d\) be a rotation such that \(S(H)=H'\). Then, we have that for every \(R'\in {\mathcal {T}}^{-1}(H)\), the rotation \(R'':=S\circ R'\) belongs to \({\mathcal {T}}^{-1}(H')\); indeed \(R''(H_0)=S(R'(H_0))=S(H)=H'\). In particular, *S* induces a bijective map between \({\mathcal {T}}^{-1}(H)\) and \({\mathcal {T}}^{-1}(H')\).

Again, since the map is induced by a rotation and \({\textbf{H}}\) is a uniform distribution, we have that \(\rho ({\mathcal {T}}^{-1}(H))=\rho ({\mathcal {T}}^{-1}(H'))\). In particular, given a function \(f:{\mathcal {R}}_d\rightarrow {\mathbb {R}}\), it holds

Given \(R\in {\mathcal {R}}_d\), there exists a set of *d* measures in \({\mathbb {R}}^d\), namely \(\zeta ^{(R)}_i\) such that

For every \(i\in \{1,2,\dots ,d\}\), let \(S_i\in {\mathcal {R}}_d\) be such that \(S_i(e_1)=e_i\). By definition, given \(e^{(R)}_i\), it holds true that \(e_i^{(R)}=R(e_i)=R(S_i(e_1))=e^{(R\circ S_i)}_1\).

Thus, we get

where the last equality follows from the fact that, for every \(i\in \{1,2,\dots ,d\}\), the function \({\mathcal {S}}_i:R\rightarrow R\circ S_i\) is bijective and its Jacobian’s determinant is equal to 1.

Given \(R\in {\mathcal {R}}_d\) and \(\{e^{(R)}_1,\dots ,e^{(R)}_d\}\) its related base of \({\mathbb {R}}^d\), we have that

where \(c^{(R)}_{1:k}(x,y):=\sum _{i=1}^k(x^{(R)}_i-y^{(R)}_i)^2\) and \(c^{(R)}_{k:d}(x,y):=\sum _{i=k+1}^d(x^{(R)}_i-y^{(R)}_i)^2\). Again, by Corollary 1, we can then find a couple of measures \(\Psi \in \Pi (\mu ^{(R)}_{>k},\nu ^{(R)}_{\le k})\) and \(\Theta \in \Pi (\mu ^{(R)}_{\le k},\nu ^{(R)}_{> k})\) such that

where \(\mu ^{(R)}_{\le k}\) and \(\mu ^{(R)}_{> k}\) are the marginals of \(\mu \) over the first *k* coordinates and the last \((d-k)\) coordinates, respectively.

Since \(c_{1:k}\) is also separable, for every \(R\in {\mathcal {R}}_d\) and for every \((x^{(R)}_{k+1},\dots ,x^{(R)}_{d})\) we can further split the Wasserstein distance \(W_{c^{(R)}_{1:k}}^2(\mu _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d},\Psi _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d})\) and obtain

Thus, by convexity, we infer that

where \(H=R(H_0)\). If we take the average over all the possible rotations, we get

Using identity (23), we get

By the same argument used before, we can simplify the sum

Finally, we have that

which allows us to conclude (22).

Lastly, the tightness property follows by considering two Dirac deltas. \(\square \)

## 3 Conclusion

In this note, we proposed an alternative representation of the Wasserstein distance, the Radiant formula, and showcased its connections with the classic Sliced Wasserstein Distance, which we used to prove bounds on Sliced Wasserstein Distances in terms of the classic Wasserstein distances. Furthermore, we used the Knothe-Rosenblatt heuristic to prove a bound on the absolute error.

The field of Sliced Wasserstein Distances keeps flourishing and producing different alternative versions [23,24,25]. We believe the Radiant formula is a useful tool to study the relationships between established Sliced-like distances and new ones.

## Data availability

Data availability is not applicable to this article as no new data were created or analysed in this study.

## Notes

1. These measures do depend on \(\mu \), \(\nu \), and *p* in general; however, to lighten the notation, we drop these indexes.

2. We recall that \((R_\theta )_\#\mu \) is defined as \((R_\theta )_\#\mu (A)=\mu (R_\theta ^{-1}(A))\) for every Borel set \(A\subset {\mathbb {R}}^2\).

3. Notice that the Knothe-Rosenblatt plan does depend on the choice of \(V^{(\theta )}\).

## References

1. Kantorovich, L.V.: On the translocation of masses. J. Math. Sci. **133**(4), 1381–1382 (2006)
2. Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Graph. (TOG) **34**(4), 1–11 (2015)
3. Gangbo, W., McCann, R.J.: Shape recognition via Wasserstein distance. Q. Appl. Math., 705–737 (2000)
4. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, pp. 214–223 (2017)
5. Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514 (2018)
6. Pan, Z., Yu, W., Wang, B., Xie, H., Sheng, V.S., Lei, J., Kwong, S.: Loss functions of generative adversarial networks (GANs): opportunities and challenges. IEEE Trans. Emerg. Top. Comput. Intell. **4**(4), 500–522 (2020)
7. Chakraborty, S., Paul, D., Das, S.: Hierarchical clustering with optimal transport. Stat. Probab. Lett. **163**, 108781 (2020)
8. Auricchio, G., Bassetti, F., Gualandi, S., Veneroni, M.: Computing Wasserstein barycenters via linear programming. In: Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 16th International Conference, CPAIOR 2019, Thessaloniki, Greece, June 4–7, 2019, Proceedings 16, pp. 355–363. Springer (2019)
9. Ling, H., Okada, K.: An efficient earth mover's distance algorithm for robust histogram comparison. IEEE Trans. Pattern Anal. Mach. Intell. **29**(5), 840–853 (2007)
10. Auricchio, G., Bassetti, F., Gualandi, S., Veneroni, M.: Computing Kantorovich-Wasserstein distances on \(d\)-dimensional histograms using \((d+1)\)-partite graphs. Adv. Neural Inf. Process. Syst. **31** (2018)
11. Altschuler, J., Niles-Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. Adv. Neural Inf. Process. Syst. **30** (2017)
12. Schmitzer, B.: Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM J. Sci. Comput. **41**(3), 1443–1481 (2019)
13. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. Adv. Neural Inf. Process. Syst. **26** (2013)
14. Auricchio, G., Codegoni, A., Gualandi, S., Toscani, G., Veneroni, M.: The equivalence of Fourier-based and Wasserstein metrics on imaging problems. Rend. Lincei **31**(3), 627–649 (2020)
15. Auricchio, G., Codegoni, A., Gualandi, S., Zambon, L.: The Fourier discrepancy function. Commun. Math. Sci. **21**(3), 627–639 (2023)
16. Kolouri, S., Zou, Y., Rohde, G.K.: Sliced Wasserstein kernels for probability distributions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5258–5267 (2016)
17. Bonneel, N., Coeurjolly, D.: SPOT: sliced partial optimal transport. ACM Trans. Graph. (TOG) **38**(4), 1–13 (2019)
18. Knothe, H.: Contributions to the theory of convex bodies. Michigan Math. J. **4**(1), 39–52 (1957)
19. Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. **23**(3), 470–472 (1952). https://doi.org/10.1214/aoms/1177729394
20. Auricchio, G.: On the Pythagorean structure of the optimal transport for separable cost functions. Rend. Lincei Mat. Appl. **34**(4), 745–771 (2024)
21. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)
22. Knothe, H.: Contributions to the theory of convex bodies. Michigan Math. J. **4**(1), 39–52 (1957). https://doi.org/10.1307/mmj/1028990175
23. Paty, F.-P., Cuturi, M.: Subspace robust Wasserstein distances. In: International Conference on Machine Learning, pp. 5072–5081. PMLR (2019)
24. Deshpande, I., Hu, Y.-T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., Zhao, Z., Forsyth, D., Schwing, A.G.: Max-sliced Wasserstein distance and its use for GANs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10648–10656 (2019)
25. Nguyen, K., Ho, N., Pham, T., Bui, H.: Distributional sliced-Wasserstein and applications to generative modeling. arXiv preprint arXiv:2002.07367 (2020)

## Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

## Author information

### Contributions

The authors confirm contribution to the paper as follows: study conception, design, and writing: Gennaro Auricchio. All authors reviewed the results and approved the final version of the manuscript.

## Ethics declarations

### Conflict of interest

On behalf of all authors, the corresponding author states that there is no Conflict of interest.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Auricchio, G. A note on the Radiant formula and its relations to the sliced Wasserstein distance.
*Aequat. Math.* **98**, 1317–1332 (2024). https://doi.org/10.1007/s00010-024-01049-1
