1 Introduction

In light of the widespread use of the internet and multimedia systems, digital data is increasingly susceptible to unauthorized distribution and misuse, resulting in a significant infringement of the intellectual property rights of data creators. To tackle this issue, digital watermarking technology [2, 11, 18, 23, 33, 43, 45, 49] has emerged and found extensive applications in the discipline of information security. The application domains of digital watermarking encompass various fields such as image, video, audio, and more. This paper, in particular, concentrates on digital image watermarking (DIW) technology [15, 22, 24, 26, 31, 32, 41, 48]. DIW technology is characterized by three key performance metrics: invisibility, robustness, and watermark capacity. These metrics are interconnected and impose mutual constraints. Typically, DIW technology comprises two primary components: watermark embedding and watermark detection. Watermark embedding is responsible for the incorporation of watermark signals into an image, while watermark detection is employed to ascertain the presence of watermark signals within an image.

During the process of watermark embedding, two main tasks need to be accomplished. The first task is to establish a robust embedding domain. Typically, there are three embedding domains, namely the spatial domain [21, 27, 42, 44], the frequency domain [1, 13, 14, 20, 58], and the hybrid domain [34, 47, 55, 56, 59]. Among these, the hybrid domain, especially methods based on singular value decomposition, is considered the most robust. However, hybrid domain methods based on singular value decomposition face the issue of false positives, which requires a solution [17]. The second task is to determine the embedding rule, with the additive rule \(Y = X + \alpha w\) [4, 5, 9] and the multiplicative rule \(Y = X + \alpha Xw\) [7, 8, 12, 35, 36] being the most common. In these rules, \(X\) and \(Y\) represent the original data and the watermarked data, \(w\) is the watermark sequence, and \(\alpha\) is the embedding strength. Increasing \(\alpha\) enhances robustness, but its upper limit is constrained by the demands of the human visual system (HVS). Because the multiplicative rule adapts to image content, it has gained popularity in the field of watermarking.
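The contrast between the two rules can be sketched in a few lines (plain Python; the function names and coefficient values are illustrative, not from the paper):

```python
def embed_additive(x, w, alpha):
    """Additive rule Y = X + alpha*w: the change ignores the host content."""
    return x + alpha * w

def embed_multiplicative(x, w, alpha):
    """Multiplicative rule Y = X + alpha*X*w = X*(1 + alpha*w): the change
    scales with the host coefficient, which is why it adapts to image content."""
    return x * (1 + alpha * w)

# a small and a large host coefficient, same bit, same strength
for x in (10.0, 100.0):
    print(embed_additive(x, +1, 0.1), embed_multiplicative(x, +1, 0.1))
```

The additive rule shifts every coefficient by the same amount, whereas the multiplicative rule perturbs large (perceptually busy) coefficients more, in line with HVS masking.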

Watermark detection methods can be broadly categorized into two main types: correlation-based techniques and statistical techniques. Correlation-based watermark detection determines whether the host image contains a watermark signal by analyzing the correlation between the extracted watermark signal and the original watermark signal. This approach performs well when the carrier image follows a Gaussian distribution; however, previous studies have shown that in many cases the carrier image obeys a Gaussian distribution neither in the spatial domain nor in the transform domain, which limits the effectiveness of correlation-based techniques. In contrast, statistical watermark detection techniques, in which detection is performed through the collaboration of statistical models and decision rules, are not constrained by this problem. The accuracy of such methods depends on how accurately the statistical model captures the modeled object, namely the watermark embedding domain.

The study of achieving a balance between invisibility, robustness, and watermark capacity holds significant importance in the field of watermarking technology. Statistical watermarking techniques based on human visual characteristics and carrier statistical characteristics have the potential to strike this balance effectively, and numerous watermarking methods based on statistical models have been proposed. In [14], a multiplicative watermark scheme is developed in the contourlet domain. The contourlet coefficients are modeled using the t-location scale (TLS) distribution, and the authors use the Kolmogorov–Smirnov (K-S) test to demonstrate the efficiency of modeling contourlet coefficients with the TLS. In [1], a non-additive scheme is proposed that models discrete shearlet transform (DST) coefficients using the Laplace distribution; the DST serves as the embedding domain. In [35], a blind multichannel multiplicative watermarking scheme in the sparse domain is presented, with an effective closed-form watermark detector designed using the multivariate Cauchy distribution; experimental results and theoretical analysis confirm the effectiveness of this detector. The work in [36] introduces a low-complexity watermarking detector based on the Cauchy member of the stable distribution family. The study investigates the robustness of this detector against a variety of attacks, such as noise, filtering, and compression, showing it to outperform the generalized Gaussian detector. Amini et al. [7] present a new color image watermarking scheme and an associated sparse-domain detector that detects the presence of a watermark through binary hypothesis testing; experimental results indicate that this detector performs well in terms of robustness. In [8], a scaling watermarking scheme is proposed. This scheme embeds the watermark into low-frequency wavelet coefficients and effectively describes the wavelet coefficients using a Gaussian mixture model (GMM).
A maximum-likelihood (ML) watermark detector is then proposed based on the GMM. In [10], a uniformly most powerful watermark detector for detecting weak signals in the wavelet domain is proposed. The detector models the noise distribution using the Bessel K-form (BKF) probability density function (PDF); the authors discuss the BKF detector extensively and assess its practical performance through receiver operating characteristic (ROC) analysis. The work in [37] tackles the blind watermark detection problem in the contourlet domain: it models the contourlet coefficients using the normal inverse Gaussian (NIG) distribution and designs an optimal blind watermark detector in that domain. Experiments demonstrate the robustness of this detector against attacks such as JPEG compression and Gaussian noise. Niu et al. [29] introduce a statistical color image watermarking method based on the Cauchy-Rayleigh distribution and the local quaternion polar complex exponential transform (QPCET), deriving a closed-form expression for the watermark detector from the Cauchy-Rayleigh model.

Although numerous statistical watermark detectors have been proposed, they primarily serve to determine whether an image contains watermark information. Some researchers have recognized that decoders able to extract the watermark information are of greater practical value, and have therefore developed several watermark decoders based on statistical methods. Akhaee et al. [3] propose a blind image watermarking method that is not influenced by the host signal's distribution and is applicable to any distribution in the transform domain. This method divides the host signal into two small blocks, embedding the watermark in one block while leaving the other unchanged; watermark extraction is achieved by analyzing the ratio of the sums of samples in the two blocks. In [38], a watermark decoder based on the contourlet domain, which is also the embedding domain, is introduced. This method employs the NIG distribution to model contourlet domain coefficients and designs a watermark extraction method based on the NIG distribution using the ML criterion. Experimental results show the robustness of this decoder against several attacks, including noise, rotation, cropping, filtering, and compression. Amini et al. [6] propose a blind multi-bit watermark decoder based on a wavelet-domain vector hidden Markov model (HMM). They derive a closed-form expression for the bit error rate (BER) and validate its correctness through Monte Carlo simulation. In [39], a multiplicative image watermarking scheme in the contourlet domain is introduced. Although contourlet coefficients are not Gaussian distributed within subbands, the authors argue that their local distribution closely resembles a Gaussian; they therefore use a bivariate Gaussian (BVG) distribution to model the contourlet coefficients and, based on it, design an optimal blind watermark decoder in the contourlet domain. Wang et al.
[51] propose a blind image watermark decoder based on the discrete nonsubsampled transform (DNST) domain, utilizing a vector Gaussian-Cauchy mixture model to model the singular values in the DNST domain. Several further decoding methods have been introduced in the nonsubsampled contourlet transform (NSCT) domain [53], the contourlet domain [52], and the nonsubsampled shearlet transform (NSST) domain [28]. Liu et al. [25] present a color image watermarking method based on the quaternion polar harmonic transform (QPHT) and an ML decoder, modeling the QPHT magnitudes with the BKF. In [50], a statistical watermarking approach based on local radial harmonic Fourier moments (RHFMs) magnitudes and the beta exponential distribution is introduced; a decoder for multiplicative image watermarks is designed using the beta exponential distribution and the ML criterion. Xia et al. [54] propose a robust multiplicative watermark decoder that fits the polar harmonic Fourier moments (PHFMs) magnitudes with the Weibull distribution, establishing a statistical watermark decoder with the Weibull distribution as the prior for the PHFMs magnitudes. In [57], a blind multiplicative watermark system in the curvelet domain is introduced; the NIG distribution is employed to fit the curvelet coefficients, and a watermark decoder is designed using the NIG model and the ML criterion.

The aforementioned methods provide valuable insights for the design of statistical-model-based decoders. However, they suffer from several drawbacks, such as the low robustness of the watermark embedding domain, weak model performance, and inferior decoding accuracy. To address these drawbacks, this paper proposes a watermark decoder that uses the Student's-t mixture model (SMM) to model the low-order pseudo-Zernike moments (PZMs) magnitudes in the nonsubsampled shearlet transform (NSST) domain. The primary contributions of this study are outlined as follows:

  • To enhance the robustness of the watermark embedding domain and the visual quality of watermarked images, NSST domain local PZMs magnitudes are constructed, leveraging the decomposition characteristics of NSST and the robustness of low-order PZMs.

  • To ensure accuracy in modeling the NSST-PZMs magnitudes, a two-component SMM is designed based on the analysis of the statistical characteristics of NSST-PZMs magnitudes.

  • A closed-form decoder expression is derived based on SMM, which is the prior of the NSST-PZMs magnitudes, and maximum-likelihood (ML) criteria.

The remaining sections of this paper are organized as follows: Sect. 2 offers an introduction to local NSST-PZMs magnitudes, conducts an analysis of the robustness and statistical properties of NSST-PZMs magnitudes, and introduces the utilization of SMM for modeling NSST-PZMs magnitudes. Section 3 details the watermark embedding process and provides an in-depth explanation of the decoder. In Sect. 4, the performance of the proposed method is analyzed from the perspectives of invisibility, robustness, and watermark capacity, and it is compared with some advanced methods. The paper concludes in Sect. 5.

2 Preliminaries

2.1 Pseudo-Zernike Moments (PZMs)

Pseudo-Zernike moments (PZMs) [46] appear in our scheme because they have an important property: their magnitudes are invariant to image rotation and flipping. Three further properties of PZMs are worth noting: firstly, low-order PZMs are capable of representing image features, while higher-order PZMs excel at representing image details; secondly, low-order PZMs exhibit better robustness than higher-order PZMs; and lastly, the number of magnitudes generated by PZMs is related to the order, with a \(T\)-order PZMs yielding \((T + 1) \times (2T + 1)\) magnitudes. It follows that PZMs cannot simultaneously possess strong robustness and high image representation capability.
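To make the rotation-invariance property concrete, the following plain-Python sketch computes a pseudo-Zernike moment of a small block from the standard radial-polynomial definition (a simplified discretization with hypothetical helper names, not the authors' implementation) and checks that the magnitude is unchanged under a 90° rotation:

```python
import cmath
import math

def pzm_radial(n, m, rho):
    """Pseudo-Zernike radial polynomial R_nm(rho), for 0 <= |m| <= n."""
    m = abs(m)
    val = 0.0
    for s in range(n - m + 1):
        coeff = ((-1) ** s * math.factorial(2 * n + 1 - s)
                 / (math.factorial(s) * math.factorial(n + m + 1 - s)
                    * math.factorial(n - m - s)))
        val += coeff * rho ** (n - s)
    return val

def pzm(block, n, m):
    """Order-n, repetition-m pseudo-Zernike moment of a square block whose
    pixel centres are mapped into the unit disk (simple area-element sum)."""
    N = len(block)
    acc = 0.0 + 0.0j
    for row in range(N):
        for col in range(N):
            xc = (2 * col + 1 - N) / N          # pixel centre in [-1, 1]
            yc = (2 * row + 1 - N) / N
            rho = math.hypot(xc, yc)
            if rho > 1.0:                        # keep only the unit disk
                continue
            theta = math.atan2(yc, xc)
            acc += (block[row][col] * pzm_radial(n, m, rho)
                    * cmath.exp(-1j * m * theta))
    return (n + 1) / math.pi * acc * (4.0 / (N * N))

# |PZM| is unchanged when the block is rotated by 90 degrees
block = [[(3 * i + 7 * j) % 11 for j in range(8)] for i in range(8)]
rot = [[block[j][7 - i] for j in range(8)] for i in range(8)]
print(abs(pzm(block, 2, 1)), abs(pzm(rot, 2, 1)))   # the two magnitudes agree
```

A 90° rotation maps the grid of pixel centres onto itself, so the magnitude is preserved exactly here; for arbitrary angles the invariance holds up to interpolation error.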

When constructing an embedding domain, robustness, imperceptibility, and watermark capacity must all be considered. Since low-order PZMs exhibit strong robustness and the number of invariants generated by PZMs is related to their order, a local decomposition strategy is proposed to generate sufficient invariants while exploiting the robustness of low-order PZMs. Specifically, the “non-overlapping equal-sized” strategy is used to segment the host image into multiple local blocks, and low-order PZMs are applied to these local blocks, as shown in Fig. 1. In this study, the block size is set to \(8 \times 8\), and the order of PZMs is set to 5.
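As a quick check of the segmentation, splitting a 512 × 512 array into non-overlapping 8 × 8 blocks yields 4096 local blocks (a plain-Python sketch; the function name is hypothetical):

```python
def split_blocks(img, bs=8):
    """'Non-overlapping equal-sized' segmentation of a 2-D image,
    assuming each image side is a multiple of bs."""
    h, w = len(img), len(img[0])
    return [[row[x:x + bs] for row in img[y:y + bs]]
            for y in range(0, h, bs) for x in range(0, w, bs)]

img = [[0] * 512 for _ in range(512)]
print(len(split_blocks(img)))   # 4096 blocks of size 8 x 8
```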

Fig. 1
figure 1

Schematic diagram of local decomposition strategy

Taking Couple, Airplane, Boat, Lena, and Peppers as examples, Fig. 2 displays the original images and images reconstructed based on 5-order PZMs. Based on Fig. 2, the quality of the images reconstructed using PZMs is extremely low and fails to meet the requirements of the HVS. This is because low-order PZMs can only reconstruct the features of an image and are unable to reconstruct its details.

Fig. 2
figure 2

Original images and reconstructed images based on PZMs

2.2 Nonsubsampled Shearlet Transform (NSST) Domain Local PZMs Magnitudes

The shearlet transform (ST), compared to other multiscale methods used for image representation like pyramids, wavelets, and curvelets, demonstrates superior effectiveness in capturing features of the host image in various directions. Unfortunately, the standard ST lacks translation invariance due to the downsampling operation, making it susceptible to exhibiting pseudo-Gibbs artifacts during image fusion. To tackle this issue, Easley et al. [13] introduced a nonsubsampled version of the ST, known as the nonsubsampled shearlet transform (NSST). Since the low-level NSST decomposition preserves critical image feature information, we incorporated NSST into this study to filter out less important image data. The NSST decomposition level is set to 2.

Based on the features of NSST and PZMs, we propose a construction strategy for the NSST domain local PZMs magnitudes. This strategy aims to leverage the robustness of low-order PZMs, generate sufficient invariants to accommodate watermarks, and ensure image reconstruction quality. Taking a 512 × 512 image as an example, the generation process of the NSST-PZMs magnitudes is as follows:

First, a two-level NSST decomposition is applied to the host image, resulting in high-frequency subbands at two scales, with each scale containing high-frequency subbands in four different directions.

Next, the high-frequency subbands are segmented using a “non-overlapping equal-sized” strategy, with a block size of 8 × 8.

Finally, a 5-order PZMs calculation is executed on each frequency domain block to yield the locally NSST-PZMs magnitude.

The essence of this strategy lies in utilizing NSST to filter out redundant information in the image, thereby reducing the amount of information required for PZMs reconstruction. This effectively ensures the quality of image reconstruction while fully leveraging the robustness of low-order PZMs and generating sufficient invariants. Taking a high-frequency subband obtained by NSST decomposition as an example, Fig. 3 shows the local NSST-PZMs magnitude generation process. Each high-frequency subband can generate 4096 local NSST-PZMs magnitude blocks, and assembling these blocks in an orderly manner yields the NSST-PZMs magnitude image.

Fig. 3
figure 3

Schematic diagram of local NSST-PZMs magnitude generation

Taking Couple, Airplane, Boat, Lena, and Peppers as examples, Fig. 4 displays the original images and the images reconstructed based on NSST-PZMs magnitude. Comparing the reconstructed images in Fig. 4 to those in Fig. 2, it is clear that the rebuilding quality in Fig. 4 is higher, making it challenging for the HVS to discern any differences between the reconstructed images and the original images.

Fig. 4
figure 4

Original images and reconstructed images based on NSST-PZMs

2.3 Robustness Analysis of Local NSST-PZMs Magnitudes

We employ the normalization error to measure the robustness of NSST-PZMs magnitudes, and its calculation formula is as shown in (1):

$$ E = \frac{1}{M \times N}\sum\limits_{a = 1}^{M} {\sum\limits_{b = 1}^{N} {\frac{X(a,b) - X_{\min } }{X_{\max } - X_{\min } }} } $$
(1)
$$ X(a,b) = |f(a,b) - f^{\prime}(a,b)| $$
(2)

where \(E\) represents the result of normalization error calculation, with a smaller \(E\) indicating stronger robustness. \(M \times N\) represents the size of the sample. \(f(a,b)\) represents the original image, and \(f^{\prime}(a,b)\) is the image after the attack. \(X(a,b)\) denotes the absolute difference between \(f(a,b)\) and \(f^{\prime}(a,b)\). \(X_{\min }\) represents the minimum value of \(X(a,b)\), and \(X_{\max }\) denotes the maximum value. 2000 images are randomly selected from the BOSSbase dataset as test images, and the spatial domain, NSST domain, PZMs magnitude domain, and NSST-PZMs magnitude domain are the testing subjects. Here, the NSST-PZMs magnitude domain is generated by the high-frequency subband with the highest variance in Scale 2 after the 2-level decomposition of NSST, which is the target region for hidden watermarks. The results of the normalization error experiments are summarized in Table 1. It is clear from Table 1 that, in comparison to the spatial domain, NSST domain, and PZMs domain, the normalization error of the NSST-PZMs domain is smaller. This indicates the greater resistance to attacks achieved by NSST-PZMs.

Table 1 Comparison results of normalization error
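Equations (1)-(2) translate directly into code. A minimal sketch (plain Python; the function name is hypothetical):

```python
def normalization_error(f, f_attacked):
    """Eqs. (1)-(2): normalized mean of |f - f'|; a smaller E means the
    representation is more robust to the attack."""
    diff = [abs(a - b)
            for row_a, row_b in zip(f, f_attacked)
            for a, b in zip(row_a, row_b)]
    x_min, x_max = min(diff), max(diff)
    if x_max == x_min:
        return 0.0    # degenerate case: every pixel changed by the same amount
    return sum((d - x_min) / (x_max - x_min) for d in diff) / len(diff)
```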

The NSST-PZMs magnitude image is composed of 4096 NSST-PZMs magnitude blocks, which are organized in a specific order. Based on the order and repetition of PZMs, the magnitude at the corresponding order and repetition is calculated for each NSST-PZMs magnitude block. To assess the robustness of NSST-PZMs magnitude, the attacked NSST-PZMs magnitude is compared with the unattacked NSST-PZMs magnitude for judging its performance under various attacks. The results of the robustness assessment experiment for NSST-PZMs magnitudes, conducted with 500 images, are presented in Fig. 5. As indicated by Fig. 5, the magnitudes at the partial order and repetition exhibit only slight fluctuations of less than 0.2. This observation suggests that NSST-PZMs magnitude demonstrates robustness against various attacks.

Fig. 5
figure 5

The robustness results of NSST-PZMs magnitudes

2.4 Statistical Characteristics Analysis of NSST-PZMs Magnitudes

The test subjects are generated from the high-frequency subband with the maximum variance within the second scale of NSST. The statistical characteristics of NSST-PZMs magnitudes are analyzed using frequency histograms and kurtosis. We select several hundred images, and the frequency histograms and corresponding kurtosis of six images are illustrated in Fig. 6. It is important to note that the kurtosis of a standard Gaussian distribution is 3. According to Fig. 6, NSST-PZMs magnitudes exhibit non-Gaussian characteristics with a sharp peak and a heavy tail.
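For reference, kurtosis is the fourth standardized moment; a plain-Python sketch confirms the Gaussian baseline of 3 (the sample here is synthetic, not NSST-PZMs data):

```python
import random

def kurtosis(xs):
    """Fourth standardized moment: 3 for a Gaussian, larger values indicate
    the 'sharp peak, heavy tail' shape seen in the histograms."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

random.seed(0)
gauss_sample = [random.gauss(0.0, 1.0) for _ in range(50000)]
print(kurtosis(gauss_sample))   # close to the Gaussian reference value of 3
```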

Fig. 6
figure 6

Frequency histograms and kurtosis of NSST-PZMs magnitudes

2.5 Statistical Modeling of NSST-PZMs Magnitudes

One of the key aspects of building a high-quality decoder is the design of an effective and simple statistical model. In this study, “effective” means the model can accurately capture the NSST-PZMs magnitudes, while “simple” implies that the model has a limited number of parameters. The Student's-t distribution features a shape-adjustable parameter \(\nu\), which gives the distribution a high degree of flexibility: it approximates the normal distribution when \(\nu \to + \infty\) and exhibits heavy tails when \(\nu \to 0\). Given the sharp peak and heavy tail of the NSST-PZMs magnitudes, the Student's-t mixture model (SMM) [30], which comprises multiple Student's-t distributions, is introduced. The PDF of the SMM is represented as follows:

$$ f(x_{i} ;\mu_{k} ,\Sigma_{k} ,v_{k} ) = \sum\limits_{k = 1}^{K} {\eta_{k} \frac{\Gamma \left(\frac{v_{k} + d}{2}\right)|\Sigma_{k} |^{ - \frac{1}{2}} }{(v_{k} \pi )^{\frac{d}{2}} \Gamma \left(\frac{v_{k} }{2}\right)\left[1 + v_{k}^{ - 1} (x_{i} - \mu_{k} )^{T} \Sigma_{k}^{ - 1} (x_{i} - \mu_{k} )\right]^{\frac{v_{k} + d}{2}} }} $$
(3)

where \(K\) represents the number of components; \(\eta_{k}\) is the weight of the \(k\)-th component, with \(\sum\limits_{k = 1}^{K} {\eta_{k} } = 1\); \(v_{k}\) denotes the degrees of freedom of the \(k\)-th component; \(d\) is the dimension; \(\Sigma_{k}\) is the covariance matrix of the \(k\)-th component; and \(\mu_{k}\) is its mean. Given the “sharp peak and heavy tail” characteristics of NSST-PZMs magnitudes, \(K\) is set to 2, so the SMM consists of two Student's-t distributions: one component models the “sharp peak” while the other models the “heavy tail.”
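For the univariate case (\(d = 1\)), Eq. (3) reduces to a weighted sum of univariate Student's-t densities. A minimal sketch (plain Python; the two-component parameters below are purely illustrative):

```python
import math

def student_t_pdf(x, mu, sigma2, v):
    """Univariate Student's-t density (Eq. (3) with d = 1, Sigma = sigma2)."""
    c = (math.gamma((v + 1) / 2)
         / (math.sqrt(v * math.pi * sigma2) * math.gamma(v / 2)))
    return c * (1 + (x - mu) ** 2 / (v * sigma2)) ** (-(v + 1) / 2)

def smm_pdf(x, components):
    """Mixture density; components = [(eta_k, mu_k, sigma2_k, v_k), ...]
    with the weights eta_k summing to 1."""
    return sum(eta * student_t_pdf(x, mu, s2, v)
               for eta, mu, s2, v in components)

# illustrative two-component SMM: a heavy-tailed term plus a broader term
smm = [(0.6, 0.0, 0.5, 2.0), (0.4, 0.0, 4.0, 10.0)]
print(smm_pdf(0.0, smm))   # density at the peak
```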

Six different distributions are selected for analysis, including the Gamma distribution, Rayleigh distribution, Exponential distribution, Gaussian distribution, Weibull distribution, and SMM. The NSST-PZMs magnitude images, generated from the high-frequency subband of scale 2 after NSST 2-level decomposition of Lena, Barbara, and Couple images, are used as test objects. The K-S test is employed to measure the goodness of fit of these distributions to the test objects, where a smaller K-S value indicates a better fit to the data. The K-S experimental results are summarized in Table 2. In comparison to the other distributions, the SMM displays the smallest K-S value, suggesting that the SMM is particularly well-suited for modeling NSST-PZMs magnitudes. This conclusion can be extended to other natural images as well.

Table 2 Results of KS experiments
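The one-sample K-S statistic used here measures the largest gap between the empirical CDF of the magnitudes and the CDF of a fitted model; a minimal sketch (plain Python; the function name is hypothetical):

```python
def ks_statistic(sample, model_cdf):
    """One-sample Kolmogorov-Smirnov statistic: the maximum distance between
    the empirical CDF of the data and a candidate model CDF (smaller = better fit)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf_val = model_cdf(x)
        d = max(d, abs((i + 1) / n - cdf_val), abs(i / n - cdf_val))
    return d

# toy example against the uniform CDF on [0, 1]
print(ks_statistic([0.25, 0.75], lambda x: x))   # 0.25
```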

Furthermore, using the Peppers as an example, the NSST-PZMs magnitude images generated from scale 2 are taken as the target objects. Figure 7 displays the fit of the Gamma distribution, Rayleigh distribution, Exponential distribution, Gaussian distribution, Weibull distribution, and SMM to the target objects. Maximum-likelihood estimation is employed to estimate the parameters of the Gamma, Rayleigh, Exponential, Gaussian, and Weibull distributions, while the parameters of the SMM are estimated using the Expectation–Maximization (EM) method [30]. As depicted in Fig. 7, in comparison to the other distributions, the SMM exhibits the closest fit to the target objects. This further demonstrates that the SMM is suitable for modeling NSST-PZMs magnitudes.

Fig. 7
figure 7

Fitting results of different distributions to the sample data

3 Proposed Method

3.1 Digital Watermark Embedding

Let \(I = \{ f(i,j),\;0 \le i < M,\;0 \le j < N\}\) represent the carrier image, and let \({\mathbf{w}} = \{ w_{1} , \ldots ,w_{l} , \ldots ,w_{L} \}\) be a binary watermark sequence of length \(L\) generated using a pseudo-random sequence (PRS), where \(w_{l} \in \{ - 1, + 1\}\). Figure 8 illustrates the process of watermark embedding, detailed as follows:

Fig. 8
figure 8

Generation process of watermarked image

Step 1: A 2-level NSST decomposition is applied to \(I\), resulting in four high-frequency subbands at each of the two scales. The high-frequency subband with the highest variance at scale 2 is selected as the target subband \(Sub_{tar}\).

Step 2: \(Sub_{tar}\) is divided into \(B_{N}\) NSST domain blocks using a “non-overlapping and equal-sized” strategy, \(B_{N} \ge L\). The entropy values of these \(B_{N}\) blocks are calculated. The entropy-based method is utilized to select the optimal watermark embedding locations because the HVS is less sensitive to changes in regions with higher entropy.

Step 3: \(L\) high-entropy blocks are selected, and these blocks undergo PZMs. As a result, \(L\) target NSST-PZMs magnitude blocks, \({\mathbf{B}} = \{ B_{1} , \ldots ,B_{l} , \ldots ,B_{L} \}\), are obtained.

Step 4: The watermark sequence is embedded into \({\mathbf{B}}\) using the multiplicative way, which is specified as:

$$ y_{i} = (1 + \alpha w_{l} )x_{i} $$
(4)

where \(x_{i}\) represents the original NSST-PZMs magnitude; \(w_{l}\) is the \(l\)-th watermark bit from \({\mathbf{w}}\); \(y_{i}\) denotes the NSST-PZMs magnitude containing the watermark; and \(\alpha\) is the embedding strength, \(0 < \alpha \le 1\), determined by the embedding-domain variance and the watermark-to-document ratio (WDR). The rule for \(\alpha\) is as follows:

$$ \alpha = \sqrt {10^{{\frac{WDR}{{10}}}} \times \sigma^{2} } $$
(5)

In this research, a single target NSST-PZMs magnitude block is used to embed only one watermark bit. This means that one watermark bit is combined with \((T + 1) \times (2T + 1)\) magnitudes within the target block, with the aim of increasing the watermark containment rate.

Step 5: The \(L\) NSST-PZMs magnitude blocks containing watermarks are reconstructed using the inverse PZMs, yielding \(L\) watermarked NSST domain blocks. These \(L\) blocks are then combined with the \(\left( {B_{N} - L} \right)\) original NSST domain blocks to perform the NSST reconstruction, resulting in the watermarked image \(I_{w} = \{ f_{w} (i,j),\;0 \le i < M,\;0 \le j < N\}\).
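A minimal sketch of Steps 2-4 (plain Python; the NSST and PZMs stages are omitted, and the function names are hypothetical illustrations rather than the authors' implementation):

```python
import math
from collections import Counter

def block_entropy(block):
    """Shannon entropy of a block; the HVS tolerates changes in high-entropy regions."""
    vals = [v for row in block for v in row]
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def select_embedding_blocks(blocks, L):
    """Step 2/3: indices of the L highest-entropy blocks."""
    ranked = sorted(range(len(blocks)),
                    key=lambda i: block_entropy(blocks[i]), reverse=True)
    return sorted(ranked[:L])

def embedding_strength(wdr_db, variance):
    """Eq. (5): alpha = sqrt(10^(WDR/10) * sigma^2), with the WDR given in dB."""
    return math.sqrt(10 ** (wdr_db / 10) * variance)

def embed_bit(magnitudes, bit, alpha):
    """Eq. (4): one watermark bit modulates every magnitude of its target block."""
    return [(1 + alpha * bit) * x for x in magnitudes]

alpha = embedding_strength(-40, 1.0)     # unit variance gives alpha = 0.01
print(alpha, embed_bit([2.0, 4.0], +1, alpha))
```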

3.2 Digital Watermark Extraction

Essentially, the task of a decoder is to extract the intended signal from a noisy context. In this research, the decoder needs to extract \({\mathbf{w}}\) from the NSST-PZMs magnitudes. An ML decoder based on the SMM is proposed to extract the watermark from \(I_{w} = \{ f_{w} (i,j),\;0 \le i < M,\;0 \le j < N\}\). Figure 9 illustrates the process of watermark decoding, detailed as follows:

Fig. 9
figure 9

Watermark decoding process

Firstly, the target subband \(Sub_{w\_tar}\) is selected from the 2-level NSST decomposition of \(I_{w}\) based on the highest variance at scale two. \(Sub_{w\_tar}\) is divided into \(B_{N}\) equally-sized non-overlapping blocks.

Next, \(L\) high-entropy blocks from \(B_{N}\) are chosen, and PZMs are applied to them to identify \(L\) NSST-PZMs magnitude blocks for decoding \(w\).

Furthermore, all \(B_{N}\) blocks undergo PZMs, resulting in \(B_{N}\) NSST-PZMs magnitude blocks. Using the EM method, the coefficients in \(B_{N}\) NSST-PZMs magnitude blocks are used to train the SMM parameters.

Finally, the decoder extracts the \(l\) th watermark bit from the \(l\) th target block, \(l = 1,2, \ldots ,L\). The derivation process for the decoder expression \(SMM\_ML\) is as follows:

The watermark sequence created by the PRS consists of “+1” and “−1,” with each watermark bit having equal probability. Accordingly, the hypothesis that the target region carries “+1” is denoted \(H_{1}\), and the hypothesis that it carries “−1” is denoted \(H_{0}\):

$$ \begin{aligned} H_{1} :\; & y_{i} = (1 + \alpha )x_{i} ,\quad w_{l} = + 1 \\ H_{0} :\; & y_{i} = (1 - \alpha )x_{i} ,\quad w_{l} = - 1 \\ \end{aligned} $$
(6)

where \(x_{i}\) represents the NSST-PZMs magnitude without the watermark; \(y_{i}\) is the NSST-PZMs magnitude with the watermark; \(\alpha\) is the embedding strength; and \(i \in B_{l}\), where \(B_{l}\) is the \(l\)-th watermark embedding block. Based on the ML criterion, (6) leads to:

$$ \prod\limits_{i \in B_{l} } {f_{{\mathbf{y}}} (y_{i} |w_{l} = + 1)} \mathop{\gtrless}\limits_{H_{0} }^{H_{1} } \prod\limits_{i \in B_{l} } {f_{{\mathbf{y}}} (y_{i} |w_{l} = - 1)} $$
(7)

Taking the logarithm of (7) and collecting both sides into a single statistic gives:

$$ SMM\_ML_{l} ({\mathbf{y}}) = \sum\limits_{i \in B_{l} } {\ln \frac{f_{{\mathbf{y}}} (y_{i} |w_{l} = + 1)}{f_{{\mathbf{y}}} (y_{i} |w_{l} = - 1)}} \mathop{\gtrless}\limits_{H_{0} }^{H_{1} } 0 $$
(8)

Substitute \(f_{{\mathbf{y}}} (y_{i} |w_{l} = \pm 1) = \frac{1}{1 \pm \alpha }f_{{\mathbf{x}}} (\frac{{y_{i} }}{1 \pm \alpha })\) into (8) to obtain the detailed version of \(SMM\_ML_{l} ({\mathbf{y}})\):

$$ SMM\_ML_{l} ({\mathbf{y}}) = \sum\limits_{i \in B_{l} } {\ln \frac{1 - \alpha }{1 + \alpha }} + \sum\limits_{i \in B_{l} } {\ln \frac{f_{{\mathbf{x}}} \left(\frac{y_{i} }{1 + \alpha }\right)}{f_{{\mathbf{x}}} \left(\frac{y_{i} }{1 - \alpha }\right)}} \mathop{\gtrless}\limits_{H_{0} }^{H_{1} } 0 $$
(9)

According to (9), the criterion for defining the operation of the decoder is as follows:

$$ \hat{w}_{l} = \left\{ {\begin{array}{*{20}c} { + 1,} & {Z_{l} ({\mathbf{y}}) \ge T_{l} } \\ { - 1,} & {Z_{l} ({\mathbf{y}}) < T_{l} } \\ \end{array} } \right. $$
(10)

where \(\hat{w}_{l}\) denotes the \(l\)-th decoded watermark bit. According to (10), if \(Z_{l} ({\mathbf{y}}) \ge T_{l}\), the watermark bit embedded in the target block is “+1”; otherwise, it is “−1.” The expression for \(Z_{l} ({\mathbf{y}})\) is

$$ Z_{l} ({\mathbf{y}}) = \sum\limits_{i \in B_{l} } {\ln } \left( {\frac{\sum\limits_{k = 1}^{K} {\eta_{k} \frac{\Gamma \left(\frac{v_{k} + d}{2}\right)|\Sigma_{k} |^{ - 1/2} }{(v_{k} \pi )^{d/2} \Gamma (v_{k} /2)\left[ {1 + v_{k}^{ - 1} \left(\frac{y_{i} }{1 + \alpha } - \mu_{k} \right)^{T} \Sigma_{k}^{ - 1} \left(\frac{y_{i} }{1 + \alpha } - \mu_{k} \right)} \right]^{(v_{k} + d)/2} }} }{\sum\limits_{k = 1}^{K} {\eta_{k} \frac{\Gamma \left(\frac{v_{k} + d}{2}\right)|\Sigma_{k} |^{ - 1/2} }{(v_{k} \pi )^{d/2} \Gamma (v_{k} /2)\left[ {1 + v_{k}^{ - 1} \left(\frac{y_{i} }{1 - \alpha } - \mu_{k} \right)^{T} \Sigma_{k}^{ - 1} \left(\frac{y_{i} }{1 - \alpha } - \mu_{k} \right)} \right]^{(v_{k} + d)/2} }} }} \right) $$
(11)

The expression for \(T_{l}\) is

$$ T_{l} = \sum\limits_{{i \in B_{l} }} {\ln \frac{1 + \alpha }{{1 - \alpha }}} $$
(12)
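A univariate sketch of this decoder is given below (plain Python). The single-component prior and the sample values are illustrative only; in the actual scheme the SMM parameters come from the EM training step and \(\alpha\) from Eq. (5):

```python
import math

def t_pdf(x, mu, sigma2, v):
    """Univariate Student's-t density (the d = 1 case of an SMM component)."""
    c = (math.gamma((v + 1) / 2)
         / (math.sqrt(v * math.pi * sigma2) * math.gamma(v / 2)))
    return c * (1 + (x - mu) ** 2 / (v * sigma2)) ** (-(v + 1) / 2)

def smm_pdf(x, components):
    """components = [(eta_k, mu_k, sigma2_k, v_k), ...], weights summing to 1."""
    return sum(eta * t_pdf(x, mu, s2, v) for eta, mu, s2, v in components)

def decode_bit(ys, alpha, components):
    """Eqs. (10)-(12): compare the statistic Z_l(y) against the threshold T_l."""
    z = sum(math.log(smm_pdf(y / (1 + alpha), components))
            - math.log(smm_pdf(y / (1 - alpha), components)) for y in ys)
    t = len(ys) * math.log((1 + alpha) / (1 - alpha))
    return +1 if z >= t else -1

prior = [(1.0, 0.0, 1.0, 1.0)]            # single heavy-tailed component (v = 1)
x = [0.5, 1.0, 2.0, 4.0]                  # magnitudes of one target block
print(decode_bit([1.5 * v for v in x], 0.5, prior))   # embedded +1 -> +1
print(decode_bit([0.5 * v for v in x], 0.5, prior))   # embedded -1 -> -1
```

Note that a single magnitude rarely suffices for a reliable decision; spreading one bit over all magnitudes of a block, as the scheme does, is what makes the likelihood-ratio test dependable.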

4 Experimental Results

We conducted an extensive performance evaluation of the proposed method, considering imperceptibility, robustness, and watermark capacity, and compared it with other existing methods. For the experiments, we used 512 × 512 grayscale images from three datasets: the SIPI dataset (http://sipi.usc.edu/database/), the CVG-UGR dataset (http://decsai.ugr.es/cvg/dbimagenes/g512.php), and the BOSSbase dataset (http://agents.fel.cvut.cz/boss/). The watermark sequences comprise “+1” and “−1,” with each bit having equal probability. The experimental settings were as follows: NSST level set to 2, PZMs order set to 5, block size of 8 × 8, and a WDR of −40 dB.

4.1 Performance Evaluation

4.1.1 Imperceptibility Analysis

We conducted an evaluation of the proposed method, focusing on its imperceptibility. In the first phase, we assessed the imperceptibility of the proposed method from an objective standpoint using two widely recognized metrics: peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). A PSNR exceeding 40 dB is typically considered to meet the requirements of the HVS [19], while an SSIM value close to 1 indicates minimal differences between the watermarked image and the original image. Twenty groups of watermark sequences of lengths 128-bit, 256-bit, 512-bit, 1024-bit, and 2048-bit were created. We performed the evaluation on the SIPI, CVG-UGR, and BOSSbase datasets. The results of the evaluation are presented in Tables 3 and 4. As observed in these tables, the proposed method attains \(PSNR > 51dB\) and \(SSIM > 0.99\), demonstrating its effectiveness in imperceptibility.

Table 3 PSNR assessment results (objective)
Table 4 SSIM assessment results (objective)

The imperceptibility of the proposed method was further evaluated from a subjective perspective, i.e., whether the HVS can observe differences between the watermarked image and the original image. The test images Lena, Barbara, Peppers, Airplane, and Couple were used, and a 2048-bit watermark sequence was created. Figure 10 shows the original image, the watermarked image, and the difference image. To make the modifications introduced by watermark embedding more visible, the pixel values of the difference image were amplified by a factor of 30. As Fig. 10 indicates, without the 30 × amplification it is difficult for the HVS to perceive where the image was modified, highlighting the strong imperceptibility of the proposed method.

Fig. 10
figure 10

The results of imperceptibility assessment (subjective)

The method we have developed keeps the PSNR above 40 dB for several reasons: a. information in high-entropy regions of an image is usually more chaotic, and the HVS is less sensitive to minor changes in these areas; b. we implemented an adaptive, image-content-driven multiplicative embedding method, allowing flexible adjustment of \(\alpha\); c. our use of NSST filters out detail information, effectively compensating for the limitation of low-order PZMs in reconstructing image details.

4.1.2 Robustness Analysis

The robustness of the proposed method was evaluated in three main aspects. First, the evaluation used the SIPI, CVG-UGR, and BOSSbase datasets; for each dataset, 20 groups of 1024-bit watermark sequences were randomly generated. Robustness is quantified by the BER, i.e., the ratio of incorrectly decoded bits to the total number of bits. Various attacks were applied in the test, including JPEG compression (60), Gaussian filtering (7 × 7), rotation (1.5°), scaling (1.5), additive white Gaussian noise (AWGN) (35), salt-and-pepper noise (0.07), gamma correction (0.75), and translation (H5, V3). Table 5 presents the BER results under these attacks. Notably, the BER of the decoder remains below 0.033 even when subjected to these diverse attacks.
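The BER used throughout is simply the fraction of mismatched bits; a short helper (illustrative naming) makes this concrete:

```python
import numpy as np

def ber(embedded_bits, decoded_bits):
    """Bit error rate: incorrectly decoded bits / total bits."""
    w = np.asarray(embedded_bits)
    w_hat = np.asarray(decoded_bits)
    return float(np.mean(w != w_hat))
```

For a 1024-bit sequence, a BER of 0.033 therefore corresponds to roughly 34 flipped bits.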

Table 5 The experimental results of BER

Second, the proposed method was applied in practice to evaluate its robustness. We embedded a watermark image into Lena using the proposed method and then attacked the watermarked Lena with different types of noise. The proposed decoder was used to extract the watermark from the attacked watermarked Lena. Figure 11 shows the watermark images obtained by decoding. As Fig. 11 shows, the decoder recovers the watermark image from the attacked watermarked image while keeping its key information discriminable.

Fig. 11
figure 11

Robustness assessment results

In addition, we evaluated the robustness of our algorithm in real-world scenarios. We randomly selected 500 grayscale images of size 512 × 512 from the SIPI, CVG-UGR, and BOSSbase databases as test images, and generated five groups each of 128-bit, 512-bit, and 2048-bit watermark sequences. Following the proposed embedding method, these sequences were embedded into the test images. The watermarked images were then uploaded to eight social platforms: Weibo, Zhihu, QQ, WeChat, Bilibili, Baidu Tieba, Hupu, and LOFTER, and subsequently downloaded. Afterwards, the watermarks were extracted using the proposed decoding algorithm, and robustness was assessed by the BER. We also calculated the PSNR between the original image and the watermarked image downloaded from each platform. Table 6 shows the BER and PSNR results. According to Table 6, even when the embedded sequence is 2048 bits, the PSNR exceeds 40 dB and the BER stays below 0.015. In the past, images uploaded to social platforms often suffered compression attacks, because the platforms needed to save storage space and speed up image loading. With intensifying competition among platforms and a growing emphasis on user experience, however, uploaded images are no longer heavily compressed, and some platforms do not compress at all. This is one key reason our algorithm performed well in this test.

Table 6 Experimental results of BER and PSNR

4.1.3 Watermark Capacity

We assessed the watermark capacity of the proposed method, recognizing that embedding a large amount of watermark information necessitates extensive modification of the original image. These modifications inevitably degrade image quality, creating a conflict between watermark capacity and imperceptibility. To evaluate the method, we embedded watermark sequences of varying lengths into images and examined whether the PSNR of the watermarked images remained above 40 dB. We selected 500 test images and used watermark sequences of 500, 1000, 1500, 2000, 2500, and 3000 bits, with 20 sets created for each length. Figure 12 illustrates the PSNR results at different watermark capacities. Note that in our method, to enhance watermark retention, all magnitude coefficients in a target block are modified when embedding a single watermark bit. As Fig. 12 shows, benefiting from the local decomposition strategy and the NSST-PZMs magnitude domain, our method maintains PSNR > 40 dB even when embedding a 3000-bit watermark sequence.
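The capacity/imperceptibility tension can be illustrated with a toy multiplicative embedding in the spatial domain (a stand-in for the NSST-PZMs magnitude domain; the block size, α, and random image are illustrative): each extra embedded bit modifies one more 8 × 8 block, so the MSE grows roughly linearly with the number of bits and the PSNR falls accordingly.

```python
import numpy as np

def embed_bits(image, bits, alpha=0.01, block=8):
    """Toy multiplicative embedding: scale the b-th 8x8 block by
    (1 + alpha * w_b). Spatial-domain stand-in, not the paper's method."""
    out = np.asarray(image, float).copy()
    per_row = image.shape[1] // block
    for b, w in enumerate(bits):
        r, c = (b // per_row) * block, (b % per_row) * block
        out[r:r + block, c:c + block] *= 1.0 + alpha * w
    return out

def psnr(a, b, peak=255.0):
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(512, 512))
psnrs = []
for n in (500, 1500, 3000):
    bits = rng.choice([-1, 1], size=n)
    psnrs.append(psnr(img, embed_bits(img, bits)))
print([round(p, 2) for p in psnrs])  # PSNR drops as capacity grows
```

Tripling the number of embedded bits roughly triples the MSE, costing about 10·log₁₀ 3 ≈ 4.8 dB of PSNR in this toy setting.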

Fig. 12
figure 12

The evaluation results of watermark capacity

4.2 Comparisons with State-of-The-Art Schemes

We conducted a comparative analysis of our method against methods [6, 16, 26, 38, 40, 50, 52, 53, 54], and [56] in terms of robustness and imperceptibility when embedding watermark sequences of the same length. Methods [6, 38, 40, 50, 52, 53], and [54] are watermarking methods based on statistical models, while methods [16, 26], and [56] are based on other techniques.

In our comparative analysis, we first compared our method with [6, 50, 52], and [53] in terms of imperceptibility and robustness. We selected 200 images and generated 20 groups of watermark sequences at each of the lengths 128, 256, 512, 1024, and 2048 bits. The imperceptibility comparison results are presented in Table 7. For robustness, we compared our method with [6, 50, 52], and [53] under various attacks, including JPEG (10, 20, 30, 40, 50, 60, 70, 80), AWGN (5, 10, 15, 20, 25, 30, 35), median and Gaussian filtering (3 × 3, 5 × 5, 7 × 7, 9 × 9), salt-and-pepper noise (0.01, 0.03, 0.05, 0.07, 0.1), gamma correction (2, 1.5, 0.9, 0.75), rotation (− 2, − 1.5, − 1, − 0.5, 0, 0.5, 1, 1.5, 2), and scaling (0.8, 0.9, 1.1, 1.2, 1.5). Since [50] and [52] do not report results under scaling attacks, for fairness we compared scaling only against methods [6] and [53]. We refer to methods [6, 50, 52], and [53] as VB-Gaussian, Beta-exponential, Cauchy-HMT, and VB-Cauchy, respectively. The BER comparison results are depicted in Fig. 13. According to the experimental results in Fig. 13 and Table 7, our proposed method outperforms [6, 50, 52], and [53] in both robustness and imperceptibility.

Table 7 Comparison results of imperceptibility with [6, 50, 52], and [53]
Fig. 13
figure 13

BER comparison results with [6, 50, 52, 53] (2048-bit)

The results in Table 7 also support conclusions about watermark capacity, which is typically assessed through imperceptibility metrics such as PSNR. Watermark capacity refers to the number of watermark bits the carrier can accommodate while still satisfying visual perception requirements (PSNR > 40 dB). In other words, the method with the higher PSNR at a given number of embedded bits has the better capacity: it can continue embedding additional bits before its PSNR falls to the level of the other methods. In Table 7, the proposed method achieves higher PSNR than methods [6, 50, 52], and [53] when embedding the same number of watermark bits, indicating an advantage in watermark capacity.

Additionally, we performed a comparative analysis with method [54], using 50 sets of 512-bit watermark sequences and 100 images randomly selected from the BOSSbase dataset. Table 8 presents the robustness comparison against [54]. For imperceptibility, we used the test images Couple, Barbara, Lena, and Peppers; the results are given in Table 9. Based on Tables 8 and 9, under the same watermark capacity, our method outperforms [54] in both robustness and imperceptibility.

Table 8 Robustness comparison results with [54]
Table 9 The imperceptibility comparison results with [54]

We also compared our method with methods [26, 38, 40], and [56] in terms of imperceptibility and robustness. In addition to using the same test images, we generated 50 sets of watermark sequences for testing. Table 10 provides the imperceptibility comparison, while Table 11 presents the BER comparison in the absence of attacks. Based on Tables 10 and 11, our method outperforms [26, 38, 40], and [56] in both imperceptibility and robustness.

Table 10 The imperceptibility comparison results with [26, 38, 40], and [56]
Table 11 The robustness comparison results with [26, 38, 40], and [56]

Fang et al. [16] introduced a pioneering learning-based watermarking algorithm comprising a novel two-stage deep neural network and two newly designed templates; the deep networks enhance resistance to digital editing attacks. We selected 400 test images from the SIPI, CVG-UGR, and BOSSbase datasets and randomly generated 20 sets of 256-bit watermark sequences. Table 12 compares the average BER under various attacks with [16]. As Table 12 indicates, the proposed method has an advantage over [16] in resisting conventional attacks, but its performance against geometric attacks is inferior to [16]. This gap stems from the nature of learning-based watermarking, which uses feature learning and data augmentation to train models that can handle geometric attacks of varying types and intensities.

Table 12 BER results compared to [16]

The results presented in Tables 7–11, together with Fig. 13, collectively demonstrate that when embedding watermark sequences of the same length, our method outperforms [6, 26, 38, 40, 50, 52, 53, 54, 56] in terms of robustness and imperceptibility. According to Table 12, the proposed method outperforms [16] against conventional attacks but is less effective than [16] against geometric attacks. The superior performance of the proposed method can be attributed to several key factors: a. leveraging the robustness of low-order PZMs and the decomposition characteristics of NSST to construct local NSST-PZMs magnitudes; b. the strategy of embedding each watermark bit across an entire block, which enhances robustness; c. the design of a two-component SMM matched to the characteristics of the NSST-PZMs magnitudes, allowing accurate modeling; d. the closed-form decoder expression derived from the SMM and the ML criterion. Note that all scaling results were obtained with prior knowledge of the scaling factor.

5 Conclusion

In this paper, a watermark decoder that uses an SMM to model the NSST-PZMs magnitudes was proposed. During watermark embedding, the robustness of low-order PZMs and the decomposition characteristics of NSST are exploited to construct local NSST-PZMs magnitudes as the embedding domain, into which the watermarks are embedded multiplicatively. A two-component SMM is designed to describe the NSST-PZMs magnitudes based on their “peak-tail” characteristics. During decoding, a closed-form decoder expression is derived from the SMM and the ML criterion. Experimental results show that when embedding a 1024-bit watermark, the proposed method achieves PSNR > 58 dB and BER < 0.033 on the SIPI, CVG-UGR, and BOSSbase datasets. Compared with other decoding methods embedding the same watermark capacity, the proposed method demonstrates superior imperceptibility and robustness. We are actively working to remove the need for prior knowledge under scaling attacks. In future work, we will extend this method to color images and explore combining it with deep learning techniques.