Introduction

Augmented reality (AR) as a technology gained broader recognition in the mid-1990s and is described as being part of “mixed reality” technologies on the reality-virtuality continuum (Milgram & Kishino, 1994). Here, “an AR system supplements the real world with virtual objects that appear to coexist in the same space as the real world” (Azuma et al., 2001, p. 34). Further characteristics of AR include the three-dimensional (3D) alignment of real and virtual objects as well as real-time interactivity of the representations (Azuma, 1997).

While early central publications about AR focused on application areas such as medicine or entertainment (Azuma et al., 2001; Milgram & Kishino, 1994), later publications offered the perspective of using AR for education (Szalavári et al., 1998).

AR in Education

Today, AR has been widely applied to teaching and learning, with most applications in the natural sciences followed by social sciences (Chang et al., 2022). AR can be implemented using different technologies and various devices (mobile devices such as smartphones or tablet computers, AR glasses) or even without any personal equipment (spatial AR) (Buchner et al., 2022). Currently, the most common applications are those in which AR content is accessed via mobile devices (Xu et al., 2022; Zhang et al., 2022). Although several applications for AR glasses exist, hardly any empirical findings have been reported on this technology.

Affective Characteristics

Among the reported findings, those for educational contexts without differentiation by discipline relate to affective characteristics (Bacca et al., 2014; Cao & Yu, 2023; Chang et al., 2022), such as attitudes or motivation, when AR-based learning environments are compared to traditional (non-AR) learning environments. Concerning these affective variables, results have shown that learning in AR-based learning environments is perceived more positively than learning in traditional non-AR learning environments, with small effect sizes (Hedges’ g = 0.49, p < .001; Chang et al., 2022). A more differentiated analysis of affective variables has indicated that attitudes toward AR-based learning environments are more positive than those toward non-AR learning environments (Cohen’s d = 1.08, p = .001; Cao & Yu, 2023), but no differences have been identified regarding learning motivation (Cohen’s d = 0.25, p = .120; Cao & Yu, 2023). However, studies in the field of language learning that examined the duration of an intervention have suggested that AR may have a stronger positive influence on motivation as the duration of the intervention increases (Cai et al., 2022).

Cognitive Load

Several studies have analyzed learners’ cognitive load (Sweller, 1988, 2011), which is an important predictor of learning achievement (Sweller et al., 2011). Cognitive load is directly related to the Cognitive Theory of Multimedia Learning (Mayer, 2009), which provides insights into the design of multimedia learning environments. In a study examining students’ cognitive load in AR-based learning environments, learners reported a cognitive load lower than or equal to that in non-AR learning environments, and learning achievement was also higher in the AR-based learning environment (Buchner et al., 2022). Other comparative analyses have confirmed this, although they considered cognitive load less differentially and analyzed it jointly with learning achievement as a combined learning outcome (Zhang et al., 2022).

Learning Achievement

Numerous overview studies (Bacca et al., 2014; Cao & Yu, 2023; Chang et al., 2022; Zhang et al., 2022) have demonstrated advantages of AR-based learning environments over traditional learning environments without AR for learning achievement. According to a recent second-order meta-analysis (Malone et al., 2023), these advantages have a medium effect size (Hedges’ g = 0.64, p < .001). The individual analyses consistently show a positive impact of AR on learning achievement, with effect sizes ranging from Hedges’ g = 0.42 (p = .04) to Hedges’ g = 1.69 (p = .03).
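Because the effect sizes in this section are reported as Hedges’ g, a minimal sketch of how g is obtained for two independent groups may help. It assumes the common pooled-standard-deviation formula with the approximate small-sample correction J = 1 − 3/(4(n1 + n2) − 9). The function name and the group summaries in the example are illustrative assumptions, not data from the cited meta-analyses.

```python
import math

def hedges_g(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (group 1 minus group 2) with small-sample correction."""
    # Pooled variance of two independent groups
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    d = (mean1 - mean2) / math.sqrt(pooled_var)  # Cohen's d
    # Approximate correction factor J that removes the upward bias of d in small samples
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d

# Hypothetical summaries: an AR condition vs. a non-AR condition, 30 learners each
g = hedges_g(75.0, 10.0, 30, 70.0, 10.0, 30)
print(round(g, 3))
```

With equal standard deviations, the correction only slightly shrinks Cohen’s d, here from 0.50 to about 0.49. The difference becomes more noticeable for very small groups.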

From an educational science perspective, AR may positively influence learning because the coexistence of the real world and virtual objects creates an impression of immersion for learners. Here, immersion is described as a construct related to flow experience (Csikszentmihalyi, 1975, 1990). According to several studies (Antonietti et al., 2000; Conole & Dyke, 2004; Dalgarno & Lee, 2010; Dunleavy et al., 2009), immersion via AR involves corresponding affective variables that seem to be positively related to learning outcomes.

Further, from a cognitive psychological perspective, the Cognitive Theory of Multimedia Learning (Mayer, 2009) suggests that integrating multiple representations with high spatial and temporal contiguity is particularly effective for learning (Ginns, 2006; Schroeder & Cenkci, 2018). For interactive representations, such high contiguity can be realized easily with AR.

AR in Science Education

As indicated above, theories (Cognitive Load Theory, Cognitive Theory of Multimedia Learning) and empirical findings suggest that AR has strong potential for use in learning environments. AR appears particularly promising for science education, where field studies and experiments can be enriched with virtual information, such as visualizations of the associated models. This rationale underlies numerous recent studies that empirically analyze AR in science education, for example in physics, particularly for conducting experiments in lab settings (Strzys et al., 2018; Kapp et al., 2019; Schlummer et al., 2023; Wagner et al., 2023).

Affective Characteristics

Review studies in science education have generally found evidence for a positive effect of AR-based learning environments on affective variables, such as motivation or attitudes, compared to non-AR learning environments (Ibáñez & Delgado-Kloos, 2018; Sırakaya & Sırakaya, 2020). Individual studies have found no differences in positive emotions between learning environments with and without AR (Elford et al., 2022), whereas studies on flow experience have shown advantages of AR-based learning environments (Ibáñez et al., 2014). Overall, however, structured data on the affective outcomes of AR-based compared to non-AR learning environments are scarce (Xu et al., 2022).

Cognitive Load

Findings similar to those for affective characteristics have emerged from analyses of cognitive load. Again, review articles have suggested that AR seems able to reduce cognitive load (Ibáñez & Delgado-Kloos, 2018; Sırakaya & Sırakaya, 2020). However, these reviews contain neither concrete statements on the reduction of the different types of cognitive load (extraneous, germane, intrinsic) nor quantitative information on effect sizes. For science education, individual studies suggest, analogous to the discipline-independent findings on cognitive load, either that no difference in cognitive load exists between AR-based and non-AR learning environments (Altmeyer et al., 2020; Elford et al., 2022) or that extraneous cognitive load is only somewhat reduced by AR-based learning environments (Thees et al., 2020).

Learning Achievement

In science education, studies have indicated that students in AR-based learning environments show higher learning achievement than students in non-AR learning environments in terms of their ability to remember information, their conceptual understanding and their effectiveness in problem-solving procedures (Ibáñez & Delgado-Kloos, 2018). A second-order meta-analysis (Malone et al., 2023) also supports this finding, showing that AR learning environments positively influence learning with medium effect sizes (Hedges’ g = 0.66, p < .001). However, differences have emerged between disciplines within science education (Xu et al., 2022). For example, no advantages of AR-based learning environments have been found in biology, whereas advantages in physics have medium effect sizes (Hedges’ g = 0.64) and advantages in the earth sciences are the strongest (Hedges’ g = 1.45). One suggested reason for the strong effects in the earth sciences, which include astronomy and geography, is that spatial skills are crucial there and can be promoted particularly well by AR. In biology, by contrast, methods involving real observations are important and cannot be taught better via AR (Xu et al., 2022).

Furthermore, while studies on AR based on mobile devices dominate both in science education and across all disciplines, at the time of this study only one published article was found that analyzed the effects of AR glasses in learning settings in science education (Thees et al., 2020). This study found no differences in learning effectiveness between AR and traditional settings. Nevertheless, AR glasses appear highly relevant for science education, and especially for physics. In physics, using AR to promote interaction with physics concepts (Azuma et al., 2001) can be achieved effectively with AR glasses when AR is used as an additive element in experimentation, since students’ hands remain free for interacting with the experiment. In addition, numerous physics applications already make use of AR glasses (Strzys et al., 2018; Kapp et al., 2019; Schlummer et al., 2023; Wagner et al., 2023).

Research Questions

The preceding discussion of science education research on the use of AR in learning processes reveals that few systematic findings are available from studies that jointly examine students’ affective responses to AR-based learning environments and their cognitive load, although both aspects are significant for learning success. Furthermore, previous studies have focused on AR used on mobile devices (smartphones or tablet computers). However, based on findings from educational psychology, the use of AR glasses in laboratory settings appears promising. Only this technology allows for unrestricted practical action and manual interaction as well as a high degree of spatial contiguity between the experimental setup and model-related visualizations. Nevertheless, AR glasses have hardly been considered in empirical studies. To provide additional findings on the effects of AR glasses on affective characteristics, cognitive load and learning achievement in science education, the following research questions are investigated using the example of a physics topic, namely optical polarization:

How does the use of AR glasses in a laboratory experiment on optical polarization, compared to a learning setting without AR, influence students’…

  • RQ1… motivation to learn?

  • RQ2… cognitive load during the learning process?

  • RQ3… learning achievement?

Methods

The experimental study (Fig. 1) compared learning in an AR group (laboratory experiment with additional use of AR) and a non-AR group (laboratory experiment with traditional supplemental material, namely further traditional experimental equipment and printed teaching material). Both groups received equal information and representations (see “Laboratory experiment: Optical polarization” section). The treatment comprised a 3-h laboratory experiment on optical polarization, in which each learner worked individually on their own experimental setup. The students were randomly assigned to the experimental groups.

Fig. 1

Flowchart of experimental comparison group design of the study comparing learning in a physics laboratory course for a laboratory experiment supplemented by augmented reality content (AR group) and a laboratory experiment with traditional supplemental material (non-AR group)

Laboratory Experiment: Optical polarization

The laboratory experiment provided learners with a setup to investigate the properties of linearly polarized light and the functionalities of various optical components that are frequently used in optics laboratories (Schlummer et al., 2023). The main setup consisted of a diode laser whose intensity can be measured by means of a photodiode (Fig. 2).

Fig. 2

Experimental setup to investigate optical polarization for a laboratory experiment supplemented by augmented reality content (AR group; top) and a laboratory experiment with traditional supplemental material (non-AR group; bottom)

Between the laser and the photodiode, up to three optical components could be installed. In a sequence of four different activities, the goal of the laboratory experiment was to:

  • Activity 1. Determine the degree of polarization of the laser diode used, as well as its angle of maximum emission.

  • Activity 2. Malus’ law: Characterize the selective absorption properties of a linear polarizer by finding an appropriate functional dependency.

  • Activity 3. Investigate the effects of a half-wave plate on linearly polarized light and compare it to the behavior of a polarizer.

  • Activity 4. Characterize the two output channels of a polarizing beam splitter and explore how to use it as a measurement device in combination with a half-wave plate.
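Activities 1 and 2 rest on two standard relations. The first is the degree of polarization P = (I_max − I_min)/(I_max + I_min), estimated from the intensity recorded while a polarizer is rotated in front of the photodiode. The second is Malus’ law, I(θ) = I₀ cos²(θ − θ₀). The following minimal sketch evaluates both on synthetic data. The function names and the assumed source model are illustrative assumptions, not the course’s actual analysis procedure. The source is modeled as an unpolarized part plus a linearly polarized part, measured behind an ideal polarizer.

```python
import math

def malus(theta_deg: float, i0: float, theta0_deg: float = 0.0) -> float:
    """Malus' law: intensity behind an ideal polarizer set to theta for linearly
    polarized input of intensity i0 whose polarization direction lies at theta0."""
    return i0 * math.cos(math.radians(theta_deg - theta0_deg)) ** 2

def degree_of_polarization(angles_deg, intensities):
    """Estimate (P, angle of maximum) from intensities recorded while rotating a polarizer."""
    i_max, i_min = max(intensities), min(intensities)
    angle_at_max = angles_deg[intensities.index(i_max)]
    return (i_max - i_min) / (i_max + i_min), angle_at_max

# Synthetic, hypothetical source: 0.2 (a.u.) unpolarized + 0.8 (a.u.) polarized at 30 degrees.
# An ideal polarizer passes half of the unpolarized part and follows Malus' law for the rest.
angles = list(range(0, 180, 5))
readings = [0.2 / 2 + malus(a, 0.8, theta0_deg=30) for a in angles]

p, a_max = degree_of_polarization(angles, readings)
print(f"P = {p:.2f}, maximum at {a_max} deg")  # P = 0.80, maximum at 30 deg
```

Real photodiode readings would additionally contain noise and a dark offset. In practice, fitting the full cos² curve is therefore more robust than taking only the extreme values.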

In each of the four activities, the learners had to create a functional configuration by installing the respective components, qualitatively investigate the effects of rotating components within the setup, and record appropriate data to describe the functionality of each component by means of suitable parameters.

In the AR group, learners had access to interactive model visualizations while working with the experiment. These visualizations included written instructions on the current task, vector representations for each component in the setup and a real-time plot of measurement data (Fig. 2, top). Figure 3 shows the most relevant types of model representations in more detail.

Fig. 3

Detailed view of the different forms of representation provided to the user. Fields A–C show the vector diagrams for the respective components underneath; Field D shows an example of a measurement diagram in this specific setup

In the specific example, partially polarized light from a laser diode enters from the left and successively passes an initial polarizer (A), a half-wave plate (B) and a second polarizer (C). The vector representations match the rotation angles of the components set in the physical setup. Each diagram shows the polarization direction and relative intensity of the incoming light in dark green and of the transmitted light in light green.

While the general layout of the diagrams is the same for all components, several small differences between them indicate the different physical properties of the respective components. Before the light enters the initial polarizer (A) in Fig. 3, it is only partially polarized. Consequently, there is no well-defined vector for the incoming light in the first diagram. Instead, the polarization characteristic of the partially polarized laser diode is shown as a dark green shape. This helps the user set the initial polarizer to an angle that provides optimal intensity for the light beam. After passing the initial polarizer, the light can be assumed to be linearly polarized, as indicated by the light green double arrow in the diagram.

Diagram (B) illustrates that the half-wave plate rotates the polarization of the incoming light but does not change its intensity, as the dark and light green double arrows have the same length. The situation differs in diagram (C), where it is clearly visible that the polarizer changes both the direction and the intensity of the transmitted light.

Quantitative data on the transmitted intensity can be visualized in the measurement diagram (D). In this case, the light intensity behind the setup is measured as a function of the half-wave plate’s rotation angle.
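The behavior shown in diagrams (B) to (D) can be reproduced with Jones calculus. The sketch below makes three simplifying assumptions: all components are ideal, the light reaching polarizer A is already linearly polarized along 0°, and both polarizers are set to 0°. Under these assumptions, the intensity behind the setup follows I(φ) = I₀ cos²(2φ) as a function of the half-wave plate angle φ. The plate rotates the polarization by 2φ without loss, and only polarizer C converts that rotation into an intensity change. The function names are illustrative and are not part of the AR application.

```python
import math

def apply(m, v):
    """Multiply a 2x2 Jones matrix by a Jones vector (real entries suffice here)."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def polarizer(theta_deg):
    """Ideal linear polarizer with its transmission axis at theta."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return ((c * c, c * s), (c * s, s * s))

def half_wave_plate(phi_deg):
    """Ideal half-wave plate with its fast axis at phi (global phase factor dropped)."""
    p = math.radians(2 * phi_deg)
    return ((math.cos(p), math.sin(p)), (math.sin(p), -math.cos(p)))

def intensity(v):
    return v[0] ** 2 + v[1] ** 2

def transmitted(phi_deg, first_deg=0.0, second_deg=0.0, i0=1.0):
    """Intensity behind polarizer A -> half-wave plate B -> polarizer C."""
    v = (math.sqrt(i0), 0.0)                # assumed input: linearly polarized along 0 deg
    v = apply(polarizer(first_deg), v)      # A
    v = apply(half_wave_plate(phi_deg), v)  # B: rotates polarization, keeps intensity
    v = apply(polarizer(second_deg), v)     # C: turns the rotation into an intensity change
    return intensity(v)

# Measurement-diagram analogue: transmitted intensity vs. half-wave plate angle
for phi in range(0, 91, 15):
    print(phi, round(transmitted(phi), 3))
```

The printed intensities repeat with a period of 90° in φ. This is the halved period, compared with a rotated polarizer, that learners can observe in the measurement diagram.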

The non-AR group received a workbook with written instructions which also contained static versions of the interactive vector representations available for the AR group. The workbook also provided a diagram template to be filled out for each task so that learners could create their own visualizations of the measured data (Fig. 2, bottom). This was done to ensure a fair comparison with respect to the live-diagram functionality of the AR group.

Setting and Participants

The present study considers data collected between April and July 2022 during a regular laboratory course for physics students (Bachelor of Science, Bachelor of Education). N = 75 students participated in the study and were randomly assigned to a treatment condition (AR group or non-AR group). Further information describing the sample can be found in Table 1. Previous experience with immersive headsets, mainly in the context of VR gaming applications, was reported by 19 students in the AR group and 20 students in the non-AR group.

Table 1 Sample description of the study and of the study groups, indicating students’ semester of study and gender. (Only n = 37 students in the AR group were included in the calculation of means and standard deviations because two participants indicated their semester incorrectly.)

The learners in the AR group used a Microsoft HoloLens 2 (mixed reality head-mounted display). The glasses are available for educational institutions such as universities and schools for a price of around USD 3,500. The glasses have a diagonal field of view of 52 degrees and provide interaction by hand gestures and eye-tracking with no need for handheld controllers (Fig. 4).

Fig. 4

Photo of learners working with the HoloLens 2 in the laboratory experiment supplemented by AR visualizations

Data Collection and Analysis

Pretest and posttest data were collected using an online questionnaire. Data collected during the experiment were recorded with a paper–pencil questionnaire because this format was simpler to handle during the experiment.

Motivation

Learners’ motivation was measured using the Intrinsic Motivation Inventory (IMI; McAuley et al., 1989) in the abridged German-language version (Wilde et al., 2009). Specifically, the IMI subscales interest/enjoyment (IE; 3 items, 5-point Likert scale) and perceived competence (PC; 3 items, 5-point Likert scale) were used. We took into account that the IMI covers the construct of motivation very broadly and that its subscales can also be used individually, depending on their fit to the content (McAuley et al., 1989). Motivation was assessed during the treatment at four measurement time points, each following one of the four activities of the experiment described in the “Laboratory experiment: Optical polarization” section. Mean scores were calculated for each subscale, and total motivation was then calculated as the mean of the interest/enjoyment and perceived competence scores.

The analysis of the internal consistency shows very good to excellent values for both subscales (Cronbach’s α: IE = .90; PC = .89; Cortina, 1993).
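Cronbach’s α relates the sum of the item variances to the variance of the respondents’ total scores. For k items, α = k/(k − 1) · (1 − Σσᵢ²/σ²_total). The study’s analyses were run in R; the following minimal Python sketch only illustrates the formula. The responses below are made up for a hypothetical three-item subscale and are not the study’s data.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (each a list of respondent scores)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's sum score
    item_var = sum(pvariance(col) for col in items)   # sum of the item variances
    # Population variances are used throughout; the ratio equals the sample-variance version
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical ratings of five respondents on three 5-point items
example_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
print(round(cronbach_alpha(example_items), 2))
```

Items that rise and fall together across respondents push α toward 1. Weakly related items, such as pretest items on concepts not yet taught, pull it down.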

Cognitive Load

Cognitive load was measured as a three-dimensional construct comprising intrinsic load (IL), extraneous load (EL) and germane load (GL). Measuring these contributions separately seems necessary because theories such as the Cognitive Theory of Multimedia Learning (Mayer, 2009) suggest that the design of the learning environment influences only the extraneous part of cognitive load (Mayer & Fiorella, 2014). Measuring intrinsic and germane load, in turn, can be used to ensure that the learning environments do not differ in features other than the technological support (AR vs. non-AR). A self-assessment instrument (7 items, 7-point Likert scale; Klepsch et al., 2017) was used, with 3 items for extraneous load and 2 items each for intrinsic and germane load. IL, EL and GL were also measured after each of the four activities of the experiment described in the “Laboratory experiment: Optical polarization” section. Mean scores were calculated for each subscale, and total cognitive load was calculated as the mean of the three subscale scores.
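A minimal sketch of this scoring for one learner at one measurement time point might look as follows. It assumes that responses are stored per subscale; the layout and the item values are hypothetical. The totals reported in the Findings (e.g. 3.68 for the AR group, with subscale means of 3.12, 3.19 and 4.73) coincide with the average of the three subscale means, so the sketch returns that average. A plain sum would differ only by a factor of three.

```python
from statistics import mean

# Hypothetical item-to-subscale layout: 2 IL items, 3 EL items, 2 GL items (7-point scale)
SUBSCALES = {"IL": 2, "EL": 3, "GL": 2}

def score_cognitive_load(responses):
    """Return the subscale means and their average for one learner at one time point.

    responses maps a subscale name to that learner's item ratings (1-7).
    """
    scores = {}
    for name, n_items in SUBSCALES.items():
        ratings = responses[name]
        if len(ratings) != n_items:
            raise ValueError(f"{name}: expected {n_items} ratings, got {len(ratings)}")
        scores[name] = mean(ratings)
    scores["total"] = mean(scores[name] for name in SUBSCALES)
    return scores

example = score_cognitive_load({"IL": [3, 4], "EL": [2, 3, 4], "GL": [5, 6]})
print(example)
```

Averaging the subscale means, rather than pooling all seven items, gives the three components equal weight even though EL has one more item than IL and GL.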

Analysis of the internal consistency shows good to very good values for all three subscales (Cronbach’s α: IL = .88; EL = .82; GL = .70; Cortina, 1993).

Learning Achievement

Learning achievement was measured for both groups with a self-developed test of the learners’ subject knowledge, which served as a dependent variable. A self-developed test was used so that the specific learning gains of the experiment could be determined with a high fit to the content taught. The test included 19 multiple-choice items. While all items were used to measure subject knowledge in the posttest, only 8 items were used in the pretest, because certain items, such as the one shown in Fig. 5, were too specific to be answered before the laboratory course. The fit of the items to the content of the laboratory experiment was tested in previous studies in the same teaching–learning context. The items tested whether learners were able to recall the operation of the experimental components (retention) and to transfer what they had learned to other experimental configurations (transfer). Total scores were computed as sum scores.

Fig. 5

Example item for the extended set of items used in the posttest. The participant was presented with a graphical representation of an experimental setup and asked to identify the direction of the polarization vector that fits the given experimental results (the correct answer is option [c] in this case)

The pretest and posttest had different internal consistencies (Cronbach’s α: pretest = 0.46; posttest = 0.79; Cortina, 1993). The low consistency of the pretest arose because it covered different concepts that were clarified only during the laboratory course itself. High values of Cronbach’s α were therefore not necessarily expected for the pretest (Taber, 2018; Stadler et al., 2021).

Further Variables

Demographic data used to describe the sample included gender, semester of study and prior experience using immersive headsets for VR and AR applications.

Because the instructional impact of a new technology (AR) is being analyzed, it seems useful to also consider learners’ technology affinity (TA). There is evidence that technology affinity is positively correlated with learning in technology-based learning environments but negatively correlated with learning in traditional learning settings (Backhaus et al., 2019). Previous studies also suggest that learners’ attitudes towards education, including their motivation (Mills et al., 2013) and their learning intentions (Jin & Divitini, 2020), may be influenced by their technology affinity. In contrast, it is unclear to what extent technology affinity correlates with learners’ cognitive load. Some studies have found such a correlation (Albus et al., 2021), while others have not (Binder et al., 2021; Laun et al., 2022). Therefore, learners’ technology affinity was included as a potential covariate for the analyses of motivation, cognitive load and learning achievement. Technology affinity was assessed using two subscales of the widely used TA-EG (Karrer et al., 2009; Siebert et al., 2022): enthusiasm for technology (ET; 5 items, 5-point Likert scale) and (self-assessed) competence in using technology (CT; 4 items, 5-point Likert scale). The analysis of the internal consistency shows good values for both subscales (Cronbach’s α: ET = .84; CT = .86; Cortina, 1993).

Findings

The results for research questions 1–3 are described separately below. Data were analyzed with R (version 4.2.3) in RStudio (version 2022.07.1). Effect sizes were classified according to Cohen (1988): Wilcoxon effect size |r| ≥ .10 small, |r| ≥ .30 medium, |r| ≥ .50 large; partial η² ≥ .01 small, ≥ .06 medium, ≥ .14 large.
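For the F tests reported below, partial η² can be recovered from the F statistic and its degrees of freedom via η²_p = F·df_effect / (F·df_effect + df_error). This identity also holds for non-integer, apparently sphericity-corrected degrees of freedom, because such a correction scales both dfs by the same factor. The Python sketch below applies the identity together with the Cohen (1988) thresholds listed above. It is an illustrative consistency check with hypothetical function names, not the R code used in the study.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

def cohen_label(eta_p2: float) -> str:
    """Classify partial eta squared using the Cohen (1988) thresholds."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"

# Group effect on total motivation as reported in the Findings: F(1, 70) = 4.40
eta = partial_eta_squared(4.40, 1, 70)
print(round(eta, 3), cohen_label(eta))  # 0.059 small
```

Values sitting just below a threshold, such as .059, are why the text describes this effect as "small to medium" rather than applying the cut-off mechanically.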

Before the parametric tests described below were applied, the assumption of normally distributed data was checked. The check was carried out for all eight dependent variables (interest/enjoyment, perceived competence, total motivation, extraneous cognitive load, germane cognitive load, intrinsic cognitive load, total cognitive load, subject knowledge posttest), separately for the two groups (AR group, non-AR group). Normality was assumed based on the Shapiro–Wilk test and on visual inspection of quantile–quantile (Q–Q) plots.

AR vs. Non-AR Regarding Motivation (RQ1)

Previous studies suggest that learners’ motivation in completing the laboratory course may be influenced by their technology affinity, as described in the “Further Variables” section. Two forced-entry linear regressions were carried out to determine whether technology affinity should be considered in analyses of differences in motivation between the AR and non-AR groups (Table 2). Their independent variables were enthusiasm for technology and self-assessed competence in using technology; their dependent variables were interest/enjoyment and perceived competence. Neither enthusiasm for technology nor competence in using technology significantly predicted interest/enjoyment or perceived competence in the laboratory experiment. Therefore, group differences between the AR and non-AR groups were analyzed without covariates.
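A forced-entry regression enters all predictors simultaneously instead of selecting them stepwise. The stdlib Python sketch below fits such a model by ordinary least squares (normal equations) and returns the coefficients and R². It omits the standard errors and p-values that a full analysis would report, for example via `summary(lm(...))` in R. The function name and the variable names in the example are hypothetical, and the example data are invented.

```python
def ols(y, *predictors):
    """Ordinary least squares with intercept; all predictors entered at once (forced entry).

    Returns (coefficients, r_squared) with coefficients[0] the intercept.
    """
    n = len(y)
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yv for r, yv in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yv - f) ** 2 for yv, f in zip(y, fitted))
    ss_tot = sum((yv - y_mean) ** 2 for yv in y)
    return coef, 1 - ss_res / ss_tot

# Hypothetical ratings for eight learners (not study data)
enthusiasm = [3.2, 4.0, 2.8, 3.6, 4.4, 3.0, 3.8, 2.6]
competence = [3.5, 4.2, 3.0, 3.1, 4.5, 2.9, 4.0, 3.3]
interest = [3.4, 3.9, 2.9, 3.5, 4.1, 3.2, 3.6, 3.0]
coef, r2 = ols(interest, enthusiasm, competence)
print([round(c, 2) for c in coef], round(r2, 2))
```

The decision described above, to drop technology affinity as a covariate, rests on the significance tests of these coefficients. Those tests would require the additional standard errors.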

Table 2 Fit parameters obtained from regression models for interest/enjoyment and perceived competence as dependent variables

The analysis of group differences (between-subjects factor) and time courses (within-subjects factor) regarding motivational characteristics was performed using type II mixed ANOVAs for interest/enjoyment, perceived competence, and total motivation as dependent variables; see Table 3. Results included the overall group differences between the AR group and non-AR group (group effect), the temporal development of motivational characteristics over the four measurement time points without considering the groups (time effect), and the corresponding interaction effect.

Table 3 Results for main effects and interaction effects (group differences) between AR group and non-AR group for time course of interest/enjoyment, perceived competence and total motivation (type II mixed ANOVAs)

The analysis of group effects revealed that students in the AR group rated their interest/enjoyment higher (M = 3.69, SD = 0.65) than students in the non-AR group (M = 3.01, SD = 0.71). The mixed ANOVA showed that this is a significant, large effect (F(1.00,70.00) = 10.90, p = .002, partial η² = 0.14). The perceived competence of the AR group (M = 3.57, SD = 0.88) and the non-AR group (M = 3.57, SD = 0.71) did not differ significantly (F(1.00,70.00) = 0.01, p = .941). Nevertheless, total motivation was significantly higher in the AR group (M = 3.62, SD = 0.64) than in the non-AR group (M = 3.29, SD = 0.70), with a small to medium effect size (F(1.00,70.00) = 4.40, p = .039, partial η² = .059). Since the motivational subscales were assessed as mean scores of 3 items each on 5-point Likert scales, the results indicate that learners gave medium to rather high ratings (Fig. 6). No significant time effect or interaction effect was found.

Fig. 6

Time course of total motivation over the four measurement time points for the AR and the non-AR groups

AR vs. Non-AR Regarding Cognitive Load (RQ2)

As explained in the “Further Variables” section, the cognitive load students experienced during the experiment might be related to their technology affinity, although the evidence is inconclusive. To determine whether technology affinity should be considered in analyses of differences in cognitive load between the AR and non-AR groups, a forced-entry linear regression was carried out for each cognitive load component. The independent variables were enthusiasm for technology and self-assessed competence in using technology; the dependent variables were the cognitive load components.

The linear regression, see Table 4, revealed that for the present data, neither enthusiasm for technology nor competence in using technology significantly determined intrinsic cognitive load, extraneous cognitive load or germane cognitive load for the laboratory experiment. Therefore, analyses of group differences between AR and non-AR groups were performed without taking any covariates into account.

Table 4 Fit parameters obtained from regression models for intrinsic cognitive load, extraneous cognitive load and germane cognitive load as dependent variables

The analysis of group differences and trajectories regarding cognitive load was performed using type II mixed ANOVAs for intrinsic cognitive load, extraneous cognitive load, germane cognitive load and total cognitive load as dependent variables, see Table 5. Results include overall group differences between the AR group and non-AR group (group effect), the temporal development of cognitive load over the four measurement time points without considering the groups (time effect), and the corresponding interaction effect.

Table 5 Results for main effects and interaction effects (group differences) between AR group and non-AR group for time course of intrinsic cognitive load, extraneous cognitive load, germane cognitive load and total cognitive load (mixed ANOVAs)

The analysis of group effects revealed that students in the AR group rated their intrinsic cognitive load higher (M = 3.12, SD = 1.04) than students in the non-AR group (M = 2.65, SD = 0.97). The mixed ANOVA showed that this is a significant effect of medium size (F(1.00,70.00) = 4.28, p = .042, partial η² = 0.06). Extraneous cognitive load did not differ significantly between the AR group (M = 3.19, SD = 1.13) and the non-AR group (M = 3.17, SD = 0.88) (F(1.00,70.00) = 0.01, p = .910). Germane cognitive load did not differ significantly either between the AR group (M = 4.73, SD = 1.01) and the non-AR group (M = 4.55, SD = 1.35) (F(1.00,70.00) = 0.37, p = .545). Likewise, total cognitive load did not differ significantly between the AR group (M = 3.68, SD = 0.82) and the non-AR group (M = 3.46, SD = 0.79) (F(1.00,70.00) = 1.45, p = .233).

Regarding time effects, the results showed that intrinsic cognitive load (F(2.64,184.47) = 20.75, p < .001, partial η² = 0.23), extraneous cognitive load (F(2.58,180.40) = 9.12, p < .001, partial η² = 0.12) and total cognitive load (F(2.62,183.20) = 19.16, p < .001, partial η² = 0.22) changed over the course of the laboratory experiment, with medium to large effect sizes (Fig. 7). Concerning interaction effects, we found no differences between the two groups in the temporal development of cognitive load over the four measurement time points.

Fig. 7

Time course of total cognitive load over the four measurement time points for the AR and the non-AR groups

AR vs. Non-AR Regarding Learning Achievement (RQ3)

We also analyzed possible effects on students’ learning achievement. Because motivation and technology affinity may be correlated with learning achievement, a preliminary forced-entry linear regression was carried out first. It used subject knowledge in the posttest as the dependent variable and subject knowledge in the pretest as a further independent variable alongside these potential predictors. The preliminary regression (Table 6, left; iteration 1) revealed significant contributions only for the pretest subject knowledge score and perceived competence. Therefore, the remaining potential predictors were dropped from the model before a second linear regression was carried out (Table 6, right; iteration 2). The resulting model accounts for about 50% of the variance in students’ posttest subject knowledge in this sample.

Table 6 Fit parameters obtained from regression models for subject knowledge (post) as dependent variable. Parameters with p > 0.05 in the first regression were omitted in the second regression

Prior knowledge (subject knowledge in the pretest) and perceived competence were predictors of subject knowledge in the posttest. Hence, pretest subject knowledge was considered as a covariate in the subsequent comparisons of the AR and non-AR groups. Since perceived competence was assessed as a treatment-dependent variable during the experiment, it could not be considered as an additional covariate. Rather, the correlation between perceived competence and posttest performance can be regarded as evidence of methodological consistency, because it indicates that the students’ self-assessment of their performance during the experiment matched the objective outcome in terms of posttest scores.

Posttest subject knowledge (as the dependent variable) was compared between the AR and non-AR groups using a type II ANCOVA with subject knowledge in the pretest, perceived competence, and extraneous cognitive load as covariates. The Bonferroni-Holm correction was applied to account for multiple testing.
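The Bonferroni-Holm procedure is a step-down method: the m p-values are sorted in ascending order, the i-th smallest is compared against α/(m − i + 1), and testing stops at the first non-rejection. A minimal stdlib Python sketch that returns Holm-adjusted p-values follows. The function name and the example p-values are hypothetical and do not correspond to the tests reported here, since the text does not state which family of tests was corrected.

```python
def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> tuple[list[float], list[bool]]:
    """Holm's step-down procedure; returns (adjusted p-values, reject flags) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[i])  # multiply by the number of remaining tests
        running_max = max(running_max, adj)       # adjusted p-values must not decrease
        adjusted[i] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical family of four tests
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(adjusted, reject)
```

Comparing the adjusted p-values against α is equivalent to the sequential stopping rule, because the running maximum carries any earlier non-rejection forward to all larger p-values.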

The analysis revealed that students in the AR group had higher subject knowledge on the posttest (M = 9.57, SD = 3.94) than students in the non-AR group (M = 8.63, SD = 3.46) (Fig. 8). However, this difference is not significant (F(1.00, 68.00) = 0.90, p = .347).
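To relate the raw posttest difference to the Cohen's d and Hedges' g values from the meta-analyses discussed earlier in this article, a standardized mean difference can be computed from the reported descriptives. The Python sketch below is illustrative only: it ignores the pretest covariate and is therefore not the effect size of the ANCOVA, and the non-AR group size of 33 is an assumption inferred from the degrees of freedom rather than a reported figure.

```python
import math


def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the usual small-sample correction J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2."""
    return d * (1 - 3 / (4 * (n1 + n2 - 2) - 1))


# Assumed group sizes: 39 AR learners (see Discussion) and 33 non-AR learners (from df_error = 70)
d = cohens_d(9.57, 3.94, 39, 8.63, 3.46, 33)
g = hedges_g(d, 39, 33)
print(f"d = {d:.2f}, g = {g:.2f}")
```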

Fig. 8

Estimated marginal means of posttest subject knowledge in the AR group and non-AR group

Learning achievement was further analyzed with respect to the increase in subject knowledge between pretest and posttest by performing a type II mixed ANOVA with subject knowledge as the dependent variable, group (AR, non-AR) as a between-subjects factor, and time (pretest, posttest) as a within-subjects factor. To allow a direct comparison of pretest and posttest results, only the eight items administered at both test times were included (see “Learning Achievement” section).

The results (Fig. 9) show a significant main effect of time with a large effect size (F(1.00, 70.00) = 24.59, p < .001, partial η² = 0.26); we hence conclude that, overall, students improved their performance in the subject knowledge test over the course of the laboratory experiment. However, the group × time interaction is not significant (F(1.00, 70.00) = 0.08, p = .781), indicating that the learning gains in the AR group and the non-AR group are of similar magnitude.
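In a 2 (group) × 2 (time) mixed design, the group × time interaction is equivalent to comparing pretest-to-posttest gain scores between the groups: the interaction F equals the square of the pooled-variance independent-samples t statistic on the gains, with df = (1, n_a + n_b − 2). An error df of 70 would therefore correspond to 72 learners. The minimal Python sketch below illustrates this identity; the function name and the gain scores in the usage line are hypothetical.

```python
from statistics import mean, variance


def interaction_f_from_gains(gains_a: list[float], gains_b: list[float]) -> tuple[float, tuple[int, int]]:
    """F for the group x time interaction in a 2 (between) x 2 (within) mixed ANOVA.

    Equals the square of the pooled-variance independent-samples t statistic on the
    gain scores (posttest minus pretest); degrees of freedom are (1, n_a + n_b - 2).
    """
    n_a, n_b = len(gains_a), len(gains_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(gains_a) + (n_b - 1) * variance(gains_b)) / df_error
    t = (mean(gains_a) - mean(gains_b)) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return t * t, (1, df_error)


# Hypothetical gain scores for two small groups
f, df = interaction_f_from_gains([1, 2, 3], [2, 3, 4])
print(f, df)
```

This view also explains why a non-significant interaction means "similar learning gains": the test compares nothing other than the mean gain of each group.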

Fig. 9

Interaction plot for AR group and non-AR group and both measurement time points (pretest, posttest)

Discussion

For motivation (RQ1), our study shows that learning with AR has advantages over the traditional learning setting for laboratory experiments. The small to moderate overall effect (p = .039, partial η² = 0.06) can be traced back to a large effect for interest/enjoyment (p = .002, partial η² = 0.14) when working with the AR environment, which was offset by the absence of differences in perceived competence. This result appears consistent with the slightly heterogeneous preliminary findings on the use of AR in learning settings in general (Cao & Yu, 2023; Chang et al., 2022) and in science education in particular (Ibáñez & Delgado-Kloos, 2018). However, the present data go beyond previous research, as they report findings for AR glasses, a technology that has rarely been analyzed before. The fact that learners’ perceived competence did not differ significantly across groups is surprising, because only about half of the learners in the AR group reported prior experience with immersive headsets (19 of 39 learners).

For cognitive load (RQ2), the only significant difference between the AR group and the non-AR group was found for intrinsic cognitive load (p = .042, partial η² = 0.06). This difference might be due, for example, to differences in learners’ prior knowledge. Another possibility is that the additional visualizations were perceived as increasing the complexity of the learning environment, because learners in the AR group were required to make sense of these representations to complete the tasks successfully. While learners in the non-AR condition also had the visualizations available in their workbooks, they did not necessarily need to reflect on them to read measurement values from the multimeter. It was also surprising that no detectable differences in extraneous cognitive load were found between the groups. Based on the Cognitive Theory of Multimedia Learning (Mayer, 2009) and a broad empirical basis on the spatial contiguity effect (Ginns, 2006; Schroeder & Cenkci, 2018), one might have expected the spatial integration of the experimental components and the setup, as well as of the model-related visualizations, to reduce extraneous cognitive load in the AR group. Yet, most studies on AR use have not shown that AR clearly reduces cognitive load in learning situations in general (Radu, 2014; Buchner et al., 2022). Specifically in science education, studies show no clear trend regarding cognitive load for AR (Altmeyer et al., 2020; Elford et al., 2022; Thees et al., 2020). However, in this context, one might argue that comparable levels of cognitive load in both groups represent a benefit of the AR environment, because dealing with a new technology can be considered an additional challenge that is absent for learners in the non-AR environment (Radu, 2014).

The most striking finding in relation to previous research concerns the students’ learning achievement (RQ3). Although students in the AR group showed higher subject knowledge on the posttest (M = 9.57, SD = 3.94) than students in the non-AR group (M = 8.63, SD = 3.46), this difference was not significant (p = .347), and the learning gains in subject knowledge did not differ significantly between the groups (p = .781). Systematic reviews and meta-analyses have reported clear learning benefits of AR compared to traditional learning settings, both for education in general (Malone et al., 2023) and for science education, specifically for physics as a discipline (Xu et al., 2022). The present data appear to support the findings of Thees et al. (2020) and thereby raise the question of how the specific application of AR glasses differs from other applications of AR and from other technologies. As reported earlier, the perceived competence of the two groups did not differ significantly, suggesting that the lack of an advantage in the AR group was not due to difficulties in using the technology.

In addition to the specific results comparing AR to non-AR, it is interesting that the laboratory course in this study also proved suitable for fostering content knowledge in both groups, as there was a strong overall effect on learning achievement (p < .001, partial η² = 0.26). This appears to contradict prominent analyses from the field of physics education (Holmes & Wieman, 2018; Holmes et al., 2020) and raises the question of the extent to which lab courses are suitable for developing content knowledge through the integration of concept-related content, including visualizations. A possible explanation may be found in Xu et al. (2022), who indicate that the strong advantages of AR in the earth sciences can be traced back to this field’s specific focus on spatial abilities. In contrast, AR has shown no advantages in biology, where knowledge of real objects is assumed to be fundamental. For the learning setting of the present study, it can be assumed that spatial abilities are not particularly important for learning the concept of optical polarization; rather, what matters is the integration of AR with real experimental setups. Instead of looking to features of the AR environment to explain the absence of effects, an interesting alternative is to consider positive features of the design of the non-AR environment, which was explicitly designed to provide a fair comparison condition. This included providing the students with the same visualizations (although static rather than interactive) and prompting them to generate their own visualizations of measurement data. Hence, the non-AR environment was designed to explicitly compensate for potential drawbacks arising from the differences in technological implementation.
This explanation highlights the primacy of good instructional design independent of technological implementation (Feldon et al., 2021), and it further points out the importance of establishing fair conditions in media comparison studies to obtain meaningful results.

Research Limitations

This study was designed with special consideration for enabling a fair comparison between AR and non-AR conditions, particularly with respect to the available representations. Critically, however, we cannot exclude that novelty effects due to the experience of using a new technology (Hamari et al., 2014) might have positively influenced learners’ motivation in the AR group. This issue has also not been rigorously accounted for in previous studies on educational technology in general (Tsay et al., 2019).

In addition, this study focused on quantitative measures such as learning achievement and cognitive load. To gain a deeper understanding of the quantitative results, it would be desirable to investigate how learners interact and work with the experiments from a qualitative perspective. For example, it is not yet clear how learners in the AR and non-AR environments perceived the provided representations and how they included them in their learning process, or how the interactive functions of the AR environment might have influenced their experimental strategies.

The unequal gender distribution is a limitation of the present sample: overall, only just over 20 percent of participants were female. This proportion is roughly comparable to the similarly low proportion of female participants in the study by Thees et al. (2020) and reflects the typical underrepresentation of women in STEM education (Cimpian et al., 2020). The extent to which this unequal distribution influenced the results remains unclear, as both Ibáñez and Delgado-Kloos (2018) and Xu et al. (2022) point out that the influence of gender on learning in AR-based learning environments in science education has not yet been investigated.

A further limitation relates to the selection of the survey instruments. Cognitive load was assessed by self-report, which is common practice, but more objective measurement methods would be desirable. As this was a laboratory experiment, it also seems relevant to examine the extent to which the experimental approach itself influenced the results, for example, by checking whether the experimental tasks were performed similarly well in both comparison groups. For both aspects, it seems sensible to incorporate eye-tracking data, which can readily be collected with AR glasses (Souchet et al., 2022).

Conclusion and Implications

We examined the extent to which the use of AR glasses in laboratory experiments is beneficial for students’ motivation, cognitive load and learning achievement compared to a traditional learning setting. We used an experimental comparison group design with N = 75 students in the context of a laboratory experiment on optical polarization.

In line with existing findings, the results show that learners appear more motivated when using an AR application. However, a novelty effect cannot be excluded, so longitudinal studies over a longer intervention period seem necessary. Regarding cognitive load, no significant differences in extraneous, germane, or total cognitive load were found between learners with and without AR; only intrinsic cognitive load was rated higher in the AR group. On the one hand, this is promising, as potential challenges in dealing with AR as a novel technology did not prove to be a hindrance (Buchner & Kerres, 2023). On the other hand, it is also surprising, as the considerably higher (spatial) contiguity in the AR experiment could be expected to lower extraneous cognitive load according to theoretical and empirical findings on the Cognitive Theory of Multimedia Learning. Finally, it should be noted that, contrary to most studies (Bacca et al., 2014; Cao & Yu, 2023; Chang et al., 2022; Zhang et al., 2022), learners acquired knowledge to a similar extent regardless of the learning technology. This contradicts the findings from meta-analyses on the use of AR in physics (Xu et al., 2022) but is consistent with the only finding to date (according to our literature review) on the use of AR glasses for learning in physics (Thees et al., 2020). As such, these results suggest that it may be inappropriate for media comparison studies to focus on surface features, such as the specific AR technology or the teaching subject. Rather, future studies should focus on the content design of AR applications and, by analyzing their deep structure, identify features that make AR applications conducive to learning.
A good starting point for future research could be to examine the initial findings of existing comparative studies on AR in biology applications, which showed that real experiences were more important than virtual ones, and of studies on AR in earth sciences applications, which showed that AR was particularly supportive in developing the spatial abilities critical for the field. These findings should be investigated more systematically and transferred to other subjects such as physics.