1 Introduction

Since astronomer Richard Christopher Carrington first observed solar flares in 1859, researchers have been studying solar activity. A solar radio burst is a sharp increase in the intensity of radio radiation produced when the solar atmosphere is strongly perturbed (Bouratzis et al. 2015). It is a highly variable and frequently short-lived component of solar radio radiation that reflects the evolution and interactions of plasma, non-thermal electrons, and magnetic fields in the solar eruption activity region (Huang 2010).

Solar radio bursts (SRBs) are classified into types according to their frequency and time characteristics. Radio spectral images, with time on the X-axis and frequency on the Y-axis, show a distinct pattern for each type. Meter-wave solar radio bursts are divided into the following five categories: Type I SRBs, caused by accelerated electrons generating fundamental-frequency radiation within seconds; Type II SRBs, characterized by a slow change in frequency over minutes; Type III SRBs, marked by a rapid drift in frequency from high to low, lasting for seconds; Type IV SRBs, which are continuous broadband bursts often accompanied by Type II SRBs; and Type V SRBs, which are constant, diffuse bursts excited by accelerated electron scattering, often accompanied by Type III SRBs. Continuous groups of Type III SRBs are classified as Type IIIs SRBs. Each type of solar radio burst event is associated with specific solar activity phenomena and is related to various phenomena in the near-Earth space environment.

In recent years, solar observation has experienced significant technological advances, mainly due to the development of fully digital radio spectrometers. These spectrometers can cover a wider frequency range, allowing solar observers to capture radio radiation at multiple frequencies, and the daily generated data have reached GB and TB levels, ushering solar observation into the era of big data. Currently, various solar radio spectrometers are in use worldwide. In China, the National Astronomical Observatory of the Chinese Academy of Sciences (NAOC) and Yunnan Observatory of the Chinese Academy of Sciences (YNAO) possess several solar radio observing systems, and the Mingantu Spectral Radioheliograph (MUSER) has been put into use (Zhang et al. 2021). During observation periods, solar radio bursts are rare events, making manual detection and categorization infeasible with massive amounts of observation data. Therefore, there is an urgent need for automated and intelligent algorithms to quickly recognize and classify the types of solar radio bursts and automatically extract their parameters. This will help improve data processing efficiency, enable real-time monitoring, facilitate in-depth studies of solar activity processes, and enhance the accuracy of space weather warnings.

Existing studies have focused on detecting the presence of bursts in the solar radio spectrum or identifying a single burst category. Chen et al. proposed a multimodal deep-learning method for classifying solar radio spectrum images with and without bursts and, for comparison, also evaluated PCA dimensionality reduction combined with SVM classification (Chen et al. 2016). Their experimental results show that the multimodal learning network significantly improves classification accuracy. Chen (2018) proposed a convolutional neural network-based algorithm for solar radio spectrum classification, achieving a true positive rate (TPR) of 84.6%. Lobzin et al. studied the automatic detection of Type III and Type II SRBs using the Hough transform to detect straight lines in the spectrogram; several straight lines can be combined to form an event, which is then identified by matching the lines (Lobzin et al. 2009). Guo et al. used meta-learning and transfer-learning methods to classify solar radio spectrograms into bursts, non-bursts, and calibrations (Guo et al. 2022). Li et al. used self-supervised learning to discriminate solar radio bursts with a 75% mask in the radio images, achieving an accuracy of 99.5% (Li et al. 2022a). Chen et al. used Swin Transformer to classify radio data from NAOC as burst or non-burst (Chen et al. 2022). Zhang used the Faster-RCNN algorithm to localize and detect Type III SRBs observed at Crouch Mountain Observatory (Zhang 2020). These methods suffer from low detection accuracy, a limited number of detectable burst types, and poor real-time performance.

Real-time detection of solar radio bursts can be conceptualized as a target detection problem within the radio spectrum, akin to target detection in computer vision. Target detection aims to identify predefined target types and their locations from images. Traditional algorithms for target detection rely on manually crafted features, image transformation for feature extraction, sliding window methods for candidate frame selection, and statistical learning for target classification and regression. These methods often suffer from low accuracy, slow processing speeds, and limited generalization. In recent years, deep convolutional networks have made significant strides in machine vision. Networks like AlexNet (Krizhevsky et al. 2017), VGG (Simonyan 2014), and GoogLeNet (Szegedy et al. 2015) have elevated large-scale image classification tasks to new heights with deeper layers and improved architectures. Additionally, the integration of Transformer models into image processing has revitalized the field of target detection. Deep learning-based target detection algorithms are broadly categorized into single-stage (e.g., YOLO series, SSD series) and two-stage (e.g., R-CNN series) approaches.

To meet the high accuracy and real-time requirements of solar radio burst detection and classification, this paper adopts a lightweight network known for its fast and efficient features. You Only Look Once (YOLO) is a widely recognized target detection algorithm (Redmon et al. 2016), with its eighth-generation version, YOLOv8, representing a state-of-the-art model that builds upon its predecessors by introducing new features and improvements to enhance performance and flexibility.

In this study, we propose a solar radio burst detection model based on the enhanced YOLOv8. This model achieves real-time detection of Type II, Type III, Type IIIs, Type IV, and Type V solar radio bursts with precise localization. Given the irregular shapes characteristic of solar radio bursts, we introduce a full-dimensional dynamic convolution algorithm to enhance feature extraction, along with the MSFF-NeXt (Multi-scale Feature Fusion based on ConvNeXt) network designed specifically for capturing these burst characteristics. Additionally, the model incorporates CIoULoss to improve generalization and robustness.

This paper makes several important contributions:

1. Dataset creation: This paper collects and constructs a dataset of solar radio burst spectral images sourced from e-CALLISTO. It includes labeling for five types of solar radio bursts, facilitating future public release for use by other researchers.

2. Enhanced model architecture: To effectively capture features from solar radio burst spectrograms, this paper integrates full-dimensional dynamic convolution into the YOLOv8 model. This enhancement combines global sensing with local generalization bias, thereby improving detection accuracy.

3. MSFF-NeXt network proposal: Addressing the varied characteristics of different solar radio burst classes, this paper proposes the MSFF-NeXt multiscale feature fusion network. This network replaces the neck component of YOLOv8, further enhancing detection accuracy.

4. Fast and versatile model: The proposed model is a fully convolutional neural network capable of real-time solar radio burst detection, characterized by high-speed processing. Furthermore, its applicability extends to other astronomical data detection tasks involving time and frequency domain features.

2 Dataset construction and augmentation

2.1 Source and construction of the solar radio burst spectrogram dataset

In this paper, solar radio spectrograms sourced from the e-CALLISTO (Compound Astronomical Low-frequency Low-cost Instrument for Spectroscopy and Transportable Observatory) website were utilized to construct the dataset (Monstein et al. 2023). We prioritized sites with high-quality data, comprehensive coverage, and easy download access. The e-CALLISTO network is a global array of space weather instruments primarily deployed for observing solar radio bursts, monitoring radio frequency interference, and promoting astronomy education. The network compiles data from all spectrometers into FITS files, each scan encompassing up to 400 frequencies (He et al. 2023).

Initially, FITS files containing solar radio burst events spanning from 2020 to 2023 were downloaded. Each file comprises four components: an ASCII header, a binary spectrum, and two binary tables dedicated to frequency and time. Subsequently, the Python Library To Process The CALLISTO Spectrometer Data (pyCallisto) was used to parse these FITS files (Pawase et al. 2020). pyCallisto is a Python module equipped with utility functions for visualizing and manipulating frequency and time spectra obtained from solar radio spectrometers such as CALLISTO. Its functions include plotting, slicing, concatenating, and generating light curves and frequency profiles. With pyCallisto, the FITS files were processed to produce radio spectrum images after background subtraction, each containing at least one of the five identified burst categories. Finally, using the timestamp and category information provided by e-CALLISTO, each radio spectrum image was annotated with LabelImg. The annotations cover burst location, category, duration, and frequency range. This process produced a new solar radio target detection dataset. Table 1 provides an overview of the dataset, detailing the number of instances for each burst type. Figure 1 presents various types of spectrograms from the dataset.

Fig. 1 Various types of solar radio bursts

Table 1 Parameters of burst event detection dataset

The number of burst instances exceeds the number of images because multiple types of burst events can occur within a single radio spectrogram. Among solar radio burst types, Type I SRBs exhibit distinct, transient, high-intensity radio signals on spectrograms, superimposed over a stable or slowly changing continuous background. Due to their unique characteristics, which make them challenging to collect and label using the e-CALLISTO network, Type I SRBs are excluded from this dataset. Type III SRBs are characterized by rapid frequency drift, short duration, and the presence of continuous emissions. In this study, Type IIIs SRBs, which occur in groups, are categorized as a subtype of Type III SRBs.

2.2 Dataset augmentation

A significant portion of the spectrogram data acquired by the station contains noise and trailing artifacts. The background subtraction algorithm effectively mitigates much of the environmental and instrumental noise by calculating and subtracting the median value for each frequency channel during the data-to-spectrogram conversion process. This serves as an effective form of data enhancement, improving the quality and reliability of the input data.
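The per-channel median subtraction described above can be illustrated with a minimal sketch. It assumes the spectrogram is held as a list of rows, one row per frequency channel; the function name `subtract_background` and the toy values are hypothetical, and this is not the pyCallisto implementation.

```python
from statistics import median


def subtract_background(spectrogram):
    """Subtract each frequency channel's median over time.

    `spectrogram` is a list of rows, one row per frequency channel,
    each row holding that channel's intensity samples over time.
    """
    cleaned = []
    for channel in spectrogram:
        baseline = median(channel)  # steady level of this channel
        cleaned.append([value - baseline for value in channel])
    return cleaned


# Toy data (hypothetical values): two channels with different offsets.
raw = [
    [10.0, 10.0, 10.0, 25.0, 10.0],  # channel containing a short burst
    [40.0, 41.0, 40.0, 39.0, 40.0],  # quiet channel with small fluctuations
]
cleaned = subtract_background(raw)
```

Because the median is insensitive to short spikes, the burst sample in the first channel survives the subtraction while the constant offset of each channel is removed.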

The dataset exhibits limited total volume, homogeneous background, and uneven category distribution. To address these challenges, this paper employs data augmentation techniques to enrich the training samples. Specifically, the techniques used include Random Scaling, Random Cropping, Photometric Distortion, and Random Splicing, as illustrated in Fig. 2.

Fig. 2 Data augmentation of solar radio burst images

As shown in Fig. 2, random cropping selects a region at a random location within an image and uses that region as a new training sample; the size of the cropped region varies randomly within a predefined range. Cropping also has some probability of excluding noise and trailing artifacts from the sample, further improving data quality for model training. Random scaling increases data diversity by adjusting the image size: a scaling ratio is randomly chosen from a predefined range, and the image width and height are adjusted accordingly. Photometric distortion (PMD) randomly perturbs brightness, contrast, saturation, and hue, and may also apply random color space transformations, generating more diverse training samples. Finally, random splicing stitches four images together to form a larger image, which is then used as a training sample.
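For a detection dataset, cropping must also update the bounding-box labels. The following is a minimal sketch of random cropping with box remapping, not the augmentation code used in the paper; the function name `random_crop`, the `min_frac` range, and the sample box are illustrative assumptions.

```python
import random


def random_crop(width, height, boxes, min_frac=0.6, rng=None):
    """Choose a random crop window and remap boxes into its coordinates.

    boxes: list of (x_min, y_min, x_max, y_max, label) in pixels.
    Returns ((left, top, crop_w, crop_h), kept_boxes). Boxes are clipped to
    the window, and boxes that fall entirely outside it are dropped.
    """
    rng = rng or random.Random()
    crop_w = int(width * rng.uniform(min_frac, 1.0))
    crop_h = int(height * rng.uniform(min_frac, 1.0))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)

    kept = []
    for x0, y0, x1, y1, label in boxes:
        # Clip the box to the window, then shift to window coordinates.
        nx0 = max(x0, left) - left
        ny0 = max(y0, top) - top
        nx1 = min(x1, left + crop_w) - left
        ny1 = min(y1, top + crop_h) - top
        if nx1 > nx0 and ny1 > ny0:
            kept.append((nx0, ny0, nx1, ny1, label))
    return (left, top, crop_w, crop_h), kept


# One hypothetical Type III annotation on a 640 x 640 spectrogram image.
window, kept = random_crop(640, 640, [(100, 200, 300, 400, "III")],
                           rng=random.Random(0))
```

A production pipeline would usually also discard boxes whose remaining area is too small to be a useful label; that threshold is omitted here.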

3 Methods

3.1 The overall architecture of the network

In this paper, we enhance YOLOv8n and propose a network model better suited for real-time monitoring of solar radio bursts. The YOLOv8n network structure comprises four modules: Input, Backbone, Neck, and Head, integrating several state-of-the-art (SOTA) technologies to improve flexibility and accuracy over many previous versions.

Firstly, the Omni-Dimensional Dynamic Convolution (ODConv) algorithm is utilized to dynamically adjust the shape and size of the convolution kernel based on the input data features, enhancing the neural network’s feature extraction capability. Secondly, the MSFF-NeXt network is introduced to replace the neck part, which fuses multiscale features and enhances the detection accuracy of solar radio burst images. Finally, the overall classification and detection performance is further improved by refining the regression loss function. The structure of the proposed model is illustrated in Fig. 3, with detailed descriptions of the ODConv dynamic convolution, the MSFF-NeXt network, and the improved regression loss function provided in Sects. 3.2-3.4, respectively.

Fig. 3 Overall network structure

3.2 Introducing the ODConv full-dimensional dynamic convolution algorithm

Conventional convolution involves a single static convolution kernel, predetermined by the experimenter, thus remaining independent of the input samples. Chen et al. introduced Dynamic Convolution in 2019, which integrates a dynamic attention mechanism with conventional convolutional operations to enhance performance in tasks such as image classification (Chen et al. 2020). Dynamic Convolution linearly weights multiple convolution kernels, with the weighted values being input-dependent, thus rendering Dynamic Convolution input-dependent. It can be described as follows:

$$ y = \left ( \alpha _{\omega 1}W_{1} + \cdots + \alpha _{\omega n}W_{n} \right ) * x $$
(1)

Here, \(W_{1}\) to \(W_{n}\) represent different convolution kernels, and each kernel \(W_{i}\) is weighted by a single attention scalar \(\alpha _{\omega i}\), computed from the input \(x\) by an attention function \(\phi _{\omega i}\left ( x \right )\); the scalars range from 0 to 1 and sum to 1. Consequently, all output filters of a kernel share the same attention value for a given input. Conventional dynamic convolution therefore overlooks the spatial dimensions of the convolution kernel \(W_{i}\), along with the input and output channel dimensions, limiting its exploration of the kernel space (Li et al. 2022b).
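Eq. (1) can be made concrete with a one-dimensional toy sketch. It assumes the attention scalars come from a softmax over logits derived from the globally averaged input; the helper names, the attention parameters, and the kernel values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Normalize logits into weights in [0, 1] that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_logits(signal, weights, biases):
    """Hypothetical attention branch: global average pooling, then a linear map."""
    pooled = sum(signal) / len(signal)
    return [w * pooled + b for w, b in zip(weights, biases)]


def conv1d(signal, kernel):
    """Valid cross-correlation of a 1-D signal with a 1-D kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dynamic_conv1d(signal, kernels, weights, biases):
    """Mix the n kernels with input-dependent scalars, then convolve once."""
    alphas = softmax(attention_logits(signal, weights, biases))
    size = len(kernels[0])
    mixed = [sum(a * kern[j] for a, kern in zip(alphas, kernels))
             for j in range(size)]
    return conv1d(signal, mixed), alphas


signal = [1.0, 2.0, 4.0, 8.0, 16.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # W_1 and W_2
output, alphas = dynamic_conv1d(signal, kernels,
                                weights=[0.1, -0.1], biases=[0.0, 0.0])
```

Since convolution is linear in the kernel, convolving once with the mixed kernel gives the same result as mixing the outputs of the individual kernels, which is why the aggregation adds little cost at inference time.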

This paper introduces ODConv, a full-dimensional dynamic convolution that extends dynamic characteristics into multiple dimensions. ODConv simultaneously considers information in four dimensions: spatial kernel size, number of input channels, number of output channels, and number of convolution kernels. This approach enhances the model’s feature extraction capability, as described below:

$$ \begin{aligned} y &= \big( \alpha _{\omega 1} \odot \alpha _{f1} \odot \alpha _{c1} \odot \alpha _{s1}W_{1} + \cdots \\ &\quad + \alpha _{\omega n} \odot \alpha _{fn} \odot \alpha _{cn} \odot \alpha _{sn}W_{n} \big) * x \end{aligned} $$
(2)

In this notation, \(\alpha _{\omega i}\) represents the attention scalar of the convolution kernel \(W_{i}\), while \(\alpha _{si}\), \(\alpha _{ci}\), and \(\alpha _{fi}\) represent three newly introduced attentions used to extract rich contextual information along the spatial dimension, input channel dimension, and output channel dimension, respectively. Figure 4 illustrates the specific structures of traditional dynamic convolution and ODConv full-dimensional dynamic convolution.

Fig. 4 Full-dimensional dynamic convolution
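The kernel aggregation in Eq. (2) can be sketched for a small kernel bank. This is an illustrative sketch only: the spatial dimension is reduced to one axis for brevity, the attention values are supplied directly rather than produced by an attention branch, and all names and numbers are hypothetical.

```python
def odconv_aggregate(kernels, alpha_w, alpha_f, alpha_c, alpha_s):
    """Combine n kernels using four attentions, one per kernel-space dimension.

    kernels[i][o][c][s]: kernel i, output channel o, input channel c, spatial tap s.
    alpha_w[i]:    scalar for the whole kernel i
    alpha_f[i][o]: attention over output channels (filters)
    alpha_c[i][c]: attention over input channels
    alpha_s[i][s]: attention over spatial positions
    """
    c_out = len(kernels[0])
    c_in = len(kernels[0][0])
    taps = len(kernels[0][0][0])
    mixed = [[[0.0] * taps for _ in range(c_in)] for _ in range(c_out)]
    for i, kernel in enumerate(kernels):
        for o in range(c_out):
            for c in range(c_in):
                for s in range(taps):
                    scale = (alpha_w[i] * alpha_f[i][o]
                             * alpha_c[i][c] * alpha_s[i][s])
                    mixed[o][c][s] += scale * kernel[o][c][s]
    return mixed


# Two kernels, one output channel, two input channels, three spatial taps.
kernels = [
    [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]],
    [[[2.0, 0.0, 2.0], [1.0, 1.0, 1.0]]],
]
alpha_w = [0.25, 0.75]
ones_f, ones_c, ones_s = [[1.0]] * 2, [[1.0, 1.0]] * 2, [[1.0, 1.0, 1.0]] * 2

# With the three extra attentions fixed at 1, Eq. (2) reduces to Eq. (1).
plain = odconv_aggregate(kernels, alpha_w, ones_f, ones_c, ones_s)

# Non-trivial attentions reweight individual filters, channels, and taps.
full = odconv_aggregate(
    kernels, alpha_w,
    alpha_f=[[0.5], [1.0]],
    alpha_c=[[1.0, 0.0], [0.5, 1.0]],
    alpha_s=[[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
)
```

The aggregated kernel `full` is then applied to the input with an ordinary convolution, exactly as the mixed kernel is in conventional dynamic convolution.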

3.3 MSFF-NeXt network

The Swin Transformer utilizes a window-based multi-head attention mechanism to improve computational efficiency and enhance classification accuracy. Nevertheless, the ConvNeXt pure convolutional network, as outlined in the ConvNeXt literature, exceeds Swin Transformer in accuracy and exhibits faster inference speed. The ConvNeXt network examines the attributes of common backbone networks, integrates the Transformer’s structural setup and training methods, and progressively builds the ConvNeXt architecture. DWConv (Chollet 2017) plays a role similar to W-MSA, and the ConvNeXtBlock mimics the Swin Transformer Block by moving the depthwise convolution up within the block and using depthwise convolutions with a kernel size of 7*7 (Liu et al. 2022).
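For reference, the ConvNeXtBlock as described by Liu et al. (2022) can be written compactly as follows, where \(\mathrm{DW}_{7 \times 7}\) is the 7*7 depthwise convolution, \(\mathrm{LN}\) is layer normalization, \(\mathrm{PW}\) denotes a 1*1 (pointwise) convolution that expands the channel count from \(C\) to \(4C\) and back, and \(\gamma \) is a learnable per-channel scale. Whether MSFF-NeXt keeps the same expansion ratio and scaling is an assumption here, not something stated in this paper.

$$ \mathrm{Block}(x) = x + \gamma \odot \mathrm{PW}_{4C \rightarrow C}\Big( \mathrm{GELU}\big( \mathrm{PW}_{C \rightarrow 4C}\left ( \mathrm{LN}\left ( \mathrm{DW}_{7 \times 7}(x) \right ) \right ) \big) \Big) $$

The wide middle layer between two narrow ends is the inverted bottleneck that the block shares with the Transformer MLP.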

In this paper, we propose the MSFF-NeXt multi-scale feature fusion network, which uses the ConvNeXtBlock as its basic module. A three-stage pyramid structure is adopted, in which the \(x_{2}\) and \(x_{3}\) features from the YOLOv8 backbone network are fused at one stage. After passing through a C2f module, a 1*1 convolution performs cross-channel information fusion and reduces the dimensionality of the input data. The processed data then proceeds through the ConvNeXt module, which adopts the same inverted bottleneck layer architecture as the Transformer, effectively mitigating information loss.

Given the diverse types and structural variations observed in solar radio bursts, this study focuses on capturing feature information across multiple scales with varying resolutions and receptive fields. In the initial stages of the network, characterized by higher resolutions, detailed information is readily captured. Conversely, in the third level of the MSFF-NeXt network, although the resolution diminishes, the expanding receptive field is conducive to detecting larger targets. Analysis of the size distribution of each target frame in the solar radio burst spectrum reveals that the majority of data samples correspond to large targets, except some Type III SRBs exhibiting characteristics of small targets. Notably, Type IV samples exhibit substantial continuity, prolonged duration, and broad-spectrum bursts on the spectrogram, occupying a larger spatial extent.

Because the targets to be detected are predominantly large, we connect the detection heads at the end of the second and third stages of the MSFF-NeXt network. During this process, the feature map \(x_{i}\) traverses at least two ConvNeXt modules, which possess a broad receptive field, thereby avoiding information loss. Furthermore, the MSFF-NeXt network pre-establishes connections among the feature maps in the backbone network at the second stage, where the feature maps retain higher-resolution information, thereby facilitating the subsequent detection of Type III small targets. In summary, the MSFF-NeXt multiscale feature fusion network designed in this paper is depicted in Fig. 5. Drawing inspiration from the idea and training strategy of Swin Transformer, the MSFF-NeXt network applies distinct stages of feature fusion tailored to the characteristics of various solar radio burst types, and it replaces the neck part of YOLOv8n.

Fig. 5 Multi-scale Feature Fusion based on ConvNeXt (MSFF-NeXt)

3.4 Regression loss function

In the realm of target detection, regression loss functions are employed to quantify the disparity between the model-predicted bounding box and the true bounding box. Various regression loss functions, including GIoULoss, DIoULoss, WIoULoss, CIoULoss, and smoothed L1 loss, have demonstrated advantages in different scenarios. GIoULoss offers enhanced accuracy in measuring the overlap between the predicted bounding box and the true bounding box. WIoULoss introduces weights during the computation of overlap between bounding boxes, rendering it more sensitive to small targets. In instances where the data distribution of the target detection task is straightforward and bounding box alterations are not excessively pronounced, the smoothed L1 loss ensures a stable training process. Conversely, two loss functions, DIoULoss and CIoULoss, integrate more geometric considerations and exhibit faster convergence, besides being more responsive to alterations in bounding box position and size (Zheng et al. 2021). However, while calculating the distance between bounding boxes, DIoULoss solely accounts for the distance between centroids, whereas CIoULoss incorporates not only centroid distance but also the bounding box aspect ratio.

Given the irregular variations in detection location and size in the solar radio burst detection task, alongside the presence of only a few small targets, this paper opts for CIoULoss as the regression loss function for the task, as it accounts for a broader range of geometric factors. The formulation of CIoULoss is as follows:

$$ \begin{aligned} & L_{CIoU} = 1 - IoU + \frac{\rho ^{2}\left ( b,b^{gt} \right )}{c^{2}} + \alpha v, \\ & v = \frac{4}{\pi ^{2}}\left ( \arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h} \right )^{2}, \\ & \alpha = \frac{v}{\left ( 1 - IoU \right ) + v} \end{aligned} $$
(3)

Here, \(\rho ^{2}\left ( b,b^{gt} \right )\) denotes the squared Euclidean distance between the center points of the predicted bounding box \(b\) and the actual bounding box \(b^{gt}\), while \(c\) represents the diagonal length of the smallest rectangle enclosing both boxes; \(w\), \(h\) and \(w^{gt}\), \(h^{gt}\) are the widths and heights of the predicted and actual boxes. Their respective physical interpretations are illustrated in Fig. 6, where the green box signifies the actual box, the yellow box represents the predicted box, and the gray box delineates the minimal enclosing region. CIoULoss supplements DIoULoss with the term \(\alpha v\) to incorporate the aspect ratio in bounding box computations. Here \(v\) measures the consistency of the aspect ratios of the predicted and actual bounding boxes, and \(\alpha \) is the weight coefficient, derived from \(v\) and IoU, that balances this term against the overlap.

Fig. 6 Physical significance of CIoU
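Eq. (3) can be evaluated directly for a pair of boxes. The sketch below is a plain-Python illustration rather than the training code used in the paper; boxes are assumed to be given as corner coordinates with positive width and height, and the small constant `eps` is an added numerical safeguard that is not part of Eq. (3).

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    px0, py0, px1, py1 = pred
    gx0, gy0, gx1, gy1 = gt
    pw, ph = px1 - px0, py1 - py0
    gw, gh = gx1 - gx0, gy1 - gy0

    # Intersection over union.
    inter_w = max(0.0, min(px1, gx1) - max(px0, gx0))
    inter_h = max(0.0, min(py1, gy1) - max(py0, gy0))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centers.
    rho2 = (((px0 + px1) - (gx0 + gx1)) / 2) ** 2 \
         + (((py0 + py1) - (gy0 + gy1)) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(px1, gx1) - min(px0, gx0)
    enclose_h = max(py1, gy1) - min(py0, gy0)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # v: aspect-ratio consistency; alpha: its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


loss_same = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
loss_shifted = ciou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
loss_stretched = ciou_loss((0.0, 0.0, 4.0, 1.0), (0.0, 0.0, 2.0, 2.0))
```

For the shifted pair the aspect ratios match, so \(v = 0\) and the loss reduces to the DIoU value \(1 - 1/7 + 2/18\); the stretched pair additionally pays the \(\alpha v\) penalty.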

4 Experimental results and analysis

4.1 Experimental environment

The hardware and software configurations, along with their corresponding versions, employed in the experiments conducted in this paper are detailed in Table 2. The initial learning rate is set to 0.01, utilizing the SGD optimizer, with an input image size of 640x640 pixels. Warm_epochs is defined as 3, and the total number of training epochs is configured to 500. Resume functionality is enabled, and training is halted if the model evaluation metrics fail to exhibit significant improvement within a specified number of rounds. Additionally, Mosaic data augmentation is disabled during the final 10 epochs of training.

Table 2 Experimental environment
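The training settings listed above can be collected into a single configuration, sketched below. The key names follow the training arguments of the Ultralytics YOLOv8 package; that the experiments were run with this package is an assumption, as is the dataset file name `srb.yaml`. The early-stopping patience is not given in the text, so it is left out rather than guessed.

```python
# Hyperparameters stated in Sect. 4.1, expressed as keyword arguments.
train_args = {
    "imgsz": 640,          # input image size (640 x 640 pixels)
    "epochs": 500,         # total number of training epochs
    "optimizer": "SGD",    # optimizer named in the text
    "lr0": 0.01,           # initial learning rate
    "warmup_epochs": 3,    # warm-up epochs
    "close_mosaic": 10,    # disable Mosaic for the final 10 epochs
}


def describe(args):
    """Render the settings as a single sorted, comma-separated line."""
    return ", ".join(f"{key}={value}" for key, value in sorted(args.items()))


summary = describe(train_args)

# Assumed usage with the Ultralytics package (not executed here):
#   from ultralytics import YOLO
#   YOLO("yolov8n.yaml").train(data="srb.yaml", **train_args)
```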

4.2 Evaluation indicators

In this paper, we utilize mean Average Precision (mAP) and frames per second (fps) as the evaluation criteria for the target detection model. mAP@0.5 is the mean average precision computed at an Intersection over Union (IoU) threshold of 0.5. The IoU measures the overlap between the predicted bounding box and the ground truth bounding box, and a detection is deemed correct when the IoU of the two boxes is greater than or equal to the threshold. mAP@0.5:0.95 denotes the mean average precision averaged over IoU thresholds from 0.5 to 0.95, conventionally in steps of 0.05. The frames per second denotes the number of images processed by the model per second during inference and is used to assess the real-time speed and efficiency of the model.
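The two accuracy metrics can be illustrated with a simplified sketch for a single class on a single image. It assumes greedy matching of detections in descending score order and all-point interpolation of the precision-recall curve; evaluation toolkits typically use a fixed-grid interpolation and average over classes to obtain mAP, so the numbers are illustrative only. All boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, truths, iou_thr):
    """AP for one class; detections are (score, box), truths are boxes."""
    if not truths:
        return 0.0
    matched, hits = set(), []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, truth in enumerate(truths):
            if idx not in matched and iou(box, truth) > best_iou:
                best_iou, best_idx = iou(box, truth), idx
        is_hit = best_idx is not None and best_iou >= iou_thr
        if is_hit:
            matched.add(best_idx)
        hits.append(is_hit)

    # Precision and recall after each detection, highest score first.
    precisions, recalls, true_pos = [], [], 0
    for rank, is_hit in enumerate(hits, start=1):
        true_pos += is_hit
        precisions.append(true_pos / rank)
        recalls.append(true_pos / len(truths))

    # Make precision non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (0, 0, 10, 10)),     # exact match with the first truth
    (0.8, (50, 50, 60, 60)),   # false positive
    (0.7, (21, 20, 31, 28)),   # IoU 2/3 with the second truth
]
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.5 ... 0.95
ap_50 = average_precision(detections, truths, 0.5)
ap_50_95 = sum(average_precision(detections, truths, t)
               for t in thresholds) / len(thresholds)
```

The loosely localized third detection counts as correct only at the four thresholds up to 0.65, which is why the averaged value is lower than the value at 0.5: the 0.5:0.95 metric rewards tight localization.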

4.3 Experiments to determine ODConv parameters

To determine the parameters of ODConv, this paper conducts a comparative experiment with ODConv based on YOLOv8, and the results are presented in Table 3. The experiments reveal that applying ODConv to fewer layers of the backbone network results in faster operation. Particularly, the most significant improvement is observed when ODConv is applied to the second layer position of the model, where the decrease in fps is merely 8.2 compared to the original model. This finding underscores the importance of balancing operational efficiency and accuracy. Moreover, the experiments demonstrate that the introduction of ODConv enhances the overall algorithmic accuracy with minimal impact on speed.

Table 3 Comparison of different amounts of ODConv under YOLOv8

4.4 Attention analysis of feature extraction networks

The experiments employ GradCAM to analyze the attention distribution within two model feature extraction networks. This technique visualizes the model’s attention to different regions of the image by generating a heat map, thereby uncovering the crucial feature regions guiding the model’s predictions. The intensity of the heat map reflects the sensitivity of each region, with higher temperatures indicating greater sensitivity and lower temperatures indicating less sensitivity. As depicted in Fig. 7, notable drift is observed in Type II SRBs, with the model in this paper better aligning with the inherent nature of the burst images. For Type IV large targets, our model exhibits a broader area of attention. Furthermore, the model shows increased attention to the tail elongation feature of Type V SRBs and provides attention to Type III small bursts where the original model does not. In summary, the feature extraction network proposed in this paper directs more attention to the frequency and time domain drift characteristics of solar radio bursts.

Fig. 7 Comparison of GradCAM heat maps of solar radio burst spectrum detection images
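For readers unfamiliar with the technique, the standard Grad-CAM heat map for a class score \(y^{c}\) is computed from the feature maps \(A^{k}\) of a chosen convolutional layer as shown below, where \(Z\) is the number of spatial positions in each feature map. The symbol \(w_{k}^{c}\) is used here for the channel weights to avoid confusion with the attention scalars of Sect. 3.2; which layer was visualized in this paper is not stated, so the layer choice is left open.

$$ w_{k}^{c} = \frac{1}{Z}\sum _{i}\sum _{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}, \qquad L^{c} = \mathrm{ReLU}\left ( \sum _{k} w_{k}^{c} A^{k} \right ) $$

Each feature map is weighted by the spatially averaged gradient of the score, and the ReLU keeps only regions that contribute positively to the prediction.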

4.5 Ablation experiment

To investigate the detection performance of the proposed model and verify the effectiveness of each improvement, five groups of ablation experiments were conducted based on YOLOv8n. All experiments used identical parameter settings, and the results are presented in Table 4. In the table, “ODConv” refers to Omni-Dimensional Dynamic Convolution, “MSFF-NeXt” represents Multi-scale Feature Fusion based on ConvNeXt, “CIoU” denotes CIoULoss, and “✓” indicates that the module was enabled in that group. In group 1, no improvement scheme was added to YOLOv8n, and the loss function used was the original IoULoss.

Table 4 Results of ablation experiments

In group 2, the second-layer convolution of YOLOv8n is replaced with ODConv, and CIoULoss is enabled. This modification improves mAP@0.5 by 2.1% and mAP@0.5:0.95 by 1.6%, at the cost of an 8.2 fps decrease in speed. These results demonstrate that integrating ODConv into the backbone network effectively extracts crucial feature information and enhances detection accuracy while maintaining high speed. In group 3, the neck part of YOLOv8n is replaced with MSFF-NeXt, and CIoULoss is enabled. Compared to the original model, mAP@0.5 improves by 2%, mAP@0.5:0.95 by 0.6%, and speed decreases by 10.8 fps. This finding indicates that MSFF-NeXt effectively prevents feature information loss, facilitates feature fusion, and enhances model accuracy. Group 4 enables both ODConv and MSFF-NeXt, resulting in a 2.7% improvement in mAP@0.5 and a speed decrease of approximately 55 fps. This confirms that the combination of ODConv and MSFF-NeXt significantly enhances model performance. However, the results of group 5 reveal that model complexity continues to rise with additional modules, reducing inference speed to 140.9 fps.

4.6 Quantitative comparison experiments with other models

To validate the state-of-the-art capabilities of the model proposed in this paper for real-time detection of solar radio bursts, it has been trained on a solar radio spectrum dataset created using the experimental setup and model architecture described above. Figure 8 illustrates the variation of the detection metrics mAP@0.5 and mAP@0.5:0.95 for the validation set. Both curves depict a gradual increase followed by stabilization of the model’s detection accuracy.

Fig. 8 AP change curve

To assess the detection accuracy of the model, this paper utilizes mAP@0.5 as a metric and compares it with several classical models, including Faster R-CNN, YOLOv3, YOLOv5, YOLOv6, MobileViT-SSDlite, and YOLOv8n. The experiments demonstrate that the enhanced YOLOv8 network model is better suited for solar radio burst detection. Compared with the YOLOv8n benchmark model, the mean value of mAP@0.5 is improved by 3.5%, and the mAP@0.5 of Type II SRBs, Type IIIs SRBs, and Type V SRBs are improved by 3.8%, 0.2%, and 1.9%, respectively. The most substantial improvement is observed for burst Type IV SRBs, with a remarkable increase of up to 12.3%, attributed to the improved model’s proficiency in detecting large burst images. However, the mAP@0.5 for Type III small targets experiences a slight decrease of 0.8%, likely due to modifications in the model that reduce attention to small targets compared to the baseline model. For other mainstream models, the enhanced version presented in this paper surpasses them in terms of mAP@0.5. This provides evidence that the experimental method outlined in this paper substantially enhances the detection performance of the algorithm, making it more suitable for the task of solar radio burst detection.
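As a consistency check on these figures, and assuming the reported mean is the unweighted average over the five burst classes, the per-class changes for Type II, Type IIIs, Type V, Type IV, and Type III combine as follows:

$$ \frac{3.8 + 0.2 + 1.9 + 12.3 - 0.8}{5} = \frac{17.4}{5} = 3.48 \approx 3.5 $$

This agrees with the stated 3.5% improvement in mean mAP@0.5 and shows that the Type IV gain accounts for most of it.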

Table 5 presents a comparison of the detection speeds among the different models. The experiment employs the fps metric to measure the detection speed of the models, assessing their suitability for real-time detection of solar radio bursts. While the fps in this paper may not be the highest, the processing speed is comparable to that of the original YOLOv8n model, capable of detecting approximately 140.9 images per second, showcasing impressive inference speed.

Table 5 Comparison with other models

4.7 Qualitative evaluation

This paper utilizes five groups of solar radio burst images to qualitatively compare the detection results of YOLOv8n and the proposed model. The five groups cover the five types of solar radio bursts, and the results are depicted in Fig. 9. In the Type II, Type III, and Type V groups, both YOLOv8n and the proposed model successfully localize the burst and correctly identify its type, but the proposed model assigns higher confidence scores to its detections. In the Type IIIs group, YOLOv8n produces two predictions, Type III and Type IIIs, for the same burst, whereas the proposed model predicts the burst as Type IIIs with higher confidence. Although Type IIIs is in principle a continuous form of Type III SRBs, it is preferable for the model to make a single accurate judgment for each burst. In the Type IV group, images with multiple bursts are selected to test the model’s performance. YOLOv8n produces redundant frames and incorrectly predicts both the Type IV SRB and the Type II SRB as Type II SRBs, whereas the proposed model correctly predicts both types of bursts. Overall, the proposed model demonstrates better target detection than YOLOv8n, indicating that the improvements in this paper are beneficial for extracting semantic information from solar radio burst images and enhancing network performance.

Fig. 9 Solar radio burst detection results of Type II, III, IIIs, IV, and V

Moreover, this paper selects images with labeling errors to evaluate how well the model has learned the features of solar radio burst images, examining whether it can detect unlabeled bursts and correct mislabeled ones. The experimental results are shown in Fig. 10. Because manual labeling cannot achieve 100% accuracy, the first set of Ground Truth images labels a Type IIIs burst as Type III and omits a Type IIIs burst near the left boundary. In the second set of Ground Truth images, a Type III burst was left unlabeled. The experimental results suggest that the model can effectively correct such omissions and mislabeling, indicating its strong performance in solar radio burst detection. Additionally, it can provide valuable reference annotations for further expanding the solar radio burst spectrogram dataset.

Fig. 10 Correct prediction of the model for incorrect labeling

5 Conclusion

The field of astronomical observation has entered the era of big data, necessitating high-precision and high-rate detection methods. This paper addresses this need by constructing the e-CALLISTO solar radio burst spectrogram dataset, annotated with five types of bursts: Type II, Type III, Type IIIs, Type IV, and Type V. Subsequently, a real-time solar radio burst detection model based on YOLOv8n is proposed, capable of automatically detecting five types of solar radio bursts across numerous spectrograms and precisely locating their positions. In experimental evaluations, the model achieves an accuracy of 82.4%, demonstrating its ability to extract detailed features of the bursts and make accurate predictions. Furthermore, the model employs a fully convolutional architecture with a speed of 140.9 fps, providing a speed advantage over the currently popular DETR architecture and meeting the real-time detection requirements of solar radio bursts, thus significantly aiding solar physics and space weather research.

However, there are limitations in this study. Due to the rarity of Type IV bursts, the amount of real data available is limited. Although this paper introduces some improvements to the model structure aimed at enhancing the detection of Type IV SRBs, the achieved detection accuracy remains relatively low at 73.2%, roughly 9 percentage points below the 82.4% average. Enriching the training samples using generative models in the domain presents a valuable avenue for future research. Moreover, the continuous generation of new data by e-CALLISTO global base stations underscores the importance of automatically capturing valuable information, representing a focal point of ongoing research. Addressing these limitations can further enhance the performance and applicability of the proposed model in solar radio burst detection.