Objective

Indonesia lies at the intersection of the Ring of Fire and the Alpide belt, a position that drives volcanic activity and geothermal heat and accounts for its abundance of hot springs [1]. Bali, an island in Indonesia, is well known for its unique and diverse flora and fauna and for its hot springs. Hot springs in Indonesia are rich reservoirs of microbial life, and various researchers have investigated hot springs to discover novel thermophilic bacteria. However, data on thermophilic bacteria from Indonesian hot springs obtained with culture-based and molecular techniques remain scarce [2, 3], and metagenomic data describing their microbial diversity are unavailable. Microbial profiling by 16S rRNA amplicon sequencing and shotgun metagenomic sequencing provides a comprehensive picture of the hot spring microbial community [4, 5] and can lead to the discovery of novel and rare species, their metabolites, and biocatalysts [6].

Because most thermophiles are difficult to culture, 16S rRNA amplicon-based metagenomic analysis is a well-suited approach for determining the diversity of thermophilic bacteria living in hot springs [7]. It can also guide the targeted isolation of microbes with the potential to produce novel metabolites. Long-read 16S rRNA gene amplicon sequencing using Oxford Nanopore Technologies (ONT) offers advantages over other NGS platforms because it can cover the full-length gene in a single read [8]. Long-read sequencing has transformed microbiome taxonomic classification and profiling, advancing our understanding of microbial life and its potential for groundbreaking discoveries [9]. Thus, 16S rRNA amplicon sequencing data provide a window into the unseen world of microbes and offer valuable insights into the composition, distribution, and potential of microbial communities. The present study explored the microbial diversity of three hot springs in Bali, Indonesia. The data obtained may serve as a benchmark for researchers mapping the microbial diversity of hot springs and provide a comprehensive view of these microbial communities, which is relevant for assessing the health of hot spring ecosystems.

Data description

Sample collection

Sterile thermal bottles were used to collect water samples from three hot springs. Metadata, including water temperature, pH, color, and turbidity, were recorded during sampling. Water samples were collected multiple times during the day in July and September 2023 and were transported to the laboratory on the same day. The samples from each hot spring were then pooled, 100 mL of each pooled sample was filtered through a membrane filter, and the biomass retained on the filter was used for DNA isolation.

Metagenomic DNA extraction

The BioLit Genomic DNA Extraction Mini Kit (SRL, Mumbai, India) was used to isolate DNA from the water samples of the three hot springs. DNA quality and quantity were assessed by 0.8% agarose gel electrophoresis, followed by measurement with a NanoDrop spectrophotometer and a Qubit fluorometer. After quality control (QC), 50 µL of each DNA sample was used for sequencing.
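The spectrophotometric part of this QC step rests on simple arithmetic. The sketch below assumes the conventional conversion factor for double-stranded DNA (an A260 of 1.0 corresponds to about 50 ng/µL at a 1 cm path length) and the common rule of thumb that an A260/A280 ratio near 1.8 indicates relatively pure DNA. The acceptance window, function names, and readings are hypothetical and are not the criteria or values used in this study.

```python
# Sketch of NanoDrop-style DNA QC arithmetic (illustrative only).
# Assumption: conventional dsDNA factor, A260 of 1.0 ~ 50 ng/uL (1 cm path).
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """dsDNA concentration (ng/uL) estimated from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are commonly taken as pure DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Hypothetical acceptance window for A260/A280, not the study's criterion."""
    return low <= purity_ratio(a260, a280) <= high


def total_dna_ng(concentration_ng_per_ul: float, volume_ul: float = 50.0) -> float:
    """Total DNA mass (ng) in a given volume, e.g. the 50 uL used for sequencing."""
    return concentration_ng_per_ul * volume_ul


if __name__ == "__main__":
    a260, a280 = 0.9, 0.5  # hypothetical readings
    conc = dsdna_concentration(a260)
    print(f"{conc:.1f} ng/uL, A260/A280 = {purity_ratio(a260, a280):.2f}")
    print(f"{total_dna_ng(conc):.0f} ng in 50 uL, pure: {passes_purity(a260, a280)}")
```

Fluorometric quantification (Qubit) is typically preferred for the final amount because dyes specific to dsDNA are less affected by RNA or free nucleotides than absorbance is.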

16S amplicon sequencing

The 16S rRNA gene sequencing libraries were prepared using the 16S Rapid Amplicon Barcoding Kit (ONT, Oxford, UK) according to the manufacturer’s instructions. The full-length (1600 bp) 16S rRNA gene was amplified using LongAmp® Taq 2X Master Mix (New England Biolabs, Ipswich, USA) and the barcoded nanopore sequencing primers 27F (5′-AGA GTT TGA TCM TGG CTC AG-3′) and 1492R (5′-GGT TAC CTT GTT ACG ACT T-3′). After quantification of the 16S rRNA gene amplicons, equal amounts of amplicons from each sample were pooled, and the library was processed according to the manufacturer’s instructions. The library was incubated with Library Loading Beads (ONT, Oxford, UK), and the mixture was loaded onto a GridION flow cell (version R9.4, ONT, Oxford, UK). Sequencing was run for 14 h on the GridION nanopore sequencer at PT. Genetika Science Indonesia (https://ptgenetika.com) and was controlled by MinKNOW software version 23.04.5. Basecalling was performed using Guppy version 6.5.7 with the high-accuracy model [10].
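The 27F primer contains the IUPAC degenerate base M (A or C), so it is in effect a mixture of two sequences, while 1492R is written on the reverse strand. The following Python sketch, which is illustrative and not part of the ONT kit protocol, expands degenerate positions using the standard IUPAC nucleotide codes and computes the reverse complement of 1492R, i.e. the sequence expected near the 3′ end of the forward-strand amplicon. The function names are hypothetical.

```python
# Sketch: expand IUPAC-degenerate primers and derive a reverse complement.
from itertools import product

# Standard IUPAC single-letter nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer: str) -> str:
    """Remove the spaces used when primers are written in triplets."""
    return primer.replace(" ", "").upper()


def expand(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    options = [IUPAC[base] for base in clean(primer)]
    return ["".join(combo) for combo in product(*options)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous (A/C/G/T only) DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


PRIMER_27F = "AGA GTT TGA TCM TGG CTC AG"
PRIMER_1492R = "GGT TAC CTT GTT ACG ACT T"

if __name__ == "__main__":
    # 27F: the single M position yields one A variant and one C variant.
    print(expand(PRIMER_27F))
    # 1492R anneals to the template strand, so its reverse complement is
    # what appears at the 3' end of the forward-strand amplicon.
    print(reverse_complement(PRIMER_1492R))
```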

Data processing

The output FASTQ files contained more than 93,000 reads per sample and were subjected to QC using NanoPlot 1.40.0. Quality filtering was performed using NanoFilt 2.8.0, yielding 0.34 GB of data per sample, with an average read length of 1600 bp in all three samples. The average read quality was a Phred score of 30. Filtered reads were classified using the Centrifuge classifier [11], with a Bacteria and Archaea index built from the NCBI 16S RefSeq database [12]. The data are publicly available in the European Nucleotide Archive (ENA) at EMBL-EBI under study accession PRJEB70710 (Table 1) [13]. The project is ongoing, and no other data or analyses from it have been published previously.
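Phred scores relate to the per-base error probability P by Q = -10 log10(P), so an average quality of 30 corresponds to roughly one error per 1,000 bases. The sketch below illustrates length- and quality-based filtering of the kind NanoFilt performs through its minimum-length (`-l`) and minimum-average-quality (`-q`) options; it is not NanoFilt's code. It averages quality in error-probability space rather than averaging Phred values directly, which we assume matches the convention of nanopore QC tools. The length window, quality threshold, and function names are hypothetical, and Phred+33 encoding with four-line FASTQ records is assumed.

```python
# Sketch of length/quality filtering for nanopore 16S reads (illustrative only).
# Assumptions: Phred+33 qualities, four-line FASTQ records, hypothetical thresholds.
import io
import math
from typing import Iterator, TextIO

Record = tuple[str, str, str]  # (header, sequence, quality string)


def phred_to_error(q: float) -> float:
    """Q = -10*log10(P)  =>  P = 10**(-Q/10); Q30 gives P = 0.001."""
    return 10 ** (-q / 10)


def mean_quality(qual: str) -> float:
    """Average error probability over the read, converted back to a Phred score."""
    errors = [phred_to_error(ord(c) - 33) for c in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file with one sequence line per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def filter_reads(handle: TextIO, min_len: int = 1400, max_len: int = 1800,
                 min_q: float = 10.0) -> Iterator[Record]:
    """Keep reads inside a length window whose mean quality reaches min_q."""
    for header, seq, qual in read_fastq(handle):
        if min_len <= len(seq) <= max_len and mean_quality(qual) >= min_q:
            yield header, seq, qual


if __name__ == "__main__":
    demo = io.StringIO(
        "@read1\n" + "A" * 1500 + "\n+\n" + "I" * 1500 + "\n"
        + "@read2\nACGT\n+\nIIII\n"
    )
    print([header for header, _, _ in filter_reads(demo)])
```

Averaging in probability space penalizes low-quality stretches: a two-base read with Phred scores 40 and 10 has an arithmetic mean of 25 but a probability-based mean of about 13.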

Table 1 Overview of the 16S rRNA sequencing data files and datasets

Limitations

First, although 16S rRNA sequencing assigned taxonomy to the hot spring microbiome, it cannot provide a functional analysis of the microbes. In addition, 16S rRNA sequencing does not detect fungi, viruses, or other organisms outside the Bacteria and Archaea.

Second, our sampling strategy involved collecting water samples multiple times throughout a single day in July and September 2023, and DNA was isolated from pooled samples to capture a broader range of species. However, this approach provides only a snapshot and may not reflect seasonal variation in the hot springs’ microbial communities. A more comprehensive understanding of microbial dynamics would require collecting water samples from all hot springs throughout the year.