Databases play an important role in enterprises, and the Huawei GaussDB database is one of the “main forces” in the Kunpeng ecology.

Databases can be divided into relational and non-relational databases. Relational databases include OLTP databases for enterprise production and transactions and OLAP databases for enterprise analysis. For OLTP scenarios, Huawei has launched the cloud databases GaussDB (for MySQL) and GaussDB (openGauss); for OLAP scenarios, it has launched the data warehouse service GaussDB (DWS). For non-relational (NoSQL) databases, Huawei currently offers GaussDB (for Mongo) and GaussDB (for Cassandra).

Database technology innovation is breaking the existing order, and cloud-based, distributed, and multi-mode processing are the main trends in the future. This chapter focuses on the features and application scenarios of Huawei GaussDB (for MySQL) cloud database, and introduces some application cases.

After learning this chapter, readers will master the following contents.

  1. (1)

    The features of GaussDB database.

  2. (2)

    Knowledge of Huawei relational database.

  3. (3)

    Knowledge of Huawei NoSQL.

8.1 GaussDB Database Overview

8.1.1 GaussDB Database Family

Building everything from scratch and growing from weak to strong takes years of accumulated time and experience. After a decade of effort, Huawei officially released the GaussDB series of database products on May 15, 2019.

To pay tribute to the German mathematician Gauss, Huawei named its self-developed databases GaussDB. The Kunpeng ecology develops in three technology directions: chip/media, operating system, and database. Among these, the Huawei GaussDB database is one of the “main forces” in the Kunpeng ecology.

Databases are generally divided into relational and non-relational databases. Non-relational databases include specialized document databases, graph databases and so on. They target narrower, more specific scenarios, so their application areas are more limited and their market share is smaller (< 20%). Over the next 5 years, the database market will remain focused on relational databases, which hold more than 80% of the market. In terms of the services they support, mainstream databases fall into two categories, OLTP and OLAP. Huawei targets both and has launched the transactional database GaussDB (for MySQL) for OLTP scenarios and the analytical database GaussDB (DWS) for OLAP scenarios. In addition, the Huawei GaussDB database features two important innovations.

  1. (1)

    It is the industry's first AI-Native distributed database, which integrates AI into the database kernel and makes the database intelligent enough to achieve self-O&M, self-management, self-tuning, fault self-diagnosis and self-healing. Based on optimization theory, it introduces the first self-tuning algorithm built on deep reinforcement learning for transactional, analytical and mixed workloads.

  2. (2)

    It supports a heterogeneous computing architecture and can take full advantage of multiple kinds of computing power, such as X86, GPU and NPU, making the database more efficient by releasing diverse computing power. It is also the industry's first ARM-enabled enterprise-class database.

Figure 8.1 shows GaussDB upgraded to a full-scene service that relies on Huawei Cloud and Huawei Cloud Stack. Huawei has seven research institutes around the world engaged in basic database research, more than 10 years of technical accumulation in the database field, more than 1000 database specialists, and more than 30,000 database applications worldwide. After the upgrade to Huawei's self-developed database brand, the business covers both relational and non-relational database services. The upgraded business relies on Huawei Cloud and Huawei Cloud Stack to serve users continuously through cloud services, aiming to improve delivery and O&M efficiency, help users focus on core business innovation, and introduce innovative technologies and new services faster. It also offers rich ecological options: in addition to building the Huawei ecology, GaussDB is compatible with widely used open ecologies such as MySQL, which facilitates application migration and development and protects the continuity of users' investment and business.

Fig. 8.1 GaussDB full-scene services

8.1.2 Typical OLTP and OLAP Databases

OLTP refers to online transaction processing. As the main application of traditional relational databases, OLTP supports the basic daily transaction processing and business activities of enterprises by storing and querying operational data. Typical OLTP systems include e-commerce, banking and securities trading systems; the business database of eBay in the US is a very typical OLTP database. OLAP refers to online analytical processing, also known as a decision support system (DSS), and is often associated with the data warehouse. As the main application of data warehouse systems, OLAP supports complex analytical operations over stored historical data, focuses on decision support, and provides intuitive, easy-to-understand query results.

GaussDB (for MySQL) is recommended for systems with high transactional requirements, such as business systems, financial systems, sales systems and customer service systems. If large amounts of business data need to be stored in a data warehouse for subsequent data analysis, data mining and business decision support, GaussDB (DWS) is recommended, as shown in Fig. 8.2.

Fig. 8.2 Typical OLTP and OLAP databases
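
The difference between the two workload types can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a relational engine; the `orders` table, its columns and the function names are hypothetical and are not part of any GaussDB product.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount, status) VALUES (?, ?, ?)",
        [
            ("north", 120.0, "new"),
            ("north", 80.0, "paid"),
            ("south", 200.0, "paid"),
            ("south", 50.0, "new"),
        ],
    )
    conn.commit()
    return conn


def pay_order(conn: sqlite3.Connection, order_id: int) -> None:
    """OLTP-style work: a short transaction that touches one row by key."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,)
        )


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    """OLAP-style work: scan and aggregate many rows for decision support."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE status = 'paid' GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())
```

In a real deployment the first kind of statement would typically run against a transactional database and the second against a data warehouse loaded with historical data.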

8.2 Relational Database Products and Related Tools

8.2.1 GaussDB (for MySQL)

The cloud database GaussDB (for MySQL) is Huawei's next-generation self-developed distributed database. It is highly scalable, supports massive storage, and is fully compatible with MySQL. Built on Huawei's next-generation DFV storage, it adopts an architecture that separates computing from storage and offers up to 128TB of storage space, so there is no need to shard databases and tables, and it can achieve zero data loss. It combines the high availability of commercial databases with the low cost of open source databases.

GaussDB (for MySQL) employs a multi-node cluster architecture with one write node (master node) and multiple read nodes (read-only nodes), all of which share the underlying DFV storage. In general, a GaussDB (for MySQL) cluster should be located at the same location as the elastic cloud server instances that access it to achieve the highest access performance.

  • Has high user value.

  • With 128TB of storage space and no need to shard databases and tables, it solves the problem of huge data volumes.

  • Easy to use: it is fully compatible with MySQL, so no application modification is required.

  • With up to 15 read-only nodes and read/write separation, it solves the performance scaling problem.

  • With cross-AZ deployment and off-site disaster recovery, it achieves high reliability.

An AZ (availability zone) is a collection of one or more physical data centers; within an AZ, resources such as compute, network and storage are logically divided into multiple clusters. An AZ is a physical area with independent power and network within a geographic region. AZs in the same region communicate with each other over the intranet but are physically isolated. Each AZ is unaffected by failures in other AZs and provides low-cost, low-latency network connectivity to other AZs in the same region. Running GaussDB (for MySQL) in separate AZs protects users' applications from single-location failures. There is no substantial difference between different AZs in the same region.
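
The read/write separation mentioned above can be sketched as follows. This is an illustrative router only, assuming that statements are classified by their first keyword and that reads are spread round-robin; the class and node names are hypothetical, and a real proxy would also have to handle transactions and statements such as `SELECT ... FOR UPDATE`.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the single write node and spread reads over read nodes."""

    MAX_READ_NODES = 15  # upper limit stated in the text

    def __init__(self, primary: str, replicas: list):
        if len(replicas) > self.MAX_READ_NODES:
            raise ValueError("at most 15 read-only nodes are supported")
        self.primary = primary
        # With no read-only node, reads fall back to the write node.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the node that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Naive classification: only plain reads go to read-only nodes.
        if verb in {"SELECT", "SHOW"}:
            return next(self._readers)
        return self.primary
```

For example, with one write node and two read-only nodes, consecutive `SELECT` statements alternate between the two read-only nodes while every `INSERT` or `UPDATE` goes to the write node.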

There are many needs and pain points in the database market today, as shown in Table 8.1.

Table 8.1 Needs and pain points in the database market

The core benefits of GaussDB (for MySQL) are shown in Table 8.2.

Table 8.2 Core benefits

In the context of the cloud era, enterprise IT business is deployed across regions and globally, and IT application software is gradually cloud-based and distributed, so the database is also required to be designed based on cloud scenario architecture and have the ability to be deployed across regions in a distributed manner. Huawei Cloud Native distributed database is precisely such a new type of database, following the five design principles below.

  1. (1)

    Decoupling: separation of computation and storage; master-slave decoupling.

  2. (2)

    Near-data computing push-down: I/O-intensive work, such as redo processing and page reconstruction, is pushed down to the storage nodes.

  3. (3)

    Full use of cloud storage: independent fault tolerance and self-healing service on the storage layer; shared access (write once and read many).

  4. (4)

    Full play to the advantages of solid state disk (SSD): avoiding write amplification caused by random writing; less wear and shorter time delay; full use of the random read performance of SSD.

  5. (5)

    Transfer of performance bottleneck from computing and storage to network: less network traffic; new network technologies and hardware, such as remote direct memory access (RDMA).

When designing the Cloud Native database, Huawei took the need for flexibility into account, including master/standby switchover and the addition of nodes, and therefore pushed more operations down to the storage layer. Huawei Cloud Native benefited from a strong hardware team and deep cooperation with Huawei's storage department, which provided a special platform for pushing the database's own operations down to the storage nodes. Beyond multi-tenancy considerations, Huawei Cloud Native exploits the properties of SSDs as fully as possible to improve database performance. It also uses new network technologies and AI technology to help users improve data center throughput, improve the scalability of network applications, and implement auto-tuning.
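
The idea of pushing redo processing and page reconstruction down to the storage node can be sketched in a few lines. This is a toy model under stated assumptions, not the DFV implementation: the `RedoRecord` and `StorageNode` types, the field names and the fixed page size are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RedoRecord:
    lsn: int       # log sequence number, assumed to increase over time
    page_id: int   # page the change applies to
    offset: int    # byte offset inside the page
    data: bytes    # new bytes to place at that offset


@dataclass
class StorageNode:
    page_size: int = 16
    log: list = field(default_factory=list)  # append-only redo log

    def append(self, rec: RedoRecord) -> None:
        """The compute node ships only the log record, never the page."""
        self.log.append(rec)

    def read_page(self, page_id: int, as_of_lsn: int) -> bytes:
        """Rebuild a page locally by replaying its redo up to as_of_lsn."""
        page = bytearray(self.page_size)
        for rec in sorted(self.log, key=lambda r: r.lsn):
            if rec.page_id == page_id and rec.lsn <= as_of_lsn:
                page[rec.offset:rec.offset + len(rec.data)] = rec.data
        return bytes(page)
```

Because the page is materialized where the log is stored, only small log records cross the network, which is the motivation for the push-down principle listed above.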

In fact, Huawei divides the database into three parts: SQL layer, abstraction layer and storage layer. From the physical level, it can be divided into two layers: one is the SQL layer, which adopts a one-master-multi-standby model; the other is the storage abstraction layer, which maintains database services for different tenants, including building pages, log processing, and other related functions, as shown in Fig. 8.3.

Fig. 8.3 Three parts of database

In the SQL layer, which runs as one read-write replica and multiple read-only replicas, client connections are managed and SQL requests are parsed, so that query planning, query execution and transaction management can be isolated. Huawei has also launched HWSQL and made many performance improvements based on it, including the query result cache, the query plan cache and online DDL.

A distinctive feature of the design is that SQL-level replication across multiple nodes reduces frequent page reads. When an update occurs on the master, the SQL replicas also receive the transaction and commit the update list.

There is also a storage abstraction layer (SAL). SAL is a logical layer that isolates the SQL front end, transactions and queries from the storage units. When manipulating database pages, SAL supports access to multiple versions of the same page. SAL can shard all data based on spaceID and pageID, so that storage and memory resources grow proportionally.
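
Sharding by spaceID and pageID can be sketched as a stable mapping from a page to a slice. The text does not specify the mapping function, so the CRC32 hash below is an assumption chosen for illustration, and the function names are hypothetical.

```python
import zlib


def slice_for_page(space_id: int, page_id: int, num_slices: int) -> int:
    """Map a page to a slice using a stable hash of (space_id, page_id)."""
    key = f"{space_id}:{page_id}".encode()
    return zlib.crc32(key) % num_slices


def spread(space_id: int, pages: range, num_slices: int) -> list:
    """Count how many pages of one tablespace land on each slice."""
    counts = [0] * num_slices
    for page_id in pages:
        counts[slice_for_page(space_id, page_id, num_slices)] += 1
    return counts
```

Because the mapping depends only on the two identifiers, any node can locate a page without a central lookup, and adding slices spreads both storage and memory load.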

In terms of performance, GaussDB (for MySQL) takes full advantage of Huawei's own hardware and software. The system container uses Huawei's Hi1822 high-performance chip, so it performs better than a general container; the use of RDMA greatly reduces computational costs; and the Co-Processor processes data with as few resources as possible, reducing the workload of the SQL nodes, as shown in Fig. 8.4.

Fig. 8.4 Features of GaussDB (for MySQL)

The architecture of GaussDB (for MySQL) is shown in Fig. 8.5.

Fig. 8.5 Architecture of GaussDB (for MySQL)

  1. (1)

    Ultimate reliability: zero data loss, flash recovery from failure, and support for cross-AZ high availability.

  2. (2)

    Multi-dimensional expansion: compute nodes can be expanded in both directions. Horizontal expansion: 1 write node and up to 15 read nodes. Vertical expansion: online elastic expansion with on-demand billing.

  3. (3)

    Massive storage: a single instance scales up to 128TB of data, with no need to shard databases and tables, so services can go live on the cloud quickly.

  4. (4)

    Innovative self-development: a Cloud Native distributed database architecture that separates computing from storage on the basis of Huawei's new generation of DFV, ensuring cost-effective scalability; database logic is pushed down to storage to minimize network load and maximize performance.

  5. (5)

    Excellent performance: industry-leading performance of up to 7 times that of native MySQL, with 100% MySQL compatibility.

  6. (6)

    Cutting-edge hardware: an industry-leading hardware combination of V5 CPU + Optane DC SSD + RDMA network for stable and fast data processing.

Kernel optimization of GaussDB (for MySQL) is mainly reflected in the following aspects.

  1. (1)

    Removal of secondary writes.

  2. (2)

    Query Cache/Plan Cache optimization.

  3. (3)

    InnoDB Lock Management optimization.

  4. (4)

    Audit Plugin efficiency optimization.

  5. (5)

    Community bug fixes.

Hardware enhancements are mainly reflected in the following areas.

  1. (1)

    Containerization.

  2. (2)

    Hi1822 offload.

  3. (3)

    Use of NVMe SSD.

  4. (4)

    RDMA.

From an elastic cloud server or another device that can access the GaussDB (for MySQL) database, connect to the GaussDB (for MySQL) database instance with the corresponding client and import the exported SQL files into it.

The CPU and memory specifications of the cluster can be changed according to service needs. The change has succeeded when the status of the cluster changes from “changing specifications” to “normal”. After the specifications of a GaussDB (for MySQL) 8.0 cluster are changed successfully, the system adjusts the values of the following parameters according to the new memory size: “innodb_buffer_pool_size”, “innodb_log_buffer_size”, “max_connections”, “innodb_buffer_pool_instances” and “query_cache_size”.

Users can retrieve the monitoring metrics and alert information generated by the cloud database GaussDB (for MySQL) through the API provided by the cloud monitor.

gaussdb_mysql010_innodb_buf_usage: buffer pool utilization ratio, the ratio of used pages to total pages in the InnoDB cache, with value ranging from 0 to 1.

gaussdb_mysql011_innodb_buf_hit: buffer pool hit rate, the ratio of read hits to read requests, with value ranging from 0% to 100%.

gaussdb_mysql012_innodb_buf_dirty: dirty block rate of buffer pool, the ratio of dirty data to the data in the InnoDB cache, with value ranging from 0 to 1.

gaussdb_mysql013_innodb_reads: InnoDB read throughput, the average number of bytes per second read by InnoDB, with value ≥ 0 bytes/s.

gaussdb_mysql014_innodb_writes: InnoDB write throughput, the average number of bytes per second written by InnoDB, with value ≥ 0 bytes/s.

gaussdb_mysql017_innodb_log_write_req_count: InnoDB log write request frequency, the average number of log write requests per second, with value ≥ 0 counts/s.
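
The three buffer pool ratios can be related to raw counters with a short sketch. The function and its parameter names are hypothetical, and the choice of denominators (used pages over total pages for utilization, dirty pages over used pages for the dirty rate) is an assumption made for illustration rather than the monitoring service's documented formula.

```python
def buffer_pool_metrics(
    total_pages: int,
    free_pages: int,
    dirty_pages: int,
    read_requests: int,
    disk_reads: int,
) -> dict:
    """Derive utilization, hit rate and dirty rate from raw page counters."""
    used_pages = total_pages - free_pages
    hits = read_requests - disk_reads  # reads served from the buffer pool
    return {
        # Share of the buffer pool that currently holds data (0 to 1).
        "buf_usage": used_pages / total_pages if total_pages else 0.0,
        # Share of read requests served without going to disk (0% to 100%).
        "buf_hit_percent": 100.0 * hits / read_requests if read_requests else 0.0,
        # Share of cached data that has not yet been flushed (0 to 1).
        "buf_dirty": dirty_pages / used_pages if used_pages else 0.0,
    }
```

A falling hit rate together with a utilization close to 1 usually suggests that the buffer pool is too small for the working set, which is one reason the parameters above are adjusted when the memory specification changes.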

If the backup policy of the instance is enabled, a full automatic backup is triggered immediately. Binlog backup does not need to be configured by the user; the GaussDB (for MySQL) system performs it automatically every 5 min. Both full backups and binlog backups are stored on the object storage service.

GaussDB (for MySQL) expands horizontally quickly. Unlike the traditional approach of adding read-only replicas, which requires the data to be synchronized to each new replica, GaussDB (for MySQL) uses shared storage, so adding a compute node takes only about 5 min no matter how much data there is.

GaussDB (for MySQL) adopts distributed storage, with storage capacity up to 128TB. The storage is paid on demand, with no need to plan storage capacity in advance, reducing user costs.

GaussDB (for MySQL) delivers faster master-standby switchover: it eliminates binlog replication latency and guarantees the RTO.

GaussDB (for MySQL) also recovers quickly from crashes, because the storage layer continuously applies the logs in an asynchronous and distributed manner.

The distributed storage system customized for the GaussDB (for MySQL) engine greatly improves data backup and recovery performance. Because it writes in append-only mode rather than in place, it naturally stores data at multiple points in time and in multiple copies, which provides powerful snapshot processing: snapshots are generated within seconds, and a massive number of snapshots is supported. Based on this multi-point-in-time characteristic of the underlying storage system, data can be rolled back quickly to any point in time without replaying incremental logs. Backup and recovery run in parallel at high speed, and the backup and recovery logic is pushed down to each storage node, so that each node reads its local data and interacts directly with the third-party storage system, achieving high concurrency and high performance. Through asynchronous data replication plus on-demand real-time data loading, the fast instance recovery function makes a GaussDB (for MySQL) instance fully functional within a few minutes.
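
Why append-only storage makes point-in-time reads cheap can be shown with a small sketch. The `AppendOnlyStore` class is a hypothetical toy, assuming that every write carries an increasing timestamp and that old versions are never overwritten; it does not describe the actual DFV data layout.

```python
import bisect


class AppendOnlyStore:
    """Keep every version of a value so any past state can be read directly."""

    def __init__(self):
        self._times = {}   # key -> ascending list of write timestamps
        self._values = {}  # key -> values aligned with the timestamps

    def write(self, key: str, ts: int, value: str) -> None:
        """Append a new version; earlier versions are left untouched."""
        times = self._times.setdefault(key, [])
        if times and ts <= times[-1]:
            raise ValueError("timestamps must increase for each key")
        times.append(ts)
        self._values.setdefault(key, []).append(value)

    def read_as_of(self, key: str, ts: int):
        """Return the newest version written at or before ts, else None."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._values[key][i - 1] if i else None
```

A rollback to a point in time then amounts to reading with an older timestamp, with no log replay, which mirrors the behavior described in the paragraph above.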

GaussDB (for MySQL) is more cost effective than traditional RDS for MySQL because it uses shared DFV storage and keeps only one copy of the data. Adding a read-only node requires only one more compute node, with no need to purchase additional storage; the more read-only nodes there are, the more storage cost is saved. With the Active-Active architecture, there is no longer an idle standby database as in traditional RDS for MySQL: all read-only nodes are active and bear read traffic, which raises resource utilization. With the log-as-data architecture, pages no longer need to be flushed as in traditional RDS for MySQL: all update operations only record logs and secondary writes are removed, which reduces the consumption of precious network bandwidth.

The instance specifications for GaussDB (for MySQL) are shown in Table 8.3.

Table 8.3 Instance specifications for GaussDB (for MySQL)

The financial industry is currently asset-light, and rapid expansion is the driver for its use of cloud databases. However, the whole industry is experiencing the pain point of unpredictable user traffic and generated data, and the user experience is affected at the peak of business, and even the service must be stopped for expansion.

Compute nodes of GaussDB (for MySQL) support expansion in both directions. Vertically, the specifications of a single node can be changed on the basis of cloud virtualization; horizontally, a cluster supports 1 write node and 15 read nodes, with an expansion ratio of 0.9. Storage pooling is also supported, with a maximum of 128TB of storage space, and expanding the compute nodes does not increase storage costs.
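
The text does not define the expansion ratio, so the following sketch rests on one plausible reading: the ratio is the achieved throughput of an N-node cluster divided by N times the throughput of a single node. The function name and the sample figures are hypothetical.

```python
def scaled_throughput(single_node_qps: float, nodes: int, ratio: float) -> float:
    """Estimate cluster throughput from a single-node figure.

    Assumes ratio = cluster_qps / (nodes * single_node_qps) for nodes > 1.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if nodes == 1:
        return float(single_node_qps)
    return single_node_qps * nodes * ratio


# Under this reading, 16 nodes (1 write + 15 read) at a ratio of 0.9
# deliver about 14.4 times the throughput of a single node.
speedup = scaled_throughput(1.0, 16, 0.9)
```

Under a different definition of the ratio, for example efficiency per added node, the estimate would change, so the figure should be treated as indicative only.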

When SaaS applications enter the enterprise-level market, large Internet companies and traditional large enterprises face pain points such as huge business volumes, high throughput and problems that open source databases cannot solve, so they have to adopt complicated solutions such as sharding databases and tables. Enterprise users generally prefer commercial databases (e.g., SQL Server and Oracle), whose licenses are expensive.

GaussDB (for MySQL) adopts storage pooling and optimizes native MySQL. It also has hardware advantages, such as RDMA, V5 CPU and Optane, and in terms of architecture it pushes database logic down to storage to release computing power and reduce network overhead.

8.2.2 GaussDB (openGauss)

GaussDB (openGauss) is Huawei's next-generation enterprise-class distributed database, fully self-developed on the basis of Huawei's own technology accumulation. It supports both centralized and distributed deployment; on the basis of supporting traditional business, it provides unlimited possibilities for enterprises to face the challenges of the 5G era.

GaussDB (openGauss) database has advantages as follows.

  1. (1)

    High performance: high throughput and strongly consistent transactions. On two-socket Kunpeng servers, a 32-node cluster delivers 12 million tpmC with distributed strong consistency.

  2. (2)

    High availability: active-active and two-site, three-center deployment. Within the cluster, there is no data loss and service interruption lasts only seconds; intra-city cross-AZ disaster recovery gives no data loss and minute-level recovery; two-site, three-center deployment is supported.

  3. (3)

    High scalability: horizontal expansion of capacity and performance on demand. 256-node scalability, and excellent linearity ratio; online capacity expansion.

  4. (4)

    Easy management: easy migration, easy monitoring, and easy O&M. Compatibility with SQL2003 standard syntax + enterprise expansion package; support for data replication, monitoring O&M, and tool development.

The full open source kernel of openGauss centralized version is the result of Huawei's ten-year effort in database field, which has gone through the process from internal self-use incubation stage to joint-creating productization stage, and then to the open source stage of openGauss centralized version. The development process and role of openGauss are shown in Table 8.4.

Table 8.4 Development process and role of openGauss

As an open source relational database management system, openGauss deeply integrates Huawei's years of experience in the database field. Huawei hopes that the appeal of open source will attract more contributors to jointly build an enterprise-class open source database community that integrates diverse technical architectures. The openGauss kernel has evolved over a long period and is now being given back to the community. The GaussDB database services used inside Huawei and on the public cloud are developed on openGauss, so the kernel will continue to evolve for a long time.

The openGauss kernel is derived from PostgreSQL and focuses on continuously enhancing competitiveness in the direction of architecture, transaction, storage engine, optimizer, etc. It is deeply optimized on ARM architecture chips and compatible with X86 architecture to achieve the following technical features.

  1. (1)

    Concurrency control technology based on multi-core architecture, a NUMA-Aware storage engine, and SQL-Bypass intelligent routing execution technology, which release the multi-core scalability of the processor and achieve 1.5 million tpmC on a two-socket, 128-core Kunpeng server.

  2. (2)

    Support for fast failover with RTO <10s, and full-link data protection, to meet security and reliability requirements.

  3. (3)

    Simplification of O&M through intelligent parameter tuning, slow SQL diagnosis, multi-dimensional performance self-monitoring, online SQL time prediction, etc.

openGauss adopts Mulan PSL v2, which allows all community participants to freely modify, use and reference the code. The openGauss community has also set up a technical committee and welcomes all developers to contribute code and documents.

Huawei always upholds the overall development strategy of “open hardware, open source software, and enabling partners”, and supports partners to build their own brand of commercial databases based on openGauss, so as to support partners to enhance their commercial competitiveness continuously, as shown in Fig. 8.6.

Fig. 8.6 openGauss

openGauss provides the following support for partners:

  1. (1)

    Training: Builds training certification system, carries out kernel technology salon, and sets up user groups;

  2. (2)

    Support: Delivers community support teams;

  3. (3)

    Developer ecology: Builds a developer ecology jointly; promotes university course development and book publication.

GaussDB database helps Huawei user cloud achieve intelligent business operations. In terms of business requirements and challenges, Huawei user cloud's big data platform centrally stores and manages business-side data with a hybrid architecture of Hadoop + MPP databases. The challenges it faces are as follows:

  1. (1)

    Rapid business development, with annual data growth of more than 30%;

  2. (2)

    Real-time analysis capability required for the data analysis platform to achieve intelligent user experience;

  3. (3)

    Support for independent report development and visual analysis.

To this end, GaussDB database gives the following solutions:

  1. (1)

    On-demand elastic expansion to support rapid business development;

  2. (2)

    SQL on HDFS support for real-time analysis of instant exploration scenarios, Kafka stream data entry at high speed, and real-time report generation;

  3. (3)

    Key technologies such as multi-tenant load management and approximate calculation enabling efficient report development and visual analysis.

These solutions generate the following user benefits:

  1. (1)

    On-demand capacity expansion without business interruption;

  2. (2)

    Real-time analysis results thanks to the new data analysis model, with marketing accuracy rate increased by more than 50%;

  3. (3)

    Response time of typical visual report query and analysis reduced from the past minute level to within 5 s, and report development cycle reduced from the past 2 weeks to 0.5 h.

GaussDB database is suitable for small and medium-sized banks' Internet-based transaction systems, such as mobile apps, websites, etc. It is compatible with the industry's mainstream commercial database ecology, with high performance, security and reliability, etc.

The advantages of GaussDB database are as follows.

  1. (1)

    Security and reliability. It supports SSL encrypted connections and KMS data encryption to ensure data security. It also supports a master-standby database architecture: when the master machine fails, the standby machine is automatically promoted to master to ensure business continuity.

  2. (2)

    Ultra-high performance. With high performance and low latency transaction processing capability, the performance of Sysbench data under typical configuration is 30% to 50% higher than that of open source database.
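
The master-standby behavior described in item (1) can be sketched as a simple promotion rule. This is a toy model with hypothetical names; a real system would also need failure detection, fencing of the old master and data consistency checks before promotion.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str            # "master" or "standby"
    healthy: bool = True


def fail_over(nodes: list) -> Node:
    """Return the serving master, promoting a healthy standby if needed."""
    master = next(n for n in nodes if n.role == "master")
    if master.healthy:
        return master
    for candidate in nodes:
        if candidate.role == "standby" and candidate.healthy:
            master.role = "standby"    # demote the failed master
            candidate.role = "master"  # promote the first healthy standby
            return candidate
    raise RuntimeError("no healthy standby available")
```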

8.2.3 GaussDB (DWS)

Data warehouse service (DWS) is an online data processing database based on public cloud infrastructure and platform, providing out-of-the-box, scalable and fully managed analytical database services. GaussDB (DWS) is a native Huawei Cloud service based on the converged data warehouse GaussDB database. It is compliant with the ANSI SQL 99 and SQL 2003 standards and provides competitive solutions for petabyte-scale big data analysis in various industries.

GaussDB (DWS) can be widely used in finance, Internet of Vehicles, government and enterprises, e-commerce, energy, telecommunications and other fields, and was selected for the “Magic Quadrant” for data management solutions released by Gartner for three consecutive years from 2017 to 2019. It is several times more cost effective than traditional data warehouses, with massive scalability and enterprise-class reliability.

GaussDB (DWS) is distributed and on-demand, with the advantages of a distributed architecture, high reliability from its master-standby/multi-active design, separation of storage and computing, and independent expansion on demand. It is compliant with the SQL 2003 standard and supports ACID transactions to provide strong data consistency guarantees. It supports X86 and ARM servers and is vertically optimized for the Kunpeng chip, which improves performance by 30% compared with the same generation of X86, as shown in Fig. 8.7.

Fig. 8.7 Distributed architecture

GaussDB (DWS) is based on a shared-nothing distributed architecture with an MPP engine, which consists of many logical nodes that each have independent, non-shared CPU, memory, storage and other system resources. In such an architecture, business data is scattered across multiple nodes and data analysis tasks are pushed to the nodes where the data resides for local execution, so that large-scale data is processed in parallel and queries receive a fast response.

The application layer provides data loading tools, ETL (Extract-Transform-Load) tools, BI tools, and data mining and analysis tools, all of which can be integrated with GaussDB (DWS) through standard interfaces. GaussDB (DWS) is compatible with the PostgreSQL ecosystem, and its SQL syntax has been made compatible with MySQL, Oracle and Teradata. Applications can migrate smoothly to GaussDB (DWS) with only a few changes.

The interface layer allows applications to connect to GaussDB (DWS) via standard JDBC 4.0 and ODBC 3.5.

A GaussDB (DWS) cluster (MPP cluster) consists of multiple nodes with the same specifications in the same subnet, which jointly provide services. Each data node (DN) in the cluster is responsible for storing data, with disk as the storage medium. The coordinator node (CN) is responsible for receiving access requests from applications and returning execution results to clients. In addition, the CN decomposes tasks and schedules the task slices to be executed in parallel on the DNs.
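
The division of labor between the CN and the DNs can be sketched as a scatter-gather aggregate. This is a simplified model, assuming hash distribution of rows by key and using threads in one process in place of separate nodes; all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, num_nodes: int) -> int:
    """Pick the data node that stores a row, using a stable hash of its key."""
    return zlib.crc32(key.encode()) % num_nodes


def distribute(rows: list, num_nodes: int) -> list:
    """Scatter (key, amount) rows across the data nodes."""
    shards = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        shards[node_for(key, num_nodes)].append((key, amount))
    return shards


def dn_partial_sum(shard: list) -> tuple:
    """Work done on one data node: aggregate only the local rows."""
    return sum(amount for _, amount in shard), len(shard)


def cn_total_and_avg(shards: list) -> tuple:
    """Work done on the coordinator: run the slices in parallel and merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(dn_partial_sum, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, (total / count if count else 0.0)
```

Each data node returns only a partial sum and a row count, so the coordinator merges a handful of small results instead of moving the underlying rows.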

Automatic data backup supports automatic backup of cluster snapshots to EB-level OBS, which makes it easy to back up the cluster periodically during idle business hours so that data can be recovered after a cluster exception. A snapshot is a complete backup of a GaussDB (DWS) cluster at a certain point in time, recording all configuration data and business data of the specified cluster at that moment.

The tool chain provides the data parallel loading tool GDS (General Data Service), SQL syntax migration tool DSC, and SQL development tool Data Studio, and supports O&M monitoring of the cluster through the console.

The logical architecture of GaussDB (DWS) is shown in Fig. 8.8.

Fig. 8.8 Logical architecture

  • CM: Cluster Manager, which manages and monitors the operation of each functional unit and physical resources in the distributed system to ensure the stable operation of the whole system.

  • GTM: Global Transaction Manager, which provides the information required for global transaction control and uses the multi-version concurrency control (MVCC) mechanism.

  • WLM: Workload Manager, which controls the allocation of system resources and prevents excessive business load from hitting the system, leading to business congestion and system crashes.

  • Coordinator Node: Acts as the business entry and result return of the whole system; receives access requests from business applications; decomposes tasks and schedules parallel execution of task shards.

  • Data Node: The logical entity that executes query task sharding.

  • GDS Loader: Loads data in parallel, and multiple loaders can be configured; supports text file formats with automatic recognition of erroneous data.

Compared with traditional data warehouses, GaussDB (DWS) has the following main features and significant advantages, which address ultra-large-scale data processing and common platform management across multiple industries.

  1. (1)

    Easy use.

    One-stop visualization and convenient management: The GaussDB (DWS) management console can be used to complete O&M tasks such as connecting applications to the data warehouse, data backup, data recovery, and monitoring of data warehouse resources and performance.

    Seamless integration with big data: Standard SQL can be used to query data on HDFS and OBS without moving the data.

    One-click heterogeneous database migration: Migration tools are provided to migrate SQL scripts from MySQL, Oracle and Teradata to GaussDB (DWS).

  2. (2)

    Easy scalability.

    On-demand expansion: With a shared-nothing open architecture, nodes can be added at any time according to business needs to improve the data storage capacity and query analysis performance of the system.

    Linear performance improvement upon expansion: Capacity and performance improve linearly as the cluster expands, with a linear ratio of 0.8.

    Capacity expansion without business interruption: During expansion, data insert, delete, update, and query operations as well as DDL operations (DROP/TRUNCATE/ALTER TABLE) remain available; table-level online expansion keeps the business uninterrupted and makes the expansion imperceptible to users.

  3. (3)

    High performance.

    Cloud-based distributed architecture: GaussDB (DWS) adopts a fully parallel MPP architecture in which business data is distributed across multiple nodes and analysis tasks are pushed down to the nodes where the data resides, so that large-scale data is processed in parallel with fast response.

    High query performance, with second-level response on trillions of records: In the background, GaussDB (DWS) uses multi-threaded parallel execution of algorithms and a vectorized computation engine to execute instructions in parallel in registers, and uses dynamic compilation based on LLVM (Low Level Virtual Machine, a compiler framework) to reduce redundant conditional logic during queries, which improves data query performance. GaussDB (DWS) supports hybrid row-column storage, which provides a better data compression ratio (column storage), better index performance (column storage), and better point update and point query performance (row storage) at the same time.

    Fast data loading: GaussDB (DWS) provides GDS, a high-speed parallel tool for loading large-scale data.

    Data compression under column storage: Inactive historical data can be compressed to reduce its storage footprint and lower procurement and O&M costs. Compression algorithms are selected adaptively according to data characteristics, with an average compression ratio of 7:1. Compressed data can be accessed directly and transparently to the business, which greatly reduces the preparation time for accessing historical data.

  4. (4)

    High reliability.

    ACID: Distributed transactions support the ACID properties, providing a strong data consistency guarantee.

    All-round HA design: All software processes of GaussDB (DWS) have primary/standby protection, as do all logical components of the cluster such as CNs and DNs. If any single point fails physically, the system still keeps data reliable and consistent while continuing to provide services. Hardware-level high reliability includes disk RAID, switch stacking, NIC bonding, and uninterruptible power supply (UPS).

    Security: GaussDB (DWS) supports transparent data encryption and can be integrated with database security services; network isolation and security group rules protect the system and user privacy and ensure data security. GaussDB (DWS) also supports automatic full and incremental data backup to improve data reliability.

  5. (5)

    Low cost.

    Pay-as-you-go: GaussDB (DWS) is billed according to actual usage and duration, so users pay only for the resources actually consumed.

    Low threshold: Users do not need a large fixed investment up front; they can start with a low-specification data warehouse instance and then adjust resources flexibly at any time according to business conditions, spending only as needed.
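Two of the figures quoted in the list above, the linear ratio of 0.8 and the average compression ratio of 7:1, lend themselves to a quick back-of-the-envelope calculation. The sketch below is only an illustration: the text does not define how the linear ratio is measured, so the formula in `estimated_speedup` (each unit of added cluster size yields 0.8 of the ideal gain) is an assumption, and both function names and the sample numbers are hypothetical.

```python
def estimated_speedup(old_nodes, new_nodes, linear_ratio=0.8):
    """Assumed model: added capacity delivers linear_ratio of the ideal linear gain."""
    growth = new_nodes / old_nodes
    return 1 + linear_ratio * (growth - 1)


def compressed_size_tb(raw_tb, compression_ratio=7.0):
    """Size of column-store data after compression at an average ratio of 7:1."""
    return raw_tb / compression_ratio


# Doubling a hypothetical 4-node cluster to 8 nodes.
speedup = estimated_speedup(old_nodes=4, new_nodes=8)

# Compressing a hypothetical 70 TB of inactive historical data.
cold_data_tb = compressed_size_tb(70)

print(f"{speedup:.1f}x, {cold_data_tb:.0f} TB")
```

Under these assumptions, doubling the cluster gives roughly 1.8 times the original performance rather than the ideal 2 times, and 70 TB of historical data shrinks to about 10 TB. Actual results depend on the workload and the data characteristics.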

8.2.4 Data Studio

Data Studio is a graphical integrated development environment that helps database developers carry out database development quickly.

Data Studio provides various database development and debugging functions, including the following.

  1. (1)

    Creating and managing database objects (databases, schemas, tables, views, indexes, functions, stored procedures, etc.).

  2. (2)

    Performing database DML, DDL, and DCL operations.

  3. (3)

    Creating, running, and debugging PL/SQL procedures.
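To make the difference between DDL and DML statements concrete, the following minimal sketch issues both kinds of statement from a script. It uses Python's built-in `sqlite3` module purely as a stand-in so that the example runs anywhere; it does not connect to GaussDB (DWS), the `sales` table is hypothetical, and DCL statements such as GRANT and REVOKE are left out because SQLite does not support them.

```python
import sqlite3

# In-memory database used only as a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define a database object.
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# DML: insert and update rows.
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 7.5)],
)
cur.execute("UPDATE sales SET amount = amount * 2 WHERE region = ?", ("west",))
conn.commit()

# Query: aggregate the amounts per region.
rows = cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```

In a graphical tool such as Data Studio the same statements would be typed into an SQL editor window instead of being embedded in a script.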

Data warehouse migration is a Data Studio application scenario, as shown in Fig. 8.9.

Fig. 8.9 Data warehouse migration

Smooth migration: GaussDB (DWS) provides migration tools that support smooth migration from common data analysis systems such as Teradata, Oracle, MySQL, SQL Server, PostgreSQL, Greenplum, and Impala.

Compatibility with traditional data warehouses: GaussDB (DWS) supports the SQL:2003 standard, is compatible with some Oracle syntax and data structures, supports stored procedures, and connects seamlessly with common BI tools, so that business migration requires minimal modification.

Security and reliability: GaussDB (DWS) supports data encryption and can be integrated with database security services to ensure data security on the cloud.

Big data fusion analysis is also an application scenario for Data Studio, as shown in Fig. 8.10.

Fig. 8.10 Big data fusion analysis

Unified analysis portal: The SQL of GaussDB (DWS) serves as the unified entry for upper-layer applications, so application developers can access all data using familiar SQL.

Real-time interactive analysis: For immediate analysis needs, analysts can get information from the big data platform in real time.

Flexible adjustment: Adding nodes expands the system's data storage capacity and query analysis performance, supporting petabyte-scale data storage and computing.

Data Studio application scenarios also include enhanced ETL and real-time BI analysis, as shown in Fig. 8.11.

Fig. 8.11 Enhanced ETL and real-time BI analysis

Data migration: Multiple data sources are supported, with efficient batch and real-time data import.

High performance: Petabyte-scale data is stored at low cost, and correlation analysis over trillions of records returns within seconds.

Real time: Real-time integration of business data streams helps users optimize and adjust business decisions in a timely manner.

The application scenarios of Data Studio also include real-time data analysis, as shown in Fig. 8.12.

Fig. 8.12 Real-time data analysis

Real-time streaming data ingestion: Data from IoT, the Internet, and other sources can be written to GaussDB (DWS) in real time after being processed by stream computing and AI services.

Real-time monitoring and prediction: Data is analyzed to monitor equipment and predict behavior for control and optimization.

AI fusion analysis: The results that AI services produce from data such as images and text can be correlated with other business data in GaussDB (DWS) to achieve fused data analysis.

8.3 NoSQL Databases

8.3.1 GaussDB (for Mongo)

NoSQL, also read as “Not Only SQL”, refers to non-relational databases, as distinct from traditional relational databases.

There are many significant differences between NoSQL and relational databases. For example, NoSQL databases typically do not guarantee the ACID properties of relational databases, do not use SQL as the query language, can store data without a fixed table schema, and often avoid SQL JOIN operations. NoSQL databases feature easy scalability and high performance.
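The points about schema-free storage and the avoidance of JOIN operations can be illustrated with plain Python dictionaries standing in for documents. This is a sketch of the general document model only, not of any GaussDB API; the `vehicles` collection, its fields, and the helper functions `find` and `total_km` are hypothetical.

```python
# Two documents in the same hypothetical "vehicles" collection; there is no fixed schema.
vehicles = [
    {
        "_id": "v1",
        "model": "sedan",
        "owner": {"name": "Li", "city": "Hefei"},  # embedded instead of joined
        "trips": [{"km": 12.5}, {"km": 30.0}],
    },
    {
        "_id": "v2",
        "model": "truck",
        "payload_kg": 8000,  # a field that v1 does not have
        "trips": [],
    },
]


def find(collection, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]


def total_km(doc):
    """Read nested data directly from the document; no JOIN is needed."""
    return sum(trip["km"] for trip in doc.get("trips", []))


sedans = find(vehicles, model="sedan")
print(sedans[0]["owner"]["city"], total_km(sedans[0]))
```

In a relational design the owner and the trips would normally live in separate tables and be combined with JOINs; embedding them in the document trades that flexibility for simpler reads of a single record.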

GaussDB NoSQL is Huawei's self-developed distributed multi-model NoSQL database service with a computing-storage separation architecture. It covers four mainstream NoSQL database services: GaussDB (for Mongo), GaussDB (for Cassandra), GaussDB (for Redis), and GaussDB (for Influx), as shown in Fig. 8.13.

Fig. 8.13 GaussDB NoSQL

GaussDB NoSQL supports highly available clusters deployed across three AZs. Compared with the community versions, it offers minute-level computing capacity expansion, second-level storage capacity expansion, strong data consistency, ultra-low latency, and high-speed backup and recovery. It is cost-effective and suitable for IoT, meteorology, Internet, gaming, and other fields.

The cloud database GaussDB (for Mongo) is a cloud-native NoSQL database compatible with the MongoDB ecosystem. It features enterprise-class performance, flexibility, high reliability, visual management, etc.

GaussDB (for Mongo), which supports computing-storage separation, extreme availability, and massive storage, offers the following main benefits.

  1. (1)

    Separation of storage and computing: The storage layer adopts DFV high-performance distributed storage, and the computing and storage resources are expanded independently on demand.

  2. (2)

    Extreme availability: It supports distributed deployment with 3–12 nodes, tolerates the failure of up to n-1 nodes, and stores three copies of data to ensure data security.

  3. (3)

    Massive storage: It allows up to 96 TB of storage space.

  4. (4)

    Autonomy and controllability: It supports Kunpeng architecture.

  5. (5)

    Compatibility: It is compatible with the MongoDB protocol for a consistent development experience.

The computing-storage separation architecture of GaussDB (for Mongo) allows computing and storage to be expanded separately on demand, which effectively reduces costs. Because the architecture is based on shared storage, rebalancing does not migrate data. Disaster recovery across three AZs is supported.
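Why rebalancing on shared storage does not need to migrate data can be seen in a toy model. The sketch below is an illustration under stated assumptions, not a description of the actual GaussDB (for Mongo) implementation: the shard names and sizes, the node names, and the round-robin `assign` function are hypothetical. The point is only that when shard data lives in shared storage, adding a compute node changes the ownership map while the data stays in place.

```python
def assign(shards, nodes):
    """Round-robin ownership map: shard id -> compute node name."""
    return {shard: nodes[i % len(nodes)] for i, shard in enumerate(sorted(shards))}


# Shard data lives in shared storage (hypothetical sizes in GB), not on compute nodes.
shared_storage = {"s0": 120, "s1": 95, "s2": 110, "s3": 130, "s4": 80, "s5": 105}

before = assign(shared_storage, ["node-a", "node-b"])
after = assign(shared_storage, ["node-a", "node-b", "node-c"])  # scale out compute

# Shards whose owning compute node changed after the scale-out.
moved = [s for s in sorted(shared_storage) if before[s] != after[s]]

# In a shared-nothing design the moved shards would have to be copied between nodes.
copy_cost_shared_nothing_gb = sum(shared_storage[s] for s in moved)

# With shared storage only the ownership map changes; no shard data is copied.
copy_cost_shared_storage_gb = 0

print(moved, copy_cost_shared_nothing_gb, copy_cost_shared_storage_gb)
```

In this toy example four of the six shards change owner. A design that keeps data on the compute nodes would have to copy those shards across the network, whereas the shared-storage variant only updates metadata.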

GaussDB (for Mongo) offloads replica sets to distributed storage, which reduces the number of storage copies, and every ShardServer can handle business requests. The distributed storage is based on sharded replication, which better aggregates I/O performance and improves fault reconstruction performance. The RocksDB storage engine provides good write performance, and a local SSD read cache optimizes read performance. Snapshot-based physical backup avoids exporting data through logical backups, which gives better performance and a clear backup point in time. Performance is continuously optimized in areas including infrastructure, thread pools, and storage RDMA. The cluster size is scaled up and down automatically according to the business load, reducing user costs by more than 50%. Instantaneous recovery, incremental backup, table-level backup, and recovery to an arbitrary point in time are supported.

User case: JAC's Internet of Vehicles scenario. The service handles nearly one million concurrent queries per second with timely responses and stable business operation; at the same concurrency and the same cost, performance is improved by three times compared with a self-built solution on ECS or an open-source service solution.

8.3.2 GaussDB (for Cassandra)

GaussDB (for Cassandra) is compatible with Apache Cassandra, a massively scalable open-source NoSQL database suitable for managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and clouds. Cassandra is continuously available, linearly scalable, and simple to operate across many commodity servers, with no single point of failure. Its powerful dynamic data model allows for flexibility and rapid response. GaussDB (for Cassandra) offers the following benefits:

  1. (1)

    Cluster stability: no full garbage collection (full GC) problem.

  2. (2)

    Computing-storage separation: minute-level node capacity expansion; second-level storage capacity expansion.

  3. (3)

    Active-Active: distributed architecture; n-1 node failure tolerance.

  4. (4)

    High performance: performance several times higher than the community version.

  5. (5)

    Massive data: up to 100 TB of data in a single instance.

  6. (6)

    High reliability: minute-level backup recovery; strong data consistency.

GaussDB (for Cassandra) supports elastic expansion, high read/write throughput, high availability, fault tolerance, strong consistency, the Cassandra Query Language (CQL), and computing-storage separation, without the full GC problem. Its benefits are shown in Table 8.5.

Table 8.5 Benefits of GaussDB (for Cassandra) database

Figure 8.14 shows GaussDB (for Cassandra) use cases in the industrial manufacturing and meteorology industries. Large-scale cluster deployment suits the massive data storage scenarios of these industries. The fully peer-to-peer (P2P) architecture based on consistent hashing ensures high business availability and easy node scalability; it supports 7 × 24 real-time writing of data from multi-sensor terminals, with minute-level expansion to cope easily with operational or project peaks.
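Consistent hashing, mentioned above as the basis of the P2P architecture, can be sketched in a few lines. The code below is a generic textbook-style hash ring with virtual nodes, not the implementation used by Cassandra or GaussDB (for Cassandra); the class `HashRing`, the choice of MD5, the number of virtual nodes, and the node and key names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring using MD5 (any stable hash would do)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual nodes per physical node to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"sensor-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

When a node is added, only the keys that fall into the new node's ring segments change owner, and all of them move to the new node; the remaining keys stay where they were. This property is what makes it practical to add nodes to a running cluster.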

Fig. 8.14 GaussDB (for Cassandra) use cases for industrial manufacturing and meteorology industries

8.4 Summary

This chapter introduces the features of Huawei's databases, including the relational databases GaussDB (for MySQL), GaussDB (openGauss), and GaussDB (DWS), and expounds the product features and business value of the NoSQL databases GaussDB (for Mongo) and GaussDB (for Cassandra).

8.5 Exercises

  1. 1.

    [True or False] GaussDB (for MySQL) supports computing-storage separation. ( )

    1. A.

      True

    2. B.

      False

  2. 2.

    [Multiple Choice] What are the main advantages of GaussDB (for MySQL) database products? ( )

    1. A.

      High reliability

    2. B.

      High scalability

    3. C.

      Ultra high performance

    4. D.

      High compatibility

  3. 3.

    [Single Choice] What is the maximum number of read-only nodes that can be added to a GaussDB (for MySQL) cluster? ( )

    1. A.

      12

    2. B.

      13

    3. C.

      14

    4. D.

      15

  4. 4.

    [Short Answer Question] How does GaussDB (for MySQL) automatically perform failover?

  5. 5.

    [True or False] GaussDB (openGauss) is the world's first fully self-developed enterprise-class OLAP database that supports the Kunpeng hardware architecture. ( )

    1. A.

      True

    2. B.

      False

  6. 6.

    [Multiple Choice] An e-commerce company uses GaussDB (openGauss) database for its business. Which of the following are the advantages of GaussDB (openGauss) database? ( )

    1. A.

      Excellent performance

    2. B.

      High scalability

    3. C.

      Easy management

    4. D.

      Security and reliability

  7. 7.

    [Multiple Choice] GaussDB (openGauss) is based on an innovative database kernel, which supports high-performance transaction processing capabilities in real time. Which of the following are the main features of its high performance? ( )

    1. A.

      Distributed strong consistency

    2. B.

      Support for Kunpeng two-way server

    3. C.

      High throughput and strong consistency transaction capability

    4. D.

      Compatibility with SQL:2003 standard syntax

  8. 8.

    [Single Choice] Which of the following components is responsible for receiving access requests from the application and returning execution results to the client? ( )

    1. A.

      GTM

    2. B.

      WLM

    3. C.

      CN

    4. D.

      DN

  9. 9.

    [Multiple Choice] Which of the following product advantages does GaussDB (DWS) have over traditional data warehouses? ( )

    1. A.

      High performance

    2. B.

      High reliability

    3. C.

      Easy use

    4. D.

      Easy scalability

  10. 10.

    [True or False] GaussDB (DWS) provides a double HA protection mechanism for data nodes to ensure uninterrupted business. ( )

    1. A.

      True

    2. B.

      False