NTU General GBase 8a Enterprise Enhancements, Part 1

Mondo Technology, updated on 2024-02-21

Distributed storage of data

Columnar and hybrid row-column storage

Data managed by NTU General GBase 8a is organized and physically stored on disk by column. For massive-scale data analysis, analytical databases store table data in columns, and a columnar storage architecture has natural advantages for queries, statistics, and analytical operations.

The advantages of columnar storage include the following:

Lower I/O

Only the columns referenced by a query incur disk I/O. Columns that the query does not involve are never read, so they cause no disk I/O.
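
To make the I/O saving concrete, here is a minimal sketch that estimates bytes read by a full scan in a row layout versus a column layout. The schema, column widths, and the functions scan_cost and ScanCost are hypothetical and chosen only for illustration. The sketch ignores compression, caching, and GBase's actual on-disk format.

```python
from dataclasses import dataclass

# Hypothetical table schema: column name -> fixed width in bytes (illustrative assumption).
SCHEMA = {"id": 8, "name": 32, "city": 16, "amount": 8, "created_at": 8}


@dataclass
class ScanCost:
    row_store_bytes: int
    column_store_bytes: int


def scan_cost(num_rows, query_columns):
    """Estimate bytes read from disk by a full scan that needs only query_columns."""
    unknown = set(query_columns) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # A row layout reads every column of every row, needed or not.
    row_width = sum(SCHEMA.values())
    # A column layout reads only the columns the query references.
    referenced_width = sum(SCHEMA[c] for c in set(query_columns))
    return ScanCost(num_rows * row_width, num_rows * referenced_width)


if __name__ == "__main__":
    cost = scan_cost(1_000_000, ["city", "amount"])
    print(cost.row_store_bytes, cost.column_store_bytes)
```

With this hypothetical schema, a query touching only city and amount reads 24 of every 72 bytes per row, one third of what a row layout would read.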

High compression ratio

Compression ratios of 2x to 20x can be achieved.
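
One reason columns compress well is that neighbouring values in a column have the same type and often repeat. The sketch below uses run-length encoding (RLE) purely as an illustration. The source does not say which codecs GBase 8a uses, so RLE, the helper names, and the simplified "slot" ratio (each stored value or run length counts as one slot, not bytes) are all assumptions.

```python
def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def slot_ratio(values):
    """Slots before vs after RLE, counting each stored value or run length as one slot."""
    runs = rle_encode(values)
    return len(values) / (2 * len(runs)) if runs else 1.0


if __name__ == "__main__":
    # Hypothetical order-status column in which each value repeats in runs of 10.
    statuses = [s for s in ("NEW", "PAID", "SHIPPED", "DONE") for _ in range(10)] * 25
    print(len(statuses), len(rle_encode(statuses)), slot_ratio(statuses))
```

Here 1,000 values collapse into 100 runs, a 5x slot ratio. Real ratios depend heavily on data distribution, sort order, and the codec used.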

Hybrid row-column storage support

NTU General GBase 8a MPP Cluster supports hybrid row-column storage. In a columnar cluster architecture, an operation that touches many columns while accessing widely scattered records triggers a large number of discrete (random) I/Os, because each column must be read separately for every record. The hybrid row-column feature improves disk I/O performance by additionally storing a redundant row-format copy of the data.
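
The following sketch counts disk-block reads to show why scattered, many-column lookups hurt a pure column store and how a redundant row copy helps. The block size, the fixed 8-byte value width, and the function names are hypothetical assumptions; the model ignores caching and compression and is not GBase's actual storage layout.

```python
BLOCK_BYTES = 8192  # hypothetical disk block size
VALUE_BYTES = 8     # hypothetical fixed width of every column value


def distinct_blocks(row_ids, rows_per_block):
    """Number of distinct blocks that hold the requested row ids."""
    return len({rid // rows_per_block for rid in row_ids})


def column_store_reads(row_ids, columns_read):
    """Block reads when each requested column is fetched from its own column file."""
    rows_per_block = BLOCK_BYTES // VALUE_BYTES
    return columns_read * distinct_blocks(row_ids, rows_per_block)


def row_copy_reads(row_ids, table_columns):
    """Block reads when whole rows are fetched from a redundant row-format copy."""
    rows_per_block = max(1, BLOCK_BYTES // (VALUE_BYTES * table_columns))
    return distinct_blocks(row_ids, rows_per_block)


if __name__ == "__main__":
    scattered = range(0, 10_000_000, 100_000)  # 100 widely spaced records
    print(column_store_reads(scattered, 50), row_copy_reads(scattered, 50))
    dense = range(10_240)  # sequential scan of one column
    print(column_store_reads(dense, 1), row_copy_reads(dense, 50))
```

For 100 scattered records across 50 columns, the column layout needs 5,000 block reads against 100 for the row copy. For a sequential scan of a single column the comparison flips (10 versus 512), which is why the row copy supplements rather than replaces columnar storage, at the cost of extra disk space.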

Distributed storage

GBase 8a MPP Cluster can process structured data at petabyte scale and beyond. For large tables it can apply either a random data distribution policy or a hash data distribution policy. Users can choose the policy that fits their business scenario to strike the best balance among performance, reliability, and flexibility.

Random data distribution policy

Under the random distribution policy, the database creates a randomly distributed table, and as data is loaded, rows are spread randomly and evenly across all data nodes.
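
A minimal sketch of random placement, assuming each row independently picks a node uniformly at random. The function name and the seeded generator are illustrative assumptions, not GBase's placement code.

```python
import random


def distribute_randomly(rows, num_nodes, seed=None):
    """Assign each row to a uniformly random node; returns {node: [rows]}."""
    rng = random.Random(seed)  # a seeded RNG keeps the sketch reproducible
    placement = {node: [] for node in range(num_nodes)}
    for row in rows:
        placement[rng.randrange(num_nodes)].append(row)
    return placement


if __name__ == "__main__":
    placement = distribute_randomly(range(100_000), num_nodes=4, seed=42)
    print({node: len(rows) for node, rows in placement.items()})
```

Each node receives roughly a quarter of the rows, which balances storage and scan work. Because placement ignores column values, rows sharing a key can land on any node, so joins or aggregations on that key generally require moving data between nodes.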

Hash data distribution policy

Under the hash distribution policy, each row is processed on load according to a specified hash distribution column: the database computes the hash value of that column and places the row in the hash bucket that the value maps to, and each hash bucket corresponds to one data node in the cluster. As a result, the data on each node shares a common property: all rows with the same distribution-column value reside on the same node. At query time the optimizer can exploit this property to optimize the query plan and shorten query time; for example, a join or aggregation keyed on the distribution column can typically run locally on each node without redistributing data.
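
The sketch below models the value, hash bucket, node chain described above and shows why co-located data allows a local join. The bucket count, node count, round-robin bucket-to-node map, and the use of CRC-32 as the hash are all hypothetical assumptions; the source does not describe GBase's actual hash function or bucket mapping.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical number of hash buckets
NUM_NODES = 4     # hypothetical number of data nodes
# Hypothetical static bucket -> node map (round-robin).
BUCKET_TO_NODE = {b: b % NUM_NODES for b in range(NUM_BUCKETS)}


def node_of(value):
    """Hash the distribution-column value to a bucket, then map the bucket to a node."""
    # crc32 is stable across runs, unlike Python's salted hash() for strings.
    bucket = zlib.crc32(str(value).encode("utf-8")) % NUM_BUCKETS
    return BUCKET_TO_NODE[bucket]


def distribute_by_hash(rows, dist_column):
    """Place each row (a dict) on the node chosen by its dist_column value."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_of(row[dist_column])].append(row)
    return placement


def local_join(left, right, key):
    """Equi-join node by node with no data movement between nodes.

    Complete only when both inputs were hash-distributed on `key`,
    because then every matching pair already sits on the same node.
    """
    result = []
    for node in range(NUM_NODES):
        index = defaultdict(list)
        for right_row in right.get(node, []):
            index[right_row[key]].append(right_row)
        for left_row in left.get(node, []):
            for right_row in index[left_row[key]]:
                result.append({**left_row, **right_row})
    return result


if __name__ == "__main__":
    customers = [{"customer_id": i, "name": f"c{i}"} for i in range(100)]
    orders = [{"order_id": j, "customer_id": j % 100} for j in range(1000)]
    joined = local_join(
        distribute_by_hash(orders, "customer_id"),
        distribute_by_hash(customers, "customer_id"),
        "customer_id",
    )
    print(len(joined))
```

Because both tables are distributed on customer_id, every order finds its customer on the same node and the per-node join returns all 1,000 matches. Had one table been distributed randomly, the same per-node join would miss matches, and a real engine would have to shuffle rows first.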
