SeaTunnel synchronizes Oracle data to ClickHouse

Mondo Technology Updated on 2024-01-31

SeaTunnel is a distributed, high-performance, and easily scalable data integration platform for massive data synchronization and transformation. It can achieve stable and efficient data synchronization between heterogeneous data sources such as MySQL, Oracle, SQL Server, PostgreSQL, MongoDB, and Redis. You only need to configure the job information to synchronize the data: once the job is submitted, the source connector reads the data in parallel and sends it to a downstream transform, or directly to a sink, which writes the data to the destination.

Features of SeaTunnel:

- Rich and scalable connectors: SeaTunnel provides a connector API that does not depend on a specific execution engine. Connectors (source, transform, sink) developed against this API can run on many different engines, such as the currently supported SeaTunnel engine, Flink, and Spark.
- Connector plugin: the plugin design allows users to easily develop their own connectors and integrate them into a SeaTunnel project. SeaTunnel already supports more than 100 connectors, and the number is growing.
- Batch and stream integration: connectors developed against the SeaTunnel connector API are fully compatible with offline synchronization, real-time synchronization, full synchronization, and incremental synchronization scenarios, which greatly reduces the difficulty of managing data integration tasks.
- Multi-engine support: by default, SeaTunnel uses its own engine for data synchronization. SeaTunnel also supports Flink or Spark as the execution engine of the connector, to fit the technology components an enterprise already uses, and supports multiple versions of Spark and Flink.
- JDBC multiplexing and multi-table log parsing: SeaTunnel supports multi-table or whole-database synchronization, which solves the problem of too many JDBC connections, and supports multi-table or whole-database log reading and parsing, which solves the problem of repeatedly reading and parsing logs in CDC multi-table synchronization scenarios.
- High throughput and low latency: SeaTunnel supports parallel reads and writes and provides stable and reliable data synchronization with high throughput and low latency.
- Comprehensive real-time monitoring: SeaTunnel reports detailed monitoring information for each step of the data synchronization process, so users can easily see the amount of data read and written, data size, QPS, and other metrics of a synchronization task.
ClickHouse is a rising star in the field of OLAP analytics, with excellent query performance and rich analysis functions that help analysts flexibly and quickly extract value from massive amounts of data.

Synchronizing data from Oracle to ClickHouse can help improve data processing speed and query performance, provide better data management and analysis capabilities, reduce costs, and improve economic benefits.

This topic describes how to use SeaTunnel to synchronize Oracle data to a ClickHouse data warehouse.

SeaTunnel task configuration and startup

In this example, 9,999 rows in the Oracle test table are synchronized to the default.test0 table in ClickHouse.

The Oracle table creation statement is as follows:

create table test (id int, name varchar(40), quantity int, primary key (id))

Insert data into Oracle in the following format:

insert into test (id,name,quantity) values(1,'banana',1);
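The single insert above shows the row format. To produce the 9,999 test rows mentioned earlier, a PL/SQL loop of the following shape could be used; the generated names and quantities are illustrative, not the original test data:

```sql
-- Sketch: generate 9,999 test rows in one pass (values are illustrative).
begin
  for i in 1 .. 9999 loop
    insert into test (id, name, quantity) values (i, 'item_' || i, i);
  end loop;
  commit;
end;
/
```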

The following statement is used to create a table for ClickHouse:

create table default.test0
(
    `id` Int32,
    `name` String,
    `quantity` Int32
)
engine = MergeTree
primary key id
order by id

1. Download the JDBC driver

Download the Oracle JDBC driver and put it into the '$SEATUNNEL_HOME/plugins/jdbc/' directory.

2. Write a configuration file

In the '$SEATUNNEL_HOME/config' directory, create a configuration file.

The configuration file consists of env, source, and sink blocks (a transform block is optional).
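The original example content was not preserved here; the following is a minimal sketch of what such a file could look like, using the SeaTunnel Jdbc source and Clickhouse sink connectors. All host names, credentials, and the service name are placeholders that must be replaced with your own values:

```hocon
env {
  # Run as a one-off batch job.
  job.mode = "BATCH"
  parallelism = 1
}

source {
  Jdbc {
    # Placeholder Oracle host and service name.
    url = "jdbc:oracle:thin:@//oracle-host:1521/ORCL"
    driver = "oracle.jdbc.OracleDriver"
    user = "test_user"          # placeholder credentials
    password = "test_password"
    query = "select id, name, quantity from test"
  }
}

sink {
  Clickhouse {
    # Placeholder ClickHouse host and port.
    host = "clickhouse-host:8123"
    database = "default"
    table = "test0"
    username = "default"
    password = ""
  }
}
```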

3. Start the task

In the '$SEATUNNEL_HOME' directory, run the startup command:

bin/seatunnel.sh --config ./config/oracletock.template -e local

This command runs your SeaTunnel job in local mode.

When the task completes, summary information for the task is displayed.

Connect to ClickHouse and run select count() from default.test0 to check the result of the write: all 9,999 test rows have been written to ClickHouse.
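For reference, the verification query looks like this (in ClickHouse, count() takes no argument, equivalent to count(*) in other databases):

```sql
-- Count the rows synchronized into the target table.
select count() from default.test0;
```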

Next, we will describe the data synchronization process from a database to ClickHouse in more detail.
