When extracting hive data to mysql, the parallelism parameter setting is invalid. about seatunnel HOT 5 OPEN

luckyliush commented on June 1, 2024

When extracting Hive data to MySQL, the parallelism parameter setting does not take effect.

from seatunnel.

Comments (5)

liunaijie commented on June 1, 2024

Try adding the rewriteBatchedStatements=true parameter to your JDBC URL.

luckyliush commented on June 1, 2024

Thank you, but I had already added this parameter, and the speed still did not meet expectations.

liunaijie commented on June 1, 2024

plugin_name = jdbc
user = xxxxx
url = "jdbc:mysql://xxxxxx/xxxxxxx?allowMultiQueries=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true&useSSL=false"
enable_upsert = true
generate_sink_sql = true
database = db_name
table = table_name
primary_keys = [xxx,xxx]

Try this config. It auto-generates the insert SQL. I use it myself, and the write speed is good.

luckyliush commented on June 1, 2024

`env {
  execution.parallelism = 10
  job.mode = "BATCH"
}

source {
  Hive {
    table_name = ""
    metastore_uri = ""
    result_table_name = "Table_test"
    hdfs_site_path = "/home/hadoop/hadoop-3.2.2/etc/hadoop/hdfs-site.xml"
    hive_site_path = "/home/hadoop/hive-2.3.9/conf/hive-site.xml"
  }
}

transform {
  sql {
    source_table_name = "Table_test"
    query = "select xxx,xxx from Table_test"
    result_table_name = "Table_test2"
  }
}

sink {
  Jdbc {
    url = "jdbc:mysql://xxx:3306/xxx?allowMultiQueries=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true&useSSL=false"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    enable_upsert = true
    generate_sink_sql = true
    password = "xxx"
    database = "xxx"
    primary_keys = [xxx,xxx,xxx]
    table = "xxx"
  }
}`

I am using version 2.3.1 with the configuration shown above, but the extraction speed is the same as before and has not improved. Is there something wrong with my configuration?
Throughput is approximately 12,000 rows per second.
I set primary_keys to the fields that define the granularity of the Hive table, but these fields are not defined as primary keys in the MySQL table. Does this have any impact?

luckyliush commented on June 1, 2024

Hello. In seatunnel-2.3.5, with the same configuration, the parallelism parameter still does not take effect.
However, after adding the parameter read_limit.rows_per_second=10000 in seatunnel-2.3.5, the parallelism parameter takes effect and the extraction speed improves significantly. Do you know the reason?
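
For reference, a minimal sketch of the env block this describes. It assumes read_limit.rows_per_second was added to env next to the parallelism setting; the comment does not say where the parameter was placed, and the values are the ones quoted in this thread.

```hocon
env {
  # Parallelism as set in the 2.3.1 job config earlier in this thread
  execution.parallelism = 10
  job.mode = "BATCH"
  # Assumed placement: the rate limit reported to make parallelism take effect on 2.3.5
  read_limit.rows_per_second = 10000
}
```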
