
carbondata's Introduction

Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.

You can find the latest CarbonData document and learn more at: https://carbondata.apache.org

CarbonData cwiki

Status

Spark 2.4: [badges: Coverage Status, Coverity Scan, Build Status]

Features

The CarbonData file format is a columnar store in HDFS. It has many features that a modern columnar format has, such as being splittable and supporting compression schemes and complex data types, and CarbonData has the following unique features:

  • Stores data along with index: this can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices; a processing framework can leverage the index to reduce the number of tasks it needs to schedule and process, and it can also perform skip scans in a finer-grained unit (called a blocklet) during task-side scanning instead of scanning the whole file.
  • Operable encoded data: by supporting efficient compression and global encoding schemes, CarbonData can run queries directly on compressed/encoded data; the data is converted only just before the results are returned to the user, i.e. "late materialization".
  • Supports various use cases with one single data format: e.g. interactive OLAP-style queries, sequential access (big scans), and random access (narrow scans).

Building CarbonData

CarbonData is built using Apache Maven. To build CarbonData, run the Maven build with the Spark profile that matches your environment.
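For example, a typical build invocation looks like the sketch below (the profile must match your target Spark version; `-Pspark-3.1` is the profile used later on this page, and `-DskipTests` skips the test suites):

```
mvn -DskipTests -Pspark-3.1 clean package
```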

Online Documentation

Experimental Features

Some features are marked as experimental because the syntax/implementation might change in the future.

  1. Hybrid format table using Add Segment (see the sketch after this list).
  2. Accelerating query performance using MV (materialized views) on Parquet/ORC.
  3. Merge API for Spark DataFrame.
  4. Hive write for non-transactional tables.
  5. Secondary Index as a Coarse Grain Index in query processing.
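As an illustration of item 1, a hybrid format table is assembled by pointing a new segment at existing data files. A minimal sketch of the Add Segment syntax follows (the table name, path, and format values are placeholders; check the Add Segment documentation for the exact options in your version):

```sql
ALTER TABLE my_carbon_table
ADD SEGMENT OPTIONS ('path'='hdfs://path/to/parquet_dir', 'format'='parquet');
```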

Integration

Other Technical Material

Fork and Contribute

This is an active open source project for everyone, and we are always open to people who want to use the system or contribute to it. This guide introduces how to contribute to CarbonData.

Contact us

To get involved in CarbonData:

About

Apache CarbonData is an open source project of The Apache Software Foundation (ASF).

carbondata's People

Contributors

ajantha-bhat, akashrn5, anubhav100, ashokblend, bjangir, chenliang613, dhatchayani, gvramana, jackylk, kevinjmh, kumarvishal09, kunal642, manishgupta88, manishnalla1994, manoharvanam, mohammadshahidkhan, nareshpr, qiangcai, rahulk2, ravikiran23, ravipesala, shreelekhyag, sounakr, sraghunandan, sujith71955, vikramahuja1001, xubo245, xuchuanyin, zhangshunyu, zzcclp


carbondata's Issues

SDKS3Example is not working

There are some errors when I run the example org.apache.carbondata.examples.sdk.SDKS3Example:

Errors:

2023-04-09 14:35:16 INFO  CarbonProperties:1700 - Considered value for min max byte limit for string is: 200
2023-04-09 14:35:16 INFO  CarbonProperties:1725 - Using default value for carbon.detail.batch.size 100
2023-04-09 14:35:17 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-04-09 14:35:17 INFO  CarbonDataProcessorUtil:100 - Successfully created dir: /var/folders/lw/4y5plg0x7rq45h38m4sfxlbm0000gn/T//08f6e51ef0ba4ffda2c6f3ad15c7baf2_attempt_cfaf6637-f2e9-4b0b-9258-7dd5a4494e51_0000_m_-811962868_1490758150
2023-04-09 14:35:19 INFO  DataLoadExecutor:49 - Data Loading is started for table _temptable-c7c3e84c-37d4-4ea3-aaea-12d4b7d53c20_1681022116553
2023-04-09 14:35:19 INFO  CarbonDataProcessorUtil:100 - Successfully created dir: /var/folders/lw/4y5plg0x7rq45h38m4sfxlbm0000gn/T//08f6e51ef0ba4ffda2c6f3ad15c7baf2_attempt_cfaf6637-f2e9-4b0b-9258-7dd5a4494e51_0000_m_-811962868_1490758150/Fact/Part0/Segment_null/57cb395e3c304c7ba51bfd6215df27e4
2023-04-09 14:35:21 WARN  CarbonOutputIteratorWrapper:87 - try to poll a row batch one more time.
2023-04-09 14:35:21 INFO  AbstractFactDataWriter:172 - Total file size: 1073741824 and dataBlock Size: 966367642
2023-04-09 14:35:21 INFO  AbstractFactDataWriter:181 - Carbondata will directly write fact data to store path.
2023-04-09 14:35:21 INFO  CarbonFactDataWriterImplV3:98 - Sort Scope : NO_SORT
2023-04-09 14:35:21 WARN  CarbonOutputIteratorWrapper:87 - try to poll a row batch one more time.
2023-04-09 14:35:21 WARN  CarbonOutputIteratorWrapper:87 - try to poll a row batch one more time.
2023-04-09 14:35:21 WARN  UnsafeMemoryManager:83 - It is not recommended to set off-heap working memory size less than 512MB, so setting default value to 512
2023-04-09 14:35:21 INFO  UnsafeMemoryManager:109 - Off-heap Working Memory manager is created with size 536870912 with OFFHEAP
2023-04-09 14:35:21 INFO  CarbonFactDataWriterImplV3:178 - Number of Pages for blocklet is: 1 :Rows Added: 3
2023-04-09 14:38:51 ERROR CarbonFactDataWriterImplV3:390 - Problem while writing the index file
org.apache.carbondata.core.exception.CarbonFileException: Unable to get file status: 
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:115)
	at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:100)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:230)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:82)
	at org.apache.carbondata.core.writer.CarbonIndexFileWriter.openThriftWriter(CarbonIndexFileWriter.java:53)
	at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.writeIndexFile(AbstractFactDataWriter.java:454)
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:388)
	at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:500)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:277)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:255)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:221)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:146)
	at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:280)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://obs-xubo4/sdk/57cb395e3c304c7ba51bfd6215df27e4_batchno0-0-null-1681022116553.carbonindex
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:113)
	... 19 more
2023-04-09 14:38:51 ERROR CarbonRowDataWriterProcessorStepImpl:279 - Problem while writing the index file
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while writing the index file
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:391)
	at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:500)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:277)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:255)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:221)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:146)
	at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:280)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.carbondata.core.exception.CarbonFileException: Unable to get file status: 
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:115)
	at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:100)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:230)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:82)
	at org.apache.carbondata.core.writer.CarbonIndexFileWriter.openThriftWriter(CarbonIndexFileWriter.java:53)
	at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.writeIndexFile(AbstractFactDataWriter.java:454)
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:388)
	... 13 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://obs-xubo4/sdk/57cb395e3c304c7ba51bfd6215df27e4_batchno0-0-null-1681022116553.carbonindex
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:113)
	... 19 more
2023-04-09 14:38:51 ERROR CarbonRowDataWriterProcessorStepImpl:160 - Failed for table: _temptable-c7c3e84c-37d4-4ea3-aaea-12d4b7d53c20_1681022116553 in DataWriterProcessorStepImpl
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:260)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:221)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:146)
	at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:280)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:280)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:255)
	... 10 more
Caused by: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while writing the index file
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:391)
	at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:500)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:277)
	... 11 more
Caused by: org.apache.carbondata.core.exception.CarbonFileException: Unable to get file status: 
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:115)
	at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:100)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:230)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:82)
	at org.apache.carbondata.core.writer.CarbonIndexFileWriter.openThriftWriter(CarbonIndexFileWriter.java:53)
	at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.writeIndexFile(AbstractFactDataWriter.java:454)
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:388)
	... 13 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://obs-xubo4/sdk/57cb395e3c304c7ba51bfd6215df27e4_batchno0-0-null-1681022116553.carbonindex
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:113)
	... 19 more
2023-04-09 14:38:51 INFO  AbstractDataLoadProcessorStep:139 - Total rows processed in step Data Writer: 3
2023-04-09 14:38:51 INFO  AbstractDataLoadProcessorStep:139 - Total rows processed in step Data Converter: 3
2023-04-09 14:38:51 INFO  AbstractDataLoadProcessorStep:139 - Total rows processed in step Input Processor: 3
2023-04-09 14:38:51 ERROR CarbonTableOutputFormat:481 - Error while loading data
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:479)
	at org.apache.carbondata.sdk.file.CSVCarbonWriter.close(CSVCarbonWriter.java:166)
	at org.apache.carbondata.examples.sdk.SDKS3Example.main(SDKS3Example.java:87)
Caused by: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:292)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:162)
	at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:280)
	... 6 more
Caused by: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:260)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:221)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:146)
	... 8 more
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:280)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:255)
	... 10 more
Caused by: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while writing the index file
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:391)
	at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:500)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:277)
	... 11 more
Caused by: org.apache.carbondata.core.exception.CarbonFileException: Unable to get file status: 
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:115)
	at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:100)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:230)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:82)
	at org.apache.carbondata.core.writer.CarbonIndexFileWriter.openThriftWriter(CarbonIndexFileWriter.java:53)
	at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.writeIndexFile(AbstractFactDataWriter.java:454)
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:388)
	... 13 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://obs-xubo4/sdk/57cb395e3c304c7ba51bfd6215df27e4_batchno0-0-null-1681022116553.carbonindex
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:113)
	... 19 more
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:482)
	at org.apache.carbondata.sdk.file.CSVCarbonWriter.close(CSVCarbonWriter.java:166)
	at org.apache.carbondata.examples.sdk.SDKS3Example.main(SDKS3Example.java:87)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:479)
	... 2 more
Caused by: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:292)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Error while initializing data handler : 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:162)
	at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
	at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.lambda$getRecordWriter$0(CarbonTableOutputFormat.java:280)
	... 6 more
Caused by: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:260)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:221)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:146)
	... 8 more
Caused by: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:280)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:255)
	... 10 more
Caused by: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while writing the index file
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:391)
	at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:500)
	at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:277)
	... 11 more
Caused by: org.apache.carbondata.core.exception.CarbonFileException: Unable to get file status: 
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:115)
	at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:100)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:230)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:82)
	at org.apache.carbondata.core.writer.CarbonIndexFileWriter.openThriftWriter(CarbonIndexFileWriter.java:53)
	at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.writeIndexFile(AbstractFactDataWriter.java:454)
	at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:388)
	... 13 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://obs-xubo4/sdk/57cb395e3c304c7ba51bfd6215df27e4_batchno0-0-null-1681022116553.carbonindex
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getAbsolutePath(AbstractDFSCarbonFile.java:113)
	... 19 more

how to use MERGE INTO

Support MERGE INTO SQL Syntax
CarbonData now supports MERGE INTO SQL syntax along with the API support. This will help users write CDC and merge jobs using SQL as well.

How do we use MERGE INTO?
Please add it to the user documentation.
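For reference, a minimal MERGE INTO statement in the shape the release notes describe would look like the sketch below (table and column names are hypothetical; check the CarbonData CDC documentation for the exact clauses supported by your version):

```sql
MERGE INTO target t
USING (SELECT id, age FROM updates) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.age = s.age
WHEN NOT MATCHED THEN INSERT (id, age) VALUES (s.id, s.age);
```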

There are some errors when I run the SDKS3SchemaReadExample

There are some errors when I run the example examples/spark/src/main/java/org/apache/carbondata/examples/sdk/SDKS3SchemaReadExample.java:

```
2023-04-09 14:54:47 INFO CarbonProperties:1725 - Using default value for carbon.detail.batch.size 100
2023-04-09 14:54:47 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:86)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:75)
at org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.<init>(HDFSCarbonFile.java:43)
at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.<init>(S3CarbonFile.java:49)
at org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:107)
at org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:157)
at org.apache.carbondata.core.util.path.CarbonTablePath.getActualSchemaFilePath(CarbonTablePath.java:189)
at org.apache.carbondata.core.util.path.CarbonTablePath.getSchemaFilePath(CarbonTablePath.java:170)
at org.apache.carbondata.sdk.file.CarbonSchemaReader.readSchema(CarbonSchemaReader.java:171)
at org.apache.carbondata.examples.sdk.SDKS3SchemaReadExample.main(SDKS3SchemaReadExample.java:54)

```
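The "Unable to load AWS credentials from any provider in the chain" error means the S3A client could not find any credentials. Below is a minimal sketch of supplying them through the Hadoop configuration before calling the SDK; the fs.s3a.* keys are standard hadoop-aws settings, while passing the Configuration into CarbonSchemaReader.readSchema is an assumption that should be checked against the overloads in your SDK version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.carbondata.sdk.file.CarbonSchemaReader;

public class S3SchemaReadSketch {
  public static void main(String[] args) throws Exception {
    // Standard S3A credential settings from hadoop-aws.
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "<ACCESS_KEY>");
    conf.set("fs.s3a.secret.key", "<SECRET_KEY>");
    conf.set("fs.s3a.endpoint", "<ENDPOINT>");

    // Hypothetical call: readSchema taking an explicit Configuration;
    // verify the CarbonSchemaReader overloads in your SDK version.
    System.out.println(CarbonSchemaReader.readSchema("s3a://<bucket>/<path>", conf));
  }
}
```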

It does not support MERGE INTO; please update the document.

https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+2.1.1+Release
It does not support MERGE INTO; please update the document.

hive --version  Hive 1.2.1000.2.6.5.0-292
[hdfs@hadoop-node-1 spark-2.3.4-bin-hadoop2.7]$ bin/beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop-node-1:10000
Connecting to jdbc:hive2://hadoop-node-1:10000
Enter username for jdbc:hive2://hadoop-node-1:10000: abcdsesss
Enter password for jdbc:hive2://hadoop-node-1:10000: **********
Connected to: Spark SQL (version 2.3.4)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://hadoop-node-1:10000> merge into test_table t using ( select t1.name name,t1.id age, t1.age id, t1.city city from test_table t1 )s on (t.id=s.id) when matched then update set t.age=s.age  ;
Error: org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: Parse failed! (state=,code=0)
0: jdbc:hive2://hadoop-node-1:10000>

Finally, MERGE is available starting in Hive 2.2; is a CarbonData table a table that supports ACID?

Merge
Version Information

MERGE is available starting in Hive 2.2.

Merge can only be performed on tables that support ACID. See Hive Transactions for details.

DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in beeline

carbondata 2.1.1

DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN reported an error in beeline:


0: jdbc:hive2://hadoop-node-1:10016> show segments for table test_table;
+-------+--------------------+--------------------------+------------------+------------+------------+-------------+--------------+--+
|  ID   |       Status       |     Load Start Time      | Load Time Taken  | Partition  | Data Size  | Index Size  | File Format  |
+-------+--------------------+--------------------------+------------------+------------+------------+-------------+--------------+--+
| 21    | Compacted          | 2021-07-09 09:22:41.538  | 7.399S           | NA         | 619.53KB   | 54.21KB     | columnar_v3  |
| 20    | Compacted          | 2021-07-08 18:15:33.536  | 1.454S           | NA         | 411.54KB   | 54.02KB     | columnar_v3  |
| 19    | Compacted          | 2021-07-08 18:14:44.265  | 8.104S           | NA         | 259.04KB   | 53.96KB     | columnar_v3  |
| 18    | Compacted          | 2021-07-08 18:09:25.752  | 7.792S           | NA         | 178.86KB   | 53.90KB     | columnar_v3  |
| 17    | Compacted          | 2021-07-08 18:09:02.815  | 5.136S           | NA         | 88.90KB    | 26.86KB     | columnar_v3  |
| 16.1  | Compacted          | 2021-07-12 13:51:47.44   | 2.452S           | NA         | 390.78KB   | 54.30KB     | columnar_v3  |
| 16    | Compacted          | 2021-07-08 18:03:54.558  | 7.348S           | NA         | 44.62KB    | 13.42KB     | columnar_v3  |
| 15    | Compacted          | 2021-07-08 15:03:17.527  | 1.354S           | NA         | 12.61KB    | 1.29KB      | columnar_v3  |
| 14    | Compacted          | 2021-07-08 14:32:53.337  | 0.485S           | NA         | 7.48KB     | 1.29KB      | columnar_v3  |
| 13    | Compacted          | 2021-07-08 14:32:36.673  | 0.44S            | NA         | 4.83KB     | 1.28KB      | columnar_v3  |
| 12.1  | Compacted          | 2021-07-12 13:51:47.44   | 1.122S           | NA         | 22.06KB    | 1.30KB      | columnar_v3  |
| 12    | Compacted          | 2021-07-08 14:30:41.506  | 0.43S            | NA         | 3.59KB     | 1.28KB      | columnar_v3  |
| 11    | Compacted          | 2021-07-08 14:29:57.866  | 0.436S           | NA         | 2.95KB     | 1.27KB      | columnar_v3  |
| 10    | Compacted          | 2021-07-08 14:29:45.201  | 0.445S           | NA         | 2.57KB     | 1.27KB      | columnar_v3  |
| 9     | Compacted          | 2021-07-08 14:28:36.513  | 0.438S           | NA         | 2.38KB     | 1.27KB      | columnar_v3  |
| 8.1   | Compacted          | 2021-07-12 13:51:47.44   | 0.837S           | NA         | 3.52KB     | 1.28KB      | columnar_v3  |
| 8     | Compacted          | 2021-07-08 14:27:50.502  | 0.541S           | NA         | 2.28KB     | 1.26KB      | columnar_v3  |
| 7     | Compacted          | 2021-07-08 14:27:08.431  | 0.49S            | NA         | 2.20KB     | 1.26KB      | columnar_v3  |
| 6     | Marked for Delete  | 2021-07-08 10:48:47.684  | 0.386S           | NA         | 1.08KB     | 656.0B      | columnar_v3  |
| 5     | Compacted          | 2021-07-08 10:44:38.283  | 14.552S          | NA         | 1.06KB     | 646.0B      | columnar_v3  |
| 4     | Compacted          | 2021-07-08 10:43:51.58   | 14.259S          | NA         | 1.05KB     | 644.0B      | columnar_v3  |
| 3     | Marked for Delete  | 2021-07-08 10:43:19.104  | 16.868S          | NA         | 1.05KB     | 644.0B      | columnar_v3  |
| 2.3   | Success            | 2021-07-12 13:52:15.043  | 1.342S           | NA         | 1.14MB     | 54.60KB     | columnar_v3  |
| 2.2   | Compacted          | 2021-07-12 13:51:47.44   | 1.389S           | NA         | 23.36KB    | 1.30KB      | columnar_v3  |
| 2.1   | Compacted          | 2021-07-12 13:51:47.44   | 0.56S            | NA         | 2.28KB     | 1.27KB      | columnar_v3  |
| 2     | Compacted          | 2021-07-08 10:27:01.657  | 0.487S           | NA         | 1.14KB     | 659.0B      | columnar_v3  |
| 1     | Marked for Delete  | 2021-07-08 10:21:01.823  | 0.45S            | NA         | 1.06KB     | 646.0B      | columnar_v3  |
| 0     | Marked for Delete  | 2021-07-08 10:20:36.083  | 0.738S           | NA         | 1.05KB     | 644.0B      | columnar_v3  |
+-------+--------------------+--------------------------+------------------+------------+------------+-------------+--------------+--+
28 rows selected (0.063 seconds)
0: jdbc:hive2://hadoop-node-1:10016> DELETE FROM TABLE default.test_table WHERE SEGMENT.ID IN ("0","1");
Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'table' not found in database 'default'; (state=,code=0)
0: jdbc:hive2://hadoop-node-1:10016>
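Judging from the message, the parser resolved the statement as table 'table' in database 'default', so the database-qualified name appears to be what trips it up. A possible workaround (an assumption, not verified against 2.1.1) is to select the database first and use the bare table name:

```sql
USE default;
DELETE FROM TABLE test_table WHERE SEGMENT.ID IN ("0","1");
```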

[SDK Optimization] Multiple SimpleDateFormat initialization in CarbonReader

I found that reading carbon files through CarbonReader takes a long time in "SimpleDateFormat.<init>"; see the attached file for the profiling output.

SimpleDateFormat dateFormat = new SimpleDateFormat(carbonDateFormat);

I wonder whether it would be OK to add lazy initialization for SimpleDateFormat in this class, and if so, whether it should support multi-threading.

profile.zip
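One common pattern for this situation, sketched below (this is not CarbonData's existing code), is to create the formatter lazily and cache it per thread, since SimpleDateFormat is not thread-safe; CARBON_DATE_FORMAT stands in for whatever pattern string the reader actually uses:

```java
import java.text.SimpleDateFormat;

public class DateFormatCache {
  // Assumed pattern; replace with the actual carbonDateFormat value.
  private static final String CARBON_DATE_FORMAT = "yyyy-MM-dd";

  // Lazily creates one SimpleDateFormat per thread on first use,
  // avoiding both repeated construction and cross-thread sharing.
  private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
      ThreadLocal.withInitial(() -> new SimpleDateFormat(CARBON_DATE_FORMAT));

  public static SimpleDateFormat get() {
    return DATE_FORMAT.get();
  }
}
```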

[SDK PERFORMANCE] The execution of the show tables command takes a long time.

[screenshot]

As shown in the screenshot above, CarbonShowTablesCommand obtains metadata from the metastore for each table. Currently, when there are 180,000 tables, it takes a long time (about 1 hour) to run the show tables command in the spark-sql shell, which needs to be optimized.
When the filter function is not invoked, it takes about 12 seconds to list the 180,000 tables with show tables, as shown in the screenshot below.

[screenshot]

Compile issues with Spark 3.1

[ERROR] Failed to execute goal on project carbondata-hive: Could not resolve dependencies for project org.apache.carbondata:carbondata-hive:jar:2.3.1-SNAPSHOT: Failed to collect dependencies at org.apache.hive:hive-service:jar:3.1.0 -> org.apache.hive:hive-llap-server:jar:3.1.0 -> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :carbondata-hive
chenliangdeMacBook-Pro:carbondata2023 apple$ mvn -DskipTests -Pspark-3.1 clean package

There are some errors when running the test cases with Spark 2.3

There are some errors when running the test cases with Spark 2.3:

- Test restructured array<timestamp> as index column on SI with compaction
2023-04-10 03:13:52 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:13:53 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- Test restructured array<string> and string columns as index columns on SI with compaction
2023-04-10 03:13:56 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:13:56 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test array<string> on secondary index with compaction
2023-04-10 03:14:00 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:14:00 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test array<string> and string as index columns on secondary index with compaction
- test load data with array<string> on secondary index
- test SI global sort with si segment merge enabled for complex data types
- test SI global sort with si segment merge enabled for newly added complex column
- test SI global sort with si segment merge enabled for primitive data types
- test SI global sort with si segment merge complex data types by rebuild command
- test SI global sort with si segment merge primitive data types by rebuild command
- test si creation with struct and map type
- test si creation with array
2023-04-10 03:14:26 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:14:26 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test complex with null and empty data
- test array<date> on secondary index
- test array<timestamp> on secondary index
2023-04-10 03:14:31 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:14:31 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test array<varchar> and varchar as index columns on secondary index
2023-04-10 03:14:34 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:14:34 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test multiple SI with array and primitive type
2023-04-10 03:14:40 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
2023-04-10 03:14:40 ERROR CarbonInternalMetastore$:254 - Adding/Modifying tableProperties operation failed: Recursive load
- test SI complex with multiple array contains
TestCarbonInternalMetastore:
- test delete index silent
2023-04-10 03:14:43 ERROR CarbonInternalMetastore$:118 - Exception occurred while drop index table for : Some(test).unknown : Table or view 'unknown' not found in database 'test';
2023-04-10 03:14:43 ERROR CarbonInternalMetastore$:131 - Exception occurred while drop index table for : Some(test).index1 : Table or view 'index1' not found in database 'test';
- test delete index table silently when exception occur
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'index1' not found in database 'test';
	at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
	at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:83)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getRawTable$1.apply(HiveExternalCatalog.scala:118)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getRawTable$1.apply(HiveExternalCatalog.scala:118)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:117)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:684)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:684)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:683)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:674)
	at org.apache.spark.sql.hive.CarbonFileMetastore.lookupRelation(CarbonFileMetastore.scala:197)
	at org.apache.spark.sql.hive.CarbonFileMetastore.lookupRelation(CarbonFileMetastore.scala:191)
	at org.apache.spark.sql.secondaryindex.events.SIDropEventListener$$anonfun$onEvent$1.apply(SIDropEventListener.scala:69)
	at org.apache.spark.sql.secondaryindex.events.SIDropEventListener$$anonfun$onEvent$1.apply(SIDropEventListener.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.spark.sql.secondaryindex.events.SIDropEventListener.onEvent(SIDropEventListener.scala:65)
	at org.apache.carbondata.events.OperationListenerBus.fireEvent(OperationListenerBus.java:83)
	at org.apache.carbondata.events.package$.withEvents(package.scala:26)
	at org.apache.carbondata.events.package$.withEvents(package.scala:22)
	at org.apache.spark.sql.execution.command.table.CarbonDropTableCommand.processMetadata(CarbonDropTableCommand.scala:93)
	at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:160)
	at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:159)
	at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
	at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
	at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:159)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3265)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3264)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
	at org.apache.spark.sql.test.SparkTestQueryExecutor.sql(SparkTestQueryExecutor.scala:37)
	at org.apache.spark.sql.test.util.QueryTest.sql(QueryTest.scala:123)
	at org.apache.carbondata.spark.testsuite.secondaryindex.TestCarbonInternalMetastore.beforeEach(TestCarbonInternalMetastore.scala:49)
	at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:220)
	at org.apache.carbondata.spark.testsuite.secondaryindex.TestCarbonInternalMetastore.runTest(TestCarbonInternalMetastore.scala:33)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
	at org.scalatest.Suite$class.run(Suite.scala:1147)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
	at org.apache.carbondata.spark.testsuite.secondaryindex.TestCarbonInternalMetastore.org$scalatest$BeforeAndAfterAll$$super$run(TestCarbonInternalMetastore.scala:33)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
	at org.apache.carbondata.spark.testsuite.secondaryindex.TestCarbonInternalMetastore.run(TestCarbonInternalMetastore.scala:33)
	at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1210)
	at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1257)
	at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1255)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1255)
	at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:30)
	at org.scalatest.Suite$class.run(Suite.scala:1144)
	at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:30)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
	at org.scalatest.tools.Runner$.main(Runner.scala:827)
	at org.scalatest.tools.Runner.main(Runner.scala)
- test show index when SI were created before the change CARBONDATA-3765
- test refresh index with different value of isIndexTableExists
- test refresh index with indexExists as false and empty index table
- test refresh index with indexExists as null
Run completed in 15 minutes, 36 seconds.
Total number of tests run: 283
Suites: completed 32, aborted 0
Tests: succeeded 282, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache CarbonData :: Parent ........................ SUCCESS [  2.509 s]
[INFO] Apache CarbonData :: Common ........................ SUCCESS [ 15.990 s]
[INFO] Apache CarbonData :: Format ........................ SUCCESS [ 32.657 s]
[INFO] Apache CarbonData :: Core .......................... SUCCESS [01:32 min]
[INFO] Apache CarbonData :: Processing .................... SUCCESS [ 33.903 s]
[INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [ 22.800 s]
[INFO] Apache CarbonData :: Materialized View Plan ........ SUCCESS [01:15 min]
[INFO] Apache CarbonData :: Hive .......................... SUCCESS [02:05 min]
[INFO] Apache CarbonData :: SDK ........................... SUCCESS [02:03 min]
[INFO] Apache CarbonData :: CLI ........................... SUCCESS [05:03 min]
[INFO] Apache CarbonData :: Lucene Index .................. SUCCESS [ 22.601 s]
[INFO] Apache CarbonData :: Bloom Index ................... SUCCESS [ 12.992 s]
[INFO] Apache CarbonData :: Geo ........................... SUCCESS [ 23.719 s]
[INFO] Apache CarbonData :: Streaming ..................... SUCCESS [ 33.608 s]
[INFO] Apache CarbonData :: Spark ......................... FAILURE [  01:27 h]
[INFO] Apache CarbonData :: Secondary Index ............... FAILURE [16:28 min]
[INFO] Apache CarbonData :: Index Examples ................ SUCCESS [ 11.280 s]
[INFO] Apache CarbonData :: Flink Proxy ................... SUCCESS [ 15.864 s]
[INFO] Apache CarbonData :: Flink ......................... SUCCESS [05:29 min]
[INFO] Apache CarbonData :: Flink Build ................... SUCCESS [  5.949 s]
[INFO] Apache CarbonData :: Presto ........................ SUCCESS [02:37 min]
[INFO] Apache CarbonData :: Examples ...................... SUCCESS [02:54 min]
[INFO] Apache CarbonData :: Flink Examples ................ SUCCESS [  8.313 s]
[INFO] Apache CarbonData :: Assembly ...................... FAILURE [ 14.763 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:54 h (Wall Clock)
[INFO] Finished at: 2023-04-10T03:14:53+08:00
[INFO] Final Memory: 245M/2221M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project carbondata-spark_2.3: There are test failures -> [Help 1]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.3:shade (default) on project carbondata-assembly: Error creating shaded jar: /Users/xubo/Desktop/xubo/git/carbondata1/integration/spark/target/classes (Is a directory) -> [Help 2]
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project carbondata-secondary-index: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :carbondata-spark_2.3
[INFO] Build failures were ignored.

Process finished with exit code 0

Spark:


AlterTableColumnRenameTestCase:
- test only column rename operation
- CARBONDATA-4053 test rename column, column name in table properties changed correctly
- Rename more than one column at a time in one operation
- rename complex columns with invalid structure/duplicate-names/Map-type
- test alter rename struct of (primitive/struct/array) *** FAILED ***
  Results do not match for query:
  == Parsed Logical Plan ==
  'Project ['str33.a22]
  +- 'UnresolvedRelation `test_rename`

  == Analyzed Logical Plan ==
  a22: struct<b11:int>
  Project [str33#74649.a22 AS a22#74653]
  +- SubqueryAlias test_rename
     +- Relation[str1#74648,str33#74649,str3#74650,intfield#74651] CarbonDatasourceHadoopRelation

  == Optimized Logical Plan ==
  Project [str33#74649.a22 AS a22#74653]
  +- Relation[str1#74648,str33#74649,str3#74650,intfield#74651] CarbonDatasourceHadoopRelation

  == Physical Plan ==
  *(1) Project [str33#74649.a22 AS a22#74653]
  +- *(1) Scan CarbonDatasourceHadoopRelation default.test_rename[str33#74649] Batched: false, DirectScan: false, PushedFilters: [], ReadSchema: [str33.a22]
  == Results ==
  !== Correct Answer - 2 ==   == Spark Answer - 2 ==
  ![[2]]                      [[3]]
  ![[3]]                      [null] (QueryTest.scala:93)
- test alter rename array of (primitive/array/struct)
- test alter rename and change datatype for map of (primitive/array/struct)
- test alter rename and change datatype for struct integer
- test alter rename and change datatype for map integer




- test LocalDictionary with True
- test LocalDictionary with custom Threshold *** FAILED ***
  scala.this.Predef.Boolean2boolean(org.apache.carbondata.core.util.CarbonTestUtil.checkForLocalDictionary(org.apache.carbondata.core.util.CarbonTestUtil.getDimRawChunk(TestNonTransactionalCarbonTable.this.writerPath, scala.this.Predef.int2Integer(0)))) was false (TestNonTransactionalCarbonTable.scala:2447)
- test Local Dictionary with FallBack
- test local dictionary with External Table data load
- test inverted index column by API !!! IGNORED !!!
- test Local Dictionary with Default
- Test with long string columns with 1 MB pageSize
IntegerDataTypeTestCase:
- select empno from integertypetablejoin
VarcharDataTypesBasicTestCase:
- long string columns cannot be sort_columns
- long string columns can only be string columns
- cannot alter sort_columns dataType to long_string_columns
- check compaction after altering range column dataType to longStringColumn
- long string columns cannot contain duplicate columns
- long_string_columns: column does not exist in table
- long_string_columns: columns cannot exist in partitions columns
- long_string_columns: columns cannot exist in no_inverted_index columns
- test alter table properties for long string columns


- test duplicate columns with select query
Run completed in 1 hour, 23 minutes, 47 seconds.
Total number of tests run: 3430
Suites: completed 302, aborted 0
Tests: succeeded 3428, failed 2, canceled 0, ignored 82, pending 0
*** 2 TESTS FAILED ***

index:

CarbonIndexFileMergeTestCaseWithSI:
- Verify correctness of index merge
- Verify command of index merge !!! IGNORED !!!
- Verify command of index merge without enabling property *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2001.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2001.0 (TID 79695, localhost, executor driver): java.lang.RuntimeException: Failed to merge index files in path: /Users/xubo/Desktop/xubo/git/carbondata1/integration/spark/target/warehouse/nonindexmerge/Fact/Part0/Segment_1. Table status update with mergeIndex file has failed
	at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:122)
	at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:386)
	at org.apache.spark.rdd.CarbonMergeFilesRDD$$anon$1.<init>(CarbonMergeFilesRDD.scala:322)
	at org.apache.spark.rdd.CarbonMergeFilesRDD.internalCompute(CarbonMergeFilesRDD.scala:287)
	at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Table status update with mergeIndex file has failed
	at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.writeMergeIndexFileBasedOnSegmentFile(CarbonIndexFileMergeWriter.java:327)
	at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:114)
	... 12 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1661)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1649)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1648)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1648)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
  ...
  Cause: java.lang.RuntimeException: Failed to merge index files in path: /Users/xubo/Desktop/xubo/git/carbondata1/integration/spark/target/warehouse/nonindexmerge/Fact/Part0/Segment_1. Table status update with mergeIndex file has failed
  at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:122)
  at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:386)
  at org.apache.spark.rdd.CarbonMergeFilesRDD$$anon$1.<init>(CarbonMergeFilesRDD.scala:322)
  at org.apache.spark.rdd.CarbonMergeFilesRDD.internalCompute(CarbonMergeFilesRDD.scala:287)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  ...
  Cause: java.io.IOException: Table status update with mergeIndex file has failed
  at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.writeMergeIndexFileBasedOnSegmentFile(CarbonIndexFileMergeWriter.java:327)
  at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:114)
  at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:386)
  at org.apache.spark.rdd.CarbonMergeFilesRDD$$anon$1.<init>(CarbonMergeFilesRDD.scala:322)
  at org.apache.spark.rdd.CarbonMergeFilesRDD.internalCompute(CarbonMergeFilesRDD.scala:287)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  ...
- Verify index index merge with compaction
- Verify index index merge for compacted segments
- test refresh index with indexExists as null
Run completed in 15 minutes, 36 seconds.
Total number of tests run: 283
Suites: completed 32, aborted 0
Tests: succeeded 282, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***

Cannot create a table with partitions in Spark on EMR

I am running Spark on EMR:

Release label: emr-5.24.1
Hadoop distribution: Amazon 2.8.5
Applications:
Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, JupyterHub 0.9.6

Jar compiled with:

apache-carbondata:2.2.0
spark:2.4.5
hadoop:2.8.3

Spark is configured to use AWS Glue.

When trying to create a table like this:

CREATE TABLE IF NOT EXISTS will_not_work(
timestamp string,
name string
)
PARTITIONED BY (dt string, hr string)
STORED AS carbondata
LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'

I get the following error:

org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: Partition is not supported for external table
  at org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
  at org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
  at org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
  at org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
  at org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
  at org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
  at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
  at org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
  at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
  at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
  ... 64 elided
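A minimal workaround sketch, assuming the restriction is triggered by the explicit LOCATION clause (Spark registers a table with LOCATION as external, and the stack trace above rejects partitions for external tables): creating the table as a managed table, without LOCATION, should avoid the exception. The table name and the CarbonExtensions setting below are assumptions based on CarbonData 2.x documentation, not something tested on EMR.

// Hypothetical sketch: same schema as the failing statement, but managed
// (no LOCATION clause), so the table is not treated as external.
import org.apache.spark.sql.SparkSession;

public class CreatePartitionedCarbonTable {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("carbon-partition-workaround")
        // CarbonData 2.x is enabled on a plain SparkSession via this extension.
        .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
        .enableHiveSupport()
        .getOrCreate();
    spark.sql(
        "CREATE TABLE IF NOT EXISTS will_work ("
            + " timestamp string,"
            + " name string"
            + ") PARTITIONED BY (dt string, hr string)"
            + " STORED AS carbondata");
  }
}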

How to build the Presto 333 version successfully?

@ajantha-bhat hello, my build always fails for your Presto 333 version and I don't know what the problem is. Must it be built with JDK 11? My mvn build command is mvn -DskipTests -Pspark-2.4 -Pprestosql -Dspark.version=2.4.5 -Dhadoop.version=2.7.7 -Dhive.version=3.1.0, but the error says: "has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0". Does this mean JDK 8 is not supported?
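A note on the version numbers (a fact about class file formats, not about this build specifically): class file version 55.0 corresponds to Java 11 and 52.0 to Java 8, so the error means a dependency of the prestosql profile was compiled for JDK 11 while the build is running on JDK 8. Presto SQL 333 itself requires Java 11, so building this profile with JDK 11 is presumably required.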

Compile issue with Spark 2.3

I want to compile CarbonData following the document: https://github.com/apache/carbondata/tree/master/build.

mvn command:
clean -B -Pspark-2.3 -Pbuild-with-format -DskipTests install

but it failed:

[WARNING] Could not transfer metadata org.glassfish:javax.el:3.0.1-b08-SNAPSHOT/maven-metadata.xml from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
[WARNING] Failure to transfer org.glassfish:javax.el:3.0.1-b08-SNAPSHOT/maven-metadata.xml from https://maven.java.net/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of jvnet-nexus-snapshots has elapsed or updates are forced. Original error: Could not transfer metadata org.glassfish:javax.el:3.0.1-b08-SNAPSHOT/maven-metadata.xml from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Downloading: https://maven.java.net/content/repositories/snapshots/org/glassfish/javax.el/3.0.1-b08-SNAPSHOT/javax.el-3.0.1-b08-SNAPSHOT.pom
Downloading: https://maven.java.net/content/repositories/snapshots/org/glassfish/javax.el/3.0.1-b11-SNAPSHOT/maven-metadata.xml
[WARNING] Could not transfer metadata org.glassfish:javax.el:3.0.1-b11-SNAPSHOT/maven-metadata.xml from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
[WARNING] Failure to transfer org.glassfish:javax.el:3.0.1-b11-SNAPSHOT/maven-metadata.xml from https://maven.java.net/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of jvnet-nexus-snapshots has elapsed or updates are forced. Original error: Could not transfer metadata org.glassfish:javax.el:3.0.1-b11-SNAPSHOT/maven-metadata.xml from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache CarbonData :: Parent ........................ SUCCESS [  3.005 s]
[INFO] Apache CarbonData :: Common ........................ SUCCESS [ 12.142 s]
[INFO] Apache CarbonData :: Format ........................ SUCCESS [ 21.524 s]
[INFO] Apache CarbonData :: Core .......................... SUCCESS [ 55.579 s]
[INFO] Apache CarbonData :: Processing .................... SUCCESS [ 21.825 s]
[INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [ 13.909 s]
[INFO] Apache CarbonData :: Materialized View Plan ........ SUCCESS [01:07 min]
[INFO] Apache CarbonData :: Hive .......................... FAILURE [ 18.494 s]
[INFO] Apache CarbonData :: SDK ........................... SKIPPED
[INFO] Apache CarbonData :: CLI ........................... SKIPPED
[INFO] Apache CarbonData :: Lucene Index .................. SKIPPED
[INFO] Apache CarbonData :: Bloom Index ................... SKIPPED
[INFO] Apache CarbonData :: Geo ........................... SKIPPED
[INFO] Apache CarbonData :: Streaming ..................... SKIPPED
[INFO] Apache CarbonData :: Spark ......................... SKIPPED
[INFO] Apache CarbonData :: Secondary Index ............... SKIPPED
[INFO] Apache CarbonData :: Index Examples ................ SKIPPED
[INFO] Apache CarbonData :: Flink Proxy ................... SKIPPED
[INFO] Apache CarbonData :: Flink ......................... SKIPPED
[INFO] Apache CarbonData :: Flink Build ................... SKIPPED
[INFO] Apache CarbonData :: Presto ........................ SKIPPED
[INFO] Apache CarbonData :: Examples ...................... SKIPPED
[INFO] Apache CarbonData :: Flink Examples ................ SKIPPED
[INFO] Apache CarbonData :: Assembly ...................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:34 min
[INFO] Finished at: 2023-04-09T00:12:51+08:00
[INFO] Final Memory: 99M/867M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project carbondata-hive: Could not resolve dependencies for project org.apache.carbondata:carbondata-hive:jar:2.3.1-SNAPSHOT: Failed to collect dependencies at org.apache.hive:hive-jdbc:jar:3.1.0 -> org.apache.hive:hive-service:jar:3.1.0 -> org.apache.hive:hive-llap-server:jar:3.1.0 -> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from/to jvnet-nexus-snapshots (https://maven.java.net/content/repositories/snapshots): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
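A note on the likely cause (inferred from the log above, not verified): the transitive dependency org.glassfish:javax.el:3.0.1-b06-SNAPSHOT, pulled in through hive-jdbc -> hive-service -> hive-llap-server -> hbase-server, can only be resolved from the retired jvnet-nexus-snapshots repository, whose TLS certificate no longer validates, hence the PKIX failure. A common workaround is to exclude that SNAPSHOT dependency or pin a released javax.el version (for example 3.0.1-b12, which is available on Maven Central) in dependencyManagement so Maven never contacts that repository.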

Tez will report an error

Data can only be read through Hive. If you use Hive (running on Tez) to write data, Tez will report an error.

Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://hadoop-node-1:8020/tmp/hive/hdfs/010e1336-6251-4157-9499-e15efce79293/hive_2021-07-07_16-38-01_759_6301054491957594370-1/40e54cbd-439d-4e35-979a-fdc38dfa680f/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.carbondata.hive.MapredCarbonInputFormat
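The Kryo "Unable to find class: org.apache.carbondata.hive.MapredCarbonInputFormat" in the deserialized plan suggests the CarbonData Hive jars are visible to HiveServer2 but not to the Tez containers; presumably the carbondata-hive jar (and its dependencies) must also be registered as auxiliary jars for Tez, e.g. via hive.aux.jars.path or tez.aux.uris, before Hive-on-Tez can write to a CarbonData table.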

Why are fewer tasks launched than available executors for insert into / load data?

In the case of insert into or load data, the total number of tasks in the stage is almost equal to the number of hosts, which in general is much smaller than the number of available executors. The low parallelism of the stage results in slower execution. Why must the parallelism be constrained to the distinct hosts? Could more tasks be started to increase parallelism and improve resource utilization? Thanks. (The relevant code is quoted below.)

org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala: loadDataFrame

  /**
   * Execute load process to load from input dataframe
   */
  private def loadDataFrame(
      sqlContext: SQLContext,
      dataFrame: Option[DataFrame],
      carbonLoadModel: CarbonLoadModel
  ): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = {
    try {
      val rdd = dataFrame.get.rdd
      // Count the distinct hosts holding the input partitions: the load's
      // parallelism is derived from this node count, not from the number of
      // partitions or the number of available executors.
      val nodeNumOfData = rdd.partitions.flatMap[String, Array[String]] { p =>
        DataLoadPartitionCoalescer.getPreferredLocs(rdd, p).map(_.host)
      }.distinct.length
      // Request at least that many executors and get the final node list.
      val nodes = DistributionUtil.ensureExecutorsByNumberAndGetNodeList(
        nodeNumOfData,
        sqlContext.sparkContext)
      // Coalesce all partitions located on the same host into a single
      // partition, so the load stage runs one task per node.
      val newRdd = new DataLoadCoalescedRDD[Row](sqlContext.sparkSession, rdd, nodes.toArray
        .distinct)

      new NewDataFrameLoaderRDD(
        sqlContext.sparkSession,
        new DataLoadResultImpl(),
        carbonLoadModel,
        newRdd
      ).collect()
    } catch {
      case ex: Exception =>
        LOGGER.error("load data frame failed", ex)
        throw ex
    }
  }
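A short note on the apparent design choice, inferred from the code rather than from project documentation: DataLoadCoalescedRDD merges every input partition that lives on the same host into a single task, so the load stage runs one task per node. That keeps each node writing one set of carbondata files per load (fewer, larger files per segment), at the cost of bounding parallelism by the node count instead of the executor count.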

Incompatible ANTLR Tool version

Spark version 2.3.4 uses ANTLR Tool version 4.7, but CarbonData uses ANTLR 4.8.
An error occurs in Spark SQL; please use version 4.7.
Error log:
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.8
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.8
Error in query:
Operation not allowed: STORED AS with file format 'carbondata'(line 6, pos 10)
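Since a generated parser and the antlr4-runtime on the classpath must match, one workaround (a general ANTLR remedy, not a CarbonData-specific fix) is to ensure only a single antlr4-runtime version ends up on the driver classpath, matching the version the parser was generated with, for example by shading or by aligning the dependency with what Spark 2.3.4 ships.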

Integration with Spark

Will CarbonData be integrated with Spark in the future? Can Spark version 3.1.2 be integrated with CarbonData?

Building for Spark 3.1 with Java 11 fails

I use the following command to build CarbonData and get the error message shown in the attachment.

mvn -DskipTests -Dfindbugs.skip=true -Dcheckstyle.skip=true -Pspark-3.1 -Pbuild-with-format clean package install

java version:
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

Does the Carbon SDK support HDFS?

I am writing CarbonData files from another application that does not use Spark. Does the SDK support HDFS configuration? How can I write CarbonData files to an HDFS system?
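A minimal write sketch using the CarbonData Java SDK (CarbonWriter). The output path, field names, and namenode URI below are hypothetical; the builder accepts a Hadoop Configuration, so an hdfs:// output path should work the same way a local path does. Treat this as a sketch under those assumptions, not a verified recipe.

// Hypothetical SDK example: write a few CSV-style rows to an HDFS path.
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;
import org.apache.hadoop.conf.Configuration;

public class HdfsSdkWriteSketch {
  public static void main(String[] args) throws Exception {
    Field[] fields = new Field[] {
        new Field("name", DataTypes.STRING),
        new Field("age", DataTypes.INT)
    };
    Configuration conf = new Configuration();
    // Point the client at the target HDFS namenode (hypothetical URI).
    conf.set("fs.defaultFS", "hdfs://hadoop-node-1:8020");
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("hdfs://hadoop-node-1:8020/tmp/carbon_sdk_output")
        .withHadoopConf(conf)
        .withCsvInput(new Schema(fields))
        .writtenBy("sdk-hdfs-example")
        .build();
    for (int i = 0; i < 10; i++) {
      // CSV input takes one String per column.
      writer.write(new String[] { "name" + i, String.valueOf(i) });
    }
    writer.close();
  }
}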

FusionInsight HD 6518, Spark 2.3.2, Carbon 2.0.0: skewed-join adaptive execution has no effect

spark.sql.adaptive.enabled=true
spark.sql.adaptive.skewedJoin.enabled=true
spark.sql.adaptive.skewedPartitionMaxSplits=5
spark.sql.adaptive.skewedPartitionRowCountThreshold=10000000
spark.sql.adaptive.skewedPartitionSizeThreshold=67108864
spark.sql.adaptive.skewedPartitionFactor=5

In Spark2x JDBC, these settings have no effect.

In t1 left join t2 on t1.id = t2.id, column id has one skewed key, for example 0000-00-00, with 100,000 records in t1; t2 has the same key in column id, also with 100,000 records. This generates 100,000 * 100,000 = 10 billion records for a single reducer!

The Carbon solution does not handle this; please check it. -- call hw.

Cannot insert data into a partitioned table in Spark on EMR

After 42f6982, I have successfully created a table with partitions, but when I try to insert data the job ends with success, yet the segment is marked as "Marked for Delete".

I am running:

CREATE TABLE lior_carbon_tests.mark_for_del_bug(
timestamp string,
name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)

INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'

select * from lior_carbon_tests.mark_for_del_bug

gives

+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+

And

show segments for TABLE lior_carbon_tests.mark_for_del_bug

gives

+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+

I took a look at the folder structure in S3 and it seems fine.

Failed to execute goal org.codehaus.mojo:findbugs-maven-plugin:3.0.4:check

When I change the Hadoop version from 2.7.2 to 3.0.0 while compiling the source, this error comes up: Failed to execute goal org.codehaus.mojo:findbugs-maven-plugin:3.0.4:check (analyze-compile) on project carbondata-core: failed with 1 bugs and 0 errors -> [Help 1]
[ERROR]

Has anyone encountered this?
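As a stopgap (it hides the reported bug rather than fixing it), the findbugs-maven-plugin check can be skipped with -Dfindbugs.skip=true on the mvn command line.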

CarbonReader reads incomplete data

Look at this line of code: boolean hasNext = currentReader.nextKeyValue();
If nextKeyValue() returns false and currentReader is not the last reader, the iterator exits and the data in the remaining readers is never parsed. How can this be solved?

  /**
   * Return true if has next row
   */
  public boolean hasNext() throws IOException, InterruptedException {
    if (0 == readers.size() || currentReader == null) {
      return false;
    }
    validateReader();
    if (currentReader.nextKeyValue()) {
      return true;
    } else {
      if (index == readers.size() - 1) {
        // no more readers
        return false;
      } else {
        // current reader is closed
        currentReader.close();
        // no need to keep a reference to CarbonVectorizedRecordReader,
        // until all the readers are processed.
        // If readers count is very high,
        // we get OOM as GC not happened for any of the content in CarbonVectorizedRecordReader
        readers.set(index, null);
        index++;
        currentReader = readers.get(index);
        boolean hasNext = currentReader.nextKeyValue();
        if (hasNext) {
          return true;
        }
      }
    }
    return false;
  }
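A minimal fix sketch (not the project's actual patch): keep advancing through the remaining readers in a loop until one of them produces a row or all readers are exhausted, instead of giving up after the first empty successor. Field and method names are taken from the snippet above.

  // Hypothetical fix: loop across readers rather than advancing only once.
  public boolean hasNext() throws IOException, InterruptedException {
    if (0 == readers.size() || currentReader == null) {
      return false;
    }
    validateReader();
    while (!currentReader.nextKeyValue()) {
      if (index == readers.size() - 1) {
        // no more readers
        return false;
      }
      // current reader is exhausted: close it and drop the reference so it
      // can be garbage collected, then move on to the next reader
      currentReader.close();
      readers.set(index, null);
      index++;
      currentReader = readers.get(index);
    }
    return true;
  }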

[feat] Does it already support Spark 3.2 or above?

Is it compatible with Spark 3.2 and above?

Is this project still being maintained and upgraded? I see there has been no update for more than 3 months.

Will there be any major feature changes in the future?

Thanks
