datax's Issues

JF1367818433

Hello, how can I use DataX to read data in Parquet format?

Self-check fails

After building from source, I went into target/datax/datax and ran python bin/datax.py job/job.json
It fails with this error:
2020-09-25 20:27:24.531 [main] INFO ErrorRecordChecker - percentage使用标准的百分比(配置值忽略百分号),如 [45.45%] 的配置为:"percentage": 45.45
2020-09-25 20:27:24.532 [main] INFO ErrorRecordChecker - 配置了 errorLimit.record, 其优先级高于 errorLimit.percentage 会将覆盖 errorLimit.percentage
2020-09-25 20:27:24.533 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-09-25 20:27:24.534 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-09-25 20:27:24.535 [main] INFO JobContainer - DataX jobContainer starts job.
2020-09-25 20:27:24.537 [main] INFO JobContainer - Set jobId = 0
2020-09-25 20:27:24.559 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-09-25 20:27:24.560 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-09-25 20:27:24.561 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-09-25 20:27:24.561 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-09-25 20:27:24.568 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[Framework-03], Description:[DataX引擎配置错误,该问题通常是由于DataX安装错误引起,请联系您的运维解决 .]. - 在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.adjustChannelNumber(JobContainer.java:430) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.split(JobContainer.java:387) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:117) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.start(Engine.java:92) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.entry(Engine.java:171) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.main(Engine.java:204) [datax-core-0.0.1-SNAPSHOT.jar:na]
2020-09-25 20:27:24.576 [job-0] INFO StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2020-09-25 20:27:24.577 [job-0] ERROR Engine -

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-03], Description:[DataX引擎配置错误,该问题通常是由于DataX安装错误引起,请联系您的运维解决 .]. - 在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.core.job.JobContainer.adjustChannelNumber(JobContainer.java:430)
at com.alibaba.datax.core.job.JobContainer.split(JobContainer.java:387)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:117)
at com.alibaba.datax.core.Engine.start(Engine.java:92)
at com.alibaba.datax.core.Engine.entry(Engine.java:171)
at com.alibaba.datax.core.Engine.main(Engine.java:204)

Thanks.
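
The Framework-03 message in the log above says that when a total bps limit is set, the bps value of a single channel must be neither empty nor non-positive, and the stack trace places the check in JobContainer.adjustChannelNumber. The following is a minimal sketch of that rule, not DataX's own code. It assumes the job-level limit is read from `job.setting.speed.byte` (job.json) and the per-channel limit from `core.transport.channel.speed.byte` (conf/core.json); those key paths are not shown in this thread and should be checked against the build in use. The helper names and the sample values are hypothetical.

```python
def get_path(conf, dotted_key):
    """Walk a nested dict by a dotted key; return None if any part is missing."""
    node = conf
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_bps_limits(job_conf, core_conf):
    """Return a list of problems that would trip the per-channel bps rule."""
    problems = []
    # Assumed key path for the job-wide byte limit.
    total_bps = get_path(job_conf, "job.setting.speed.byte")
    if total_bps is not None and total_bps > 0:
        # Assumed key path for the per-channel byte limit.
        channel_bps = get_path(core_conf, "core.transport.channel.speed.byte")
        if channel_bps is None or channel_bps <= 0:
            problems.append(
                "total bps limit is %r but per-channel bps is %r"
                % (total_bps, channel_bps)
            )
    return problems


# Illustrative values only; they are not taken from the reporter's files.
job_conf = {"job": {"setting": {"speed": {"byte": 10485760}}}}
bad_core = {"core": {"transport": {"channel": {"speed": {"byte": -1}}}}}
good_core = {"core": {"transport": {"channel": {"speed": {"byte": 2000000}}}}}

print(check_bps_limits(job_conf, bad_core))
print(check_bps_limits(job_conf, good_core))
```

If the check reports a problem for your files, one option to try is giving the per-channel value a positive number; another is dropping the job-wide byte limit. Either change should be confirmed against the DataX version you compiled.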

elasticsearchwriter: custom date format support has no effect

Output

2020-12-17 19:52:52.083 [0-0-0-writer] ERROR StdoutPluginCollector - 脏数据:
{"message":"status:[400], error: {\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse field [created_at] of type [date] in document with id '11'. Preview of field's value: '2020-10-28T11:16:06.000+08:00'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"failed to parse date field [2020-10-28T11:16:06.000+08:00] with format [yyyy-MM-dd HH:mm:ss]\",\"caused_by\":{\"type\":\"date_time_parse_exception\",\"reason\":\"Text '2020-10-28T11:16:06.000+08:00' could not be parsed at index 10\"}}}","record":[{"byteSize":2,"index":0,"rawData":11,"type":"LONG"},{"byteSize":7,"index":1,"rawData":"手机端淘宝首页","type":"STRING"},{"byteSize":5,"index":2,"rawData":15369,"type":"LONG"},{"byteSize":4,"index":3,"rawData":2929,"type":"LONG"},{"byteSize":8,"index":4,"rawData":1603854966000,"type":"DATE"}],"type":"writer"}
2020-12-17 19:52:52.085 [0-0-0-writer] ERROR StdoutPluginCollector - 脏数据:
{"message":"status:[400], error: {\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse field [created_at] of type [date] in document with id '16'. Preview of field's value: '2020-10-28T11:16:06.000+08:00'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"failed to parse date field [2020-10-28T11:16:06.000+08:00] with format [yyyy-MM-dd HH:mm:ss]\",\"caused_by\":{\"type\":\"date_time_parse_exception\",\"reason\":\"Text '2020-10-28T11:16:06.000+08:00' could not be parsed at index 10\"}}}","record":[{"byteSize":2,"index":0,"rawData":16,"type":"LONG"},{"byteSize":4,"index":1,"rawData":"过年海报","type":"STRING"},{"byteSize":6,"index":2,"rawData":104630,"type":"LONG"},{"byteSize":4,"index":3,"rawData":5888,"type":"LONG"},{"byteSize":8,"index":4,"rawData":1603854966000,"type":"DATE"}],"type":"writer"}
2020-12-17 19:52:52.086 [0-0-0-writer] ERROR StdoutPluginCollector - 脏数据:
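
In the dirty-data records above, created_at arrives as the epoch value 1603854966000 (type DATE), and Elasticsearch reports the text it received as '2020-10-28T11:16:06.000+08:00', which it cannot parse with yyyy-MM-dd HH:mm:ss "at index 10". The sketch below reproduces both renderings of that instant to show where the mismatch sits. The +08:00 offset is taken from the log; Python's `strptime` only stands in for Elasticsearch's date parser and is not the same implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1603854966000             # rawData of created_at in the log
TZ = timezone(timedelta(hours=8))    # +08:00 offset seen in the log

moment = datetime.fromtimestamp(EPOCH_MS / 1000, tz=TZ)

# ISO-8601 rendering, the shape Elasticsearch says it received.
iso_text = moment.isoformat(timespec="milliseconds")
# Rendering that matches the mapping format yyyy-MM-dd HH:mm:ss.
plain_text = moment.strftime("%Y-%m-%d %H:%M:%S")


def matches_plain_pattern(text):
    """True if text parses with the 'yyyy-MM-dd HH:mm:ss' equivalent pattern."""
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        return False


print(iso_text)
print(plain_text)
print(repr(iso_text[10]))  # the character at index 10
print(matches_plain_pattern(iso_text), matches_plain_pattern(plain_text))
```

The character at index 10 of the ISO text is the `T` separator, where the mapping format expects a space. So the value reaching Elasticsearch is not in the format the index mapping declares.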

Configuration



"reader": {
    "name": "mysqlreader",
    "parameter": {
        "username": "",
        "password": "",
        "connection": [
            {
                "querySql": [
                    "select id,keyword,scount,icount,created_at from t"
                ],
                "jdbcUrl": [
                    ""
                ]
            }
        ]
    }
},

"writer": {
    "name": "elasticsearchwriter",
    "parameter": {
        "endpoint": "http://127.0.0.1:9200",
        "index": "aaa",
        "type": "_doc",
        "cleanup": false,
        "discovery": false,
        "batchSize": 1000,
        "splitter": ",",
        "dynamic": true,
        "column": [
            {"name": "id", "type": "id"},
            {"name": "keyword", "type": "text", "analyzer": "ccc"},
            {"name": "scount", "type": "integer"},
            {"name": "icount", "type": "integer"},
            {"name": "created_at", "type": "date", "fromFormat": "yyyy-MM-dd HH:mm:ss"}
        ]
    }
}

ES

"mappings": {
    "properties": {
        "icount": {
            "type": "integer"
        },
        "scount": {
            "type": "integer"
        },
        "created_at": {
            "format": "yyyy-MM-dd HH:mm:ss",
            "type": "date"
        },
        "id": {
            "type": "integer"
        },
        "keyword": {
            "analyzer": "ccc",
            "type": "text"
        }
    }
}
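
Elasticsearch date mappings can list several accepted formats separated by `||`. One possible workaround, not confirmed anywhere in this thread, is to let created_at accept the ISO-8601 text as well as the custom pattern. The sketch below builds such a mapping body as a Python dict. `strict_date_optional_time` and `epoch_millis` are built-in Elasticsearch format names; the function name is hypothetical, and only the created_at field from the mapping above is reproduced.

```python
import json


def with_extra_date_formats(mapping, field, extra_formats):
    """Return a copy of mapping whose date field also accepts extra_formats."""
    updated = json.loads(json.dumps(mapping))  # deep copy via JSON round trip
    spec = updated["properties"][field]
    current = spec.get("format")
    formats = current.split("||") if current else []
    for fmt in extra_formats:
        if fmt not in formats:
            formats.append(fmt)
    spec["format"] = "||".join(formats)
    return updated


# The created_at entry as posted in the mapping above.
original = {
    "properties": {
        "created_at": {"format": "yyyy-MM-dd HH:mm:ss", "type": "date"},
    }
}

relaxed = with_extra_date_formats(
    original, "created_at", ["strict_date_optional_time", "epoch_millis"]
)
print(json.dumps(relaxed["properties"]["created_at"]))
```

Whether the format of a field in an existing index can be changed in place depends on the Elasticsearch version, so trying this on a fresh test index first is the safer route.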

Data format

insert into t ( `icount`, `pinyin`, `scount`, `created_at`, `keyword`, `updated_at`) values ( '253300', 'beijing', '14432285', '2020-10-28 11:16:06', '背景', '2020-10-28 11:16:06');
