Comments (6)
from seatunnel.
Regarding (1) b):
UDF approach:
sqlContext.udf.register("strLen", (s:String) => s.length)
sqlContext.sql("select strlen(a) as lens from table")
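Filter approach:
import org.apache.spark.sql.functions.udf
import sqlContext.implicits._ // enables the $"a" column syntax
val strLen = udf((s:String) => s.length)
df.withColumn("lens", strLen($"a"))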
from seatunnel.
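Question 1
To use UDFs in SQL, every function we can offer (filterObj.singleFunc) has to be registered during project initialization. I currently see two options:
- Get the names of all class files under the package org.interestinglab.waterdrop.filter and load them by reflection with Class.forName. This has two problems:
  - Not every filter can provide a corresponding UDF, e.g. Split, whose behavior depends on parameters from the config file.
  - The class file names returned for the package cannot be used directly and need to be filtered first:
    baseFilter.scala Split$$anonfun$1.scala Split$$anonfun$2.scala Split.scala ...
- Use all of the ANTLR4 variables.
@garyelephant please help evaluate these.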
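The filtering step mentioned in the first option could be sketched roughly as follows. This is only an illustration under assumptions: the object name FilterClassNames, the method candidateClasses, and the idea of excluding the base class by name are hypothetical and not taken from the project.

```scala
object FilterClassNames {
  // Turns raw file names from the package listing into fully qualified
  // class names that could be passed to Class.forName.
  // Assumption: compiler-generated classes are recognizable by a '$' in the name.
  def candidateClasses(fileNames: Seq[String],
                       pkg: String,
                       excluded: Set[String]): Seq[String] =
    fileNames
      .map(_.replaceAll("\\.(scala|class)$", "")) // strip the file extension
      .filterNot(_.contains("$"))                 // drop Split$$anonfun$1 and similar
      .filterNot(n => excluded.contains(n))       // drop the base class itself
      .map(n => s"$pkg.$n")
}
```

Note that this only cleans up the names. It does not address the first problem, because a filter such as Split would still be found even though it cannot offer a UDF.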
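Question 2
How should the BaseFilter interface be defined?
To use the withColumn approach, each filter currently needs to define a method. I would like to declare this method, singleFunc, in BaseFilter, because most plugins will use it. The problem is that singleFunc has a different return type in each plugin, so how should an interface like this be defined?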
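One possible way to express a method whose return type differs per plugin is a type parameter on the trait. The thread does not say how this question was eventually resolved, so the sketch below is only an assumption: the shape of BaseFilter and the StrLen and Upper plugins are hypothetical.

```scala
// Hypothetical shape for BaseFilter; the project's real interface may differ.
trait BaseFilter[In, Out] {
  // Each plugin fixes In and Out to the types it actually works with.
  def singleFunc(in: In): Out
}

// A plugin whose function returns an Int.
class StrLen extends BaseFilter[String, Int] {
  def singleFunc(s: String): Int = s.length
}

// A plugin whose function returns a String.
class Upper extends BaseFilter[String, String] {
  def singleFunc(s: String): String = s.toUpperCase
}
```

With this shape the compiler knows the concrete result type of each plugin, so no casts are needed at the call site.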
from seatunnel.
Question 2 is no longer an issue.
For Question 1, my starting point is that the user's configuration might look like this:
inputs {
}
filters {
  sql {
    table = 'my_table'
    sql = "select Split(a, b, c) as key1 from my_table"
  }
}
outputs {
}
Split exists in the filter list, but Split is not a UDF, so running this configuration fails with an error.
Does this use case need to be supported?
from seatunnel.
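Not every filter has to be registered as a UDF. That is entirely up to the filter's developer. If a filter is registered, its documentation just needs to state the name of the corresponding UDF.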
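Opt-in registration of this kind could look roughly like the sketch below. Everything here is hypothetical and only illustrates the idea: the Filter trait, its udf member, the two example filters, and UdfRegistry are not the project's actual code.

```scala
// Hypothetical minimal filter interface, not the project's BaseFilter.
trait Filter {
  def name: String
  // None means the filter author chose not to expose a UDF.
  def udf: Option[(String, String => Any)] = None
}

// A filter whose author decided to expose a UDF named "strLen".
object StrLenFilter extends Filter {
  val name = "strlen"
  override val udf: Option[(String, String => Any)] =
    Some(("strLen", (s: String) => s.length))
}

// A filter that depends on config parameters and exposes no UDF.
object SplitFilter extends Filter {
  val name = "split"
}

object UdfRegistry {
  // Collects only the UDFs that filters explicitly opted in to.
  def collect(filters: Seq[Filter]): Map[String, String => Any] =
    filters.flatMap(_.udf).toMap
}
```

Each collected entry could then be passed to sqlContext.udf.register at startup, and the UDF name ("strLen" here) is what the filter's documentation would list.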
from seatunnel.
@RickyHuo Question 1
Ready-to-use options:
Option 1: the Java ServiceLoader / Service Provider Interface (SPI) mechanism: https://www.javacodegeeks.com/2012/12/how-to-create-extensible-java-applications.html, http://literatejava.com/extensibility/java-serviceloader-extensible-applications/, http://www.logicbig.com/tutorials/core-java-tutorial/java-se-api/service-loader/, https://stackoverflow.com/a/45604272/1145750, https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html, https://docs.oracle.com/javase/tutorial/ext/basics/spi.html#define-the-service-provider-interface
Option 2: the org.reflections library: https://github.com/ronmamo/reflections
Option 3: the Apache Commons library: http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/ClassUtils.html
Presto, which I mentioned earlier, uses Option 1, and we plan to use Option 1 in this project as well.
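As a rough illustration of Option 1, discovery through java.util.ServiceLoader might look like the following. The service interface UdfProvider and the helper UdfProviders are hypothetical names chosen for this sketch. Only ServiceLoader itself is the real JDK API.

```scala
import java.util.ServiceLoader
import scala.collection.mutable.ListBuffer

// Hypothetical service interface; the name and members are assumptions.
trait UdfProvider {
  def udfName: String
  def call(input: String): Any
}

object UdfProviders {
  // Loads every implementation listed in a provider-configuration file
  // under META-INF/services/ named after the fully qualified name of UdfProvider.
  def loadAll(): List[UdfProvider] = {
    val it = ServiceLoader.load(classOf[UdfProvider]).iterator()
    val found = ListBuffer.empty[UdfProvider]
    while (it.hasNext) found += it.next()
    found.toList
  }
}
```

An implementation has to be a class with a public no-argument constructor and must be listed in that provider-configuration file. If no such file is on the classpath, loadAll() simply returns an empty list. This also fits the opt-in idea discussed above, since only filters whose authors list them in the file are ever discovered.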
from seatunnel.