Comments (6)

RickyHuo commented on May 17, 2024

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala

from seatunnel.

RickyHuo commented on May 17, 2024

Regarding (1) b)

UDF approach

// Register the function as a SQL UDF, then call it by name in a query.
sqlContext.udf.register("strLen", (s: String) => s.length)
sqlContext.sql("select strLen(a) as lens from table")

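Filter approach

import org.apache.spark.sql.functions.udf
import sqlContext.implicits._  // enables the $"a" column syntax
val strLen = udf((s: String) => s.length)
df.withColumn("lens", strLen($"a"))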

RickyHuo commented on May 17, 2024

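Question 1

To use UDFs in SQL, we need to register every method we can provide (filterObj.singleFunc) when the project initializes. I currently see two ways to do this:

  • List the names of all class files under package org.interestinglab.waterdrop.filter and load them by reflection with Class.forName.
    This has two problems:
    • Not every filter can provide a corresponding UDF. Split, for example, depends on parameters from the configuration file.
    • The class file names returned for the package cannot be used directly and have to be filtered first:
    baseFilter.scala
    Split$$anonfun$1.scala
    Split$$anonfun$2.scala
    Split.scala
    ...

  • Use ANTLR4's global variables.
    @garyelephant, could you help evaluate this?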
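
The filtering step in the first option could look roughly like the sketch below. It assumes the class file names have already been listed from the package (that step is omitted), and the names `FilterDiscovery` and `candidateFilterClasses` are hypothetical. It drops compiler-generated entries such as `Split$$anonfun$1` and the base class, leaving names that could be passed to `Class.forName`.

```scala
// A minimal sketch, not the project's actual code.
// FilterDiscovery and candidateFilterClasses are hypothetical names.
object FilterDiscovery {
  val FilterPackage = "org.interestinglab.waterdrop.filter"

  def candidateFilterClasses(fileNames: Seq[String]): Seq[String] =
    fileNames
      .map(_.stripSuffix(".scala").stripSuffix(".class"))
      .filterNot(_.contains("$"))                  // drop compiler-generated anonymous classes
      .filterNot(_.equalsIgnoreCase("baseFilter")) // drop the base class itself
      .map(name => s"$FilterPackage.$name")        // fully qualified name for Class.forName
}
```

This only addresses the second problem (name filtering). Whether a loaded filter can actually provide a UDF still has to be decided per filter.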

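Question 2

How should the BaseFilter interface be defined?

Currently, to use the withColumn approach, each filter has to define a method. I would like to declare this method, singleFunc, in BaseFilter, since most plugins will use it. The problem is that singleFunc returns a different type in each plugin, so how should an interface like this be declared?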
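
One possible shape for such an interface is sketched below, using a Scala abstract type member so that each plugin fixes its own result type. This is an illustration only: the thread does not record how the question was settled, the `String` input is an assumption, and the `Length` and `Words` plugins are hypothetical.

```scala
// Sketch only: one way to let singleFunc return a different type per plugin.
trait BaseFilter {
  type Out                            // each plugin chooses its own result type
  def singleFunc(input: String): Out
}

// Hypothetical plugin returning an Int.
object Length extends BaseFilter {
  type Out = Int
  def singleFunc(input: String): Int = input.length
}

// Hypothetical plugin returning a sequence of strings.
object Words extends BaseFilter {
  type Out = Seq[String]
  def singleFunc(input: String): Seq[String] = input.split(" ").toSeq
}
```

A type parameter on the trait (`BaseFilter[Out]`) would work similarly. The type member keeps `BaseFilter` usable as a plain type when the result type does not matter.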

RickyHuo commented on May 17, 2024

Question 2 is no longer an issue.

For Question 1, my starting point is a user configuration like this:

inputs {
}
filters {
    sql {
        table = 'my_table'
        sql = "select Split(a, b, c) as key1 from my_table"
    }
}
outputs {
}

Split exists in the filter list, but it is not a UDF, so this configuration fails at runtime.
Do we need to support this use case?

garyelephant commented on May 17, 2024

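Not every filter has to be registered as a UDF; that is entirely up to the filter's developer. If a filter is registered, its documentation only needs to state the name of the corresponding UDF.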
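
The opt-in idea could be expressed roughly as follows. This is a sketch under assumptions: `UdfCapable`, `StrLenFilter`, `SplitFilter`, and `UdfCatalog` are hypothetical names, and a plain `String => Any` function stands in for a real Spark UDF to keep the example dependency-free.

```scala
// Sketch only: all names here are hypothetical.
trait UdfCapable {
  // None (the default) means the filter author chose not to expose a UDF.
  def udf: Option[(String, String => Any)] = None
}

object StrLenFilter extends UdfCapable {
  private val strLen: String => Any = (s: String) => s.length
  // The name given here is what the filter's documentation would list.
  override def udf: Option[(String, String => Any)] = Some("strLen" -> strLen)
}

// Depends on configuration parameters, so it exposes no UDF.
object SplitFilter extends UdfCapable

object UdfCatalog {
  // Collect only the UDFs that filter authors chose to expose.
  def exposed(filters: Seq[UdfCapable]): Map[String, String => Any] =
    filters.flatMap(_.udf).toMap
}
```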

garyelephant commented on May 17, 2024

@RickyHuo Question 1

Options that can be used directly:

Option 1: the Java ServiceLoader / Service Provider Interface (SPI) mechanism: https://www.javacodegeeks.com/2012/12/how-to-create-extensible-java-applications.html, http://literatejava.com/extensibility/java-serviceloader-extensible-applications/, http://www.logicbig.com/tutorials/core-java-tutorial/java-se-api/service-loader/, https://stackoverflow.com/a/45604272/1145750, https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html, https://docs.oracle.com/javase/tutorial/ext/basics/spi.html#define-the-service-provider-interface

Option 2: the org.reflections library: https://github.com/ronmamo/reflections

Option 3: the Apache Commons library: http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/ClassUtils.html

Presto, which I mentioned earlier, uses Option 1, and we plan to use Option 1 for this project as well.
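
A minimal sketch of Option 1, calling `java.util.ServiceLoader` from Scala, is shown below. The `UdfProvider` trait and `UdfRegistry` object are hypothetical names, not part of the project. Implementations would be declared in a classpath resource file named `META-INF/services/` followed by the fully qualified name of the service interface, with one implementation class name per line.

```scala
import java.util.ServiceLoader

// Hypothetical service interface that a filter would implement to expose a UDF.
trait UdfProvider {
  def udfName: String
}

object UdfRegistry {
  // Load every implementation declared under META-INF/services on the classpath.
  // Returns an empty list when no provider file is present.
  def loadProviders(): List[UdfProvider] = {
    val it = ServiceLoader.load(classOf[UdfProvider]).iterator()
    val providers = List.newBuilder[UdfProvider]
    while (it.hasNext) providers += it.next()
    providers.result()
  }
}
```

With this approach only the filters that ship a provider entry are discovered, so filters without a UDF need no special handling.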
