doric's People

Contributors

alfonsorr, amalicia, dependabot[bot], eruizalo, jserranohidalgo, mrpowers, neutropolis, nvander1, scala-steward

doric's Issues

Partition transform functions

  • def bucket(numBuckets: Int, e: Column): Column ❌ won't do, same function with DoricColumn
  • def bucket(numBuckets: Column, e: Column): Column
  • def days(e: Column): Column
  • def hours(e: Column): Column
  • def months(e: Column): Column
  • def years(e: Column): Column

Sorting functions

DataFrame functions:

  • def sortWithinPartitions(sortCol: String, sortCols: String*): Dataset[T] ❌ won't do, same function with DoricColumn
  • def sortWithinPartitions(sortExprs: Column*): Dataset[T] #233
  • def sort(sortCol: String, sortCols: String*): Dataset[T] ❌ won't do, same function with DoricColumn
  • def sort(sortExprs: Column*): Dataset[T] #233
  • def orderBy(sortCol: String, sortCols: String*): Dataset[T] ❌ won't do, same function with DoricColumn
  • def orderBy(sortExprs: Column*): Dataset[T] #233

Column functions:

  • def asc(columnName: String): Column WIP: #233
  • def asc_nulls_first(columnName: String): Column #233
  • def asc_nulls_last(columnName: String): Column #233
  • def desc(columnName: String): Column #233
  • def desc_nulls_first(columnName: String): Column #233
  • def desc_nulls_last(columnName: String): Column #233

String functions

  • def ascii(e: Column): Column
  • def base64(e: Column): Column
  • def concat_ws(sep: String, exprs: Column*): Column
  • def decode(value: Column, charset: String): Column
  • def encode(value: Column, charset: String): Column
  • def format_number(x: Column, d: Int): Column
  • def format_string(format: String, arguments: Column*): Column
  • def initcap(e: Column): Column
  • def instr(str: Column, substring: String): Column
  • def length(e: Column): Column
  • def levenshtein(l: Column, r: Column): Column
  • def locate(substr: String, str: Column, pos: Int): Column
  • def locate(substr: String, str: Column): Column
  • def lower(e: Column): Column
  • def lpad(str: Column, len: Int, pad: String): Column
  • def ltrim(e: Column, trimString: String): Column
  • def ltrim(e: Column): Column
  • def overlay(src: Column, replace: Column, pos: Column): Column
  • def overlay(src: Column, replace: Column, pos: Column, len: Column): Column
  • def regexp_extract(e: Column, exp: String, groupIdx: Int): Column
  • def regexp_replace(e: Column, pattern: Column, replacement: Column): Column
  • def regexp_replace(e: Column, pattern: String, replacement: String): Column
  • def repeat(str: Column, n: Int): Column
  • def rpad(str: Column, len: Int, pad: String): Column
  • def rtrim(e: Column, trimString: String): Column
  • def rtrim(e: Column): Column
  • def soundex(e: Column): Column
  • def split(str: Column, pattern: String, limit: Int): Column
  • def split(str: Column, pattern: String): Column
  • def substring(str: Column, pos: Int, len: Int): Column
  • def substring_index(str: Column, delim: String, count: Int): Column
  • def translate(src: Column, matchingString: String, replaceString: String): Column
  • def trim(e: Column, trimString: String): Column
  • def trim(e: Column): Column
  • def unbase64(e: Column): Column
  • def upper(e: Column): Column

Build doric for scala 2.11

As asked in #184, this thread is a list of everything needed for the milestone "Doric for Scala 2.11".

Pending tasks to migrate doric to Scala 2.11 (Spark 2.4 at least):

  • Delete newtype #204
  • Have spark-fast-tests build for scala 2.11 MrPowers/spark-fast-tests#106
  • Modularize Doric column to have a common interface that abstracts its use from the cats version (not needed, see the next item)
  • Cats imports must be ok #206
  • Review methods that are not in spark 2.4

v1 milestones & release

I'm really excited about this project!

Think about the features that'll be included in the "initial public release". Once all the initial features are built, ping me, and I'll make a commit to make a compelling sell in the project README.

Once the README is updated, I'll start marketing the project to try to get users and feedback on the code.

Sounds like a good plan? I'm definitely interested in seeing this project grow & get a lot of users!

Enable dynamic invocations for `Row` columns

Given a Row column, we'd like to access its fields as regular Scala object fields.

Currently:

case class User(name: String, age: Int)
val df = List((User("John", 34), 1)).toDF("user", "col2")

val dn: DoricColumn[String] = col[Row]("user").getChild[String]("name")
val da: DoricColumn[Int] = col[Row]("user").getChild[Int]("age")

Desired:

...
val dn: DoricColumn[String] = col[Row]("user").name[String]
val da: DoricColumn[Int] = col[Row]("user").age[Int]

And, even better:

...
val dn: DoricColumn[String] = row.user[Row].name[String]
val da: DoricColumn[Int] = row.user[Row].age[Int]
val db: DoricColumn[Int] = row.col2[Int]

The last example is equivalent to col[Int]("col2"). The idea is to make row similar to this.

If the parent column is not a Row, the invocation should not compile. E.g., the following expression

...
col[Int]("n").name[String]

will make the compiler complain that Int is not equal to Row.
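One way to get this kind of dynamic selection is Scala's `scala.Dynamic`: a selection like `row.user` is rewritten by the compiler into `row.selectDynamic("user")`, so chained selections can accumulate the column path. A minimal, Spark-free sketch (the `PathCol` name and API are illustrative stand-ins, not doric's actual types):

```scala
import scala.language.dynamics

// Illustrative sketch only (not doric's API): a toy column reference that
// records the field path. `row.user.name` compiles to
// `row.selectDynamic("user").selectDynamic("name")`.
final class PathCol(val path: String) extends Dynamic {
  def selectDynamic(field: String): PathCol =
    new PathCol(if (path.isEmpty) field else s"$path.$field")
}

val row  = new PathCol("")
val name = row.user.name   // accumulates the path "user.name"
```

In doric, the terminal type application (`name[String]`) would additionally pin the expected Spark type, so that a non-`Row` parent fails at compile time as described above.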

Window functions

  • def cume_dist(): Column
  • def dense_rank(): Column
  • def lag(e: Column, offset: Int, defaultValue: Any): Column
  • def lag(columnName: String, offset: Int, defaultValue: Any): Column ❌ won't do, same function with DoricColumn
  • def lag(columnName: String, offset: Int): Column ❌ won't do, same function with DoricColumn
  • def lag(e: Column, offset: Int): Column
  • def lead(e: Column, offset: Int, defaultValue: Any): Column
  • def lead(columnName: String, offset: Int, defaultValue: Any): Column ❌ won't do, same function with DoricColumn
  • def lead(e: Column, offset: Int): Column
  • def lead(columnName: String, offset: Int): Column ❌ won't do, same function with DoricColumn
  • def nth_value(e: Column, offset: Int): Column #236
  • def nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column #236
  • def ntile(n: Int): Column
  • def percent_rank(): Column
  • def rank(): Column
  • def row_number(): Column

Map column

Add a Map column that validates the map according to its key and value types

[Feature request]: Time comparators

Doric doesn't have time comparators to work with.
Please check there is no other feature request like this one to avoid duplicates: see doric issues

  • It seems this feature is not requested yet:
    • YES / NO

Expected Behavior

col[LocalDate]("aaa") > col("bbb")

Current Behavior (if so)

None
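One way to support this is to gate the comparison operators on ordering evidence for the element type. A minimal, Spark-free sketch (the `TypedCol` name and the string expression it builds are illustrative assumptions, not doric's API):

```scala
import java.time.LocalDate

// Illustrative sketch (not doric's API): `>` is only available when an
// Ordering for the element type exists, which is how LocalDate comparisons
// could be gated at compile time.
final case class TypedCol[T](name: String) {
  def >(other: TypedCol[T])(implicit ord: Ordering[T]): String =
    s"(${name} > ${other.name})" // stands in for building a Spark expression
}

implicit val localDateOrdering: Ordering[LocalDate] =
  Ordering.by(_.toEpochDay)

val expr = TypedCol[LocalDate]("aaa") > TypedCol[LocalDate]("bbb")
```

A type without ordering evidence would make the same expression fail to compile, matching doric's usual compile-time error style.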

[Feature request]: Retrieve last doric version from maven for documentation

Feature request duplicity

  • It seems this feature is not requested yet: YES

Feature suggestion

It would be nice to retrieve the latest version of doric from Maven Central and use it when publishing the documentation

⚠️⚠️ CAREFUL! 👀

Documentation built from a release may not pick up the correct version

Current behaviour

Currently, the documentation has to be updated manually

Refactoring

  • Apis, in terms of implicit classes. In package syntax.
  • In package types, type classes SparkType, NumericType, Cast, etc. Instances of type classes, in companion objects.
  • Remove XType.apply/unapply
  • Package sem, with Errors, DataFrameOps, ...
  • Companion object DoricColumn: getters, sparkfunction, ...
  • String concat, Array concat, ... to syntax; package function with mixins from syntax
  • Control syntax: when builder, ...
  • Refactor package habla.doric to simply doric

Collection functions

Array transformations

Implemented here

  • def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column): Column Infix method #58
  • def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column): Column Infix method aggregateWT #58
  • def array_contains(column: Column, value: Any): Column #109
  • def array_distinct(e: Column): Column #109
  • def array_except(col1: Column, col2: Column): Column #109
  • def array_intersect(col1: Column, col2: Column): Column #109
  • def array_join(column: Column, delimiter: String): Column #109
  • def array_join(column: Column, delimiter: String, nullReplacement: String): Column #109
  • def array_max(e: Column): Column #109
  • def array_min(e: Column): Column #109
  • def array_position(column: Column, value: Any): Column #109
  • def array_remove(column: Column, element: Any): Column #109
  • def array_repeat(e: Column, count: Int): Column ❌ won't do, "duplicated"
  • def array_repeat(left: Column, right: Column): Column #109
  • def array_sort(e: Column): Column #109
  • def array_union(col1: Column, col2: Column): Column #109
  • def arrays_overlap(a1: Column, a2: Column): Column #109
  • def arrays_zip(e: Column*): Column #287
  • def concat(exprs: Column*): Column Prefix method concat for strings, concatArrays for arrays #58
  • def element_at(column: Column, value: Any): Column #109
  • def exists(column: Column, f: (Column) ⇒ Column): Column #109
  • def explode(e: Column): Column #109
  • def explode_outer(e: Column): Column #109
  • def filter(column: Column, f: (Column, Column) ⇒ Column): Column #109
  • def filter(column: Column, f: (Column) ⇒ Column): Column Infix method filter #109
  • def flatten(e: Column): Column #227
  • def forall(column: Column, f: (Column) ⇒ Column): Column #109
  • def map_concat(cols: Column*): Column #109
  • def posexplode(e: Column): Column #287
  • def posexplode_outer(e: Column): Column #287
  • def reverse(e: Column): Column #109
  • def schema_of_csv(csv: Column, options: Map[String, String]): Column #287
  • def schema_of_csv(csv: Column): Column ❌ won't do, "duplicated"
  • def schema_of_csv(csv: String): Column ❌ won't do, "duplicated"
  • def schema_of_json(json: Column, options: Map[String, String]): Column #287
  • def schema_of_json(json: Column): Column ❌ won't do, "duplicated"
  • def schema_of_json(json: String): Column ❌ won't do, "duplicated"
  • def sequence(start: Column, stop: Column): Column #109
  • def sequence(start: Column, stop: Column, step: Column): Column #109
  • def shuffle(e: Column): Column #109
  • def size(e: Column): Column #109
  • def slice(x: Column, start: Column, length: Column): Column #109
  • def slice(x: Column, start: Int, length: Int): Column ❌ won't do, "duplicated"
  • def sort_array(e: Column, asc: Boolean): Column #109
  • def sort_array(e: Column): Column #109
  • def transform(column: Column, f: (Column, Column) ⇒ Column): Column Infix method #58
  • def transform(column: Column, f: (Column) ⇒ Column): Column infix method transformWithIndex #58
  • def zip_with(left: Column, right: Column, f: (Column, Column) ⇒ Column): Column #109

Map transformations

Implemented here

  • def map_entries(e: Column): Column #287
  • def map_filter(expr: Column, f: (Column, Column) ⇒ Column): Column #109
  • def map_from_entries(e: Column): Column #287
  • def map_from_arrays(keys: Column, values: Column): Column #287
  • def map_keys(e: Column): Column Postfix method keys #58
  • def map_values(e: Column): Column Postfix method values #58
  • def map_zip_with(left: Column, right: Column, f: (Column, Column, Column) ⇒ Column): Column #109
  • def transform_keys(expr: Column, f: (Column, Column) ⇒ Column): Column #109
  • def transform_values(expr: Column, f: (Column, Column) ⇒ Column): Column #109
  • def explode(e: Column): Column #287
  • def explode_outer(e: Column): Column #287

Structure

  • def from_csv(e: Column, schema: Column, options: Map[String, String]): Column #287
  • def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column #287
  • def from_json(e: Column, schema: Column, options: Map[String, String]): Column #287
  • def from_json(e: Column, schema: Column): Column ❌ won't do, "duplicated"
  • def from_json(e: Column, schema: String, options: Map[String, String]): Column ❌ won't do, "duplicated"
  • def from_json(e: Column, schema: DataType): Column #287
  • def from_json(e: Column, schema: StructType): Column #287
  • def from_json(e: Column, schema: DataType, options: Map[String, String]): Column #287
  • def from_json(e: Column, schema: StructType, options: Map[String, String]): Column #287
  • def get_json_object(e: Column, path: String): Column #287
  • def to_csv(e: Column): Column ❌ won't do, "duplicated"
  • def to_csv(e: Column, options: Map[String, String]): Column #287
  • def to_json(e: Column): Column ❌ won't do, "duplicated"
  • def to_json(e: Column, options: Map[String, String]): Column #287
  • def json_tuple(json: Column, fields: String*): Column #287

Custom matchers for testing

Testing might be improved with custom matchers.

For instance, instead of:

  val df = List((User("John", 34), 1))
      .toDF("col", "delete")
      .select("col")
  ...
   it("throws an error if the sub column is not of the provided type") {
      colStruct("col")
        .getChild[String]("age")
        .elem
        .run(df)
        .toEither
        .left
        .value
        .head shouldBe ColumnTypeError("col.age", StringType, IntegerType)

it'd be nice to write something like this:

   it("..."){
      df.select(colStruct("col").getChild[String]("age")) should 
          failWith(ColumnTypeError("col.age", StringType, IntegerType))
   }

In general, testing should not expose implementation details, and should stick to the programmer-facing API whenever possible.
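A matcher of that shape can be prototyped without any test framework; a minimal sketch where `failWith` inspects the first error of an Either-based result (all names here are illustrative, not doric's test API):

```scala
// Illustrative sketch (not doric's test API): a tiny matcher that inspects
// the validation result instead of forcing tests to unwrap it by hand.
final case class ColumnTypeError(path: String, expected: String, found: String)

final case class FailWith(expected: ColumnTypeError) {
  def check(result: Either[List[ColumnTypeError], Any]): Boolean =
    result.left.toOption.flatMap(_.headOption).contains(expected)
}

def failWith(e: ColumnTypeError): FailWith = FailWith(e)

// usage: the select result is faked here as an already-run Either
val result: Either[List[ColumnTypeError], Any] =
  Left(List(ColumnTypeError("col.age", "StringType", "IntegerType")))

val ok = failWith(ColumnTypeError("col.age", "StringType", "IntegerType")).check(result)
```

A real implementation would plug this into ScalaTest's `Matcher` type so the `should failWith(...)` syntax from the example works directly.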

Static & dynamic field accessors for product types

We can't create column expressions that access fields of product type columns. For instance, given the following product type:

case class User(name: String, age: Int)

we would like to access fields of users as follows:

col[User]("user").getChildSafe("name"): DoricColumn[String]
col[User]("user").getChildSafe("age"): DoricColumn[Int]

Moreover, we would like to get a compilation error if we try to access a non-existing field:

col[User]("user").getChildSafe("surname") // should not compile

Dynamic invocations should be allowed too:

col[User]("user").child.name[String]: DoricColumn[String]
col[User]("user").child.age[Int]: DoricColumn[Int]
col[User]("user").child.surname[String] // should raise a non-existent column error at runtime
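The compile-time half could rest on a type-level witness that a field exists. A sketch using Scala 2.13 literal types, with hand-written instances standing in for ones that would normally be derived generically (e.g. via shapeless); none of these names are doric's actual API:

```scala
final case class User(name: String, age: Int)

// Illustrative sketch: evidence that type T has a field named K of type V.
// Instances are hand-written here; real code would derive them.
trait HasField[T, K, V]
implicit val userName: HasField[User, "name", String] =
  new HasField[User, "name", String] {}
implicit val userAge: HasField[User, "age", Int] =
  new HasField[User, "age", Int] {}

val evName = implicitly[HasField[User, "name", String]] // compiles
// implicitly[HasField[User, "surname", String]]        // would not compile
```

`getChildSafe` would require such a witness, so accessing `"surname"` becomes a compilation error, while the `child` dynamic route skips the witness and defers the check to runtime.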

Implemented Aggregate functions

List of all aggregate functions in Spark, marking those implemented in doric

  • approx_count_distinct
  • avg
  • collect_list
  • collect_set
  • corr
  • count
  • countDistinct
  • covar_pop
  • covar_samp
  • first
  • grouping (waiting for byte type) #123
  • grouping_id #123
  • kurtosis
  • last
  • max
  • mean
  • min
  • percentile_approx
  • skewness #123
  • stddev #123
  • stddev_pop #123
  • stddev_samp #123
  • sum
  • sumDistinct #123
  • var_pop #123
  • var_samp #123
  • variance #123
  • approxCountDistinct #123

Date time functions

  • def add_months(startDate: Column, numMonths: Column): Column
  • def add_months(startDate: Column, numMonths: Int): Column
  • def current_date(): Column
  • def current_timestamp(): Column
  • def date_add(start: Column, days: Column): Column
  • def date_add(start: Column, days: Int): Column
  • def date_format(dateExpr: Column, format: String): Column
  • def date_sub(start: Column, days: Column): Column
  • def date_sub(start: Column, days: Int): Column
  • def date_trunc(format: String, timestamp: Column): Column
  • def datediff(end: Column, start: Column): Column
  • def dayofmonth(e: Column): Column
  • def dayofweek(e: Column): Column
  • def dayofyear(e: Column): Column
  • def from_unixtime(ut: Column, f: String): Column
  • def from_unixtime(ut: Column): Column
  • def from_utc_timestamp(ts: Column, tz: Column): Column
  • def from_utc_timestamp(ts: Column, tz: String): Column
  • def hour(e: Column): Column
  • def last_day(e: Column): Column
  • def minute(e: Column): Column
  • def month(e: Column): Column
  • def months_between(end: Column, start: Column, roundOff: Boolean): Column
  • def months_between(end: Column, start: Column): Column
  • def next_day(date: Column, dayOfWeek: String): Column
  • def quarter(e: Column): Column
  • def second(e: Column): Column
  • def timestamp_seconds(e: Column): Column
  • def to_date(e: Column, fmt: String): Column
  • def to_date(e: Column): Column
  • def to_timestamp(s: Column, fmt: String): Column
  • def to_timestamp(s: Column): Column
  • def to_utc_timestamp(ts: Column, tz: Column): Column
  • def to_utc_timestamp(ts: Column, tz: String): Column
  • def trunc(date: Column, format: String): Column
  • def unix_timestamp(s: Column, p: String): Column
  • def unix_timestamp(s: Column): Column
  • def unix_timestamp(): Column
  • def weekofyear(e: Column): Column
  • def window(timeColumn: Column, windowDuration: String): Column
  • def window(timeColumn: Column, windowDuration: String, slideDuration: String): Column
  • def window(timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String): Column
  • def year(e: Column): Column

Fix complex scaladoc links

Adding scaladoc links to higher-order Spark functions breaks the docs. Here are some examples:

  • Related with #109:

    • org.apache.spark.sql.functions.array
    • org.apache.spark.sql.functions.transform
    • org.apache.spark.sql.functions.aggregate
    • org.apache.spark.sql.functions.filter
    • org.apache.spark.sql.functions.array_join
  • Related with #138:

    • org.apache.spark.sql.functions.locate
    • org.apache.spark.sql.functions.split
    • org.apache.spark.sql.functions.to_utc_timestamp
    • org.apache.spark.sql.functions.from_utc_timestamp

A @todo scaladoc link tag has been added for future fixing

Math functions

  • def abs(e: Column): Column #223
  • def acos(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def acos(e: Column): Column #223
  • def acosh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def acosh(e: Column): Column #223
  • def asin(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def asin(e: Column): Column #223
  • def asinh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def asinh(e: Column): Column #223
  • def atan(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def atan(e: Column): Column #223
  • def atan2(yValue: Double, xName: String): Column ❌ won't do, same function with DoricColumn
  • def atan2(yValue: Double, x: Column): Column ❌ won't do, same function with DoricColumn
  • def atan2(yName: String, xValue: Double): Column ❌ won't do, same function with DoricColumn
  • def atan2(y: Column, xValue: Double): Column ❌ won't do, same function with DoricColumn
  • def atan2(yName: String, xName: String): Column ❌ won't do, same function with DoricColumn
  • def atan2(yName: String, x: Column): Column ❌ won't do, same function with DoricColumn
  • def atan2(y: Column, xName: String): Column ❌ won't do, same function with DoricColumn
  • def atan2(y: Column, x: Column): Column #223
  • def atanh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def atanh(e: Column): Column #223
  • def bin(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def bin(e: Column): Column #223
  • def bround(e: Column, scale: Int): Column #223
  • def bround(e: Column): Column #223
  • def cbrt(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def cbrt(e: Column): Column #223
  • def ceil(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def ceil(e: Column): Column #223
  • def conv(num: Column, fromBase: Int, toBase: Int): Column #223
  • def cos(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def cos(e: Column): Column #223
  • def cosh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def cosh(e: Column): Column #223
  • def degrees(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def degrees(e: Column): Column #223
  • def exp(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def exp(e: Column): Column #223
  • def expm1(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def expm1(e: Column): Column #223
  • def factorial(e: Column): Column #223
  • def floor(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def floor(e: Column): Column #223
  • def hex(column: Column): Column #223
  • def hypot(l: Double, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def hypot(l: Double, r: Column): Column ❌ won't do, same function with DoricColumn
  • def hypot(leftName: String, r: Double): Column ❌ won't do, same function with DoricColumn
  • def hypot(l: Column, r: Double): Column ❌ won't do, same function with DoricColumn
  • def hypot(leftName: String, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def hypot(leftName: String, r: Column): Column ❌ won't do, same function with DoricColumn
  • def hypot(l: Column, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def hypot(l: Column, r: Column): Column #223
  • def log(base: Double, columnName: String): Column ❌ won't do, same function with DoricColumn
  • def log(base: Double, a: Column): Column #223
  • def log(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def log(e: Column): Column #223
  • def log10(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def log10(e: Column): Column #223
  • def log1p(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def log1p(e: Column): Column #223
  • def log2(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def log2(expr: Column): Column #223
  • def pmod(dividend: Column, divisor: Column): Column #223
  • def pow(l: Double, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def pow(l: Double, r: Column): Column ❌ won't do, same function with DoricColumn
  • def pow(leftName: String, r: Double): Column ❌ won't do, same function with DoricColumn
  • def pow(l: Column, r: Double): Column ❌ won't do, same function with DoricColumn
  • def pow(leftName: String, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def pow(leftName: String, r: Column): Column ❌ won't do, same function with DoricColumn
  • def pow(l: Column, rightName: String): Column ❌ won't do, same function with DoricColumn
  • def pow(l: Column, r: Column): Column #223
  • def radians(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def radians(e: Column): Column #223
  • def rint(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def rint(e: Column): Column #223
  • def round(e: Column, scale: Int): Column ❌ won't do, same function with DoricColumn
  • def round(e: Column): Column #223
  • def shiftLeft(e: Column, numBits: Int): Column #223
  • def shiftRight(e: Column, numBits: Int): Column #223
  • def shiftRightUnsigned(e: Column, numBits: Int): Column #223
  • def signum(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def signum(e: Column): Column #223
  • def sin(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def sin(e: Column): Column #223
  • def sinh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def sinh(e: Column): Column #223
  • def sqrt(colName: String): Column ❌ won't do, same function with DoricColumn
  • def sqrt(e: Column): Column #223
  • def tan(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def tan(e: Column): Column #223
  • def tanh(columnName: String): Column ❌ won't do, same function with DoricColumn
  • def tanh(e: Column): Column #223
  • def unhex(column: Column): Column #223
  • def toDegrees(columnName: String): Column ❌ won't do, deprecated("Use degrees", "2.1.0")
  • def toDegrees(e: Column): Column ❌ won't do, deprecated("Use degrees", "2.1.0")
  • def toRadians(columnName: String): Column ❌ won't do, deprecated("Use radians", "2.1.0")
  • def toRadians(e: Column): Column ❌ won't do, deprecated("Use radians", "2.1.0")

Non-aggregate functions

  • def array(colName: String, colNames: String*): Column function array
  • def array(cols: Column*): Column function array
  • def bitwiseNOT(e: Column): Column #262
  • def broadcast[T](df: Dataset[T]): Dataset[T] not needed because it's not Column-dependent
  • def coalesce(e: Column*): Column
  • def col(colName: String): Column use col[T](colName: String) or the specializations colString, colInt, etc.
  • def column(colName: String): Column use the same as col
  • def expr(expr: String): Column ❌ won't do, this could not be a doric function, as it needs a string with no datatype
  • def greatest(columnName: String, columnNames: String*): Column ❌ won't do, "duplicated"
  • def greatest(exprs: Column*): Column #113
  • def input_file_name(): Column #113
  • def isnan(e: Column): Column
  • def isnull(e: Column): Column postfix isnull
  • def least(columnName: String, columnNames: String*): Column ❌ won't do, "duplicated"
  • def least(exprs: Column*): Column #113
  • def lit(literal: Any): Column use lit[T](literal: T)
  • def map(cols: Column*): Column
  • def map_from_arrays(keys: Column, values: Column): Column
  • def monotonically_increasing_id(): Column #113
  • def nanvl(col1: Column, col2: Column): Column #262
  • def negate(e: Column): Column #262
  • def not(e: Column): Column #113
  • def rand(): Column #113
  • def rand(seed: Long): Column #113
  • def randn(): Column #113
  • def randn(seed: Long): Column #113
  • def spark_partition_id(): Column #113
  • def struct(colName: String, colNames: String*): Column ❌ won't do, not a doric function
  • def struct(cols: Column*): Column #95
  • def when(condition: Column, value: Any): Column method builder when
  • def monotonicallyIncreasingId(): Column ❌ won't do, deprecated("Use monotonically_increasing_id()", "2.0.0")

Documentation

  • Types & Custom types (expected: #250)
  • Dynamic field accessor
  • API extension: when builder
  • Modularity update
  • How it's made

Struct column

Add a column that validates that the data type is a struct.

[Bug report]: Interpolator outputs null if one column is null

Issue duplicity

  • I have checked the current created issues and it seems there is no other issue like this one.

Doric version

0.0.2

What happened?

Like the concat Spark function, using ds"Column value: $myColumn" will output null if myColumn contains a null

What should have happened?

It should have output one of:

  • "Column value: null"
  • "Column value: "
  • or something like that
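The expected behaviour can be illustrated with plain Scala values standing in for doric columns (the `ds` name follows the issue; the null-rendering rule shown is an assumption about the desired fix, not the current implementation):

```scala
// Illustrative sketch: an interpolator that renders nulls as the string
// "null" instead of propagating them into a null result, as Spark's concat
// (and currently doric's ds interpolator) does.
implicit class DsHelper(private val sc: StringContext) {
  def ds(args: Any*): String =
    sc.s(args.map(a => if (a == null) "null" else a): _*)
}

val myColumn: String = null
val rendered = ds"Column value: $myColumn" // "Column value: null"
```

In doric itself this would likely mean wrapping each interpolated column in a null-replacing expression before concatenating.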

Relevant log output

No response

Joins

Create a DoricJoinColumn:

Kleisli[DoricValidated, (DataFrame, DataFrame), Column]

???
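The idea behind that type: a join column is an arrow from the pair of frames being joined to a validated Spark column. A dependency-free sketch, with a hand-rolled, Either-specialized Kleisli standing in for cats' `Kleisli` and `DoricValidated`, and a plain `Map` standing in for a DataFrame schema (all names illustrative):

```scala
// Illustrative sketch: a join column validates against BOTH sides of the
// join before producing an expression.
final case class Kleisli[A, B](run: A => Either[List[String], B])

type Frame   = Map[String, String] // column name -> type name, stand-in
type JoinCol = Kleisli[(Frame, Frame), String]

// validates that `name` exists with the same type on both sides
def joinCol(name: String): JoinCol = Kleisli { case (l, r) =>
  (l.get(name), r.get(name)) match {
    case (Some(t1), Some(t2)) if t1 == t2 => Right(s"l.$name === r.$name")
    case _ => Left(List(s"column $name missing or mismatched"))
  }
}

val left   = Map("id" -> "int")
val right  = Map("id" -> "int")
val joined = joinCol("id").run((left, right)) // Right("l.id === r.id")
```

The validated shape lets errors from both sides accumulate, mirroring how DoricColumn already validates against a single DataFrame.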

Misc functions

  • def assert_true(c: Column, e: Column): Column
  • def assert_true(c: Column): Column
  • def crc32(e: Column): Column
  • def hash(cols: Column*): Column
  • def md5(e: Column): Column
  • def raise_error(c: Column): Column
  • def sha1(e: Column): Column
  • def sha2(e: Column, numBits: Int): Column
  • def xxhash64(cols: Column*): Column

Spark version matrix

Review how to publish doric according to the spark version.
Check dependencies for all versions.

  • Sbt change to allow multiple versions (#184)
  • Github actions matrix (#187)
  • Dependabot for multiple versions? (should not affect)
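A cross-building setup might be sketched in sbt along these lines (the version numbers and the Scala-to-Spark pairing below are illustrative assumptions, not the project's actual build):

```scala
// build.sbt sketch: cross-compile for two Scala versions and pick the Spark
// line per Scala version. Versions are illustrative only.
ThisBuild / crossScalaVersions := Seq("2.11.12", "2.12.15")

libraryDependencies += {
  val sparkVersion =
    if (scalaBinaryVersion.value == "2.11") "2.4.8" else "3.1.2"
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
}
```

The GitHub Actions matrix (#187) would then run `+test` or iterate over the same Scala versions.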
