
Comments (6)

filipeo2-mck commented on July 25, 2024

Thank you for the explanation, I'll discuss this approach with my team today :)

About my use case: I'm experimenting with converting an existing piece of functionality from pyspark to ibis. In pyspark, pyspark.sql.functions.expr(string) can create a column from a string without a table being bound to it.
This issue specifically covers the column creation, and I also had the idea of trying literal_eval() for the expr(string) part 👍

from ibis.

deepyaman commented on July 25, 2024

> I played around with literal_eval()/eval() as a replacement for pyspark.sql.functions.expr() (as you also suggested). It turns out that since Python 3.7, literal_eval() no longer permits general operations such as addition or subtraction (discussion link), so common expressions can't be evaluated and a ValueError (malformed node or string) is raised instead:

Ah, makes sense; literal_eval wouldn't know enough about other types. Once you're executing more arbitrary logic like this, I suppose you're already living on the edge, and may accept some of the dangers of evaling. 😂

> Your suggested approach above works fine, thanks for this. I believe this issue can be closed, as we have a workaround for it 👍

Will do!


deepyaman commented on July 25, 2024

@filipeo2-mck You're correct that there's no such thing as an unbound column.

However, something like this will work:

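>>> import ibis
>>> from ibis import _
>>> penguins = ibis.examples.penguins.fetch()  # assumed setup; not shown in the original comment
>>> def test(t):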
...     return (t.bill_length_mm / t.bill_depth_mm).name("test")
... 
>>> my_cols_dict = {
...     "test": test(_)
... }
>>> type(my_cols_dict["test"])
<class 'ibis.common.deferred.Deferred'>
>>> penguins.select("species", "island", my_cols_dict["test"])
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64

Project[r0]
  species: r0.species
  island:  r0.island
  test:    r0.bill_length_mm / r0.bill_depth_mm
>>> penguins.select("species", "island", my_cols_dict["test"]).execute()
       species     island      test
0       Adelie  Torgersen  2.090909
1       Adelie  Torgersen  2.270115
2       Adelie  Torgersen  2.238889
3       Adelie  Torgersen       NaN
4       Adelie  Torgersen  1.901554
..         ...        ...       ...
339  Chinstrap      Dream  2.818182
340  Chinstrap      Dream  2.403315
341  Chinstrap      Dream  2.725275
342  Chinstrap      Dream  2.673684
343  Chinstrap      Dream  2.684492

[344 rows x 3 columns]

This leverages the deferred expression API, which is explained in more detail at https://ibis-project.org/how-to/analytics/chain_expressions.

You can also do:

>>> my_cols_dict = {
...     "test": test(penguins)
... }
>>> penguins.select("species", "island", my_cols_dict["test"]).execute()
       species     island      test
0       Adelie  Torgersen  2.090909
1       Adelie  Torgersen  2.270115
2       Adelie  Torgersen  2.238889
3       Adelie  Torgersen       NaN
4       Adelie  Torgersen  1.901554
..         ...        ...       ...
339  Chinstrap      Dream  2.818182
340  Chinstrap      Dream  2.403315
341  Chinstrap      Dream  2.725275
342  Chinstrap      Dream  2.673684
343  Chinstrap      Dream  2.684492

[344 rows x 3 columns]

I think you want the first option, given you want to be able to apply it without knowing the parent table upfront.

As for why some of the things you tried may not work: .sql() specifically is a black box, where we can't guarantee that even the number of columns will remain the same. Further, aliasing the table basically creates a new table, which is why you get the IntegrityError.

Echoing @cpcloud, what is the exact use case, and will this help? If you do specifically have strings, maybe you'll have to use some eval (ast.literal_eval?) magic to bridge the gap.


deepyaman commented on July 25, 2024

I think this is very similar to #8304. However, looking at the discussion/code there, it seems _ensure_expr has since been removed.


cpcloud commented on July 25, 2024

What's the specific use case, other than that pyspark allows it, and why can't it be done with existing functionality?


filipeo2-mck commented on July 25, 2024

Hello, @deepyaman,

Your suggested approach above works fine, thanks for this. I believe this issue can be closed, as we have a workaround for it 👍

I played around with literal_eval()/eval() as a replacement for pyspark.sql.functions.expr() (as you also suggested). It turns out that since Python 3.7, literal_eval() no longer permits general operations such as addition or subtraction (discussion link), so common expressions can't be evaluated and a ValueError (malformed node or string) is raised instead:

literal_eval("penguins.bill_length_mm + penguins.bill_depth_mm")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[69], line 1
----> 1 literal_eval("penguins.bill_length_mm + penguins.bill_depth_mm")

File ~/lib/python3.10/ast.py:110, in literal_eval(node_or_string)
    108                 return left - right
    109     return _convert_signed_num(node)
--> 110 return _convert(node_or_string)

File ~/lib/python3.10/ast.py:102, in literal_eval.<locals>._convert(node)
     99     return dict(zip(map(_convert, node.keys),
    100                     map(_convert, node.values)))
    101 elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
--> 102     left = _convert_signed_num(node.left)
    103     right = _convert_num(node.right)
    104     if isinstance(left, (int, float)) and isinstance(right, complex):

File ~/lib/python3.10/ast.py:83, in literal_eval.<locals>._convert_signed_num(node)
     81     else:
     82         return - operand
---> 83 return _convert_num(node)

File ~/lib/python3.10/ast.py:74, in literal_eval.<locals>._convert_num(node)
     72 def _convert_num(node):
     73     if not isinstance(node, Constant) or type(node.value) not in (int, float, complex):
---> 74         _raise_malformed_node(node)
     75     return node.value

File ~/lib/python3.10/ast.py:71, in literal_eval.<locals>._raise_malformed_node(node)
     69 if lno := getattr(node, 'lineno', None):
     70     msg += f' on line {lno}'
---> 71 raise ValueError(msg + f': {node!r}')

ValueError: malformed node or string on line 1: <ast.Attribute object at 0x1772e8a90>
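The traceback above ends at an ast.Attribute node because attribute access is not a Python literal. As a rough, standard-library-only sketch of what literal_eval() accepts and rejects on Python 3.7 and later (try_literal_eval is a hypothetical helper written for this illustration):

```python
from ast import literal_eval


def try_literal_eval(source):
    """Return ("ok", value) or ("error", message) for a literal_eval call."""
    try:
        return ("ok", literal_eval(source))
    except (ValueError, SyntaxError) as exc:
        return ("error", str(exc))


# Plain literals and containers of literals are accepted.
print(try_literal_eval("[1, 2.5, 'a']"))
# The arithmetic form that is still accepted is a complex literal such as 1 + 2j.
print(try_literal_eval("1 + 2j"))
# General arithmetic on numbers is rejected on Python 3.7 and later.
print(try_literal_eval("1 + 2"))
# Attribute access is not a literal, so the offending node is an Attribute.
print(try_literal_eval("penguins.bill_length_mm + penguins.bill_depth_mm"))
```

So even if addition were allowed, an expression that refers to a name such as penguins or _ would still be rejected, since literal_eval() never looks names up.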

Another option would be to use a plain eval(), which works but is insecure:

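from ast import literal_eval

import ibis
from ibis import _

ibis.options.interactive = True  # assumed, to match the table output below
penguins = ibis.examples.penguins.fetch()  # assumed setup; not shown in the original

def literal_eval_wo_table(expression):
    # Fails for deferred expressions: literal_eval only accepts Python literals
    parsed_expression = literal_eval(expression)
    return parsed_expression.name("literal_eval_wo_table")

def eval_wo_table(expression):
    # Relies on `_` being available in this module's globals
    parsed_expression = eval(expression)  # insecure
    return parsed_expression.name("eval_wo_table")

penguins.select(
    "species",
    # literal_eval_wo_table("_.bill_length_mm + _.bill_depth_mm"),  # ValueError: malformed node or string on line 1: <ast.Attribute object at 0x...>
    eval_wo_table("_.bill_length_mm + _.bill_depth_mm"),  # insecure!
)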
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ species โ”ƒ eval_wo_table โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ string  โ”‚ float64       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Adelie  โ”‚          57.8 โ”‚
โ”‚ Adelie  โ”‚          56.9 โ”‚
โ”‚ Adelie  โ”‚          58.3 โ”‚
โ”‚ Adelie  โ”‚          NULL โ”‚
โ”‚ Adelie  โ”‚          56.0 โ”‚
โ”‚ Adelie  โ”‚          59.9 โ”‚
โ”‚ Adelie  โ”‚          56.7 โ”‚
โ”‚ Adelie  โ”‚          58.8 โ”‚
โ”‚ Adelie  โ”‚          52.2 โ”‚
โ”‚ Adelie  โ”‚          62.2 โ”‚
โ”‚ โ€ฆ       โ”‚             โ€ฆ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
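A possible middle ground between literal_eval() and a plain eval() is to parse the string with ast and evaluate only a small whitelist of node types. The following is a minimal sketch of that idea, not a vetted sandbox: restricted_eval and _ALLOWED_BINOPS are hypothetical names, the whitelist is an assumption about which expressions are needed, and a SimpleNamespace stands in for the deferred `_` so that the example runs without ibis.

```python
import ast
import operator
from types import SimpleNamespace

# Hypothetical whitelist: only these binary operators are evaluated.
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def restricted_eval(expression, names):
    """Evaluate arithmetic over whitelisted names and their public attributes."""

    def convert(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.Attribute) and not node.attr.startswith("_"):
            return getattr(convert(node.value), node.attr)
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            op = _ALLOWED_BINOPS[type(node.op)]
            return op(convert(node.left), convert(node.right))
        # Calls, subscripts, private attributes, unknown names, and so on.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return convert(ast.parse(expression, mode="eval").body)


# Stand-in for a table or a deferred `_`; any object with these attributes works.
row = SimpleNamespace(bill_length_mm=39.1, bill_depth_mm=18.7)
print(restricted_eval("_.bill_length_mm + _.bill_depth_mm", {"_": row}))
```

Since the helper only uses attribute access and arithmetic operators, passing the ibis deferred object as `{"_": _}` should build the same deferred expression as the eval() version, though that has not been tested here. Function calls are rejected by design, so strings that call methods would need the whitelist extended.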
