Comments (6)
Thank you for the explanation, I'll discuss this approach with my team today :)
About my use case: I'm experimenting with converting existing functionality from pyspark to ibis. In pyspark, pyspark.sql.functions.expr(string) makes it possible to create a column from a string expression.
This issue covers specifically the column creation, and I also had the idea of trying literal_eval() for the expr(string) part.
from ibis.
> I played a little with literal_eval()/eval() when trying to replace pyspark.sql.functions.expr() (as you suggested too), and it turns out that since Python 3.7, operations like addition or subtraction are no longer permitted in literal_eval() (discussion link), so it's not possible to evaluate common expressions; a ValueError (malformed node or string) is raised instead.
Ah, makes sense; literal_eval wouldn't know enough about other types. At the point where you're executing more arbitrary logic like this, I suppose you're already living on the edge, and may accept some of the dangers of eval-ing.
> Your suggested approach above works fine, thanks for this. I believe this issue can be closed, as we have some kind of workaround for it.
Will do!
from ibis.
@filipeo2-mck You're correct that there's no such thing as an unbounded column.
However, something like this will work:
>>> from ibis import _
>>> def test(t):
...     return (t.bill_length_mm / t.bill_depth_mm).name("test")
...
>>> my_cols_dict = {
...     "test": test(_)
... }
>>> type(my_cols_dict["test"])
<class 'ibis.common.deferred.Deferred'>
>>> penguins.select("species", "island", my_cols_dict["test"])
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64

Project[r0]
  species: r0.species
  island:  r0.island
  test:    r0.bill_length_mm / r0.bill_depth_mm
>>> penguins.select("species", "island", my_cols_dict["test"]).execute()
       species     island      test
0       Adelie  Torgersen  2.090909
1       Adelie  Torgersen  2.270115
2       Adelie  Torgersen  2.238889
3       Adelie  Torgersen       NaN
4       Adelie  Torgersen  1.901554
..         ...        ...       ...
339  Chinstrap      Dream  2.818182
340  Chinstrap      Dream  2.403315
341  Chinstrap      Dream  2.725275
342  Chinstrap      Dream  2.673684
343  Chinstrap      Dream  2.684492

[344 rows x 3 columns]
This leverages the deferred expression API, better explained on https://ibis-project.org/how-to/analytics/chain_expressions.
You can also do:
>>> my_cols_dict = {
...     "test": test(penguins)
... }
>>> penguins.select("species", "island", my_cols_dict["test"]).execute()
       species     island      test
0       Adelie  Torgersen  2.090909
1       Adelie  Torgersen  2.270115
2       Adelie  Torgersen  2.238889
3       Adelie  Torgersen       NaN
4       Adelie  Torgersen  1.901554
..         ...        ...       ...
339  Chinstrap      Dream  2.818182
340  Chinstrap      Dream  2.403315
341  Chinstrap      Dream  2.725275
342  Chinstrap      Dream  2.673684
343  Chinstrap      Dream  2.684492

[344 rows x 3 columns]
I think you want the first option, given you want to be able to apply it without knowing the parent table upfront.
As to more reasons why some of the things you tried may not work: .sql() specifically is a black box, where we can't guarantee the number of columns will even remain the same. Further, aliasing the table is basically creating a new table, which is why you get the IntegrityError.
Echoing @cpcloud, what is the exact use case, and will this help? If you do specifically have strings, maybe you'll have to use some eval (ast.literal_eval?) magic to bridge the gap.
from ibis.
I think this is very similar to #8304. However, looking at the discussion/code there, it seems _ensure_expr has since been removed.
from ibis.
What's the specific use case other than that pyspark allows it and why can't it be done with existing functionality?
from ibis.
Hello, @deepyaman,
Your suggested approach above works fine, thanks for this. I believe this issue can be closed, as we have some kind of workaround for it.
I played a little with literal_eval()/eval() when trying to replace pyspark.sql.functions.expr() (as you suggested too), and it turns out that since Python 3.7, operations like addition or subtraction are no longer permitted in literal_eval() (discussion link), so it's not possible to evaluate common expressions; a ValueError (malformed node or string) is raised instead:
literal_eval("penguins.bill_length_mm + penguins.bill_depth_mm")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[69], line 1
----> 1 literal_eval("penguins.bill_length_mm + penguins.bill_depth_mm")

File ~/lib/python3.10/ast.py:110, in literal_eval(node_or_string)
    108                     return left - right
    109         return _convert_signed_num(node)
--> 110     return _convert(node_or_string)

File ~/lib/python3.10/ast.py:102, in literal_eval.<locals>._convert(node)
     99             return dict(zip(map(_convert, node.keys),
    100                             map(_convert, node.values)))
    101         elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
--> 102             left = _convert_signed_num(node.left)
    103             right = _convert_num(node.right)
    104             if isinstance(left, (int, float)) and isinstance(right, complex):

File ~/lib/python3.10/ast.py:83, in literal_eval.<locals>._convert_signed_num(node)
     81             else:
     82                 return - operand
---> 83         return _convert_num(node)

File ~/lib/python3.10/ast.py:74, in literal_eval.<locals>._convert_num(node)
     72     def _convert_num(node):
     73         if not isinstance(node, Constant) or type(node.value) not in (int, float, complex):
---> 74             _raise_malformed_node(node)
     75         return node.value

File ~/lib/python3.10/ast.py:71, in literal_eval.<locals>._raise_malformed_node(node)
     69         if lno := getattr(node, 'lineno', None):
     70             msg += f' on line {lno}'
---> 71         raise ValueError(msg + f': {node!r}')

ValueError: malformed node or string on line 1: <ast.Attribute object at 0x1772e8a90>
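For anyone who wants to reproduce the restriction without ibis, here is a minimal standard-library sketch. The helper name try_literal_eval is made up for this illustration; it only reports whether literal_eval() accepts a given string.

```python
from ast import literal_eval


def try_literal_eval(source):
    """Return (True, value) if literal_eval accepts `source`, else (False, message)."""
    try:
        return True, literal_eval(source)
    except (ValueError, SyntaxError) as exc:
        return False, str(exc)


# Literals and containers of literals are accepted.
print(try_literal_eval("[1, 2.5, 'a']"))
# A complex literal written as real + imaginary is still accepted.
print(try_literal_eval("1 + 2j"))
# General arithmetic, names, and attribute access are rejected.
print(try_literal_eval("1 + 2"))
print(try_literal_eval("_.bill_length_mm + _.bill_depth_mm"))
```

So the failure is not specific to ibis: any string that refers to a name or attribute is outside what literal_eval() is designed to handle.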
Another option would be to use a plain eval(), which works but is insecure:
from ast import literal_eval
from ibis import _  # the evaluated strings below refer to `_`

def literal_eval_wo_table(expression):
    # Fails for ibis expressions: literal_eval only accepts Python literals.
    parsed_expression = literal_eval(expression)
    return parsed_expression.name("literal_eval_wo_table")

def eval_wo_table(expression):
    # Insecure: eval() executes arbitrary code, so only pass trusted strings.
    parsed_expression = eval(expression)
    return parsed_expression.name("eval_wo_table")

penguins.select(
    "species",
    # literal_eval_wo_table("_.bill_length_mm + _.bill_depth_mm"),  # ValueError: malformed node or string on line 1: <ast.BinOp object at 0x...>
    eval_wo_table("_.bill_length_mm + _.bill_depth_mm"),  # insecure!
)
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ eval_wo_table ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string  │ float64       │
├─────────┼───────────────┤
│ Adelie  │          57.8 │
│ Adelie  │          56.9 │
│ Adelie  │          58.3 │
│ Adelie  │          NULL │
│ Adelie  │          56.0 │
│ Adelie  │          59.9 │
│ Adelie  │          56.7 │
│ Adelie  │          58.8 │
│ Adelie  │          52.2 │
│ Adelie  │          62.2 │
│ …       │             … │
└─────────┴───────────────┘
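A possible middle ground between literal_eval() and a bare eval(), offered as a rough sketch and not as anything ibis provides: parse the string with ast and evaluate only a small whitelist of node types. The names restricted_eval and _ALLOWED_BINOPS are hypothetical. A SimpleNamespace stands in for a row here so the sketch runs without ibis; passing ibis's `_` as `root` is the intended use, but that is an assumption and is not exercised below.

```python
import ast
import operator
from types import SimpleNamespace

# Binary operators the evaluator is willing to apply (an assumed whitelist).
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def restricted_eval(source, root, root_name="_"):
    """Evaluate `source`, allowing only `root_name`, its public attributes,
    int/float constants, and the operators + - * /."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Name) and node.id == root_name:
            return root
        if isinstance(node, ast.Attribute) and not node.attr.startswith("_"):
            return getattr(walk(node.value), node.attr)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](walk(node.left), walk(node.right))
        # Calls, subscripts, other names, dunder attributes, etc. are rejected.
        raise ValueError(f"disallowed syntax: {ast.dump(node)}")

    return walk(ast.parse(source, mode="eval"))


# Stand-in object with the same attribute names as the penguins columns.
row = SimpleNamespace(bill_length_mm=39.1, bill_depth_mm=18.7)
print(restricted_eval("_.bill_length_mm + _.bill_depth_mm", row))
```

This is narrower than eval() because function calls and arbitrary names never get evaluated, but it has not been audited, so treat it as a starting point only.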
from ibis.