What happened? In a web application, I'm creating rather complicat


perf: Ibis slows down considerably for wider tables

binste commented on May 24, 2024

Comments (5)

cpcloud commented on May 24, 2024

Regarding drop and relocate, they're both implemented using a somewhat naive approach:

drop: project every column except the input columns
relocate: basically a reprojection that loops through every column

I suspect that drop can be turned into a special operation, so that it at least scales with the number of dropped columns instead of the total number of columns. A place to start might be:

create a special Drop relational operation
implement the column drop in select merging by looping over the drop list and removing each element of the drop list from the Select fields. Select fields are dicts, so that should get us to O(droplist) instead of O(allcolumns).
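
The difference between the two approaches can be sketched with a plain dict standing in for the Select fields. This is only an illustration under that assumption: `drop_by_reprojection` and `drop_by_key_removal` are hypothetical names, not Ibis functions, and the real select-merging logic is likely more involved.

```python
def drop_by_reprojection(fields, to_drop):
    """Rebuild the mapping by visiting every column: O(all columns)."""
    to_drop = set(to_drop)
    return {name: value for name, value in fields.items() if name not in to_drop}


def drop_by_key_removal(fields, to_drop):
    """Delete only the dropped keys, in place: O(len(to_drop)) on average."""
    for name in to_drop:
        del fields[name]  # raises KeyError if the column does not exist
    return fields


# Stand-in for Select fields: column name -> column expression (here just a string).
fields = {f"a{i}": f"t.a{i}" for i in range(1000)}

reprojected = drop_by_reprojection(fields, ["a8"])
# The copy below exists only so `fields` stays intact for the comparison.
removed = drop_by_key_removal(dict(fields), ["a8"])

# Both approaches give the same columns in the same order.
assert list(reprojected) == list(removed)
print(len(removed), "a8" in removed)
```

Deleting a key from a dict preserves the order of the remaining keys, which is what makes the second variant a candidate here. Note that copying the dict first is itself linear in the number of columns, although it involves no per-column Python work.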

We may be able to take a similar approach for relocate by using a data structure more optimized for the operation. I think something that has fast ordered inserts (perhaps sortedcontainers has a sorted map container?) might be a good place to start.
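
As a rough sketch of what treating relocate as a pure reordering could look like, the snippet below moves one column name within a plain list. It is not the sorted-map structure suggested above and not how Ibis implements relocate; `relocate_after` is a hypothetical helper, and the column names other than `valid_from` and `valid_to` are made up for illustration.

```python
def relocate_after(columns, name, after):
    """Return a new column order with `name` placed directly after `after`."""
    if name == after:
        raise ValueError("a column cannot be placed after itself")
    # Remove the column being moved, keeping all other columns in order.
    order = [c for c in columns if c != name]
    if len(order) == len(columns):
        raise KeyError(name)
    # list.index raises ValueError if `after` is not a known column.
    order.insert(order.index(after) + 1, name)
    return order


columns = ["id", "valid_from", "amount", "valid_to"]
new_order = relocate_after(columns, "valid_to", after="valid_from")
print(new_order)
```

The list operations are still linear in the number of columns, so a structure with faster ordered inserts would be needed to do better than that.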

binste commented on May 24, 2024

I've benchmarked Ibis 8 vs. 9 on 55 Ibis expressions which are part of a Data Warehouse built using https://github.com/binste/dbt-ibis. I've measured:

Execution of Ibis code: The time it takes to execute the Ibis code to get to the final expression
Convert Ibis expression to SQL: The time it takes to compile the final Ibis expression to SQL

Here is some pseudocode to illustrate:

import ibis

t = ibis.table(...)

# --------------------------
# START: Execution of Ibis code
# --------------------------
t = t.group_by(...).select(...)

# --------------------------
# END: Execution of Ibis code
# --------------------------

# --------------------------
# START: Convert Ibis expression to SQL
# --------------------------
ibis.to_sql(t)
# --------------------------
# END: Convert Ibis expression to SQL
# --------------------------
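
One way to time the two phases separately is sketched below. It is a minimal harness, not the code used for the benchmark: `time_phases` is a hypothetical helper, and `build_expression` and `compile_to_sql` are stand-ins so that the sketch runs without Ibis. In a real measurement they would be the Ibis code that builds the expression and a call such as `ibis.to_sql`.

```python
import time


def time_phases(build_expression, compile_to_sql):
    """Time expression construction and SQL compilation separately."""
    timings = {}

    start = time.perf_counter()
    expr = build_expression()
    timings["execute_ibis_code"] = time.perf_counter() - start

    start = time.perf_counter()
    sql = compile_to_sql(expr)
    timings["convert_to_sql"] = time.perf_counter() - start

    return sql, timings


# Stand-in for the Ibis code that builds the final expression.
def build_expression():
    return [f"col_{i}" for i in range(50)]


# Stand-in for compiling the final expression to SQL.
def compile_to_sql(expr):
    return "SELECT " + ", ".join(expr) + " FROM t"


sql, timings = time_phases(build_expression, compile_to_sql)
for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.6f} seconds")
```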

The great news is that compilation to SQL got significantly faster with the move to SQLGlot, which is super nice! :) The execution of Ibis code, on the other hand, got a bit slower, with one expression now taking significantly longer at 11 seconds. I've profiled that expression and most of the time is spent in the following statements:

Table.relocate ibis/expr/types/relations.py:4279
- Code here looks like this: .relocate("valid_to", after="valid_from")
Join.wrapper ibis/expr/types/joins.py:188
Table.drop ibis/expr/types/relations.py:2329
Table.rename ibis/expr/types/relations.py:2140

Hope this helps!

cpcloud commented on May 24, 2024

@binste Thanks for digging into this. Interesting results!

Can you share the query that's now taking 11 seconds with Ibis 9.0?

binste commented on May 24, 2024

Unfortunately, I can't, as I'd need to mask column names and code logic for IP reasons. It's two input tables with around 50 columns each and one table with ~10 columns, with various operations on top of them. But I'm happy to test out any PRs if there is a wheel file available!

gforsyth commented on May 24, 2024

Naive benchmark here, but for a quick test:

# drop_test.py
import ibis
import time
from contextlib import contextmanager


@contextmanager
def tictoc(num_cols):
    tic = time.time()
    yield
    toc = time.time()
    print(f"| {num_cols} | {toc - tic} seconds")


print(f"{ibis.__version__=}")


for num_cols in [10, 20, 50, 100, 200, 500, 1000]:
    t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(num_cols)])

    with tictoc(num_cols):
        t.drop("a8")

🐚 python drop_test.py
ibis.__version__='8.0.0'
| 10 | 0.18016910552978516 seconds
| 20 | 0.0016529560089111328 seconds
| 50 | 0.0037081241607666016 seconds
| 100 | 0.007429361343383789 seconds
| 200 | 0.013902902603149414 seconds
| 500 | 0.03390693664550781 seconds
| 1000 | 0.06650948524475098 seconds

🐚 python drop_test.py
ibis.__version__='9.0.0'
| 10 | 0.002298593521118164 seconds
| 20 | 0.005956888198852539 seconds
| 50 | 0.027918338775634766 seconds
| 100 | 0.09690213203430176 seconds
| 200 | 0.36721324920654297 seconds
| 500 | 2.301510810852051 seconds
| 1000 | 9.317416191101074 seconds
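
A rough way to read the two runs above is to estimate how the time grows with the number of columns. The sketch below takes the 100-column and 1000-column timings from each run and computes the exponent k in time ~ columns^k. `scaling_exponent` is a hypothetical helper, and using only two points per version is a simplification. The smaller sizes are left out on the assumption that they are dominated by fixed overhead, such as the 0.18 seconds for 10 columns in the 8.0.0 run.

```python
import math

# Timings in seconds copied from the two runs above (number of columns -> seconds).
timings = {
    "8.0.0": {100: 0.007429361343383789, 1000: 0.06650948524475098},
    "9.0.0": {100: 0.09690213203430176, 1000: 9.317416191101074},
}


def scaling_exponent(results, small, large):
    """Estimate k in time ~ columns**k from two measurements."""
    return math.log(results[large] / results[small]) / math.log(large / small)


exponents = {
    version: scaling_exponent(results, 100, 1000)
    for version, results in timings.items()
}
for version, k in exponents.items():
    print(f"{version}: time grows roughly like columns^{k:.2f}")
```

On these numbers the exponent comes out at about 0.95 for 8.0.0 and about 1.98 for 9.0.0, so drop looks close to linear in the number of columns in 8.0.0 and close to quadratic in 9.0.0.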
