When using real data with a size of 100k rows and a large number of columns, metrics,

Slow test execution and metric calculation about evidently HOT 8 OPEN

c0t0ber commented on June 2, 2024

Slow test execution and metric calculation

from evidently.

Comments (8)

nick-konovalchuk commented on June 2, 2024

I looked at the internals of ClassificationPreset and DataDriftPreset and saw a lot of data copying, which is far from ideal. pandas is also used there, although faster alternatives would work in some cases.
But imo the most inefficient part is embedding the actual data into the HTML reports.

nick-konovalchuk commented on June 2, 2024

I wonder whether metric calculation could at least be spread across several processes.
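
A minimal sketch of what per-column process parallelism could look like, using only the Python standard library. This is not Evidently's API: `column_summary` and `summarize_columns` are hypothetical names, plain lists stand in for DataFrame columns, and the toy mean/std metric stands in for a real metric or test.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import fmean, pstdev


def column_summary(item):
    """Compute a toy metric for one (name, values) pair.

    Defined at module level so it can be pickled and sent to worker processes.
    """
    name, values = item
    return name, {"mean": fmean(values), "std": pstdev(values)}


def summarize_columns(columns, executor=None):
    """Run one task per column and collect the results into a dict.

    `columns` maps column names to non-empty lists of numbers (an assumption
    of this sketch). If no executor is passed, a process pool is created.
    """
    if executor is not None:
        return dict(executor.map(column_summary, columns.items()))
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(column_summary, columns.items()))
```

Calling `summarize_columns(data)` should happen under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Each column is pickled and copied to a worker, so for cheap metrics the transfer cost can outweigh the gain; whether it pays off for ~2000 tests on 100k rows would need to be measured.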

c0t0ber commented on June 2, 2024

@nick-konovalchuk

I believe the best solution would be parallel execution, but we need to check whether it is feasible here. Optimizing individual sections won't help much, because in my case we are calculating ~2000 different tests and metrics.

I don't see a problem with generating HTML, since you only generate it when you need it.

c0t0ber commented on June 2, 2024

Also, if we are talking about optimizing the calculations, using polars with lazy evaluation instead of pandas could be a good solution.

nick-konovalchuk commented on June 2, 2024

@c0t0ber
Personally, I've never used polars, but I think I remember it using all the cores of a CPU. In such a setting, adding multiprocessing on top would be harmful.

nick-konovalchuk commented on June 2, 2024

@c0t0ber
I don't see a problem with generating HTML either. I do wish there were also an option to generate actual plotly objects that I could display with streamlit, for instance, although I can still display HTML in streamlit.
The problem is embedding redundant data in the HTML. Do you really need all the data points to draw a histogram, given that you can't change the bin size after the report is generated? As far as I understand, they embed ALL data points for ClassificationProbDistribution and charts.
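
To illustrate the point about histograms, here is a rough sketch of reducing raw values to bin edges and counts before serializing them. It says nothing about how Evidently actually builds its reports: `bin_values` is a hypothetical helper, the data is synthetic, and the input is assumed to be a non-empty list of numbers.

```python
import json
import random


def bin_values(values, n_bins=10):
    """Reduce raw values to equal-width bin edges and per-bin counts."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return {"edges": edges, "counts": counts}


random.seed(0)
raw = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # synthetic column
binned = bin_values(raw, n_bins=20)

raw_payload = json.dumps(raw)        # what embedding every point would cost
binned_payload = json.dumps(binned)  # what a fixed-bin histogram needs
print(len(raw_payload), len(binned_payload))
```

Since the bin size cannot be changed after the report is rendered, the binned payload carries everything such a chart needs, and its size depends on the number of bins rather than the number of rows.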

nick-konovalchuk commented on June 2, 2024

Also, I'm less sure about this, but the data points in the HTML may be duplicated across several metrics/tests.

nick-konovalchuk commented on June 2, 2024

Idk if this is correct and/or possible, but the following would be cool:

1. Generate actual Plotly objects.
2. Extract HTML from them when the report is rendered. I think this HTML won't have redundant data embedded.
