Comments (8)
I checked the insides of ClassificationPreset and DataDriftPreset. I've seen a lot of data copying, which is far from ideal. Also pandas is used here, while in some cases faster alternatives could be utilized.
But imo the most inefficient part is embedding the actual data into the HTML reports.
from evidently.
I wonder if metric calculation could at least be spread across several processes.
The best solution, I believe, would be to use parallel execution, but we need to explore the feasibility of its application. Trying to optimize individual sections is of little use because in my case, we are calculating ~2000 different tests and metrics.
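To make the parallel idea concrete, here is a minimal sketch using only the standard library. It does not use Evidently's API: the `METRICS` table, `value_range`, and `run_metrics` are hypothetical stand-ins, and the assumption is that each metric depends on a single column, so every (metric, column) pair can run independently.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Tuple

Column = List[float]


def value_range(values: Column) -> float:
    """Hypothetical metric: spread between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical metric registry; real metrics would be far more expensive.
METRICS: Dict[str, Callable[[Column], float]] = {
    "mean": mean,
    "std": pstdev,
    "range": value_range,
}


def _run_one(task: Tuple[str, str, Column]) -> Tuple[str, str, float]:
    # Defined at module level so worker processes can pickle a reference to it.
    metric_name, column_name, values = task
    return metric_name, column_name, METRICS[metric_name](values)


def run_metrics(
    data: Dict[str, Column],
    use_processes: bool = True,
    max_workers: Optional[int] = None,
) -> Dict[Tuple[str, str], float]:
    """Compute every metric for every column, one task per (metric, column) pair."""
    tasks = [(m, c, values) for c, values in data.items() for m in METRICS]
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches tasks per worker round trip; the thread pool ignores it.
        results = pool.map(_run_one, tasks, chunksize=16)
        return {(m, c): value for m, c, value in results}
```

When using processes, call `run_metrics(...)` from under an `if __name__ == "__main__":` guard. Each task ships its column to a worker, so for cheap metrics the pickling cost can outweigh the gain; with ~2000 heavier metrics the balance is more likely to favour parallelism, but that would need measuring.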
I don't see any problems with generating HTML since you're only using HTML when necessary.
Also, if we are talking about optimizing the calculations, using polars with lazy evaluation instead of pandas could be a good solution.
@c0t0ber
Personally I've never used polars, but I think I remember it using all the cores of a CPU. In such a setting, multiprocessing would be harmful.
@c0t0ber
I don't see a problem with generating HTML. I wish they also had an option of generating actual plotly objects, that I can display using streamlit, for instance. I still can display HTML in streamlit.
The problem is embedding redundant data in the HTML. Do you really need all the data points to draw a histogram, given that you can't change the bin size after the report is generated? Because, as far as I understand, they embed ALL data points for ClassificationProbDistribution and other charts.
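A rough illustration of the size difference, as a sketch rather than a description of how Evidently builds its reports: the `prebin` helper and the random "probabilities" are made up for the example. The point is that a fixed-bin histogram only needs the bin edges and counts, not the raw values.

```python
import json
import random
from typing import Dict, List, Sequence


def prebin(values: Sequence[float], bins: int = 20) -> Dict[str, List[float]]:
    """Reduce raw values to equal-width bin edges and per-bin counts."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return {"edges": edges, "counts": counts}


# Made-up data standing in for predicted probabilities.
rng = random.Random(0)
probabilities = [rng.random() for _ in range(100_000)]

raw_payload = json.dumps(probabilities)                     # every data point
binned_payload = json.dumps(prebin(probabilities, bins=20))  # 21 edges + 20 counts

print(f"raw: {len(raw_payload)} chars, binned: {len(binned_payload)} chars")
```

The binned payload stays the same size no matter how many rows the dataset has, while the raw payload grows linearly with it.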
Also, I'm less sure about this, but the data points in the HTML may be duplicated across several metrics/tests.
Idk if this is correct and/or possible, but the following would be cool:
- Generate actual Plotly objects
- Extract HTML from them when the report is rendered. I think this HTML won't have redundant data embedded