
Comments (8)

nick-konovalchuk commented on June 2, 2024

I looked at the internals of ClassificationPreset and DataDriftPreset. There is a lot of data copying, which is far from ideal. Also, pandas is used throughout, while in some cases faster alternatives could be used.
But IMO the most inefficient part is embedding the actual data into the HTML reports.

from evidently.

nick-konovalchuk commented on June 2, 2024

I wonder if metric calculation could at least be split across several processes.

c0t0ber commented on June 2, 2024

@nick-konovalchuk

The best solution, I believe, would be parallel execution, though we need to check how feasible it is here. Optimizing individual sections helps little because, in my case, we are calculating ~2000 different tests and metrics.
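For illustration, here is a minimal sketch of what fanning many independent metric calculations out to worker processes could look like. The metric functions and the `build_tasks`/`run_parallel` helpers are hypothetical stand-ins, not Evidently APIs, and the sketch assumes each metric depends only on one column of the reference and current data. Note that each task pickles its column data to send it to a worker, so this trades extra copying for parallelism.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean, pstdev


# Hypothetical stand-ins for real metrics; these are NOT Evidently APIs.
def mean_shift(reference, current):
    return mean(current) - mean(reference)


def std_ratio(reference, current):
    return pstdev(current) / pstdev(reference)


METRICS = {"mean_shift": mean_shift, "std_ratio": std_ratio}


def compute_one(task):
    """Compute a single (column, metric) pair; runs inside a worker process."""
    column, metric_name, reference, current = task
    return column, metric_name, METRICS[metric_name](reference, current)


def build_tasks(reference_cols, current_cols):
    """Create one task per (column, metric) combination."""
    return [
        (column, name, reference_cols[column], current_cols[column])
        for column in reference_cols
        for name in METRICS
    ]


def run_parallel(tasks, max_workers=None, chunksize=64):
    """Spread tasks over worker processes; results keep the task order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute_one, tasks, chunksize=chunksize))
```

In a script, `run_parallel(build_tasks(reference_cols, current_cols))` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. A larger `chunksize` reduces inter-process overhead when there are thousands of small tasks.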

I don't see a problem with generating HTML, since you only generate it when you need it.

c0t0ber commented on June 2, 2024

Also, using polars with lazy evaluation instead of pandas could be a good solution if we are talking about optimizing the calculations.

nick-konovalchuk commented on June 2, 2024

@c0t0ber
Personally, I've never used polars, but as far as I remember it uses all the cores of a CPU. In that setting, multiprocessing would be harmful.

nick-konovalchuk commented on June 2, 2024

@c0t0ber
I don't see a problem with generating HTML either. I wish they also had an option to generate actual plotly objects that I could display with streamlit, for instance. I can still display HTML in streamlit.
The problem is embedding redundant data in the HTML. Do you really need all the data points to draw a histogram, given that you can't change the bin size after the report is generated? As far as I understand, they embed ALL data points for ClassificationProbDistribution and charts.
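To make the point about redundant data concrete, here is a small sketch that compares the size of a payload holding every data point with one holding only pre-computed bin edges and counts. The `prebin` helper and the synthetic `probabilities` list are made up for illustration, and this is not how Evidently builds its reports; it only assumes the bin count is fixed when the report is generated.

```python
import json
import random


def prebin(values, n_bins):
    """Return (bin_edges, counts) for equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts


# Synthetic stand-in for predicted probabilities.
random.seed(0)
probabilities = [random.random() for _ in range(100_000)]

edges, counts = prebin(probabilities, n_bins=20)

# What would be embedded in the report in each case.
raw_payload = json.dumps({"x": probabilities})
binned_payload = json.dumps({"edges": edges, "counts": counts})

print(len(raw_payload), len(binned_payload))
```

The binned payload stays the same size no matter how many rows the dataset has, while the raw payload grows linearly with it.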

nick-konovalchuk commented on June 2, 2024

Also, I'm less sure about this, but the data points in the HTML may be duplicated across several metrics/tests.

nick-konovalchuk commented on June 2, 2024

Idk if this is correct and/or possible, but the following would be cool:

  1. Generate actual Plotly objects
  2. Extract HTML from them when the report is rendered. I think this HTML won't have redundant data embedded
