
Comments (10)

mehtamohit013 commented on June 12, 2024

Hi @idomic,
Could you please share posthob.ipynb or tell me where it is located? I can't seem to find it.

from ploomber-engine.

edublancas commented on June 12, 2024

I ran two sample notebooks (sample-notebooks.zip) to understand the issue better. Some comments:

printed output

I was unable to reproduce the issue: all print statements are displayed when passing --log-output. ploomber-engine only displays whatever is sent to stdout, so my guess is that papermill also displays stderr and/or the text results of each cell. We should run a more detailed analysis and then ensure that both produce the same output. Another thing we can add is the cell delimiter (papermill prints: Executing Cell X -----).

ploomber-engine print.ipynb /dev/null --log-output
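A minimal sketch of the two ideas above: forwarding stderr as well as stdout, and printing a papermill-style delimiter before each cell. This is a toy in-process executor, not ploomber-engine's actual implementation; `run_cells` is a hypothetical name, and for simplicity it buffers each cell's output until the cell finishes.

```python
import contextlib
import io
import sys


def run_cells(cells, log=None):
    """Execute each cell's source in a shared namespace and echo its output.

    Hypothetical helper: prints a papermill-style delimiter before each cell
    and forwards what the cell wrote to both stdout and stderr.
    """
    if log is None:
        log = sys.stdout
    namespace = {}
    for index, source in enumerate(cells, start=1):
        log.write(f"Executing Cell {index} -----\n")
        out, err = io.StringIO(), io.StringIO()
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            exec(source, namespace)
        # Forward both streams, not just stdout
        log.write(out.getvalue())
        log.write(err.getvalue())


transcript = io.StringIO()
run_cells(["print(1 + 2)", "import sys\nprint('oops', file=sys.stderr)"], log=transcript)
print(transcript.getvalue(), end="")
```

Because the second cell writes only to stderr, a stdout-only logger would show an empty cell here, while this sketch shows `oops` under its delimiter.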

dual progress bar

I could not reproduce this by creating a notebook that displays a progress bar using tqdm and executing it with --log-output, so we need to investigate more:

ploomber-engine progress.ipynb /dev/null --log-output


idomic commented on June 12, 2024

Yeah, the delimiter could be a good option; right now it prints everything together.
To reproduce, you can run posthob.ipynb.


mehtamohit013 commented on June 12, 2024

Hi,
I am running this code:

print(1+2)
print(3+4)
print(1+7)
from tqdm.auto import tqdm
import time
my_list = list(range(100))

with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
        time.sleep(0.01)
        pbar.update(1)
        if x % 20 == 0:
            print(x)
print(1)
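To reproduce this, the code above can be written to rough.ipynb with the standard library alone. The cell boundaries are an assumption: the thread only says the progress bar is in cell 5, so this sketch puts the three prints, the imports, the loop, and the final print in separate cells. `make_notebook` is a hypothetical helper that emits a minimal nbformat 4 document.

```python
import json

# Assumed split into cells; only "progress bar in cell 5" comes from the thread
CELLS = [
    "print(1+2)",
    "print(3+4)",
    "print(1+7)",
    "from tqdm.auto import tqdm\nimport time",
    (
        "my_list = list(range(100))\n"
        "\n"
        "with tqdm(total=len(my_list)) as pbar:\n"
        "    for x in my_list:\n"
        "        time.sleep(0.01)\n"
        "        pbar.update(1)\n"
        "        if x % 20 == 0:\n"
        "            print(x)"
    ),
    "print(1)",
]


def make_notebook(cells):
    """Build a minimal nbformat 4.4 notebook dict with one code cell per source."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
            for source in cells
        ],
        "metadata": {
            "kernelspec": {
                "display_name": "Python 3",
                "language": "python",
                "name": "python3",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


with open("rough.ipynb", "w") as f:
    json.dump(make_notebook(CELLS), f, indent=1)
```

The resulting file can then be passed to the ploomber-engine and papermill commands listed below.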

Running with papermill gives this output: [screenshot: Screenshot_2023-03-17_11-14-43]

Running with ploomber-engine on the CLI gives me this output: [screenshot: Screenshot_2023-03-17_11-27-26]

Observations:

  • The progress bar of cell 5 is not displayed.
  • As you can see, ploomber-engine skips the \n (newline) characters when executing.
  • Also, ploomber-engine's execution time (around 5-8 sec) is inconsistent, and it is slower than papermill (3-4 sec).
  • Also, I can't find documentation for the ploomber-engine CLI command.

Commands used

 ploomber-engine rough.ipynb output.ipynb --log-output
 papermill rough.ipynb output.ipynb --log-output 

PS: I am not able to run the notebook @idomic mentioned.
Edit: Updated images and fixed spelling.


idomic commented on June 12, 2024

@mehtamohit013 A few thoughts:

Also, I can't find documentation of ploomber-engine CLI command

I think I opened an issue about it last week.

Progress bar of cell 5 is not displayed

If it's missing even with --log-output, we need to research why; it sounds like a bug.

Also ploomber-engine execution time which is around 5-8sec is not consistent and it is slower than papermill 3-4sec

It runs in a different process, which explains the difference, but try profiling it to see what's causing the delay.
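One way to start profiling is the standard library's cProfile, sorting by cumulative time to see where the seconds go. This is a generic sketch: `profile_call` is a hypothetical helper, and `fake_cell` is a placeholder for whatever code path is suspected of causing the delay.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    report = io.StringIO()
    # Sort by cumulative time so the slowest call chains appear first
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


def fake_cell():
    # Placeholder for the slow work being investigated
    time.sleep(0.05)
    return "done"


result, report = profile_call(fake_cell)
print(report)
```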

Let's connect about the notebook; I'll help you run it!


edublancas commented on June 12, 2024

I think the missing output might be that the tqdm progress bar is printed to standard error and we're just displaying standard output. If that's the case, we should ensure we also display standard error in the console.

You can check this with:

import sys

print("printing to stderr", file=sys.stderr)

and see if ploomber-engine displays it
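To confirm which stream a message uses independently of any notebook engine, the same check can be run in a plain subprocess with stdout and stderr captured separately. This is a standard-library sketch; to test the tqdm hypothesis directly, a short `tqdm` loop could be added to `SNIPPET`, since tqdm writes to `sys.stderr` by default.

```python
import subprocess
import sys

SNIPPET = (
    "import sys\n"
    "print('printing to stdout')\n"
    "print('printing to stderr', file=sys.stderr)\n"
)

# Run the snippet in a fresh interpreter and keep the two streams apart
completed = subprocess.run(
    [sys.executable, "-c", SNIPPET],
    capture_output=True,
    text=True,
    check=True,
)
print("stdout:", repr(completed.stdout))
print("stderr:", repr(completed.stderr))
```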


mehtamohit013 commented on June 12, 2024

Some clarifications regarding performance:

  • I collected 15 samples from profiling, with a mean of 1.02846 sec and a standard deviation of 0.0047 sec.
  • Using the zsh time command, I get 1.15-1.22 sec for ploomber-engine, while papermill takes 1.65-1.70 sec.
  • The 5-8 sec I mentioned above is the time until zsh gives me a new prompt, so it may include the delay in displaying stdout in the shell.
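The comparison above can be repeated with a small timing harness that reports the mean and standard deviation over several runs. This is a sketch, not the method used in the thread; `time_command` is a hypothetical helper, and the example times a trivial Python command so it runs anywhere. Swap in the ploomber-engine and papermill commands used earlier to compare them.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=15, show_output=False):
    """Run `cmd` `runs` times; return (mean, stdev) of wall-clock seconds."""
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    # show_output=True lets output reach the terminal, so the cost of
    # displaying it is included; False discards it to time execution alone.
    sink = None if show_output else subprocess.DEVNULL
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=sink, stderr=sink)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


mean, stdev = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"mean={mean:.3f}s stdev={stdev:.3f}s")
```

Running it once with `show_output=False` and once with `show_output=True` would help separate execution time from the time spent displaying output.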

Just a minor observation: we cannot pass the name of the file where --save-profiling-data should save its data. It creates output-profiling-data.csv by default.


idomic commented on June 12, 2024

Just a minor observation: We cannot pass the file name to which data should be saved in --save-profiling-data. It creates output-profiling-data.csv by default

Please open an issue about it; I think there should be an option to pass the file name as an argument.


idomic commented on June 12, 2024

The 5-8 sec that I mentioned above is the time, the zsh shell is taking to generate a new command for me to input. So maybe it should include the delay in stdout displaying to the shell.

It seems like it's faster than papermill but displays its output more slowly; we still need to figure out why and how to fix it.


mehtamohit013 commented on June 12, 2024

I think the missing output might be that the tqdm progress bar is printed to standard error and we're just displaying standard output. If that's the case, we should ensure we also display standard error in the console.

Hi @edublancas,
Currently, ploomber-engine prints a cell's stdout only after the cell has finished executing. This isn't ideal: output should be printed to the console as soon as the notebook code writes it to stdout.

I have mentioned more details in PR #66
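The behavior described above (echo output immediately, still keep it for the notebook) can be sketched as a "tee" stream that forwards each write to the console and also records it. This assumes cell code runs in the same Python process and its stdout can be redirected; `TeeStream` is a hypothetical name and this is not the code from PR #66. The same wrapper could be installed for stderr.

```python
import contextlib
import io
import sys
import time


class TeeStream:
    """Forward every write to the console at once and keep a copy."""

    def __init__(self, console):
        self.console = console
        self.captured = io.StringIO()

    def write(self, text):
        self.console.write(text)   # show it right away...
        self.console.flush()       # ...without waiting for the cell to finish
        self.captured.write(text)  # ...and keep it for the notebook output
        return len(text)

    def flush(self):
        self.console.flush()


tee = TeeStream(sys.stdout)
with contextlib.redirect_stdout(tee):
    for x in range(3):
        print(x)          # appears on the console immediately
        time.sleep(0.05)  # simulates a long-running cell

cell_output = tee.captured.getvalue()
```

With this design the console sees each line as it is printed, while `cell_output` holds the full text that would be stored as the cell's output once it completes.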

