Comments (12)

eschnett commented on May 2, 2024

Here is a typical output:

$ caliban cluster job submit --cluster_name einsteintoolkit-cluster --min_cpu 2000 --nogpu ./run-einsteintoolkit.sh -- /usr/app/Cactus/exe/cactus_sim /usr/app/Cactus/repos/cactusamrex/azure-pipelines/carpetx.par
I0802 15:22:43.826885 4368600512 core.py:493] Generating Docker image with parameters:
I0802 15:22:43.827569 4368600512 core.py:494] {'adc_path': '/Users/eschnett/.config/gcloud/application_default_credentials.json',
 'build_path': '/Users/eschnett/src/CarpetX',
 'caliban_config': {'apt_packages': {'cpu': [], 'gpu': []},
                    'base_image': 'eschnett/carpetx-caliban:cpu',
                    'build_time_credentials': False,
                    'default_mode': <JobMode.CPU: 'CPU'>,
                    'gcloud': {},
                    'julia_version': None,
                    'mlflow_config': None},
 'conda_env_path': None,
 'credentials_path': '/Users/eschnett/.config/service_key.json',
 'extra_dirs': None,
 'job_mode': <JobMode.CPU: 'CPU'>,
 'no_cache': False,
 'package': Package(executable=['/bin/bash'], package_path='.', script_path='./run-einsteintoolkit.sh', main_module=None),
 'requirements_path': None,
 'setup_extras': None}
I0802 15:22:43.831600 4368600512 build.py:645] Running command: docker build --rm -f- /Users/eschnett/src/CarpetX
Sending build context to Docker daemon  781.6MB
Step 1/14 : FROM eschnett/carpetx-caliban:cpu
 ---> fb968fce7080
Step 2/14 : RUN [ $(getent group 20) ] || groupadd --gid 20 20
 ---> Using cache
 ---> 09f2ad0c1e9b
Step 3/14 : RUN useradd --no-log-init --no-create-home -u 501 -g 20 --shell /bin/bash eschnett
 ---> Using cache
 ---> f406a157f866
Step 4/14 : RUN mkdir -m 777 /usr/app /.creds /.resources /home/eschnett
 ---> Using cache
 ---> 67fa19d0d939
Step 5/14 : ENV HOME=/home/eschnett
 ---> Using cache
 ---> 924f54f21fa0
Step 6/14 : WORKDIR /usr/app
 ---> Using cache
 ---> c426e9fb88b5
Step 7/14 : USER 501:20
 ---> Using cache
 ---> 864eb4616310
Step 8/14 : COPY --chown=501:20 .caliban_default_creds.json /.creds/credentials.json
 ---> Using cache
 ---> 5877e5cf29a6
Step 9/14 : RUN gcloud auth activate-service-account --key-file=/.creds/credentials.json &&   git config --global credential.'https://source.developers.google.com'.helper gcloud.sh
 ---> Using cache
 ---> 1c8294587031
Step 10/14 : ENV GOOGLE_APPLICATION_CREDENTIALS=/.creds/credentials.json
 ---> Using cache
 ---> c90d15a06bc4
Step 11/14 : COPY --chown=501:20 .caliban_adc_creds.json /home/eschnett/.config/gcloud/application_default_credentials.json
 ---> Using cache
 ---> 8664752dc5f4
Step 12/14 : COPY --chown=501:20 cloud_sql_proxy.py /.resources
 ---> Using cache
 ---> d0328c8deda3
Step 13/14 : COPY --chown=501:20 . /usr/app/.
 ---> c1ef953f75e6
Step 14/14 : ENTRYPOINT ["/bin/bash", "./run-einsteintoolkit.sh"]
 ---> Running in 649a4bb619e8
Removing intermediate container 649a4bb619e8
 ---> 89115658387d
Successfully built 89115658387d
The push refers to repository [gcr.io/fifth-curve-272318/89115658387d]
3115ac903f4e: Pushed
8967bd659a94: Layer already exists
19fbd9a3af43: Layer already exists
a3a7098f01c8: Layer already exists
312ca960ac02: Layer already exists
ba8a6d988260: Layer already exists
fcc7d151ecbb: Layer already exists
8b43d09dc80b: Layer already exists
f687dbdcfcbe: Layer already exists
ab3d90e6c5bd: Layer already exists
c2baa3c233db: Layer already exists
3cdc847693bd: Layer already exists
0c11c52b6fa2: Layer already exists
d02b3bafbf2b: Layer already exists
8bb49dcf7729: Layer already exists
bde8cc76e518: Layer already exists
c83c21629dcf: Layer already exists
a8c372c103ab: Layer already exists
4e84efa3db52: Layer already exists
25f9bffed627: Layer already exists
d4b51844f98b: Layer already exists
ed1d1d4a83ac: Layer already exists
df7bab7a7925: Layer already exists
28ba7458d04b: Layer already exists
838a37a24627: Layer already exists
a6ebef4a95c3: Layer already exists
b7f7d2967507: Layer already exists
latest: digest: sha256:a00f7bc72418a1d47e0071e9addfb0ca20e703309d3b33187b625f28e031d334 size: 5980
I0802 15:25:05.499978 4368600512 cli.py:473] jobs submitted, visit https://console.cloud.google.com/kubernetes/clusters/details/us-central1-a/einsteintoolkit-cluster?project=fifth-curve-272318 to monitor

Note that the last COPY step (copying the current directory) does not use the cache, so the corresponding layer is rebuilt and pushed on every submission.
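
As far as I understand Docker's cache semantics (my own reading, not something from the Caliban docs), each COPY layer is keyed on a checksum of the files it copies, so step 13's COPY . is invalidated whenever any file anywhere in the build context changes, and every layer after it is rebuilt and pushed. One way to check which layers were rebuilt:

# 89115658387d is the image id from the build output above
docker history 89115658387d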

ajslone commented on May 2, 2024

Erik, I'll have a look. You're right both that this shouldn't happen and that it's quite annoying.

eschnett commented on May 2, 2024

I believe the problem is using . as the source of the COPY command. When I copy another large directory explicitly, that COPY command is properly cached.

If that is indeed so, then one work-around might be to explicitly list all files (including dot files, of course) from the package directory in the COPY command.
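
A sketch of what that could look like in the generated Dockerfile (the file names are just examples from my project, not what Caliban emits today):

# copy the small, frequently edited files in their own layer
COPY --chown=501:20 run-einsteintoolkit.sh .calibanconfig.json /usr/app/
# copy the large, rarely changing tree separately so it stays cached
COPY --chown=501:20 Cactus /usr/app/Cactus

With the copy split up like this, editing the script would only invalidate the first layer, and the large Cactus layer would remain cached.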

sritchie commented on May 2, 2024

@eschnett ah, nice find. Another workaround would be to move your files out of the top-level directory; by default, Caliban only includes the folder containing the script you've passed it, and you can add more directories with the -d flag.

So you might change:

caliban cluster job submit ./run-einsteintoolkit.sh

to:

caliban cluster job submit -d data -d misc bin/run-einsteintoolkit.sh

or something like that, and that should fix it. Hopefully this will help for now!

sritchie commented on May 2, 2024

Also, we're close to removing the --nogpu requirement, by

  • making --nogpu the default
  • allowing you to set a sticky default for your own machine,
  • and allowing you to override the gpu or nogpu default in .calibanconfig.json locally (see the sketch below).

Shorter command strings are always a bonus!
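
That local override might look something like this (a sketch assuming a default_mode key matching the caliban_config dump in the log above; the actual key name may differ):

{
  "base_image": "eschnett/carpetx-caliban:cpu",
  "default_mode": "cpu"
}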

eschnett commented on May 2, 2024

@sritchie I have a folder called Cactus in the top-level directory of the project. This Cactus folder contains the files that are not cached, and it is copied into the Docker image. Are you recommending two separate directories: one project directory (which contains e.g. the .calibanconfig), and another directory tree outside it which contains the large files? In that case, I would need to specify ../Cactus or similar, not just Cactus, right?

Thanks for the --nogpu change.

sritchie commented on May 2, 2024

@eschnett almost - instead of:

root_folder/Cactus/....
root_folder/run-einsteintoolkit.sh

#command:
caliban cluster job submit ./run-einsteintoolkit.sh

do this:

root_folder/Cactus/....
root_folder/bin/run-einsteintoolkit.sh

#command:
caliban cluster job submit -d Cactus bin/run-einsteintoolkit.sh

I bet that will do the trick (unless @ajslone tells us that caliban cluster job submit doesn't handle -d, but I think it does).

eschnett commented on May 2, 2024

Yes, this works! Thanks.

I would usually write ./bin/run-einsteintoolkit.sh instead of bin/run-einsteintoolkit.sh, but that doesn't work here. You cannot have the initial ./ in the path.
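
Concretely (the other flags omitted for brevity):

# fails: the script path must not start with ./
caliban cluster job submit -d Cactus ./bin/run-einsteintoolkit.sh
# works:
caliban cluster job submit -d Cactus bin/run-einsteintoolkit.sh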

sritchie commented on May 2, 2024

eschnett commented on May 2, 2024

It seems the data directories are added to the Docker image before the apt dependencies are installed. This is the wrong order for me. I agree that it is the right order for large, unchanging data sets (and I'm not advocating changing it), but it makes the work-around a bit less effective for me.
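
In Dockerfile terms, the order I would prefer for my case is roughly this (a sketch, not what Caliban currently generates):

# install apt dependencies first, so these cached layers survive data changes
RUN apt-get update && apt-get install -y <apt packages from .calibanconfig.json>
# then copy the data directories, which change often in my workflow
COPY --chown=501:20 Cactus /usr/app/Cactus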

sritchie commented on May 2, 2024

@eschnett yes, this is a real problem for a few reasons... we have a place where we sometimes need credentials before ANY dependencies, so that we can fetch private deps. But if you don't need that feature, it's very disruptive to have the cache invalidated that early and rebuild everything after it.

The solution here could be either:

  • letting users hint the order in .calibanconfig.json (sketched below), or
  • figuring out how we can more aggressively use docker's multi-container builds to get out of this linear structure that is hurting us.

I want the tool to keep its sane defaults, but I do want to get to a world where our Docker building looks like building a list of data structures internally, and then taking hints from the user about how to change that build order.
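
For the first option, a purely hypothetical sketch (build_order is an invented key here, not something Caliban supports today):

{
  "apt_packages": {"cpu": [], "gpu": []},
  "build_order": ["credentials", "apt_packages", "extra_dirs", "package"]
}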

eschnett commented on May 2, 2024

I don't have an opinion either way. If putting a job script into the root directory could be made to use the cache, I probably wouldn't need data directories. My workflow starts from scratch and produces large output files.
