eage_mdchpc_2020's Introduction

Reproducible material for "Implementation of large-scale integral operators with modern HPC solutions"

Authors: M.Ravasi, I.Vasconcelos

This repository contains the material used in the extended abstract "Implementation of large-scale integral operators with modern HPC solutions", to be presented at EAGE 2020 in Amsterdam.

Users interested in reproducing the figures in the abstract can run the provided notebooks as explained below. Python scripts are also provided for long-running jobs that apply a given processing step to the entire dataset.

These codes have been tested on a local HPC system as well as on Kubernetes (K8S), with similar performance.

Notebooks

All the figures in the abstract can be reproduced by running the following notebooks:

  • Marchenko3D_datacreation.ipynb: Data Creation notebook. The synthetic dataset is originally stored as separate .npz files. In this notebook, the dataset is converted into a single Zarr file using Dask for concurrent IO. The resulting file will be loaded into distributed memory and used for MDC and Marchenko redatuming.

  • MDC_timing.ipynb: Timing of MDC operation for single virtual source. Used to create figure 2a.

  • MDCmulti_timing.ipynb: Timing of MDC operation for multiple virtual sources. Used to create input data for figure 2b.

  • MDC_timing_comparison.ipynb: Plotting different timings of MDC operation. Used to create figure 2.

  • Marchenko3D.ipynb: 3D Marchenko redatuming for single virtual point. Create Green's functions to be visualized by

  • Marchenko3Dmulti.ipynb: 3D Marchenko redatuming for multiple virtual points. Not used in the paper.

  • Marchenko3D_comparison.ipynb: Plotting Marchenko fields for different subsampling factors. Used to create figure 3.

  • MDD3D.ipynb: 3D Multi-dimensional deconvolution of Marchenko fields. Not used in the abstract.

  • MDD3D_visualization.ipynb: Plotting MDD redatumed local responses. Not used in the abstract.

Python scripts

  • MDC_timing.py: Timing of MDC operation for single virtual source. Same as MDC_timing.ipynb, used to perform timing of several configurations in 'batch' mode together with MDC_timing.sh

  • MDCmulti_timing.py: Timing of MDC operation for multiple virtual sources. Same as MDCmulti_timing.ipynb, used to perform timing of several configurations in 'batch' mode together with MDCmulti_timing.sh

  • Marchenko3D.py: 3D Marchenko redatuming for an entire depth level. Used to estimate fields in 'batch' mode together with Marchenko3D.sh

  • Marchenko3Dmulti.py: 3D Marchenko redatuming for an entire depth level using multiple virtual points. Used to estimate fields in 'batch' mode together with Marchenko3Dmulti.sh

  • MDD3D.py: 3D Multi-dimensional deconvolution of Marchenko fields for an entire depth level. Used to estimate fields in 'batch' mode together with MDD3D.sh
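The 'batch' timing idea behind the timing scripts can be illustrated with a minimal sketch. This is not the code in MDC_timing.py: the helper names `time_callable` and `time_configurations`, the `repeats` parameter, and the stand-in workload are all hypothetical, and only the Python standard library is used.

```python
import statistics
import time


def time_callable(func, repeats=5):
    """Run func() `repeats` times and return (mean, best) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), min(timings)


def time_configurations(make_op, configs, repeats=5):
    """Time one operation per configuration.

    make_op is a hypothetical factory: given a configuration, it returns
    a zero-argument callable that performs the operation to be timed.
    """
    results = {}
    for cfg in configs:
        op = make_op(cfg)
        results[cfg] = time_callable(op, repeats)
    return results


if __name__ == "__main__":
    # Stand-in workload (an assumption): summing n integers replaces the
    # real MDC operation, which would be built from the dataset instead.
    timings = time_configurations(lambda n: (lambda: sum(range(n))),
                                  configs=[10_000, 100_000])
    for cfg, (mean_s, best_s) in timings.items():
        print(f"config={cfg}: mean={mean_s:.6f}s best={best_s:.6f}s")
```

Reporting both the mean and the best time is a common choice because the best run is less sensitive to transient load on shared nodes.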

Auxiliary files:

  • setup-ssh.sh: Shell script to set up an SSH Dask cluster. Note that you will need a hostfile.txt file in the same directory with the addresses of the nodes you want to use when setting up the cluster. Refer to https://docs.dask.org/en/latest/setup/ssh.html for more details.
  • utils.py: Small Python functions used in various notebooks
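For readers who prefer to start the cluster from Python rather than from setup-ssh.sh, the following is a minimal sketch. It assumes that hostfile.txt lists one node address per line (blank lines and `#` comments are skipped); the helper names `read_hostfile` and `start_ssh_cluster` are hypothetical and are not part of this repository. `SSHCluster` from `dask.distributed` uses the first host for the scheduler and the remaining hosts for workers.

```python
from pathlib import Path


def read_hostfile(path="hostfile.txt"):
    """Return the node addresses in `path`, assuming one address per line."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        # Drop anything after a '#' and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def start_ssh_cluster(hosts):
    """Start a Dask cluster over SSH and return a connected client."""
    # Imported here so that read_hostfile works without Dask installed
    from dask.distributed import Client, SSHCluster

    # First host runs the scheduler, the remaining hosts run workers
    cluster = SSHCluster(hosts)
    return Client(cluster)
```

A typical call would be `client = start_ssh_cluster(read_hostfile())`, which requires passwordless SSH access to every listed node.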

K8S

This directory contains the Helm charts used to configure K8S and some instructions on how to run Marchenko3D.py in a Kubernetes cluster. Note that our setup is based on https://github.com/dask/helm-chart; refer to that repository for more details on the full configuration setup.

Input data:

The input dataset has been created using a finite-difference modelling code available in the Madagascar software package. Given the size of the input dataset, the authors are not able to share it directly but can provide the SConstruct file used to create the data.

Environment

To ensure reproducibility of the results, we suggest using the requirements.txt file when creating an environment.

Note: All notebooks use the $STORE_PATH environment variable to specify the location of the input dataset. Either create this environment variable prior to running a notebook or use os.environ["STORE_PATH"] = "/path/to/data/" directly within the notebook.
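A minimal sketch of how this variable could be set and read in a notebook cell is given here. The helper `resolve_store_path` is hypothetical (it is not defined in utils.py), and "/path/to/data/" is a placeholder to be replaced with the actual dataset location.

```python
import os
from pathlib import Path


def resolve_store_path(default=None):
    """Return the dataset location from STORE_PATH, or `default` if unset."""
    value = os.environ.get("STORE_PATH", default)
    if value is None:
        raise RuntimeError(
            "STORE_PATH is not set; export it in the shell or set "
            'os.environ["STORE_PATH"] before running the notebook.'
        )
    return Path(value)


# Only sets the variable if it was not already exported in the shell
os.environ.setdefault("STORE_PATH", "/path/to/data/")
store_path = resolve_store_path()
print(store_path)
```

Using `setdefault` keeps a value exported in the shell from being overwritten by the placeholder in the notebook.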
