Turing AI Cloud Quick Start

Workflow Overview

[Figure: Workflow]

The figure above illustrates the submission and debugging workflows of a TACC job.

Creating a TACC account

Before using the tcloud SDK, make sure that you have applied for a TACC account and submitted your public key to TACC. You can generate an SSH public key by following the steps. To apply for a TACC account, please visit our website.

Installing tcloud SDK

  • Download tcloud SDK
    Download the latest tcloud SDK from tags.
  • Install tcloud SDK
    Place setup.sh and tcloud in the same directory, and run setup.sh.

Submitting Your First TACC Job

CLI Tool Initialization

  • First, you need to configure your TACC credentials. You can do this by running the tcloud config command:
    $ tcloud config [-u/--username] MYUSERNAME
    $ tcloud config [-f/--file] MYPRIVATEFILEPATH
    
  • Then, run the tcloud init command to obtain the latest hardware information from the TACC cluster.
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    tacc*        up   infinite      5  alloc 10-0-7-[18-19],10-0-8-[18-19]
    tacc*        up   infinite     19   idle 10-0-2-[18-19],10-0-3-[10-13]
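
The partition table shown above can be summarized with a short shell function. This is a minimal sketch, assuming the output keeps the column layout shown above (node count in the fourth column, state in the fifth). The name count_nodes_by_state is hypothetical and not part of tcloud.

```shell
# Hypothetical helper, not part of tcloud: sum the NODES column per STATE.
# Assumes the layout PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
# Lines whose fourth field is not a number (such as the header) are skipped.
count_nodes_by_state() {
    awk '$4 ~ /^[0-9]+$/ { nodes[$5] += $4 }
         END { for (s in nodes) print s, nodes[s] }' | sort
}

# Example input copied from the sample output above.
count_nodes_by_state <<'EOF'
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
tacc*        up   infinite      5  alloc 10-0-7-[18-19],10-0-8-[18-19]
tacc*        up   infinite     19   idle 10-0-2-[18-19],10-0-3-[10-13]
EOF
# prints:
#   alloc 5
#   idle 19
```

On a live cluster the same function could be fed with `tcloud init | count_nodes_by_state`, assuming tcloud init writes the table to standard output.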
    

Download Sample Job

You can use this link to download our example code.

Submit a Job

Each job requires a main.py together with a tuxiv.conf file.

After tcloud is configured correctly, you can try to submit your first job.

  1. Go to the example folder in your terminal.
  2. Run the tcloud submit command.
    ~/Dow/quickstart-master/example/helloworld ❯ tcloud submit
    Start parsing tuxiv.conf...
    building file list ...
    8 files to consider
    helloworld/
    helloworld/run.sh
            151 100%    0.00kB/s    0:00:00 (xfer#1, to-check=5/8)
    helloworld/configurations/
    helloworld/configurations/citynet.sh
              12 100%   11.72kB/s    0:00:00 (xfer#2, to-check=2/8)
    helloworld/configurations/conda.yaml
            107 100%  104.49kB/s    0:00:00 (xfer#3, to-check=1/8)
    helloworld/configurations/run.slurm
            278 100%  271.48kB/s    0:00:00 (xfer#4, to-check=0/8)
    
    sent 429 bytes  received 144 bytes  382.00 bytes/sec
    total size is 1071  speedup is 1.87
    Submitted batch job 2000
    Job helloworld submitted.
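
Scripts that submit jobs often need the job ID for later steps. The following is a minimal sketch, assuming tcloud submit keeps printing a line of the form "Submitted batch job <id>" as in the log above. The name extract_job_id is a hypothetical helper, not a tcloud command.

```shell
# Hypothetical helper, not part of tcloud: read a submission log on stdin and
# print the numeric ID found on the "Submitted batch job <id>" line.
extract_job_id() {
    sed -n 's/^[[:space:]]*Submitted batch job \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Example using the last two lines of the log above.
printf '%s\n' 'Submitted batch job 2000' 'Job helloworld submitted.' | extract_job_id
# prints: 2000
```

With a real submission, the log could be piped in directly, for example `tcloud submit | extract_job_id`, assuming the log is written to standard output.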
    

Retrieve Your Job Status and Output

This section describes two ways to read the job log: cat and download.

After training, you can use tcloud ls [filepath] to find the output files.

  • cat

    You can configure your log path in the tuxiv.conf. The default path is slurm_log/slurm-jobid.out.

    tcloud cat slurm_log/slurm-jobid.out
    

    In the helloworld example, the tuxiv.conf file specifies the log path as slurm_log/hello.log.

  • download

    You can use tcloud download [filepath] to copy a file to your local machine.

    Note that you can only read and download files in USERDIR, and the files in WORKDIR may be removed after the job is finished.

    tcloud download slurm_log/slurm-jobid.out
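
The default log path can be assembled from a job ID. This sketch assumes that "jobid" in slurm_log/slurm-jobid.out stands for the numeric ID reported at submission (2000 in the earlier log). The name default_log_path is a hypothetical helper, not part of tcloud.

```shell
# Hypothetical helper, not part of tcloud: build the default log path for a job,
# assuming "jobid" in slurm_log/slurm-jobid.out is the numeric job ID.
default_log_path() {
    printf 'slurm_log/slurm-%s.out\n' "$1"
}

default_log_path 2000   # prints slurm_log/slurm-2000.out
```

The result can then be passed to the commands described above, for example `tcloud cat "$(default_log_path 2000)"` or `tcloud download "$(default_log_path 2000)"`. If tuxiv.conf sets a custom log path, as in the helloworld example, use that path instead.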
    

Manage your environment

tcloud uses conda environments to manage your dependencies. We offer two ways to manage environments:

  1. One-off Environment. A new environment is created every time you submit a task to TACC. If your dependency configuration does not change between two consecutive submissions, the previous environment is reused to save time. This is the default behavior.
  2. Persistent Environment. The environment is shared across multiple task submissions. When you change your dependency configuration, tcloud updates this environment instead of creating a new one. Learn how to do this in the tuxiv.conf documentation.
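
The reuse rule for one-off environments can be pictured as a comparison of dependency files. The sketch below only illustrates that idea and is not tcloud's actual implementation. It assumes the dependency configuration lives in a single file (such as the conda.yaml in the helloworld example) and uses the POSIX cksum utility as a stand-in fingerprint. Both function names are hypothetical.

```shell
# Illustration only, not tcloud's implementation.
# fingerprint FILE: print a checksum and byte count of the file's contents.
fingerprint() {
    cksum < "$1"
}

# env_action PREVIOUS CURRENT: print "reuse" when both dependency files have
# the same fingerprint, otherwise print "create".
env_action() {
    if [ "$(fingerprint "$1")" = "$(fingerprint "$2")" ]; then
        echo reuse
    else
        echo create
    fi
}
```

Calling `env_action old/conda.yaml new/conda.yaml` (hypothetical paths) prints "reuse" when nothing changed. For a persistent environment, the "create" branch would correspond to updating the existing environment rather than building a new one.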

Demo video

The following video will help you use the tcloud CLI to begin your TACC journey: demo video.

Examples

Basic examples are provided under the example folder. These examples include: HelloWorld, TensorFlow, PyTorch and MXNet.

FAQ

Contributors

xcwanandy, decsun, andyxukq, lalalapotter
