Git Product home page Git Product logo

ligo-rucio's Introduction

LIGO Rucio

This repository has tools and notes for demonstration and evaluation of Rucio for LIGO bulk data management.

Preliminaries

Some notes on getting started

Configuration & Environment

  • RUCIO_HOME must point to a directory which includes etc/rucio.cfg
  • rucio.cfg should look like:
[client]
rucio_host = https://rucio-ligo.grid.uchicago.edu:443
auth_host = https://rucio-ligo.grid.uchicago.edu:443
ca_cert = /etc/grid-security/certificates
client_x509_proxy = /tmp/x509up_p2411400.filearAiBG.1
request_retries = 3
auth_type = x509
client_cert = /tmp/x509up_p2411400.filearAiBG.1
client_key = /tmp/x509up_p2411400.filearAiBG.1

where client_cert and client_key should point to the output of

grid-proxy-info -path
  • Admin tasks should have RUCIO_ACCOUNT=root
  • User tasks should have RUCIO_ACCOUNT=jclark (for example)

Rucio Storage Element

The first thing we need is an RSE (container for files) to upload our files to.

  1. Create the RSE (see e.g., CLI admin examples:
    rucio-admin rse add LIGOTEST
    
  2. Add supported protocols (e.g., srm, gsiftp, http, ...). To begin with, we can just use gsiftp:
    rucio-admin rse add-protocol  \
        --prefix /user/ligo/rucio \
        --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy": 1}}' \
        --scheme gsiftp \
        --hostname red-gridftp.unl.edu \
        LIGOTEST
    

Note that rucio-admin operations should be performed with RUCIO_ACCOUNT=root

Scope

At least for testing, we will designate scopes according to data-taking runs (engineering and observing runs). To create an ER8 scope:

rucio-admin scope add --account jclark --scope ER8

See e.g., rucio scope docs

CLI Example

Now that we have an RSE and a scope we can experiment with the CLI examples

  1. Uploading a single frame with scope "ER8"
rucio -v upload \
    /hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/H-H1_HOFT_C02-1126256640-4096.gwf
    --rse LIGOTEST --scope ER8 \
    --name H-H1_HOFT_C02-1126256640-4096.gwf

Should generate something like,

2018-02-05 13:33:31,104    DEBUG    Extracting filesize (457680774) and checksum
(ef00cf51) for file ER8:H-H1_HOFT_C02-1126256640-4096
2018-02-05 13:33:31,106    DEBUG    Automatically setting new GUID
2018-02-05 13:33:31,381    DEBUG    Using account root
2018-02-05 13:33:31,381    DEBUG    Skipping dataset registration
2018-02-05 13:33:31,381    DEBUG    Processing file
ER8:H-H1_HOFT_C02-1126256640-4096 for upload
2018-02-05 13:33:39,285    INFO    Local files and file
ER8:H-H1_HOFT_C02-1126256640-4096 recorded in Rucio have the same checksum. Will
try the upload
2018-02-05 13:33:56,808    INFO    File ER8:H-H1_HOFT_C02-1126256640-4096.gwf
successfully uploaded on the storage
2018-02-05 13:33:56,809    DEBUG    sending trace
2018-02-05 13:33:57,270    DEBUG    Finished uploading files to RSE.
2018-02-05 13:33:57,505    INFO    Will update the file replicas states
2018-02-05 13:33:57,586    INFO    File replicas states successfully updated
Completed in 34.7796 sec.

Python Example

A next step is to set up a python simple script to:

  • Retrieve a list of frame files which corresponds to some nominal data set
  • Loop through the list and call the Ruico API

This can be easily achieved with a simple python script which makes use of the pycbc datafind module and a pip install of Rucio.

Python data-insertion module

cmsexample.py is a command line tool for registering a CMS dataset into rucio. This set of slides describes the CMS evaluation. The CMS hierachy is more complicated than (at least our initial test) in LIGO. In CMS:

  • Files: ~4GB
  • Blocks (Rucio dataset): chunks of ~100 files. This is the typical unit of data transfer.
  • Datasets (Rucio container): N blocks with some physical meaning

The (current) proposed LIGO arrangement is simpler:

  • LIGO runs (ER8, O1, ...): Rucio scope
  • LIGO dataset == Rucio dataset

Here's a run-through of cmsexample.py:

  1. Instantiate the DataSetInjector object, a general class for injecting a cms dataset into rucio
  2. DataSetInjector has methods to create containers and register files and datasets
  3. This class has methods for finding the rucio url and filenames

I do not need anything to do with rucio containers (yet) so can just mimic the parts associated with file and data set registration, and some of the sanity checking. I should be able to swap out my existing routines for translating LIGO file URLs to Rucio DIDs.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.