Comments (3)
Hello @matbun,
After speaking with a few people at CERN, there are two "main" ways to interact with Rucio data:
- Option 1: Download the desired dataset to the local host.
- Option 2: Create a replication rule so that the files become available in the "local" RSE (Rucio Storage Element), i.e., the distributed storage that should exist at each of the data centers (and that should be mounted when you are logged in).
Option 1 takes much more time than option 2. Furthermore, you would need to keep an internet connection open during the whole download.
Therefore, we should go with option 2.
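A minimal sketch of what "option 2" could look like, assuming the legacy Rucio CLI form `rucio add-rule <scope>:<name> <copies> <rse_expression>` is available on the login node (newer Rucio releases may expose a different command layout). The helper name `build_add_rule_command`, the scope, the dataset name, and the RSE name are all hypothetical placeholders, not values from this thread:

```python
import shlex


def build_add_rule_command(scope, name, rse_expression, copies=1, lifetime_s=None):
    """Build the argv list for a `rucio add-rule` call (legacy CLI form).

    This only assembles the command; it does not contact Rucio.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    cmd = ["rucio", "add-rule"]
    if lifetime_s is not None:
        # Optional rule lifetime in seconds; without it the rule does not expire.
        cmd += ["--lifetime", str(lifetime_s)]
    # The DID (data identifier) is written as "<scope>:<name>".
    cmd += [f"{scope}:{name}", str(copies), rse_expression]
    return cmd


if __name__ == "__main__":
    # All three values below are placeholders (assumptions) for illustration.
    cmd = build_add_rule_command(
        "my_scope", "my_dataset", "HYPOTHETICAL_LOCAL_RSE", lifetime_s=86400
    )
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; once the scope, dataset, and RSE are known, the same list could be passed to `subprocess.run`.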
I can already create a small bash script for VEGA that symlinks all the dataset files into a txt file, which we would then need to adapt for each of the data centers. Step by step ;-).
Let me know where I can add this script within itwinai.
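For illustration only, here is one way the symlinking step might look in Python rather than bash. It assumes the replicated files are already visible under a mounted RSE path, and it interprets "into a txt file" as writing a manifest that lists the created links. The function name `link_dataset_files` and the manifest name `files.txt` are hypothetical:

```python
from pathlib import Path


def link_dataset_files(source_paths, target_dir, manifest_name="files.txt"):
    """Symlink each source file into target_dir and list the links in a manifest.

    Returns the path of the manifest file. Files sharing the same base name
    would collide; this sketch does not handle that case.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, source_paths):
        link = target / src.name
        if link.is_symlink():
            # Replace a stale link left over from a previous run.
            link.unlink()
        elif link.exists():
            # Never overwrite a real file or directory.
            raise FileExistsError(f"refusing to overwrite {link}")
        link.symlink_to(src)
        links.append(link)
    manifest = target / manifest_name
    # One link path per line, in the order the sources were given.
    manifest.write_text("".join(f"{link}\n" for link in links))
    return manifest
```

The mount point of the RSE differs between data centers, so only the list of source paths would need to change from site to site.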
from itwinai.
I have created a new tutorial folder on a new branch: https://github.com/interTwin-eu/itwinai/tree/156-easily-access-datasets-on-rucio-data-lake/tutorials/data-lake/pull-dataset
@garciagenrique could you please add an example of "option 2" with some documentation? The goal is to give such an example to the interTwin use cases, so that they can reproduce it for their own datasets. A couple of links to the Rucio docs would help as well.
Thanks!