
block-traces's Introduction

Alibaba Block Traces

These traces are published by Alibaba Group to help researchers understand real-world workloads in the cloud.

They were collected from a production cluster of the elastic block service of Alibaba Cloud (i.e., storage for virtual disks). The cluster is located in the Beijing region, one of the most popular regions of Alibaba Cloud.

The traces cover 1000 virtual disks randomly sampled from that cluster, with all of their I/O activity recorded over the month of January 2020. These virtual disks are ultra disk products. Ultra disks are backed by a storage cluster and offer high data reliability. Compared to standard SSD and enhanced SSD disks, ultra disks are cheaper and offer lower random I/O performance. Typical applications of ultra disks include running operating systems, big data processing software, and web servers.

Download

The data are available for download from Alibaba OSS. You will get the download link after taking a short survey. If you have any questions or ideas about the trace data, feel free to contact us. The current maintainer is Chao Shi <chao.shi AT alibaba-inc.com>. We are happy to see research work based on the trace data.

Note that the tarball is very large: 181GB gzip-compressed and 751GB uncompressed. Make sure you have enough space on your disk before downloading.
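A quick pre-flight check can catch a full disk before a long download or extraction fails. The sketch below uses Python's standard `shutil.disk_usage` to compare free space against the sizes quoted above. It assumes "GB" means 10^9 bytes; the function names and the `margin` parameter are illustrative and not part of the trace distribution.

```python
import shutil

# Sizes quoted in this README. Treating "GB" as 10**9 bytes is an assumption.
TARBALL_BYTES = 181 * 10**9
EXTRACTED_BYTES = 751 * 10**9


def required_bytes(keep_tarball: bool = True, margin: float = 0.05) -> int:
    """Bytes needed to extract the traces, plus a fractional safety margin."""
    base = EXTRACTED_BYTES + (TARBALL_BYTES if keep_tarball else 0)
    return int(base * (1 + margin))


def has_room(path: str, keep_tarball: bool = True, margin: float = 0.05) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(keep_tarball, margin)


if __name__ == "__main__":
    print("enough space:", has_room("."))
```

Setting `keep_tarball=False` models the case where the tarball is stored on a different filesystem or streamed rather than kept next to the extracted files.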

Here are the MD5 checksums of the tarball and the files inside.

| Filename | MD5 checksum |
| --- | --- |
| alibaba_block_traces_2020.tar.gz | 95780fc531a60fd4ca0513ef88ef469c |
| io_traces.csv | c60dd8f771738d4d8df56271e56dd308 |
| device_size.csv | 6641abe8a0f3625f13776120d2884e84 |
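Because the files are far too large to read into memory at once, a checksum is best computed in fixed-size chunks. The following is a minimal sketch using Python's standard `hashlib`. The expected values are copied from the list above; the helper names (`md5_of_stream`, `md5_of_file`, `verify`) and the 1 MiB chunk size are illustrative choices.

```python
import hashlib
from typing import BinaryIO

# Expected checksums, copied from this README.
EXPECTED_MD5 = {
    "alibaba_block_traces_2020.tar.gz": "95780fc531a60fd4ca0513ef88ef469c",
    "io_traces.csv": "c60dd8f771738d4d8df56271e56dd308",
    "device_size.csv": "6641abe8a0f3625f13776120d2884e84",
}


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream chunk by chunk so memory use stays constant."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


def verify(path: str, name: str) -> bool:
    """Compare the file at `path` against the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("alibaba_block_traces_2020.tar.gz", "alibaba_block_traces_2020.tar.gz")` should return `True` for an intact download placed in the current directory.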

Schema

There are two files in CSV format. Their formats are defined as follows.

io_traces.csv

Each row is a read or write operation.

| Column | Type | Example | Description |
| --- | --- | --- | --- |
| device_id | uint32 | 0 | ID of the virtual disk |
| opcode | char | R | Either 'R' or 'W', indicating whether this operation is a read or a write |
| offset | uint64 | 126703644672 | Offset of this operation, in bytes |
| length | uint32 | 4096 | Length of this operation, in bytes |
| timestamp | uint64 | 1577808000000626 | Time at which the server received this operation, in microseconds |
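Given the size of `io_traces.csv`, rows are best parsed as a stream rather than loaded in full. The sketch below is one way to do that with Python's standard `csv` module. It rests on several assumptions not stated in the schema: the file is comma-delimited, has no header row, lists columns in the order shown above, and `timestamp` counts microseconds since the Unix epoch. The names `IoRecord`, `read_io_traces`, and `to_datetime` are illustrative.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Iterator, NamedTuple, TextIO


class IoRecord(NamedTuple):
    device_id: int
    opcode: str        # 'R' or 'W'
    offset: int        # bytes
    length: int        # bytes
    timestamp_us: int  # microseconds (assumed: since the Unix epoch)


def read_io_traces(stream: TextIO) -> Iterator[IoRecord]:
    """Yield one IoRecord per CSV row without loading the whole file."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        device_id, opcode, offset, length, timestamp = row
        if opcode not in ("R", "W"):
            raise ValueError(f"unexpected opcode: {opcode!r}")
        yield IoRecord(int(device_id), opcode, int(offset), int(length), int(timestamp))


_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(timestamp_us: int, tz: timezone = timezone.utc) -> datetime:
    """Convert epoch microseconds to an aware datetime using integer arithmetic."""
    return (_EPOCH + timedelta(microseconds=timestamp_us)).astimezone(tz)


# The example row from the schema above.
sample = io.StringIO("0,R,126703644672,4096,1577808000000626\n")
for record in read_io_traces(sample):
    print(record, to_datetime(record.timestamp_us).isoformat())
```

Under the epoch assumption, the example timestamp falls on 2019-12-31 16:00:00 UTC, which is midnight on 2020-01-01 at UTC+8. That is consistent with a trace covering January 2020 in Beijing local time, though the time zone is not stated in the source.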

device_size.csv

Each row is a device with its capacity.

| Column | Type | Example | Description |
| --- | --- | --- | --- |
| device_id | uint32 | 0 | ID of the virtual disk |
| capacity | uint64 | 536870912000 | Capacity of the virtual disk, in bytes |

All IDs of virtual disks are re-mapped to the range of 0 - 999.
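The two files can be joined on `device_id`, for instance to sanity-check that a request lies inside its device. The following sketch loads `device_size.csv` into a dictionary and tests `offset + length <= capacity`. It assumes a comma-delimited file with no header row; the bounds check itself is a plausibility test suggested here, not a guarantee documented for the traces, and the function names are illustrative.

```python
import csv
import io
from typing import Dict, TextIO


def load_capacities(stream: TextIO) -> Dict[int, int]:
    """Map device_id -> capacity in bytes, checking the documented ID range."""
    capacities: Dict[int, int] = {}
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        device_id, capacity = int(row[0]), int(row[1])
        if not 0 <= device_id <= 999:
            raise ValueError(f"device_id out of range: {device_id}")
        capacities[device_id] = capacity
    return capacities


def within_capacity(capacities: Dict[int, int], device_id: int,
                    offset: int, length: int) -> bool:
    """Return True if the request [offset, offset + length) fits on the device."""
    return offset + length <= capacities[device_id]


# The example rows from the two schema tables.
capacities = load_capacities(io.StringIO("0,536870912000\n"))
print(within_capacity(capacities, 0, 126703644672, 4096))
```

A dictionary is sufficient here because there are only 1000 devices, so the whole capacity table fits comfortably in memory while `io_traces.csv` is streamed.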

Research outcome

Here is a list of research work based on the trace data. If your paper uses the data, it would be great to let us know and add your work to this list.

The Alibaba Innovative Research (AIR) program sponsors research every year in various areas of computer science that solves real problems in industry scenarios. If you have ideas and are interested in participating in this program, feel free to contact Chao Shi <chao.shi AT alibaba-inc.com>.

Acknowledgements

Thanks to Qiuping Wang and Jinhong Li from the Chinese University of Hong Kong for analyzing and validating the data at an early stage.

License

The trace data and document are licensed under CC-4.0.


block-traces's Issues

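How the dataset was collected

Thanks to the researchers for their contribution.

This trace has been very helpful for my research!

Which tool was used to collect this data? If it was an in-house tool, which open-source tools can collect records similar to those in this trace?

Thanks!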

.bib file support?

I used this trace for my paper, and I want to cite it. The problem is, I'm not sure which one I should cite.
So, should I cite this GitHub repository, or "An In-Depth Analysis of Cloud Block Storage Workloads in Large-Scale Production" by J. Li? And can you provide a BibTeX template for this trace?

Download speed is low

Hello,
I am trying to download the traces, but the download rate is very low even though I have high-speed internet.

Is there any other link that I can use to download the traces?

--
Iacovos
