ansible-role-beegfs's Introduction

stackhpc.beegfs

This Ansible role can be used to create and destroy a BeeGFS cluster. BeeGFS is a parallel file system that spreads user data across multiple servers and is designed to scale in both performance and capacity. Learn more about BeeGFS at https://www.beegfs.io/.

The role was last tested using Ansible version 2.5.0.

Example

Say we have an inventory that looks like this (inventory-beegfs):

[leader]
bgfs1 ansible_host=172.16.1.1 ansible_user=centos

[follower]
bgfs2 ansible_host=172.16.1.2 ansible_user=centos

[cluster:children]
leader
follower

[cluster_beegfs_mgmt:children]
leader

[cluster_beegfs_mds:children]
leader

[cluster_beegfs_oss:children]
leader
follower

[cluster_beegfs_client:children]
leader
follower

And a corresponding playbook like this (beegfs.yml):

---
- hosts:
  - cluster_beegfs_mgmt
  - cluster_beegfs_mds
  - cluster_beegfs_oss
  - cluster_beegfs_client
  roles:
  - role: stackhpc.beegfs
    beegfs_enable:
      admon: false
      mgmt: "{{ inventory_hostname in groups['cluster_beegfs_mgmt'] }}"
      meta: "{{ inventory_hostname in groups['cluster_beegfs_mds'] }}"
      oss: "{{ inventory_hostname in groups['cluster_beegfs_oss'] }}"
      tuning: "{{ inventory_hostname in groups['cluster_beegfs_oss'] }}"
      client: "{{ inventory_hostname in groups['cluster_beegfs_client'] }}"
    beegfs_oss:
    - dev: "/dev/sdb"
      port: 8003
    - dev: "/dev/sdc"
      port: 8103
    - dev: "/dev/sdd"
      port: 8203
    beegfs_mgmt_host: "{{ groups['cluster_beegfs_mgmt'] | first }}"
    beegfs_client:
    - path: "/mnt/beegfs"
      port: 8004
    beegfs_fstype: "xfs"
    beegfs_force_format: false
    beegfs_interfaces: ["ib0"]
    beegfs_rdma: true
    beegfs_state: present
...

To create a cluster:

# ansible-playbook beegfs.yml -i inventory-beegfs -e beegfs_state=present

To destroy a cluster:

# ansible-playbook beegfs.yml -i inventory-beegfs -e beegfs_state=absent

Notes

To enable or disable individual BeeGFS services, set the corresponding toggle under beegfs_enable to true or false:

  • mgmt: Management server (at least one host)
  • meta: Metadata storage server nodes
  • oss: Object storage server nodes
  • client: Clients of the BeeGFS storage cluster
  • admon: NOT IMPLEMENTED
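For example, hosts that should only mount an existing file system might enable just the client toggle. This is a sketch rather than a tested configuration: it reuses the beegfs_enable keys and variables from the example playbook, and assumes that any variables not set here fall back to the role defaults and that the management service is already running on the first cluster_beegfs_mgmt host.

```yaml
---
- hosts: cluster_beegfs_client
  roles:
  - role: stackhpc.beegfs
    # Only the client service is enabled on these hosts.
    beegfs_enable:
      admon: false
      mgmt: false
      meta: false
      oss: false
      tuning: false
      client: true
    # Assumption: the management service already runs on this host.
    beegfs_mgmt_host: "{{ groups['cluster_beegfs_mgmt'] | first }}"
    beegfs_client:
    - path: "/mnt/beegfs"
      port: 8004
    beegfs_state: present
...
```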

This role depends on each node's hostname resolving to the IP address used to reach the management host, as configured via beegfs_mgmt_host. In this example, bgfs1 and bgfs2 must resolve to 172.16.1.1 and 172.16.1.2 respectively. This can be done via DNS or /etc/hosts.
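If name resolution is handled through /etc/hosts, a playbook along these lines could populate it from the example inventory. It is a minimal sketch, not part of the role, assuming that each inventory_hostname matches the node's real hostname and that ansible_host holds the address the other nodes should use.

```yaml
---
- hosts: cluster
  become: true
  tasks:
  - name: Map each cluster hostname to its ansible_host address
    lineinfile:
      path: /etc/hosts
      # Replace an existing line ending in this hostname, or append a new one.
      regexp: '\s{{ item | regex_escape }}$'
      line: "{{ hostvars[item].ansible_host }} {{ item }}"
    loop: "{{ groups['cluster'] }}"
...
```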

When provisioning the cluster, if a specified block device already contains a file system or is otherwise not empty, you must force-format it by setting beegfs_force_format to true. THIS WILL DELETE THE CONTENT OF THE DISK(S). Make sure you have backups if you care about their content.

Partitions are supported, but they must already have been created by some other means. You will also need to override the variable beegfs_oss_tunable with a list of the parent block devices, since partitions do not appear under /sys/block/. For example, to create partitions with the Ansible parted module (Ansible 2.5+), you can run the following playbook:

---
- hosts:
  - cluster_beegfs_oss
  vars:
    partitions:
    - dev: /dev/sdb
      start: 0%
      end: 50%
      number: 1
    - dev: /dev/sdb
      start: 50%
      end: 100%
      number: 2
  tasks:
  - name: Create partitions
    parted:
      label: gpt
      state: present
      part_type: primary
      device: "{{ item.dev }}"
      part_start: "{{ item.start }}"
      part_end: "{{ item.end }}"
      number: "{{ item.number }}"
    with_items: "{{ partitions }}"
    become: true
...
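Once the partitions exist, the role variables in beegfs.yml might point at them as sketched below. This assumes that beegfs_oss_tunable takes bare parent device names as listed under /sys/block/ (here sdb); check the role's defaults/main.yml for the exact format before relying on it. The ports are simply reused from the example playbook.

```yaml
# Role variables (under the stackhpc.beegfs role entry in beegfs.yml).
beegfs_oss:
- dev: "/dev/sdb1"   # partition 1 created above
  port: 8003
- dev: "/dev/sdb2"   # partition 2 created above
  port: 8103
# Assumption: parent device name as it appears under /sys/block/.
beegfs_oss_tunable:
- "sdb"
```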

Tests

Some tests are provided in the molecule folder. To run them locally you need:

Once you have all the dependencies installed, you can run the tests from the root folder of the role:

$> molecule lint
$> molecule test
$> molecule test -s vagrant-ubuntu-16.04
$> molecule test -s vagrant-ubuntu-18.04
  • The default molecule scenario tests the role on a CentOS 7.5 machine.
  • All the tests deploy all the services on a single machine.
  • YAML lint and Ansible lint checks are run.
  • Idempotence is checked.
  • Once the execution finishes, Testinfra tests are run. All the scenarios use the same tests, located in molecule/tests.

ansible-role-beegfs's People

Contributors

brtkwr, markgoddard, oneswig, pescobar, shadowphax


ansible-role-beegfs's Issues

404 error for Debian repo because of a typo

The default URL for the Debian repo is incorrect because it uses an underscore instead of a dash:

Bad: https://www.beegfs.io/release/latest-stable/dists/beegfs_deb9.list
Good: https://www.beegfs.io/release/latest-stable/dists/beegfs-deb9.list

I replaced it in my setup in stackhpc.beegfs/defaults/main.yml and it worked.

The weird thing is that for RHEL 7 only, it IS an underscore and not a dash (check the file listing at https://www.beegfs.io/release/beegfs_7_1/dists/ and see for yourself) :-)

Support for multiple versions of BeeGFS

It would be useful to support the installation of multiple major release versions of BeeGFS. The default should be the latest the role is aware of, but older versions would also be useful.

This is likely to affect the deployment task logic in some areas (e.g., the process for RDMA enablement), and will also affect the package repo we pull packages from.

Support should be constrained to a limited number of releases, to keep the test matrix under control.

Two flags for RDMA enablement

It looks like I inadvertently introduced a second flag for enabling RDMA support, which ought to be deprecated and removed.

We should standardise on one of beegfs_enable.rdma and beegfs_rdma (and the latter has the precedent).

For backwards compatibility both forms should work (provided they don't conflict) but only one should be documented and used in the task logic.
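The backwards-compatible handling proposed above could look roughly like the following sketch. It is not current role code; beegfs_rdma_resolved is a hypothetical fact name, and the sketch assumes beegfs_rdma always has a value (for example from the role defaults).

```yaml
- name: Fail if both RDMA flags are set but disagree
  assert:
    that:
    - beegfs_enable.rdma is not defined or (beegfs_enable.rdma | bool) == (beegfs_rdma | bool)
    msg: "beegfs_enable.rdma and beegfs_rdma conflict; set only beegfs_rdma"

- name: Derive a single RDMA setting for the task logic
  set_fact:
    # Hypothetical name; uses the deprecated flag only if it was given.
    beegfs_rdma_resolved: "{{ beegfs_enable.rdma | default(beegfs_rdma) | bool }}"
```

Task logic would then test only beegfs_rdma_resolved, so the deprecated flag can later be dropped in one place.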

Scaling without data loss

There is a report of data loss when scaling up the number of nodes with v19.8.1. This appears to happen even when beegfs_force_format is set to false.

Fails if kernel not updated

Failed in TASK [stackhpc.beegfs : Ensure kernel development headers are present] with "No package matching 'kernel-devel-3.10.0-957.1.3.el7.x86_64' found available, installed or updated".

The package name is constructed in client.yml from ansible_kernel (the output of uname -r), but if the running kernel is old, the matching header package may no longer be in the repo. In this case the available package was 3.10.0-1062.9.1.el7, so the kernel was only behind in build number.

However, including a kernel update in this role seems inadvisable!
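Instead of updating the kernel, a pre-flight check could fail early with a clearer message. A minimal sketch, not part of the role, assuming a yum-based host such as CentOS 7 and relying on yum list exiting non-zero when no package matches; kernel_devel_lookup is an illustrative name.

```yaml
- name: Look for kernel-devel matching the running kernel
  command: "yum list kernel-devel-{{ ansible_kernel }}"
  register: kernel_devel_lookup
  changed_when: false
  failed_when: false

- name: Fail if no matching kernel-devel package is installed or available
  assert:
    that:
    - kernel_devel_lookup.rc == 0
    msg: >-
      No kernel-devel package matches the running kernel ({{ ansible_kernel }}).
      Update the kernel and reboot before running the role.
```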

BeeGFS is sensitive to hostname

I ended up spending another couple of hours trying to work out why the /mnt/openhpc mount was failing. In the end, the cause was a badly configured /etc/hosts file that had a different IP address for the openhpc-login-0 node, which hosts the management server. This issue is mainly here as a cautionary tale. I'm not sure it can be fully resolved, but we could probably add some simple smoke tests that run before provisioning begins.
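A smoke test along these lines might catch such a mismatch before provisioning. It is a minimal sketch, not part of the role, assuming every host in the cluster group has ansible_host set to the address the other nodes should use and that glibc's getent is available. If a name cannot be resolved at all, the first task fails outright, which is also a useful signal.

```yaml
---
- hosts: cluster
  gather_facts: false
  tasks:
  - name: Resolve every cluster hostname from this node (IPv4 only)
    command: "getent ahostsv4 {{ item }}"
    loop: "{{ groups['cluster'] }}"
    register: host_lookup
    changed_when: false

  - name: Check the first resolved address matches ansible_host
    assert:
      that:
      - item.stdout.split() | first == hostvars[item.item].ansible_host
      msg: >-
        {{ item.item }} resolves to {{ item.stdout.split() | first }},
        expected {{ hostvars[item.item].ansible_host }}
    loop: "{{ host_lookup.results }}"
...
```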

Resilience to hostname change

If the hostname changes after a reboot, for example from openhpc-login-0.novalocal to openhpc-login-0, the management server fails to start.

Support single port per OSS

At the moment, each block device on an OSS can be configured to use its own unique port. Supporting a single port per OSS would allow comparison between multi-port and single-port configurations.

Filesystem creation attempted on mounted device when using /dev/disk/ symlinks

If you set the dev attribute of one of the OSS devices to a symlink in /dev/disk/ (e.g. /dev/disk/by-id/<id>) and then run the role with the file system already mounted, the following task fails:

TASK [roles/stackhpc.beegfs : Attempt to format if the device is not mounted or if beegfs_force_format is true] **********************************************************************************************
fatal: [host]: FAILED! => {"changed": false, "cmd": "/sbin/mkfs.xfs -f -K -d su=128k,sw=8 -l version=2,su=128k -isize=512 /dev/disk/by-path/pci-0000:18:00.0-sas-phy4-lun-0", "msg": "mkfs.xfs: cannot open /dev/disk/by-path/pci-0000:18:00.0-sas-phy4-lun-0: Device or resource busy", "rc": 1, "stderr": "mkfs.xfs: cannot open /dev/disk/by-path/pci-0000:18:00.0-sas-phy4-lun-0: Device or resource busy\n", "stderr_lines": ["mkfs.xfs: cannot open /dev/disk/by-path/pci-0000:18:00.0-sas-phy4-lun-0: Device or resource busy"], "stdout": "", "stdout_lines": []}

This is because the task only runs when the device is not mounted, and the mount list shows the device the symlink resolves to (for example /dev/sdb) rather than the /dev/disk/ symlink path, so the mounted device is not detected.
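One possible fix is to resolve the path before comparing it with the mount list. A minimal sketch, not the role's current code, assuming facts have been gathered (for ansible_mounts) and that beegfs_oss entries have a dev key as in the example playbook; oss_dev_realpath is an illustrative name.

```yaml
- name: Resolve each OSS device path, following /dev/disk/ symlinks
  command: "readlink -f {{ item.dev }}"
  loop: "{{ beegfs_oss }}"
  register: oss_dev_realpath
  changed_when: false

- name: Report whether each configured OSS device is mounted
  debug:
    msg: >-
      {{ item.item.dev }} resolves to {{ item.stdout }};
      mounted: {{ item.stdout in (ansible_mounts | map(attribute='device') | list) }}
  loop: "{{ oss_dev_realpath.results }}"
```

The format task's when condition could then test the resolved path instead of item.dev. Devices exposed through device-mapper (for example LVM) may still appear under a different name in the mount list.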

Custom rights on mounts and config files

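Allow custom permissions on the BeeGFS mount point and client config files. The relevant tasks currently hard-code the modes: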

- name: Ensure the BeeGFS mount point exists
  file:
    mode: 0755
    path: "{{ client_path }}"
    state: directory
  become: true
  notify: Restart BeeGFS client service

- name: Copy over beegfs-mounts config file
  template:
    src: beegfs-mounts.conf.j2
    dest: /etc/beegfs/beegfs-mounts.conf
    mode: 0644
  become: true
  notify: Restart BeeGFS client service

- name: Make of copy of BeeGFS client config file if it doesn't exist
  copy:
    mode: 0644
    remote_src: true
    src: /etc/beegfs/beegfs-client.conf
    dest: "/etc/beegfs/{{ beegfs_client_config_file }}"
    force: false
  become: true
  when: beegfs_client_scope_config | bool
  notify: Restart BeeGFS client service

For security, it might be necessary to remove world permissions, for example via new variables such as:

  • beegfs_client_config_mode
  • beegfs_client_mount_mode
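A sketch of how the proposed variables might be wired into the tasks above. beegfs_client_mount_mode and beegfs_client_config_mode do not exist in the role yet; the default() values simply keep the current modes, so behaviour is unchanged unless they are set.

```yaml
- name: Ensure the BeeGFS mount point exists
  file:
    # Proposed variable; falls back to the current hard-coded mode.
    mode: "{{ beegfs_client_mount_mode | default('0755') }}"
    path: "{{ client_path }}"
    state: directory
  become: true
  notify: Restart BeeGFS client service

- name: Copy over beegfs-mounts config file
  template:
    src: beegfs-mounts.conf.j2
    dest: /etc/beegfs/beegfs-mounts.conf
    # Proposed variable; falls back to the current hard-coded mode.
    mode: "{{ beegfs_client_config_mode | default('0644') }}"
  become: true
  notify: Restart BeeGFS client service
```

The client config copy task could use beegfs_client_config_mode in the same way. A site wanting to drop world read access might then set, for example, beegfs_client_config_mode: "0640".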
