
xenvm


Support tools for a thin lvhd implementation as described in the design doc.

To set up a test environment, run:

$ sudo ./setup.sh

This will:

  1. create a sparse file to simulate a large LUN, attached via /dev/loop0
  2. format the LUN for "XenVM": like LVM, but with a built-in redo-log and operation journalling
  3. create the metadata volumes for a single client host ("host1")
  4. create 1000 LVs as a micro-benchmark
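The first step — a sparse file that behaves like a large LUN — can be sketched as follows (the path and size here are illustrative, not the ones setup.sh actually uses; setup.sh then presumably attaches such a file to /dev/loop0 with losetup):

```python
import os

# Create a sparse file to back a loop device: the file reports a large
# size but consumes almost no disk space until blocks are written.
# Path and size are illustrative assumptions.
path = "/tmp/bigdisk.img"
size = 100 * 1024**3  # pretend to be a 100 GiB LUN

with open(path, "wb") as f:
    f.truncate(size)  # extends the file without allocating data blocks

st = os.stat(path)
print(st.st_size)          # 107374182400 (apparent size)
print(st.st_blocks * 512)  # close to 0: no data blocks allocated yet
```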

You can then query the state of the system with:

$ ./xenvm.native lvs
$ ./xenvm.native host-list

In another terminal start the local-allocator:

$ sudo ./local-allocator.native

This will take a few seconds to complete its handshake. You can then type in the name of a dm-device to request more space. Type in djstest-live: you will see the allocator take space from its local thin-pool, send the update to the master, and update the local device-mapper device.

To shut everything down, run:

$ sudo ./clean.sh

Note that a clean shutdown requires local-allocators to be online and responding to the handshake.

xenvm's People

Contributors

cheng-z, djs55, euanh, jonludlam, simonjbeaumont


xenvm's Issues

Need these LVM commands:

  • lvchange /dev/vg/lvname -p {r/rw}
  • lvchange --addtag | --deltag 'hidden'
  • lvchange --refresh
  • lvcreate -n [-l <size_in_percentage> | -L <size_in_mb>] vgname --addtag
  • lvdisplay <vg/lv> or </dev/vg/lv> or </dev/mapper/vg--lv> etc, output ignored
  • lvdisplay -c - comma separated, used to get size (7th field, in sectors)
  • lvremove -f --config devices{<devices?>}
  • lvrename vg/lvname newname
  • lvresize -L newsize /dev/vg/lvname
  • lvs --noheadings /dev/vg/lvname (output ignored)
  • lvs --noheadings --units b -o +lv_tags -- eg:
    test-lv-999 vg -wi-a---- 4194304B foo,bar,bzqa
  • pvcreate -ff -y --metadatasize 10M /dev/path
  • pvremove device
  • pvresize device
  • pvs --noheadings --nosuffix --units b - use size (field 5) and free space (field 6)
  • pvs --noheadings -o vg_name <pv name?>
  • pvs
  • vgchange -a[n|y] vgname
  • vgcreate vgname /root/device
  • vgextend vgname /dev/path
  • vgremove vgname
  • vgs (output ignored)
  • vgs --noheadings --nosuffix --units b - use size (field 6) and free space (field 7)
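Several of these commands are consumed by callers that parse whitespace-separated fields (e.g. pvs size and free space above). A minimal sketch of that style of parsing, against an invented sample line:

```python
# A line in the shape produced by `pvs --noheadings --nosuffix --units b`;
# the values here are invented for illustration.
line = "  /dev/loop0 vg lvm2 a-- 107374182400 53687091200"

fields = line.split()
size_bytes = int(fields[4])  # field 5: PV size in bytes
free_bytes = int(fields[5])  # field 6: PV free space in bytes

print(size_bytes, free_bytes)  # 107374182400 53687091200
```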

lvchange --addtag

May 10 14:05:31 localhost SM: [5485] LVHDVDI.delete for b51f9ee9-1145-4471-913d-a0071d934a93
May 10 14:05:31 localhost SM: [5485] ['/bin/xenvm', 'lvchange', '--addtag', 'hidden', '/dev/VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb/LV-b51f9ee9-1145-4471-913d-a
0071d934a93']
May 10 14:05:31 localhost SM: [5485] FAILED in util.pread: (rc 1) stdout: '', stderr: 'xenvm: unknown option `--addtag'
May 10 14:05:31 localhost SM: [5485] Usage: xenvm lvchange [OPTION]... NAME
May 10 14:05:31 localhost SM: [5485] Try `xenvm lvchange --help' or `xenvm --help' for more information.
May 10 14:05:31 localhost SM: [5485] '

misunderstanding with lvresize and lvextend

We have two commands, lvresize and lvextend, but both have the same description.

If they behave the same, keep only one; if they differ, the descriptions should explain how.

xenvm lvextend command error message not clear enough

Running lvextend to extend a logical volume beyond the available space produces this error message:
Xenvm_interface.Internal_error("Failure("Only this much space is available: 2035")")

The figure should carry a unit (MiB, GiB, ...) so the value is understandable.
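Formatting the figure is a few lines; a sketch, assuming the bare number 2035 is a count of 4 MiB extents (the extent size reported elsewhere in these logs):

```python
def human_size(n_bytes: int) -> str:
    """Render a byte count with a binary-unit suffix for error messages."""
    value = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024

# Assumption: 2035 is a count of 4 MiB extents.
print(human_size(2035 * 4 * 1024**2))  # 7.9 GiB
```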

SM uses xenvm pvs which is currently not an available option

Encountered it while trying to delete local SR.

Apr 28 14:32:21 localhost SM: [19392] LVMCache: refreshing
Apr 28 14:32:21 localhost SM: [19392] ['/bin/xenvm', 'lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_Xen
Storage-90982251-79e0-8176-010f-ff65bb4c0f1d']
Apr 28 14:32:21 localhost SM: [19392]   pread SUCCESS
Apr 28 14:32:21 localhost SM: [19392] ['/bin/xenvm', 'pvs', '/dev/disk/by-id/ata-WDC_WD2502ABYS-18B7A0_WD-WCAT18036412-par
t3']
Apr 28 14:32:21 localhost SM: [19392] FAILED in util.pread: (rc 1) stdout: '', stderr: 'xenvm: unknown command `pvs'
Apr 28 14:32:21 localhost SM: [19392] Usage: xenvm COMMAND ...
Apr 28 14:32:21 localhost SM: [19392] Try `xenvm --help' for more information.
Apr 28 14:32:21 localhost SM: [19392] '
Apr 28 14:32:22 localhost SM: [19392] Raising exception [93, pvs failed [opterr=error is 1]]
Apr 28 14:32:22 localhost SM: [19392] lock: released /var/lock/sm/90982251-79e0-8176-010f-ff65bb4c0f1d/sr
Apr 28 14:32:22 localhost SM: [19392] ***** generic exception: sr_delete: EXCEPTION <class 'SR.SROSError'>, pvs failed [op
terr=error is 1]
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 28 14:32:22 localhost SM: [19392]     return self._run_locked(sr)
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Apr 28 14:32:22 localhost SM: [19392]     rv = self._run(sr, target)
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/SRCommand.py", line 299, in _run
Apr 28 14:32:22 localhost SM: [19392]     return sr.delete(self.params['sr_uuid'])
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/LVMSR", line 575, in delete
Apr 28 14:32:22 localhost SM: [19392]     lvutil.removeVG(self.root, self.vgname)
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/lvutil.py", line 429, in removeVG
Apr 28 14:32:22 localhost SM: [19392]     opterr='error is %d' % inst.code)
Apr 28 14:32:22 localhost SM: [19392]   File "/opt/xensource/sm/xs_errors.py", line 52, in __init__
Apr 28 14:32:22 localhost SM: [19392]     raise SR.SROSError(errorcode, errormessage)
Apr 28 14:32:22 localhost SM: [19392]
Apr 28 14:32:22 localhost SM: [19392] ***** Local VHD on LVM: EXCEPTION <class 'SR.SROSError'>, pvs failed [opterr=error i
s 1]

local allocator is rounding down extents to 0

It should always round up the number of extents:

No free blocks, sleeping for 5s
FreePool.extents 0 extents from ((pv0 (13 32)) (pv0 (11 1)))
There are 1 items in the journal to replay
((volume
  ((volume VHD-1637be81-1b4d-427b-ae6e-9c131a648e06)
   (segments
    (((start_extent 514) (extent_count 0)
      (cls (Linear ((name pv0) (start_extent 13)))))))))
 (device
  ((extents ((pv0 (13 0))))
   (device
    VG_XenStorage--e7752700--0506--dfee--77d2--4b9cfaa36d91-VHD--1637be81--1b4d--427b--ae6e--9c131a648e06)
   (targets
    (((start 4210688) (size 0)
      (kind
       (Linear
        ((device
          (Path
           /dev/disk/by-id/ata-WDC_WD2502ABYS-18B7A0_WD-WCAT1H344267-part3))
         (offset 127104))))))))))
Suspend local dm device
device-mapper: reload ioctl on VG_XenStorage--e7752700--0506--dfee--77d2--4b9cfaa36d91-VHD--1637be81--1b4d--427b--ae6e--9c131a648e06 failed: Invalid argument
Failed to process journal item: Failure("dm_task_run failed")
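The fix is ceiling division when converting a requested size into an extent count; a minimal sketch:

```python
def bytes_to_extents(n_bytes: int, extent_size: int) -> int:
    """Round a byte count UP to whole extents: any non-zero request
    must never collapse to an extent_count of 0."""
    return -(-n_bytes // extent_size)  # ceiling division on ints

extent_size = 8192 * 512  # 8192 sectors of 512 B = 4 MiB, as in the logs

print(bytes_to_extents(1, extent_size))                # 1, not 0
print(bytes_to_extents(extent_size, extent_size))      # 1
print(bytes_to_extents(extent_size + 1, extent_size))  # 2
```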

FromLVM.suspend failure

When restarting local-allocator after it quits as described in #127, FromLVM.suspend could not complete successfully:

Loaded configuration: ((socket
/var/run/sm/allocator/VG_XenStorage-7fc7a66e-7330-51b6-1221-aff50e05fb95)
(allocation_quantum 16)
(localJournal
/tmp/sm/allocator-journal/VG_XenStorage-7fc7a66e-7330-51b6-1221-aff50e05fb95)
(devices (/dev/disk/by-id/scsi-36001405b4b212f1d57fcd4324d9d51de))
(toLVM d617c250-3704-4b5b-a5ed-5594b29023e4-toLVM)
(fromLVM d617c250-3704-4b5b-a5ed-5594b29023e4-fromLVM))
ToLVM queue is currently Running
Device /dev/disk/by-id/scsi-36001405b4b212f1d57fcd4324d9d51de has 512 byte sectors
The Volume Group has 8192 sector (4 MiB) extents
There are 0 items in the journal to replay
Initialising the FreePool
FromLVM queue is currently Running
Suspending FromLVM queue
FromLVM.suspend got Running; sleeping FromLVM.suspend got Running; sleeping
FromLVM.suspend got Running; sleeping FromLVM.suspend got Running; sleeping
FromLVM.suspend got Running; sleeping FromLVM.suspend got Running; sleeping
FromLVM.suspend got Running; sleeping FromLVM.suspend got Running; sleeping

I restarted it again and got following logs:

Loaded configuration: ((socket
/var/run/sm/allocator/VG_XenStorage-7fc7a66e-7330-51b6-1221-aff50e05fb95)
(allocation_quantum 16)
(localJournal
/tmp/sm/allocator-journal/VG_XenStorage-7fc7a66e-7330-51b6-1221-aff50e05fb95)
(devices (/dev/disk/by-id/scsi-36001405b4b212f1d57fcd4324d9d51de))
(toLVM d617c250-3704-4b5b-a5ed-5594b29023e4-toLVM)
(fromLVM d617c250-3704-4b5b-a5ed-5594b29023e4-fromLVM))
ToLVM queue is currently Running
Device /dev/disk/by-id/scsi-36001405b4b212f1d57fcd4324d9d51de has 512 byte sectors
The Volume Group has 8192 sector (4 MiB) extents
There are 0 items in the journal to replay
Initialising the FreePool
FromLVM queue is currently Running
Suspending FromLVM queue
FromLVM.suspend: retrying after 5s
FromLVM.suspend: retrying after 5s
FromLVM.suspend: retrying after 5s
FromLVM.suspend: retrying after 5s
FromLVM.suspend: retrying after 5s
FromLVM.suspend: retrying after 5s

vgs should probe the disk directly

It's safe to query the disk for (possibly out-of-date) summary statistics in read-only mode, and this lets us probe for existing volume groups without having to start a xenvmd.

(One could argue that it's safer to read the disk than to report 'no volume group found')

lvchange --refresh

May 10 14:17:30 localhost SM: [27805] ['/bin/xenvm', 'lvchange', '-ay', '/dev/VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb/VHD-e66a6dfb-4615-4902-8348-5919d4274768']
May 10 14:17:30 localhost SM: [27805]   pread SUCCESS
May 10 14:17:30 localhost SM: [27805] ['/bin/xenvm', 'lvchange', '--refresh', '/dev/VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb/VHD-e66a6dfb-4615-4902-8348-5919d427
4768']
May 10 14:17:30 localhost SM: [27805] FAILED in util.pread: (rc 1) stdout: '', stderr: 'xenvm: unknown option `--refresh'
May 10 14:17:30 localhost SM: [27805] Usage: xenvm lvchange [OPTION]... NAME
May 10 14:17:30 localhost SM: [27805] Try `xenvm lvchange --help' or `xenvm --help' for more information.
May 10 14:17:30 localhost SM: [27805] '

Bad error message when no xenvmd

Rather than saying:

May 10 20:26:15 localhost SM: [2721] ['/bin/xenvm', 'vgs', 'VG_XenStorage-770cdfa8-ccbf-d209-46ed-72e8e65f926a']
May 10 20:26:15 localhost SM: [2721] FAILED in util.pread: (rc 1) stdout: '', stderr: 'xenvm: internal error, uncaught exception:
May 10 20:26:15 localhost SM: [2721]        Unix.Unix_error(Unix.ECONNRESET, "read", "")

We should say:

[root@st30 thin-lvhd-tools]# vgs foo
  Volume group "foo" not found
  Skipping volume group foo
[root@st30 thin-lvhd-tools]# echo $?
5

Protocol for shutting down the system cleanly

The protocol is:

  1. Xenvmd suspends the incoming ToLVM queues
  2. the local allocator notices on the next poll, acknowledges and exits
  3. Xenvmd flushes updates from the incoming ToLVM queues
  4. Xenvmd flushes the LVM redo log deltas into the primary metadata
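The four steps above can be sketched as a runnable skeleton with stub objects; all class and method names here are hypothetical (the real implementation is OCaml):

```python
log = []

class Queue:
    def suspend(self): log.append("suspend")
    def flush(self): log.append("flush")

class Allocator:
    def wait_for_exit(self): log.append("ack+exit")

class RedoLog:
    def checkpoint(self): log.append("checkpoint")

def shutdown(to_lvm_queues, allocators, redo_log):
    for q in to_lvm_queues:   # 1. suspend the incoming ToLVM queues
        q.suspend()
    for a in allocators:      # 2. each allocator notices on its next
        a.wait_for_exit()     #    poll, acknowledges and exits
    for q in to_lvm_queues:   # 3. flush remaining ToLVM updates
        q.flush()
    redo_log.checkpoint()     # 4. fold redo-log deltas into metadata

shutdown([Queue()], [Allocator()], RedoLog())
print(log)  # ['suspend', 'ack+exit', 'flush', 'checkpoint']
```

Step 2 is why a clean shutdown requires the local-allocators to be online: the sequence blocks until each one acknowledges.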

Possible race in daemon startup?

Perhaps the --daemon returned too quickly? When talking directly to xenvmd, xenvm has a retry loop. I bet xapi hasn't got one of those. Perhaps we should try harder to ensure --daemon returns after the socket is bound?

May 10 12:59:12 localhost SM: [2913] ['/sbin/xenvmd', '--daemon', '--config', '/etc/xenvm.d//VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb.xenvmd.config']
May 10 12:59:12 localhost SM: [2913]   pread SUCCESS
May 10 12:59:12 localhost SM: [2913] ['/bin/xenvm', 'set-vg-info', '--pvpath', '/dev/disk/by-id/scsi-3600507605d0016801b9753d12142850d-part3', '--uri', 'file://local/servi
ces/xenvmd/6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb', '-S', '/var/lib/xcp/xapi', 'VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb']
May 10 12:59:12 localhost SM: [2913]   pread SUCCESS
May 10 12:59:12 localhost SM: [2913] ['/bin/xenvm', 'host-create', 'VG_XenStorage-6006d90b-e8af-7ed7-df89-5c0b0b1ef2eb', 'c64c41dd-88c3-4ccd-8897-4614ead5508f']
May 10 12:59:12 localhost SM: [2913] FAILED in util.pread: (rc 1) stdout: '', stderr: 'xenvm: internal error, uncaught exception:
May 10 12:59:12 localhost SM: [2913]        Unix.Unix_error(Unix.ECONNRESET, "read", "")
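Either --daemon should not return until the socket is bound, or every client needs a retry loop like the one xenvm already has when talking to xenvmd. A sketch of the latter; the socket path and timings are illustrative:

```python
import socket
import time

def connect_with_retry(path, attempts=10, delay=0.5):
    """Keep retrying until the daemon has bound its socket, instead of
    failing on the first connection error. Names/timings are assumptions."""
    for i in range(attempts):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return s
        except OSError:
            s.close()
            if i == attempts - 1:
                raise  # daemon never came up: surface the last error
            time.sleep(delay)
```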

Resize fails when device is active

May  9 18:06:26 st20 SM: [6629] ['/bin/xenvm', 'lvrename', '/dev/VG_XenStorage-b1e34b94-e98b-dd67-6d93-9b57772c93b4/VHD-30dc7e84-b8ae-480a-91ee-7be9be1ca284', 'VHD-50b66606-60b1-4634-b400-56f827b18740']
May  9 18:06:26 st20 SM: [6629]   pread SUCCESS
May  9 18:06:26 st20 SM: [6629] Refcount for lvm-b1e34b94-e98b-dd67-6d93-9b57772c93b4:50b66606-60b1-4634-b400-56f827b18740 set => (1, 0b)
May  9 18:06:26 st20 SM: [6629] ['/usr/bin/vhd-util', 'modify', '--debug', '-s', '8388608', '-n', '/dev/VG_XenStorage-b1e34b94-e98b-dd67-6d93-9b57772c93b4/VHD-50b66606-60b1-4634-b400-56f827b18740']
May  9 18:06:26 st20 SM: [6629]   pread SUCCESS
May  9 18:06:26 st20 SM: [6629] ['/bin/xenvm', 'lvresize', '-L', '8', '/dev/VG_XenStorage-b1e34b94-e98b-dd67-6d93-9b57772c93b4/VHD-50b66606-60b1-4634-b400-56f827b18740']
May  9 18:06:26 st20 SM: [6629] FAILED in util.pread3: (errno 1) stdout: '', stderr: 'device-mapper: remove ioctl on VG_XenStorage--b1e34b94--e98b--dd67--6d93--9b57772c93b4-VHD--50b66606--60b1--4634--b400--56f827b18740 failed: Device or resource busy

This could be a transient caused by parallel udev activity. Perhaps it's better to suspend/reload/resume the device rather than removing and recreating it?

lvdisplay output should be refined

Using lvdisplay to get the details of an LV, the creation time, creation host and open count show as:
LV Creation host, time unknown, unknown

open unknown

Also, some key/value pairs have a ":" between them and some don't; they should all have one.

I think we also need a field showing whether the LV is activated.

Transient failure in host-list

[root@host1 vagrant]# xenvm host-list /dev/VG_XenStorage-35edb984-e8c5-1c57-e2dd-fdd6dd469cda
xenvm: internal error, uncaught exception:
Xenvm_interface.Internal_error("Failure("querying ToLVM state: queue temporarily unavailable")")

Local allocator quits when the requester does not finish the connection

When a requester connected to the socket, sent a resize request, and then closed the connection without reading the response, the local-allocator quit with the following logs:

Calling accept on the socket
FreePool.extents 1 extents from ((pv0 (15 510)))
There are 1 items in the journal to replay
((volume
((volume f4ab6409-53c5-49d2-b665-382358a48e29)
(segments
(((start_extent 56) (extent_count 1)
(cls (Linear ((name pv0) (start_extent 15)))))))))
(device
((extents ((pv0 (15 1))))
(device
/dev/VG_XenStorage-7fc7a66e-7330-51b6-1221-aff50e05fb95/VHD-f4ab6409-53c5-49d2-b665-382358a48e29)
(targets
(((start 458752) (size 8192)
(kind
(Linear
((device
(Path /dev/disk/by-id/scsi-36001405b4b212f1d57fcd4324d9d51de))
(offset 143488))))))))))
Suspend local dm device
Resume local dm device

Bad error message: missing PV and LV columns

[root@st30 thin-lvhd-tools]# xenvm lvs /dev/VG_XenStorage-770cdfa8-ccbf-d209-46ed-72e8e65f926a --options=pv_name
xenvm: internal error, uncaught exception:
       Failure("nth")

[root@st20 ~]# lvs /dev/VG_XenStorage-a1d805e0-89b3-dea9-0ec3-dd928902c330 --noheadings --nosuffix --units=b --options=vg_name,vg_extent_size,lv_count,pv_count,pv_name
  Can't report LV and PV fields at the same time

Test these LVM commands

We need simple regression tests for these LVM commands:

Note: not all of these commands may be implemented yet: see #16

  • lvchange /dev/vg/lvname -p {r/rw}
  • lvchange --addtag | --deltag 'hidden'
  • lvchange --refresh
  • lvcreate -n [-l <size_in_percentage> | -L <size_in_mb>] vgname --addtag
  • lvdisplay <vg/lv> or </dev/vg/lv> or </dev/mapper/vg--lv> etc, output ignored
  • lvdisplay -c - comma separated, used to get size (7th field, in sectors)
  • lvremove -f --config devices{<devices?>}
  • lvrename vg/lvname newname
  • lvresize -L newsize /dev/vg/lvname
  • lvs --noheadings /dev/vg/lvname (output ignored)
  • lvs --noheadings --units b -o +lv_tags -- eg:
    test-lv-999 vg -wi-a---- 4194304B foo,bar,bzqa
  • pvcreate --metadatasize 10M /dev/path
  • pvremove device
  • pvresize device
  • pvs --noheadings --nosuffix --units b - use size (field 5) and free space (field 6)
  • pvs --noheadings -o vg_name <pv name?>
  • pvs
  • vgchange -a[n|y] vgname
  • vgcreate vgname /root/device
  • vgextend vgname /dev/path
  • vgremove vgname
  • vgs (output ignored)
  • vgs --noheadings --nosuffix --units b - use size (field 6) and free space (field 7)

local allocator can allocate forever

We need to make some attempt to prevent the allocator blocking indefinitely for requests it could never satisfy, for example if the VG is full.
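One shape the fix could take: reject outright any request that exceeds the total VG size, and bound the polling for everything else. A sketch with hypothetical names:

```python
def wait_for_space(needed_extents, free_extents, vg_total_extents,
                   poll, max_polls=60):
    """Fail fast on requests that can never be satisfied, and give up
    after a bounded number of polls otherwise. All names hypothetical."""
    if needed_extents > vg_total_extents:
        raise ValueError("request exceeds the volume group size")
    for _ in range(max_polls):
        if free_extents() >= needed_extents:
            return True
        poll()  # real allocator: sleep 5s and re-read the free pool
    raise TimeoutError(f"no free blocks after {max_polls} polls")

# Example: space frees up after two polls.
state = {"free": 0}
print(wait_for_space(2, lambda: state["free"], 100,
                     poll=lambda: state.update(free=state["free"] + 1)))
# True
```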
