gsi-hpc / ltsm

11.0 5.0 7.0 1.04 MB

LTSM - Lightweight TSM API, Lustre TSM Copytool and TSM Console Client for Archiving Data

License: GNU General Public License v2.0

Makefile 1.77% Shell 21.98% C 70.47% Emacs Lisp 0.16% M4 5.63%

tsm lustre tsm-server lustre-copytool

ltsm's People

ironmann dariaphoebe joergbehrendt cilesiz phully munken inkdot7

ltsm's Issues

[tsmapi] Provide low-level ssize_t write(struct session_t *session, const void *buf, size_t count) call

One of the GSI experiments asked for a plain, low-level ssize_t write(...) function call, so that data received as a stream can be written seamlessly to the TSM server, also as a stream. This is required when the particle accelerator is running and experimental data is gathered as a data stream.

Store files in TSM based on a UUID

Modify this tool so that files can be stored in TSM under a UUID that is attached to the file via an extended attribute. There are issues with using the file path, as well as the FID, as the ID in TSM, and the recommended approach seems to be to use an xattr UUID to track files between HSM levels.

Robinhood already has support for this method.

[tsmapi] dsmSendData: tx and data_sent checking

I think we're not handling two aspects of dsmSendData() API#141:

stopping the tx if DSM_RC_WILL_ABORT is returned
checking whether the whole buffer was transmitted by looking at data_blk.numBytes after the call. I think it's just a socket call, so it might not take all the data at once.
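The two checks can be sketched as a send loop. This is only an illustration under stated assumptions: send_fn, send_all and the RC_* values are hypothetical stand-ins so that the example compiles without the TSM headers. They are not the dsmSendData() signature or the real DSM_RC_* constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TSM return codes (real values live in the TSM API headers). */
enum { RC_OK = 0, RC_WILL_ABORT = 1, RC_ERROR = 2 };

/* Hypothetical sender: may accept fewer than len bytes and reports the count via *sent. */
typedef int (*send_fn)(const char *buf, size_t len, size_t *sent);

/* Send the whole buffer, resubmitting the unsent tail.
 * Stops immediately when the sender reports anything but RC_OK,
 * e.g. that the transaction will abort. */
static int send_all(send_fn send, const char *buf, size_t len, size_t *total)
{
	assert(send != NULL && total != NULL);
	*total = 0;
	while (*total < len) {
		size_t sent = 0;
		int rc = send(buf + *total, len - *total, &sent);

		if (rc != RC_OK)
			return rc;
		if (sent == 0)
			return RC_ERROR; /* no progress, avoid looping forever */
		*total += sent;
	}
	return RC_OK;
}

/* Test double: accepts at most 4 bytes per call. */
static int send_max4(const char *buf, size_t len, size_t *sent)
{
	(void)buf;
	*sent = len < 4 ? len : 4;
	return RC_OK;
}

/* Test double: reports an aborting transaction on the first call. */
static int send_abort(const char *buf, size_t len, size_t *sent)
{
	(void)buf;
	(void)len;
	*sent = 0;
	return RC_WILL_ABORT;
}
```

With send_max4, a 10-byte buffer goes out in three calls. With send_abort, the loop stops before any byte is counted.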

What do you think?

I've started to convert tsm_archive_generic() into an FSM, as described in API#68. Hopefully it'll make it easier to support batched transactions.

Working with archive/released stub files

Currently we have a Lustre subdirectory where users can archive data to TSM. Once a file is archived, and especially once it is released, it can no longer be restored after being moved, because TSM doesn't know about the file's new path.

I'm looking for a way to properly handle stub files that are archived/released to TSM. I understand that, due to the way TSM stores files, stub files that are moved or renamed get "lost". Is there a way to handle this that makes for a good user experience, possibly through Robinhood or through changes to the way we move files in and out of TSM?

[copytool] limit number of requests in work-queue

Lustre MDT has the max_requests hsm parameter, but it only applies to the corresponding MDT HSM coordinator. This makes it tricky to configure properly, because it's not connected to the number of registered copytools. Ideally, it should be set to some value above the total number of threads in all of the registered copytools.

Since there's a good chance of misconfiguration, I propose we limit the number of requests we take from HSM. How about setting the limit to 2 x thread_count?
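A minimal sketch of such a limit, assuming a simple admission counter in front of the work queue. The struct and function names are hypothetical and not taken from the copytool. The counter is not thread-safe as written, so a real copytool would have to guard it with the queue's mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical admission counter for the copytool work queue. */
struct queue_limit {
	size_t in_flight; /* requests accepted but not yet finished */
	size_t max;       /* upper bound: 2 x thread_count */
};

static void queue_limit_init(struct queue_limit *q, size_t thread_count)
{
	assert(q != NULL && thread_count > 0);
	q->in_flight = 0;
	q->max = 2 * thread_count;
}

/* Returns 0 if the request may be queued, -EAGAIN if the limit is reached. */
static int queue_limit_try_accept(struct queue_limit *q)
{
	assert(q != NULL);
	if (q->in_flight >= q->max)
		return -EAGAIN;
	q->in_flight++;
	return 0;
}

/* Called by a consumer thread after it finished one request. */
static void queue_limit_done(struct queue_limit *q)
{
	assert(q != NULL && q->in_flight > 0);
	q->in_flight--;
}
```

With two threads the counter admits four requests and rejects the fifth until one of them completes.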

[qtable] deduplication not working on objects with same date

If two TSM objects with the same fs/hl/ll key and the same insertion date are processed by the remove_older_obj function in qtable.c, both are kept in the hashtable.
This causes the fileapi and my sanity script added in 3abba03 to fail, because I always expect exactly one returned object.

Simply changing the comparison from 'new.date > old.date' to greater than or equal fixed this problem for me, but it leads to errors in the test_qtable testsuite.

[tsmapi] Get rid of qarray and use chained hash tables

Currently, an array (called qarray) is employed to store and retrieve qryRespArchiveData. The qarray data structure is a simple, trivial version of a C++ vector:

/* Increase length (capacity) by factor of 2 when qarray is full. */
if ((*qarray)->N >= (*qarray)->capacity) {
	/* Keep the old pointer until realloc succeeds, so it is not leaked on failure. */
	qryRespArchiveData *data = realloc((*qarray)->data,
					   sizeof(qryRespArchiveData) *
					   (*qarray)->capacity * 2);
	if (data == NULL) {
		CT_ERROR(errno, "realloc");
		return DSM_RC_UNSUCCESSFUL;
	}
	(*qarray)->data = data;
	(*qarray)->capacity *= 2;
}

where the capacity is simply doubled when the array has no space left. This design was chosen to straightforwardly enable a sort operation on the restore order field:

int cmp_restore_order(const void *a, const void *b)
{
	const qryRespArchiveData *query_data_a = (qryRespArchiveData *)a;
	const qryRespArchiveData *query_data_b = (qryRespArchiveData *)b;

	if (query_data_a->restoreOrderExt.top > query_data_b->restoreOrderExt.top)
		return(DS_GREATERTHAN);
	else if (query_data_a->restoreOrderExt.top < query_data_b->restoreOrderExt.top)
		return(DS_LESSTHAN);
	else if (query_data_a->restoreOrderExt.hi_hi > query_data_b->restoreOrderExt.hi_hi)
		return(DS_GREATERTHAN);
	else if (query_data_a->restoreOrderExt.hi_hi < query_data_b->restoreOrderExt.hi_hi)
		return(DS_LESSTHAN);
	else if (query_data_a->restoreOrderExt.hi_lo > query_data_b->restoreOrderExt.hi_lo)
		return(DS_GREATERTHAN);
	else if (query_data_a->restoreOrderExt.hi_lo < query_data_b->restoreOrderExt.hi_lo)
		return(DS_LESSTHAN);
	else if (query_data_a->restoreOrderExt.lo_hi > query_data_b->restoreOrderExt.lo_hi)
		return(DS_GREATERTHAN);
	else if (query_data_a->restoreOrderExt.lo_hi < query_data_b->restoreOrderExt.lo_hi)
		return(DS_LESSTHAN);
	else if (query_data_a->restoreOrderExt.lo_lo > query_data_b->restoreOrderExt.lo_lo)
		return(DS_GREATERTHAN);
	else if (query_data_a->restoreOrderExt.lo_lo < query_data_b->restoreOrderExt.lo_lo)
		return(DS_LESSTHAN);
	else
		return(DS_EQUAL);
}

The TSM server allows data to be archived multiple times under the same "fs/hl/ll". In the query operation (when no date information is provided and the latest flag is set), we fill qarray with the most recent qryRespArchiveData. Therefore a hashtable is currently also used to look up which version was already queried. It makes more sense to store the query results directly in chained hash tables (buckets of linked lists). To enable sorting, one can either convert the chained hash tables into a fixed array and sort that, or into a single linked list and sort the linked list (e.g. as described in LinkedListProblems). Note that another, but inefficient, method is to query each time before archiving; however, that is certainly not an elegant solution.
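The first variant (chained hash table, flattened into an array, then sorted) could look roughly as follows. This is a sketch under assumptions: struct obj is a simplified, hypothetical stand-in for qryRespArchiveData, a single order field replaces restoreOrderExt, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16

/* Hypothetical, simplified stand-in for qryRespArchiveData. */
struct obj {
	char key[64];     /* "fs/hl/ll" */
	uint64_t date;    /* insertion date */
	uint64_t order;   /* single-field stand-in for restoreOrderExt */
	struct obj *next; /* chain within one bucket */
};

struct table {
	struct obj *bucket[NBUCKETS];
	size_t n; /* number of distinct keys */
};

static size_t hash(const char *s) /* djb2 string hash */
{
	size_t h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NBUCKETS;
}

/* Insert an object, keeping only the newest one per key.
 * Returns 0 on success, -1 if allocation fails. */
static int table_put(struct table *t, const char *key, uint64_t date, uint64_t order)
{
	assert(strlen(key) < sizeof(((struct obj *)0)->key));
	struct obj **head = &t->bucket[hash(key)];

	for (struct obj *o = *head; o; o = o->next) {
		if (strcmp(o->key, key) == 0) {
			if (date > o->date) { /* newer version replaces the stored one */
				o->date = date;
				o->order = order;
			}
			return 0;
		}
	}
	struct obj *o = calloc(1, sizeof(*o));
	if (!o)
		return -1;
	strcpy(o->key, key);
	o->date = date;
	o->order = order;
	o->next = *head;
	*head = o;
	t->n++;
	return 0;
}

static int cmp_order(const void *a, const void *b)
{
	const struct obj *x = *(const struct obj *const *)a;
	const struct obj *y = *(const struct obj *const *)b;
	return (x->order > y->order) - (x->order < y->order);
}

/* Flatten all chains into a malloc'ed array of pointers, sorted by order.
 * The caller frees the array; the objects stay owned by the table. */
static struct obj **table_sorted(const struct table *t)
{
	struct obj **arr = malloc((t->n ? t->n : 1) * sizeof(*arr));
	size_t i = 0;

	if (!arr)
		return NULL;
	for (size_t b = 0; b < NBUCKETS; b++)
		for (struct obj *o = t->bucket[b]; o; o = o->next)
			arr[i++] = o;
	qsort(arr, t->n, sizeof(*arr), cmp_order);
	return arr;
}
```

Because an existing key is updated in place, the table holds at most one object per "fs/hl/ll", and the separate lookup hashtable is no longer needed.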

[copytool,tsmapi] Save and restore xattr and Lustre stripe information

[copytool] Cleanup the consumer threads appropriately

When the consumer threads are processing requests such as archiving/retrieving/deleting
and the daemon gets a shutdown/Ctrl-C signal (from init.d/systemd/user), one should close
the work queue so that the consumer threads cannot dequeue new HSM action items.
In addition, one is supposed to wait for a certain period of time so that the current consumer threads can
finish their HSM action items. If that time has passed and they are
still not done, one could do a tsm_disconnect(...), cleanup, etc.
