gsi-hpc / ltsm
LTSM - Lightweight TSM API, Lustre TSM Copytool and TSM Console Client for Archiving Data
License: GNU General Public License v2.0
One of the GSI experiments asked for a plain, low-level ssize_t write(...) function call, so that data received as a data stream can be written seamlessly to the TSM server (also as a data stream). This is required while the particle accelerator is running and experimental data is gathered as a data stream.
Modify this tool so that files can be stored in TSM under a UUID that is attached to the file via an extended attribute. There are issues with using the file path, as well as the FID, as the ID in TSM, and the recommended approach seems to be to use an xattr UUID to track files between HSM levels.
Robinhood already has support for this method.
I think we're not handling two aspects of dsmSendData() (API#141):
- the case where DSM_RC_WILL_ABORT is returned
- the value of data_blk.numBytes after the call, because I think it's just a socket call, and it might not take all data at once.
What do you think?
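The second point can be illustrated with a minimal sketch. It does not call the TSM API: `send_fn` is a hypothetical stand-in for a call such as dsmSendData(), and `chunk_t` with its `num_bytes` field is an assumed stand-in for a data block whose byte count is updated by the callee. Whether dsmSendData() really consumes data only partially is exactly the open question here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a data block: the callee is assumed to
 * overwrite num_bytes with the number of bytes it actually consumed. */
typedef struct {
    const char *buf;
    size_t buf_len;
    size_t num_bytes;
} chunk_t;

/* Hypothetical send callback; returns 0 on success, non-zero on error. */
typedef int (*send_fn)(chunk_t *chunk);

/* Call fn repeatedly until all len bytes are consumed.
 * Returns 0 on success, -1 on error or when no progress is made. */
int send_all(send_fn fn, const char *data, size_t len)
{
    size_t off = 0;

    while (off < len) {
        chunk_t chunk = { data + off, len - off, 0 };

        if (fn(&chunk) != 0)
            return -1;
        if (chunk.num_bytes == 0 || chunk.num_bytes > chunk.buf_len)
            return -1; /* No progress or implausible byte count. */
        off += chunk.num_bytes;
    }
    return 0;
}

/* Example callback that consumes at most 4 bytes per call. */
static size_t total_sent;

int send_max4(chunk_t *chunk)
{
    chunk->num_bytes = chunk->buf_len < 4 ? chunk->buf_len : 4;
    total_sent += chunk->num_bytes;
    return 0;
}
```

With `send_max4` as the callback, `send_all()` needs three calls to push ten bytes. A caller that ignored the updated byte count would silently drop the remainder.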
I've started to convert tsm_archive_generic() into an FSM, as described in API#68. Hopefully it'll make it easier to support batched transactions.
Currently we have a Lustre subdirectory where users can archive data to TSM. Once a file is archived, or especially released, it can't be restored after being moved, as TSM doesn't know about the file's new path.
I'm looking for a way to properly handle stub files that are archived/released to TSM. I understand that, because of the way TSM stores files, stub files that are moved or renamed get "lost". Is there a way to handle this that makes for a good user experience, possibly through Robinhood or changes to the way we move files in/out of TSM?
Lustre MDT has the max_requests HSM parameter, but it only applies to the respective MDT HSM coordinator. This makes it tricky to configure properly, because it is not tied to the number of registered copytools. Ideally, it should be set to some value above the total number of threads in all of the registered copytools.
Since there's a good chance of misconfiguration, I propose we limit the number of requests we take for HSM. How about setting the limit to 2 x thread_count?
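A rough sketch of such a limit is shown below. The type `req_limiter_t` and the function names are hypothetical and are not part of Lustre or LTSM. The sketch only shows the bookkeeping, and a real copytool would have to protect the counter with a mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a copytool. */
typedef struct {
    size_t thread_count; /* Number of consumer threads. */
    size_t in_flight;    /* Requests accepted but not yet finished. */
} req_limiter_t;

/* Proposed upper bound: 2 x thread_count. */
size_t req_limit(const req_limiter_t *rl)
{
    return 2 * rl->thread_count;
}

/* Accept a new request only while below the limit. */
bool req_try_accept(req_limiter_t *rl)
{
    if (rl->in_flight >= req_limit(rl))
        return false;
    rl->in_flight++;
    return true;
}

/* Call when a consumer thread has finished a request. */
void req_done(req_limiter_t *rl)
{
    if (rl->in_flight > 0)
        rl->in_flight--;
}
```

With two threads, four requests are accepted and the fifth is refused until one of the earlier requests has finished.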
If two TSM objects with the same fs/hl/ll key and the same insertion date are processed by the remove_older_obj function in qtable.c, both are kept in the hashtable.
This causes the fileapi and my sanity script added in 3abba03 to fail, because I always expect exactly one returned object.
Changing the comparison from 'new.date > old.date' to greater than or equal fixed this problem for me, but leads to errors in the test_qtable testsuite.
Currently, an array (called qarray) is employed to store and retrieve qryRespArchiveData. The qarray data structure is a simple and trivial version of a C++ vector:
    /* Increase length (capacity) by factor of 2 when qarray is full. */
    if ((*qarray)->N >= (*qarray)->capacity) {
        (*qarray)->capacity *= 2;
        (*qarray)->data = realloc((*qarray)->data,
                                  sizeof(qryRespArchiveData) *
                                  (*qarray)->capacity);
        if ((*qarray)->data == NULL) {
            CT_ERROR(errno, "realloc");
            return DSM_RC_UNSUCCESSFUL;
        }
    }
where the capacity is simply doubled when the array has no space left. This design was chosen to straightforwardly enable a sort operation on the restore order field:
    int cmp_restore_order(const void *a, const void *b)
    {
        const qryRespArchiveData *query_data_a = (const qryRespArchiveData *)a;
        const qryRespArchiveData *query_data_b = (const qryRespArchiveData *)b;

        if (query_data_a->restoreOrderExt.top > query_data_b->restoreOrderExt.top)
            return(DS_GREATERTHAN);
        else if (query_data_a->restoreOrderExt.top < query_data_b->restoreOrderExt.top)
            return(DS_LESSTHAN);
        else if (query_data_a->restoreOrderExt.hi_hi > query_data_b->restoreOrderExt.hi_hi)
            return(DS_GREATERTHAN);
        else if (query_data_a->restoreOrderExt.hi_hi < query_data_b->restoreOrderExt.hi_hi)
            return(DS_LESSTHAN);
        else if (query_data_a->restoreOrderExt.hi_lo > query_data_b->restoreOrderExt.hi_lo)
            return(DS_GREATERTHAN);
        else if (query_data_a->restoreOrderExt.hi_lo < query_data_b->restoreOrderExt.hi_lo)
            return(DS_LESSTHAN);
        else if (query_data_a->restoreOrderExt.lo_hi > query_data_b->restoreOrderExt.lo_hi)
            return(DS_GREATERTHAN);
        else if (query_data_a->restoreOrderExt.lo_hi < query_data_b->restoreOrderExt.lo_hi)
            return(DS_LESSTHAN);
        else if (query_data_a->restoreOrderExt.lo_lo > query_data_b->restoreOrderExt.lo_lo)
            return(DS_GREATERTHAN);
        else if (query_data_a->restoreOrderExt.lo_lo < query_data_b->restoreOrderExt.lo_lo)
            return(DS_LESSTHAN);
        else
            return(DS_EQUAL);
    }
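The doubling strategy and the sort step can be tried out in isolation with the self-contained sketch below. It is not the LTSM code: `item_t`, `vec_t` and the `vec_*` functions are hypothetical, and a single 64-bit `order` field stands in for the five-part restore order. The sketch assigns the result of realloc() to a temporary pointer first, so that the old buffer is not lost when the allocation fails.

```c
#include <stdlib.h>

/* Hypothetical element type; order stands in for the restore order. */
typedef struct {
    unsigned long long order;
} item_t;

/* Hypothetical growable array, similar in spirit to qarray. */
typedef struct {
    item_t *data;
    size_t N;        /* Number of stored elements. */
    size_t capacity; /* Number of allocated elements. */
} vec_t;

/* Append item, doubling the capacity when the array is full.
 * Returns 0 on success, -1 when out of memory (data stays valid). */
int vec_push(vec_t *v, item_t item)
{
    if (v->N >= v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 1;
        item_t *tmp = realloc(v->data, sizeof(item_t) * new_cap);

        if (tmp == NULL)
            return -1;
        v->data = tmp;
        v->capacity = new_cap;
    }
    v->data[v->N++] = item;
    return 0;
}

/* Three-way comparison on the order field for qsort(). */
static int cmp_order(const void *a, const void *b)
{
    const item_t *x = a;
    const item_t *y = b;

    return (x->order > y->order) - (x->order < y->order);
}

/* Sort the stored elements in ascending order. */
void vec_sort(vec_t *v)
{
    if (v->N > 1)
        qsort(v->data, v->N, sizeof(item_t), cmp_order);
}
```

A `vec_t` initialised to `{ NULL, 0, 0 }` is a valid empty array, since realloc() with a NULL pointer behaves like malloc().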
The TSM server allows data to be archived multiple times under the same "fs/hl/ll". In the query operation (when no date information is provided and the latest flag is set), we fill qarray with the most recent qryRespArchiveData. For this reason a hashtable is currently also used to look up which version was already queried. It makes more sense to store the query results directly in chained hash tables (buckets of linked lists). To enable sorting, one can either convert the chained hash tables into a fixed array and sort it, or into a single linked list and sort that list (e.g. as described in LinkedListProblems). Note that another, but inefficient, method is to query each time before archiving; that is certainly not an elegant solution.
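The first option (flatten the chained hash table into a fixed array, then sort) could look roughly like the sketch below. All names are hypothetical: `key` stands in for a hash of fs/hl/ll, `order` for the restore order, and the bucket count is arbitrary. The sketch does not use the existing LTSM hashtable.

```c
#include <stdlib.h>

#define NBUCKETS 8

/* Hypothetical chain node. */
typedef struct node {
    unsigned key;             /* Stand-in for a hash of fs/hl/ll. */
    unsigned long long order; /* Stand-in for the restore order. */
    struct node *next;
} node_t;

typedef struct {
    node_t *buckets[NBUCKETS];
    size_t count;
} table_t;

/* Insert at the head of the bucket chain. Returns 0, or -1 on ENOMEM. */
int table_insert(table_t *t, unsigned key, unsigned long long order)
{
    node_t *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->key = key;
    n->order = order;
    n->next = t->buckets[key % NBUCKETS];
    t->buckets[key % NBUCKETS] = n;
    t->count++;
    return 0;
}

static int cmp_node(const void *a, const void *b)
{
    const node_t *x = *(node_t * const *)a;
    const node_t *y = *(node_t * const *)b;

    return (x->order > y->order) - (x->order < y->order);
}

/* Flatten all chains into an array of node pointers sorted by order.
 * The caller frees the array, not the nodes. Returns NULL when the
 * table is empty or memory is exhausted. */
node_t **table_sorted(const table_t *t)
{
    node_t **arr;
    size_t k = 0;

    if (t->count == 0)
        return NULL;
    arr = malloc(sizeof(*arr) * t->count);
    if (arr == NULL)
        return NULL;
    for (size_t b = 0; b < NBUCKETS; b++)
        for (node_t *n = t->buckets[b]; n != NULL; n = n->next)
            arr[k++] = n;
    qsort(arr, k, sizeof(*arr), cmp_node);
    return arr;
}

/* Release all nodes and reset the table. */
void table_free(table_t *t)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        node_t *n = t->buckets[b];

        while (n != NULL) {
            node_t *next = n->next;

            free(n);
            n = next;
        }
        t->buckets[b] = NULL;
    }
    t->count = 0;
}
```

Sorting an array of pointers leaves the hash table intact, so lookups by key keep working while the sorted view is in use.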
When the consumer threads are processing requests such as archiving/retrieving/deleting and the daemon receives a shutdown/Ctrl-C signal (from init.d/systemd/user), one should close the working queue so that consumer threads can no longer dequeue new HSM action items.
In addition, one is supposed to wait for a certain period of time so that the current consumer threads can finish their HSM action items. If the time has passed and they are still not done, one could do a tsm_disconnect(...), cleanup, etc.
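The close-then-wait sequence described above might be sketched with POSIX threads as follows. The `work_queue_t` type and the `wq_*` functions are hypothetical and do not reflect the copytool's actual queue. The sketch tracks only the closed flag and the number of items in progress, and it leaves the follow-up (tsm_disconnect(...), cleanup) to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical queue state; the item storage itself is omitted. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle; /* Signalled when active drops to 0. */
    bool closed;         /* No further dequeues once set. */
    unsigned active;     /* Items currently being processed. */
} work_queue_t;

void wq_init(work_queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->idle, NULL);
    q->closed = false;
    q->active = 0;
}

/* Consumer side: call before taking an item. Returns false once the
 * queue has been closed. */
bool wq_begin(work_queue_t *q)
{
    bool ok;

    pthread_mutex_lock(&q->lock);
    ok = !q->closed;
    if (ok)
        q->active++;
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer side: call after an item has been processed. */
void wq_end(work_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    if (q->active > 0 && --q->active == 0)
        pthread_cond_broadcast(&q->idle);
    pthread_mutex_unlock(&q->lock);
}

/* Shutdown side: close the queue and wait up to timeout_sec seconds.
 * Returns true if all consumers finished, false on timeout or error. */
bool wq_shutdown(work_queue_t *q, time_t timeout_sec)
{
    struct timespec deadline;
    bool done;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&q->lock);
    q->closed = true;
    while (q->active > 0 && rc == 0)
        rc = pthread_cond_timedwait(&q->idle, &q->lock, &deadline);
    done = (q->active == 0);
    pthread_mutex_unlock(&q->lock);
    return done;
}
```

Note that `wq_shutdown()` takes a mutex and therefore must not be called from within an asynchronous signal handler. The handler would only set a flag, and a regular thread would perform the shutdown. A return value of false is the point at which the caller could fall back to tsm_disconnect(...) and cleanup.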