The file that dCache logged as successfully transferred to tape is the last one written, so that is the one that should be kept.
Duplicates will most likely be detected through the logging in tsmtapehints.pl: since it already extracts the full file list, it was trivial to do duplicate detection at the same time.
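Duplicate detection over a full file list boils down to finding names that occur more than once. As an illustration only (this is not how tsmtapehints.pl implements it), assuming a hypothetical plain list with one archived file name per line, the standard sort, uniq and awk tools can report each duplicated name together with its number of copies. The function name find_duplicates is hypothetical.

```shell
# Illustration only (not part of ENDIT): read archived file names, one per
# line, on stdin and print "<copies> <name>" for every name seen more than once.
find_duplicates() {
    sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}
```

For example, `printf '%s\n' /grid/pool/out/B /grid/pool/out/A /grid/pool/out/B | find_duplicates` prints `2 /grid/pool/out/B`.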
For clarity, the procedure is as follows, assuming old files are present that were written by ENDIT with a static description:
- Identify the duplicates. If you are running tsmtapehints.pl, they are logged in tsmtapehints.log.
- su - to the ENDIT runtime user.
- Deletion MUST be done using dsmc in pick mode, otherwise it is very easy to accidentally delete all archived copies of a file!
- For each file, do something similar to
    dsmc delete archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log -dateformat=3 -timeformat=1 -pick /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F
  replacing the -asnode argument with your TSM proxy node name (see your endit.conf) and the file name with the one for which you want to delete duplicates.
  - The date/time format options output dates in ISO-style format.
  - The -errorlogname option is needed if the ENDIT runtime user is not allowed to write to the default dsmc error log (usually owned by root).
  - If you forget the -pick option, all copies of the file will be deleted without any further prompt asking for confirmation!
- You should then get a text interface that lets you select files by a numeric identifier, i.e. input 42 and press Enter.
  - Do NOT select the newest copy of the file, since you want to keep the file that corresponds to the object dCache successfully logged as migrated.
  - Select all other copies of the file.
- Execute the deletion by selecting OK, i.e. input o and press Enter.
- Repeat as needed until all duplicates are deleted.
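The per-file step above can be wrapped in a small shell function so that the same dsmc options are used every time. This is a sketch, not part of ENDIT: the function name endit_pick_delete and the ASNODE, ERRLOG and DRY_RUN environment variables are hypothetical, while the dsmc options are exactly those from the command above. Choosing which copies to delete in the pick interface remains a manual step.

```shell
# Hypothetical helper (not part of ENDIT): run the interactive "-pick"
# deletion for each pool file name given as an argument.
#   ASNODE  - TSM proxy node name from endit.conf (required; assumption)
#   ERRLOG  - dsmc error log path (default /tmp/endit-dsmerror.log; assumption)
#   DRY_RUN - if set to 1, only print the dsmc command instead of running it
endit_pick_delete() {
    node="${ASNODE:?set ASNODE to your TSM proxy node name}"
    errlog="${ERRLOG:-/tmp/endit-dsmerror.log}"
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "dsmc delete archive -asnode=$node -errorlogname=$errlog -dateformat=3 -timeformat=1 -pick $f"
        else
            # -pick is mandatory: without it dsmc deletes ALL archived copies.
            dsmc delete archive "-asnode=$node" "-errorlogname=$errlog" \
                -dateformat=3 -timeformat=1 -pick "$f"
        fi
    done
}
```

Run it as the ENDIT runtime user, e.g. `ASNODE=MIGRTEST endit_pick_delete /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F`, and use `DRY_RUN=1` first to check the generated command.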
One reason for duplicates can be dCache being shut down while tsmarchiver.pl is running dsmc to archive files. Files that are successfully archived while the ENDIT dCache plugin is not running will not be marked as successfully migrated to tape, so dCache will retry the operation and the files get archived again.
With more recent versions of the ENDIT daemons, all files are written with a description containing the time when that particular dsmc write session was started. This can be used to uniquely identify a single duplicate, avoiding the tedious manual procedure described above.
An automated procedure can look like this:
- Identify the duplicates. If you are running tsmtapehints.pl, they are logged in tsmtapehints.log.
- su - to the ENDIT runtime user.
- For each file name with duplicates, do query archive to get the descriptions of all duplicates.
  - Something along the lines of
    dsmc query archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F
    replacing the -asnode argument with your TSM proxy node name (see your endit.conf) and the file name.
- Choose the duplicates you want to remove (usually you want to keep the latest copy).
- Do delete archive on the duplicates, using the description to identify them.
  - Something along the lines of
    dsmc delete archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log -desc=ENDIT-2022-10-22T22:48:10+0200 /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F
    replacing the -asnode, -desc and file name arguments.
  - If you are unsure, do query archive with the same arguments first. If only a single file is listed, proceed with delete archive using the same arguments.
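The query-then-delete step for a single duplicate can be scripted as well. This is a hypothetical sketch, not part of ENDIT: the function name endit_desc_delete and the ASNODE, ERRLOG and DRY_RUN environment variables are assumptions, and it relies on dsmc query archive accepting the same -desc argument as delete archive, as the procedure above suggests. Because the dsmc listing has to be checked by a human to contain exactly one copy, the function shows it and asks for confirmation before deleting.

```shell
# Hypothetical helper (not part of ENDIT): delete the archived copy of one file
# whose description matches, after listing it and asking for confirmation.
#   $1      - description of the duplicate, e.g. ENDIT-2022-10-22T22:48:10+0200
#   $2      - pool file name
#   ASNODE  - TSM proxy node name from endit.conf (required; assumption)
#   ERRLOG  - dsmc error log path (default /tmp/endit-dsmerror.log; assumption)
#   DRY_RUN - if set to 1, only print the two dsmc commands
endit_desc_delete() {
    node="${ASNODE:?set ASNODE to your TSM proxy node name}"
    errlog="${ERRLOG:-/tmp/endit-dsmerror.log}"
    desc="$1"
    file="$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "dsmc query archive -asnode=$node -errorlogname=$errlog -desc=$desc $file"
        echo "dsmc delete archive -asnode=$node -errorlogname=$errlog -desc=$desc $file"
        return 0
    fi
    # List what matches this description; stop if dsmc reports an error.
    dsmc query archive "-asnode=$node" "-errorlogname=$errlog" "-desc=$desc" "$file" || return 1
    printf 'Exactly one copy listed above, delete it? [y/N] '
    read -r answer
    if [ "$answer" != y ]; then
        echo "skipped $file"
        return 0
    fi
    dsmc delete archive "-asnode=$node" "-errorlogname=$errlog" "-desc=$desc" "$file"
}
```

Answer anything other than y whenever the listing shows more than one copy.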
The most common cause of duplicates today is pools being restarted during pool-to-pool migrations, i.e. while moving tape data between instances. The restart causes the pool to transfer the files it has on disk to tape again, resulting in duplicates.
After such a migration we recommend running tsmtapehints.pl to generate a fresh hint file, and checking its output/log for any duplicates found.