Comments (6)

ZNikke commented on July 23, 2024

The copy that dCache logged as successfully transferred to tape is the last one, so that is the one to keep.

from endit.

ZNikke commented on July 23, 2024

Duplicates will most likely be detected through the logging in tsmtapehints.pl: since it already extracts the full file list, it was trivial to add duplicate detection at the same time.
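
Once the full file list is available, the duplicate check itself is simple. A minimal Python sketch, assuming the list has already been extracted with one entry per archived copy; the input and the helper name find_duplicates() are hypothetical, and tsmtapehints.pl itself is written in Perl and obtains its list from TSM:

```python
from collections import Counter
from typing import Iterable


def find_duplicates(archived_paths: Iterable[str]) -> dict[str, int]:
    """Return each path that occurs more than once, mapped to its copy count.

    archived_paths is assumed to hold one entry per archived copy
    (hypothetical input format, not the actual tsmtapehints.pl data).
    """
    counts = Counter(archived_paths)
    return {path: n for path, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical listing: the first file was archived twice
    listing = [
        "/grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F",
        "/grid/pool/out/0000AAAA",
        "/grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F",
    ]
    for path, n in sorted(find_duplicates(listing).items()):
        print(f"duplicate: {path} ({n} copies)")
```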

ZNikke commented on July 23, 2024

For clarity, this is the procedure, assuming the old files were written by ENDIT with a static description:

  • Identify the duplicates; if you are running tsmtapehints.pl, they are logged in tsmtapehints.log
  • su - to the ENDIT runtime user
  • Deletion MUST be done using dsmc in -pick mode; otherwise it's very easy to accidentally delete all archived copies of a file!
  • For each file, run something similar to dsmc delete archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log -dateformat=3 -timeformat=1 -pick /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F replacing the -asnode argument with your TSM proxy node name (see your endit.conf) and the file name with the file whose duplicates you want to delete.
    • The date/time format options make dsmc output dates and times in ISO-style format.
    • The -errorlogname option is needed if the ENDIT runtime user is not allowed to write to the default dsmc error log (usually owned by root).
    • If you forget the -pick option, all copies of the file will be deleted without any prompt asking for confirmation!
    • You should then get a text interface that lets you select files by their numeric identifier, e.g. type 42 and press Enter.
    • Do NOT select the newest copy of the file, since you want to keep the copy that corresponds to the object dCache successfully logged as migrated.
    • Select all other copies of the file
    • Execute the deletion by selecting OK, i.e. type o and press Enter.
  • Repeat as needed until all duplicates are deleted.
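
When many files need cleaning up, it can help to generate the per-file commands up front and then run them one at a time. A minimal Python sketch with a hypothetical helper pick_delete_command() that only builds the -pick command from the procedure for each file; it does not execute anything, and the selection inside -pick mode is still done by hand:

```python
import shlex


def pick_delete_command(asnode: str, path: str,
                        errorlog: str = "/tmp/endit-dsmerror.log") -> list[str]:
    """Build the interactive delete archive command for one file (not run here)."""
    return [
        "dsmc", "delete", "archive",
        f"-asnode={asnode}",        # TSM proxy node name from endit.conf
        f"-errorlogname={errorlog}",
        "-dateformat=3",            # ISO-style dates in the pick list
        "-timeformat=1",
        "-pick",                    # never drop this: without it ALL copies are deleted
        path,
    ]


if __name__ == "__main__":
    # Hypothetical list of files reported as having duplicates
    duplicates = ["/grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F"]
    for path in duplicates:
        # Print for review, then run each command by hand as the ENDIT runtime user
        print(shlex.join(pick_delete_command("MIGRTEST", path)))
```

Keeping the command as an argument list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run(), which would let dsmc use the terminal for the interactive selection.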

ZNikke commented on July 23, 2024

One cause of duplicates can be dCache being shut down while tsmarchiver.pl is running dsmc to archive files. Files that are successfully archived while the ENDIT dCache plugin is not running will not be marked as successfully migrated to tape, so dCache will retry the operation.

ZNikke commented on July 23, 2024

With more recent versions of the ENDIT daemons, all files are written with a description containing the time when that particular dsmc write session was started. This can be used to uniquely identify a single duplicate, avoiding the tedious manual procedure described previously.

An automated procedure can look like this:

  • Identify the duplicates; if you are running tsmtapehints.pl, they are logged in tsmtapehints.log
  • su - to the ENDIT runtime user
  • For each file name with duplicates, run query archive to get the descriptions of all copies
    • Something along the lines of dsmc query archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F replacing the -asnode argument with your TSM proxy node name (see your endit.conf) and the file name.
  • Choose the duplicates you want to remove (usually you want to keep the latest copy)
  • Run delete archive on the duplicates, using the description to identify them
    • Something along the lines of dsmc delete archive -asnode=MIGRTEST -errorlogname=/tmp/endit-dsmerror.log -desc=ENDIT-2022-10-22T22:48:10+0200 /grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F replacing the -asnode, -desc and filename arguments.
    • If you are unsure, run query archive with the same arguments first. If only a single file is listed, proceed with delete archive using the same arguments.
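
Because the descriptions embed the session start time, choosing which copies to delete can be scripted. A minimal Python sketch, assuming the descriptions for one file have already been collected from the query archive output (parsing that output is not shown) and using hypothetical helpers session_start(), descriptions_to_delete() and delete_command(). It keeps the copy from the newest session and prints delete commands for review instead of running them:

```python
import shlex
from datetime import datetime

PREFIX = "ENDIT-"  # description prefix, as in ENDIT-2022-10-22T22:48:10+0200


def session_start(desc: str) -> datetime:
    """Parse a description such as ENDIT-2022-10-22T22:48:10+0200."""
    if not desc.startswith(PREFIX):
        raise ValueError(f"not an ENDIT session description: {desc!r}")
    # %z accepts the +0200 style UTC offset, giving a timezone-aware datetime
    return datetime.strptime(desc[len(PREFIX):], "%Y-%m-%dT%H:%M:%S%z")


def descriptions_to_delete(descs: list[str]) -> list[str]:
    """Return every description except the one from the newest session."""
    if len(set(descs)) != len(descs):
        # A description no longer identifies a single copy; use -pick instead
        raise ValueError("descriptions are not unique, fall back to -pick mode")
    newest = max(descs, key=session_start)
    return [d for d in descs if d != newest]


def delete_command(asnode: str, path: str, desc: str,
                   errorlog: str = "/tmp/endit-dsmerror.log") -> list[str]:
    """Build a non-interactive delete archive command for one described copy."""
    return ["dsmc", "delete", "archive", f"-asnode={asnode}",
            f"-errorlogname={errorlog}", f"-desc={desc}", path]


if __name__ == "__main__":
    path = "/grid/pool/out/0000E8B2DB24C6624B8AA91D0BFCB39AF13F"
    # Hypothetical descriptions for the copies of this one file
    descs = ["ENDIT-2022-10-22T22:48:10+0200", "ENDIT-2022-10-23T09:15:02+0200"]
    for desc in descriptions_to_delete(descs):
        # Print for review; run query archive with the same -desc before deleting
        print(shlex.join(delete_command("MIGRTEST", path, desc)))
```

Comparing parsed timestamps rather than the raw strings keeps the ordering correct even when the UTC offset differs between sessions, for example across a daylight saving change.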

ZNikke commented on July 23, 2024

Today, the most common cause of duplicates is pools being restarted during pool-to-pool migrations, i.e. when moving tape data between instances. The restart causes the pool to transfer the files on disk to tape again, creating duplicates.

After such a migration, we recommend running tsmtapehints.pl to generate a fresh hint file and checking its output/log for any duplicates.