taltman / scripts
Miscellaneous scripts that serve a stand-alone purpose that might be useful for others.
License: Other
Hi,
Any chance you could help me adapt your find-dupes.awk script to work on a Linux system? Based on your notes, I was able to figure out the following changes:
- Instead of ls -lTR, use ls -l --full-time -R | grep -v ^d
- md5_exec = "md5sum"
- $9 to $8: file = substr($0,match($0, $8)+length($8)+1,length($0))
- $2 to $1 since we are using md5sum: hash = $1
I couldn't figure out the rest, starting with the line sizes[$5], as I don't know awk. I would appreciate the help, as I'm trying to find dupes using the md5sum command from the stackexchange thread that you referenced, and it's still running after 1 day on 1.3TB worth of data.
Thanks in advance.
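For anyone in the same situation, here is a minimal sketch of the size-first idea using GNU tools only. It is not the author's find-dupes.awk; the function name find_dupes is made up for illustration, and it assumes GNU find, xargs, md5sum and uniq, plus file names without newlines or backslashes. Only files that share a size with at least one other file are hashed, which should avoid hashing most of a large tree.

```shell
# find_dupes DIR (hypothetical helper, not part of the repository)
# Prints "md5  path" for every file whose content hash matches another file.
find_dupes() {
  # 1. Emit "size<TAB>path" for each regular file (GNU find -printf).
  find "$1" -type f -printf '%s\t%p\n' |
    # 2. Keep only the paths whose size occurs more than once.
    awk -F'\t' '
      { path = substr($0, index($0, "\t") + 1)
        count[$1]++
        paths[$1] = paths[$1] path "\n" }
      END { for (s in count) if (count[s] > 1) printf "%s", paths[s] }' |
    # 3. Hash the remaining candidates; -r skips md5sum on empty input.
    tr '\n' '\0' | xargs -0 -r md5sum |
    # 4. Sort by hash and print every line whose first 32 characters repeat.
    sort | uniq -w32 -D
}
```

Calling find_dupes /path/to/data lists duplicate groups one file per line, with identical hashes adjacent to each other.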
shasum will always produce a 40-digit hash value. If a file or folder name shorter than 40 characters is encoded, it actually gets longer rather than shorter. Steps to reproduce:
mkdir -p abcdefghij/abcdefghij/abcdefghij
shorten-filenames.sh . encode 25
This results in ./abcdefghij/abcdefghij/b92ab2ae522e8b2a922b9c9b2c4fa7f677373489, which is actually longer than the original folder ./abcdefghij/abcdefghij/abcdefghij
I have been using the script today. Unfortunately, when a file name contains whitespace, it tries to stat each word of the name as a separate file:
mv: cannot stat 'Recht': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'gehabt!': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'Der': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'neue': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'Skoda': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'Octavia': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'RS': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat '2013...': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat '_': No such file or directory
shorten-filenames.sh: it appears that directory . has already been shortened. Aborting.
mv: cannot stat 'rad-ab.com.mhtml': No such file or directory
The file name is "Recht gehabt! Der neue Skoda Octavia RS 2013... _ rad-ab.com.mhtml"
I tried to fix it myself, but I don't know where to look. I think there are some quotes missing somewhere in the script...
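The symptom is consistent with an unquoted variable expansion, though where that happens in the script is not known from this report. As a small illustration, the sketch below shows how the shell splits the file name from this report into ten words when the variable is unquoted, and how quoting keeps it as one argument. The names count_args and rename_with_suffix are hypothetical helpers, not taken from shorten-filenames.sh.

```shell
# count_args: print how many arguments were received.
count_args() { echo "$#"; }

name='Recht gehabt! Der neue Skoda Octavia RS 2013... _ rad-ab.com.mhtml'
count_args $name     # unquoted: word splitting yields 10 arguments
count_args "$name"   # quoted: the name stays a single argument

# rename_with_suffix DIR SUFFIX: append SUFFIX to every regular file in DIR.
# Every expansion is quoted, so names with whitespace are moved intact.
rename_with_suffix() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    mv -- "$f" "$f$2"
  done
}
```

The ten words match the ten mv errors shown above, one per word of the file name.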
I just tried the script shorten-filenames.sh under openSUSE 13.1. Unfortunately, I'm stuck with the error
find: unknown predicate `-E'
since the version of find that comes with openSUSE does not know an option "-E".
So, what does the option "-E" stand for?
Which operating system are you using to get this extra option?
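For context: -E is an option of BSD find (as shipped with macOS and FreeBSD) that makes -regex use extended regular expressions; GNU find spells the same thing -regextype posix-extended. A minimal sketch of a wrapper that picks the right form follows. The function name find_ere is hypothetical, and the detection assumes that only GNU find answers --version with a string containing "GNU".

```shell
# find_ere DIR REGEX: list paths under DIR that match the extended regex.
# The regex must match the whole path, as with find -regex in general.
find_ere() {
  if find --version 2>/dev/null | grep -q GNU; then
    # GNU findutils: -regextype must come before -regex.
    find "$1" -regextype posix-extended -regex "$2"
  else
    # BSD find: -E switches -regex to extended syntax.
    find -E "$1" -regex "$2"
  fi
}
```

For example, find_ere . '.*\.(txt|md)' lists files ending in .txt or .md on either system.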