Comments (12)
Here is an example:
./azcopy cp [src] [dst] --overwrite=false --recursive=true
from azure-storage-azcopy.
Very cool! Thank you! As far as I'm concerned we can close this. Not sure if you want to leave it open to track any other README updates.
from azure-storage-azcopy.
@colemickens sounds great!
I'll keep this open until #111 is merged in. A FAQ section was added thanks to your question. 😄
from azure-storage-azcopy.
@ppakawatk I just added a wiki page on the subject for you, here https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation
from azure-storage-azcopy.
OK, here's what I think is happening. (I'm going partly from memory here, because I can't find all the documentation). As I recall, when a small blob is saved, it's usually saved in one single operation, with the PutBlob API call. The documentation for that call says that, if you don't provide an MD5 then it will compute a fresh one for you. So that's why it gets a fresh MD5 in your test.
But for big blobs, they have to be uploaded in several blocks. And for those, the service cannot automatically generate a new MD5 (because MD5s must be computed sequentially, but the blocks may not arrive sequentially). So for big blobs, the automatic update of the MD5, which you have observed, does not happen.
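To illustrate the point about ordering, here is a minimal Python sketch (not AzCopy or service code). It only shows that an MD5 fed the same blocks in a different order yields a different digest, which is why a hash cannot simply be accumulated as blocks arrive out of sequence. The helper name `md5_of_blocks` and the 4096-byte block size are assumptions chosen for illustration.

```python
import hashlib


def md5_of_blocks(blocks):
    """Feed blocks to one MD5 object in the given order; return the hex digest."""
    digest = hashlib.md5()
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


# Illustrative data split into fixed-size blocks (sizes are arbitrary).
data = b"abcdefghijklmnopqrstuvwxyz" * 1000
block_size = 4096
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

whole = hashlib.md5(data).hexdigest()
in_order = md5_of_blocks(blocks)                   # same as hashing the whole payload
out_of_order = md5_of_blocks(list(reversed(blocks)))  # blocks arrive in another order

print(in_order == whole)      # hashing blocks in sequence matches the whole-file hash
print(out_of_order == whole)  # hashing them in arrival order does not
```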
from azure-storage-azcopy.
Hi @colemickens, thanks for reaching out!
We appreciate your feedback and will improve the README to include this information. In the meantime, you can look up the help messages with: `./azcopy cp --help`
And to answer your questions:
- `cp` is a simple transfer operation: it scans the source and attempts to transfer every single file/blob. The supported source/destination pairs are listed in the help message of the tool. `sync`, on the other hand, makes sure that whatever is present at the source is replicated to the destination, and that whatever is not at the source is deleted from the destination. If your goal is simply to move some files, then `cp` is definitely the right command.
- It's a one-way sync: the destination will ultimately have only what is on the source. We use last modified times to determine whether to transfer a file that is present on both sides.
- Only local <-> blob is supported. I've improved the help message, and it will be merged in shortly.
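The difference between the two commands can be modelled with a small Python sketch. This is a toy model of the semantics described above, not AzCopy's implementation: `plan_copy` and `plan_sync` are hypothetical names, each side is represented as a dict of file name to last-modified time, and the "source is newer" comparison is an assumption about how last modified times are used.

```python
def plan_copy(source, destination, overwrite=True):
    """Model of cp: attempt every source file; optionally skip existing ones."""
    return sorted(
        name for name in source
        if overwrite or name not in destination
    )


def plan_sync(source, destination):
    """Model of a one-way sync: transfer missing or newer files, delete extras.

    Assumption: a file present on both sides is transferred only when the
    source's last-modified time is later than the destination's.
    """
    transfer = sorted(
        name for name, mtime in source.items()
        if name not in destination or mtime > destination[name]
    )
    delete = sorted(name for name in destination if name not in source)
    return transfer, delete


# Hypothetical listings: name -> last-modified time (arbitrary units).
src = {"a.txt": 10, "b.txt": 20}
dst = {"b.txt": 15, "c.txt": 5}

print(plan_copy(src, dst))                   # every source file
print(plan_copy(src, dst, overwrite=False))  # only files missing at the destination
print(plan_sync(src, dst))                   # files to transfer, files to delete
```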
And for the final question, you can always launch new jobs with the same parameters, and they are completely separate entities that do not impact each other. The "job" behavior is mandatory since we want to allow users to resume a failed job if necessary; but the user can choose not to resume and launch the same operation again.
from azure-storage-azcopy.
Thanks a bunch @zezha-msft, this is great information.
Is there any chance the sync behavior can be customized? (I can open a separate issue.)
For example, sometimes I get nervous and don't trust timestamps, or all of my content is content-addressable; therefore, the sync semantics I need are "Upload Everything that doesn't exist".
Today I've implemented this by:
- take a list of files in storage
- loop through files on disk; for each file not in storage, create a symlink in an `upload_staging` dir
- use `az storage blob upload-batch` to upload the symlinked dir of missing files.

It would be really cool to remove my dependency on the `az` CLI and just be able to do it with a single command with this tool.
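The selection step of that workaround (finding local files that are not yet in storage) can be sketched in Python as follows. This is an illustration only: `files_missing_from_storage` is a hypothetical helper, the list of names already in storage is assumed to have been obtained separately, and no Azure call is made here.

```python
from pathlib import Path


def files_missing_from_storage(local_root, remote_names):
    """Return relative paths of local files whose names are not in storage.

    remote_names is assumed to be an iterable of blob names that use
    forward slashes, relative to the same root as the local files.
    """
    root = Path(local_root)
    remote = set(remote_names)
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            if relative not in remote:
                missing.append(relative)
    return missing
```

Calling `files_missing_from_storage("content", names_in_storage)` would give the list of files to stage for upload, where both arguments are placeholders for your own directory and listing.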
from azure-storage-azcopy.
Hi @colemickens, you can accomplish the "Upload Everything that doesn't exist" behavior with the copy command by including the following flag: `--overwrite=false`
from azure-storage-azcopy.
Hi. Can someone please explain how azcopy does the integrity check when uploading from local disk to an Azure storage account?
As I understand it, azcopy does the MD5 check only when downloading (according to the docs). For uploads, azcopy calculates the MD5 and stores it in the content-md5.
But the docs don't mention whether azcopy validates the integrity of the uploaded file, or how.
Thank you.
from azure-storage-azcopy.
> @ppakawatk I just added a wiki page on the subject for you, here https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation
@JohnRusk, thank you so much for your prompt reply.
That helps me a lot to understand the process.
However, I tried some experiments and have a question.
As I understand it, the "CONTENT-MD5" is calculated from the original file on disk and stored on the blob.
I tried editing the data after it was uploaded to Azure Storage, and the "CONTENT-MD5" changed.
So when I then downloaded using --check-md5, the process still succeeded.
Could you tell me if there is anything wrong with my understanding?
EDIT:
OK. I think I misunderstood the "edit" button. I guess that when I edit using the "edit" button, the blob is re-uploaded, so the MD5 is recalculated.
If that's the case, could you please suggest how I can test that my program does the integrity checking?
Right now, I can only assume that if the download (with --check-md5) finishes without errors, the integrity check succeeded.
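One way to check a file independently is to compute its MD5 locally and compare it with the Content-MD5 value shown on the blob. The Python sketch below is only an illustration and makes no claim about how azcopy does this internally. It assumes the blob property holds the base64 encoding of the raw 16-byte MD5 digest, and `content_md5_base64` is a hypothetical helper name.

```python
import base64
import hashlib


def content_md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a local file.

    The file is read sequentially in chunks so large files do not have
    to fit in memory. Assumption: this matches the encoding used for
    the blob's Content-MD5 property.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

Running this on the original file and on the downloaded copy, and comparing both results with the value stored on the blob, gives a check that does not depend on the download tool.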
from azure-storage-azcopy.
> I have tried editing data after uploaded to Azure storage, and the "CONTENT-MD5" changed.
Which data are you editing? The blob in Azure or the original source?
Also, how big is the blob?
And, what tool did you use to edit it?
from azure-storage-azcopy.
> I have tried editing data after uploaded to Azure storage, and the "CONTENT-MD5" changed.
> Which data are you editing? The blob in Azure or the original source?
> Also, how big is the blob?
> And, what tool did you use to edit it?
Text file. I tried adding/ deleting 1 letter from the content of the file.
The size is around 100 KB.
I edited in Azure Portal.
from azure-storage-azcopy.