
Comments (12)

zezha-msft commented on July 29, 2024

Here is an example:

./azcopy cp [src] [dst] --overwrite=false --recursive=true

from azure-storage-azcopy.

colemickens commented on July 29, 2024

Very cool! Thank you! As far as I'm concerned we can close this. Not sure if you want to leave it open to track any other README updates.


zezha-msft commented on July 29, 2024

@colemickens sounds great!

I'll keep this open until #111 is merged in. A FAQ section was added thanks to your question. 😄


JohnRusk commented on July 29, 2024

@ppakawatk I just added a wiki page on the subject for you, here https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation


JohnRusk commented on July 29, 2024

OK, here's what I think is happening. (I'm going partly from memory here, because I can't find all the documentation). As I recall, when a small blob is saved, it's usually saved in one single operation, with the PutBlob API call. The documentation for that call says that, if you don't provide an MD5 then it will compute a fresh one for you. So that's why it gets a fresh MD5 in your test.

But for big blobs, they have to be uploaded in several blocks. And for those, the service cannot automatically generate a new MD5 (because MD5s must be computed sequentially, but the blocks may not arrive sequentially). So for big blobs, the automatic update of the MD5, which you have observed, does not happen.
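To see what that stored hash looks like, note that the Content-MD5 property holds the base64-encoded binary MD5 digest of the blob's bytes, which can be reproduced locally. A minimal sketch (the sample file and its contents are arbitrary):

```shell
# Content-MD5 is the base64-encoded binary MD5 digest of the file's bytes.
printf 'hello world' > sample.txt
openssl md5 -binary sample.txt | base64   # XrY7u+Ae7tCTyyK7j1rNww==
```

This is the same value the service computes for a small blob uploaded via PutBlob when no MD5 is supplied.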


zezha-msft commented on July 29, 2024

Hi @colemickens, thanks for reaching out!

We appreciate your feedback, and will improve the README to include this information. In the meantime, you can look up the help messages with: ./azcopy cp --help.

And to answer your questions:

  • cp is a simple transfer operation: it scans the source and attempts to transfer every single file/blob. The supported source/destination pairs are listed in the tool's help message. sync, on the other hand, makes sure that whatever is present at the source is replicated to the destination, and whatever is absent from the source is deleted from the destination. If your goal is simply to move some files, then cp is definitely the right command.
  • It's a one-way sync: the destination will ultimately contain only what is on the source. We use last-modified times to determine whether to transfer a file that is present on both sides.
  • Only local <-> blob is supported. I've improved the help message, and it will be merged shortly.

And for the final question: you can always launch new jobs with the same parameters, and they are completely separate entities that do not affect each other. The "job" behavior is mandatory because we want to allow users to resume a failed job if necessary; but the user can choose not to resume and simply launch the same operation again.
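To make the distinction concrete, the two commands might look like this (the local path, account name, container, and SAS token are all placeholders):

```shell
# cp: transfer every file under the source to the destination
./azcopy cp "/data/photos" "https://myaccount.blob.core.windows.net/photos?<SAS>" --recursive=true

# sync: one-way mirror -- per the behavior described above, the destination
# ends up containing only what exists at the source
./azcopy sync "/data/photos" "https://myaccount.blob.core.windows.net/photos?<SAS>"
```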


colemickens commented on July 29, 2024

Thanks a bunch @zezha-msft, this is great information.

Is there any chance the sync behavior can be customized? (I can open a separate issue.)

For example, sometimes I get nervous and don't trust timestamps, or all of my content is content-addressable; therefore, the sync semantics I need are "upload everything that doesn't exist".

Today I've implemented this by:

  • take a list of files in storage
  • loop through the files on disk; for each file not in storage, create a symlink in an upload_staging dir
  • use az storage blob upload-batch to upload the symlinked dir of missing files.
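The steps above can be sketched roughly as follows (a sketch, assuming the az CLI is logged in; the account name, container name, and data directory are placeholders):

```shell
# 1. Get the list of blobs already in storage
az storage blob list --account-name myaccount --container-name mycontainer \
    --query '[].name' -o tsv > uploaded.txt

# 2. Stage symlinks for files on disk that are not yet in storage
mkdir -p upload_staging
for f in /data/*; do
    grep -qxF "$(basename "$f")" uploaded.txt || ln -s "$f" upload_staging/
done

# 3. Upload the staged files in one batch
az storage blob upload-batch --destination mycontainer --source upload_staging \
    --account-name myaccount
```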

It would be really cool to remove my dependency on az CLI and just be able to do it with a single command with this tool.


zezha-msft commented on July 29, 2024

Hi @colemickens, you can accomplish the "Upload Everything that doesn't exist" behavior with the copy command, by including the following flag: --overwrite=false.
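Combined with the earlier example, that might look like (the local path, account name, container, and SAS token are placeholders):

```shell
# Copy only files that don't already exist at the destination
./azcopy cp "/data" "https://myaccount.blob.core.windows.net/mycontainer?<SAS>" \
    --recursive=true --overwrite=false
```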


ppakawatk commented on July 29, 2024

Hi. Can someone please explain how azcopy does the integrity check when uploading from local to an Azure storage account?

As I understand it, azcopy does the MD5 check only when downloading (per the docs). For uploading, azcopy calculates the MD5 and puts it in the Content-MD5 property.

But the docs don't mention whether azcopy validates the integrity of the uploaded file, or how.

Thank you.


ppakawatk commented on July 29, 2024

@ppakawatk I just added a wiki page on the subject for you, here https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation

@JohnRusk, thank you so much for your prompt reply.
That helps me a lot in understanding the process.

However, I tried an experiment and have a question.
As I understand it, the Content-MD5 is calculated from the original file on disk and stored on the blob.
I tried editing the data after uploading it to Azure storage, and the Content-MD5 changed.

So after downloading with --check-md5, the process still succeeds.

Could you point out if there is anything wrong with my understanding?

EDIT:
OK, I think I misunderstood the "edit" button. I guess that when I edited using the "edit" button, the blob was re-uploaded, so the MD5 was recalculated.

If that's the case, could you please suggest how I can test that my program does the integrity checking?
Right now, I can only assume that if the download (with --check-md5) finishes without errors, the integrity check succeeded.
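One way to check this independently of azcopy is to compare a local hash with the property stored on the blob. A sketch using the az CLI (the account, container, and file names are placeholders, and the property path assumes the az CLI's JSON output shape):

```shell
# Compute the local file's MD5 in the same base64 form blob storage uses
local_md5=$(openssl md5 -binary myfile.bin | base64)

# Read the Content-MD5 property stored on the blob
remote_md5=$(az storage blob show --account-name myaccount --container-name mycontainer \
    --name myfile.bin --query 'properties.contentSettings.contentMd5' -o tsv)

[ "$local_md5" = "$remote_md5" ] && echo "MD5 matches" || echo "MD5 mismatch"
```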


JohnRusk commented on July 29, 2024

I tried editing the data after uploading it to Azure storage, and the Content-MD5 changed.

Which data are you editing? The blob in Azure or the original source?

Also, how big is the blob?

And, what tool did you use to edit it?


ppakawatk commented on July 29, 2024

I tried editing the data after uploading it to Azure storage, and the Content-MD5 changed.

Which data are you editing? The blob in Azure or the original source?

Also, how big is the blob?

And, what tool did you use to edit it?

A text file. I tried adding/deleting one letter from the content of the file.

The size is around 100 KB.

I edited it in the Azure Portal.

