
Comments (23)

Kapanther commented on July 29, 2024

~10000 files, 1.4GB. An azcopy copy took less than 2 minutes. Basically, azcopy accepts the command and prints nothing for about 30 minutes.

How can I check whether low throughput is the problem? Is throughput measured in files/sec? I posted the log above.

from azure-storage-azcopy.

Kapanther commented on July 29, 2024

Well, I tested two scenarios: the local folder empty, and the blob empty. (Definitely reproducible.)

When syncing from the local folder to the blob (with the blob empty), it took 31 minutes before the job even started. But syncing back down from the blob to the local folder was lightning fast. See below.

10000 files in the local folder (source) – empty blob (destination): 31 minutes to start the job
10000 files on the blob (source) – empty local folder (destination): less than 1 minute to start

It seems like it's taking a long time to queue the transfers to the blob when syncing.


VelizarVESSELINOV commented on July 29, 2024

Compared to gsutil sync, the azcopy sync performance is really very bad. Using macOS Mojave.
Since azcopy is written largely in Go, I expected higher performance than gsutil/boto, which are written in Python.

Extra observations related to the slow performance:

  • missing clear intermediate output to follow what the program is doing, especially during the diff-analysis phase (try gsutil if you want to understand what I'm talking about)
  • missing compression option when copying readable files like CSV
  • too many file transfer failures
  • missing chunking of large files
  • missing multi-threaded option


VelizarVESSELINOV commented on July 29, 2024

Hi @zezha-msft, thanks for the quick answer. In my process explorer, I saw a lot of threads running, but the CPU usage was limited. Is there an option to control parallel execution? Or maybe the user interface just isn't showing enough of what is currently being done in parallel and/or chunked. For failures, I often get this error:

   ERROR:
-> github.com/Azure/azure-storage-azcopy/ste.newAzcopyHTTPClientFactory.func1.1, /go/src/github.com/Azure/azure-storage-azcopy/ste/mgr-JobPartMgr.go:95
HTTP request failed

The CPU is often low (3%), but some other resource is obviously being used heavily, because a few minutes after execution starts, VSCode and other applications switch to a not-responding state, which is annoying.


Kapanther commented on July 29, 2024

@VelizarVESSELINOV take into account that sync is a new feature that's only "in preview" right now. The guys are still testing it and optimizing its performance.

This thread is focused on the sync command's initial comparison between source and destination being slow, not the file transfer operation itself. Can I suggest you post the multi-threading performance issues as a separate issue?


zezha-msft commented on July 29, 2024

Hi @VelizarVESSELINOV, thanks for your feedback.

To clarify though: if you only want to copy files, you should use the copy command, not sync, which has severe overhead because we have to compare the contents of the source and destination to figure out exactly what to transfer or delete. On the other hand, copy simply transfers the source to the destination. With the help of the --overwrite=false flag, copy can also avoid overwriting existing files at the destination.
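As a concrete sketch, a copy that skips existing destination blobs might look like this (account, container, and SAS token are placeholders, not values from this thread):

```shell
# Sketch only: <account>, <container>, and <SAS> are placeholders.
# copy avoids sync's full source/destination comparison; with
# --overwrite=false, blobs that already exist at the destination are
# left untouched instead of being re-uploaded.
azcopy copy "C:\GCDS_dev" \
  "https://<account>.blob.core.windows.net/<container>?<SAS>" \
  --recursive --overwrite=false
```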


Kapanther commented on July 29, 2024

@zezha-msft and @prjain-msft .. just got back from holidays.. will check now... I'm excited


Kapanther commented on July 29, 2024

Consider this issue closed!!


Kapanther commented on July 29, 2024

From the copy log. As you can see, very slow response times per file checked:

RESPONSE Status: 201 Created
Content-Md5: [Y08Lyv2EjoV4Z3tXjHkPSA==]
Date: [Thu, 11 Oct 2018 05:07:05 GMT]
Etag: ["0x8D62F3763A44874"]
Last-Modified: [Thu, 11 Oct 2018 05:07:06 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Request-Id: [39aa8fcc-c01e-00a7-3120-61b85c000000]
X-Ms-Request-Server-Encrypted: [true]
X-Ms-Version: [2018-03-28]
2018/10/11 05:07:06 ==> REQUEST/RESPONSE (Try=1/3.02s[SLOW >3s], OpTime=3.02s) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://azgcdsdevst1.blob.core.windows.net/gcdstest2/Customisation/PAT_CUSTOM/S15.PAT?si=gcdstest2-16657ecd551&sig=REDACTED&sr=c&sv=2018-03-28&timeout=901
Content-Length: [1271]
User-Agent: [AzCopy/v10.0.2-Preview Azure-Storage/0.1 (go1.10.3; Windows_NT)]
X-Ms-Blob-Cache-Control: []
X-Ms-Blob-Content-Disposition: []
X-Ms-Blob-Content-Encoding: []
X-Ms-Blob-Content-Language: []
X-Ms-Blob-Content-Type: [text/plain; charset=utf-8]
X-Ms-Blob-Type: [BlockBlob]
X-Ms-Client-Request-Id: [c26faa5d-043f-4759-589f-a2d4f2aeca9e]
X-Ms-Version: [2018-03-28]


zezha-msft commented on July 29, 2024

Hi @Kapanther, thanks for reaching out!

To clarify, you said that it took a long time for the job to start. How did you observe this? Was it low throughput?

And also, how many files did you have?


zezha-msft commented on July 29, 2024

Hi @Kapanther, thanks for the additional info! Just to confirm, this is a reproducible problem, right?

@prjain-msft could you help to confirm this behavior?


prjain-msft commented on July 29, 2024

Hey Kapanther,
How many files do you have in the destination?
The way sync works is that it compares the source against the destination, and the destination against the source. All the files present in the destination are also listed and compared with the expected files in the source; if a file is not present in the source, it is marked for deletion. If you have a very large number of files in the destination, then the destination files are being listed and then compared against the source. Until 10000 transfers are queued, you won't see any output. So in sync scenarios where there are hundreds of thousands of files in the source and destination and only very few of them are out of sync, all the files will be compared first before you see any output.


prjain-msft commented on July 29, 2024

Hi Kapanther,
Can you please provide the commands you tried, and also mark the resources (as source / destination)?


Kapanther commented on July 29, 2024

Localfile to Azure Command
C:\software\azcopy\azcopy.exe sync "C:\GCDS_dev" "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!" --recursive

Source = c:\GCDS_dev
Dest = https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!

Azure to Localfile Command
C:\software\azcopy\azcopy.exe sync "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!" "C:\GCDS_dev" --recursive

Source = https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!
Dest = C:\GCDS_dev


prjain-msft commented on July 29, 2024

Hi Kapanther,
Can you please confirm a couple of things?
c:\GCDS_dev (is this a directory or a file?)
https://azgcdsdevst1.blob.core.windows.net/gcdstest2 (does this point to a blob or a virtual folder?)


Kapanther commented on July 29, 2024

C:/GCDS_dev is a directory.

It's a blob, no virtual folder; files go straight into the root.


zezha-msft commented on July 29, 2024

Hi @VelizarVESSELINOV, thanks for the feedback! We are actively working on improving this tool's performance.

To clarify, we do perform concurrent operations and chunk up large files. What were the failures that you saw?


zezha-msft commented on July 29, 2024

Hi @VelizarVESSELINOV, which command were you running exactly? Was it sync or copy?

If you don't mind, please open up a new issue and fill out the issue template so that we can have a bit more info. Thanks!

The concurrency is indeed configurable; please refer to this guide. Our ultimate goal is to adjust the concurrency automatically based on the environment and network; we are still working on this.
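For instance, the concurrency cap can be lowered via an environment variable before a run. A sketch with placeholder paths and SAS token, and an arbitrary value of 32:

```shell
# AZCOPY_CONCURRENCY_VALUE caps the number of concurrent requests
# (this thread reports a default of 300, which can saturate a desktop
# machine). 32 is an arbitrary, more conservative choice.
export AZCOPY_CONCURRENCY_VALUE=32
azcopy sync "C:\GCDS_dev" \
  "https://<account>.blob.core.windows.net/<container>?<SAS>" --recursive
```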


prjain-msft commented on July 29, 2024

Hi @Kapanther
In your command, the source is a directory and the destination is a container, so in the background sync first lists all the files inside the source and compares them against the expected files at the destination. Then it lists all the files inside the destination and compares them against the expected files locally. That is why sync doesn't start immediately. We are working on improving the user experience for sync.


Kapanther commented on July 29, 2024

@prjain-msft
Two questions here:

  1. If sync is only one way (source -> destination), why does it have to check back again from the destination against the source?

  2. What I find particularly strange is that checking 10000 local files against an empty blob using sync takes 31 minutes, but going the other way is almost instant, even though both checks require the blob to be inspected... Is there a timeout parameter for files that are not found, or something?

I guess I will leave it for you guys to interrogate when you get a chance. In the meantime I'll use Rclone and see if I get similar results.


VelizarVESSELINOV commented on July 29, 2024

@zezha-msft the answer to your question is sync.

Eventually I will open a new ticket; for now I have stopped using azcopy.

The default AZCOPY_CONCURRENCY_VALUE of 300 is probably too aggressive for macOS and makes the whole OS difficult to use. No time right now to test a better default AZCOPY_CONCURRENCY_VALUE for macOS. I managed to copy the files with az with acceptable performance and user interface; hopefully the team will manage to improve the performance and usability with @Kapanther's help.


zezha-msft commented on July 29, 2024

Hi @Kapanther, @prjain-msft has improved the sync command's performance significantly. Could you please give it another try? Thank you!!


Kapanther commented on July 29, 2024

@zezha-msft and @prjain-msft .. holy crap guys.. azcopy sync must now be using weapons-grade plutonium, because that is FAST! What was taking about 2 minutes before is now taking less than a second.

Using that 10.0.0.5 preview...

