Comments (23)
~10,000 files, 1.4 GB. A plain Azure copy took less than 2 minutes, but azcopy sync basically accepts the command and then prints nothing for about 30 minutes.
How can I check whether low throughput is the problem? Is throughput = files/sec? I posted the log above.
from azure-storage-azcopy.
Well, I tested two scenarios: the local folder empty, and the blob empty. (Definitely reproducible.)
When syncing from the local folder to the blob (with the blob empty), it took 31 minutes before the job even started. But syncing back down from the blob to the local folder was lightning fast. See below.
10,000 files in local folder (source) -> empty blob (destination): 31 minutes to start the job
10,000 files on blob (source) -> empty local folder (destination): less than 1 minute to start
It seems like it's taking a long time to queue the transfers to the blob when syncing.
Compared to gsutil sync, the azcopy sync performance is really very bad. Using macOS Mojave.
Since azcopy is written largely in Go, I expected higher performance than gsutil/boto, which are written in Python.
Additional observations related to the slow performance:
- missing clear intermediate output to follow what the program is doing, especially during the diff-analysis phase (try gsutil if you want to understand what I'm talking about)
- missing compression option when copying compressible files like CSV
- too many file transfer failures
- missing chunking of large files
- missing multi-threaded option
Hi @zezha-msft, thanks for the quick answer. In my process explorer I saw a lot of threads running, but CPU usage was limited. Is there an option to control parallel execution? Or maybe the user interface just isn't showing enough of what is currently being done in parallel and/or chunked. As for failures, I often get this error:
ERROR:
-> github.com/Azure/azure-storage-azcopy/ste.newAzcopyHTTPClientFactory.func1.1, /go/src/github.com/Azure/azure-storage-azcopy/ste/mgr-JobPartMgr.go:95
HTTP request failed
CPU usage is often low (3%), but azcopy is obviously using a lot of some other resource, because a few minutes after it starts, VSCode and other applications switch to not-responding mode, which is annoying.
@VelizarVESSELINOV take into account that sync is a new feature that's only "in preview" right now. The team is still testing it and optimizing its performance.
This thread is focused on the sync command's initial comparison between source and destination being slow, not on the file transfer operation itself. Can I suggest you post the multi-threading performance issues as a separate issue?
Hi @VelizarVESSELINOV, thanks for your feedback.
To clarify though: if you only wanted to copy files, you should use the copy command, not sync, which has severe overhead because we have to compare the contents of the source and destination to figure out exactly what to transfer or delete. On the other hand, copy simply transfers the source to the destination. With the help of the --overwrite=false flag, copy can also avoid overwriting existing files at the destination.
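To illustrate the difference in work required (this is only a sketch of the behavior described above, not azcopy's actual implementation): a copy with overwrite disabled needs just an existence check per file at the destination, whereas sync must enumerate and compare both sides before anything moves.

```python
import os
import shutil

def copy_no_overwrite(src_dir: str, dst_dir: str) -> list:
    """Copy files from src_dir to dst_dir, skipping any file that
    already exists at the destination (analogous in spirit to
    azcopy copy with --overwrite=false)."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # Existence check only -- no content comparison, unlike sync.
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

Because it never diffs contents, a repeated run is cheap: files already present at the destination are left untouched, even if they differ from the source.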
@zezha-msft and @prjain-msft .. just got back from holidays.. will check now... I'm excited
Consider this issue closed!!
From the copy log. As you can see, the response times per file checked are very slow:
RESPONSE Status: 201 Created
Content-Md5: [Y08Lyv2EjoV4Z3tXjHkPSA==]
Date: [Thu, 11 Oct 2018 05:07:05 GMT]
Etag: ["0x8D62F3763A44874"]
Last-Modified: [Thu, 11 Oct 2018 05:07:06 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Request-Id: [39aa8fcc-c01e-00a7-3120-61b85c000000]
X-Ms-Request-Server-Encrypted: [true]
X-Ms-Version: [2018-03-28]
2018/10/11 05:07:06 ==> REQUEST/RESPONSE (Try=1/3.02s[SLOW >3s], OpTime=3.02s) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://azgcdsdevst1.blob.core.windows.net/gcdstest2/Customisation/PAT_CUSTOM/S15.PAT?si=gcdstest2-16657ecd551&sig=REDACTED&sr=c&sv=2018-03-28&timeout=901
Content-Length: [1271]
User-Agent: [AzCopy/v10.0.2-Preview Azure-Storage/0.1 (go1.10.3; Windows_NT)]
X-Ms-Blob-Cache-Control: []
X-Ms-Blob-Content-Disposition: []
X-Ms-Blob-Content-Encoding: []
X-Ms-Blob-Content-Language: []
X-Ms-Blob-Content-Type: [text/plain; charset=utf-8]
X-Ms-Blob-Type: [BlockBlob]
X-Ms-Client-Request-Id: [c26faa5d-043f-4759-589f-a2d4f2aeca9e]
X-Ms-Version: [2018-03-28]
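The per-request latency is embedded in lines like `(Try=1/3.02s[SLOW >3s], OpTime=3.02s)`. A quick way to count slow requests in a log of this shape (the line format here is assumed from the excerpt above, not taken from azcopy documentation) is a small script like:

```python
import re

# Matches the per-try latency in log lines of the form seen above, e.g.:
#   2018/10/11 05:07:06 ==> REQUEST/RESPONSE (Try=1/3.02s[SLOW >3s], OpTime=3.02s) -- ...
TRY_RE = re.compile(r"Try=\d+/([\d.]+)s")

def slow_requests(log_lines, threshold_s=3.0):
    """Return the latencies (seconds) of requests slower than threshold_s."""
    slow = []
    for line in log_lines:
        m = TRY_RE.search(line)
        if m and float(m.group(1)) > threshold_s:
            slow.append(float(m.group(1)))
    return slow
```

Run over the full log, the ratio of slow to total requests gives a rough picture of whether latency (rather than enumeration) is the bottleneck.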
Hi @Kapanther, thanks for reaching out!
To clarify, you said that it took a long time for the job to start. How did you observe this? Was it low throughput?
And also, how many files did you have?
Hi @Kapanther, thanks for the additional info! Just to confirm, this is a reproducible problem, right?
@prjain-msft could you help to confirm this behavior?
Hey Kapanther,
How many files do you have at the destination?
The way sync works is that it compares the source against the destination, and the destination against the source. All the files present at the destination are listed and compared with the expected files in the source; if a file is not present in the source, it is marked for deletion. If you have a very large number of files at the destination, then the files from the destination are probably being listed and compared against the source. Until 10,000 transfers are queued, you won't see any output. So for sync scenarios where there are hundreds of thousands of files in the source and destination and only very few of them are out of sync, all the files will be compared first before you see any output.
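The two-pass comparison described above can be sketched as plain set operations over file listings (an illustration of the idea only, not azcopy's actual code):

```python
def plan_sync(source, destination):
    """Given {path: last_modified} listings for both sides, return
    (to_transfer, to_delete) the way a one-way sync would.

    Pass 1: source vs destination -> files to upload (new or changed).
    Pass 2: destination vs source -> files to delete (gone from source).
    """
    to_transfer = sorted(
        path for path, mtime in source.items()
        if path not in destination or destination[path] < mtime
    )
    to_delete = sorted(path for path in destination if path not in source)
    return to_transfer, to_delete
```

Note that both full listings must exist before either pass can finish, which is why no transfers are queued (and no output appears) until the enumeration of both sides is done.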
Hi Kapanther,
Can you please provide the commands you tried, and also mark the resources (as source / destination)?
Localfile to Azure Command
C:\software\azcopy\azcopy.exe sync "C:\GCDS_dev" "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!" --recursive
Source = c:\GCDS_dev
Dest = https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!
Azure to Localfile Command
C:\software\azcopy\azcopy.exe sync "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!" "C:\GCDS_dev" --recursive
Source = https://azgcdsdevst1.blob.core.windows.net/gcdstest2?sv=2018-03-28&si=gcdstest2-16657ECD551&sr=c&sig=!retracted!
Dest = C:\GCDS_dev
Hi Kapanther,
Can you please confirm a couple of things?
c:\GCDS_dev (is this a directory or a file?)
https://azgcdsdevst1.blob.core.windows.net/gcdstest2 (does this point to a blob or a virtual folder?)
C:/gcds_dev is a directory..
This is a blob, not a virtual folder. It goes straight into the root.
Hi @VelizarVESSELINOV, thanks for the feedback! We are actively working on this tool to improve the performance.
To clarify, we do perform concurrent operations and chunk up large files. What were the failures that you saw?
Hi @VelizarVESSELINOV, which command were you running exactly? Was it sync or copy?
If you don't mind, please open a new issue and fill out the issue template so that we can have a bit more info. Thanks!
The concurrency is indeed configurable; please refer to this guide. Our ultimate goal is to adjust the concurrency based on the environment and network; we are still working on this.
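The environment variable mentioned later in this thread, AZCOPY_CONCURRENCY_VALUE, caps the number of concurrent requests. The effect of such a cap can be sketched as a bounded worker pool (an illustration only; azcopy's internal scheduler is more involved, and the fallback of 32 below is an arbitrary value for this sketch, not azcopy's documented default):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Read the cap the way a CLI tool might; 32 is an arbitrary fallback.
WORKERS = int(os.environ.get("AZCOPY_CONCURRENCY_VALUE", "32"))

def transfer_all(files, transfer_one, max_workers=WORKERS):
    """Run transfer_one over every file, at most max_workers at a time.
    A cap that is too high (e.g. hundreds on a laptop) saturates the
    machine; too low leaves bandwidth unused."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transfer_one, files))
```

For example, running with AZCOPY_CONCURRENCY_VALUE=16 in the environment would bound the pool to 16 in-flight transfers.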
Hi @Kapanther
In your command, the source is a directory and the destination is a container, so sync first lists all the files inside the source and compares them against the expected files at the destination, then lists all the files inside the destination and compares them against the expected files locally. That is why sync doesn't start immediately. We are working on improving the user experience for sync.
@prjain-msft
Two questions here:
- If sync is only one way (source -> destination), why does it have to check back from the destination against the source?
- What I find particularly strange is that syncing 10,000 local files to an empty blob takes 31 minutes, while going the other way is almost instant, even though both directions require the blob to be inspected. Is there a timeout parameter for files that are not found, or something?
I guess I will leave it for you guys to investigate when you get a chance. In the meantime I'll use Rclone and see if I get similar results.
@zezha-msft the answer to your question is sync.
Eventually I will open a new ticket; for now I've stopped using azcopy.
The default AZCOPY_CONCURRENCY_VALUE of 300 is probably too aggressive for macOS and makes the whole OS difficult to use. No time right now to test a better default AZCOPY_CONCURRENCY_VALUE for macOS. I managed to copy the files with az with acceptable performance and user interface; hopefully the team will manage to improve the performance and usability with @Kapanther's help.
Hi @Kapanther, @prjain-msft has improved the sync command's performance significantly. Could you please give it another try? Thank you!!
@zezha-msft and @prjain-msft .. holy crap guys.. azcopy sync must now be using weapons-grade plutonium.. because that is FAST! What was taking about 2 minutes before is now taking less than a second.
Using that 10.0.0.5 preview...