Comments (58)
I cannot replicate the slowness you see: 7.4.2 takes less than 5 seconds for me and is in fact faster than WinPS, but I do notice the large memory usage. My guess is that you are not only storing the two byte arrays (the raw file data and the decoded base64 string), but the base64 string itself is also allocated on the heap as part of the operation. Potentially WinPS/.NET Framework is more aggressive in reusing the array values, but as per the above the CLR could be allocating the memory and just never freeing it so that it can reuse the memory more efficiently in the future.
Putting aside the above comment you can more efficiently base64 encode bytes by streaming it rather than reading all the input bytes into memory.
Function ConvertTo-Base64String {
    [OutputType([string])]
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path
    )
    $fs = $cryptoStream = $sr = $null
    try {
        $fs = [System.IO.File]::OpenRead($Path)
        $cryptoStream = [System.Security.Cryptography.CryptoStream]::new(
            $fs,
            [System.Security.Cryptography.ToBase64Transform]::new(),
            [System.Security.Cryptography.CryptoStreamMode]::Read)
        $sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
        $sr.ReadToEnd()
    }
    finally {
        ${sr}?.Dispose()
        ${cryptoStream}?.Dispose()
        ${fs}?.Dispose()
    }
}
This will stream the raw bytes from the source file stream and produce the final output string. If you are storing this string into a file then you could optimize it further by streaming the output base64 CryptoStream to a file avoiding having to store all the data in PowerShell.
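The file-to-file variant mentioned above could look roughly like this. It is a minimal sketch rather than a tested implementation: the function name ConvertTo-Base64File and its -Destination parameter are made up for illustration, and both paths are assumed to be full paths, since .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
function ConvertTo-Base64File {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path,

        # Hypothetical parameter: full path of the base64 text file to create.
        [Parameter(Mandatory)]
        [string]$Destination
    )

    $inStream = $outStream = $cryptoStream = $null
    try {
        $inStream = [System.IO.File]::OpenRead($Path)
        $outStream = [System.IO.File]::Create($Destination)
        $cryptoStream = [System.Security.Cryptography.CryptoStream]::new(
            $inStream,
            [System.Security.Cryptography.ToBase64Transform]::new(),
            [System.Security.Cryptography.CryptoStreamMode]::Read)

        # Base64 output is plain ASCII, so the transformed bytes can be copied
        # straight to the output file without building a string in memory.
        $cryptoStream.CopyTo($outStream)
    }
    finally {
        if ($null -ne $cryptoStream) { $cryptoStream.Dispose() }
        if ($null -ne $outStream) { $outStream.Dispose() }
        if ($null -ne $inStream) { $inStream.Dispose() }
    }
}
```

With this shape only the stream buffers live in the PowerShell process, so memory use should stay small regardless of the input size.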
If you do need to store the base64 string as an object in PowerShell, keep in mind that you not only have to store the inflated size that base64 uses (($length / 3) * 4), but each char of the string also takes two bytes, so at a minimum you are looking at around 600MB for the dotnet installer. Using the above function I see the memory usage sit at around 1.2GB, which is less than WinPS (about 1.4GB). While this is still more than the ~600MB, there could be other factors at play here, like the CLR allocating more memory than strictly needed in anticipation of future needs, or some other reason.
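To make the estimate above concrete, here is a small sketch of the arithmetic. The function name Get-Base64StringFootprint and its output property names are invented for this example; it only counts the character data of the string and ignores object overhead and any extra copies.

```powershell
function Get-Base64StringFootprint {
    param (
        [Parameter(Mandatory)]
        [long]$ByteCount
    )

    # Base64 emits 4 characters for every 3 input bytes, rounded up to a whole group.
    $chars = 4 * [long][math]::Ceiling([double]$ByteCount / 3)

    [pscustomobject]@{
        Characters  = $chars
        # A .NET string stores each char as UTF-16, i.e. 2 bytes per character.
        StringBytes = 2 * $chars
    }
}

Get-Base64StringFootprint -ByteCount 200554320
```

For the 200,554,320-byte array used elsewhere in this thread, this gives 267,405,760 characters, or roughly 535 MB for the string's character data alone.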
from powershell.
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.
[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".
from powershell.
Hi,
Given that all of your file and data operations are performed directly with the .NET API, I suggest this is not a PowerShell problem; it looks like a .NET issue.
Is your issue that PowerShell is not running the garbage collector? Does running [System.GC]::Collect() make any difference?
Also be aware that managed memory runtimes often take memory from the OS in order to allocate objects but never return it. So the objects may have been released/disposed/freed so the actual CLR heap is free but the memory has not been returned to the OS. This is completely normal as managed runtimes (CLR, JVM etc) assume that if they needed it once they are likely to need it again so no point giving it back to the OS.
You would need to look for tools to examine the state of the CLR heap within the process rather than external process monitoring tools.
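As a rough first check from inside the session, before reaching for a dedicated tool, you can compare what the GC considers live with what the OS has handed to the process. This is only a sketch using standard .NET APIs; the property names on the output object are made up here.

```powershell
# Force a full collection so that only reachable objects remain on the managed heap.
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
[System.GC]::Collect()

$proc = [System.Diagnostics.Process]::GetCurrentProcess()
$memoryReport = [pscustomobject]@{
    # Bytes the GC believes are currently allocated to live managed objects.
    ManagedHeapMB  = [math]::Round([System.GC]::GetTotalMemory($false) / 1MB, 1)
    # What external process monitors typically show.
    WorkingSetMB   = [math]::Round($proc.WorkingSet64 / 1MB, 1)
    PrivateBytesMB = [math]::Round($proc.PrivateMemorySize64 / 1MB, 1)
}
$memoryReport
```

A small ManagedHeapMB next to a large WorkingSetMB would suggest that the objects were collected but the memory is still held by the runtime, or by native code running in the process.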
When you have an operation that you know is memory intensive, two other options are available:
- run the operation in a separate process, in PowerShell all you need to do is run it in a Job and PowerShell will do the rest.
- stream the operation and read and write chunks of data rather than holding it all in memory.
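The first option could be sketched as follows. The function name Get-Base64LengthInJob is hypothetical, and the script block deliberately returns only a small value (the string length) so that the large string never has to be serialized back into the calling session.

```powershell
function Get-Base64LengthInJob {
    param (
        # Assumed to be a full path; the job does not share the caller's location.
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Start-Job runs the script block in a separate process, so whatever memory
    # it allocates is returned to the OS when that process exits.
    $job = Start-Job -ScriptBlock {
        param ($FilePath)
        $bytes = [System.IO.File]::ReadAllBytes($FilePath)
        $base64 = [System.Convert]::ToBase64String($bytes)
        $base64.Length
    } -ArgumentList $Path

    Receive-Job -Job $job -Wait -AutoRemoveJob
}
```

The trade-off is the startup cost of the child process and the serialization of whatever the job returns, so this suits work whose result is small or is written to disk inside the job.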
I hope that helps.
from powershell.
Thank you both for your responses and suggestions for optimization. I did reply to the Issue cross-posted in the dotnet/runtime project which you can see here dotnet/runtime#101061 (comment)
I ran [GC]::Collect() after running the test script in both WinPS and PS7 and noticed that in PS7 almost none of the used memory was released back to the operating system, which seemed suspicious, as in WinPS almost all memory is released after the test.
UPDATE: Confirmed below that the FromBase64String delay and excessive memory usage only occur through PowerShell 7, but not through a .NET console app. This definitely appears to be a .NET run-time issue and not a PowerShell issue; however, I wanted to post it here as well as over there for visibility in case others encounter this oddity while building PowerShell modules (which is how I stumbled upon it).
I am aware that storing a 200 MB file in memory as base64 text is wildly inefficient, and is not the intention of how my PowerPass module should be used (which is how I discovered this in the first place), but since I stumbled upon this unexpected behavior I thought it prudent to at least report it.
But again, I appreciate all of the comments and feedback here, especially the suggestions for optimization techniques.
from powershell.
In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.
from powershell.
If I am understanding this nonsense with AMSI correctly, then a solution would be to perform the Base64 translation in a compiled C# cmdlet. Given we are talking PowerShell it should be implemented using a pipeline with System.Security.Cryptography.ToBase64Transform rather than dealing with massive strings. In this case the biggest string would be 64 characters.
from powershell.
Sure, AMSI is part of Windows API and doesn't exist in Linux.
from powershell.
My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.
Original timings
218.4235 ms
64152.6876 ms
now
222.2945 ms
301.4766 ms
Code
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
trap
{
throw $PSItem
}
$ErrorActionPreference = 'Stop'
$code = @"
using System;
using System.Management.Automation;

[Cmdlet("ConvertFrom", "Base64String")]
public class ConvertFromBase64String : PSCmdlet
{
    [Parameter(Mandatory=true,ValueFromPipeline=true)]
    public String InputString;

    protected override void ProcessRecord()
    {
        WriteObject(System.Convert.FromBase64String(InputString));
    }
}

[Cmdlet("ConvertTo", "Base64String")]
public class ConvertToBase64String : PSCmdlet
{
    [Parameter(Mandatory=true,ValueFromPipeline=true)]
    public byte[] InputObject;

    protected override void ProcessRecord()
    {
        WriteObject(System.Convert.ToBase64String(InputObject));
    }
}
"@
Add-Type $code -PassThru | ForEach-Object { Import-Module $_.Assembly }
Get-Command -Noun 'Base64String'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$bytes.Length
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = @(,$bytes) | ConvertTo-Base64String
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$base64.Length
$bytes = $null
$now = Get-Date
$bytes = $base64 | ConvertFrom-Base64String
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes.Length
from powershell.
My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.
Thank you, this is very helpful. I was fiddling with using C# and Add-Type, but I noticed that making the call directly in C# via a static function still invokes the AMSI context. I will incorporate this into my PowerPass module to avoid the performance issue and excessive memory usage.
from powershell.
@chopinrlz, it isn't calls from C# that trigger an AMSI scan, it is .NET method calls from PowerShell code.
@rhubarb-geek-nz's workaround based on using C# code to implement cmdlets avoids this by letting PowerShell itself mediate the method calls.
Two asides:
- $env:__PSDumpAMSILogContent='1' isn't effective in-session; the env. var. must be set before calling PowerShell.
- $bytes = new-object byte[] -ArgumentList @(,200554320) can be simplified to $bytes = [byte[]]::new(200554320), which also avoids an (invisible) [psobject] wrapper that would cause an (unrelated) problem with a call such as ConvertTo-Base64String -InputObject $bytes - see #21496
from powershell.
The script was predictable
No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic AMSI information, but it will when the variable is set beforehand.
there is always another reason, case, exception or scenario where it breaks.
Again I empathize.
But simply venting your frustration isn't the way forward.
In the case at hand I've (indirectly) pointed to the (ultimate) root cause of the underlying problem - #5579.
I suggest channeling your frustration into constructive feedback - while being cognizant that such feedback may or may not be heard.
from powershell.
You're right - via the CLI (as implicitly used via a shebang-based executable shell script), the in-process setting is honored, if:
- the CLI call uses either (possibly implied) -File or -Command for execute-and-exit functionality.
- and $env:__PSDumpAMSILogContent = 1 is set before any in-session .NET method calls occur from PowerShell code.
A simpler demonstration: Start a pristine POSIX-compatible shell and run the following:
unset __PSDumpAMSILogContent # ensure that the env. var. isn't defined.
# AMSI log output via env. var. defined BEFORE
__PSDumpAMSILogContent=1 pwsh -noprofile -c '$null = [byte[]]::new(2048)'
# !! Produces AMSI output too, because the environment variable - despite being set in-session - is
# !! set *before the first method call*.
pwsh -noprofile -c '$env:__PSDumpAMSILogContent = 1; $null = [byte[]]::new(2048)'
Note:
- From inside PowerShell, an executable shell script with extension .ps1 is still executed in-process - and using a filename extension with an executable shell script is generally ill-advised.
- The bigger picture here is: To make PowerShell configuration environment variables work predictably, set them before invoking PowerShell, irrespective of the invocation mechanism.
from powershell.
because it has to fill the entire 64KB first before it passes onto the pipeline
Yes, it's an imperfect emulation of the native Unix pipeline, but with file input (where there's no "dribbling"), it works well.
That said, it's rare for Unix-heritage utilities to accept input via stdin (the pipeline) only and not also via file-path operands; thus, with a file as the data source, passing the file's path as an argument to an external program is the simpler and better solution (such as in the openssl case, using the - syntactically unusual - -in parameter).
It would certainly be better if Get-Content always wrote AsByteStream as a byte array but I think it is too late to change that.
Hopefully not: Let's see what becomes of the feature request you've since created:
from powershell.
In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB.
Yes, the same memory usage, but time to complete is about 1.2 seconds
from powershell.
I retested the following updated script on my desktop PC, a Ryzen 5800X with 128 GB of RAM and PCIe Gen4 NVMe storage. The test ran much faster, as expected, but the memory usage still remains high.
Even after invoking [GC]::Collect(), around 3.2 GB of RAM still remains utilized by the pwsh process.
Reading the 222 MB file into memory takes 0.06 seconds, and converting it to base64 takes 0.22 seconds and uses 845 MB of RAM across both operations, as expected. The last operation, [System.Convert]::FromBase64String, uses 2.6 GB of RAM alone and takes 17 seconds.
I'll cross-post this in the dotnet/runtime issue. Thank you all for the feedback.
Updated test script:
$name = "random.bin"
$start = Get-Date
Write-Host "Creating Path to $name test file: " -NoNewline
$now = Get-Date
$file = Join-Path -Path $PSScriptRoot -ChildPath $name
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Reading all file bytes into memory: " -NoNewline
$now = Get-Date
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Converting file bytes to base64 string: " -NoNewline
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Converting base64 string back to file bytes: " -NoNewline
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Test complete"
Write-Host "Total duration: $(((Get-Date) - $start).TotalMilliseconds) ms"
from powershell.
I just tested this same implementation using a C# console application for the dotnet/runtime team and the issue does NOT occur when running in a console application against .NET 8 on the latest SDK on Windows 11 Professional. My test results and C# code are here: dotnet/runtime#101061 (comment)
It seems that this is actually a memory leak in the PowerShell runtime for some reason. The dotnet/runtime crew was asking if the [System.Convert]::FromBase64String function was being replaced with something else by PowerShell, but since I haven't tweaked my PowerShell installation, I can't imagine what could be doing that. Plus, this issue happens on multiple PCs, my desktop and my laptop, and only within PowerShell 7.4.2.
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
from powershell.
For reference, the random.bin test file I am using is exactly 233,420,544 bytes in length. I generated it with this script:
param(
    [int]
    $Size
)
$blockSize = 256
$rand = [System.Random]::new()
$total = 0
# The element type is [byte]; passing [byte[]] here would allocate an array of arrays.
[byte[]]$data = [System.Array]::CreateInstance( [byte], $blockSize )
$path = Join-Path -Path $PSScriptRoot -ChildPath "random.bin"
if( Test-Path $path ) {
    Remove-Item -Path $path -Force
}
$file = [System.IO.File]::OpenWrite( $path )
while( $total -lt $Size ) {
    $rand.NextBytes( $data )
    $file.Write( $data, 0, $data.Length )
    $total += $blockSize
}
$file.Flush()
$file.Close()
from powershell.
A couple more data points regarding the unexpected slowdown:
- I see the slowdown only on Windows (neither on macOS nor on Linux), both in 7.4.2 (.NET 8.0.4) and v7.5.0-preview.2 (.NET 9.0.0-preview.1.24080.9).
- On Windows, I see the slowdown in 7.3.10 (.NET 7.0.14) too, albeit in less severe form (about twice as fast as 7.4.2 / v7.5.0-preview.2, but still way too slow); WinPS is fine.
from powershell.
I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.
from powershell.
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.
[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".
I am also seeing .NET 8.0.4 for the FrameworkDescription on my PowerShell 7 install, which I set up from the MSI downloaded from GitHub. Assembly file versions are 8.0.424.16909.
from powershell.
I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.
The slow operation is the call to [System.Convert]::FromBase64String, which occurs after the file is loaded from disk into memory. On my desktop, this operation takes 17 seconds. The file load operation from disk takes 0.06 seconds.
Also, this only happens when doing this in PowerShell 7.4.2. Running this same test in a C# console application takes under 1 second.
from powershell.
In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.
I added a Program Setting for pwsh.exe into Exploit Protection under App & Browser Control and disabled all protections. Rerunning the test resulted in the same duration: 17 seconds and 3.4 GB of RAM usage. Is there another place I can check in Windows Security to prevent Defender from watching pwsh.exe?
from powershell.
The excess memory consumption of the pwsh process suggests that there is extra code running within the process, rather than antivirus software examining it from the outside (from another process or from a kernel-mode driver).
During the slow FromBase64String operation, is the pwsh process consuming a lot of processor time (one thread's worth)?
Can you attach an unmanaged-code debugger to the process during FromBase64String and get a stack trace of the thread with the most CPU time? (!runaway, k, Thread Syntax)
from powershell.
Here is a stack trace of the thread with the most CPU time. I used WinDbg and broke process execution in the middle of the FromBase64String operation. You can see some fun stuff at the top.
0:009> ~21 k
# Child-SP RetAddr Call Site
00 0000007b`a638e6d8 00007ffc`5b790a10 MPCLIENT!MpUpdateServicePingRpc+0x7b676
01 0000007b`a638e6e0 00007ffc`5b7cfde4 MPCLIENT!MpTelemetryUpdateUserConsent+0x190
02 0000007b`a638e720 00007ffc`5b7c5db0 MPCLIENT!MpConveyUserChoiceForSampleList+0x704
03 0000007b`a638e800 00007ffc`68253f92 MPCLIENT!MpAmsiNotify+0x140
04 0000007b`a638e8d0 00007ffc`682eb5fd MpOav!DllRegisterServer+0x1142
05 0000007b`a638e930 00007ffc`682e81cb amsi!CAmsiAntimalware::Notify+0xcd
06 0000007b`a638e9c0 00007ffb`95ec173b amsi!AmsiNotifyOperation+0xab
07 0000007b`a638ea10 00007ffb`f1b36cfb 0x00007ffb`95ec173b
08 0000007b`a638eae0 00007ffb`f1acfc4b System_Management_Automation!System.Management.Automation.AmsiUtils.WinReportContent+0xeb
09 0000007b`a638eb60 00007ffb`95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
0a 0000007b`a638ec90 00007ffb`fa945cf6 0x00007ffb`95964a7c
0b 0000007b`a638ecf0 00007ffb`f1f61b9f System_Linq_Expressions!System.Dynamic.UpdateDelegates.UpdateAndExecute2<System.Type,object,object>+0x1f6 [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/UpdateDelegates.Generated.cs @ 268]
0c 0000007b`a638ed80 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.DynamicInstruction<System.Type,object,object>.Run+0xff
0d 0000007b`a638ee10 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0e 0000007b`a638ee90 00007ffb`f1bb20d3 System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0f 0000007b`a638ef10 00007ffb`f1fa3e06 System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
10 0000007b`a638ef60 00007ffb`f1ad841d System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
11 0000007b`a638efe0 00007ffb`f1ad7e0d System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
12 0000007b`a638f070 00007ffb`f19fff15 System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
13 0000007b`a638f0e0 00007ffb`f1cbd0ed System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
14 0000007b`a638f130 00007ffb`f1cbcdc9 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
15 0000007b`a638f1b0 00007ffb`f1ac7eab System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
16 0000007b`a638f230 00007ffb`9525b18e System_Management_Automation!System.Management.Automation.PipelineOps.InvokePipeline+0x33b
17 0000007b`a638f2d0 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.ActionCallInstruction<object,bool,System.Management.Automation.CommandParameterInternal[][],System.Management.Automation.Language.CommandBaseAst[],System.Management.Automation.CommandRedirection[][],System.Management.Automation.Language.FunctionContext>.Run+0x21e
18 0000007b`a638f380 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
19 0000007b`a638f400 00007ffb`f1bb20d3 System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
1a 0000007b`a638f480 00007ffb`f1fa3e06 System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
1b 0000007b`a638f4d0 00007ffb`f1ad841d System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
1c 0000007b`a638f550 00007ffb`f1ad7e0d System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
1d 0000007b`a638f5e0 00007ffb`f19fff15 System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
1e 0000007b`a638f650 00007ffb`f1cbd0ed System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
1f 0000007b`a638f6a0 00007ffb`f1cbcdc9 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
20 0000007b`a638f720 00007ffb`f1bd1117 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
21 0000007b`a638f7a0 00007ffb`f1bd1923 System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper+0x507
22 0000007b`a638f850 00007ffb`f1bd2a7f System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc+0x113
23 0000007b`a638f8b0 00007ffb`f2c763cd System_Management_Automation!System.Management.Automation.Runspaces.PipelineThread.WorkerProc+0x2f
24 0000007b`a638f8e0 00007ffb`f4d6b8d3 System_Private_CoreLib!System.Threading.ExecutionContext.RunInternal+0x7d [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 179]
25 0000007b`a638f950 00007ffb`f4c3ebac coreclr!CallDescrWorkerInternal+0x83 [D:\a\_work\1\s\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 100]
26 0000007b`a638f990 00007ffb`f4d57b93 coreclr!DispatchCallSimple+0x60 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221]
27 0000007b`a638fa20 00007ffb`f4cc4abd coreclr!ThreadNative::KickOffThread_Worker+0x63 [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 158]
28 (Inline Function) --------`-------- coreclr!ManagedThreadBase_DispatchInner+0xd [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7222]
29 0000007b`a638fa80 00007ffb`f4cc49d3 coreclr!ManagedThreadBase_DispatchMiddle+0x85 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7266]
2a 0000007b`a638fb60 00007ffb`f4cc4b6e coreclr!ManagedThreadBase_DispatchOuter+0xab [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7425]
2b (Inline Function) --------`-------- coreclr!ManagedThreadBase_FullTransition+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7470]
2c (Inline Function) --------`-------- coreclr!ManagedThreadBase::KickOff+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7505]
2d 0000007b`a638fc00 00007ffc`738e257d coreclr!ThreadNative::KickOffThread+0x7e [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 230]
2e 0000007b`a638fc60 00007ffc`7508aa48 KERNEL32!BaseThreadInitThunk+0x1d
2f 0000007b`a638fc90 00000000`00000000 ntdll!RtlUserThreadStart+0x28
from powershell.
09 0000007b`a638eb60 00007ffb`95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
Oh, LogMemberInvocation calls ArgumentToString here:
So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design. And then perhaps the Defender implementation of AMSI makes a few more copies of the string.
This PowerShell code would apparently log the whole AMSI scan request to the console if you set __PSDumpAMSILogContent=1 in the environment before you start PowerShell.
Why does the AMSI scan take that long, though… does it do useful work all that time, or does it get stuck somehow and give up after a timeout? Perhaps you could try with files of different sizes, graph how the file size affects the FromBase64String duration. If the duration stays the same, then that suggests there is a timeout.
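One way to collect those data points is a small timing loop like the one below. It is a sketch only: the sizes are arbitrary and kept small so that it finishes quickly, and in-memory random data stands in for files of different sizes.

```powershell
$rand = [System.Random]::new()

$results = foreach ($sizeMiB in 1, 2, 4, 8) {
    # Build a random payload of the given size and encode it once, outside the timed part.
    $bytes = [byte[]]::new($sizeMiB * 1MB)
    $rand.NextBytes($bytes)
    $base64 = [System.Convert]::ToBase64String($bytes)

    # Time only the decode call, which is the operation under suspicion.
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    $null = [System.Convert]::FromBase64String($base64)
    $timer.Stop()

    [pscustomobject]@{
        SizeMiB  = $sizeMiB
        DecodeMs = $timer.Elapsed.TotalMilliseconds
    }
}

$results | Format-Table
```

If DecodeMs grows roughly in proportion to SizeMiB, the time is being spent on work that scales with the data; a roughly constant value would point to a fixed delay such as a timeout.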
from powershell.
I ran a test using variable size byte arrays with random payloads starting at 2 MiB in size and going up to 116 MiB in size. You can see that the duration required is linear, and also extremely slow. It takes 2 seconds to convert 32 MiB back to a byte array from a base64 string.
The same test conducted at 16 MiB intervals up to 256 MiB also shows a linear trend.
One final test at 32 MiB intervals up to 384 MiB shows a linear trend as well suggesting that there may be no upper boundary or timeout no matter how much data you ask PowerShell to convert from base64.
from powershell.
So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design.
Or perhaps you might be completely horrified by the idea of deep packet inspection of all arguments and no knowledge of whether that will be sent to 3rd parties. ( or any other party at all, to be honest )
from powershell.
I hoped the graph might show a lower boundary, because it could indicate a configuration error that could then be fixed to speed up the operation; for example, if the AMSI code running in-process were unable to contact the Defender service and spent some constant amount of time attempting that. Alas, the linear graph doesn't look like that's the case.
There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster. But if such a workaround becomes commonly used, I suspect a future version of PowerShell will be changed to scan the data anyway.
from powershell.
AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041. I'm not sure it even uses the suspicious content detector; perhaps the difference between ToBase64String and FromBase64String is that ArgumentToString does not format the elements of a byte[] argument for AMSI, but passes a string argument through.
A slowdown was previously reported in #19431.
from powershell.
I am not seeing similar times
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Took 442.478 ms on a little Intel(R) Core(TM) i3-10100Y CPU @ 1.30GHz 1.61 GHz running Windows 11 Pro
from powershell.
AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041.
Where can I find documentation on what this actually does? I am personally horrified by the idea that anyone thinks they have the rights to log data that was private to a process without their knowledge.
When I say POWERSHELL_TELEMETRY_OPTOUT=1 I mean it!
from powershell.
I am not seeing similar times
@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.
Where can I find documentation on what this actually does?
The best may be the documentation of the PSAMSIMethodInvocationLogging experimental feature in this old version: https://github.com/MicrosoftDocs/PowerShell-Docs/blob/793ed5c687e6c7b64565d1751c532eb1d7d84209/reference/docs-conceptual/learn/experimental-features.md#psamsimethodinvocationlogging
The "How AMSI helps" link in that documentation doesn't work on GitHub; use https://learn.microsoft.com/windows/win32/amsi/how-amsi-helps instead.
AMSI doesn't necessarily involve telemetry that would send the data off the machine. I don't know whether Windows Defender has telemetry for AMSI scans.
from powershell.
There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster.
Because ArgumentToString does not recognise the char[] type and returns only the type name, I think a [System.Convert]::FromBase64CharArray call should be much faster for AMSI to scan than [System.Convert]::FromBase64String. But who knows how long that will remain so.
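A minimal sketch of that idea, assuming the behaviour described above (that only the type name of a char[] argument ends up in the logged invocation) and that it stays that way. The 1 MB random payload is just test data.

```powershell
# Test data: a random payload and its base64 form.
$original = [byte[]]::new(1MB)
[System.Random]::new().NextBytes($original)
$base64 = [System.Convert]::ToBase64String($original)

# Hand the decoder a char[] instead of the string itself.
$chars = $base64.ToCharArray()
$decoded = [System.Convert]::FromBase64CharArray($chars, 0, $chars.Length)
```

Note that the char[] is an additional copy of the text, so this trades some memory for the shorter scan.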
from powershell.
@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.
Thanks for that, well spotted. I have updated the script as follows:
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes = $null
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Running on Windows 11 I now get
480.4803 ms
65573.3259 ms
but running on WSL Debian 12 on same machine
556.7739 ms
2534.7992 ms
So this is an issue only on Windows
from powershell.
@daxian-dbw @SteveL-MSFT Who could review the method for performance? There is an issue with large arguments. It seems it makes no sense to send full large file content to amsi.
PowerShell/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs
Lines 3651 to 3668 in 8ea1598
from powershell.
@daxian-dbw @SteveL-MSFT Who could review the method for performance?
- It is using StringBuilder, so that is good.
- ArgumentToString() does not do any escaping of the string values, so if the string contained >, < or similar then you have the equivalent of SQL injection.
- Using ($"<{value}>"); and StringBuilder: choose one mechanism or the other, e.g. argsBuilder.Append('<').Append(value).Append('>');
- StringBuilder has appenders for primitive types, so no need to convert those to a string and append when StringBuilder can append ints, longs and bools more efficiently.
- No exception is made for any cryptographic material or SecureString.
from powershell.
the equivalent of SQL injection
Public AMSI documentation does not specify any particular format for the string. I expect that escaping would do more harm than good, as it could prevent the AMSI provider from recognizing substrings that are known to be used by malware.
The argsBuilder.Append($"<{value}>") call goes to StringBuilder.Append(StringBuilder.AppendInterpolatedStringHandler handler), which should be optimal already. Actual IL in LogMemberInvocation:
IL_0032: ldc.i4.2
IL_0033: ldc.i4.1
IL_0034: ldloc.s V_6
IL_0036: newobj instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::.ctor(int32,
int32,
class [System.Runtime]System.Text.StringBuilder)
IL_003b: stloc.s V_7
IL_003d: ldloca.s V_7
IL_003f: ldstr "<"
IL_0044: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
IL_0049: ldloca.s V_7
IL_004b: ldloc.s V_5
IL_004d: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendFormatted(string)
IL_0052: ldloca.s V_7
IL_0054: ldstr ">"
IL_0059: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
IL_005e: ldloca.s V_7
IL_0060: callvirt instance class [System.Runtime]System.Text.StringBuilder [System.Runtime]System.Text.StringBuilder::Append(valuetype [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler&)
IL_0065: pop
It might be possible to save a little by replacing the subsequent string content = $"<{targetName}>.{name}({argsBuilder})"; formatting with more appends to the same StringBuilder; but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.
from powershell.
I noticed this processing delay with [System.Convert]::FromBase64String is also happening with several other CLR functions which fill and copy arrays, most notably [System.Random]::NextBytes and [System.Array]::Copy, the first of which is an instance method on the System.Random class. Both methods appear to have a linear duration increase as the size of the byte[] increases.
I modified my test script to use [System.Convert]::FromBase64CharArray, which runs substantially faster than FromBase64String, and in the process I noticed the NextBytes and Array::Copy methods were taking an increasingly long time to run. Here is my updated test script, which outputs the timing results to a CSV file that can be pulled into Power BI for reporting.
# Create a random number generator and results collection
$rand = [System.Random]::new()
$testResults = @()
$iterations = 4 * 1024
$step = 1 * 1024
# Fill a byte[] with random data for testing
[byte[]]$randomData = [System.Array]::CreateInstance( [byte], $iterations * $step )
$rand.NextBytes( $randomData )
# Run a test in $step KiB increments
1..$iterations | ForEach-Object {
# Write a progress window
Write-Progress -Activity "Testing Base64 Conversion" -Status "Iteration $_ of $iterations" -PercentComplete (($_ / $iterations) * 99.9)
# Create a test result
$dataLength = ($_ * $step)
$result = [PSCustomObject]@{
Length = $dataLength
FillMs = 0
ToBase64Ms = 0
FromBase64Ms = 0
}
# Create an array of data
$start = Get-Date
[byte[]]$data = [System.Array]::CreateInstance( [byte], $dataLength )
[System.Array]::Copy( $randomData, 0, $data, 0, $dataLength )
$result.FillMs = ((Get-Date) - $start).TotalMilliseconds
# Run conversion tests and track timing
$start = Get-Date
$base64 = [System.Convert]::ToBase64String( $data )
$result.ToBase64Ms = ((Get-Date) - $start).TotalMilliseconds
$start = Get-Date
$chars = $base64.ToCharArray()
$bytes = [System.Convert]::FromBase64CharArray( $chars, 0, $chars.Length )
$result.FromBase64Ms = ((Get-Date) - $start).TotalMilliseconds
# Record results for output and graphing
$testResults += $result
}
# Output the results to CSV
$testResults | Export-Csv "base64-results.csv" -Force
from powershell.
but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.
I don't think it is so much a matter of time as of heap usage: the StringBuilder will hold another copy of the base64 data, and then the construction of content will make yet another.
I suggest you can do it without formatting or StringBuilder and only allocate the memory for the final string once.
var argsBuilder = new string[5+args.Length*2];
int i = 0;
argsBuilder[i++] = "<";
argsBuilder[i++] = targetName;
argsBuilder[i++] = ">.";
argsBuilder[i++] = name;
if (args.Length > 0)
{
foreach (var arg in args)
{
argsBuilder[i++] = i==5 ? "(<" : ">, <";
argsBuilder[i++] = ArgumentToString(arg);
}
argsBuilder[i++] = ">)";
}
else
{
argsBuilder[i++] = "()";
}
string content = String.Join(null, argsBuilder);
Apart from the list of strings themselves in argsBuilder the final string is allocated only in String.Join() which determines the total length first.
from powershell.
Sure, AMSI is part of Windows API and doesn't exist in Linux.
Not quite. The logging is still done on Linux, whether AMSI exists or not. This was run on 7.4.2 on Linux
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
Gives
=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===
from powershell.
$env:__PSDumpAMSILogContent='1' isn't effective in-session; the env. var. must be set before calling PowerShell.
Not actually true, it merely needs to be set before the first call to get DumpLogAMSIContent; the environment variable is accessed by a lazy load. E.g. #21492
private static readonly Lazy<bool> DumpLogAMSIContent = new Lazy<bool>(
() => {
object result = Environment.GetEnvironmentVariable("__PSDumpAMSILogContent");
if (result != null && LanguagePrimitives.TryConvertTo(result, out int value))
{
return value == 1;
}
return false;
}
);
$bytes = new-object byte[] -ArgumentList @(,200554320) can be simplified to $bytes = [byte[]]::new(200554320), which also avoids an (invisible) [psobject] wrapper that would cause an (unrelated) problem ....
Of course, how could we not have invisible, non-obvious problems in the simplest of code.
from powershell.
Not actually true, because for predictable diagnostic output you indeed do need to set the environment variable first, as evidenced by the following:
$null, 1 | % {
Write-Host ---
$env:__PSDumpAMSILogContent = $_
pwsh -noprofile { [byte[]]::new(0) }
}
$env:__PSDumpAMSILogContent = $null
Of course, how could we not have invisible, non-obvious problems in the simplest of code.
I assume this is pure sarcasm (which I do not endorse, but I empathize with the frustration I presume to underlie it); if there's an actual argument in there (beyond what #21496 expresses), please tell us.
from powershell.
Not actually true, because for predictable diagnostic output you indeed do need to set the environment variable first, as evidenced by the following:
The script was predictable because it had the "#!/usr/bin/env pwsh" at the start, had the executable bit set, and was designed to run directly from bash. It sets the environment variable after PowerShell has started but before the first reflection invocation.
if there's an actual argument in there, please tell us.
The frustration is because every time you think you have found the solution with PowerShell, there is always another reason, case, exception or scenario where it breaks. As a user you don't have the tools to see all these problems because the very objects themselves play stupid games trying to pretend to be something they are not, or changing from what you thought they should have been. I can only assume I am not the target audience for this tool despite it supposedly being for system administrators, developers and IT professionals.
from powershell.
The script was predictable
No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic AMSI information, but it will do in the former case.
Yes, you are absolutely right. It wasn't predictable because you might have been using PowerShell as your default shell to launch scripts. Whereas all other UNIX shell scripts start a new script with executable bit set in a new process, we are talking about PowerShell here. Sigh.
Perhaps a recommendation for running test scripts is "In a new process....", not "In whatever process with whatever indeterminate state you happen to have....."
from powershell.
It wasn't predictable because you might have been using PowerShell as your default shell
I was, but that is irrelevant: the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells.
See below.
from powershell.
P.S.: @rhubarb-geek-nz:
- I haven't looked into why the apparent attempt at honoring an in-process definition of the variable (private static readonly Lazy<bool> DumpLogAMSIContent) doesn't work reliably.
  - Update: The reason is that the very first .NET method call from PowerShell code in a session locks in the value of DumpLogAMSIContent.Value based on whether env. var. __PSDumpAMSILogContent is defined (and set to 1) then. In an interactive session, it is invariably the PSReadLine module that is the first to make such a call ([Microsoft.PowerShell.PSConsoleReadLine]::ReadLine()), so that subsequent in-process attempts to set __PSDumpAMSILogContent are ineffective.
- However, generally speaking, in cases where PowerShell honors environment variables, they are expected to be set before PowerShell is launched.
  - Update: There is at least one exception: $env:PSModulePath is honored dynamically, on every access; the lazy once-per-session initialization of DumpLogAMSIContent is an unfortunate hybrid between static and dynamic behavior.
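The "locks in the value" behavior of a lazily initialized flag can be reproduced in isolation. The following is only an illustrative sketch, not PowerShell's own code: the environment variable name DEMO_LAZY_FLAG and the variable names are made up for the demonstration.

```powershell
# Illustration only: DEMO_LAZY_FLAG is a hypothetical variable name,
# standing in for __PSDumpAMSILogContent.
$env:DEMO_LAZY_FLAG = $null   # make sure the variable is not defined

# The factory delegate runs once, on the first access to .Value.
$flag = [System.Lazy[bool]]::new([Func[bool]] { $env:DEMO_LAZY_FLAG -eq '1' })

$first = $flag.Value          # first access: evaluates the factory and caches the result
$env:DEMO_LAZY_FLAG = '1'     # setting the variable afterwards ...
$second = $flag.Value         # ... has no effect on the cached value

# A Lazy created after the variable was set sees the new value.
$fresh = [System.Lazy[bool]]::new([Func[bool]] { $env:DEMO_LAZY_FLAG -eq '1' }).Value

"first=$first second=$second fresh=$fresh"
$env:DEMO_LAZY_FLAG = $null   # clean up
```

Whichever code path touches .Value first decides the outcome for the rest of the process, which is consistent with the explanation above that an early method call made by PSReadLine fixes the value in an interactive session.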
from powershell.
the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells.
I beg to offer a different opinion...
$ ls -ld new.ps1
-rwxr-xr-x 1 me users 139 Apr 18 00:57 new.ps1
$ cat new.ps1
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes1024 = new-object byte[] -ArgumentList @(,1024)
$bytes2048 = [byte[]]::new(2048)
Scenario A - The environment variable is not set in the calling process
$ echo $__PSDumpAMSILogContent
$ ./new.ps1
=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===
Scenario B - it is set to 0 in the calling process
$ __PSDumpAMSILogContent=0
$ echo $__PSDumpAMSILogContent
0
$ ./new.ps1
=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===
from powershell.
Rather than making a cmdlet for every .NET method you wish to call, you can simply put reflection in a single cmdlet.
$bytes = [byte[]]@(1,2,3)
$base64 = [string](Invoke-Reflection -Method ToBase64String -Type ([System.Convert]) -ArgumentList @(,$bytes))
Invoke-Reflection -Method FromBase64String -Type ([System.Convert]) -ArgumentList @(,$base64) | Format-Hex
from powershell.
and using a filename extension with an executable shell script is generally ill-advised.
Really? That is one I have not heard of.... eg
$ ls -ld *.sh
-rwxr-xr-x 1 github users 772 May 18 2023 debug.sh
-rw-r--r-- 1 github users 107 May 18 2023 download.sh
-rwxr-xr-x 1 github users 2094 May 18 2023 generate-icns.sh
-rwxr-xr-x 1 github users 7418 May 18 2023 install-powershell.sh
-rwxr-xr-x 1 github users 7307 May 18 2023 installpsh-amazonlinux.sh
-rwxr-xr-x 1 github users 9229 May 18 2023 installpsh-debian.sh
-rwxr-xr-x 1 github users 7791 May 18 2023 installpsh-gentoo.sh
-rw-r--r-- 1 github users 7533 May 18 2023 installpsh-mariner.sh
-rwxr-xr-x 1 github users 6483 May 18 2023 installpsh-osx.sh
-rwxr-xr-x 1 github users 6425 May 18 2023 installpsh-redhat.sh
-rwxr-xr-x 1 github users 9081 May 18 2023 installpsh-suse.sh
If you mean executable PowerShell scripts without the ps1 extension, we know how that ends up.
from powershell.
Another alternative is to do the reflection directly in PowerShell itself.
ToBase64String is
$method = ([System.Convert]).GetMethod('ToBase64String',[type[]]@(,([byte[]])))
$base64 = [string]($method.Invoke($null,@(,$bytes)))
FromBase64String is
$method = ([System.Convert]).GetMethod('FromBase64String',[type[]]@(,([string])))
$bytes = $method.Invoke($null,@(,$base64))
Then the AMSI logging just looks like
=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<ToBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<FromBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===
Where the arguments are not dumped because all it prints is System.Object[]
from powershell.
Really? That is one I have not heard of...
Unfortunately, many ill-advised practices are common.
An executable shell script (using a shebang line) is an executable like any other, and there is no benefit to signaling to a caller that a given executable happens to be a shell script, which is (a) an implementation detail and (b) may lead users to believe that sh <script>.sh should be used for invocation, which can fail if the script uses Bashisms, for instance.
With PowerShell, specifically, things get tricky (leaving the bug you mention aside), because, unlike analogous shell scripts for POSIX-compatible shells, an executable, shebang line-based .ps1 file is still executed in-process, with the potential to alter the session state. An executable, shebang line-based .ps1 file must therefore be designed with this in mind.
One without this extension consistently runs in a child process - albeit more slowly and at the expense of not having rich type support in the in- and output and the inability to pass array arguments and arguments that have no string-literal representations - but a PowerShell script that is designed to (also) run as a standalone executable should not rely on these features anyway.
I presume it is the latter limitations that explain why - at least in my perception - shebang line-based PowerShell scripts haven't really caught on and why bugs such as #21402 are still not fixed.
from powershell.
Really? That is one I have not heard of...
Unfortunately, many ill-advised practices are common.
It depends on the context. If you mean a program that is found via the PATH then I might agree, but in general when you are managing large numbers of scripts to perform tasks then keeping the .sh extension is very useful. UNIX exec() does not care about file extensions for executables; the concept of file extensions does not exist within the POSIX C API. You are free to name executable files how you like. One major advantage of maintaining the .sh extension is when you manage them in a source code repository and you are storing text, not a compiled binary. Keeping the extension makes that absolutely obvious.
It is a Windows-ism to step through extensions (com, bat, exe, cmd) while looking for commands on the path or in the local directory, and PowerShell similarly will try appending .ps1 when looking for a command.
from powershell.
@rhubarb-geek-nz, we're getting far afield, but let me attempt a summary of the issue at hand first, which implies that there's likely nothing actionable here:
- I presume that there's no actual memory leak here, only a memory "grab" by the CLR that isn't released, at least not instantly (perhaps on demand?).
- The behavior is currently by design, and the only pathological case is an attempt to pass a very large string as an argument to a .NET method. Workarounds have been offered:
  - Per @KalleOlaviNiemitalo's comment, using [System.Convert]::FromBase64CharArray() bypasses the problem.
  - Per your own comment, reflection can be used to bypass the problem.
- A fundamental solution would be to allow opt-out of AMSI calls - with obvious security implications - which you've asked for in #21491
- Also, given the currently unnecessary overhead on Unix-like platforms - where no AMSI equivalent exists - runtime performance could be improved: #21492 (comment)
Returning to the tangent:
If you mean a program that is found via the PATH then I might agree
When it comes to naming a stand-alone executable, it seems to me that the end-user experience should be the driver, trumping any design-time / implementation considerations:
- On Unix-like platforms, this means: Do not use filename extensions when naming such executables.
- On Windows, this means: Given that .ps1 files aren't directly executable from outside PowerShell, create companion .cmd files that are - both with and without specifying .cmd - using @"%~dpn0.ps1" %*
from powershell.
Per your own comment, reflection can be used to bypass the problem.
I'm half expecting you to make PowerShell recognise MethodInfo.Invoke calls and log each element of object?[]? parameters to AMSI as if the method had been called directly.
from powershell.
@KalleOlaviNiemitalo, fair point: Both of the aforementioned workarounds amount to bypassing the intended AMSI calls - I merely summarized them, speaking as someone who's neither a security expert nor speaking in any official capacity.
from powershell.
Let's go back to the original problem.
Reading the file into memory, converting it to base64 and then converting it back should require about 790 MB of RAM, assuming all variables remain in scope during the process and no garbage collection or object disposal happens. The observed behavior appears to be memory-leak related, as the amount of memory used once the conversion eventually completes is about 3.4 GB of RAM.
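For anyone who wants to redo this kind of estimate for their own file, here is a rough sketch. The function name Get-Base64RoundTripEstimate and the breakdown are assumptions made for illustration; the thread does not say how the 790 MB figure was derived. It counts the raw bytes, a base64 string of 4 characters per 3 input bytes (rounded up) at 2 bytes per UTF-16 char, and the decoded bytes, and ignores any intermediate copies or CLR overhead.

```powershell
# Hypothetical helper: lower-bound estimate for keeping the raw bytes,
# the base64 string and the decoded bytes in memory at the same time.
function Get-Base64RoundTripEstimate {
    param(
        [Parameter(Mandatory)]
        [long]$Length   # size of the input file in bytes
    )
    # base64 emits 4 characters for every 3 input bytes, padded up to a full group
    $chars = [long]([math]::Ceiling([double]$Length / 3)) * 4
    [PSCustomObject]@{
        RawBytes          = $Length
        Base64Chars       = $chars
        Base64StringBytes = $chars * 2     # a .NET string stores 2 bytes per char
        DecodedBytes      = $Length
        TotalBytes        = $Length + ($chars * 2) + $Length
    }
}

# Example: estimate for a 300 MB input
Get-Base64RoundTripEstimate -Length 300MB
```

Anything well beyond TotalBytes in the observed working set then has to come from additional copies of the data or from memory the runtime is holding on to.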
Since the early days of computers we have been able to deal with files larger than the available memory of the computer. This is still the case.
The first thing to realise is
(a) PowerShell is not a UNIX shell and it is really really bad at dealing with streams of bytes. That is not a problem of the PowerShell engine itself, but the existing cmdlets, scripts, patterns and expectations. PowerShell deals with pipelines of typed objects, not text or byte streams.
(b) UNIX does this kind of thing in its sleep, literally. A pipe is a byte stream first and foremost. Deciding to treat it as text is an afterthought.
So if we were doing this in UNIX we would simply do
$ openssl base64 < file.in | openssl base64 -d > file.out
The file went through the memory as it was being processed and then out to the final file.
Now let's do the same thing with PowerShell: $file is the SDK exe, $copy is a second copy we are making.
Split-Content -LiteralPath $file -AsByteStream | ConvertTo-Base64 | ConvertFrom-Base64 | Set-Content -LiteralPath $copy -AsByteStream
When you put that pipeline together it takes only about 50MB working set in order to process dotnet-sdk-8.0.204-win-x64.exe and write a copy of the output.
Validate it and compare with the SHA512 from the original download site
Get-FileHash -LiteralPath $file,$copy -Algorithm SHA512
So how does that work?
Split-Content reads a file and writes arrays of 4096 bytes to the success pipeline.
ConvertTo-Base64 reads the byte arrays and writes out lines of Base64 encoding of just 64 characters each, same as openssl base64.
ConvertFrom-Base64 reads the strings and converts them to byte arrays.
Set-Content writes the byte arrays to the final file.
It only took about 27MB to read, encode and decode the base64, without writing to a file.
$total = 0
Split-Content -LiteralPath $file -AsByteStream | ConvertTo-Base64 | ConvertFrom-Base64 | ForEach-Object { $total += $_.Length }
$total
So from 3.4GB to 27MB with no change to PowerShell itself is not a bad effort.
It was a trade-off of space versus time. It takes about 7 seconds or so to run the read, encode and decode pipeline.
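The same chunk-at-a-time idea can be sketched with nothing but .NET calls, for readers who do not have the cmdlets above installed. This is only an illustration of the technique, not how Split-Content or ConvertTo-Base64 are implemented: the function name Copy-FileViaBase64 and the 3072-byte default are assumptions. The chunk size is kept a multiple of 3 so that every chunk except the last encodes without padding.

```powershell
# Hypothetical helper: round-trips a file through base64 one chunk at a time,
# so only one small buffer and one short string are alive at any moment.
function Copy-FileViaBase64 {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,          # full path of the input file
        [Parameter(Mandatory)][string]$Destination,   # full path of the copy
        [int]$ChunkSize = 3072                        # must be a multiple of 3
    )
    if ($ChunkSize -le 0 -or $ChunkSize % 3 -ne 0) {
        throw 'ChunkSize must be a positive multiple of 3.'
    }
    $source = $target = $null
    try {
        $source = [System.IO.File]::OpenRead($Path)
        $target = [System.IO.File]::Create($Destination)
        $buffer = [byte[]]::new($ChunkSize)
        while ($true) {
            # Stream.Read may return fewer bytes than requested, so fill the buffer in a loop
            $filled = 0
            while ($filled -lt $ChunkSize) {
                $read = $source.Read($buffer, $filled, $ChunkSize - $filled)
                if ($read -eq 0) { break }
                $filled += $read
            }
            if ($filled -eq 0) { break }
            # Encode and immediately decode this chunk only
            $text = [System.Convert]::ToBase64String($buffer, 0, $filled)
            $decoded = [System.Convert]::FromBase64String($text)
            $target.Write($decoded, 0, $decoded.Length)
            if ($filled -lt $ChunkSize) { break }     # short chunk means end of file
        }
    }
    finally {
        if ($target) { $target.Dispose() }
        if ($source) { $source.Dispose() }
    }
}
```

Usage would look like Copy-FileViaBase64 -Path $file -Destination $copy, followed by the Get-FileHash comparison shown above. Each method call only ever receives a few KB as an argument, which should keep it clear of the large-argument case discussed earlier in the thread, though I have not measured that.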
from powershell.
Yes, prior to PS 7.4 raw byte handling in pipelines wasn't supported, but in 7.4+ it now is, between external (native) programs, so the following works as intended from PowerShell (also on Windows, if you install openssl.exe there); note the use of -in to specify the input file:
# OK in PS 7.4+
openssl base64 -in file.in | openssl base64 -d > file.out
I haven't looked into the implementation, but I assume (hope) that on Unix-like platforms the usual system-level data buffering applies, which is 64KB these days.
While < isn't available in PowerShell to byte-stream data to an external program, in v7.4+ you can feed [byte] or - much more efficiently - [byte[]] data output from PowerShell commands to external programs:
The - slow - solution is therefore (byte-by-byte processing on the PowerShell side):
Get-Content file.in -AsByteStream | openssl base64 | openssl base64 -d > file.out
The - much faster - solution, which, however, reads the input file in full, due to -Raw:
Get-Content file.in -Raw -AsByteStream | openssl base64 | openssl base64 -d > file.out
The - more memory-efficient - solution that emulates Unix pipeline buffering is:
Get-Content file.in -ReadCount 64kb -AsByteStream |
% { , [byte[]] $_ } |
openssl base64 | openssl base64 -d > file.out
Note the - unfortunate in terms of both verbosity and performance - need for an intermediate % (ForEach-Object) call that strongly types the [object[]]-typed arrays that -ReadCount produces as [byte[]], as that is the prerequisite for sending raw byte data to an external program.
Arguably, Get-Content's -ReadCount parameter should instead output:
- [string[]] arrays by default
- [byte[]] arrays in combination with -AsByteStream
This would obviate the need for the inefficient and awkward ForEach-Object helper call.
from powershell.
The - more memory-efficient - solution that emulates Unix pipeline buffering is:
Get-Content file.in -ReadCount 64kb -AsByteStream |
% { , [byte[]] $_ }
I did not have much success with Get-Content with ReadCount even in binary mode, I did not think of the array conversion in a ForEach-Object.
Hence I wrote the Split-Content which reads directly into a byte array and put that straight in the output pipeline. No need to convert any arrays.
I am not convinced that large buffers like 64K help in the PowerShell pipeline, because it has to fill the entire 64KB first before it passes onto the pipeline. The buffering in UNIX works the other way round, things can keep writing until the pipe buffer is full then they block until the reader has made some room.
A UNIX pipeline has a record size of 1. The PowerShell pipeline above has a record size of 64K, so nothing can move until the record is full. In UNIX if a network stream is slow then even the few hundred bytes at a time would still dribble through.
It would certainly be better if Get-Content always wrote AsByteStream as a byte array but I think it is too late to change that.
from powershell.