Git Product home page Git Product logo

Comments (58)

jborean93 avatar jborean93 commented on May 20, 2024 2

I cannot replicate the slowness you see as 7.4.2 takes less than 5 seconds for me and is in fact faster than WinPS but I do notice the large memory usage. My guess is that it is not only are now storing the 2 byte arrays (raw file data and the decoded base64 string) but also the base64 string is all allocated on the heap as part of the operation. Potentially WinPS/.NET Framework is more aggressive in reusing the array values but as per the above the CLR could be allocating the memory and just never freeing it so it can more efficient reuse the memory in the future.

Putting aside the above comment you can more efficiently base64 encode bytes by streaming it rather than reading all the input bytes into memory.

Function ConvertTo-Base64String {
    [OutputType([string])]
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path
    )

    $fs = $cryptoStream = $sr = $null
    try {
        $fs = [System.IO.File]::OpenRead($Path)
        $cryptoStream = [System.Security.CryptoGraphy.CryptoStream]::new(
            $fs,
            [System.Security.Cryptography.ToBase64Transform]::new(),
            [System.Security.Cryptography.CryptoStreamMode]::Read)
        $sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
        $sr.ReadToEnd()
    }
    finally {
        ${sr}?.Dispose()
        ${cryptoStream}?.Dispose()
        ${fs}?.Dispose()
    }
}

This will stream the raw bytes from the source file stream and produce the final output string. If you are storing this string into a file then you could optimize it further by streaming the output base64 CryptoStream to a file avoiding having to store all the data in PowerShell.

If you do need to store the base64 string as an object in PowerShell keep in mind this means you not only have to store the inflated size that base64 uses (($length / 3) * 4) but each char of the string takes two bytes so at a minimum you are looking at around 600MB for the dotnet installer. Using the above function I see the memory usage sits around 1.2GB which is less than WinPS (about 1.4GB). While this is still more than the ~600MB there could be other factors in place here like the CLR allocating more memory than strictly needs in anticipation of future needs or some other reason.

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024 2

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.

[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024 1

Hi,

Given all of your file and data operations are performed directly with the .NET API I suggest this is not a PowerShell problem, it looks like it is a .NET issue.

Is your issue that PowerShell is not running the garbage collector? Does running

[System.GC]::Collect()

make any difference?

Also be aware that managed memory runtimes often take memory from the OS in order to allocate objects but never return it. So the objects may have been released/disposed/freed so the actual CLR heap is free but the memory has not been returned to the OS. This is completely normal as managed runtimes (CLR, JVM etc) assume that if they needed it once they are likely to need it again so no point giving it back to the OS.

You would need to look for tools to examine the state of the CLR heap within the process rather than external process monitoring tools.

When you have an operation that you know is memory intensive then two other options are available,

  1. run the operation in a separate process, in PowerShell all you need to do is run it in a Job and PowerShell will do the rest.
  2. stream the operation and read and write chunks of data rather than holding it all in memory.

I hope that helps.

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024 1

Thank you both for your responses and suggestions for optimization. I did reply to the Issue cross-posted in the dotnet/runtime project which you can see here dotnet/runtime#101061 (comment)

I ran [GC]::Collect() after running the test script in both WinPS and PS7 and noticed that in PS7 almost none of the used memory was released back to the operating system, which seemed suspicious as in WinPS almost all memory is released after the test.

UPDATE: Confirmed below that the FromBase64String delay and excessive memory usage only occurs through PowerShell 7, but not through a .NET console app. This definitely appears to be a .NET run-time issue and not a PowerShell issue, however I wanted to post it here as well as over there for visibility in case others encounter this oddity while building PowerShell modules (which is how I stumbled upon it).

I am aware that storing a 200 MB file in memory as base64 text is wildly inefficient, and is not the intention of how my PowerPass module should be used (which is how I discovered this in the first place), but since I stumbled upon this unexpected behavior I thought it prudent to at least report it.

But again, I appreciate all of the comments and feedback here, especially the suggestions for optimization techniques.

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024 1

In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024 1

If I am understanding this nonsense with AMSI correctly, then a solution would be to perform the Base64 translation in a compiled C# cmdlet. Given we are talking PowerShell it should be implemented using a pipeline with System.Security.Cryptography.ToBase64Transform rather than dealing with massive strings. In this case the biggest string would be 64 characters

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024 1

Sure, AMSI is part of Windows API and doesn't exist in Linux.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024 1

My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.

Original timings

 218.4235 ms
 64152.6876 ms

now

 222.2945 ms
 301.4766 ms

Code

#!/usr/bin/env pwsh

$env:__PSDumpAMSILogContent='1'

trap
{
	throw $PSItem
}

$ErrorActionPreference = 'Stop'

$code = @"
using System;
using System.Management.Automation;

	[Cmdlet("ConvertFrom", "Base64String")]
	public class ConvertFromBase64String : PSCmdlet
	{
		[Parameter(Mandatory=true,ValueFromPipeline=true)]
		public String InputString;

		protected override void ProcessRecord()
		{
			WriteObject(System.Convert.FromBase64String(InputString));
		}
	}

	[Cmdlet("ConvertTo", "Base64String")]
	public class ConvertToBase64String : PSCmdlet
	{
		[Parameter(Mandatory=true,ValueFromPipeline=true)]
		public byte[] InputObject;

		protected override void ProcessRecord()
		{
			WriteObject(System.Convert.ToBase64String(InputObject));
		}
	}
"@

Add-Type $code -PassThru | ForEach-Object { Import-Module $_.Assembly }

Get-Command -Noun 'Base64String'

$bytes = new-object byte[] -ArgumentList @(,200554320)

$bytes.Length

$random = new-object Random

$random.NextBytes($bytes)

$now = Get-Date

$base64 = @(,$bytes) | ConvertTo-Base64String

Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

$base64.Length

$bytes = $null

$now = Get-Date

$bytes = $base64 | ConvertFrom-Base64String

Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

$bytes.Length

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024 1

My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.

Thank you, this is very helpful. I was fiddling with using C# and Add-Type, but I noticed that making the call directly in C# via a static function still invokes the AMSI context. I will incorporate this into my PowerPass module to avoid the performance issue and excessive memory usage.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024 1

@chopinrlz, it isn't calls from C# that trigger an AMSI scan, it is .NET method calls from PowerShell code.

@rhubarb-geek-nz's workaround based on using C# code to implement cmdlets avoids this by letting PowerShell itself mediate the method calls.


Two asides:

  • $env:__PSDumpAMSILogContent='1' isn't effective in-session; the env. var. must be set before calling PowerShell.

  • $bytes = new-object byte[] -ArgumentList @(,200554320) can be simplified to
    $bytes = [byte[]]::new(200554320), which also avoids an (invisible) [psobject] wrapper that would cause an (unrelated) problem with a call such as ConvertTo-Base64String -InputObject $bytes - see #21496

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024 1

The script was predictable

No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic ASMI information, but it will do in the former case.

there is always another reason, case, exception or scenario where it breaks.

Again I empathize.
But simply venting your frustration isn't the way forward.

In the case at hand I've (indirectly) pointed to the (ultimate) root cause of the underlying problem - #5579.
I suggest channeling your frustration into constructive feedback - while being cognizant that such feedback may or may no be heard.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024 1

You're right - via the CLI (as implicitly used via a shebang-based executable shell script), the in-process setting is honored, if:

  • the CLI call uses either (possibly implied) -File or -Command for execute-and-exit functionality.
  • and $env:__PSDumpAMSILogContent = 1 is set before any in-session .NET method calls occur from PowerShell code.

A simpler demonstration: Start a pristine POSIX-compatible shell and run the following:

export -n __PSDumpAMSILogContent # ensure that the env. var. isn't defined.

# AMSI log output via env. var. defined BEFORE 
__PSDumpAMSILogContent=1 pwsh -noprofile -c '$null = [byte[]]::new(2048)'

# !! Produces AMSI output too, because the environment variable - despite being set in-session - is
# !! set *before the first method call*.
pwsh -noprofile -c '$env:__PSDumpAMSILogContent = 1; $null = [byte[]]::new(2048)'

Note:

  • From inside PowerShell, an executable shell script with extension .ps1 is still executed in-process - and using a filename extension with an executable shell script is generally ill-advised.

  • The bigger picture here is: To make PowerShell configuration environment variables work predictably, set them before invoking PowerShell, irrespective of the invocation mechanism.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024 1

because it has to fill the entire 64KB first before it passes onto the pipeline

Yes, it's an imperfect emulation of the native Unix pipeline, but with file input (where there's no "dribbling"), it works well.

That said, it's rare for Unix-heritage utilities to accept input via stdin (the pipeline) only and not also via file-path operands; thus, with a file as the data source, passing the file's path as an argument to an external program is the simpler and better solution (such as in the openssl case, using the - syntactically unusual - -in parameter).

It would certainly be better if Get-Content always wrote AsByteStream as a byte array but I think it is too late to change that.

Hopefully not: Let's see what becomes of the feature request you've since created:

from powershell.

237dmitry avatar 237dmitry commented on May 20, 2024

In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB.

Yes, the same memory usage, but time to complete is about 1.2 seconds

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I retested the following updated script on my desktop PC. A Ryzen 5800X with 128 GB of RAM and PCIe Gen4 NVMe storage. The test ran much faster as expected, but the memory usage still remains high.

Even after invoking [GC]::Collect() around 3.2 GB of RAM still remains utilized by the pwsh process.

Reading the 222 MB file into memory takes 0.06 seconds and converting it to base64 takes 0.22 seconds and uses 845 MB of RAM across both operations as expected. The last operation [System.Convert]::FromBase64String uses 2.6 GB of RAM alone and takes 17 seconds.

ps7-retest-timings

I'll cross-post this in the dotnet/runtime issue. Thank you all for the feedback.

Updated test script:

$name = "random.bin"

$start = Get-Date

Write-Host "Creating Path to $name test file: " -NoNewline
$now = Get-Date
$file = Join-Path -Path $PSScriptRoot -ChildPath $name
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Reading all file bytes into memory: " -NoNewline
$now = Get-Date
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting file bytes to base64 string: " -NoNewline
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting base64 string back to file bytes: " -NoNewline
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Test complete"

Write-Host "Total duration: $(((Get-Date) - $start).TotalMilliseconds) ms"

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I just tested this same implementation using a C# console application for the dotnet/runtime team and the issues does NOT occur when running in a console application against .NET 8 on the latest SDK on Windows 11 Professional. My test results and C# code are here: dotnet/runtime#101061 (comment)

It seems that this is actually a memory leak in the PowerShell runtime for some reason. The dotnet/runtime crew was asking if the [System.Convert]::FromBase64String function was being replaced with something else by PowerShell, but since I haven't tweaked my PowerShell installation, I can't imagine what could be doing that. Plus, this issue happens on multiple PCs, my desktop and my laptop, and only within PowerShell 7.4.2.

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

For reference, the random.bin test file I am using is exactly 233,420,544 bytes in length. I generated it with this script:

param(
	[int]
	$Size
)
$blockSize = 256
$rand = [System.Random]::new()
$total = 0
[byte[]]$data = [System.Array]::CreateInstance( [byte[]], $blockSize )
$path = Join-Path -Path $PSScriptRoot -ChildPath "random.bin"
if( Test-Path $path ) {
	Remove-Item -Path $path -Force
}
$file = [System.IO.File]::OpenWrite( $path )
while( $total -lt $Size ) {
	$rand.NextBytes( $data )
	$file.Write( $data, 0, $data.Length )
	$total += $blockSize
}
$file.Flush()
$file.Close()

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

A couple more data points regarding the unexpected slowdown:

  • I see the slowdown only on Windows (neither on macOS nor on Linux), both in in 7.4.2 (.NET 8.0.4) and v7.5.0-preview.2 (.NET 9.0.0-preview.1.24080.9).

  • On Windows, I see the slowdown in 7.3.10 (.NET 7.0.14) too, albeit in less severe form (about twice as fast as 7.4.2 / v7.5.0-preview.2, but still way too slow); WinPS is fine.

from powershell.

iSazonov avatar iSazonov commented on May 20, 2024

I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.

[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".

I am also seeing .NET 8.0.4 for the FrameworkDescription on my PowerShell 7 install which I setup from the MSI downloaded from Github. Assembly file versions are 8.0.424.16909.

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.

The slow operation is the call to [System.Convert]::FromBase64String which occurs after the file is loaded from disk into memory. On my desktop, this operation takes 17 seconds. The file load operation from disk takes 0.06 seconds.

Also, this only happens when doing this in PowerShell 7.4.2. Running this same test in a C# console application takes under 1 second.

console-app-test

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.

I added a Program Setting for pwsh.exe into Exploit Protection under App & Browser Control and disabled all protections. Rerunning the test resulted in the same duration. 17 seconds and 3.4 GB of RAM usage. Is there another place I can check in Windows Security to prevent Defender from watching pwsh.exe?

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

The excess memory consumption of the pwsh process suggests that there is extra code running within the process; rather than antivirus software examining it from the outside (from another process or from a kernel-mode driver).

During the slow FromBase64String operation, is the pwsh process consuming a lot of processor time (one thread's worth)?

Can you attach to an unmanaged-code debugger to the process during FromBase64String and get a stack trace of the thread with the most CPU time? (!runaway, k, Thread Syntax)

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

Here is a stack trace of the thread with the most CPU time. I used WinDbg and broke process execution in the middle of the FromBase64String operation. You can see some fun stuff at the top.

0:009> ~21 k
 # Child-SP          RetAddr               Call Site
00 0000007b`a638e6d8 00007ffc`5b790a10     MPCLIENT!MpUpdateServicePingRpc+0x7b676
01 0000007b`a638e6e0 00007ffc`5b7cfde4     MPCLIENT!MpTelemetryUpdateUserConsent+0x190
02 0000007b`a638e720 00007ffc`5b7c5db0     MPCLIENT!MpConveyUserChoiceForSampleList+0x704
03 0000007b`a638e800 00007ffc`68253f92     MPCLIENT!MpAmsiNotify+0x140
04 0000007b`a638e8d0 00007ffc`682eb5fd     MpOav!DllRegisterServer+0x1142
05 0000007b`a638e930 00007ffc`682e81cb     amsi!CAmsiAntimalware::Notify+0xcd
06 0000007b`a638e9c0 00007ffb`95ec173b     amsi!AmsiNotifyOperation+0xab
07 0000007b`a638ea10 00007ffb`f1b36cfb     0x00007ffb`95ec173b
08 0000007b`a638eae0 00007ffb`f1acfc4b     System_Management_Automation!System.Management.Automation.AmsiUtils.WinReportContent+0xeb
09 0000007b`a638eb60 00007ffb`95964a7c     System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
0a 0000007b`a638ec90 00007ffb`fa945cf6     0x00007ffb`95964a7c
0b 0000007b`a638ecf0 00007ffb`f1f61b9f     System_Linq_Expressions!System.Dynamic.UpdateDelegates.UpdateAndExecute2<System.Type,object,object>+0x1f6 [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/UpdateDelegates.Generated.cs @ 268] 
0c 0000007b`a638ed80 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.DynamicInstruction<System.Type,object,object>.Run+0xff
0d 0000007b`a638ee10 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0e 0000007b`a638ee90 00007ffb`f1bb20d3     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0f 0000007b`a638ef10 00007ffb`f1fa3e06     System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
10 0000007b`a638ef60 00007ffb`f1ad841d     System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
11 0000007b`a638efe0 00007ffb`f1ad7e0d     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
12 0000007b`a638f070 00007ffb`f19fff15     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
13 0000007b`a638f0e0 00007ffb`f1cbd0ed     System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
14 0000007b`a638f130 00007ffb`f1cbcdc9     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
15 0000007b`a638f1b0 00007ffb`f1ac7eab     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
16 0000007b`a638f230 00007ffb`9525b18e     System_Management_Automation!System.Management.Automation.PipelineOps.InvokePipeline+0x33b
17 0000007b`a638f2d0 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.ActionCallInstruction<object,bool,System.Management.Automation.CommandParameterInternal[][],System.Management.Automation.Language.CommandBaseAst[],System.Management.Automation.CommandRedirection[][],System.Management.Automation.Language.FunctionContext>.Run+0x21e
18 0000007b`a638f380 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
19 0000007b`a638f400 00007ffb`f1bb20d3     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
1a 0000007b`a638f480 00007ffb`f1fa3e06     System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
1b 0000007b`a638f4d0 00007ffb`f1ad841d     System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
1c 0000007b`a638f550 00007ffb`f1ad7e0d     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
1d 0000007b`a638f5e0 00007ffb`f19fff15     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
1e 0000007b`a638f650 00007ffb`f1cbd0ed     System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
1f 0000007b`a638f6a0 00007ffb`f1cbcdc9     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
20 0000007b`a638f720 00007ffb`f1bd1117     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
21 0000007b`a638f7a0 00007ffb`f1bd1923     System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper+0x507
22 0000007b`a638f850 00007ffb`f1bd2a7f     System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc+0x113
23 0000007b`a638f8b0 00007ffb`f2c763cd     System_Management_Automation!System.Management.Automation.Runspaces.PipelineThread.WorkerProc+0x2f
24 0000007b`a638f8e0 00007ffb`f4d6b8d3     System_Private_CoreLib!System.Threading.ExecutionContext.RunInternal+0x7d [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 179] 
25 0000007b`a638f950 00007ffb`f4c3ebac     coreclr!CallDescrWorkerInternal+0x83 [D:\a\_work\1\s\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 100] 
26 0000007b`a638f990 00007ffb`f4d57b93     coreclr!DispatchCallSimple+0x60 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221] 
27 0000007b`a638fa20 00007ffb`f4cc4abd     coreclr!ThreadNative::KickOffThread_Worker+0x63 [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 158] 
28 (Inline Function) --------`--------     coreclr!ManagedThreadBase_DispatchInner+0xd [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7222] 
29 0000007b`a638fa80 00007ffb`f4cc49d3     coreclr!ManagedThreadBase_DispatchMiddle+0x85 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7266] 
2a 0000007b`a638fb60 00007ffb`f4cc4b6e     coreclr!ManagedThreadBase_DispatchOuter+0xab [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7425] 
2b (Inline Function) --------`--------     coreclr!ManagedThreadBase_FullTransition+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7470] 
2c (Inline Function) --------`--------     coreclr!ManagedThreadBase::KickOff+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7505] 
2d 0000007b`a638fc00 00007ffc`738e257d     coreclr!ThreadNative::KickOffThread+0x7e [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 230] 
2e 0000007b`a638fc60 00007ffc`7508aa48     KERNEL32!BaseThreadInitThunk+0x1d
2f 0000007b`a638fc90 00000000`00000000     ntdll!RtlUserThreadStart+0x28

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

09 0000007ba638eb60 00007ffb95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b

Oh, LogMemberInvocation calls ArgumentToString here:

string value = ArgumentToString(args[i]);

So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design. And then perhaps the Defender implementation of AMSI makes a few more copies of the string.

This PowerShell code would apparently log the whole AMSI scan request to the console if you set __PSDumpAMSILogContent=1 in the environment before you start PowerShell.

Why does the AMSI scan take that long, though… does it do useful work all that time, or does it get stuck somehow and give up after a timeout? Perhaps you could try with files of different sizes, graph how the file size affects the FromBase64String duration. If the duration stays the same, then that suggests there is a timeout.

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I ran a test using variable size byte arrays with random payloads starting at 2 MiB in size and going up to 116 MiB in size. You can see that the duration required is linear, and also extremely slow. It takes 2 seconds to convert 32 MiB back to a byte array from a base64 string.

data-size-chart

The same test conducted at 16 MiB intervals up to 256 MiB also shows a linear trend.

data-size-chart

One final test at 32 MiB intervals up to 384 MiB shows a linear trend as well suggesting that there may be no upper boundary or timeout no matter how much data you ask PowerShell to convert from base64.

data-size-chart

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design.

Or perhaps you might be completely horrified by the idea of deep packet inspection of all arguments and no knowledge of whether that will be sent to 3rd parties. ( or any other party at all, to be honest )

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

I hoped the graph might show a lower boundary, because it could indicate a configuration error that could then be fixed to speed up the operation; for example, if the AMSI code running in-process were unable to contact the Defender service and spent some constant amount of time attempting that. Alas, the linear graph doesn't look like that's the case.

There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster. But if such a workaround becomes commonly used, I suspect a future version of PowerShell will be changed to scan the data anyway.

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041. I'm not sure it even uses the suspicious content detector; perhaps the difference between ToBase64String and FromBase64String is that ArgumentToString does not format the elements of a byte[] argument for AMSI, but passes a string argument through.

A slowdown was previously reported in #19431.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

I am not seeing similar times

$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Took 442.478 ms on a little Intel(R) Core(TM) i3-10100Y CPU @ 1.30GHz 1.61 GHz running Windows 11 Pro

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041.

Where can I find documentation on what this actually does? I am personally horrified by the idea that anyone thinks they have the rights to log data that was private to a process without their knowledge.

When I say POWERSHELL_TELEMETRY_OPTOUT=1 I mean it!

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

I am not seeing similar times

@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.

Where can I find documentation on what this actually does?

The best may be the documentation of the PSAMSIMethodInvocationLogging experimental feature in this old version: https://github.com/MicrosoftDocs/PowerShell-Docs/blob/793ed5c687e6c7b64565d1751c532eb1d7d84209/reference/docs-conceptual/learn/experimental-features.md#psamsimethodinvocationlogging

The "How AMSI helps" link in that documentation doesn't work on GitHub; use https://learn.microsoft.com/windows/win32/amsi/how-amsi-helps instead.

AMSI doesn't necessarily involve telemetry that would send the data off the machine. I don't know whether Windows Defender has telemetry for AMSI scans.

GitHub
The official PowerShell documentation sources. Contribute to MicrosoftDocs/PowerShell-Docs development by creating an account on GitHub.
As an application developer, you can actively participate in malware defense. Specifically, you can help protect your customers from dynamic script-based malware, and from non-traditional avenues of cyber attack.

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster.

Because ArgumentToString does not recognise the char[] type and returns only the type name, I think a [System.Convert]::FromBase64CharArray call should be much faster for AMSI to scan than [System.Convert]::FromBase64String. But who knows how long that will remain so.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.

Thanks for that., well spotted. I have updated as

$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes = $null
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Running on Windows 11 I now get

 480.4803 ms
 65573.3259 ms

but running on WSL Debian 12 on same machine

 556.7739 ms
 2534.7992 ms

So this is an issue only on Windows

from powershell.

iSazonov avatar iSazonov commented on May 20, 2024

@daxian-dbw @SteveL-MSFT Who could review the method for performance? There is an issue with large arguments. It seems it makes no sense to send full large file content to amsi.

internal static void LogMemberInvocation(string targetName, string name, object[] args)
{
try
{
var contentName = "PowerShellMemberInvocation";
var argsBuilder = new Text.StringBuilder();
for (int i = 0; i < args.Length; i++)
{
string value = ArgumentToString(args[i]);
if (i > 0)
{
argsBuilder.Append(", ");
}
argsBuilder.Append($"<{value}>");
}

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

@daxian-dbw @SteveL-MSFT Who could review the method for performance?

  1. It is using StringBuilder, so that is good

  2. ArgumentToString() does not do any escaping of the string values, so if the string contained >, < or similar then you have the equivalent of SQL injection,

3 Using ($"<{value}>"); and StringBuilder, choose one mechanism of the other. eg

argsBuilder.Append('<').Append(value).Append('>');
  1. StringBuilder has appenders for primitive types, so no need to convert those to a string and append when StringBuilder can append ints, longs and bools more efficiently.

  2. No exception is made for any cryptographic material or SecureString

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

the equivalent of SQL injection

Public AMSI documentation does not specify any particular format for the string. I expect that escaping would do more harm than good, as it could prevent the AMSI provider from recognizing substrings that are known to be used by malware.

The argsBuilder.Append($"<{value}>") call goes to StringBuilder.Append(StringBuilder.AppendInterpolatedStringHandler handler), which should be optimal already. Actual IL in LogMemberInvocation:

    IL_0032:  ldc.i4.2
    IL_0033:  ldc.i4.1
    IL_0034:  ldloc.s    V_6
    IL_0036:  newobj     instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::.ctor(int32,
                                                                                                                        int32,
                                                                                                                        class [System.Runtime]System.Text.StringBuilder)
    IL_003b:  stloc.s    V_7
    IL_003d:  ldloca.s   V_7
    IL_003f:  ldstr      "<"
    IL_0044:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
    IL_0049:  ldloca.s   V_7
    IL_004b:  ldloc.s    V_5
    IL_004d:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendFormatted(string)
    IL_0052:  ldloca.s   V_7
    IL_0054:  ldstr      ">"
    IL_0059:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
    IL_005e:  ldloca.s   V_7
    IL_0060:  callvirt   instance class [System.Runtime]System.Text.StringBuilder [System.Runtime]System.Text.StringBuilder::Append(valuetype [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler&)
    IL_0065:  pop

It might be possible to save a little by replacing the subsequent string content = $"<{targetName}>.{name}({argsBuilder})"; formatting with more appends to the same StringBuilder; but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.

from powershell.

chopinrlz avatar chopinrlz commented on May 20, 2024

I noticed this processing delay with [System.Convert]::FromBase64String is also happening with several other CLR functions which fill and copy arrays, most notably [System.Random]::NextBytes and [System.Array]::Copy the first of which is an instance method on the System.Random class. Both methods appear to have a linear duration increase as the size of the byte[] increases.

I modified my test script to use [System.Convert]::FromBase64CharArray which runs substantially faster than FromBase64String and in the process I noticed the NextBytes and Array::Copy methods were taking an increasingly long time to run. Here is my updated test script which outputs the timing results to a CSV file which can be pulled into Power BI for reporting.

# Create a random number generator and results collection
$rand = [System.Random]::new()
$testResults = @()
$iterations = 4 * 1024
$step = 1 * 1024

# Fill a byte[] with random data for testing
[byte[]]$randomData = [System.Array]::CreateInstance( [byte[]], $iterations * $step )
$rand.NextBytes( $randomData )

# Run a test in $step KiB increments
1..$iterations | ForEach-Object {

    # Write a progress window
    Write-Progress -Activity "Testing Base64 Conversion" -Status "Iteration $_ of $iterations" -PercentComplete (($_ / $iterations) * 99.9)

    # Create a test result
    $dataLength = ($_ * $step)
    $result = [PSCustomObject]@{
        Length = $dataLength
        FillMs = 0
        ToBase64Ms = 0
        FromBase64Ms = 0
    }

    # Create an array of data
    $start = Get-Date
    [byte[]]$data = [System.Array]::CreateInstance( [byte[]], $dataLength )
    [System.Array]::Copy( $randomData, 0, $data, 0, $dataLength )
    $result.FillMs = ((Get-Date) - $start).TotalMilliseconds

    # Run conversion tests and track timing
    $start = Get-Date
    $base64 = [System.Convert]::ToBase64String( $data )
    $result.ToBase64Ms = ((Get-Date) - $start).TotalMilliseconds

    $start = Get-Date
    $chars = $base64.ToCharArray()
    $bytes = [System.Convert]::FromBase64CharArray( $chars, 0, $chars.Length )
    $result.FromBase64Ms = ((Get-Date) - $start).TotalMilliseconds

    # Record results for output and graphing
    $testResults += $result
}

# Output the results to CSV
$testResults | Export-Csv "base64-results.csv" -Force

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.

I don't think it is so much the matter of time but heap usage, the StringBuilder will have another copy of the base64 data, then the construction of content will do similar.

I suggest you can do it without formatting or StringBuilder and only allocate the memory for the final string once.

            var argsBuilder = new string[5+args.Length*2];
            int i = 0;
            argsBuilder[i++] = "<";
            argsBuilder[i++] = targetName;
            argsBuilder[i++] = ">.";
            argsBuilder[i++] = name;

            if (args.Length > 0)
            {
                foreach (var arg in args)
                {
                    argsBuilder[i++] = i==5 ? "(<" : ">, <";
                    argsBuilder[i++] = ArgumentToString(arg);
                }

                argsBuilder[i++] = ">)";
            }
            else
            {
                argsBuilder[i++] = "()";
            }

            string content = String.Join(null, argsBuilder);

Apart from the list of strings themselves in argsBuilder the final string is allocated only in String.Join() which determines the total length first.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Sure, AMSI is part of Windows API and doesn't exist in Linux.

Not quite. The logging is still done on Linux, whether AMSI exists or not. This was run on 7.4.2 on Linux

#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)

Gives

=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024
  • $env:__PSDumpAMSILogContent='1' isn't effective in-session; the env. var. must be set before calling PowerShell.

Not actually true, it merely needs to be set before the first call to get DumpLogAMSIContent, the environment variable is accessed by a lazy load. Eg #21492

        private static readonly Lazy<bool> DumpLogAMSIContent = new Lazy<bool>(
            () => {
                object result = Environment.GetEnvironmentVariable("__PSDumpAMSILogContent");
                if (result != null && LanguagePrimitives.TryConvertTo(result, out int value))
                {
                    return value == 1;
                }
                return false;
            }
        );
  • $bytes = new-object byte[] -ArgumentList @(,200554320) can be simplified to
    $bytes = [byte[]]::new(200554320), which also avoids an (invisible) [psobject] wrapper that would cause an (unrelated) problem ....

Of course, how could we not have invisible, non-obvious problems in the simplest of code.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

@rhubarb-geek-nz:

Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:

$null, 1 | % {
  Write-Host ---
  $env:__PSDumpAMSILogContent = $_
  pwsh -noprofile { [byte[]]::new(0) } 
}
$env:__PSDumpAMSILogContent = $null

Of course, how could we not have invisible, non-obvious problems in the simplest of code.

I assume this is pure sarcasm (which I do not endorse, but I empathize with the frustration I presume to underlie it); if there's an actual argument in there (beyond what #21496 expresses), please tell us.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:

The script was predictable because it had the "#!/usr/bin/env pwsh" at the start, the executable it bit set, and was designed to run directly from bash. It sets the environment variable after powershell has started but before the first reflection invocation.

if there's an actual argument in there, please tell us.

The frustration is because everytime you think you have found the solution with PowerShell, there is always another reason, case, exception or scenario where it breaks. As a user you don't have the tools to see all these problems because the very objects themselves play stupid games trying to pretend to be something they are not, or changing from what you thought it should have been. I can only assume I am not the target audience for this tool despite it supposedly being for system administrators, developers and IT professionals.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

The script was predictable

No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic ASMI information, but it will do in the former case.

Yes, you are absolutely right. I wasn't predictable because you might have been using PowerShell as your default shell to launch scripts. Whereas all other UNIX shell scripts start a new script with executable bit set in a new process, we are talking about PowerShell here. Sigh.

Perhaps a recommendation for running test scripts is "In a new process....", not "In whatever process with whatever indeterminate state you happen to have....."

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

I wasn't predictable because you might have been using PowerShell as your default shell

I was, but that is irrelevant: the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells.
See below.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

P.S.: @rhubarb-geek-nz:

  • I haven't looked into why the apparent attempt at honoring an in-process definition of the variable (private static readonly Lazy<bool> DumpLogAMSIContent) doesn't work reliably.

    • Update: The reason is that the very first .NET method call from PowerShell code in a session locks in the value of DumpLogAMSIContent.Value based on whether env. var. __PSDumpAMSILogContent is defined (and set to 1) then. In an interactive session, it is invariably the PSReadLine module that is the first to make such a call ([Microsoft.PowerShell.PSConsoleReadLine]::ReadLine()), so that subsequent in-process attempts to set __PSDumpAMSILogContent are ineffective.
  • However, generally speaking, in cases where PowerShell honors environment variables, they are expected to be set before PowerShell is launched.

    • Update: There is at least one exception: $env:PSModulePath is honored dynamically, on every access; the lazy once-per session initialization of DumpLogAMSIContent is an unfortunate hybrid between static and dynamic behavior.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells.

I beg to offer a different opinion...

$ ls -ld new.ps1
-rwxr-xr-x 1 me users 139 Apr 18 00:57 new.ps1
$ cat new.ps1
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes1024 = new-object byte[] -ArgumentList @(,1024)
$bytes2048 = [byte[]]::new(2048)

Scenario A - The environment variable is not set in the calling process

$ echo $__PSDumpAMSILogContent

$ ./new.ps1

=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===

Scenario B - it is set to 0 in the calling process

$ __PSDumpAMSILogContent=0
$ echo $__PSDumpAMSILogContent
0
$ ./new.ps1

=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Rather than making a cmdlet for every .NET method you wish to call, you can simply put reflection in a single cmdlet.

$bytes = [byte[]]@(1,2,3)
$base64 = [string](Invoke-Reflection -Method ToBase64String -Type ([System.Convert]) -ArgumentList @(,$bytes))
Invoke-Reflection -Method FromBase64String -Type ([System.Convert]) -ArgumentList @(,$base64) | Format-Hex

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

and using a filename extension with an executable shell script is generally ill-advised.

Really? That is one I have not heard of.... eg

$ ls -ld *.sh
-rwxr-xr-x 1 github users  772 May 18  2023 debug.sh
-rw-r--r-- 1 github users  107 May 18  2023 download.sh
-rwxr-xr-x 1 github users 2094 May 18  2023 generate-icns.sh
-rwxr-xr-x 1 github users 7418 May 18  2023 install-powershell.sh
-rwxr-xr-x 1 github users 7307 May 18  2023 installpsh-amazonlinux.sh
-rwxr-xr-x 1 github users 9229 May 18  2023 installpsh-debian.sh
-rwxr-xr-x 1 github users 7791 May 18  2023 installpsh-gentoo.sh
-rw-r--r-- 1 github users 7533 May 18  2023 installpsh-mariner.sh
-rwxr-xr-x 1 github users 6483 May 18  2023 installpsh-osx.sh
-rwxr-xr-x 1 github users 6425 May 18  2023 installpsh-redhat.sh
-rwxr-xr-x 1 github users 9081 May 18  2023 installpsh-suse.sh

If you mean executable PowerShell scripts without the ps1 extension, we know how that ends up.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Another alternative is to do the reflection directly in PowerShell itself

ToBase64String is

$method = ([System.Convert]).GetMethod('ToBase64String',[type[]]@(,([byte[]])))

$base64 = [string]($method.Invoke($null,@(,$bytes)))

FromBase64String is

$method = ([System.Convert]).GetMethod('FromBase64String',[type[]]@(,([string])))

$bytes = $method.Invoke($null,@(,$base64))

Then the AMSI logging just looks like

=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<ToBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<FromBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===

Where the arguments are not dumped because all it prints is System.Object[]

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

Really? That is one I have not heard of...

Unfortunately, many ill-advised practices are common.
An executable shell script (using a shebang line) is an executable like any other, and there is no benefit to signaling to a caller that a given executable happens to be a shell script, which is (a) an implementation detail and (b) may lead users to believe that sh <script>.sh should be used for invocation, which can fail if the script uses Bashisms, for instance.

With PowerShell, specifically, things get tricky (leaving the bug you mention aside), because, unlike analogous shell scripts for POSIX-compatible shells, an executable, shebang line-based .ps1 file is still executed in-process, with the potential to alter the session state. An executable, shebang line-based .ps1 file must therefore be designed with this in mind.

One without this extension consistently runs in a child process - albeit more slowly and at the expense of not having rich type support in the in- and output and the inability to pass array arguments and arguments that have no string-literal representations - but a PowerShell script that is designed to (also) run as a standalone executable should not rely on these features anyway.

I presume it is the latter limitations that explain why - at least in my perception - shebang line-based PowerShell scripts haven't really caught on and why bugs such as #21402 are still not fixed.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Really? That is one I have not heard of...

Unfortunately, many ill-advised practices are common.

It depends on the context. If you mean a program that is found on via the PATH then I might agree, but in general when you are managing large numbers of scripts to perform tasks then keeping the .sh extension is very useful. UNIX exec() does not care about file extensions for executables, the concept of file extensions does not exist within the POSIX C API. You are free to name executable files how you like. One major advantage of maintaining the .sh extension is when you manage them in a source code repository and you are storing text, not a compiled binary. Keeping the extension makes that absolutely obvious.

It is Window'isms that step through extensions (com, bat, exe, cmd) while looking for commands on the path or local directory, and similarly PowerShell does the same and will try and append .ps1 to try and look for a command.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

@rhubarb-geek-nz , we're getting far afield, but let me attempt a summary of the issue at hand first, which implies that there's likely nothing actionable here:

  • I presume that there's no actual memory leak here, only a memory "grab" by the CLR that isn't released, at least not instantly (perhaps on demand?).

  • The behavior is currently by design, and the only pathological case is an attempt to pass a very large string as an argument to a .NET method. Workarounds have been offered:

    • Per @KalleOlaviNiemitalo's comment, using [System.Convert]::FromBase64CharArray() bypasses the problem.

    • Per your own comment, reflection can be used to bypass the problem.

  • A fundamental solution would be to allow opt-out of AMSI calls - with obvious security implications - which you've asked for in #21491

  • Also, given the currently unnecessary overhead on Unix-like platforms - where no AMSI equivalent exists - runtime performance could be improved: #21492 (comment)


Returning to the tangent:

If you mean a program that is found on via the PATH then I might agree

When it comes to naming a stand-alone executable, it seems to me that the end-user experience should be the driver, trumping any design-time / implementation considerations:

  • On Unix-like platforms, this means: Do not use filename extensions when naming such executables.

  • On Windows, this means: Given that .ps1 files aren't directly executable from outside PowerShell, create companion .cmd files that are - both with and without specifying .cmd - using @"%~dpn0.ps1" %*

from powershell.

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on May 20, 2024

Per your own comment, reflection can be used to bypass the problem.

I'm half expecting you to make PowerShell recognise MethodInfo.Invoke calls and log each element of object?[]? parameters to AMSI as if the method had been called directly.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

@KalleOlaviNiemitalo, fair point: Both of the aforementioned workarounds amount to bypassing the intended AMSI calls - I merely summarized them, speaking as someone who's neither a security expert nor speaking in any official capacity.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

Let's go back to the original problem.

Reading into memory and converting to base64 then converting back should require about 790 MB of RAM with all variables remaining in scope during the process and no garbage collection happening or object disposal happening. The observed behavior appears to be memory-leak related as the amount of memory used once the conversion eventually completes is about 3.4 GB of RAM.

Since the early days of computers we have been able to deal with files larger than the available memory of the computer. This is still the case.

The first thing to realise is

(a) PowerShell is not a UNIX shell and it is really really bad at dealing with streams of bytes. That is not a problem of the PowerShell engine itself, but the existing cmdlets, scripts, patterns and expectations. PowerShell deals with pipelines of typed objects, not text or byte streams.

(b) UNIX does this kind of thing in its sleep, literally. A pipe is a byte stream first and foremost. Deciding to treat it as text is an afterthought.

So if we were doing this in UNIX we would simply do

$ openssl base64 < file.in | openssl base64 -d > file.out

The file went through the memory as it was being processed and then out to the final file.

Now let's do the same thing with PowerShell, $file is the sdk exe, $copy is a 2nd copy we are making

Split-Content -LiteralPath $file -AsByteStream | ConvertTo-Base64 | ConvertFrom-Base64 | Set-Content -LiteralPath $copy -AsByteStream

When you put that pipeline together it takes only about 50MB working set in order to process dotnet-sdk-8.0.204-win-x64.exe and write a copy of the output.

Validate it and compare with the SHA512 from the original download site

Get-FileHash -LiteralPath $file,$copy -Algorithm SHA512

So how does that work?

Split-Content reads a file and writes arrays of 4096 bytes to the success pipeline

ConvertTo-Base64 reads the byte arrays and writes out lines of Base64 encoding of just 64 characters each, same as openssl base64.

ConvertFrom-Base64 reads the strings and converts them to byte arrays.

Set-Content writes the bytes arrays to the final file.

It only took about 27MB to read, encode the decode the base64, without writing to a file.

$total = 0
	
Split-Content -LiteralPath $file -AsByteStream | ConvertTo-Base64 | ConvertFrom-Base64 | ForEach-Object { $total += $_.Length }

$total

So from 3.4GB to 27MB with no change to PowerShell itself is not a bad effort.

It was a trade-off of space versus time. It takes about 7 seconds or so to run the read, encode and decode pipeline.

from powershell.

mklement0 avatar mklement0 commented on May 20, 2024

Yes, prior to PS 7.4 raw byte handling in pipelines wasn't supported, but in 7.4+ it now is, between external (native) programs, so the following works as intended from PowerShell (also on Windows, if you install openssl.exe there); note the use of -in to specify the input file:

# OK in PS 7.4+
openssl base64 -in file.in | openssl base64 -d > file.out

I haven't looked into the implementation, but I assume (hope) that on Unix-like platforms the usual system-level data buffering applies, which is 64KB these days.

While < isn't available in PowerShell to byte-stream data to an external program, in v7.4+ you can feed [byte] or - much more efficiently - [byte[]] data output from PowerShell commands to external programs:

The - slow - solution is therefore (byte-by-byte processing on the PowerShell side):

Get-Content file.in -AsByteStream | openssl base64 | openssl base64 -d > file.out

The - much faster - solution, which, however, reads the input file in full, due to -Raw:

Get-Content file.in -Raw -AsByteStream |  openssl base64 | openssl base64 -d > file.out

The - more memory-efficient - solution that emulates Unix pipeline buffering is:

Get-Content file.in -ReadCount 64kb -AsByteStream | 
  % { , [byte[]] $_ } |
  openssl base64 | openssl base64 -d > file.out

Note the - unfortunate in terms of both verbosity and performance - need for an intermediate % (ForEach-Object) call that strongly types the [object[]]-typed arrays that -ReadCount produces as [byte[]], as that is the prerequisite for sending raw byte data to an external program.

Arguably, Get-Content's -ReadCount parameter should instead output:

  • [string[]] arrays by default

  • [byte[]] arrays in combination with -AsByteStream

This would obviate the need for the inefficient and awkward ForEach-Object helper call.

from powershell.

rhubarb-geek-nz avatar rhubarb-geek-nz commented on May 20, 2024

The - more memory-efficient - solution that emulates Unix pipeline buffering is:

Get-Content file.in -ReadCount 64kb -AsByteStream | 
  % { , [byte[]] $_ } 

I did not have much success with Get-Content with ReadCount even in binary mode, I did not think of the array conversion in a ForEach-Object.

Hence I wrote the Split-Content which reads directly into a byte array and put that straight in the output pipeline. No need to convert any arrays.

I am not convinced that large buffers like 64K help in the PowerShell pipeline, because it has to fill the entire 64KB first before it passes onto the pipeline. The buffering in UNIX works the other way round, things can keep writing until the pipe buffer is full then they block until the reader has made some room.

A UNIX pipeline has a record size of 1. The PowerShell pipeline above has a record size of 64K, so nothing can move until the record is full. In UNIX if a network stream is slow then even the few hundred bytes at a time would still dribble through.

It would certainly be better if Get-Content always wrote AsByteStream as a byte array but I think it is too late to change that.

from powershell.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.