
Comments (9)

rhubarb-geek-nz commented on June 11, 2024

It is interesting that you are using an HTTP response as your example. In the HTTP protocol you can have multiple entries for the same header, which is incompatible with putting them in a hashtable, because you can't have duplicate keys.

(Invoke-WebRequest -Uri https://github.com/favicon.ico -Method HEAD).Headers

Key                 Value
---                 -----
Server              {GitHub.com}
Date                {Mon, 13 May 2024 01:49:48 GMT}
ETag                {W/"664165de-1976"}
Cache-Control       {max-age=315360000}
Vary                {Accept-Encoding, Accept, X-Requested-With}
X-Frame-Options     {DENY}
Set-Cookie          {_gh_sess=A3nCvWqXhb%2BiY8%2FYndCFdikMaBkl%2B6w1qj33QJ5f6nbd1q2dwHKyzjr7KXGsLjQZvC13xyHbpptRSW38GAb%2BcUfHCxsEBHUUSqSjxcBhgEDuYCO8XsQqDmJdtg%2BNaypPv…
Accept-Ranges       {bytes}
X-GitHub-Request-Id {E8AC:A2755:4F712F9:5686B67:66417331}
Content-Type        {image/x-icon}
Last-Modified       {Mon, 13 May 2024 00:59:10 GMT}
Expires             {Thu, 11 May 2034 01:49:48 GMT}
Content-Length      {6518}

The HTTP response effectively treats headers as a dictionary of lists of strings.
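
For reference, a minimal sketch of how those multi-valued headers surface (this assumes PowerShell 7+, where .Headers is a dictionary whose values are collections of strings):

# Each header maps to a collection of string values, so repeated or
# multi-valued headers (e.g. Vary, Set-Cookie) keep all of their entries.
$response = Invoke-WebRequest -Uri https://github.com/favicon.ico -Method HEAD
$response.Headers['Vary']            # one or more string values
@($response.Headers['Vary']).Count   # number of values for this header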

CrendKing commented on June 11, 2024

It is just an example. The point is that there is data that mostly conforms to the <key> <delimiter> <value> format, with slight deviations, and I want ConvertFrom-StringData to be able to return the "mostly conformant" parts instead of throwing outright.

mklement0 commented on June 11, 2024

@CrendKing:

Pragmatically speaking, you could simply filter out lines that aren't of interest along the lines of:

curl --silent --head https://www.example.com | 
  Select-String -Raw : |
  ConvertFrom-StringData -Delimiter :

However, note that the above creates a hashtable for each and every line (of interest) output by curl, and that, in general, stdout output from external (native) programs is streamed line by line to PowerShell's pipeline.

As a result, the above outputs multiple, single-entry hashtables rather than a single hashtable.

To create a single hashtable, you'd have to provide all input lines as a single string, which in the simplest case means piping to Out-String:

curl --silent --head https://www.example.com | 
  Select-String -Raw : |
  Out-String |
  ConvertFrom-StringData -Delimiter :

However, duplicate keys - which, as @rhubarb-geek-nz points out, are a possibility in HTTP headers - would then justifiably present a problem.
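
To illustrate, a minimal sketch (the header lines are made up, and the exact error text may differ):

# Duplicate keys within a single input string make ConvertFrom-StringData fail,
# because the result is a hashtable and hashtable keys must be unique.
@"
Set-Cookie: first
Set-Cookie: second
"@ | ConvertFrom-StringData -Delimiter :
# -> errors along the lines of: Data item 'Set-Cookie' ... is already defined.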

However, you did uncover what to me appears to be a genuine bug (since reported in #23793):

With multiple, individual strings as input, ConvertFrom-StringData mistakenly aborts processing overall, via a statement-terminating error, once an incorrectly formatted string is encountered; the expected behavior would be to merely emit a non-terminating error in that event, and continue processing further inputs:

'foo=bar', 'broken', 'bar=baz' | ConvertFrom-StringData

Currently, the output is:

Name                           Value
----                           -----
foo                            bar
ConvertFrom-StringData: Data line 'broken' is not in 'name=value' format.

That is, overall processing was unexpectedly aborted when the broken input string was encountered.

The output should be:

Name                           Value
----                           -----
foo                            bar
ConvertFrom-StringData: Data line 'broken' is not in 'name=value' format.
bar                            baz

That is, processing should have continued after the incorrectly formatted input string (resulting in two hashtables overall).

Also, 2>$null or -ErrorAction SilentlyContinue / -ErrorAction Ignore should be effective in silencing / ignoring any input-string-specific error messages; currently that does not work, because the error is unexpectedly statement-terminating rather than non-terminating.
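
To make the current behavior concrete, a minimal sketch using the same sample input as above:

# Because the error is (in effect) statement-terminating, a try statement catches it,
# and -ErrorAction SilentlyContinue / Ignore cannot suppress it.
try {
  'foo=bar', 'broken', 'bar=baz' | ConvertFrom-StringData
}
catch {
  "Caught: $($_.Exception.Message)"
}
# Output: the hashtable for 'foo=bar', then the "Caught: ..." message; 'bar=baz' is never processed.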

CrendKing commented on June 11, 2024

Thanks. Please understand that I only used curl and HTTP as an example because it is common. If it is causing confusion, I can replace the example with something like output from ffmpeg/ffprobe. Just let me know.

The 'foo=bar', 'broken', 'bar=baz' | ConvertFrom-StringData case is different from my single-string use case, but the root cause seems to be the same. If some -ErrorAction would make it continue, that'd be great.

Pragmatically speaking, you could simply filter out lines that aren't of interest along the lines of:

Yes, I could. But why encourage users to come up with case-by-case filtering that spends CPU time stripping out data we don't need in the first place, when the proposed update to ConvertFrom-StringData achieves exactly the same thing without sacrificing anything? What is the benefit of the current ConvertFrom-StringData behavior?

rhubarb-geek-nz commented on June 11, 2024

What is the benefit of the current ConvertFrom-StringData behavior?

Strictly enforcing data formats improves data quality throughout a system. If you have a relaxed system that accepts anything but ignores it all, you will have trouble later on finding out why none of your values made it through the system.
A good recent example is ConvertFrom-Json, which previously accepted, without error, data that basically was not JSON: it allowed C++ comments, single quotes instead of double quotes, and so on. So while your parser may interpret the data, you would be in for a rude shock when you pass the data to a third-party RESTful service and it rejects it outright as not being JSON.

Stick to "do one thing well": have ConvertFrom-StringData do the name-value conversion, and leave filtering to other components that are better suited and can be customised.

CrendKing commented on June 11, 2024

I agree the default behavior should be strict. I'm talking about adding an option or -ErrorAction to relax it. Using that option is a conscious action on the user's part; they already know about the potential for incomplete conversion.

Another example: Invoke-WebRequest by default throws an exception if the response HTTP status code indicates an error (4xx/5xx). That leaves no room to check the headers or content of the response. Fortunately, there is -SkipHttpErrorCheck, and when we use that option, we know we should always check the HTTP status code. Imagine if such an option never existed.
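
For illustration, a rough sketch of that pattern (the URL is just a stand-in for an endpoint that returns an error status code):

# -SkipHttpErrorCheck (PowerShell 7+) returns the response object instead of throwing,
# so the status code, headers, and body can be inspected explicitly.
$response = Invoke-WebRequest -Uri https://httpbin.org/status/404 -SkipHttpErrorCheck
if ($response.StatusCode -ge 400) {
    "Request failed with status $($response.StatusCode)"
}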

mklement0 commented on June 11, 2024

@CrendKing, note that even if the true bug gets fixed (which I've since reported separately, in #23793), it won't help your use case:

  • While you will then be able to use, say, -ErrorAction Ignore to ignore the errors from individual input strings, you'll still get a separate, single-entry output hashtable for each (well-formed) input string.

  • However, if you pass a single, multi-line string (e.g. curl --silent --head https://www.example.com | Out-String) as the input, that string as a whole will result in a non-terminating error, and no hashtable will be constructed.

So, to spell it out, what you're asking for would require modifying ConvertFrom-StringData to emit non-terminating errors for individual lines in a given, single, multi-line string and to still emit a hashtable if at least one well-formed line was found in the input string (I now see that you're already pointing to the relevant source-code location in your original post).

On a related note, the current behavior of emitting a separate hashtable for each input string is awkward, especially when piping an external program's output directly to ConvertFrom-StringData, which happens line by line (as in your example; with input from a file, Get-Content -Raw is the solution).
It would be much more useful to collect all input strings first, and then emit a single hashtable for all of them (the way ConvertFrom-Json does, for instance).


Both changes discussed are technically breaking ones, though the former is more likely to fall into bucket 3, whereas I think the latter, if it gets implemented, would require opt-in in the form of a new parameter to avoid breaking existing code.

CrendKing commented on June 11, 2024

Thanks. Like you said, my original request is to make the function return a non-empty hashtable from single-string input when some lines can't be converted. I even showed a workaround that breaks the single-string input into lines, try/catch-converts each line, and merges all the single-entry hashtables into one. It is doable but very awkward.
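
For reference, that workaround looks roughly like this (a sketch; the splitting and merging details are illustrative, not the exact code from the original post):

# Split the multi-line input, convert each line individually inside try/catch,
# and merge the resulting single-entry hashtables into one (later keys overwrite earlier ones).
$raw = curl --silent --head https://www.example.com | Out-String
$merged = @{}
foreach ($line in $raw -split '\r?\n') {
    try {
        $entry = $line | ConvertFrom-StringData -Delimiter :
        foreach ($key in $entry.Keys) { $merged[$key] = $entry[$key] }
    }
    catch {
        # Skip lines that are not in '<key>:<value>' format (e.g. the HTTP status line).
    }
}
$merged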

Also, if this new behavior is protected by a NEW OPTION, for instance ConvertFrom-StringData -SkipErrorLines, I don't think it is a backward-incompatible change. On the other hand, making ConvertFrom-StringData -ErrorAction Ignore ignore such lines or input entries is backward incompatible, even though I believe the current behavior is straight-up wrong, since it is not ignoring the error at all.

mklement0 commented on June 11, 2024

making ConvertFrom-StringData -ErrorAction Ignore ignore such lines or input entries is backward incompatible

Well, it is backward-incompatible in the sense that any bug fix is technically a breaking change.
The symptom is a side effect of the cmdlet (in effect) creating a statement-terminating error, to which -ErrorAction by (current) design does not apply (see #14819); fixing #23793 will make this symptom go away.

That said, even though it would clearly be a bug fix, it has the potential to break existing code, given that using a try statement would no longer be effective for catching an error - unless -ErrorAction Stop is also used.

a NEW OPTION

Good point: introducing a new switch is a way to avoid backward-compatibility problems, but perhaps the need for that can be avoided if the suggested change is deemed to be a bucket-3 change - someone would have to do some digging (e.g. analyzing code here on GitHub) to ascertain that.

That said, if fixing #23793 is decided against, a new switch is unavoidable.

I even showed a workaround that breaks the single-string input into lines

Yes, though note that your workaround wouldn't work with curl --silent --head https://www.example.com as input, as each output line would be received separately in the process block; I know it's just a sample command, but the point is that the problem applies to any external-program call.

To avoid a need for Out-String or similar, you'd have to collect all input strings in the process block (e.g. by building up a multiline string), and process the collected lines in the end block.
This relates to the additional change I suggested: making ConvertFrom-StringData perform this collecting itself, though that most likely requires a new switch.
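
A rough sketch of that pattern as a wrapper function (the function name is made up, and the pre-filtering of malformed lines is one possible interpretation of the desired behavior):

# Collect pipeline input line by line in the process block, then convert everything
# in one go in the end block, keeping only lines that contain the delimiter.
function ConvertFrom-StringDataCollected {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)] [string] $InputObject,
        [char] $Delimiter = '='
    )
    begin   { $lines = [System.Collections.Generic.List[string]]::new() }
    process { $lines.Add($InputObject) }
    end {
        $wellFormed = $lines | Where-Object { $_ -match [regex]::Escape($Delimiter) }
        ($wellFormed -join [Environment]::NewLine) | ConvertFrom-StringData -Delimiter $Delimiter
    }
}

# Usage; duplicate keys in the input would still cause an error, as discussed above.
curl --silent --head https://www.example.com | ConvertFrom-StringDataCollected -Delimiter :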
