Git Product home page Git Product logo

Comments (35)

bar-g avatar bar-g commented on August 26, 2024

Hm, printf is supposed to interpret the format string itself.

But echo actually does seem to print it only verbatim in ordinary shells:

~ $ echo "'\n"
'\n
~ $ 

ysh ysh-0.19.0$ echo "'\n"

  echo "'\n"
         ^
[ interactive ]:674: Invalid char escape in double quoted string

So, at the moment I don't know, why is it invalid?
EDIT: Invalid in ysh:all, since there is no echo -e anyway.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Seems analog to #1772.

So? Have some strict_ option that gets enabled with ysh:upgrade that triggers an error upon using the feature, with the error message giving the reason and mentioning the more consistent solution.

But with ysh:all, disable the strict_ option together with the corresponding special parse_ option to just follow the new standard behavior without triggering any error, by default?

Manual enabling of the strict_ option enables error messages, allowing to find voidly usage of disabled syntax (parse_ option enabled just internally also, automatically)?

from oil.

andychu avatar andychu commented on August 26, 2024

Hm can you post the whole code snippet? We have an idea to fold printf into double quoted strings, like "number = ${x %2f}", which will make it obsolete

The "correct" thing is to do

printf  "'\\n"

Because \\ is a backslash inside double quotes, and then printf is another dynamically parsed language

We want to move away from both printf and echo -e because they're dynamic, not static. They are separate languages from shell


So this error is intended but there should be a nicer way to write the code

from oil.

bar-g avatar bar-g commented on August 26, 2024

To give some context, what I'm currently testing (and filing faced issues about) is combining "fully-featured" ysh code and compatible shell code. I'm impressed how well it's going, and find it really useful, e.g. to include same shell code on non-ysh router and on ysh desktop.


I kind of have an answer, but if I could better understand your reason/intention for breaking double-quotes, as it seems, it could become easier/clearer or even much revised. Could you please add some reasoning?

(I mean concerning the "Invalid char escape in double quoted string". I guess printf is a separate thing.)

from oil.

bar-g avatar bar-g commented on August 26, 2024

Which shopt is this actually?

from oil.

andychu avatar andychu commented on August 26, 2024

It should be parse_backslash, so yeah you can simply turn that on you don't want it

$ bin/ysh  -c 'echo "\n"'
  echo "\n"
        ^
[ -c flag ]:1: Invalid char escape in double quoted string
$ bin/ysh -o parse_backslash -c 'echo "\n"'
\n

I thought I called it strict_backslash, but it's parse_backslash I guess because it's a parsing option. And it's on in OSH but off in YSH, so you have to turn it back on.

shopt -p prints the current option settings


The best list of options is here (I will update this topic slightly)

https://www.oilshell.org/release/0.19.0/doc/ref/toc-ysh.html#option

from oil.

andychu avatar andychu commented on August 26, 2024

The basic reason is that to be very clear about what code means

This is confusing, there should only be one way to write the string (the first way)

$ echo "\\z"
\z
$ echo "\z"
\z
$ echo '\z'
\z

There are too many ways to quote things in shell, so we reduce them.

The second one is invalid, only 1 and 3 are OK.

from oil.

andychu avatar andychu commented on August 26, 2024

I wonder if it's less confusing to say parse_strict_backslash and parse_strict_backticks

Maybe too late for that though

from oil.

bar-g avatar bar-g commented on August 26, 2024

Thanks! The background is helping a lot.

First, the generic thing.

Concerning confusing shopt names, I don't think is too late. It's always good to clear up confusion from previous iterations, even if there were thousands or much much more users already "add-deprecate-(keep/warn/remove)". It'll help the understanding for them and foremost for each and every one still to come.

I came to realize good strict_* and parse_* option patterns between the elevated error/warning feedback "in the middle" of osh/ysh and both sides, can clear up a lot.

  • compatible-osh
  • strict:all-osh (many errors or warnings)
  • ysh:upgrade (some errors and hints)
  • ysh:all

See the case around echo, it has shown that confusion can be much reduced by having corresponding strict_* and parse_* options, to control error/warnings and behavior separately (#1772).


[EDIT: Moved described solution into issue description.]

from oil.

andychu avatar andychu commented on August 26, 2024

Those rules are too complicated!

And sorry I meant there is only

  • one way to write the string with double quotes
  • one way to write it with single quotes

Not 2 ways with double quotes.


I think the real problem is that we need to fold printf into double quoted strings "", and "deprecate" printf

So that you never need to use printf in YSH

You should use either OSH and printf, or YSH and new style


I appreciate the transition is not as easy as it could be -- though I also am wary of adding a whole intermediate state

Basically I think if you start out with

printf 'hi %2f\n' "$x"

then in YSH you just change it to

echo "hi ${x %2f}"   # not implemented yet

That's not a trivial jump but the end result is cleaner

It would be nice if there was a way to auto-rewrite it, but that seems beyond the scope of the project now. We are going to export the Oils lossless syntax tree (AST) -- so other people could do this, as tools of their own

from oil.

bar-g avatar bar-g commented on August 26, 2024

Maybe take a bit more time to think it through.

I understand that in the parser the proposed rules may be a bit more complicated to implement.

However when writing a double-quoted string text, the adding of the escape character only at the places where needed, and it having a special meaning only at those places (and none elsewhere), is what really works nicely and naturally.

Compare that to having to escape every backslash (\\) when writing a double-quoted string to avoid ysh errors, and multiple backslashes in a row silently dropping unprinted characters (shell) or producing errors on every other odd.

Consider the minimized difficulties for copy pasting arbitrary text into a double-quoted string!


I don't understand, you mean one same way to write double- and single-quoted strings?

I think the strict_backslash_escape I propose would produce mostly the same way in ysh as in osh. However, because the escape character is only an escape character at places where there actually is something to escape, all ambiguity of nonsensical character escaping is removed.

So, YSH's double-quotes would be more compatible with generic text. Ordinary shells require fiddling with multiple backslashes all over the place, not just for $, ", and " at the end.


I think "printf-style interpolation" would be another really nice ysh feature. But it was just unfortunate that I stumbled upon the broken double-quotes in ysh while using printf. (Already changed the issue title to using true.)

Breaking regular backslashes in double-quotes has much wider effects: string variable assignments, specifying command arguments, ... .


You should use either OSH and printf, or YSH and new style

I'm using shell scripts and YSH scripts, calling and sourcing each other. And I'm sure there are many sites that will also continue to be working with shell scripts for some time to come.

And being able to use the same double-quoted string format (except edge cases) in both areas (compatible shell and ysh) seems quite crucial.

How do you picture pasting strings and texts to $interpolate between scripts, and from outside material?

from oil.

andychu avatar andychu commented on August 26, 2024

I don't think sourcing OSH and YSH from each other is a good idea

I would limit it to invoking them as subprocesses. e.g. ./foo.ysh rather than source foo.ysh

Then you get a fresh set of options each time

...

I don't see the problem, because you can simply double up the escapes.

If you really want to use double quoted strings, then you write "\\n" and not "\n". I wouldn't write it that way, but it works.

Or you can turn off the shopt. So there are 2 good solutions


BTW I'm thinking of changing it so all the options go from OFF -> ON from OSH -> YSH. I think the way it is is a bit confusing.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Hi, yeah, I think whether OFF->ON is good depends on the option name, though.

If its as clear like echo_flags it seems completely natural to turn it off for ysh. While it seems natural that strict_* options get turned on.

For the parse_* options maybe it could work out if you remove the parse_ prefix, and maybe add some more context instead?

Kind of shows the problem of this issue: A general shopt --unset backslashes makes it obvious that this option is disabling too much!


I don't see the problem, because you can simply double up the escapes.

Nah, that is a problem. That's a tedious unnecessary chore and breaks compatibility where there is no need to.


See, I found the original table from when you implemented parse_backslash, and I've now spend some time on working this out a bit better. Please open the updated issue description, it now even has a table comparison, and much better structured examples.

from oil.

andychu avatar andychu commented on August 26, 2024

Yeah I looked at the table, it's changing the meaning of existing shell code too much.

What we are doing in YSH is (optionally) removing some ways of expressing the same thing, because it's too much freedom -- shell is too noisy and accepts too much

Removing is better than changing, because it's optional. Changing is also bad because it's not always obvious if you're looking at OSH or YSH code -- you shouldn't have to consult the shebang -- you should be able to read "locally"


According to our https://github.com/oilshell/oil/wiki/Language-Design-Principles we want to avoid "taking over" existing syntax

(The two major exceptions being word splitting "silently" changing, and echo -e silently changing -- I view those as fixing existing bugs.)

But here we are not changing shell, so much as taming the possibilities, making it easier to read code.

Users aren't going to be able to remember what's in your table -- it has to be smaller than that.


Also the proposal WIDELY diverges from what \ already means in every language. Python and JavaScript both use the same rules for \.

In Python and JS, '\\' is a single backslash. And in C. It's very well accepted


BTW $'\\' is now legacy -- in Oils 0.20.0 we will have "J8 strings" which look like u'\\' and b'\\'. They can express arbitrary byte strings and nice unicode escapes.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Hm, ok, how suboptimal can changing the existing syntax (actually improving verbatim writing) be here?

Do you have an example, where it's bad to allow singleton backslashes (within quotes, not command language), and where it's better to forbid and require "escaped escapes"?

I mean, why shouldn't users just re-enable parse_backslash right away?

Users aren't going to be able to remember what's in your table -- it has to be smaller than that.

Hm, the only special rules about backslashes to remember would be the three items written in bold (escapes are escapes only before escapable chars, repeat twice (only there) to write backslash, and tripple it before a quotation mark to still get a valid ending-quote).

The table just shows some differentiating examples of the simple rules.

Python and JS [...]. And in C. It's very well accepted

Uh, unfortunate early design choice. Well, shell is for succinct interactive use, so probably that's why they agreed to allow singleton backslashes within quotes (and also allowed them in command language but there they get lost).


At least I think this might have found a better name for parse_backslash, maybe singleton_backslash (disabled in ysh)? Or, something denoting whatever the risk of re-enabling it is.

And better error messages.

For double quoted string:
Option singleton_backslash (except for char escape) disabled (use \\ to express one).

For command language:
Option singleton_backslash (except for char escape) disabled (remove, or add single-quote or use \\ to express one).

from oil.

bar-g avatar bar-g commented on August 26, 2024

Ok, singleton backslashes (when lost or same as double-backslash) are bad for comparing strings. Anything else?

And a changed (instead of a subset) format within double-quotes does not allow collective shared usage of variables from ysh and osh. So, nah, I actually don't want that incompatibility at all, I want compat as much as possible.


How come you also couldn't give these answers say at the beginning? Would have guessed it's much easier for you to quickly see the conclusions, than for me. :-) [In any case, my thanks, I much appreciate your help.]

from oil.

bar-g avatar bar-g commented on August 26, 2024

Hm, what about calling it break_backslash? (That could actually apply to all parse_* options of the breaking group.)

But it could be even better: More self explaining, when naming those according to the pitfall, fault, or thing, they prevent/or fix.

So, it again depends on the question: What is break_backslash good for? (Or why make it default or not?)

Is it break_manifold_strings or enforce_unique_string_enc?


sourcing OSH and YSH from each other

I think I want it because I want to share state between ysh modules and some remaining osh modules. Passing data isn't nice, because it's not simple to use json in osh. But accessing string variables is, and it seems to works nicely when only using string variables in the ysh parts!

But for sure, it would only be complete if osh can also access (global) ysh types (of course only at the basic string level interface).

This would allow the ysh parts to make full use of the language, while osh parts still have a defined string interface to the data, i.e. #1813 "ysh-list/dict not accessible with osh syntax (sourcing ysh procs)" and #1811 "fatal: "declare -n" refs to ysh vars (as universal interface "narrow waist" to source/use ysh code)".


you write "\n" and not "\n". I wouldn't write it that way, but it works.

What would you write instead? That could be helpful and shortening.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Urgh, this ugly backslash mangling is even present in these github issue comments! It's awful.

See how the quote in my reply displays, it becomes "\n" and not "\n", even though you originally wrote two backslashes on the left, and they are there in my Markdown comment source.

from oil.

bar-g avatar bar-g commented on August 26, 2024

I wonder if it's less confusing to say parse_strict_backslash and parse_strict_backticks

Hm, counting on tab-completion, it shouldn't be a problem if options get slightly longer. The least confusing may be names that most precisely explain their effects.

The current parse_backslash actually is only breaking in one direction, is this correct?

I think the shopts naming in general could remove much confusion by adding some classification prefixes to the names, which could indicate backwards/forwards code compatibility (c) or incompatibility (i).
Also may be indicate smaller breakages (i) vs. large breakages (I).

What do you think?:

I think the individual option switches are really mostly only for interested or "expert" users, anyway, so I think it's good to provide that info, and fine to use some kind of "cryptic" prefix, that may perfectly be ignored otherwise.

So, for the current parse_backslash, how about this:

It re'strict's the expression of backslashes, only allowing use in their escaping meanig. Existing shell code becomes incompatible, because the option breaks all singleton backslash usages at all places (the ysh interpreter is not "backwards compatible" with this option), however ysh code running with it remains compatible shell code.

So referring to the code, this option introduces a larger incompatibility for shell code to run in ysh, but adapted (ysh) code remains compatible to be used in shell.

From this I'd derive a name something like this:

Ic_strict_escaping_backslash
With it's error messages printing a helpful Ic_strict_escaping_backslash_only: backslashes can only be expressed escaping ('\\').

Applying the same logic to what I proposed above as strict_backslash_escape, actually had the nice effect to allow devising an option with the same characteristics but with much improved compatibilty!:


As you explained, looking at what I proposed before as a separate option, it would have been a new/changing feature, that would break less verbatim string expression than the strict restriction to only escaping backslashes, however creating limited incompatibilities in both directions, i.e. prefixed 'ii_':

ii_only_char_bound_escapes or, put positively ii_unbound_non_escaping_backslash
This is clearly indicating a breaking feature. Even though the incompatibility would really only concern chained backslashes therein (these not being "doubled up").

So now for a combination with much better shell compatibility (and interactive typing usability):

A low-impact strict_ option that guarantees unique string encoding while maintaining full compatibility for single backslashes in both directions, for both the escaping ones and non-escaping verbatim ones.

The "trick" is to forbid singleton backslashes and odd-length-chains in command language (just as now), but in quoted strings, forbid single-pairs and odd-length-chains, but not singleton backslashes.

May this actually maintain full string-compatibility for single escaping backslashes, as well as for single non-escaping verbatim backslashes? Only erroring on superfluous backslashes, is that what you call "bad backslashes"? So could it even be named a nice two-way compatible strict option?

cc_strict_unique_string_encoding

It's nothing new if single backslashes behave differently in command language than within quoted strings.

It could better fulfill the principle to maintain compatibility on the top-level, i.e. basic command language and quoted strings keep working in the same shell way. The strict option only forbids lesser used, redundant things (singleton-\ in commands, singleton-\\ in quoted strings).

[EDITED:]
It's [still mostly] compatible with the python/js etc. way with only backslash escapes or backslash pairs. [Except for singleton-\\.]

from oil.

bar-g avatar bar-g commented on August 26, 2024

Correction: It's [still mostly] compatible with the python/js etc. way with only backslash escapes or backslash pairs. [Except for singleton-\\.]

from oil.

bar-g avatar bar-g commented on August 26, 2024

An even better name for parse_backslash seemed Ic_strict_escaping_backslash, so I changed that above.

Now, I even realize all my confusion with python/js etc. compatibility could also have been avoided from the beginning, say by a compatibility suffix:

Ic_strict_escaping_backslash_cc

So the larger incopatibility on the (left) for existing shell code is warranted by full both-way compatibility on the (right) python/js side!

from oil.

bar-g avatar bar-g commented on August 26, 2024

Likely better to read, sort, and use:

Only have shopt -p print the full compatibility info up front, and only as an optional part, that's not required for manually switching the option:
Ic-cc:strict_escaping_backslash?

from oil.

bar-g avatar bar-g commented on August 26, 2024

I am surprised how much relieve and understanding these compatibility info bits give.

I think providing such compatibility and incompatibility info is a crucial part to foster understanding and appreciation of what comprises and constitutes osh and ysh.

Maybe still to find a better representation for the shopt -p output?:

shopt -s strict_escaping_backslash   # if set:  =>shell-->adapt ,  =>python=>compat

from oil.

bar-g avatar bar-g commented on August 26, 2024

Meaning:
=>shell-->adapt

  • Affected code executing with this option is directly compatible with shell code.
  • Shell code will need adaption to execute with this option
  • The longer thin arrow means it's not just a limited/small (mutating) adaption.

Example of incompatible, "added features" option:
adapt->shell=>compat

  • Affected code executing with this option may need adaption to run in shell.
  • Shell code is compatible to execute with this option.
  • The short thin arrow means it's rather a limited/small (mutating) adaption.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Thanks here as well,
to finish it up, this is the conclusion I draw from this issue's discussion:


The conclusion after the discussion of the original problem in this issue was that the implementation of the current parse_backslash is fine, because adapted code remains 100% compatible with shell, and double-quotes in python and others are fully compatible with it as well.

However, better warrant the required adaptions, by making the retained and gained compatibility much more obvious:

The option (as-it-is) should be renamed to strict_backslash_escape, and shopt -p could show how the complying code is directly compatible with shell and python:

ysh$ shopt -p | grep strict_backslash_escape
shopt -s strict_backslash_escape   # if set:  =>shell-->adapt ,  =>python=>compat

Meaning:
=>shell-->adapt

  • Affected code executing with this option is compatible shell code.
  • Shell code will need adaption to execute with this option (i.e. remove faux escapes, and double-up printed backslashes).
  • The longer thin arrow means it's not just a limited/small (mutating) adaption.

=>python=>compat

  • Affected code executing with this option is compatible python code.
  • Python code is compatible to execute with this option.

from oil.

bar-g avatar bar-g commented on August 26, 2024

Actually, ysh with parse_backslash unset does not seem to be compatible with python at all.

What were you telling! Has this changed recent ?

$ python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("m \n ")
m 

>>> 
>>> print("m \z ")
m \z
>>> print("m \\ ")
m \ 
>>> print ('m \ ')
m \ 
>>> print ('m \" ')
m " 

from oil.

andychu avatar andychu commented on August 26, 2024

Well I wasn't saying that YSH behaves exactly like Python, but I was saying that \\ means a single backslash in Python,
JavaScript, and C

I think some of your proposed semantics made \\ into 2 backslashes, I don't remember though


What I will say is that there is a common subset of YSH and Python that is very close

from oil.

andychu avatar andychu commented on August 26, 2024

One advantage of the strict behavior is that most people can't remember which language supports what escapes.

Python supports \0

$ python -c 'print("\0")' | od -c
0000000  \0  \n
0000002

But shell doesn't

$ bash -c 'echo \0'
0

YSH notifies you that this is an error:

$ bin/ysh -c 'echo \0'
  echo \0
       ^~
[ -c flag ]:1: Invalid char escape (parse_backslash)

YSH is like JSON -- JSON disallows \z, but Python apparently doesn't

Most C compilers warn about bad escape sequences these days

from oil.

bar-g avatar bar-g commented on August 26, 2024

Thanks for explaining! Now I'm sure I just had misunderstood you before. (Possibly kind of some wishfully thinking on my side that general progaming languages could have been more consistent.)

Your explanations actually also seem to be a nice description for, and possibly also a good reason to still rather rename this option to something like strict_backslash_escape_err.

Printing errors on escape sequences that are not supported in shell is a good reason for forbidding singleton backslashes.

I adapted the option's description proposed in the issue description to this:

ysh$ shopt -p | grep strict_backslash_escape_err
shopt -s strict_backslash_escape_err   # not print unknown escapes (compat: =>shell->adapt, =>several->adapt)

from oil.

bar-g avatar bar-g commented on August 26, 2024

(The two major exceptions being word splitting "silently" changing, and echo -e silently changing -- I view those as fixing existing bugs.)

I haven't thought about if it's also possible for word splitting, but I think the echo -e bug fix can be done nicely with good error messages, so not silently at all.

That is with strict_echostrings enabled in ysh:upgrade and disabled in ysh:all together with a separate echo_flags option that also gets disabled in ysh:all (#1772).

from oil.

andychu avatar andychu commented on August 26, 2024

It might be reasonable to rename it to strict_backslash , because that will "invert" OSH->YSH from yes -> no to no -> yes

Still thinking about making all the options turn ON by default for YSH

from oil.

bar-g avatar bar-g commented on August 26, 2024

I think turning on a strict_backslash and much more so strict_backslash_escape would make sense. Do you think strict_backslash_escape is too long? Ysh aims at tab-completion, I think. And it allows direct understanding / determination of what it is doing and the failures it creates ("invalid char escape").

from oil.

bar-g avatar bar-g commented on August 26, 2024

Concerning turning off echo_flags I can't think of a better wording or understanding with turning an option on.

It's so short, nice, and self explaining, or do you have an idea that fits into the explaining and setting better than disabling echo_flags in ysh?

from oil.

andychu avatar andychu commented on August 26, 2024

Well it could be shopt --set no_echo_flags, or I honestly I think shopt --set simple_echo is an OK name

from oil.

bar-g avatar bar-g commented on August 26, 2024

Hm, simple_echo is kind of an OK name, only if one already knows what it does. But without knowing, newcomers (my experience) first have to ask what is simple? (And then go, why that? If they used -en before-)

I think no_echo_flags is more directly understandable if you really want strict (or pedantic?) "all options 'on' in ysh". Because, it says what it does. (However, it adds a different, avoidable confusion though, because you need to --set something (on) to turn it off.)

I mean all-on works nice for strict_* options, but naturally some legacy parse_* options need to be turned off and newer ones turned on (feature flags), also there could be some common standard (bash compat?) feature names that are turned off and should not get renamed into a no_* option. So, all in all it seems, a telling name that can be intuitively --set and --unset is worth something in itself.

To still help the systematic cause, maybe there could be something like shopt --print mentioning the ysh default in a comment? And shopt --print --ysh could print setting all options to ysh defaults for comparisons?


(Just noticed, looking through shopt -p, wouldn't it make sense for dynamic_scope to be _dynamic_scope? Isn't it usually just auto-off within funcs/procs? And on, on top-level and in shell functions.)


Finally, back to the problem of the why?-question, that again can be solved nicely by providing an error message that readily contains corresponding help-on-error hints when the users hits on it.

See the error message example that I added at:

from oil.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.