Git Product home page Git Product logo

Comments (21)

ixm7 avatar ixm7 commented on July 20, 2024 1

Just submitted a pull request with the code to solve this issue.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024 1

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Here is a cleaned up example (all <> brackets changed to [] to avoid text disappearing in this post):
Note that the text got truncated not on the first semi-colon. So the problem seems to be truncation caused by semi-colon + some other unknown condition.

Value in INI file (Email_Message=):
"[HTML] [HEAD] [style type="text/css"] body{font-family: Verdana, Geneva, sans-serif; font-size:10pt;} td{font-family: Verdana, Geneva, sans-serif;font-size:10pt;} p{margin-top: 10px; margin-bottom: 10px;} ol,ul{margin-top: -10px; margin-bottom: -10px;} [/style] [/HEAD] [BODY]test [font style="background-color: rgb(255, 255, 192);"]{MovedFileFullPath}[/font][/BODY][/HTML]"

Value Read by INI component:
"[HTML] [HEAD] [style type="text/css"] body{font-family: Verdana, Geneva, sans-serif; font-size:10pt;} td{font-family: Verdana, Geneva, sans-serif;font-size:10pt;} p{margin-top: 10px; margin-bottom: 10px;} ol,ul{margin-top: -10px; margin-bottom: -10px;} [/style] [/HEAD] [BODY]test [font style="background-color: rgb(255, 255, 192)

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Note that the semi-colon in rgb(255, 255, 192);"] is indeed causing the problem because removing it in the INI entry fixes the issue.
Adding a space after it doesn't fix the issue.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Also note that the value returned starts with a literal double-quote: "[HTML]...
That literal double quote shouldn't be there.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

The problem is due to finding even number of double quotes in a segment that doesn't actually end with double quotes.
Please change the following function:

`private bool HaveUnmatchedQuotesIn(IEnumerable parts)

{

var joined = string.Join(";", parts);

var quoted = joined.Count(c => c == '"');

return quoted % 2 != 0;

}`

to:

`private bool HaveUnmatchedQuotesIn(IEnumerable parts)

{

var joined = string.Join(";", parts);

if (!joined.EndsWith('"'))
    return true;
var quoted = joined.Count(c => c == '"');

return quoted % 2 != 0;

}`

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

Thanks for reporting!

Looking at #43 I'm not convinced that that's the best way to resolve this (though it's working for your use-case). Looking at the first SO post I found (https://stackoverflow.com/questions/11046234/adding-quotes-to-ini-file-values-php#11046544) and the wikipedia entry at https://en.wikipedia.org/wiki/INI_file#Escape_characters, I think that the best way forward is to support escaping quotes with \" and ensuring that parsing takes the entire value in the outer quotes. Also, that article suggests that I should revisit INIFile to support escaping other common characters, especially newlines and carriage returns; perhaps tabs too?

The others aren't that interesting in a string which is surrounded with double-quotes, like you have done:

  • #, ', ;, : don't break anything in quotes
  • nulls, bells and backspaces aren't going to be easily supported in .net string values
  • = is already catered for by only accepting the first = as a name/value separator

For now, I'm going to leave these as-is, unless you or someone else would find them really useful, but I've fixed up the parsing of values to do as above: double-quotes which are escaped by a backslash will not be counted as part of an unmatched pair, so will cause the value-grabber to keep on going, gobbling up the rest of the line.

There's a test at

public void ShouldReturnFullQuotedValueIrrespectiveOfEmbeddedSemiColon()
using your example. This update should be available in PeanutButter.INI version 2.0.0.

Since this behavior breaks existing usage for some people, there's a way to opt out via constructor parameter and property on INIFile. I've also renamed the namespace to match the package - if I'm going to break stuff, I might as well do it properly!

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Davyd, how would your solution handle cases with a literal backslash followed by double quote (at the end of the entry or in the middle of the entry?)
My pull request solution handles those cases without a problem and doesn't change the behavior of the component.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

a literal backslash also has to be escaped, ie as \\, or the caller can opt out of escaped characters (an option I had to add back in again for backwards compatibility because I know I have a user with settings like:

[section]
value="\\server\share\path\"

I thought I'd rather pick something which would be expected by a new user: that backslashes could be used to escape quotes (meaning that lone backslashes also need escaping), as per https://en.wikipedia.org/wiki/INI_file#Escape_characters . The idea is to try to have as conformant a parser as possible.

Honestly, I feel like the fact that I missed the concept of escaped characters and the complex use-cases that people have found (yours included) meant that I introduced buggy behavior into the parser, and I apologise!

So the default is to escape everything but the caller can opt out.

I didn't see a test for your PR, but there's an obvious problem in that quotes inside comments would cause incorrect interpretation of the value, ie:

[section]
key="some value;" ;"force something"

and it's non-conformant behavior. To be quite honest, INIFile was one of the first things I wrote in PeanutButter, though GitHub only shows history back to 2016, I'm quite sure I first made it a few years prior, since I'd come out of C++ land, where I had my own parser, into .net land, around 2012, and I needed a similar parser. I really wish I'd thought about escape characters back then! It's (imo) the right thing to do, though it means that your INI files are negatively affected, I know, and I'm really sorry: it's the result of me doing it wrong the first time. A setting like:

[section]
key="some value "with quotes" and more"

should have always given parsing errors, though I guess I could figure a way around it. If it's a huge issue, please let me know and I'll see if I can figure out more tolerant parsing to maintain older behavior. It's a tough situation tho, because if I apply the old parsing mechanism to a value which is correctly escaped, ie:

[section]
key="some value \"with quotes\" and more"

then the consuming code would see the value of key to be `some value "with quotes" and more", ie the slashes would be unexpected.

I could add a third parsing "mode" (where we currently have "no escaping" and "full escaping", where I try "best effort" and I see what algorithm I can come up with that solves the best effort without breaking the others - I could literally switch on the algo and say that for "best effort", do the best I can. There will still be cases where the parser might do unexpected things - whereas the fully-escaped and totally-not-escaped paths are very predictable. I don't want to leave a user in a bad spot, so, as I say, let me know what you think - if it's highly impractical for you to use "proper" escaping, as per the wiki article, let's see what we can do. If it were me, I'd write a pre-parser for the config files to do the conversion, since you know the format of your files and what to expect, so it wouldn't be a blanket "try to fix all values" - you could literally read that line out of the file, look for " without preceding = or \ where it's not at the end of the line " and replace with \" to have a winner.

Something like:

void EnsureSettingIsEscaped(
	string iniPath, // path to the ini file
	string setting) // setting name to fix up
{
 	var lines = File.ReadAllLines(iniPath);
	var idx = -1;
	foreach (var line in lines)
	{
		idx++;
		var parts = line.Split('=');
		if (parts[0] != setting)
		{
			continue;
		}
		var rawSetting = string.Join("=", parts.Skip(1));
		if (rawSetting.Length > 2)
		{
 			// trim off containing quotes, but only one at either end!
			if (rawSetting[0] == '"')
			{
				rawSetting = rawSetting.Substring(1);
			}
			if (rawSetting[rawSetting.Length - 1] == '"')
			{
				rawSetting = rawSetting.Substring(0, rawSetting.Length - 1);
			}
		}
		var escaped = rawSetting.Replace("\"", "\\\"");
		lines[idx] = $"{parts[0]}=\"{escaped}\"";
	}
	File.WriteAllLines(iniPath, lines, System.Text.Encoding.UTF8);
}

though I've written this "by the seat of my pants", so I wouldn't be surprised if there's an error!

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

By the way, going forward, I suggest you state that all future parsing modes do not support inline comments. All comments must start on a new line. Regards, - Ido

This probably has lower impact than other changes as I imagine that inline comments are a rarer use-case than fancy value parsing; however, i would still need to know up-front that the caller doesn't intend to use escaped characters so that, eg, \" from an ini file ends up coming through as a slash and a quote

perhaps then, we need to have a parsing feature matrix, of which we can include:

  • allow inline comments
    • as you've noted, with this off, line parsing becomes really simple
  • allow escaped characters
    • the library needs a definitive answer of what to do with, eg \\ and \" in ini files

It seems that the broadest use-case might be catered for by:

  • not allowing inline comments (as you've specified)
  • not allowing escaped characters

Since the last package release was a major version bump to 2.0.0, it shouldn't have affected existing users unless they choose to upgrade there (and a major version upgrade is a warning that things may break). So I'm happy to still allow a little "breakage", in that 2.0.1 would not behave like 2.0.0, but would have better backward compatibility.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Sounds good to me.

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

Ok, I think I've gone one better - using TDD to drive out the implementation, I have two current strategies for parsing ini files: BestEffort and Strict.

BestEffort is chosen by default and will deal with all the current situations that I know about within reason. I say "within reason" because a setting like your example with an inline comment containing a quote will give unexpected results, as enforced by a test (:

Strict expects that quotes and backslashes are properly escaped within values. When persisting, lines that were properly escaped will be re-persisted as such, otherwise the effort is to try to get the line back as it was before parsing, unless Strict parsing is selected, in which case, all output key-value-pairs are "properly" escaped.

Parsing is achieved per-line with a selected instance of the public interface ILineParser, based on the provided parser strategy value. This also means that I've exposed the interfaces required so that a consumer could, should they wish, implement their very own parsing strategy. If nothing else, this would allow someone to work around an issue whilst discussions on an internal solution take place.

I've also taken this time to ensure that PeanutButter has xmldoc emitted and cleaned up a bit of code here and there. The new logic is available as of published package 2.0.1 . Please test and report any issues.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

New default behavior solves the issue for me. Many thanks!

Consider adding a Simple strategy, which assumes no inline comments and no escaped characters.
I believe this is the most common use case. Handling it as such would boost performance and simplify the logic.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

Setting gINIFile.ParseStrategy = ParseStrategies.BestEffort seems to get proper behavior on the outside.
But persisted values are now escaped (in the ini file). That breaks expected behavior.

Please add a simple Parse strategy that doesn't expect and doesn't force escaping.
Even better if that strategy doesn't expect inline comments (for better performance and simpler logic).

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

write-out escaping when using BestEffort should only happen if the incoming line was escaped - please provide an example ini to work off of for me to create a unit test and a fix if that's happening. There's even a test around this (:

You shouldn't have to set anything btw - default strategy is BestEffort.

If performance is hugely impacted, please provide an example too - this is my order of interest for all things:

  1. reliability
  2. functionality
  3. form (here, api)
  4. performance

so I'm sure you can see that, whilst performance matters to me, it's by no means top of the list, and I get quite stubborn about it.

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

The comment editor on this page keeps dropping content and I don't know how to work around it.
Can you email me directly and I'll send an example of before/after.

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

I'll look into it in the morning; best-effort should only apply escaping if the incoming value had escaping

from peanutbutter.

fluffynuts avatar fluffynuts commented on July 20, 2024

btw, if your values are written out escaped, that may not be a bad thing - it means that future reads will have a precise handle on what data should be there. As mentioned before, i consider it a bug that PeanutButter.INI didn't have escaping from the start

from peanutbutter.

ixm7 avatar ixm7 commented on July 20, 2024

I apologize for the false alarm. This is working fine for me now.
Must have been a problem with the way I was testing.

Many thanks for the lightning-fast new version release!

from peanutbutter.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.