Git Product home page Git Product logo

Comments (22)

znerol avatar znerol commented on May 17, 2024 2

And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.

That's fine. Also a smaller patch = less risk.

Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.

There is no need to be sorry and there is no need to rush! Thanks a lot for your work.

from chasquid.

znerol avatar znerol commented on May 17, 2024 2

I went through the tests once more. They are suitable to cover the tricky cases. I especially appreciate the fact that there are integration tests which exercise a combination of error states (message too large plus invalid newline). Good thinking here.

Thanks a lot @albertito for the quick and at the same time thorough update.

from chasquid.

znerol avatar znerol commented on May 17, 2024 1

Oh! Same time different outcome. I have to confess, that my observations are based purely on looking at docs and issues, not at the code itself.

from chasquid.

albertito avatar albertito commented on May 17, 2024 1

Looks like most of the parsing is done by net/txtproto Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:

it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.

If this is still the case, then DotReader should be safe.

Thanks for finding that!

I think the messages and examples from the issue are not consistent with the message closing it: one person is saying "RFC says this behaviour is invalid, and here are some examples", and then it is closed with "we are compliant to the RFC so closing" (obviously paraphrasing both).

So I think the implementations are too lax and don't enforce the "\r or \n must only appear together", and that original issue was closed incorrectly.

I also did some preliminary integration tests and they seem to confirm this too.

But I will look a bit more in depth after lunch, it is very possible I'm missing something that makes this okay.

from chasquid.

ThinkChaos avatar ThinkChaos commented on May 17, 2024 1

Thanks for tackling this so quick! I'm glad to be using your software :)

from chasquid.

znerol avatar znerol commented on May 17, 2024 1

It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.

Comparing the next branch with crlf-everywhere it is quite obvious that keeping the internal representation LF-only is less risky. If that internal representation should be changed in the future, then it likely would make sense to introduce a new data type for guaranteed-to-be-CRLF-terminated-text and use that throughout the project. That way inconsistencies could be detected during compile time.

I went through the patch in next again and left some comments. I'm neither a professional in email protocols nor in golang though, so judge them with a grain of salt.

from chasquid.

ThinkChaos avatar ThinkChaos commented on May 17, 2024 1

Everything looks fine to me!

The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future".
Generally errors could be handled more idiomatically, which would help be more refactor and error proof.
Here's a couple specific points:

  • Use errors.Is(err, x) instead of err == x so that introducing a new abstraction that wraps errors doesn't break those checks
  • When using fmt.Errorf to wrap an error, %w should be preferred to %v as it preserves the actual error and not just a string. That way errors.Is/As/Unwrap continue working.
    I know linters can catch this, so maybe worth looking into (golangci is what most use, maybe go vet is enough).
  • Avoid returning a special value, such as code < 0 in this patch, as it's easy to miss. Having an explicit error return value instead forces the caller to check it. Using an error also enables returning data.
    Example for the code case again in pseudo-go:
    type response struct{ Code int; Msg string }
    type responseError struct{ e error; r response } // Would need methods for Is/As/Unwrap
    
    fn (c *Conn) Handle() {
        r, err := c.DATA()
        if err != nil {
            var rErr *responseError
            if errors.As(err, &rErr) {
                c.writeResponse(rErr.r)
            }
    
            return err
        }
    
        c.writeResponse(r)
    }
    
    fn (c *Conn) DATA() (response, error) {
        data, err := read()
        if err != nil {
            return responseError{err, response{521, "..."}}
        }
    
        return response{250, "..."}
    }

from chasquid.

albertito avatar albertito commented on May 17, 2024 1

Everything looks fine to me!

The following is not critical at all and I didn't see any bugs in the code, just thought I'd write it down as "might be nice for the future". Generally errors could be handled more idiomatically, which would help be more refactor and error proof. Here's a couple specific points:

Thanks for taking the time to think about this and write it down!

  • Use errors.Is(err, x) instead of err == x so that introducing a new abstraction that wraps errors doesn't break those checks

Thanks, this is a good point. A lot of the chasquid code predates errors.Is and there are many cases where it can be used, and doing a review pass is a great idea.

I think sometimes in tightly coupled code, == can be more practical (since it indicates "this is a specific error"), but that is a very narrow situation and in general I agree with you.

In this particular patch, the error returned could include e.g. the line number and the type of error so it can be added to the trace by the DATA function.

I didn't want to do this right now as it would add more complexity to this patch that's already fairly big and sensitive, but it's definitely something to add later.

  • When using fmt.Errorf to wrap an error, %w should be preferred to %v as it preserves the actual error and not just a string. That way errors.Is/As/Unwrap continue working.
    I know linters can catch this, so maybe worth looking into (golangci is what most use, maybe go vet is enough).

This is just an artifact of the code predating %w, and I agree with you. It's something to clean up/bring more up to date alongside the above.

  • Avoid returning a special value, such as code < 0 in this patch, as it's easy to miss. Having an explicit error return value instead forces the caller to check it. Using an error also enables returning data.

Totally! This code < 0 thing is a big hack!

Until this point, I thought returning the (code, msg) pair was practical and clear enough, and a custom type wouldn't have added much value.

But now that we want to break the connection this way, I think it definitely merits a custom type in the style you suggest.

I don't want to add that now because it means again a fair amount of complexity on an already intrusive change. If we had time, I would do a patch introducing that type first, and then use it in this one

But this issue is fairly pressing so I rather do the little hack, and then clean it up with more time afterwards.

I have added a couple of TODOs based on your suggestions, so I don't forget :)

Thanks again!

from chasquid.

znerol avatar znerol commented on May 17, 2024

Looks like most of the parsing is done by net/txtproto Dotreader. I only found one relevant issue in the go tracker: golang/go#9781 It was closed with the following remark:

it looks like we're following the strict rules from RFC 5321. I'm going to close this without making any changes. The documentation seems accurate too.

If this is still the case, then DotReader should be safe.

from chasquid.

albertito avatar albertito commented on May 17, 2024

I think chasquid is vulnerable to the attack because neither net/textproto.Reader.DotReader nor net/mail.ReadMessage enforce the "LR and CF must only appear together" rule.

Some (not polished and not comprehensive) examples: https://go.dev/play/p/pZNkk-FMwp-

This is based on a quick initial analysis; it is possible I'm missing something. I will continue looking at this today.

from chasquid.

albertito avatar albertito commented on May 17, 2024

Just to keep folks updated: I've written a bunch of test cases to confirm the issue, and now I'm working on a wrapper that enforces strict \r\n in messages, as per the RFC.

That way we can continue to use the standard library for parsing dot-terminated sections and email messages, but still be strict about newlines in them.

While I don't necessarily love the approach, it is practical, and I hope it is robust enough for now.

I'll post another update once I have something more concrete.

from chasquid.

albertito avatar albertito commented on May 17, 2024

Commit 606c392 (in the next branch) has a preliminary fix.

It still needs more testing, which I will do tomorrow.

from chasquid.

znerol avatar znerol commented on May 17, 2024

Thanks a lot @albertito, commit 606c392 looks promising. While reading through the new readUntilDot func, I noticed the following:

  1. The new readUntilDot func is much simpler than the stdlib DotReader.Read func. I definitely have less trouble following the code and the state flow.
  2. The new readUntilDot func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlib DotReader.Read func does consume CR characters. It only returns LF line endings.
  3. The new readUntilDot func does consume a leading period (dot stuffing). Same with the stdlib DotReader.Read func.

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

from chasquid.

znerol avatar znerol commented on May 17, 2024

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

I assume that the tools invoked from the post-data hook will expect to be operating on the body of a mime message. RFC 2046 section 4.1.1 specifies that:

The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence.

Passing the email body to post-data with CRLF line endings is actually better aligned with standards.

from chasquid.

znerol avatar znerol commented on May 17, 2024

If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

mda-lmtp uses stdlib net/texproto DotWriter. That caters for content in both forms (NL as well as CRNL). So that should be good as well.

from chasquid.

znerol avatar znerol commented on May 17, 2024

Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding NL terminated headers instead of CRNL terminated. If that is true, then chasquid instances will have trouble talking to each other after the fix (and to postfix as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitary NL.

from chasquid.

albertito avatar albertito commented on May 17, 2024

Thanks a lot @albertito, commit 606c392 looks promising. While reading through the new readUntilDot func, I noticed the following:

  1. The new readUntilDot func is much simpler than the stdlib DotReader.Read func. I definitely have less trouble following the code and the state flow.

Thank you! Some parts are a bit hacky, and I think it's likely that it can be simplified/made more readable, but that can be iterated on later.

  1. The new readUntilDot func doesn't consume CR characters, it returns any CR-LF sequence unmodified. The stdlib DotReader.Read func does consume CR characters. It only returns LF line endings.
    If observation 2. is correct, then the internal representation of a message body changes with the fix. Will that have an impact on post-data hook implementations? Will it have an impact on mda-lmtp?

This is correct. I thought about keeping the old behaviour, but then if other tools (like MDAs) get more strict in the future, then it's likely we will need to end up doing it anyway.

So my current thinking is to just bite the bullet and update it.

The impact is smaller than it may seem, because both SMTP courier and mda-lmtp already normalize newlines to \r\n, so there shouldn't be externally observable behaviour differences.

Hooks, and other MDAs, will observe a difference in newlines. While the new behaviour is more compliant (as you noted in the other comment), it is still a significant change.

  1. The new readUntilDot func does consume a leading period (dot stuffing). Same with the stdlib DotReader.Read func.

Yeah, that is important for preserving the message structure. While this behaviour was tested, the test was a bit on the sides, so now I've added more explicit end-to-end tests for the dot stuffing just in case.

Hm, I think addReceivedHeader and envelope.AddHeader might need some attention as well. It looks like this funcs are adding NL terminated headers instead of CRNL terminated.

This is true, and also DSNs are internally \n-terminated which now creates some inconsistencies. I am working on clearing these cases too.

If that is true, then chasquid instances will have trouble talking to each other after the fix (and to postfix as well as soon as they roll out smtpd_forbid_bare_newline ). This is because the receiver will drop the connection as soon as it encounters a solitary NL.

That's not the case: even if we internally still have some \n-terminated lines, the SMTP courier will normalize the newlines; because of that, there are no issues communicating with other strict servers (chasquid or postfix).

This is well covered by the integration tests (which have chasquid talk to each other).

I will post an updated patch that improves testing, although there are no significant code changes from the previous one.

My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.

Thank you!

from chasquid.

albertito avatar albertito commented on May 17, 2024

My next step is to clear up the known internal \n-terminated lines just to reduce potential friction/confusion. I will look at the hook and the local MDA courier to see if it's worth doing the newline normalization there too, although I am not sure it will be needed.

FYI, I've done a pass of "internal representation uses CRLN terminated lines" (fixing up DSN generation and those headers), and it works fine.

However, after thinking about this for a bit more, I think doing a CRLN normalization at endpoints with external systems (SMTP courier (preexisting) and MDA courier (missing)) is still the right call.

And in that case then maybe it makes sense to keep the internal representation LN terminated, because it keeps things a bit simpler and easier to work with/less accident prone.

I'm working on an alternative set of patches so we can compare the two approaches. I will post references once I have something in reasonable shape.

Sorry this is taking so long, it's unfortunate that we got zero warning about the problem, but I'd rather take a bit more time to get the fix as right as possible.

from chasquid.

albertito avatar albertito commented on May 17, 2024

The next branch has the latest iteration of the fixes.

It keeps an LF-terminated internal representation, so the patch is less intrusive, and there's less chances of internal problems or mixed termination styles. It enforces CRLF-terminated lines on external endpoints: SMTP courier already had them, this patch adds it to the MDA courier.

Doing the enforcement at endpoints should help prevent any future issues arising from potentially inconsistent internal newlines.

Hooks continue to get LF-terminated messages, so there are less chances of regressions in preexisting deployments. Also output from hooks (which are usually LF-terminated) continues to be well handled.

I'm going to run this version in my development server for a bit, do some external-world tests; and report back.

For comparison, I uploaded a new branch crlf-everywhere which does the internal conversion to CRLF, and skips the enforcement in the MDA courier. I'll eventually remove it, but it can be useful if anyone wants to compare the two approaches and have an opinion about it.

Thank you!

from chasquid.

albertito avatar albertito commented on May 17, 2024

Thanks @znerol and @ThinkChaos for your comments and thorough review!

I've responded to the comments and uploaded an amended patch 34d52ca.

I really appreciate the review, so if you have a chance, please take another look and we can keep iterating on this.

Thank you!

from chasquid.

albertito avatar albertito commented on May 17, 2024

Thanks again for the detailed and thorough reviews!

Things look stable, and I haven't identified any issue in my (unfortunately short) tests in the wild, so I am preparing a new release.

from chasquid.

albertito avatar albertito commented on May 17, 2024

chasquid v1.13 has been released with the fix.

Thanks again for all the help, discussions and reviews through fixing this!

from chasquid.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.