email-validate-hs's People

Contributors

bergmark, bitemyapp, danburton, felixonmars, k0001, kamaradclimber, l8d, ocharles, porges, tehnix, vekhir, ysangkok

email-validate-hs's Issues

Major version changes (v3)

Ideas for v3:

  • Input type should be Text, not ByteString (see #9)
  • Test suite will be inherited from Dominic Sayer's isemail, and won't be inlined in the code.
  • The basic EmailAddress should not provide individual access to localPart/domainPart, but instead be a newtype around Text. This will facilitate loading/storing from unvalidated stores in a simpler fashion than is currently possible (e.g. at the moment you can use unsafeEmailAddress but it takes the two parts separately, not a single email address string).
  • Lift ParseOptions to type level, per this comment.
  • The default parsing mode will change to something saner that excludes "obsolete" syntaxes (these can be accessed using the "Detailed" module).
  • Support internationalized emails.
    • In local parts
      • Add tests for each bit (atext, ctext, etc – ignore dtext since we'll do proper handling)
    • In domains (this probably requires an IDNA2008 implementation)
  • For those who want full analysis, there will be a separate "Detailed" module which can break apart an email address. (Or, just provide functions to break down the EmailAddress type...)
  • Error messages should be improved, and this checked by tests.
  • Consider exposing an FFI interface for other languages to use.
  • Consider improved equality (see #36)
    this should fall out of better domain parsing
  • Consider switching to Megaparsec, for better error messages?
  • Consider which instances to upstream from https://github.com/cdepillabout/emailaddress (as pointed out by @bitemyapp)
    • NFData (per #25)
    • Binary (per #25)
    • PathPiece (per #20)
    • Aeson instances (from emailaddress)

Create EmailAddress or validate String

Hi,

Could someone give me an example of how to create an EmailAddress from a String, or how to create a ByteString from a String? I was expecting it to be easy to validate a String email, but I am having a hard time finding out how.

Thanks
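
A minimal sketch of one way to do this, assuming the email-validate package is installed and using its validate, emailAddress and isValid functions from Text.Email.Validate. The wrapper names (parseEmail, parseEmailMaybe, isValidEmail) are made up for illustration. Note that Data.ByteString.Char8.pack truncates characters above U+00FF, so this is only reasonable for ASCII or Latin-1 input.

```haskell
import qualified Data.ByteString.Char8 as BS8
import Text.Email.Validate (EmailAddress, emailAddress, isValid, validate)

-- | Validate a String, keeping the parser's error message on failure.
-- BS8.pack keeps only the low 8 bits of each Char, so non-Latin-1 input is mangled.
parseEmail :: String -> Either String EmailAddress
parseEmail = validate . BS8.pack

-- | Same as above, but discards the error message.
parseEmailMaybe :: String -> Maybe EmailAddress
parseEmailMaybe = emailAddress . BS8.pack

-- | Just a yes/no answer.
isValidEmail :: String -> Bool
isValidEmail = isValid . BS8.pack
```

If the input is already a Text value, Data.Text.Encoding.encodeUtf8 would be the usual way to get a ByteString instead of going through String.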

Underscore in domain is accepted

Prelude> :m + Text.Email.Validate Data.ByteString.UTF8
Prelude Text.Email.Validate Data.ByteString.UTF8> isValid $ fromString "local@exam_ple.com"
True

While underscores are legal in DNS, they're disallowed for hostnames. So you can't have either an 'A' or an 'MX' record for a name with an underscore in it, and thus no address with an underscore in it will be deliverable.

This is a gray area with respect to RFC5322, but I think it's best to reject them since in practice the addresses are invalid. A quote from the RFC:

Note: A liberal syntax for the domain portion of addr-spec is
given here. However, the domain portion contains addressing
information specified by and used in other protocols (e.g.,
[RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore
incumbent upon implementations to conform to the syntax of
addresses for the context in which they are used.
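
To make the proposal concrete, here is a rough sketch of a hostname-style check that could be layered on top of the RFC 5322 parse. It assumes the letters-digits-hyphen rule for labels and is not the library's code; splitLabels, isLdhLabel and isHostname are hypothetical names.

```haskell
import Data.Char (isAlphaNum, isAscii)

-- | Split a domain into labels on '.'. Hypothetical helper, not a library function.
splitLabels :: String -> [String]
splitLabels s = case break (== '.') s of
  (label, [])       -> [label]
  (label, _ : rest) -> label : splitLabels rest

-- | A label may contain only ASCII letters, digits and hyphens,
-- and may not begin or end with a hyphen. Empty labels are rejected.
isLdhLabel :: String -> Bool
isLdhLabel [] = False
isLdhLabel label@(c : _) =
  all ldhChar label && c /= '-' && last label /= '-'
  where
    ldhChar x = isAscii x && (isAlphaNum x || x == '-')

-- | True when every dot-separated label passes isLdhLabel.
-- A trailing dot yields an empty final label and is therefore rejected here.
isHostname :: String -> Bool
isHostname = all isLdhLabel . splitLabels
```

With this check, isHostname "exam_ple.com" is False while isHostname "example.com" is True. Domain literals such as [12.34.56.78] would need separate handling.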

testsuite failing in Stackage Nightly

Test suite failure for package email-validate-2.3.2.13

    src/Text/Email/Parser.hs:15:1: error:
        Could not load module ‘Data.Attoparsec.ByteString.Char8’
        It is a member of the hidden package ‘attoparsec-0.13.2.4’.
        You can run ‘:set -package attoparsec’ to expose it.
        (Note: this unloads all the modules in the current scope.)
        Use -v (or `:set -v` in ghci) to see a list of the files searched for.
    
    src/Text/Email/Parser.hs:16:1: error:
        Could not load module ‘Data.ByteString’
        It is a member of the hidden package ‘bytestring-0.10.12.0’.
        You can run ‘:set -package bytestring’ to expose it.
        (Note: this unloads all the modules in the current scope.)
        Use -v (or `:set -v` in ghci) to see a list of the files searched for.
    
    src/Text/Email/Parser.hs:17:1: error:
        Could not load module ‘Data.ByteString.Char8’
        It is a member of the hidden package ‘bytestring-0.10.12.0’.
        You can run ‘:set -package bytestring’ to expose it.
        (Note: this unloads all the modules in the current scope.)
        Use -v (or `:set -v` in ghci) to see a list of the files searched for.
    src/Text/Email/QuasiQuotation.hs:23: failure in expression `[email|[email protected]|]'
    expected: "[email protected]"
     but got: 
              ^
              <interactive>:25:1: error:
                  • Not in scope: ‘email’
                  • In the quasi-quotation: [email|[email protected]|]
    
    src/Text/Email/Validate.hs:41: failure in expression `canonicalizeEmail "spaces. are. [email protected]"'
    expected: Just "[email protected]"
     but got: 
              ^
              <interactive>:39:1: error:
                  Variable not in scope: canonicalizeEmail :: t0 -> t
    
    src/Text/Email/Validate.hs:56: failure in expression `validate "[email protected]"'
    expected: Right "[email protected]"
     but got: 
              ^
              <interactive>:47:1: error:
                  Variable not in scope: validate :: t0 -> t
    
    Examples: 6  Tried: 5  Errors: 0  Failures: 3

This was with ghc-8.10.3

Why does Text.Email.Parser call a parser on its intermediate results?

https://github.com/Porges/email-validate-hs/blob/master/src/Text/Email/Parser.hs#L68

Here, a value called raw is obtained, based largely on the dottedAtoms parser, and that raw value is then parsed a second time using domainParser. What is the reason for this? Is it to comply with some element of the spec?

Specifically, if one wanted to re-write the whole domainName parser to not use an inner parse, would that be possible in some way while giving correct results?

I've been working on a fork that adds a polymorphic version of every parser in terms of the parsers typeclasses here, and this is the last one to go, but I'm not clear on how to write the polymorphic version because it obviously wouldn't be allowed to call a sub-parser during the process.

Create a joint test suite with hsemail

The package hsemail provides an e-mail parser as well, and IMHO both packages should behave exactly the same in terms of which addresses they accept as valid. My package has a test suite as well as yours, but wouldn't it be nice if we could share that code somehow so that one test suite could verify both libraries? IMHO, this would be beneficial for both packages and increase the likelihood that bugs in either package are discovered.

Use case-insensitive ByteString in EmailAddress

I think it would be appropriate to use the case-insensitive package to represent the domain ByteString inside EmailAddress. This would also seem to "fix" the Eq instance.

Has this been considered before? In web apps I have generally represented the entire email address as newtype Email = Email CaseInsensitive.ByteString, but after a little research it appears that the part before the @ can technically be interpreted as case-sensitive.
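
As a rough illustration of the equality I have in mind, here is a sketch that uses only bytestring rather than the case-insensitive package. Address is a stand-in for the library's EmailAddress, not its real definition, and the lowercasing is only meaningful for ASCII domains.

```haskell
import qualified Data.ByteString.Char8 as BS8
import Data.Char (toLower)
import Data.Function (on)

-- | Stand-in for EmailAddress: local part, then domain part.
data Address = Address BS8.ByteString BS8.ByteString

-- | Local parts compare byte-for-byte; domains compare after lowercasing.
instance Eq Address where
  Address l1 d1 == Address l2 d2 =
    l1 == l2 && ((==) `on` BS8.map toLower) d1 d2
```

Under this instance, alice@Example.COM and alice@example.com are equal, while Alice@example.com and alice@example.com are not.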

Failing tests

I want to work on making the input type more generic (I'm working with Data.Text and not String and don't want to have to convert back and forth), but discovered that many of the tests are failing. It appears as though very few of the invalid addresses are actually coming up as being invalid.

123456789012345678901234567890123456789012345678901234567890@12345678901234567890123456789012345678901234567890123456789.12345678901234567890123456789012345678901234567890123456789.12345678901234567890123456789012345678901234567890123456789.1234.example.com: Should be False, got True
        Entire address is longer than 256 characters
12345678901234567890123456789012345678901234567890123456789012345@example.com: Should be False, got True
        Local part more than 64 characters
""@example.com: Should be False, got True
        Local part is effectively empty
x@x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456: Should be False, got True
        Domain exceeds 255 chars
first.last@[.12.34.56.78]: Should be False, got True
        Only char that can precede IPv4 address is ':'
first.last@[12.34.56.789]: Should be False, got True
        Can't be interpreted as IPv4 so IPv6 tag is missing
first.last@[::12.34.56.78]: Should be False, got True
        IPv6 tag is missing
first.last@[IPv5:::12.34.56.78]: Should be False, got True
        IPv6 tag is wrong
first.last@[IPv6:1111:2222:3333::4444:5555:12.34.56.78]: Should be False, got True
        Too many IPv6 groups (4 max)
first.last@[IPv6:1111:2222:3333:4444:5555:12.34.56.78]: Should be False, got True
        Not enough IPv6 groups
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:12.34.56.78]: Should be False, got True
        Too many IPv6 groups (6 max)
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777]: Should be False, got True
        Not enough IPv6 groups
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]: Should be False, got True
        Too many IPv6 groups (8 max)
first.last@[IPv6:1111:2222::3333::4444:5555:6666]: Should be False, got True
        Too many '::' (can be none or one)
first.last@[IPv6:1111:2222:3333::4444:5555:6666:7777]: Should be False, got True
        Too many IPv6 groups (6 max)
first.last@[IPv6:1111:2222:333x::4444:5555]: Should be False, got True
        x is not valid in an IPv6 address
first.last@[IPv6:1111:2222:33333::4444:5555]: Should be False, got True
        33333 is not a valid group in an IPv6 address
[email protected]: Should be False, got True
        TLD can't be all digits
first.last@com: Should be False, got True
        Mail host must be second- or lower level
[email protected]: Should be False, got True
        Label can't begin with a hyphen
[email protected]: Should be False, got True
        Label can't end with a hyphen
first.last@x234567890123456789012345678901234567890123456789012345678901234.example.com: Should be False, got True
        Label can't be longer than 63 octets
[email protected]: Should be False, got True
        Top Level Domain won't be all-numeric (see RFC3696 Section 2). I disagree with Dave Child on this one.
test@123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012.com: Should be False, got True
        255 characters is maximum length for domain. This is 256.
test@example: Should be False, got True
        Dave Child says so
first.""[email protected]: Should be False, got True
        Contains a zero-length element
first.last@[IPv6:1111:2222:3333:4444:5555:6666:12.34.567.89]: Should be False, got True
        IPv4 part contains an invalid octet
a@b: Should be False, got True

aaa@[123.123.123.333]: Should be False, got True
        not a valid IP
a@bar: Should be False, got True

[email protected]: Should be False, got True

[email protected]: Should be False, got True

[email protected]: Should be False, got True

[email protected]: Should be False, got True
        ip need to be []
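
Several of the failures above are pure length limits, which could be checked after parsing. A minimal sketch, assuming the limits quoted in the test descriptions (64 for the local part, 63 per label, 255 for the domain); domainLabels and withinLengthLimits are hypothetical names, not library functions.

```haskell
-- | Split a domain into labels on '.'. Hypothetical helper.
domainLabels :: String -> [String]
domainLabels s = case break (== '.') s of
  (label, [])       -> [label]
  (label, _ : rest) -> label : domainLabels rest

-- | Check the length limits named in the failing tests.
-- Lengths are counted in Chars, which equals octets only for ASCII input.
withinLengthLimits :: String -> String -> Bool
withinLengthLimits local domain =
  length local <= 64
    && length domain <= 255
    && all ((<= 63) . length) (domainLabels domain)
```

This covers only the length-related cases; the IPv4/IPv6 literal and TLD cases would need their own checks.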

Quotes-wrapped `Show` intentional?

This could well be intentional; if it is, could you explain or add a comment?

instance Show EmailAddress where
    show = show . toByteString

-- | Converts an email address back to a ByteString
toByteString :: EmailAddress -> ByteString
toByteString (EmailAddress l d) = BS.concat [l, BS.singleton '@', d]

Possibly you intended show to be unpack . toByteString or similar. As it stands, the instance renders email addresses wrapped in " ... " quotation marks. Let me know either way; I'm happy to submit a PR if it should be amended. Example:

[screenshot from 2018-06-09 17-30-08]
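
The difference can be reproduced without the library, since it comes from ByteString's own Show instance, which renders the value like a string literal. A small sketch using only bytestring; the function names are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as BS8

-- | What the current instance does: show on a ByteString adds quotes.
renderQuoted :: BS8.ByteString -> String
renderQuoted = show

-- | The alternative suggested in this issue: unpack, no quotes.
renderPlain :: BS8.ByteString -> String
renderPlain = BS8.unpack
```

For the bytes a@b.c, renderQuoted gives the seven-character string "a@b.c" including the quotation marks, while renderPlain gives a@b.c. One possible reason to keep the quoted form is that Show output is conventionally valid Haskell source, but that is a guess about intent.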

Status of v3 and offer of help

What's the status of v3? I'd really like to validate Unicode email addresses.

If you have a list of things that need to be worked on I'd be happy to help.

Thanks for a super useful package!

canonicalizeEmail removes spaces inside emails

I ran into an issue with the canonicalizeEmail function where whitespace interspersed between the atoms of an email was trimmed from the given email, for example:

>>> :set -XOverloadedStrings
>>> import Data.ByteString (ByteString)
>>> import Text.Email.Validate (canonicalizeEmail)
>>> canonicalizeEmail ("alice. cool. [email protected]" :: ByteString)
(Just "[email protected]") :: Maybe ByteString

This was not originally the case for the canonicalizeEmail function. Emails containing whitespace were only accepted after #3 was raised. However, I was not able to find where RFC 5322 lists whitespace as a valid character in the email grammar. Shouldn't the correct implementation reject emails containing whitespace? i.e.

>>> canonicalizeEmail ("alice. cool. [email protected]" :: ByteString)
Nothing :: Maybe ByteString

I wasn't able to find where RFC 5322 allows whitespace outside quotations in the grammar it specifies for email addresses. However, even if "alice. cool. [email protected]" constitutes a valid email address according to RFC 5322, stripping whitespace within emails means that the emails "alice. cool. [email protected]" and "[email protected]" are treated as equivalent by the canonicalizeEmail function:

>>> import Data.Function (on)
>>> ((==) `on` canonicalizeEmail) "alice. cool. [email protected]" "[email protected]"
True

Of course, I think trimming surrounding whitespace is probably reasonable, since it does not change the email address in any meaningful way. I do believe that the string "alice. cool. [email protected]" is meaningfully different from the string "[email protected]", so I don't believe the canonicalizeEmail function should behave as if they aren't.

Ideally I would like canonicalizeEmail to outright reject emails containing whitespace. If that isn't desirable for whatever reason, then it would be really nice for canonicalizeEmail to not strip whitespace inside emails, especially since the change that allowed canonicalizeEmail to accept emails containing spaces was made to meet the needs of one individual, at the cost of making canonicalizeEmail diverge from the RFC that the rest of the library adheres to.

I'm happy to make either of the changes I proposed here if you agree with me, just let me know what I can do.
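
To show what the first proposal might look like, here is a minimal sketch of a pre-check that trims only the surrounding whitespace and rejects anything with whitespace left inside. It works on String, uses only base, and is not the library's code; trimOuter and rejectInnerSpace are hypothetical names. It is deliberately blunt: it would also reject quoted local parts that contain spaces, so a real change would have to be made in the parser instead.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Remove leading and trailing whitespace only.
trimOuter :: String -> String
trimOuter = dropWhileEnd isSpace . dropWhile isSpace

-- | Nothing if the input is blank or still contains whitespace after
-- trimming; otherwise the trimmed input, ready to hand to the validator.
rejectInnerSpace :: String -> Maybe String
rejectInnerSpace raw
  | null trimmed || any isSpace trimmed = Nothing
  | otherwise = Just trimmed
  where
    trimmed = trimOuter raw
```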

template-haskell-2.21 not tolerated

This version of template-haskell must be accepted for email-validate to work on GHC 9.8, since template-haskell is shipped with the compiler. email-validate was removed from Stackage Nightly for this reason.

Because email-validate is a dependency of yesod-form, and yesod depends on yesod-form, this means all packages depending on yesod had to be removed.

quasiquoter?

It'd be nice to have a quasiquoter that can be run at compile time so I don't need to do (fromJust $ emailAddress "[email protected]"). I had a bit of a go, but got a weird error from ByteString's Data.Data instance:

emailQQ = QuasiQuoter { quoteExp = quoteEmailExp
                      }

quoteEmailExp s = do
   case emailAddress (BS8.pack s) of
     Nothing -> error "bugger"
     Just x -> trace "worked" $ dataToExpQ (const Nothing) (x :: EmailAddress)

[1 of 1] Compiling Main             ( test/Spec.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/email-validate-qq-test/email-validate-qq-test-tmp/Main.o )
worked       

/home/mark/projects/email-validate-qq/test/Spec.hs:12:12:
    Exception when trying to run compile-time code:
      Data.ByteString.ByteString.toConstr
    Code: template-haskell-2.10.0.0:Language.Haskell.TH.Quote.quoteExp
            emailQQ "[email protected]"

Is the Data instance for ByteString just not up to it? I might be doing something terribly wrong.
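
One possible workaround, sketched under the assumption that the failure comes from dataToExpQ calling toConstr on the ByteString fields: avoid the Data instance entirely by lifting the two parts as plain Strings and rebuilding the address with unsafeEmailAddress in the generated expression. This assumes the email-validate package is installed; splitAddress is a hypothetical helper, and the quoter has to live in a different module from its use sites because of the Template Haskell stage restriction.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import qualified Data.ByteString.Char8 as BS8
import Language.Haskell.TH (Exp, Q)
import Language.Haskell.TH.Quote (QuasiQuoter (..))
import Text.Email.Validate (domainPart, localPart, unsafeEmailAddress, validate)

-- | Validate a String and return its local and domain parts as Strings.
-- Hypothetical helper; Strings have a Lift instance, ByteStrings are avoided.
splitAddress :: String -> Either String (String, String)
splitAddress s = case validate (BS8.pack s) of
  Left err   -> Left err
  Right addr -> Right (BS8.unpack (localPart addr), BS8.unpack (domainPart addr))

-- | Build the expression without going through Data/dataToExpQ.
quoteEmailExp :: String -> Q Exp
quoteEmailExp s = case splitAddress s of
  Left err       -> fail ("invalid email address: " ++ err)
  Right (lp, dp) -> [| unsafeEmailAddress (BS8.pack lp) (BS8.pack dp) |]

-- | Expression-only quasiquoter; other contexts fail at compile time.
emailQQ :: QuasiQuoter
emailQQ = QuasiQuoter
  { quoteExp  = quoteEmailExp
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    unsupported ctx _ = fail ("emailQQ cannot be used in a " ++ ctx ++ " context")
```

An invalid address then becomes a compile error at the quotation site instead of a runtime fromJust failure.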
