Email address validation for Haskell
porges / email-validate-hs
License: Other
Ideas for v3:
… isemail, and won't be inlined in the code.
EmailAddress should not provide individual access to localPart/domainPart, but instead be a newtype around Text. This will make loading from and storing to unvalidated stores simpler than is currently possible (e.g. at the moment you can use unsafeEmailAddress, but it takes the two parts separately, not a single email address string).
… ParseOptions to type level, per this comment.
… EmailAddress type…)
DomainPart = HostName | IP.
… Text?
… ip for now, since it provides Attoparsec Text parsers.
Hi,
Could someone give me an example of how to create an EmailAddress from a String, or how to create a ByteString from a String? I was expecting it to be easy to validate a String email, but I'm having a hard time finding out how.
Thanks
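A minimal sketch of the String-to-ByteString step, assuming the text package (which ships with GHC) is available; toUtf8 and toLatin1Truncated are hypothetical helper names, not library functions. Encoding through Text yields UTF-8, whereas Data.ByteString.Char8.pack keeps only the low 8 bits of each Char and so is only safe for ASCII input.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS8
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- | Hypothetical helper: encode a String as UTF-8 bytes, the form a
-- ByteString-based validator expects.
toUtf8 :: String -> ByteString
toUtf8 = encodeUtf8 . T.pack

-- | For comparison only: Char8.pack keeps the low 8 bits of each Char,
-- which is fine for ASCII but silently mangles anything else.
toLatin1Truncated :: String -> ByteString
toLatin1Truncated = BS8.pack
```

The resulting ByteString can then be passed to the functions used elsewhere in this thread, such as isValid, validate or emailAddress from Text.Email.Validate. The fromString function from utf8-string's Data.ByteString.UTF8 performs the same UTF-8 encoding.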
Prelude> :m + Text.Email.Validate Data.ByteString.UTF8
Prelude Text.Email.Validate Data.ByteString.UTF8> isValid $ fromString "local@exam_ple.com"
True
While underscores are legal in DNS, they're disallowed for hostnames. So you can't have either an 'A' or an 'MX' record for a name with an underscore in it, and thus no address with an underscore in it will be deliverable.
This is a gray area with respect to RFC5322, but I think it's best to reject them since in practice the addresses are invalid. A quote from the RFC:
Note: A liberal syntax for the domain portion of addr-spec is
given here. However, the domain portion contains addressing
information specified by and used in other protocols (e.g.,
[RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore
incumbent upon implementations to conform to the syntax of
addresses for the context in which they are used.
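To illustrate the hostname rule described above, here is a sketch of a letters-digits-hyphen label check (RFC 952 as relaxed by RFC 1123). The names isHostLabel, isHostName and splitDots are hypothetical, and this is not the library's own parser; it only shows why a domain such as exam_ple.com fails while a DNS-legal name might not be a legal hostname.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Data.List (isPrefixOf, isSuffixOf)

-- | Split a domain on dots; "a..b" yields an empty middle label.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (label, [])       -> [label]
  (label, _ : rest) -> label : splitDots rest

-- | A hostname label: 1 to 63 letters, digits or hyphens, with no leading
-- or trailing hyphen. Underscore is not in the allowed set.
isHostLabel :: String -> Bool
isHostLabel l =
  not (null l)
    && length l <= 63
    && all isLdh l
    && not ("-" `isPrefixOf` l)
    && not ("-" `isSuffixOf` l)
  where
    isLdh c = isAsciiLower c || isAsciiUpper c || isDigit c || c == '-'

-- | Every dot-separated label must be a valid hostname label.
isHostName :: String -> Bool
isHostName = all isHostLabel . splitDots
```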
The latest GHC version compiles this package just fine, except for the over-specified version constraint on template-haskell that prevents the build from running without --allow-newer etc. Would you mind editing the Cabal file on Hackage to relax that particular constraint?
Test suite failure for package email-validate-2.3.2.13
src/Text/Email/Parser.hs:15:1: error: [8/9960]
Could not load module ‘Data.Attoparsec.ByteString.Char8’
It is a member of the hidden package ‘attoparsec-0.13.2.4’.
You can run ‘:set -package attoparsec’ to expose it.
(Note: this unloads all the modules in the current scope.)
Use -v (or `:set -v` in ghci) to see a list of the files searched for.
src/Text/Email/Parser.hs:16:1: error:
Could not load module ‘Data.ByteString’
It is a member of the hidden package ‘bytestring-0.10.12.0’.
You can run ‘:set -package bytestring’ to expose it.
(Note: this unloads all the modules in the current scope.)
Use -v (or `:set -v` in ghci) to see a list of the files searched for.
src/Text/Email/Parser.hs:17:1: error:
Could not load module ‘Data.ByteString.Char8’
It is a member of the hidden package ‘bytestring-0.10.12.0’.
You can run ‘:set -package bytestring’ to expose it.
(Note: this unloads all the modules in the current scope.)
Use -v (or `:set -v` in ghci) to see a list of the files searched for.
src/Text/Email/QuasiQuotation.hs:23: failure in expression `[email|[email protected]|]'
expected: "[email protected]"
but got:
^
<interactive>:25:1: error:
• Not in scope: ‘email’
• In the quasi-quotation: [email|[email protected]|]
src/Text/Email/Validate.hs:41: failure in expression `canonicalizeEmail "spaces. are. [email protected]"'
expected: Just "[email protected]"
but got:
^
<interactive>:39:1: error:
Variable not in scope: canonicalizeEmail :: t0 -> t
src/Text/Email/Validate.hs:56: failure in expression `validate "[email protected]"'
expected: Right "[email protected]"
but got:
^
<interactive>:47:1: error:
Variable not in scope: validate :: t0 -> t
Examples: 6 Tried: 5 Errors: 0 Failures: 3
This was with ghc-8.10.3
https://github.com/Porges/email-validate-hs/blob/master/src/Text/Email/Parser.hs#L68
Here, a value called raw is obtained based largely on the dottedAtoms parser, and the raw value is then parsed a second time using domainParser. What is the reason for this? Is it to comply with some element of the spec?
Specifically, if one wanted to re-write the whole domainName parser to not use an inner parse, would that be possible in some way while giving correct results?
I've been working on a fork that adds a polymorphic version of every parser in terms of the parsers typeclasses here, and this is the last one to go, but I'm not clear on how to write the polymorphic version because it obviously wouldn't be allowed to call a sub-parser during the process.
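A sketch of what a single-pass domain parser could look like, using base's Text.ParserCombinators.ReadP as a stand-in for Attoparsec or the parsers typeclasses (a swapped-in library, chosen only so the example is self-contained). Labels are recognised directly as letter-digit-hyphen runs separated by dots, so no raw span has to be re-parsed. It assumes the domain rules are exactly the hostname rules and ignores bracketed address literals, so it is not a drop-in replacement for domainParser.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Text.ParserCombinators.ReadP

isAlnumAscii :: Char -> Bool
isAlnumAscii c = isAsciiLower c || isAsciiUpper c || isDigit c

-- | One hostname label: starts with a letter or digit, may contain
-- hyphens, must not end with a hyphen, at most 63 characters.
label :: ReadP String
label = do
  first <- satisfy isAlnumAscii
  rest <- munch (\c -> isAlnumAscii c || c == '-')
  let l = first : rest
  if last l == '-' || length l > 63 then pfail else pure l

-- | Labels separated by single dots, recognised in one pass.
domain :: ReadP [String]
domain = sepBy1 label (char '.')

-- | Accept only a complete parse of the whole input (and at most 255 chars).
parseDomain :: String -> Maybe [String]
parseDomain s
  | length s > 255 = Nothing
  | otherwise = case [ls | (ls, "") <- readP_to_S (domain <* eof) s] of
      [ls] -> Just ls
      _    -> Nothing
```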
Cf. https://twitter.com/bitemyapp/status/1049342418745774080 from a non-Haskell app.
I checked and the library considers the emails invalid. There's nothing for you to fix here, just wanted to let you know the library does the right thing :)
For more info see http://en.wikipedia.org/wiki/Email_address#Internationalization
The package hsemail provides an e-mail parser as well, and IMHO both packages should behave exactly the same in terms of which addresses they accept as valid. My package has a test suite as well as yours, but wouldn't it be nice if we could share that code somehow so that one test suite could verify both libraries? IMHO, this would be beneficial for both packages and increase the likelihood that bugs in either package are discovered.
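One way such sharing might work, sketched with hypothetical names (Case, corpus, disagreements): keep the corpus as plain data and run each library through a String -> Bool adapter, so the same table checks both packages.

```haskell
-- | One shared test case: an address and whether it should be accepted.
data Case = Case { address :: String, shouldAccept :: Bool }

-- | A small illustrative corpus; a real shared suite would hold many more cases.
corpus :: [Case]
corpus =
  [ Case "simple@example.com" True
  , Case "first.last@example.com" True
  , Case "no-at-sign.example.com" False
  , Case "two@@example.com" False
  ]

-- | Run any validator over the corpus and report each disagreement
-- as (address, expected, actual).
disagreements :: (String -> Bool) -> [Case] -> [(String, Bool, Bool)]
disagreements validate cases =
  [ (address c, shouldAccept c, actual)
  | c <- cases
  , let actual = validate (address c)
  , actual /= shouldAccept c
  ]
```

For email-validate the adapter could be isValid . BS8.pack for ASCII test data; an hsemail adapter would wrap its own parser in the same way. In practice the corpus would more likely live in a data file that both packages' test suites read.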
ICANN has prohibited dotless domain names:
https://www.icann.org/news/announcement-2013-08-30-en
While these are technically possible in the spec, they have effectively been made invalid. email-validate will accept these domains in its current form, when it should mark them as invalid.
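A sketch of the dotless-domain check being requested, with hypothetical helper names. It treats everything after the last '@' as the domain (so a quoted local part containing '@' stays intact), ignores a single trailing root dot, and leaves bracketed address literals alone, since those are not domain names.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- | Domain part of an address: everything after the last '@'.
domainOf :: String -> String
domainOf = reverse . takeWhile (/= '@') . reverse

-- | True for dotless domains such as "localhost" or "com".
-- A trailing root dot ("com.") is ignored; "[...]" literals are never flagged.
isDotlessDomain :: String -> Bool
isDotlessDomain d
  | "[" `isPrefixOf` d = False
  | otherwise = '.' `notElem` withoutRootDot
  where
    withoutRootDot = if "." `isSuffixOf` d then init d else d
```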
I think it would be appropriate to use the case-insensitive package to represent the domain ByteString inside EmailAddress. This would also seem to "fix" the Eq instance.
Has this been considered before? I've generally implemented the entire email address in a web app as newtype Email = Email CaseInsensitive.ByteString, but after a little research, it appears the part before the @ can technically be interpreted as case-sensitive.
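A sketch of the split being discussed: the domain compared case-insensitively and the local part compared exactly, in line with RFC 5321 (section 2.4), which says the local part must be treated as case-sensitive while domains follow DNS rules. The Address type is hypothetical, and map toLower is only appropriate for ASCII domains; the case-insensitive package's CI type would play the same role in a real implementation.

```haskell
import Data.Char (toLower)

-- | Hypothetical address type holding the two parts separately.
data Address = Address { localPart :: String, domainPart :: String }
  deriving Show

-- | Local parts compare exactly; domains compare ignoring (ASCII) case.
instance Eq Address where
  Address l1 d1 == Address l2 d2 =
    l1 == l2 && map toLower d1 == map toLower d2
```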
I want to work on making the input type more generic (I'm working with Data.Text and not String, and don't want to have to convert back and forth), but discovered that many of the tests are failing. It appears that very few of the invalid addresses are actually reported as invalid.
123456789012345678901234567890123456789012345678901234567890@12345678901234567890123456789012345678901234567890123456789.12345678901234567890123456789012345678901234567890123456789.12345678901234567890123456789012345678901234567890123456789.1234.example.com: Should be False, got True
Entire address is longer than 256 characters
12345678901234567890123456789012345678901234567890123456789012345@example.com: Should be False, got True
Local part more than 64 characters
""@example.com: Should be False, got True
Local part is effectively empty
x@x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456: Should be False, got True
Domain exceeds 255 chars
first.last@[.12.34.56.78]: Should be False, got True
Only char that can precede IPv4 address is ':'
first.last@[12.34.56.789]: Should be False, got True
Can't be interpreted as IPv4 so IPv6 tag is missing
first.last@[::12.34.56.78]: Should be False, got True
IPv6 tag is missing
first.last@[IPv5:::12.34.56.78]: Should be False, got True
IPv6 tag is wrong
first.last@[IPv6:1111:2222:3333::4444:5555:12.34.56.78]: Should be False, got True
Too many IPv6 groups (4 max)
first.last@[IPv6:1111:2222:3333:4444:5555:12.34.56.78]: Should be False, got True
Not enough IPv6 groups
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:12.34.56.78]: Should be False, got True
Too many IPv6 groups (6 max)
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777]: Should be False, got True
Not enough IPv6 groups
first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]: Should be False, got True
Too many IPv6 groups (8 max)
first.last@[IPv6:1111:2222::3333::4444:5555:6666]: Should be False, got True
Too many '::' (can be none or one)
first.last@[IPv6:1111:2222:3333::4444:5555:6666:7777]: Should be False, got True
Too many IPv6 groups (6 max)
first.last@[IPv6:1111:2222:333x::4444:5555]: Should be False, got True
x is not valid in an IPv6 address
first.last@[IPv6:1111:2222:33333::4444:5555]: Should be False, got True
33333 is not a valid group in an IPv6 address
[email protected]: Should be False, got True
TLD can't be all digits
first.last@com: Should be False, got True
Mail host must be second- or lower level
[email protected]: Should be False, got True
Label can't begin with a hyphen
[email protected]: Should be False, got True
Label can't end with a hyphen
first.last@x234567890123456789012345678901234567890123456789012345678901234.example.com: Should be False, got True
Label can't be longer than 63 octets
[email protected]: Should be False, got True
Top Level Domain won't be all-numeric (see RFC3696 Section 2). I disagree with Dave Child on this one.
test@123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012.com: Should be False, got True
255 characters is maximum length for domain. This is 256.
test@example: Should be False, got True
Dave Child says so
first.""[email protected]: Should be False, got True
Contains a zero-length element
first.last@[IPv6:1111:2222:3333:4444:5555:6666:12.34.567.89]: Should be False, got True
IPv4 part contains an invalid octet
a@b: Should be False, got True
aaa@[123.123.123.333]: Should be False, got True
not a valid IP
a@bar: Should be False, got True
[email protected]: Should be False, got True
[email protected]: Should be False, got True
[email protected]: Should be False, got True
[email protected]: Should be False, got True
IP needs to be in []
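Several of the failures above are size limits and malformed IPv4 literals. A sketch of those two checks, with hypothetical names, measuring lengths in UTF-8 octets. The limits are assumptions drawn from the standards rather than from this library: 64 and 255 octets are RFC 5321's local-part and domain limits, 254 follows from its 256-octet path limit minus the two angle brackets, and 63 is the DNS label limit from RFC 1035.

```haskell
import Data.Char (isDigit)
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- | Size in octets once encoded as UTF-8.
octets :: String -> Int
octets = BS.length . encodeUtf8 . T.pack

splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitDots rest

-- | Length problems for a local part and domain (limits as in the lead-in:
-- RFC 5321 for 64/255/254, RFC 1035 for the 63-octet label limit).
lengthProblems :: String -> String -> [String]
lengthProblems local domain =
  [ "local part exceeds 64 octets" | octets local > 64 ]
    ++ [ "domain exceeds 255 octets" | octets domain > 255 ]
    ++ [ "a domain label exceeds 63 octets" | any ((> 63) . octets) (splitDots domain) ]
    ++ [ "address exceeds 254 octets" | octets local + 1 + octets domain > 254 ]

-- | Dotted-quad IPv4 literal (without brackets): four decimal octets 0-255.
isIPv4 :: String -> Bool
isIPv4 s = case splitDots s of
  parts@[_, _, _, _] -> all isOctet parts
  _                  -> False
  where
    isOctet p = not (null p) && length p <= 3 && all isDigit p && read p <= (255 :: Int)
```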
Example: michael @snoyman.com is an accepted email address. I ran into an issue where our validated email addresses were rejected by Amazon SES.
Current constraint is doctest < 0.15
Could well be the case that this was intentional, if it is, could you explain or add a comment?
instance Show EmailAddress where
  show = show . toByteString
-- | Converts an email address back to a ByteString
toByteString :: EmailAddress -> ByteString
toByteString (EmailAddress l d) = BS.concat [l, BS.singleton '@', d]
Possibly, for show, you intended unpack . toByteString or similar. As it is, the instance renders email addresses wrapped in "..." quotation marks. Let me know either way; I'm happy to submit a PR if it should be amended. Example:
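To make the difference concrete, a sketch using a hypothetical stand-in for the two-field type quoted above (not the library's actual definition): show on a ByteString yields a quoted, escaped literal, while unpack yields the bare text.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS8

-- | Hypothetical stand-in mirroring the two-field type quoted above.
data EmailAddress = EmailAddress ByteString ByteString

toByteString :: EmailAddress -> ByteString
toByteString (EmailAddress l d) = BS8.concat [l, BS8.singleton '@', d]

-- | Current behaviour: show on a ByteString adds quotes (and escapes).
showQuoted :: EmailAddress -> String
showQuoted = show . toByteString

-- | The alternative raised in the issue: the bare address text.
showRaw :: EmailAddress -> String
showRaw = BS8.unpack . toByteString
```

One argument for keeping the current behaviour is that Show conventionally produces Haskell-readable output; toByteString (or a dedicated rendering function) can then serve display purposes.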
What's the status of v3? I'd really like to validate Unicode email addresses.
If you have a list of things that need to be worked on I'd be happy to help.
Thanks for a super useful package!
I ran into an issue with the canonicalizeEmail function, where whitespace interspersed between the atoms of an email was stripped from the given email, for example:
>>> :set -XOverloadedStrings
>>> import Data.ByteString (ByteString)
>>> import Text.Email.Validate (canonicalizeEmail)
>>> canonicalizeEmail ("alice. cool. [email protected]" :: ByteString)
(Just "[email protected]") :: Maybe ByteString
This was not originally the case for the canonicalizeEmail function; emails containing whitespace were only accepted after #3 was raised. I was not able to find where RFC 5322 lists whitespace as a valid character in the email grammar, though. Shouldn't the correct implementation reject emails containing whitespace? i.e.
>>> canonicalizeEmail ("alice. cool. [email protected]" :: ByteString)
Nothing :: Maybe ByteString
However, even if "alice. cool. [email protected]" constitutes a valid email address according to RFC 5322, stripping the whitespace inside it means that "alice. cool. [email protected]" and "[email protected]" are treated as equivalent by the canonicalizeEmail function:
>>> import Data.Function (on)
>>> ((==) `on` canonicalizeEmail) "alice. cool. [email protected]" "[email protected]"
True
Of course, I think trimming surrounding whitespace is probably reasonable, since it does not change the email address in any meaningful way. But the string "alice. cool. [email protected]" is meaningfully different from the string "[email protected]", so I don't believe the canonicalizeEmail function should behave as if they aren't.
Ideally I would like canonicalizeEmail to reject emails containing whitespace outright. If that isn't desirable for whatever reason, then it would be really nice for canonicalizeEmail not to strip whitespace in emails, especially since the original issue that motivated allowing canonicalizeEmail to accept emails containing spaces was raised to meet the needs of an individual, at the cost of making canonicalizeEmail diverge from the RFC that the rest of the library adheres to.
I'm happy to make either of the changes I proposed here if you agree with me, just let me know what I can do.
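A sketch of the stricter behaviour proposed here, as a hypothetical pre-check run before canonicalizeEmail (precheck and hasUnquotedSpace are not library functions). Surrounding whitespace is trimmed, and any remaining whitespace outside a quoted string causes rejection. One possible source of the current behaviour is RFC 5322's obsolete syntax (section 4.4), whose obs-local-part is built from words that may carry surrounding CFWS; this sketch deliberately does not accept that form.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Hypothetical pre-check: trim surrounding whitespace, then reject
-- whitespace anywhere else unless it sits inside a quoted string.
precheck :: String -> Maybe String
precheck raw
  | hasUnquotedSpace trimmed = Nothing
  | otherwise                = Just trimmed
  where
    trimmed = dropWhileEnd isSpace (dropWhile isSpace raw)

-- | Scan left to right, tracking whether we are inside "...".
-- Inside quotes, a backslash escapes the next character (a quoted-pair).
hasUnquotedSpace :: String -> Bool
hasUnquotedSpace = go False
  where
    go _ [] = False
    go True ('\\' : _ : rest) = go True rest
    go inQuotes ('"' : rest) = go (not inQuotes) rest
    go inQuotes (c : rest)
      | isSpace c && not inQuotes = True
      | otherwise                 = go inQuotes rest
```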
This version of template-haskell must be accepted for the library to work on GHC 9.8, since the library is shipped with the compiler. The library was removed from Stackage Nightly for this reason.
Because email-validate is a dependency of yesod-form, and yesod depends on yesod-form, this means all packages depending on yesod had to be removed.
Since you're just wrapping ByteString, you might as well offer the same instances. These also come in handy for serialization.
See also #67
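A sketch of reusing ByteString's instances, assuming a hypothetical single-field newtype (the library's type stores the parts separately) and the binary package, which ships with GHC. DerivingStrategies and GeneralizedNewtypeDeriving let the wrapper inherit Eq, Ord and Binary directly from ByteString.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Binary (Binary, decode, encode)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS8

-- | Hypothetical single-field wrapper; instances come from ByteString.
newtype EmailAddress = EmailAddress ByteString
  deriving stock (Show)
  deriving newtype (Eq, Ord, Binary)

example :: EmailAddress
example = EmailAddress (BS8.pack "alice@example.com")

-- | Serialise and deserialise through the derived Binary instance.
roundTrip :: EmailAddress -> EmailAddress
roundTrip = decode . encode
```

Note that a derived Binary instance decodes any bytes without re-validating them; a hand-written get that runs the validator would preserve the invariant.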
It'd be nice to have a quasiquoter that can be run at compile time, so I don't need to write (fromJust $ emailAddress "[email protected]"). I had a go, but got a weird error from ByteString's Data.Data instance:
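import qualified Data.ByteString.Char8 as BS8
import Debug.Trace (trace)
import Language.Haskell.TH.Quote (QuasiQuoter (..), dataToExpQ)
import Text.Email.Validate (EmailAddress, emailAddress)
emailQQ = QuasiQuoter { quoteExp = quoteEmailExp
                      }
quoteEmailExp s = do
  case emailAddress (BS8.pack s) of
    Nothing -> error "bugger"
    Just x -> trace "worked" $ dataToExpQ (const Nothing) (x :: EmailAddress)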
[1 of 1] Compiling Main ( test/Spec.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/email-validate-qq-test/email-validate-qq-test-tmp/Main.o )
worked
/home/mark/projects/email-validate-qq/test/Spec.hs:12:12:
Exception when trying to run compile-time code:
Data.ByteString.ByteString.toConstr
Code: template-haskell-2.10.0.0:Language.Haskell.TH.Quote.quoteExp
emailQQ "[email protected]"
Is the Data instance for ByteString just not up to it? I might be doing something terribly wrong.
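The log points at ByteString's Data instance, whose toConstr appears to be implemented as an error call (hence the message shown). A sketch of a workaround that avoids Data altogether: validate at compile time, then splice in an expression that rebuilds the value from the string literal. looksValid is a hypothetical stand-in for the real validator, so the sketch compiles on its own.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import qualified Data.ByteString.Char8 as BS8
import Language.Haskell.TH (Exp, Q)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- | Hypothetical stand-in for the real validator: exactly one '@',
-- non-empty parts, no spaces.
looksValid :: String -> Bool
looksValid s = case break (== '@') s of
  (local, '@' : domain) ->
    not (null local) && not (null domain) && '@' `notElem` domain && ' ' `notElem` s
  _ -> False

-- | Compile-time-checked address literal (expressions only).
emailQQ :: QuasiQuoter
emailQQ = QuasiQuoter
  { quoteExp  = quoteEmailExp
  , quotePat  = const (fail "emailQQ: only expressions are supported")
  , quoteType = const (fail "emailQQ: only expressions are supported")
  , quoteDec  = const (fail "emailQQ: only expressions are supported")
  }

-- | Validate while compiling, then splice code that rebuilds the value
-- from the String literal; no Data instance is involved.
quoteEmailExp :: String -> Q Exp
quoteEmailExp s
  | looksValid s = [| BS8.pack s |]
  | otherwise    = fail ("emailQQ: invalid address " ++ show s)
```

With the library in scope, the quote could instead be [| fromJust (emailAddress (BS8.pack s)) |], assuming emailAddress :: ByteString -> Maybe EmailAddress as used in this issue. Because of Template Haskell's stage restriction, emailQQ has to be defined in a different module from the one that uses it. The doctest output earlier on this page mentions a Text.Email.QuasiQuotation module with an email quasiquoter, which suggests later versions ship their own.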
Needs template-haskell-2.17. See here:
https://github.com/commercialhaskell/stackage/blob/master/build-constraints.yaml#L5267
This effectively disables a huge subtree of dependent packages.
Please see http://hydra.cryp.to/build/805289/nixlog/2/raw for a complete build log that shows the error messages.
http://hackage.haskell.org/package/path-pieces-0.2.1 is a widely used package, and this instance is useful when using the EmailAddress type from this package together with yesod forms.
The implementation is pretty straightforward:
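import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import Text.Email.Validate (EmailAddress, emailAddress, toByteString)
import Web.PathPieces (PathPiece (..))
instance PathPiece EmailAddress where
  toPathPiece = toPathPiece . decodeUtf8 . toByteString
  fromPathPiece = emailAddress . encodeUtf8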
Could I trouble you for another test suite bounds bump? :)
hspec-2.5.0 is out of bounds for: