I've been looking at the characters that mailmime_quoted_printable_write_driver() writes as hex escapes, and this routine seems overly aggressive (or pessimistic) about what gets rewritten:
```c
int mailmime_quoted_printable_write_driver(int (* do_write)(void *, const char *, size_t),
    void * data, int * col, int istext,
    const char * text, size_t size)
{
  ...
    case '!':
    case '"':
    case '#':
    case '$':
    case '@':
    case '[':
    case '\\':
    case ']':
    case '^':
    case '`':
    case '{':
    case '|':
    case '}':
    case '~':
    case '=':
    case '?':
    case '_':
    case 'F': /* there is no more 'From' at the beginning of a line */
      r = write_remaining(do_write, data, col, &start, &len);
      if (r != MAILIMF_NO_ERROR)
        return r;
      start = text + i + 1;
      snprintf(hexstr, 6, "=%02X", ch);
      r = mailimf_string_write_driver(do_write, data, col, hexstr, 3);
      if (r != MAILIMF_NO_ERROR)
        return r;
      i ++;
      break;
```
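To make the effect of that list concrete, here is a minimal standalone sketch that applies only the case labels quoted above. The helper names excerpt_forces_hex and hex_escape_like_excerpt are hypothetical, and the whitespace, CR/LF, and 8-bit handling in the elided part of the real function is deliberately left out.

```c
#include <stdio.h>

/* Hypothetical helper: nonzero for exactly the characters in the case
 * labels of the excerpt above. Whitespace, CR/LF and 8-bit octets are
 * handled elsewhere in the real function and are ignored here. */
static int excerpt_forces_hex(unsigned char ch)
{
    switch (ch) {
    case '!': case '"': case '#': case '$': case '@':
    case '[': case '\\': case ']': case '^': case '`':
    case '{': case '|': case '}': case '~': case '=':
    case '?': case '_': case 'F':
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: copies `in` to `out`, replacing each listed
 * character with "=XX". Returns 0 on success, -1 if `out` (of size
 * `outsz`) is too small. */
static int hex_escape_like_excerpt(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char ch = (unsigned char)*in;
        if (excerpt_forces_hex(ch)) {
            if (o + 3 >= outsz)
                return -1;
            snprintf(out + o, 4, "=%02X", ch); /* 3 chars plus NUL */
            o += 3;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = (char)ch;
        }
    }
    out[o] = '\0';
    return 0;
}
```

With this list, "Hello, world!" comes out as "Hello, world=21" and "From: a@b" as "=46rom: a=40b". As far as the excerpt shows, the 'F' case does not check where in the line the character appears.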
I'm not sure why anything OTHER than '=' needs to be escaped. All of these are 7-bit-safe printable characters in the NVT ASCII set.
I'd be inclined to remove everything except '=', since it is the only one of these that actually needs escaping. Per RFC 2045, section 6.7 "Quoted-Printable Content-Transfer-Encoding":
(2) (Literal representation) Octets with decimal values of
33 through 60 inclusive, and 62 through 126, inclusive,
MAY be represented as the US-ASCII characters which
correspond to those octets (EXCLAMATION POINT through
LESS THAN, and GREATER THAN through TILDE,
respectively).
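Rule (2) above, together with rule (3) of the same section (TAB and SPACE may appear literally, but not at the end of an encoded line), boils down to a very small predicate. The sketch below hex-encodes only what RFC 2045 requires; qp_rule2_literal and qp_must_hex are hypothetical helper names, and CRLF line breaks are assumed to be handled by the caller.

```c
#include <stdbool.h>

/* RFC 2045 section 6.7, rule (2): octets 33..60 and 62..126 MAY be sent
 * literally. 61 ('=') is excluded because it introduces an escape. */
static bool qp_rule2_literal(unsigned char ch)
{
    return (ch >= 33 && ch <= 60) || (ch >= 62 && ch <= 126);
}

/* Hypothetical minimal policy: SPACE and TAB are literal unless they
 * would end an encoded line (rule 3); everything else is escaped only
 * when rule (2) does not allow it literally. */
static bool qp_must_hex(unsigned char ch, bool last_on_line)
{
    if (ch == ' ' || ch == '\t')
        return last_on_line;
    return !qp_rule2_literal(ch);
}
```

Under this policy '!', '@', '~' and 'F' all go out literally, '=' and any octet outside 33..126 is always escaped, and SPACE/TAB are escaped only when they would end an encoded line.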
That said, RFC 1521 (now obsolete), "Appendix B -- General Guidelines For Sending Email Data", says:
(6) Many mail domains use variations on the ASCII character set,
or use character sets such as EBCDIC which contain most but not
all of the US-ASCII characters. The correct translation of
characters not in the "invariant" set cannot be depended on across
character converting gateways. For example, this situation is a
problem when sending uuencoded information across BITNET, an
EBCDIC system. Similar problems can occur without crossing a
gateway, since many Internet hosts use character sets other than
ASCII internally. The definition of Printable Strings in X.400
adds further restrictions in certain special cases. In
particular, the only characters that are known to be consistent
across all gateways are the 73 characters that correspond to the
upper and lower case letters A-Z and a-z, the 10 digits 0-9, and
the following eleven special characters:
"'" (ASCII code 39)
"(" (ASCII code 40)
")" (ASCII code 41)
"+" (ASCII code 43)
"," (ASCII code 44)
"-" (ASCII code 45)
"." (ASCII code 46)
"/" (ASCII code 47)
":" (ASCII code 58)
"=" (ASCII code 61)
"?" (ASCII code 63)
A maximally portable mail representation, such as the base64
encoding, will confine itself to relatively short lines of text in
which the only meaningful characters are taken from this set of 73
characters.
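For comparison, the fully conservative reading of item (6) would let only the 73 invariant characters through literally. Here is a sketch of that membership test; rfc1521_invariant is a hypothetical name, and the range checks assume an ASCII host.

```c
#include <stdbool.h>
#include <string.h>

/* The 73 "invariant" characters from RFC 1521 Appendix B, item (6):
 * A-Z, a-z, 0-9 and eleven special characters. The ranges assume the
 * encoder itself runs on an ASCII host. */
static bool rfc1521_invariant(unsigned char ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
        (ch >= '0' && ch <= '9'))
        return true;
    /* Guard NUL: strchr would otherwise match the string terminator. */
    return ch != '\0' && strchr("'()+,-./:=?", ch) != NULL;
}
```

The excerpt's list does not line up with this set either: '?' and 'F' are invariant yet escaped, while '%', '&', '*', ';', '<' and '>' are not invariant and do not appear among its case labels.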
Are we really worried about EBCDIC and X.400 compatibility? The same appendix continues:
(7) Some mail transport agents will corrupt data that includes
certain literal strings. In particular, a period (".") alone on a
line is known to be corrupted by some (incorrect) SMTP
implementations, and a line that starts with the five characters
"From " (the fifth character is a SPACE) are commonly corrupted as
well. A careful composition agent can prevent these corruptions
by encoding the data (e.g., in the quoted-printable encoding,
"=46rom " in place of "From " at the start of a line, and "=2E" in
place of "." alone on a line.
Please note that the above list is NOT a list of recommended
practices for MTAs. RFC 821 MTAs are prohibited from altering the
character of white space or wrapping long lines. These BAD and
illegal practices are known to occur on established networks, and
implementations should be robust in dealing with the bad effects they
can cause.
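Item (7) only asks for escaping in two specific line contexts. A minimal sketch of that narrower approach follows, assuming it runs over each physical output line after soft line breaks have been inserted; qp_guard_line is a hypothetical name, not a libetpan function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical post-processing step for one already-encoded line (no
 * CRLF). Only a leading "From " and a line consisting of a lone "." are
 * escaped; every other 'F' and '.' is left alone. Returns 0 on success,
 * -1 if `out` (of size `outsz`) is too small. */
static int qp_guard_line(const char *line, char *out, size_t outsz)
{
    const char *prefix = "";
    const char *rest = line;

    if (strncmp(line, "From ", 5) == 0) {
        prefix = "=46";     /* 'F' is 0x46 */
        rest = line + 1;
    } else if (strcmp(line, ".") == 0) {
        prefix = "=2E";     /* '.' is 0x2E */
        rest = line + 1;
    }

    int n = snprintf(out, outsz, "%s%s", prefix, rest);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Here "From here" becomes "=46rom here" and a lone "." becomes "=2E", while "Fred." and "From:" pass through unchanged. The cost is a little line-start state in the encoder instead of escaping every 'F'.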
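Given that RFC 2045 allows these characters to be sent literally, and the RFC 1521 guidance exists only to work around MTA behavior that the RFC itself calls BAD and illegal, why are we hex-encoding them anyway?
Some of these concerns were far more relevant in 1993, when RFC 1521 was published. These days the "^From " bugs and the like are anachronisms.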