RFC822 string exporter #16

ronaldmannak · 2022-06-14T04:25:44Z

Description

With Apple MailKit extensions it is more common to both parse mime messages and export those back to a RFC822 formatted string. This draft PR adds a RFC822 compliant exporter to MimeParser.

In addition, this PR makes a number of initializers public, to accommodate raw -> Mime -> sign/encrypt -> raw workflow.

Testing

This is a draft PR has no unit tests yet

Known issues

The order of the headers is not preserved.
There are some unresolved edge cases, including mails not getting exported (or rather parsed) correctly by Apple Mail on macOS when mailed from and to the same Gmail account. (The mail will be correctly parsed by Gmail webmail and Apple Mail for iOS, as will mails between two different Gmail accounts).

Feedback

Happy to receive feedback on both the RFC822 formatting (it's a pain) and whether this is direction is in line with the creator's vision.

miximka · 2022-06-14T17:12:33Z

Dear Ronald,

thank you very much for your PR. It would be a great addition not only be able to parse, but also to generate mimes directly in Swift.

I know, it's not easy to produce a valid-formatted mime. Sure, the simplest case is easily done - it's just a bunch of string concatenations. But in a general case, we need to consider what data we are getting from the user and enforce the mime/rfc822 rules accordingly. There are a couple of important requirements, for instance, header fields only allowed to contain ascii characters otherwise they need to be encoded. Or, some characters need to be quoted. Or, header fields must be folded if they exceed length limitation, and so on.

What I am trying to say is, the implementation of such a feature requires deeper study of the RFC specification even than it did for the parser. Because when parsing goes wrong, you usually end up causing troubles only for yourself, but with a wrongly generated mimes, you may end up unknowingly causing troubles somewhere at the receiver end, which is much worse. At the moment, user data validation is not so much present, and it would be great to see at least some of the most important validations/transformations covered in this PR.

Likewize, it would be great if you would add unit tests for mime generation, as it is already the case for the parsing part. This way, you can ensure the correctness of the generated mime and allow for code to be easily changed without fearing to introduce regression bugs in the future.

I hope I didn't sound too grumpy and would really love to see this PR improved and merged very soon!! 👍

Best regards,
Maksim

ronaldmannak · 2022-06-14T19:11:37Z

Hi Maksim,

Thanks for your quick response, I appreciate it.

You don't sound grumpy at all and, in fact, I agree with you :) The reason this is a draft PR is exactly because it will take some effort and iterations to make it fully comply with RFC822/2044/2045, work with all edge cases and work with all mail clients (there are some discrepancies in Mime parsing across the clients).

However, the current implementation already works for the vast majority of cases. I'm already using it in an early beta Mail extension myself.

The RFCs are not that long and fairly readable. There are also a lot of existing libraries in C and JS available.

I also agree adding unit tests should be priority #1. A simple roundtrip of the existing mail templates should give us a good baseline. By roundtrip I mean parsing the existing mail messages into Mime, exporting them back to an RFC822 string and compare them to the original message. If the two strings are exactly the same, we know the export isn't adding or removing characters or tokens that will mess up the parsing of the message for mail lcients.

I am happy to add the tests. However, it would be helpful for the unit tests if we could keep the order of the header fields of the message. Right now, content transfer encoding, content type and content disposition are out of order since they're not stored in the MimeHeader.other array.

What are your thoughts on that? Would it make sense to preserve the header order? If so, what would be the easiest way? If not, how can we compare the exported messages to the original in the unit test? Let me know your thoughts!

miximka · 2022-06-14T19:41:15Z

Yeah, I already saw the order issue coming, but in my use-case it did not matter, so I didn't pay much attention to it earlier. I guess we could just remove the three hard-coded fields and other array, replace them with something like fields array and provide an easy access to the particular fields using something like header.field(type: .contentType), header.field(type: "Content-Type"), or even something more sophisticated like header[.contentType] using subscripts or key paths etc. But really, I don't have much preference, if you have some other idea, you are welcome!!

ronaldmannak · 2022-06-14T20:21:14Z

Just a quick initial draft to fix to preserve the header order.

public struct MimeHeader {
    public let contentTransferEncoding: ContentTransferEncoding?
    public let contentType: ContentType?
    public let contentDisposition: ContentDisposition?

    // Propose to add new EmailHeader struct to store parsed to, cc and bcc headers since there are scenarios where these have to be edited
    public var to: [Email]? 
    public var cc: [Email]?
    public var bcc: [Email]? 

    // There might be a reason to keep other next to all. Not sure.
    @available(*, deprecated, message: "Use all property instead")
    public let other: [RFC822HeaderField]

    public let all: [RFC822HeaderField] // Add all property that preserves order
}

ronaldmannak · 2022-06-14T20:25:29Z

Our messages crossed :) I totally understand order wasn't necessary in the original use case. I also thought about a header protocols but not sure if that would be over engineering but it would sure break existing code. I personally wouldn't mind breaking backwards compatibility, but other projects who use the library might have a different opinion.

ronaldmannak · 2022-06-14T22:17:13Z

This might be closer to what you meant. I just wrapping all fields in an enum. There's room for optimization but it might be cleaner in the end.

enum HeaderType {
    case contentTransferEncoding(ContentTransferEncoding)
    case contentType(ContentType)
    case contentDisposition(ContentDisposition)
    case to([EmailField])
    case cc([EmailField])
    case bcc([EmailField])
    case other(RFC822HeaderField)
}

public struct MimeHeader {
    public var contentTransferEncoding: ContentTransferEncoding? {
        // code to fetch field from fields array
    }
    ...

    @available(*, deprecated, message: "Use fields property instead")
    public let other: [RFC822HeaderField]

    public let fields: [HeaderType]
}

The new unit tests all fail, we’ll work towards fixing them.

ronaldmannak · 2022-06-15T04:24:04Z

@miximka I have implemented a version of the second example that both parses and exports the headers, while providing backwards compatibility to existing code. I've moved all header related code to a new file called MimeHeader.swift, you can can see the changes there and in HeaderParser.swift

I've also added unit tests in RFC822ExportTest.swift that tests a roundtrip for every mail file included. The unit tests fail at the moment. I can already see two issues:

The header parser unfolds header values that are folded (span over multiple lines). This is visible in the Received: header field. I'm not sure if there's any technical reason to unfold the header values. I can't reconstruct the folds. And technically, both folded and unfolded headers are valid.
I'm using four spaces (LWSP-chars in RFC822 parle) to fold headers like Content-Type, in line with Apple Mail's internal Mime parser. The examples messages use an arbitrary number of spaces, from 1 to 4.

I assume both are deliberately to test the parsers with different standards/messages. However, for the roundtrip tests it doesn't work.

I can think of two options:

I create a copy of every resource that removes arbitrary folds in all fields except Content-Type and use four spaces consistently for folds in the Content-Type field.
We find a better way to compare strings. Something like the Kaleidoscope app that compares two documents character-by-character. But I'm not sure how that would work.

What do you think?

Edit: just added a quick method to visualize tokens (\r\n, \n, \t only for now). I can see the example messages and a new file I just copy&pasted from Apple Mail only use LF (\n) instead of CRLF (\r\n). That's not conform RFC822. Could it be that the CRLF is somehow removed by the copy and paste command? Or was \r\n made redundant in one of the later RFCs?

miximka · 2022-06-16T09:44:49Z

@ronaldmannak

I can think of two options:

I create a copy of every resource that removes arbitrary folds in all fields except Content-Type and use four spaces consistently for folds in the Content-Type field.

We find a better way to compare strings. Something like the Kaleidoscope app that compares two documents character-by-character. But I'm not sure how that would work.

Option 1 sounds reasonable to me 👍.

Edit: just added a quick method to visualize tokens (\r\n, \n, \t only for now). I can see the example messages and a new file I just copy&pasted from Apple Mail only use LF (\n) instead of CRLF (\r\n). That's not conform RFC822. Could it be that the CRLF is somehow removed by the copy and paste command? Or was \r\n made redundant in one of the later RFCs?

It must be one of the Apple's we-do-our-own-thing again 🤦‍♂️. Because even if you save an email to *.eml file and look inside it with some hex-editor, you will only find \n line endings inside it. I just cross-checked with Gmail (you can download an *.eml file from Gmail's web-ui directly) and the file only had \r\n line endings for the same message...

ronaldmannak · 2022-06-16T16:10:56Z

@miximka I can confirm that as well. I used a Mail extension to get the raw message directly (just in case the exporter encodes the message or something) last night and found only \n line endings as well.

The core issue is that MIME is tricky standard and some apps (like Apple Mail) don't seem to be 100% compliant. No unit test can help here. I think we should aim to make it 100% work with Apple Mail (and Gmail) first, and then take it from there.

ronaldmannak added 5 commits May 23, 2022 16:36

Add public initializers

ee0d217

Add more public initializers

11c15cb

Merge branch 'feature/publicinitializers' into develop

2980d7e

Merge branch 'miximka:master' into master

a73a772

Add RFC822 string exporter

e5a9b1d

ronaldmannak added 6 commits June 14, 2022 16:52

Move MimeHeaders to separate file

5aa2f53

Wrap headers in HeaderType

7be4ed5

Update RFC822 header exporter

6406e1f

Add initial unit tests

9f852de

The new unit tests all fail, we’ll work towards fixing them.

Merge branch 'feature/headerorder' into develop

a584b24

Remove comment

bfd83a4

ronaldmannak added 4 commits June 14, 2022 21:46

Show hidden tokens in test emails

81914ba

Slightly smarter showToken method

f7c42d0

Code cleanup

5c68b61

Make Mime initializer public

d73e293

ronaldmannak added 6 commits June 16, 2022 09:37

Make boundary conform Apple Mail encoding

c32f454

Add MailKit option

290bb85

Merge branch 'feature/MailHeader' into develop

c6b4a48

Refactor MimeHeader

a6a90ae

Support for encoding strings

6c75c75

update mime type

276571a

ronaldmannak added 16 commits August 21, 2022 19:23

Make more methods public

864f303

Add Ubuntu action

67b971c

Remove verbose flag

1a0be03

Disable RFC822 export test

6b6d07c

Change action name to Ubuntu

b1277de

Add macOS action

e27fc6b

Fix unit tests

ad00d10

Merge branch 'master' of https://github.com/MoonfishApp/MimeParser

6de9e88

Add RFC2047 decoder

c14fc51

Update unit tests

5758db0

Rename macOS action

6b30bde

Use Swift 5.7

4aab6a5

yaml fix

9dfeb2d

Add armor parsing tests

a19bdda

Fix Q-coded Qencoding

b475ef3

Add public quoted printable decoder

55c698d

jasonm23 mentioned this pull request Jun 9, 2023

Add swift test runner #19

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC822 string exporter #16

RFC822 string exporter #16

ronaldmannak commented Jun 14, 2022 •

edited

Loading

miximka commented Jun 14, 2022 •

edited

Loading

ronaldmannak commented Jun 14, 2022

miximka commented Jun 14, 2022

ronaldmannak commented Jun 14, 2022

ronaldmannak commented Jun 14, 2022 •

edited

Loading

ronaldmannak commented Jun 14, 2022

ronaldmannak commented Jun 15, 2022 •

edited

Loading

miximka commented Jun 16, 2022

ronaldmannak commented Jun 16, 2022

RFC822 string exporter #16

Are you sure you want to change the base?

RFC822 string exporter #16

Conversation

ronaldmannak commented Jun 14, 2022 • edited Loading

Description

Testing

Known issues

Feedback

miximka commented Jun 14, 2022 • edited Loading

ronaldmannak commented Jun 14, 2022

miximka commented Jun 14, 2022

ronaldmannak commented Jun 14, 2022

ronaldmannak commented Jun 14, 2022 • edited Loading

ronaldmannak commented Jun 14, 2022

ronaldmannak commented Jun 15, 2022 • edited Loading

miximka commented Jun 16, 2022

ronaldmannak commented Jun 16, 2022

ronaldmannak commented Jun 14, 2022 •

edited

Loading

miximka commented Jun 14, 2022 •

edited

Loading

ronaldmannak commented Jun 14, 2022 •

edited

Loading

ronaldmannak commented Jun 15, 2022 •

edited

Loading