What is DKIM made of, from a technical perspective?
In another article, I briefly explained how DKIM works. This one describes what exactly DKIM is made of, and which technologies are used to generate a signature.
To begin with, let's dissect a DKIM header found on a received email:
DKIM-Signature: a=rsa-sha256;
bh=Mi4Ptruf3aiF5LqQkgnB4ysAKkkkxo7wikG3Cc8o8SE=;
c=relaxed/relaxed;
d=bressier.fr;
h=to:cc:from:reply-to:subject:date:mime-version:content-type:list-id:list-unsubscribe:x-csa-complaints:list-unsubscribe-post:message-id:sender:x-sib-id:x-mailin-client:x-mailin-campaign:feedback-id;
q=dns/txt;
s=mail;
t=1618653372;
v=1;
b=xjAWgYt1qLwxzeO4C58+13pa9xUbhy7osvfEYNu9BxDHRAzdq6um9dUjbiGlyQZNVQGGWkxr LOqZAI782Tl0Jm8KhW2XOPXTM0tbyIeBCkaSBAur6A+xATnhqXCbmWYmOLPhYAinKPpgpH6RDsE rlA4CvDQtkEemLYEdpH9MdIE=
As you can see, a single DKIM header contains several different fields. What is the purpose of each?
- "a=" is the algorithm used to generate the signature, divided into 2 parts:
  - the cryptosystem used to generate the signature; currently all DKIM receivers must handle RSA (some ESPs and mailbox providers are starting to try Ed25519 as well; it is not yet widely supported, but it may be the future of DKIM, with stronger and shorter keys)
  - and a cryptographic hash algorithm, mostly sha256 these days (if you are still using sha1, you MUST move to sha256: sha1 is no longer considered secure, and RFC 8301 forbids it for DKIM)
- "bh=" is called the Body Hash: it is the result of hashing the canonicalized body with the hash algorithm given in the "a=" field, encoded in base64
- "c=" gives the canonicalization algorithms used to normalize the headers and the body before signing; there are basically 2 choices for each, "simple" or "relaxed", detailed later (the value is written headers/body, so relaxed/relaxed means relaxed for both)
- "d=" is the domain responsible for the signature
- "h=" is the list of the headers that are included in the signature
- "q=" is the method used to retrieve the public key that validates the signature; the only method currently defined is a DNS TXT lookup in the zone of the "d=" domain
- "s=" is what we call the selector, the very first label of the host name where the public key for the "d=" domain is published: customselector._domainkey.example.com
- "t=" is the timestamp of the signature
- "v=" is the version of the DKIM specification (there is only one version so far)
- "b=" is, finally, the signature itself, generated with the private key (an RSA key in this example) and encoded in base64
Why describe all these fields? Because, as you'll see, each of them is used at some step in creating or verifying the final signature.
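To make the field list more concrete, here is a minimal Python sketch that splits a DKIM-Signature value into its fields and builds the DNS name where the public key should be published. The function names (parse_dkim_tags, public_key_dns_name) are mine, the "h=" and "b=" values are shortened for readability, and stripping all whitespace from every value is a simplification, so take it as an illustration rather than a complete parser:

```python
def parse_dkim_tags(header_value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into a {tag: value} dict."""
    tags = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue  # skip empty pieces, e.g. after a trailing ";"
        # Split on the first "=" only: base64 values may end with "=".
        name, value = part.split("=", 1)
        # Simplification: drop all whitespace (folding) inside the value.
        tags[name.strip()] = "".join(value.split())
    return tags


def public_key_dns_name(tags: dict[str, str]) -> str:
    """Build the DNS name whose TXT record holds the public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Shortened version of the header shown above (h= and b= are truncated).
sample = (
    "a=rsa-sha256; bh=Mi4Ptruf3aiF5LqQkgnB4ysAKkkkxo7wikG3Cc8o8SE=; "
    "c=relaxed/relaxed; d=bressier.fr; h=to:cc:from:subject:date; "
    "q=dns/txt; s=mail; t=1618653372; v=1; b=xjAWgYt1 qLwxzeO4"
)
tags = parse_dkim_tags(sample)
print(tags["a"], tags["c"], public_key_dns_name(tags))
# prints: rsa-sha256 relaxed/relaxed mail._domainkey.bressier.fr
```

For the header above, the receiver would therefore look for the public key in the TXT record of mail._domainkey.bressier.fr.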
Here are the steps followed to verify a DKIM signature on a received email:
- The very first step is to check the format of the DKIM signature: are all the mandatory fields present in the DKIM header?
- Get the canonicalization algorithms from "c=" and canonicalize the body; there are 2 possible behaviors:
  - If the body canonicalization is "simple", the only change is to remove the empty lines at the end of the email, so that the body ends with a single line break
  - If the body canonicalization is "relaxed", more steps are required:
    - first remove all whitespace at the end of each line
    - replace each run of consecutive whitespace within a line with a single space
    - remove the empty lines at the end of the email, so that the body ends with a single line break
- Calculate the hash of the canonicalized body using the hash algorithm defined in "a=", and check whether the freshly calculated hash matches the "bh=" value. If it does not match, the body of the received email is not the same as the body of the email that was sent: it has been altered. In that case, DKIM fails with a "body hash did not verify" error
- If the body hashes match, get the public key from the DNS zone of the signing domain and check that its format is valid.
- Get the canonicalized version of the headers listed in "h=" (almost the same principle as for the body: the idea is to obtain a standard version of the headers, computed the "simple" or the "relaxed" way)
- OpenSSL or another cryptographic library then verifies the signature (the "b=" value) with the public DKIM key retrieved from the DNS zone of the signing domain. With RSA, this recovers the hash that was signed, which is compared with a freshly computed hash of the canonicalized headers plus the DKIM-Signature header itself (with an empty "b=" value); the body is covered through the "bh=" value that this header contains
- If both hashes match, DKIM passes :-)
- Otherwise, DKIM fails, with an error such as "DKIM signature did not verify"
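The canonicalization and body hash steps above can be sketched in a few lines of Python, using only the standard library. This is a simplified illustration and not a full verifier: the function names are mine, it assumes the message uses CRLF line endings and that "a=" asks for sha256, and it stops before the DNS lookup and the signature check itself:

```python
import base64
import hashlib
import re


def canonicalize_body(body: bytes, mode: str = "relaxed") -> bytes:
    """Apply the "simple" or "relaxed" body canonicalization."""
    lines = body.split(b"\r\n")
    if mode == "relaxed":
        # Collapse runs of spaces/tabs, then drop whitespace at line end.
        lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Both modes: drop the empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    if not lines:
        # An empty body becomes a lone CRLF in simple mode, nothing in relaxed.
        return b"\r\n" if mode == "simple" else b""
    return b"\r\n".join(lines) + b"\r\n"


def body_hash(body: bytes, mode: str = "relaxed") -> str:
    """Compute the value to compare with "bh=" (sha256, base64-encoded)."""
    digest = hashlib.sha256(canonicalize_body(body, mode)).digest()
    return base64.b64encode(digest).decode("ascii")


def canonicalize_header_relaxed(name: str, value: str) -> str:
    """Apply the "relaxed" canonicalization to a single header field."""
    unfolded = value.replace("\r\n", "")  # undo header folding
    collapsed = re.sub(r"[ \t]+", " ", unfolded).strip(" ")
    return f"{name.strip().lower()}:{collapsed}\r\n"


sent = b"Hello world\r\n"
received = b"Hello  \t world \r\n\r\n\r\n"  # extra whitespace added in transit
print(body_hash(sent) == body_hash(received))                      # relaxed
print(body_hash(sent, "simple") == body_hash(received, "simple"))  # simple
print(repr(canonicalize_header_relaxed("Subject", "  Hello \r\n\t World  ")))
```

The first comparison succeeds and the second one fails: "relaxed" tolerates the whitespace changes that some relays introduce, while "simple" only forgives extra empty lines at the end of the body.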
If there are multiple DKIM signatures on the email, each of them is evaluated using that same process.
Wow, still here? If you are reading these final words, congrats! DKIM as a concept is quite easy to understand, but technically it is not that obvious ;-)
I hope you now have a better overview of this technology. I have deliberately omitted some parameters to keep things simple, but there are also checks on signature expiration, on the DKIM version, and on the presence of mandatory headers in the signature (basically, only the From header is mandatory).
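For the curious, these extra checks could look like the Python sketch below. It takes the fields of a DKIM-Signature header as a plain dict; the function name is mine, and the "x=" field (the expiration timestamp defined by the RFC) does not appear in the sample header above, so its value here is made up for the example:

```python
import time
from typing import Optional


def extra_checks(tags: dict[str, str], now: Optional[float] = None) -> list[str]:
    """Return the list of problems found (an empty list means all good)."""
    problems = []
    now = time.time() if now is None else now
    # DKIM version: only "1" exists so far.
    if tags.get("v") != "1":
        problems.append("unsupported DKIM version")
    # The From header must be part of the signed headers.
    signed = [name.strip().lower() for name in tags.get("h", "").split(":")]
    if "from" not in signed:
        problems.append("From header is not signed")
    # Optional expiration: "x=" is a Unix timestamp, like "t=".
    if "x" in tags and now > int(tags["x"]):
        problems.append("signature expired")
    return problems


# "x" is a hypothetical value, 10 seconds after the signature timestamp.
tags = {"v": "1", "h": "to:cc:from:subject", "t": "1618653372", "x": "1618653382"}
print(extra_checks(tags, now=1618653375))  # checked 3 seconds after signing
print(extra_checks(tags, now=1618653400))  # checked after the expiration
```

The "now" parameter is only there so the example gives the same result whenever you run it; a real verifier would simply use the current time.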
I'll let you read the raw RFC if you are interested (and brave) enough: https://tools.ietf.org/html/rfc6376
Have fun!