Beyond Bayesian Filters: Why your MTA needs Structural DNA matching

The "Spam" problem is changing. Traditional filters (SpamAssassin, Rspamd) are excellent at catching known signatures or keyword-heavy junk. But modern phishing is polymorphic: attackers slightly tweak the content of every email to bypass classic filters.

This is where Mailuminati Guardian comes in.

Why I Built This: A Call for Modern, Community-Driven Defense

For a long time, I felt something was missing in the open-source mail ecosystem. We have great tools, but very few are truly platform-agnostic and even fewer focus on collaborative intelligence without compromising privacy.

I started developing Guardian because I believe in the power of sharing. Fighting spam shouldn't be a lonely battle fought by isolated sysadmins. However, most existing solutions are either "black boxes" or rely on heavy, outdated Bayesian logic that struggles with today's fast-moving phishing campaigns.

I wanted a modern solution based on fuzzy hashing—a tool that focuses on the "DNA" of a threat rather than its specific wording, and that allows the community to defend itself collectively. Guardian is my contribution to a more transparent and collaborative anti-spam landscape.

The Tech: Structural Fingerprinting with TLSH

Most filters look at what is written. Guardian looks at how the email is built. It uses TLSH (Trend Micro Locality Sensitive Hash), a fuzzy hash whose digests stay close to each other when the inputs are similar, to generate a structural "DNA" of every incoming message.

Visualizing the Difference (The Homoglyph Attack):

[ EMAIL CONTENT ]             [ CRYPTO HASH ]          [ TLSH (GUARDIAN) ]
--------------------------------------------------------------------------
"Win a prize!"      ------>    5e884898...    ------>    T1A9B0E0F2...
"Win a prizе!"*     ------>    a42d91b2...    ------>    T1A9B0E0F2...
                                   |                        |
                                (AVALANCHE)              (PROXIMITY)
                                   v                        v
                           "Totally different"       "99.9% Match Found"

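*Note: The second "prizе" uses a Cyrillic 'е' (homoglyph attack). The hash values shown are illustrative: real TLSH needs a longer input than this one-liner, and it reports similarity as a distance score (0 means identical) rather than a percentage.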
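
To make the avalanche vs. proximity contrast concrete, here is a minimal Go sketch. It does not use TLSH: `trigramHistogram` is a toy stand-in I made up for illustration, which counts 3-byte windows into 64 buckets. It only demonstrates the property that matters here, namely that a one-character homoglyph swap scrambles a cryptographic digest but barely moves a locality-sensitive fingerprint.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

const buckets = 64

// trigramHistogram is a toy locality-sensitive fingerprint (NOT TLSH):
// every 3-byte window of the input is counted into one of 64 buckets.
func trigramHistogram(s string) [buckets]int {
	var h [buckets]int
	b := []byte(s)
	for i := 0; i+3 <= len(b); i++ {
		idx := (int(b[i])*961 + int(b[i+1])*31 + int(b[i+2])) % buckets
		h[idx]++
	}
	return h
}

// histogramDistance sums the per-bucket count differences (L1 distance).
func histogramDistance(a, b [buckets]int) int {
	d := 0
	for i := range a {
		diff := a[i] - b[i]
		if diff < 0 {
			diff = -diff
		}
		d += diff
	}
	return d
}

// sha256BitDistance counts how many of the 256 digest bits differ.
func sha256BitDistance(a, b string) int {
	x, y := sha256.Sum256([]byte(a)), sha256.Sum256([]byte(b))
	d := 0
	for i := range x {
		d += bits.OnesCount8(x[i] ^ y[i])
	}
	return d
}

func main() {
	original := "Win a prize! Click the link below to claim your reward today."
	// Same text, but the 'e' in "prize" is Cyrillic U+0435.
	spoofed := "Win a priz\u0435! Click the link below to claim your reward today."

	fmt.Println("SHA-256 bits changed (of 256):", sha256BitDistance(original, spoofed))
	fmt.Println("Trigram histogram distance:  ",
		histogramDistance(trigramHistogram(original), trigramHistogram(spoofed)))
}
```

Swapping one ASCII byte for a two-byte Cyrillic character touches at most seven 3-byte windows, so the toy distance can never exceed 7 for this pair, while about half of the SHA-256 bits are expected to flip.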
    

How Guardian Works (The sidecar approach)

Guardian is a lightweight Go service that runs as a sidecar next to your MTA. It doesn't replace your stack; it supercharges it via a REST API.

  • Normalization: The email is stripped of volatile data.
  • Fingerprinting: Guardian computes the TLSH of the body and attachments.
  • Proximity Detection: Using LSH (Locality Sensitive Hashing), Guardian checks its local Redis cache.
  • The Oracle: If a close match is found locally, the verdict is instant. If the result is only a "maybe," Guardian queries the Mailuminati Oracle, the collaborative layer.
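
The steps above can be sketched roughly as follows. This is not Guardian's actual code: the normalization rules, the two thresholds, and the names `normalize` and `decide` are assumptions chosen for illustration, and the fingerprinting and Redis lookup are left out so the example stays self-contained.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical thresholds on a TLSH-style distance (0 = identical).
const (
	matchThreshold = 30  // at or below: close enough for a local verdict
	maybeThreshold = 100 // at or below: ambiguous, ask the Oracle
)

var (
	urlRe    = regexp.MustCompile(`https?://\S+`)
	digitsRe = regexp.MustCompile(`[0-9]+`)
	spaceRe  = regexp.MustCompile(`\s+`)
)

// normalize strips data that tends to vary per recipient (an assumed rule
// set): case, URLs, numbers, and whitespace layout.
func normalize(body string) string {
	s := strings.ToLower(body)
	s = urlRe.ReplaceAllString(s, "URL")
	s = digitsRe.ReplaceAllString(s, "0")
	s = spaceRe.ReplaceAllString(s, " ")
	return strings.TrimSpace(s)
}

// Decision is the next step after the local proximity lookup.
type Decision string

const (
	LocalMatch Decision = "local-match"
	AskOracle  Decision = "ask-oracle"
	NoMatch    Decision = "no-match"
)

// decide maps the distance to the nearest cached fingerprint onto a next
// step. found is false when the local cache returned no candidate at all.
func decide(nearest int, found bool) Decision {
	switch {
	case !found:
		return NoMatch
	case nearest <= matchThreshold:
		return LocalMatch
	case nearest <= maybeThreshold:
		return AskOracle
	default:
		return NoMatch
	}
}

func main() {
	a := normalize("Dear Bob,\nyour code is 1234: https://evil.example/a?id=1")
	b := normalize("Dear Bob,\nyour code is 9876: https://evil.example/b?id=2")
	fmt.Println("same after normalization:", a == b)

	fmt.Println(decide(12, true))  // close match in the local cache
	fmt.Println(decide(70, true))  // ambiguous: escalate to the Oracle
	fmt.Println(decide(0, false))  // nothing similar cached locally
}
```

The point of normalizing first is that two messages from the same campaign should collapse to the same (or nearly the same) input before fingerprinting, so the distance check has less noise to work through.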

Closing the Loop: Immediate Local Learning

One of the coolest features for us sysadmins is the Feedback Loop. By integrating with Dovecot via Sieve, Guardian learns in real time.

The Guardian Feedback Loop Architecture:

 [ USER INTERFACE ]          [ MAIL SERVER ]          [ SECURITY LAYER ]
  (IMAP Client)             (Dovecot + Sieve)         (Guardian Service)
        |                          |                          |
  1. Move to Junk  --------------> |                          |
        |                          |                          |
        |                  2. Trigger Sieve Rule              |
        |                (pipe to report script)              |
        |                          |                          |
        |                          | --- 3. POST /report ---> |
        |                          |    (Message-ID + Type)   |
        |                          |                          |
        |                          |                  4. Local Learning
        |                          |                 (Update TLSH DB)
        |                          |                          |
        | <----------------------- | <--- 5. 200 OK --------- |
  [ INSTANT PROTECTION ]     [ CLUSTER UPDATED ]      [ CAMPAIGN BLOCKED ]
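
Step 3 of the diagram could look something like this from the report script's side. Treat it as a sketch under assumptions: the diagram only says the request carries a Message-ID and a type, so the JSON field names (`message_id`, `type`), the `"spam"` value, and the helper names are hypothetical. The port comes from the integration section of this post.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// report mirrors the "(Message-ID + Type)" payload from the diagram.
// The JSON field names are assumptions, not a documented schema.
type report struct {
	MessageID string `json:"message_id"`
	Type      string `json:"type"`
}

// reportBody serializes the payload.
func reportBody(messageID, reportType string) ([]byte, error) {
	return json.Marshal(report{MessageID: messageID, Type: reportType})
}

// newReportRequest builds the POST /report request without sending it.
func newReportRequest(baseURL, messageID, reportType string) (*http.Request, error) {
	body, err := reportBody(messageID, reportType)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/report", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newReportRequest("http://127.0.0.1:1133", "abc123@example.org", "spam")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// A script piped from Sieve would now send it with
	// http.DefaultClient.Do(req) and treat 200 OK as "learned".
}
```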
    

Integration: It plays well with others

Guardian is "stack-agnostic". You can plug it into:

  • Rspamd: Via a native Lua prefilter module.
  • SpamAssassin: Via a Perl plugin.
  • Custom MTAs: Via a simple HTTP POST to port 1133.
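
For the custom MTA case, the round trip might look like the sketch below. The `/analyze` path, the `message/rfc822` content type, and the `action` field in the reply are placeholders I picked for the example; check the project documentation for the real API. A stub server stands in for Guardian so the code runs on its own. In a real setup the base URL would point at the sidecar, for example http://127.0.0.1:1133.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// verdict is a hypothetical response shape.
type verdict struct {
	Action string `json:"action"`
}

// scanMessage POSTs the raw message and decodes the JSON verdict.
func scanMessage(baseURL string, raw []byte) (verdict, error) {
	var v verdict
	resp, err := http.Post(baseURL+"/analyze", "message/rfc822", bytes.NewReader(raw))
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return v, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&v)
	return v, err
}

// stubGuardian is a stand-in for the real service: it rejects any message
// containing a fixed phrase so the client code has something to talk to.
func stubGuardian() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		action := "allow"
		if bytes.Contains(body, []byte("Win a prize")) {
			action = "reject"
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(verdict{Action: action})
	}))
}

func main() {
	srv := stubGuardian()
	defer srv.Close()

	msg := []byte("Subject: hello\r\n\r\nWin a prize!\r\n")
	v, err := scanMessage(srv.URL, msg)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println("verdict:", v.Action)
}
```

Whatever the real endpoint looks like, the MTA-side logic stays this small: send the message, read the verdict, and fall back to normal delivery if the sidecar is unreachable.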

Why you should try it

If you're tired of seeing the same phishing template bypass your filters just because the attacker changed a few words, it's time to move to structural defense. Guardian is open source (GPLv3), written in Go for performance, and respects privacy: it never shares raw mail content, only hashes.

Check it out here: guardian.mailuminati.com
Source Code: GitHub - Mailuminati/Guardian