
Trying to make use of Outlook’s Thread-Index: header

2007/12/11

tl;dr Finally the format of the Thread-Index: header is documented!

Recently I had to reconstruct a thread of email messages using the Thread-Index: header, which Microsoft’s products use, instead of the standard way of threading with Message-Id:, References: and In-Reply-To:.

The truth is that I was really frustrated, thinking that Microsoft was breaking the standards by using custom headers that do not begin with X-. But as Dan Bernstein points out:

“822 promised that the IETF would never define field names beginning with X-. It did not prohibit use of non-X names by other organizations.”

Which means that Microsoft is allowed to add Thread-Index: (and Thread-Topic:) without breaking any standards. On the other hand, Microsoft does not document anywhere (at least anywhere I looked, and I looked plenty) how Thread-Index: is calculated or how it can be decoded so that applications other than Outlook can make use of it.

After some experimenting and a little bit of reverse engineering, I arrived at the following results:

  • Thread-Topic: preserves the original subject of the thread, that is, the Subject: stripped of any Fw: or Re: prefixes.
  • Thread-Index: is used in a way similar to In-Reply-To: and References: Assuming that the first message in a thread has a:
    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QA==

    and the next in thread:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA

    while a third one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=

    and a fourth one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==

    the pattern that decides the threading seems obvious; I have not yet found out what the single or double equal sign suffix means.
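The pattern can be sketched in a few lines of Python. This is only a minimal sketch under two assumptions: that the values can be treated as base64 (the point debated in the updates and comments below), and that each reply's decoded value is its parent's decoded value plus exactly 5 bytes. The name `find_parents` and the two constants are mine, for illustration only.

```python
import base64

HEADER_LEN = 22  # assumption: decoded length of a thread's first message
CHILD_LEN = 5    # assumption: bytes appended by each reply


def find_parents(thread_indexes):
    """Map each Thread-Index value to its parent's value (None if not found)."""
    decoded = {base64.b64decode(v): v for v in thread_indexes}
    parents = {}
    for raw, value in decoded.items():
        # A reply's bytes are taken to be the parent's bytes plus one 5-byte block.
        parent_raw = raw[:-CHILD_LEN] if len(raw) > HEADER_LEN else None
        parents[value] = decoded.get(parent_raw)
    return parents


EXAMPLES = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]

for child, parent in find_parents(EXAMPLES).items():
    print(child, "<-", parent)
```

Comparing decoded bytes rather than the raw header text sidesteps the trailing equal signs, which would otherwise get in the way of a plain string-prefix comparison.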

If only Microsoft could make such simple information available! Think of all the lost work hours! Only after I had resolved my problem did I find out about these guys, who had arrived at similar conclusions about the usage of Thread-Index:.

Update #1: You may be interested to read the next episode.

Update #2: Yes, I keep rejecting the BASE64 explanation. This is because what the BASE64 value decodes to is either meaningless or without known semantics.

Update #3: From the GNOME documentation: The value is apparently unique but has no meaning we know of. That is why I refuse the BASE64 explanation. It looks like a BASE64 string and it can be decoded into a string of bytes that one can represent as a number. But the questions remain unanswered: How is the first 22-byte value chosen? Why is every “next” value in a thread 5 bytes longer than the previous one? How are these 5 bytes chosen? The decoded value of an undocumented BASE64 string remains undocumented, hence it may not even be a BASE64 string at all (and may only coincidentally look like one).
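The byte counts are easy to check. The short sketch below treats the four example values from this post as base64 and prints the length of each decoded byte string; it says nothing about what the bytes mean. The variable names are illustrative.

```python
import base64

# The four example Thread-Index: values quoted earlier in this post.
THREAD_INDEXES = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]

decoded_lengths = [len(base64.b64decode(v)) for v in THREAD_INDEXES]
print(decoded_lengths)  # [22, 27, 32, 37]
```

The first message decodes to 22 bytes and every reply adds 5 more, which also accounts for the varying number of trailing equal signs: they are ordinary base64 padding.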


The example Thread-Index: headers are taken from the MediaDefender Defenders site


15 Responses to “Trying to make use of Outlook’s Thread-Index: header”

  1. Apostolos Says:

    They seem to me like base64 encoded strings (because of the equal signs at the end).

  2. adamo Says:

    @Apostolos:
    Unfortunately, they are not :(

  3. Tack Says:

    @adamo:

    Apostolos is correct. They are base64 encoded.

  4. Arboleda Says:

    @adamo:

    Apostolos is correct. They are base64 encoded.

  5. adamo Says:

    To the next person that will insist that they are base64 encoded:

    1- Please try and decode the string. Then come up with a meaningful explanation of the result.

    2- Still not convinced? Read this paper [pdf].

    3- Still not convinced? Read my next post on the subject.

  6. Ryan Mauger Says:

    The above people ARE (sort of) correct.

    base64 is NOT for encoding strings. It is for encoding BINARY data in an ASCII string, to make it safe for ASCII-mode data transfer (as is required by SMTP).

    The result still looks meaningless when decoded, because you’re trying to read it as a string, when it is in fact binary data.

    • adamo Says:

      It is OK for you to believe that I did not try to see it as binary data because you do not know me. The above people (including you) are NOT correct, or are leaving guesswork to the reader. One can give any x as input to an f(x) and get some output. This is the case here with base64. However, as long as someone is not telling what the resulting binary data stands for, it still is undocumented garbage.

      So please provide an explanation of the binary data we are all looking at in order to provide a correct (and complete) answer. What is it that you are seeing there? Is it a number? A random byte sequence? Something else? But do not tell me that it is binary data without telling me what it stands for. Everything can be viewed as binary data if it suits the purpose. So the question remains.

  7. Mrten Says:

    It’s an OLE timestamp (22 bytes), appended with timediffs (5 bytes), which sucks, because the timestamp is not guaranteed unique.


  8. LOL – nice one Mrten.

    @adamo- When it looks like Base64, smells like Base64, tastes like Base64, and *everyone* else tells you, time and again, it’s Base64, you look pretty foolish screaming over and over that it’s not Base64, just because you didn’t comprehend what’s inside.

    Heck – it’s a timestamp – you can’t get much easier than that to figure out. Send. Wait 60 seconds. Send again. Oh look – the base-64-decoded number increased by 60. I wonder what it could mean :-)

    • adamo Says:

      You are making the assumption that I was using Outlook when I wrote the post. I am not an Outlook user. I had a mail archive though that I wanted to work on which had several messages with Thread-Index: headers.

      You are trying to make a fuss over my BASE64 complaint. Yes I do understand that it is BASE64 encoded data. So what? You tell me that these 22 bytes are a timestamp. Is it an integer? From what epoch? Does it matter if an epoch exists? Why are five more bytes needed to describe the difference from the previous timestamp?

      You are trying to scold me for my BASE64 complaints yet you provide no useful information to me or the readers, do you?

      Thank you for your time.

  9. Kasper Brandt Says:

    They are just base64 encoded Conversation Index values which are quite well documented. See e.g. http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx
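For readers who want to check that layout against the examples in the post, here is a minimal Python sketch. It assumes the layout as commonly described for Conversation Index values: a 22-byte header whose first 6 bytes are the most significant bytes of a 64-bit FILETIME (100 ns ticks since 1601-01-01 UTC, with the first byte doubling as a reserved 0x01), followed by a 16-byte GUID, and then one 5-byte child block per reply. Treat these offsets as assumptions to verify against the specification; `parse_thread_index` is an illustrative name, not a library function, and the child blocks are returned undecoded.

```python
import base64
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_thread_index(value):
    """Split a Thread-Index value into (timestamp, GUID hex, raw child blocks)."""
    raw = base64.b64decode(value)
    header, rest = raw[:22], raw[22:]
    # Assumption: bytes 0-5 are the top 48 bits of a FILETIME; the low 16 bits are dropped.
    ticks = int.from_bytes(header[:6], "big") << 16
    created = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    guid_hex = header[6:22].hex()
    # Each reply appends one 5-byte block; they are left as raw bytes here.
    child_blocks = [rest[i:i + 5] for i in range(0, len(rest), 5)]
    return created, guid_hex, child_blocks


created, guid_hex, child_blocks = parse_thread_index(
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw=="
)
print(created.isoformat(), guid_hex, [b.hex() for b in child_blocks])
```

For the fourth example header from the post, the header timestamp lands in 2007, which is consistent with the age of the archive the examples were taken from.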

  10. jakubbartkowiak572080007 Says:

    If anyone has mastered Thread-Index parsing then please take a look at this question:

    http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header

    Any idea why some Thread-Indexes generate a valid timestamp when parsed and some generate garbage?

