[TAG] IMAP4 vs POP3

Mike Orr sluggoster at gmail.com
Tue Nov 22 12:18:27 MSK 2005


On 11/21/05, Martin J Hooper <martinjh at blueyonder.co.uk> wrote:
> Thanks Mike - Another quick question.  Can you filter messages on IMAP?

A client program can log in periodically and apply a filtering
algorithm.  I don't see how you can have the server automatically do
it.

> I am on quite a few mailing lists and it would be handy to do... ;)

I think you'd have to have a client program analyze the messages and
move them to folders or set flags on them.
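Here is a minimal sketch of that client-side analysis. It assumes the client has already fetched each message's headers, for example with FETCH BODY.PEEK[HEADER], which unlike BODY[HEADER] doesn't set the \Seen flag. The List-Id header that many list managers add is a steadier hook than the subject line. The rule table and the choose_folder name are hypothetical.

```python
import email
from email import policy

# Hypothetical rules: substring of the List-Id header -> destination folder.
LIST_FOLDERS = {
    "tag.lists.example.org": "Lists/TAG",
    "python-list.python.org": "Lists/Python",
}

def choose_folder(raw_headers, rules=LIST_FOLDERS):
    """Return the folder a message should go to, or None to leave it in INBOX."""
    msg = email.message_from_bytes(raw_headers, policy=policy.default)
    list_id = str(msg.get("List-Id", ""))
    for needle, folder in rules.items():
        if needle in list_id:
            return folder
    return None
```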

I'm just learning IMAP now because I have to write a cron job that
copies messages from a certain mailbox every five minutes.
Meanwhile another program downloads and deletes them every thirty
minutes -- a program I'm planning to replace.  I'm saving them to
files by UID (unique ID), and I'd like to skip the ones I already
have.  But IMAP programming is a real pain: the RFC doesn't have many
examples, so you have to guess at syntax and interpret vague error
messages.  For instance, I'd like to search for all messages except
the UIDs I have.  Is that:
    NOT UID 111,222
    NOT UID (111 222)
    NOT UID 111 AND NOT UID 222
    NOT (UID 111 UID 222)
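RFC 3501's grammar has the UID search key take a sequence set, meaning comma-separated UIDs and colon ranges. It has no AND keyword, because criteria written side by side are already ANDed. So the first form, NOT UID 111,222, should be the right one, and NOT UID 111 NOT UID 222 would also work. The third form is invalid as written. The last is valid syntax but matches every message, since no message has both UIDs. Below is a small sketch that builds the criterion from the saved UIDs. The function names are made up for illustration.

```python
def uid_set(uids):
    """Compress UIDs into an IMAP sequence set, e.g. [1, 2, 3, 7] -> '1:3,7'."""
    nums = sorted(set(int(u) for u in uids))
    parts = []
    i = 0
    while i < len(nums):
        start = end = nums[i]
        # Extend the run while the next UID is consecutive.
        while i + 1 < len(nums) and nums[i + 1] == end + 1:
            i += 1
            end = nums[i]
        parts.append(str(start) if start == end else f"{start}:{end}")
        i += 1
    return ",".join(parts)

def exclude_uids_criterion(known_uids):
    """SEARCH criterion matching every message whose UID is not in known_uids."""
    if not known_uids:
        return "ALL"
    return f"NOT UID {uid_set(known_uids)}"
```

With Python 3's imaplib, which passes string arguments through unquoted, this would be used as conn.uid("SEARCH", None, exclude_uids_criterion(saved)).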

My colleague suggested the SINCE criterion and I thought wonderful,
why didn't I think of that?  Just keep a timestamp and do what HTTP's
If-Modified-Since does.  Problem: SINCE takes only a date, not a time,
so unless you run it exactly at midnight it will also return messages
you already fetched earlier that day.
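RFC 3501 confirms this. SINCE compares the message's internal date "disregarding time and timezone", and the date must be written like 21-Nov-2005 with an English month abbreviation. One workaround is to search SINCE the day of the last run and then discard UIDs that were already saved, so the same-day repeats get dropped on the client side. The helper names below are hypothetical.

```python
import datetime

# IMAP dates use English month abbreviations regardless of locale,
# so avoid strftime("%b"), which follows the current locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def since_criterion(last_run):
    """Build a SINCE criterion; the server only looks at the date part."""
    d = last_run.date()
    return f"SINCE {d.day}-{_MONTHS[d.month - 1]}-{d.year}"

def drop_known(found_uids, known_uids):
    """Remove UIDs already saved, since SINCE re-returns same-day messages."""
    known = set(known_uids)
    return [u for u in found_uids if u not in known]
```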

Now say you want to download a message in RFC822 format along with
its UID.  The FETCH specifier is "(UID RFC822)".  The server's
response is something like:
"* 1 FETCH (UID 111 RFC822 {length-of-message} Date:... )"

Python's ultra-primitive imaplib translates this to:

``
("OK", [
    ("1 (UID 111 RFC822 {length-of-message}", "Date:..."),
    ")"
    ])
''
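Here is a Python 3 sketch of pulling the UID and body out of that structure without relying on fixed indexes. Literals come back as (prefix, literal) tuples, so it is enough to scan for tuples and read the UID from the prefix with a regular expression. This assumes the server sends UID ahead of the literal, as it usually does. The sketch also checks the status, because imaplib raises only on BAD and hands a NO back as the first element of the result. conn.uid("FETCH", ...) sends UID FETCH, so the argument is a UID rather than a sequence number. The function names are illustrative.

```python
import imaplib
import re

_UID_RE = re.compile(rb"\bUID (\d+)")

def parse_fetch(data):
    """Pick (uid, body) pairs out of imaplib's list-of-tuples FETCH result."""
    results = []
    for item in data:
        # Literals arrive as (prefix_bytes, literal_bytes) tuples; the closing
        # b")" and a missing message (None) are skipped.
        if isinstance(item, tuple):
            m = _UID_RE.search(item[0])
            if m:
                results.append((int(m.group(1)), item[1]))
    return results

def fetch_rfc822(conn, uid):
    """Fetch one message by UID; returns a list of (uid, raw_bytes) pairs."""
    typ, data = conn.uid("FETCH", str(uid), "(UID RFC822)")
    if typ != "OK":  # a NO response is returned, not raised; BAD raises
        raise imaplib.IMAP4.error(f"FETCH {uid} failed: {data!r}")
    return parse_fetch(data)
```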

I couldn't f**ing believe it.  At least the message body is in a
separate string, even though you can't be totally confident it will
always be result[1][0][1].  And why does it even bother returning the
"OK" status when a BAD status raises an exception anyway?  (To be
fair, a NO status is returned rather than raised, so it does need
checking.)  I assume the UID will always follow the word "UID", but
some values are quoted strings containing spaces, so you'd need a
real tokenizer to extract it (and hope it never misparses a value
containing unexpected characters).  I've started writing an IMAP
library class, adding methods as I need them.
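Such a class might start out like the hypothetical MailboxArchiver below, which covers the cron job described earlier. It selects the mailbox read-only, lists the UIDs, and saves each message not yet on disk as <uid>.eml. It fetches BODY.PEEK[] rather than RFC822 so the \Seen flag is left alone for the other program. It also diffs against the saved UIDs on the client side instead of sending an ever-growing NOT UID list.

```python
import os

class MailboxArchiver:
    """Hypothetical wrapper: copy new messages from one mailbox into files named by UID."""

    def __init__(self, conn, directory, mailbox="INBOX"):
        self.conn = conn            # an already logged-in imaplib.IMAP4 / IMAP4_SSL
        self.directory = directory
        self.mailbox = mailbox

    def saved_uids(self):
        """UIDs already on disk, taken from filenames like '111.eml'."""
        return {int(name[:-4]) for name in os.listdir(self.directory)
                if name.endswith(".eml") and name[:-4].isdigit()}

    @staticmethod
    def _check(typ, data, what):
        # imaplib returns NO instead of raising, so check every status.
        if typ != "OK":
            raise RuntimeError(f"{what} failed: {data!r}")
        return data

    def run(self):
        """Save every message whose UID isn't on disk yet; return the new UIDs."""
        self._check(*self.conn.select(self.mailbox, readonly=True), "SELECT")
        data = self._check(*self.conn.uid("SEARCH", None, "ALL"), "SEARCH")
        found = {int(u) for u in data[0].split()}
        new = sorted(found - self.saved_uids())
        for uid in new:
            data = self._check(*self.conn.uid("FETCH", str(uid), "(BODY.PEEK[])"), "FETCH")
            for item in data:
                if isinstance(item, tuple):  # (prefix, message bytes)
                    path = os.path.join(self.directory, f"{uid}.eml")
                    with open(path, "wb") as f:
                        f.write(item[1])
        return new
```

One caveat: per RFC 3501, UIDs are only stable while the mailbox's UIDVALIDITY value stays the same. A real version should therefore store UIDVALIDITY alongside the files and start over if it changes.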

These messages are announcements.  The ultimate goal is to parse the
body content into a database.
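Here is a minimal sketch of the database end, using the standard library's email and sqlite3 modules. The announcements table and its columns are hypothetical. Splitting the announcement text itself into fields depends on its format, so the sketch only stores the subject, the Date header, and the plain-text body, keyed by UID.

```python
import sqlite3
from email import policy
from email.parser import BytesParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS announcements (
    uid     INTEGER PRIMARY KEY,  -- IMAP UID the message was saved under
    subject TEXT,
    sent    TEXT,                 -- raw Date: header
    body    TEXT
)
"""

def store_announcement(db, uid, raw):
    """Parse one raw message and upsert its subject, date, and plain-text body."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=("plain",))  # None if there's no text/plain part
    body = part.get_content() if part is not None else ""
    db.execute(
        "INSERT OR REPLACE INTO announcements (uid, subject, sent, body) "
        "VALUES (?, ?, ?, ?)",
        (uid, str(msg.get("Subject", "")), str(msg.get("Date", "")), body),
    )

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
```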

I gather your still-unwritten filtering program would work something
like that: download the subjects of all messages, move the list ones
into folders, and flag the remaining ones as "seen by filter program".
Then next time, search only for new messages without that flag.  This
wouldn't guarantee that you'd never see a list message in your inbox
-- it may have come in after the last run -- but it'd cut down on them
dramatically.
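That loop might look roughly like the following Python 3 sketch. $FilterSeen is a made-up keyword, and custom keywords only stick if the server's PERMANENTFLAGS includes \*. choose_folder stands for whatever rule maps raw headers to a folder name, or to None to leave the message in INBOX. Plain IMAP4rev1 has no MOVE command, so a move is COPY, then \Deleted, then EXPUNGE. The sketch only marks the original deleted after a successful COPY.

```python
FILTER_FLAG = "$FilterSeen"  # hypothetical custom keyword

def filter_inbox(conn, choose_folder):
    """One filtering pass over INBOX; returns the UIDs that were moved."""
    typ, data = conn.select("INBOX")
    if typ != "OK":
        raise RuntimeError(f"SELECT failed: {data!r}")
    # Only look at messages this program hasn't handled yet.
    typ, data = conn.uid("SEARCH", None, "UNKEYWORD", FILTER_FLAG)
    if typ != "OK":
        raise RuntimeError(f"SEARCH failed: {data!r}")
    moved = []
    for uid in data[0].split():
        typ, fetched = conn.uid("FETCH", uid, "(BODY.PEEK[HEADER])")
        if typ != "OK":
            continue
        headers = next((item[1] for item in fetched if isinstance(item, tuple)), b"")
        folder = choose_folder(headers)
        if folder is None:
            # Not list mail: leave it in INBOX but remember we've seen it.
            conn.uid("STORE", uid, "+FLAGS", f"({FILTER_FLAG})")
            continue
        typ, _ = conn.uid("COPY", uid, folder)
        if typ == "OK":  # never delete the original unless the copy succeeded
            conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
            moved.append(uid)
    if moved:
        conn.expunge()
    return moved
```

Note that EXPUNGE also removes any other INBOX messages already marked \Deleted by another client.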

--
Mike Orr <sluggoster at gmail.com>
(mso at oz.net address is broken)