evolution/mail/README.async

*******
This document is absurdly, obscenely out of date. Don't read it.

	-- Peter Williams 7/2/2001
*******

Asynchronous Mailer Information
Peter Williams <peterw@helixcode.com>
8/4/2000

1. INTRODUCTION

It's pretty clear that the Evolution mailer needs to be asynchronous in
some manner. Blocking the UI completely on IMAP calls or large disk reads
is unnacceptable, and it's really nice to be able to thread the message
view in the background, or do other things while a mailbox is downloading.

The problem in making Evolution asynchronous is Camel. Camel is not
asynchronous for a few reasons. All of its interfaces are synchronous --
calls like camel_store_get_folder, camel_folder_get_message, etc. can
take a very long time if they're being performed over a network or with
a large mbox mailbox file. However, these functions have no mechanism
for specifying that the operation is in progress but not complete, and
no mechanism for signaling when to operation does complete.

2. WHY I DIDN'T MAKE CAMEL ASYNCHRONOUS

It seems like it would be a good idea, then, to rewrite Camel to be
asynchonous. This presents several problems:

	* Many interfaces must be rewritten to support "completed"
	  callbacks, etc. Some of these interfaces are connected to
	  synchronous CORBA calls.
	* Everything must be rewritten to be asynchonous. This includes
	  the CamelStore, CamelFolder, CamelMimeMessage, CamelProvider,
	  every subclass thereof, and all the code that touches these.
	  These include the files in camel/, mail/, filter/, and
	  composer/. The change would be a complete redesign for any
	  provider implementation.
	* All the work on providers (IMAP, mh, mbox, nntp) up to this
	  point would be wasted. While they were being rewritten
	  evolution-mail would be useless.

However, it is worth noting that the solution I chose is not optimal,
and I think that it would be worthwhile to write a libcamel2 or some
such thing that was designed from the ground up to work asynchronously.
Starting fresh from such a design would work, but trying to move the
existing code over would be more trouble than it's worth.

3. WHY I MADE CAMEL THREADED

If Camel was not going to be made asynchronous, really the only other
choice was to make it multithreaded. [1] No one has been particularly
excited by this plan, as debugging and writing MT-safe code is hard.
But there wasn't much of a choice, and I believed that a simple thread
wrapper could be written around Camel.

The important thing to know is that while Camel is multithreaded, we
DO NOT and CANNOT share objects between threads. Instead,
evolution-mail sends a request to a dispatching thread, which performs
the action or queues it to be performed. (See section 4 for details)

The goal that I was working towards is that there should be no calls
to camel made, ever, in the main thread. I didn't expect to and
didn't do this, but that was the intent.

[1]. Well, we could fork off another process, but they share so much
data that this would be pretty impractical.

4. IMPLEMENTATION

a. CamelObject

Threading presented a major problem regarding Camel. Camel is based
on the GTK Object system, and uses signals to communicate events. This
is okay in a nonthreaded application, but the GTK Object system is
not thread-safe.

Particularly, signals and object allocations use static data. Using
either one inside Camel would guarantee that we'd be stuck with
random crashes forevermore. That's Bad (TM).

There were two choices: make sure to limit our usage of GTK, or stop
using the GTK Object system. I decided to do the latter, as the
former would lead to a mess of "what GTK calls can we make" and
GDK_THREADS_ENTER and accidentally calling UI functions and upgrades
to GTK breaking everything.

So I wrote a very very simple CamelObject system. It had three goals:

	* Be really straightforward, just encapsulate the type
	  heirarchy without all that GtkArg silliness or anything.
	* Be as compatible as possible with the GTK Object system
	  to make porting easy
	* Be threadsafe

It supports:

	* Type inheritance
	* Events (signals)
	* Type checking
	* Normal refcounting
	* No unref/destroy messiness
	* Threadsafety
	* Class functions

The entire code to the object system is in camel/camel-object.c. It's
a naive implementation and not full of features, but intentionally that
way. The main differences between GTK Objects and Camel Objects are:

	* s,gtk,camel,i of course
	* Finalize is no longer a GtkObjectClass function. You specify
	  a finalize function along with an init function when declaring
	  a type, and it is called automatically and chained automatically.
	* Declaring a type is a slightly different API
	* The signal system is replaced with a not-so-clever event system.
	  Every event is equivalent to a NONE__POINTER signal. The syntax
	  is slightly different: a class "declares" an event and specifies
	  a name and a "prep func", that is called before the event is
	  triggered and can cancel it.
	* There is only one CamelXXXClass in existence for every type.
	  All objects share it.

There is a shell script, tools/make-camel-object.sh that will do all of
the common substitutions to make a file CamelObject-compatible. Usually
all that needs to be done is move the implementation of the finalize
event out of the class init, modify the get_type function, and replace
signals with events.

Pitfalls in the transition that I ran into were:

	* gtk_object_ref -> camel_object_ref or you coredump
	* some files return 'guint' instead of GtkType and must be changed
	* Remove the #include <gtk/gtk.h>
	* gtk_object_set_datas must be changed (This happened once; I
	  added a static hashtable)
	* signals have to be fudged a bit to match the gpointer input
	* the BAST_CASTARD option is on, meaning failed typecasts will
	  return NULL, almost guaranteeing a segfault -- gets those
	  bugs fixed double-quick!

b. API -- mail_operation_spec

I worked by creating a very specific definition of a "mail operation"
and wrote an engine to queue and dispatch them.

A mail operation is defined by a structure mail_operation_spec
prototyped in mail-threads.h. It comes in three logical parts -- a
"setup" phase, executed in the main thread; a "do" phase, executed
in the dispatch thread; and a "cleanup" phase, executed in the main
thread. These three phases are guaranteed to be performed in order
and atomically with respect to other mail operations.

Each of these phases is represented by a function pointer in the
mail_operation_spec structure. The function mail_operation_queue() is
called and passed a pointer to a mail_operation_spec and a user_data-style
pointer that fills in the operation's parameters. The "setup" callback
is called immediately, though that may change.

Each callback is passed three parameters: a pointer to the user_data,
a pointer to the "operation data", and a pointer to a CamelException.
The "operation data" is allocated automatically and freed when the operation
completes. Internal data that needs to be shared between phases should
be stored here. The size allocated is specified in the mail_operation_spec
structure.

Because all of the callbacks use Camel calls at some point, the
CamelException is provided as utility. The dispatcher will catch exceptions
and display error dialogs, unlike the synchronous code which lets
exceptions fall through the cracks fairly easily.

I tried to implement all the operations following this convention. Basically
I used this skeleton code for all the operations, just filling in the
specifics:

===================================

typedef struct operation_name_input_s {
	parameters to operation
} operation_name_input_t;

typedef struct operation_name_data_s {
	internal data to operation, if any
	(if none, omit the structure and set opdata_size to 0)
} operation_name_data_t;

static gchar *describe_operation_name (gpointer in_data, gboolean gerund);
static void setup_operation_name   (gpointer in_data, gpointer op_data, CamelException *ex);
static void do_operation_name      (gpointer in_data, gpointer op_data, CamelException *ex);
static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex);

static gchar *describe_operation_name (gpointer in_data, gboolean gerund)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;

	if (gerund) {
		return a g_strdup'ed string describing what we're doing
	} else {
		return a g_strdup'ed string describing what we're about to do
	}
}

static void setup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	verify that parameters are valid

	initialize op_data

	reference objects
}

static void do_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	perform camel operations
}

static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	perform UI updates

	free allocations

	dereference objects
}

static const mail_operation_spec op_operation_name = {
	describe_operation_name,
	sizeof (operation_name_data_t),
	setup_operation_name,
	do_operation_name,
	cleanup_operation_name
};

void
mail_do_operation_name (parameters)
{
	operation_name_input_t *input;

	input = g_new (operation_name_input_t, 1);

	store parameters in input

	mail_operation_queue (&op_operation_name, input, TRUE);
}

===========================================

c. mail-ops.c

Has been drawn and quartered. It has been split into:

	* mail-callbacks.c: the UI callbacks
	* mail-tools.c: useful sequences wrapping common Camel operations
	* mail-ops.c: implementations of all the mail_operation_specs

An important part of mail-ops.c are the global functions
mail_tool_camel_lock_{up,down}. These simulate a recursize mutex around
camel. There are an extreme few, supposedly safe, calls to Camel made in
the main thread. These functions should go around evey call to Camel or
group thereof. I don't think they're necessary but it's nice to know
they're there.

If you look at mail-tools.c, you'll notice that all the Camel calls are
protected with these functions. Remember that a mail tool is really
just another Camel call, so don't use them in the main thread either.

All the mail operations are implemented in mail-ops.c EXCEPT:

	* filter-driver.c: the filter_mail operation
	* message-list.c: the regenerate_messagelist operation
	* message-thread.c: the thread_messages operation

d. Using the operations

The mail operations as implemented are very specific to evolution-mail. I
was thinking about leaving them mostly generic and then allowing extra
callbacks to be added to perform the more specific UI touches, but this
seemed kind of pointless.

I basically looked through the code, found references to Camel, and split
the code into three parts -- the bit before the Camel calls, the bit after,
and the Camel calls. These were mapped onto the template, given a name,
and added to mail-ops.c. Additionally, I simplified the common tasks that
were taken care of in mail-tools.c, making some functions much simpler.

Ninety-nine percent of the time, whatever operation is being done is being
done in a callback, so all that has to be done is this:

==================

void my_callback (GtkObject *obj, gchar *uid)
{
	camel_do_something (uid);
}

==== becomes ====

void my_callback (GtkObject *obj, gchar *uid)
{
	mail_do_do_something (uid);
}

=================

There are, however, a few belligerents. Particularly, the function
mail_uri_to_folder returns a CamelFolder and yet should really be
asynchronous. This is called in a CORBA call that is sychronous, and
additionally is used in the filter code.

I changed the first usage to return the folder immediately but
still fetch the CamelFolder asyncrhonously, and in the second case,
made filtering asynchronous, so the fact that the call is synchronous
doesn't matter.

The function was renamed to mail_tool_uri_to_folder to emphasize that
it's a synchronous Camel call.

e. The dispatcher

mail_operation_queue () takes its parameters and assembles them in a
closure_t structure, which I abbreviate clur. It sets a timeout to
display a progress window if an operation is still running one second
later (we're not smart enough to check if it's the same operation,
but the issue is not a big deal). The other thread and some communication
pipes are created.

The dispatcher thread sits in a loop reading from a pipe. Every time
the main thread queues an operation, it writes the closure_t into the pipe.
The dispatcher reads the closure, sends a STARTING message to the main
thread (see below for explanation), calls the callback specified in the
closure, and sends a FINISHED message. It then goes back to reading
from its pipe; it will either block until another operation comes along,
or find one right away and start it. This the pipe takes care of queueing
operations.

The dispatch thread communicates with the main thread with another pipe;
however, the main thread has other things to do than read from the pipe,
so it adds registers a GIOReader that checks for messages in the glib
main loop. In addition to starting and finishing messages, the other
thread can communicate to the user using messages and a progress bar.
(This is currently implemented but unused.)

5. ISSUES

	* Operations are queued and dequeued stupidly. Like if you click
	  on one message then click on another, the first will be retrieved
	  and displayed then overwritten by the second. Operations that could
	  be performed at the same time safely aren't.
	* The CamelObject system is workable, but it'd be nice to work with
	  something established like the GtkObject
	* The whole threading idea is not great. Concensus is that an
	  asynchronous interface is the Right Thing, eventually.
	* Care still needs to be taken when designing evolution-mail code to
	  work with the asynchronous mail_do_ functions
	* Some of the operations are extremely hacky.
	* IMAP's timeout to send a NOOP had to be removed because we can't
	  use GTK. We need an alternative for this.