sgmlized Giao's doc about virtual folders.
1999-06-03 bertrand <Bertrand.Guiheneuf@inria.fr> * devel-docs/query/virtual-folder-in-depth.sgml: sgmlized Giao's doc about virtual folders. svn path=/trunk/; revision=968
committed by Bertrand Guiheneuf
parent 0f8a086d18
commit a8d04a6651
@@ -1,3 +1,8 @@
1999-06-03 bertrand <Bertrand.Guiheneuf@inria.fr>

* devel-docs/query/virtual-folder-in-depth.sgml:
sgmlized Giao's doc about virtual folders.

1999-05-31 bertrand <Bertrand.Guiheneuf@inria.fr>

* tests/test2.c (main):
NEWS
@@ -0,0 +1,8 @@
01/Jun/1999
-----------

New development document from Giao Nguyen:
TITLE: An in-depth look at the virtual folder mechanism
(see devel-docs/query)
devel-docs/query/virtual-folder-in-depth.sgml
@@ -0,0 +1,395 @@
<!doctype article PUBLIC "-//Davenport//DTD DocBook V3.0//EN" []>

<!-- SGMLized by Bertrand <Bertrand.Guiheneuf@inria.fr> -->

<article id="index">
<artheader>
<authorgroup>
<author>
<firstname>Giao</firstname>
<surname>Nguyen</surname>
</author>
</authorgroup>
<title>An in-depth look at the virtual folder mechanism</title>
<abstract>
<para>
This document describes a different way of approaching mail
organization and how all things are possible in this brave new
world. It does not describe physical storage issues or user
interface issues.
</para>
<para>
Historically, mail has been organized into folders, and each folder
usually mapped to a single storage medium. The relationship between
mail organization and storage medium was one-to-one: there was exactly
one mail organization for every storage medium. This scheme had its
limitations; for example, a message could appear in only one folder
unless it was duplicated.
</para>
<para>
Efforts at categorization are only meaningful at the instant one
categorizes. Finding any piece of data, regardless of how well it was
categorized, requires some amount of searching. Therefore, any attempt
to eliminate searching is doomed to fail. It's time to embrace
searching as a way of life.
</para>
<para>
The terms used in this document are defined below. The example rules
are based on the syntax of VM (http://www.wonderworks.com/vm/) by Kyle
Jones, whose ideas form the basis for this document. My only addition
is the existence of summary files, to aid in scaling. I currently use
VM and its virtual-folder rules for my daily mail. To date, my only
complaints are its speed (it has no caches) and that, for the
uninitiated, it's not very user-friendly.
</para>
<para>
Comments, questions, rants, etc. should be directed to Giao Nguyen
(grail@cafebabe.org), who will try to address issues in a timely
manner.
</para>
</abstract>
</artheader>
<sect1 id="definitions">
<title>Definitions</title>
<sect2>
<title>Store</title>
<para>
A location where mail can be found. This may be a file (Berkeley
mbox), a directory (MH), an IMAP server, a POP3 server, an Exchange
server, a Lotus Notes server, or a stack of Post-Its by your monitor
fed through some OCR system.
</para>
</sect2>
<sect2>
<title>Message</title>
<para>
An individual mail message.
</para>
</sect2>
<sect2>
<title>Vfolder</title>
<para>
A group of messages sharing some commonality; it is the result of a
query. A vfolder may be contained in a store, but a store need not
hold only one vfolder. There is always an implicit vfolder rule that
matches all messages, so every store contains the vfolder that results
from the query (any). The name is short for virtual folder, or maybe
view folder. I dunno.
</para>
</sect2>
<sect2>
<title>Default-vfolder</title>
<para>
The vfolder defined by (any) applied to the store. This is not the
inbox; the inbox could easily be defined by a query. A default rule
for the inbox could be (new), but it doesn't have to be. Mine happens
to be (or (unread) (new)).
</para>
</sect2>
<sect2>
<title>Folder</title>
<para>
The classical mail folder approach: one message organization per
store.
</para>
</sect2>
<sect2>
<title>Query</title>
<para>
A search for messages. The result of this is a vfolder. There are two
kinds of queries: named queries and lambda queries. More on this
later.
</para>
</sect2>
<sect2>
<title>Summary file</title>
<para>
An external file that contains pointers to the messages matching a
named query. In addition to the pointers, the summary file should
contain a signature of the store for sanity checks. When the term
"index" is used as a verb, it means to build a summary file for a
given name-value pair.
</para>
</sect2>
</sect1>

<sect1>
<title>Queries</title>
<para>
Named queries are analogous to classical mail folders. Because named
queries may be reused, summary files are kept as caches to reduce the
overall cost of viewing a vfolder. Summary files are superior to
folders in that they allow the same message to appear in multiple
vfolders without being duplicated. Duplicating messages defeats
attempts to tag a message with additional user information such as
annotations, since a tag added to one copy is missing from the others.
Named queries will define folders.
</para>
<para>
Lambda queries are similar to named queries except that they have no
name. These are created on the fly by the user to filter out or
include certain messages.
</para>
<para>
All queries can be layered on top of each other. A lambda query can be
layered on a named query, and a named query can be layered on a lambda
query. The possibilities are endless.
</para>
<para>
Layering is done with boolean operations (and, or, not), and
short-circuit evaluation should be used.
</para>
<para>
Examples:
<programlisting>
(and (author "Giao")
     (unread))
</programlisting>
The (unread) query should only be evaluated on the results of
(author "Giao").
<programlisting>
(or (author "Giao")
    (unread))
</programlisting>
Both of these queries must be evaluated, and any message matching
either one is added to the resulting vfolder.
</para>
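<para>
The layering and short-circuit rules above can be sketched as a small
evaluator. This is a minimal sketch, not the actual vfolder code: it
assumes queries are written as nested Python tuples mirroring the
s-expressions, and the Message record, its fields and the substring
match used for author are hypothetical.
</para>
<programlisting>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical, minimal message record; real stores carry far more."""
    uid: int
    author: str
    unread: bool = False


# Leaf queries, keyed by function name. Matching author by substring is an assumption.
PREDICATES = {
    "any": lambda msg: True,
    "author": lambda msg, name: name in msg.author,
    "unread": lambda msg: msg.unread,
}


def evaluate(query, candidates):
    """Return the messages in `candidates` that match `query`, keeping store order."""
    op, *args = query
    if op == "and":
        result = candidates
        for sub in args:
            # Short circuit: each operand only sees the survivors of the previous ones.
            result = evaluate(sub, result)
            if not result:
                break
        return result
    if op == "or":
        # Every operand is evaluated; a message matching any of them is kept.
        matched = set()
        for sub in args:
            matched.update(m.uid for m in evaluate(sub, candidates))
        return [m for m in candidates if m.uid in matched]
    if op == "not":
        excluded = {m.uid for m in evaluate(args[0], candidates)}
        return [m for m in candidates if m.uid not in excluded]
    predicate = PREDICATES[op]
    return [m for m in candidates if predicate(m, *args)]


store = [
    Message(1, "Giao Nguyen", unread=True),
    Message(2, "Giao Nguyen"),
    Message(3, "Bertrand", unread=True),
]
print([m.uid for m in evaluate(("and", ("author", "Giao"), ("unread",)), store)])  # [1]
print([m.uid for m in evaluate(("or", ("author", "Giao"), ("unread",)), store)])   # [1, 2, 3]
```
</programlisting>
<para>
In this sketch "or" evaluates every operand over the same candidates,
as the text asks; evaluating later operands only on messages not yet
matched would be a possible refinement.
</para>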
</sect1>

<sect1>
<title>Summary files</title>
<para>
Summary files are only meaningful in the context of the
default-vfolder of a store.
</para>
<para>
Summary files should be generated for queries of the form:
<programlisting>
(function "constant value")
</programlisting>
Summary files should never be generated for queries of the form:
<programlisting>
(function (function1))

(and (function "value")
     (another-function "another value"))
</programlisting>
The first of these has an argument that is only computed at run time,
so its result can change between evaluations; the second is a
combination of simpler queries. Given a query of the form:
<programlisting>
(and (function "value")
     (another-function "another value"))
</programlisting>
the system should instead use one summary file for (function "value")
and another summary file for (another-function "another value"). I
will call the form (function "constant value") the "plain form".
</para>
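<para>
The plain-form rule can be expressed as a check on the shape of a
query. This sketch assumes a tuple encoding of queries (a function
name followed by its arguments); is_plain_form and summary_keys are
hypothetical helper names. A query gets summary files only for its
plain-form leaves, and none for arguments computed at run time.
</para>
<programlisting>
```python
BOOLEAN_OPS = ("and", "or", "not")


def is_plain_form(query):
    """True for (function "constant value"): a single string constant argument."""
    op, *args = query
    return op not in BOOLEAN_OPS and len(args) == 1 and isinstance(args[0], str)


def summary_keys(query):
    """Plain-form subqueries of `query` that deserve their own summary file."""
    if is_plain_form(query):
        return [query]
    op, *args = query
    if op in BOOLEAN_OPS:
        keys = []
        for sub in args:
            keys.extend(summary_keys(sub))
        return keys
    # Anything else, e.g. (function (function1)), is computed at run time.
    return []


q = ("and", ("function", "value"), ("another-function", "another value"))
print(summary_keys(q))  # [('function', 'value'), ('another-function', 'another value')]
```
</programlisting>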
<para>
Note that the signature of the store should be based on the
assumption that new data may have been added to the store since the
application generated the summary file. A signature computed over the
entire store will most likely be meaningless for things like POP/IMAP
servers, since every newly arrived message would change it.
</para>
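<para>
One way to build a signature that tolerates newly arrived mail is to
hash only the part of the store that the summary file covers. This
sketch assumes, hypothetically, that the store can list a stable
identifier for each message in arrival order (for example IMAP UIDs or
POP3 UIDL values) and that the summary file records how many messages
it covers; SHA-256 is an arbitrary choice.
</para>
<programlisting>
```python
import hashlib


def prefix_signature(message_ids):
    """Hash the identifiers of the messages covered by a summary file, in store order."""
    digest = hashlib.sha256()
    for mid in message_ids:
        digest.update(mid.encode("utf-8"))
        digest.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return digest.hexdigest()


def messages_to_index(saved_count, saved_signature, current_ids):
    """Return the ids still to be indexed, or None if the summary file is stale."""
    if saved_count > len(current_ids):
        return None  # messages disappeared: rebuild the summary from scratch
    if prefix_signature(current_ids[:saved_count]) != saved_signature:
        return None  # the already-indexed part of the store changed
    return current_ids[saved_count:]  # only newly arrived messages


indexed = ["uid-1", "uid-2", "uid-3"]
sig = prefix_signature(indexed)
print(messages_to_index(len(indexed), sig, indexed + ["uid-4"]))  # ['uid-4']
```
</programlisting>
<para>
A returned list names the only messages that need incremental
indexing; None means the summary file must be rebuilt.
</para>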
</sect1>

<sect1>
<title>Incremental indexing</title>
<para>
When new messages are detected, all known queries should be evaluated
on the new messages only. Each vfolder should be notified of the new
messages that are positive matches for its query, and the index
entries generated by this process should be merged into the vfolder's
current index.
</para>
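<para>
A sketch of the incremental step, assuming each vfolder keeps its index
as a sorted list of message uids and that its query is available as a
Python predicate; VFolder, notify and index_new_messages are
hypothetical names. The standard heapq.merge function merges the two
sorted lists.
</para>
<programlisting>
```python
import heapq


class VFolder:
    """Hypothetical vfolder: a query predicate plus its index of matching uids."""

    def __init__(self, name, predicate, index=()):
        self.name = name
        self.predicate = predicate
        self.index = sorted(index)  # stands in for the summary file contents
        self.notified = []          # new matches reported to this vfolder

    def notify(self, new_uids):
        self.notified.extend(new_uids)
        # Merge the sorted new entries into the sorted existing index.
        self.index = list(heapq.merge(self.index, new_uids))


def index_new_messages(vfolders, new_messages):
    """Evaluate every known query on the new messages only; merge the hits."""
    for vfolder in vfolders:
        hits = sorted(uid for uid, msg in new_messages.items() if vfolder.predicate(msg))
        if hits:
            vfolder.notify(hits)


giao = VFolder("giao", lambda m: "Giao" in m["from"], index=[1, 4])
unread = VFolder("unread", lambda m: not m["seen"])
index_new_messages([giao, unread], {
    5: {"from": "Bob", "seen": False},
    6: {"from": "Giao", "seen": True},
    7: {"from": "Giao", "seen": False},
})
print(giao.index, unread.index)  # [1, 4, 6, 7] [5, 7]
```
</programlisting>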
</sect1>

<sect1>
<title>Can I have multiple stores?</title>
<para>
I don't see why not. Again, the inbox is a vfolder, so you can have a
unified inbox consisting of all new mail sent to all your stores, an
inbox for each store, or any combination your heart desires. You get
your cake, eat it, and someone else cleans the dishes!
</para>
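<para>
Because the inbox is just a query, a unified inbox amounts to
evaluating that query over several stores. In this hypothetical sketch
a store is a list of message dictionaries and the inbox rule is
(or (unread) (new)); the optional selected list is one possible answer
to the scope question, a user-selected list of stores.
</para>
<programlisting>
```python
def inbox(stores, predicate, selected=None):
    """Evaluate one inbox query across stores; all stores unless `selected` is given."""
    names = sorted(stores) if selected is None else selected
    # Tag each hit with its store, since uids are only unique within one store.
    return [(name, msg["uid"]) for name in names for msg in stores[name] if predicate(msg)]


def unread_or_new(msg):
    """(or (unread) (new)) written as a Python predicate."""
    return msg["unread"] or msg["new"]


stores = {
    "imap": [{"uid": 1, "unread": True, "new": False}, {"uid": 2, "unread": False, "new": False}],
    "mbox": [{"uid": 1, "unread": False, "new": True}],
}
print(inbox(stores, unread_or_new))            # unified: [('imap', 1), ('mbox', 1)]
print(inbox(stores, unread_or_new, ["mbox"]))  # per store: [('mbox', 1)]
```
</programlisting>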
</sect1>

<sect1>
<title>Why all this?</title>
<para>
Consider the dynamic nature of the following query:
<programlisting>
(and (author "Giao")
     (sent-after (today-midnight)))
</programlisting>
Here, today-midnight is a function that is evaluated at run time to
calculate the appropriate date, so the same query yields a different
vfolder each day.
</para>
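<para>
A sketch of how today-midnight might be evaluated at run time. The
message dictionaries and helper names are hypothetical, and dates are
naive local datetime values. The point is that the cutoff is
recomputed on every evaluation, so the same query selects different
messages on different days.
</para>
<programlisting>
```python
from datetime import datetime, time


def today_midnight(now=None):
    """Start of the current day, computed when the query runs (naive local time)."""
    if now is None:
        now = datetime.now()
    return datetime.combine(now.date(), time.min)


def giao_today(messages, now=None):
    """(and (author "Giao") (sent-after (today-midnight)))"""
    cutoff = today_midnight(now)  # re-computed on every evaluation
    return [m for m in messages if "Giao" in m["author"] and m["date"] > cutoff]


messages = [
    {"author": "Giao", "date": datetime(1999, 6, 3, 9, 0)},
    {"author": "Giao", "date": datetime(1999, 6, 2, 23, 0)},
    {"author": "Bob", "date": datetime(1999, 6, 3, 10, 0)},
]
print(len(giao_today(messages, now=datetime(1999, 6, 3, 15, 30))))  # 1
print(len(giao_today(messages, now=datetime(1999, 6, 4, 8, 0))))    # 0
```
</programlisting>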
</sect1>

<sect1>
<title>Scenarios of usage and their solutions</title>
<sect2>
<title>Message alterations</title>
<para>
This is a fuzzy area that should be left to the UI to handle. Messages
get altered; for example, a message's read status changes when it is
read. How do we handle this if our query is for unread messages? Upon
viewing, the message's state would change and it would no longer
match the query.
</para>
<para>
One idea is to not re-evaluate the queries unless we're switching
between vfolder views. This assumes that only one vfolder can be
viewed at a time. For multi-vfolder viewing, a message change should
propagate through the vfolder system, and certain effects (such as
the message in our example disappearing as soon as it is read) would
not be intuitive.
</para>
<para>
Special cases would not be a clean solution, but they may be
necessary: changes to certain designated fields could simply be
ignored. Some combination of the above rules can be used. I don't
think there is an easy solution.
</para>
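<para>
One possible combination of the rules above, sketched under explicit
assumptions: a view re-evaluates its query when it is switched to,
changes to fields listed in IGNORED_FIELDS (here only a hypothetical
read flag) do not trigger re-evaluation, and changes to any other
field propagate immediately. VFolderView and its methods are
hypothetical names, not an existing API.
</para>
<programlisting>
```python
# Hypothetical set of fields whose changes do not trigger re-evaluation.
IGNORED_FIELDS = {"read"}


class VFolderView:
    """One possible policy: re-evaluate on view switch, ignore some field changes."""

    def __init__(self, predicate, messages):
        self.predicate = predicate
        self.messages = messages  # uid -> dict of fields, shared with the store
        self.members = self._evaluate()

    def _evaluate(self):
        return sorted(uid for uid, msg in self.messages.items() if self.predicate(msg))

    def on_message_changed(self, uid, field, value):
        self.messages[uid][field] = value
        if field not in IGNORED_FIELDS:
            self.members = self._evaluate()  # propagate the change immediately

    def on_switch_to(self):
        self.members = self._evaluate()  # catch up on ignored changes


store = {1: {"read": False, "label": "work"}, 2: {"read": False, "label": "home"}}
unread_view = VFolderView(lambda m: not m["read"], store)
unread_view.on_message_changed(1, "read", True)
print(unread_view.members)  # [1, 2]: the message stays visible while being read
unread_view.on_switch_to()
print(unread_view.members)  # [2]
```
</programlisting>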
</sect2>
<sect2>
<title>Message inclusion and exclusion</title>
<para>
Messages are also included and excluded with queries. The final query
will have the form:
<programlisting>
(and (author "Giao")
     (criteria value)
     (not (criteria other-value)))
</programlisting>
The userland criteria may be labels of some sort, such as userland
labels or Message-IDs. What are the performance issues involved in
this? With short-circuiting, it's not a major problem, because the
extra criteria are only evaluated on the messages that already match
(author "Giao").
</para>
<para>
The criteria and values are determined by the UI. The vfolder
mechanism isn't concerned with such issues.
</para>
<para>
Messages can be included and excluded at will. The idea is often
called "arbitrary inclusion/exclusion". This can be done by
Message-IDs or other fields. It's been noted, however, that
Message-IDs are not always unique.
</para>
<para>
I propose that each vfolder be allocated an inclusion label and an
exclusion label. These should be randomly generated, so that they do
not collide with other labels, and they should be part of the vfolder
description. Note that the vfolder description has not been drafted
yet.
</para>
<para>
The resulting rule for a given named query is:
<programlisting>
(and (or (user-query)
         (label inclusion-label))
     (not (label exclusion-label)))
</programlisting>
A message therefore appears in the vfolder if it matches the user's
query or carries the inclusion label, unless it carries the exclusion
label.
</para>
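<para>
A sketch of the inclusion and exclusion labels. It reads the proposal
as follows: the exclusion label always removes a message, and the
inclusion label adds a message even if the user query does not match
it (so the inclusion label is combined with the user query using
"or"). That reading, the label format and the helper names are
assumptions; secrets.token_hex supplies the random part.
</para>
<programlisting>
```python
import secrets


def new_vfolder_labels():
    """Randomly generated inclusion and exclusion labels for a vfolder description."""
    return "include-" + secrets.token_hex(8), "exclude-" + secrets.token_hex(8)


def in_vfolder(msg, user_query, include_label, exclude_label):
    """Exclusion wins; otherwise the inclusion label or the user query admits msg."""
    labels = msg["labels"]
    if exclude_label in labels:
        return False
    # `or` short-circuits: the user query is skipped for explicitly included messages.
    return include_label in labels or user_query(msg)


def by_giao(msg):
    """Stand-in for the user's query, (author "Giao")."""
    return msg["author"] == "Giao"


inc, exc = new_vfolder_labels()
print(in_vfolder({"author": "Bob", "labels": {inc}}, by_giao, inc, exc))   # True
print(in_vfolder({"author": "Giao", "labels": {exc}}, by_giao, inc, exc))  # False
```
</programlisting>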
</sect2>
<sect2>
<title>Query scheduling</title>
<para>
Consider the following extremely dynamic queries:
<programlisting>
A:
(and (author "Giao")
     (sent-after (today-midnight)))

B:
(and (sent-after (today-midnight))
     (author "Giao"))

C:
(or (author "Giao")
    (sent-after (today-midnight)))
</programlisting>
Query A would be significantly faster because (author "Giao") is not
dynamic: a summary file could be generated for it, and
(sent-after (today-midnight)) then only needs to be evaluated on its
results. Query B is equivalent but slow, because the dynamic test
comes first; it could be optimized if there were a query compiler of
some sort. Query C demonstrates a query to which no good optimization
can be applied, since every message must still be tested against
(sent-after (today-midnight)). These optimizations come with a certain
amount of baggage.
</para>
<para>
It seems, then, that for boolean 'and' operations, plain forms should
be moved to the front and other queries moved so that they are
evaluated later. I would expect the majority of queries to be of the
plain form.
</para>
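<para>
A minimal sketch of the reordering a query compiler could perform,
using a tuple encoding of queries (an assumption). Since the result of
an "and" does not depend on the order of its operands, moving plain
forms to the front only changes the evaluation order, which turns
query B into query A; "or" operands are left alone.
</para>
<programlisting>
```python
BOOLEAN_OPS = ("and", "or", "not")


def is_plain(query):
    """Plain form: (function "constant value")."""
    op, *args = query
    return op not in BOOLEAN_OPS and len(args) == 1 and isinstance(args[0], str)


def schedule(query):
    """Move plain-form operands of every 'and' to the front, recursively."""
    op, *args = query
    if op not in BOOLEAN_OPS:
        return query
    args = [schedule(sub) for sub in args]
    if op == "and":
        # list.sort is stable: plain forms first, other operands keep their order.
        args.sort(key=lambda sub: not is_plain(sub))
    return (op, *args)


query_b = ("and", ("sent-after", ("today-midnight",)), ("author", "Giao"))
print(schedule(query_b))
# prints ('and', ('author', 'Giao'), ('sent-after', ('today-midnight',)))
```
</programlisting>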
<para>
First, the summary file is tied to the query and to the store the
query originates from. Second, a hash of the query string needs to be
calculated so that the query and its summary file can be associated.
This hashing function could be similar to the one described in
Kernighan and Pike's "The Practice of Programming". (FIXME: Stick page
number here)
</para>
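<para>
A sketch of associating a query with its summary file through a string
hash, in the spirit of the multiplicative hash (h = multiplier * h +
next character) mentioned in the text. The multiplier 31, the 32-bit
reduction, the whitespace normalization and the file naming scheme are
all assumptions. Putting the store identifier in the file name
reflects the first point: a summary file belongs to one query and one
store.
</para>
<programlisting>
```python
MULTIPLIER = 31  # a common multiplier for this style of string hash (assumption)


def query_hash(query_text):
    """Multiplicative string hash, h = MULTIPLIER * h + byte, kept to 32 bits."""
    # Collapse whitespace so layout differences map to the same summary file.
    # (This also collapses spaces inside quoted strings; a real system would
    # normalize the parsed query instead.)
    normalized = " ".join(query_text.split())
    h = 0
    for byte in normalized.encode("utf-8"):
        h = (MULTIPLIER * h + byte) % 2**32
    return h


def summary_file_name(store_id, query_text):
    """Hypothetical naming scheme tying a summary file to its store and query."""
    return f"{store_id}-{query_hash(query_text):08x}.summary"


print(summary_file_name("inbox", '(author "Giao")'))
```
</programlisting>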
</sect2>
<sect2>
<title>Archives</title>
<para>
Many people are concerned that archives won't be preserved, that
archives aren't supported, and about many other archive-related
issues. Here is the short version.
</para>
<para>
Archives are just that: archives. Archives are stores. Take your
vfolder and export it to a store, and you are done. If you load the
store again, its default-vfolder shows the same messages as the
original vfolder; only the query is different.
</para>
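<para>
Exporting a vfolder to a store can be as simple as writing its
messages to a new Berkeley mbox file, for example with Python's
standard mailbox module. This sketch assumes the vfolder's messages
are available as email message objects; export_vfolder,
default_vfolder and demo_roundtrip are hypothetical helpers, and no
file locking is done.
</para>
<programlisting>
```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def export_vfolder(messages, path):
    """Write a vfolder's messages into a new Berkeley mbox store at `path`."""
    archive = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for msg in messages:
            archive.add(msg)
        archive.flush()
    finally:
        archive.close()


def default_vfolder(path):
    """The (any) query over the archive store: every message it holds."""
    archive = mailbox.mbox(path)
    try:
        return list(archive)
    finally:
        archive.close()


def demo_roundtrip():
    """Export two messages, reload the store and return the subjects it holds."""
    messages = []
    for i in range(2):
        msg = EmailMessage()
        msg["From"] = "grail@cafebabe.org"
        msg["Subject"] = f"note {i}"
        msg.set_content("body")
        messages.append(msg)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.mbox")
        export_vfolder(messages, path)
        return [m["Subject"] for m in default_vfolder(path)]


print(demo_roundtrip())  # ['note 0', 'note 1']
```
</programlisting>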
<para>
The point of vfolders is not to do away with the classical folder
representation, but to move queries to the front, which makes data
management easier for people who think in terms of queries rather
than files. Ordinary people don't think in terms of files.
</para>
</sect2>
</sect1>

<sect1>
<title>Miscellany</title>
<sect2>
<title>Annotations</title>
<para>
There should be a scheme for adding annotations to messages. Common
mail user agents have used a tag in the message header to mark
messages as read or unread, for example. Extending this idea, we can
add our own data to a message to give it additional meaning. With a
good scheme for doing this, new possibilities open up.
</para>
<sect3>
<title>Keywords</title>
<para>
When a message is sent, certain keywords could be attached to it.
While this can be done with the subject line, the subject line tends
to be munged by other mail applications; one popular example is the
"[rR]e:" prefix. Using the subject line this way also breaks the
"contract" with other mail user agents. Putting keywords in a separate
field of the message header allows the sender to assist the recipient
in organizing data automatically. Note that the sender can only
provide hints, as the sender is unlikely to know the recipient's
organization scheme.
</para>
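<para>
The standard Internet message format already defines an optional
Keywords header field holding comma-separated phrases (RFC 5322,
section 3.6.5), which would fit this purpose. The sketch below uses
Python's email package; attach_keywords, keywords_of and the
(keyword ...) query are hypothetical, and the case-insensitive
comparison is an assumption.
</para>
<programlisting>
```python
from email import message_from_string
from email.message import EmailMessage


def attach_keywords(msg, keywords):
    """Sender side: put hints in a Keywords header instead of the Subject."""
    msg["Keywords"] = ", ".join(keywords)


def keywords_of(msg):
    """Recipient side: collect comma-separated keywords from every Keywords header."""
    found = set()
    for value in msg.get_all("Keywords", []):
        found.update(kw.strip().lower() for kw in str(value).split(",") if kw.strip())
    return found


def keyword_query(keyword):
    """Hypothetical (keyword "...") query built on the Keywords header."""
    return lambda msg: keyword.lower() in keywords_of(msg)


sent = EmailMessage()
sent["Subject"] = "Re: Re: vfolder draft"  # the subject line may get munged
attach_keywords(sent, ["vfolder", "Draft"])
received = message_from_string(sent.as_string())  # parse it back as a recipient would
print(sorted(keywords_of(received)))  # ['draft', 'vfolder']
```
</programlisting>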
</sect3>
</sect2>
<sect2>
<title>Scope</title>
<para>
Let us assume that we have multiple stores. Does a query work on a
given store? Or does it work on all stores? Or is it configurable such
that a query can work on a user-selected list of stores?
</para>
</sect2>
</sect1>

<sect1>
<title>Alternatives to the above</title>
<para>
Jim Meyer (purp@selequa.com) is putting together some notes on where
annotations need to be located. They will be placed here, along with
any contributions I may have to them.
</para>
</sect1>
</article>