Supreme Law SCAN Programs:
Managing a Large E-Mail Archive
Introduction
The e-mail archive now available at the Supreme Law website
was written by software developed by Paul Andrew Mitchell, using
the Microsoft QBASIC interpreter.
SCAN was developed to break a large archive file of e-mail,
written by Eudora Pro, into one separate DOS file per message.
Special Problems
The e-mail archive written by Eudora Pro contained isolated
line feed characters, which were NOT preceded by carriage return
characters. These isolated line feed characters had to be
replaced with carriage return characters, but only where the
replacement did not leave two carriage returns in
sequence.
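The replacement rule above can be sketched as follows. This is a minimal illustration in Python, not the QBASIC code in RANDOM.BAS. The function name is hypothetical, and the sketch assumes that an isolated line feed is simply dropped when replacing it would place two carriage returns side by side; the text does not say what RANDOM does in that case.

```python
CR = 13  # carriage return
LF = 10  # line feed

def fix_lone_line_feeds(data: bytes) -> bytes:
    """Replace isolated LF bytes with CR, never producing CR CR."""
    out = bytearray()
    n = len(data)
    for i, b in enumerate(data):
        if b != LF:
            out.append(b)
            continue
        if i > 0 and data[i - 1] == CR:
            # Ordinary CR LF pair: keep the LF unchanged.
            out.append(b)
            continue
        # Isolated LF: replace it only if no CR would end up adjacent.
        follows_cr = bool(out) and out[-1] == CR
        precedes_cr = i + 1 < n and data[i + 1] == CR
        if not follows_cr and not precedes_cr:
            out.append(CR)
        # Otherwise the LF is dropped (an assumption of this sketch).
    return bytes(out)
```

For example, fix_lone_line_feeds(b"a\nb") gives b"a\rb", while an ordinary b"a\r\nb" passes through unchanged.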
The RANDOM program was developed to read Eudora Pro mailbox
files, one DOS sector at a time (512 bytes), using the random
access logic available in the Microsoft QBASIC interpreter.
Very simply, RANDOM reads through the archive file one record
(one sector) at a time, until no more sectors are available.
This method bypasses the software end-of-file character (Ctrl-Z,
ASCII 26), which signals an end-of-file condition during
sequential INPUT statements.
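The sector-at-a-time reading described above can be sketched as follows. This is an illustrative Python equivalent, not the QBASIC code in RANDOM.BAS, and the function name is hypothetical. Opening the file in binary mode plays the role of random access here: a Ctrl-Z byte inside the data is read like any other byte.

```python
SECTOR_SIZE = 512  # one DOS sector, as stated in the text

def read_sectors(path, sector_size=SECTOR_SIZE):
    """Yield fixed-size records until the file is exhausted."""
    with open(path, "rb") as f:  # binary mode: no byte is treated as EOF
        while True:
            record = f.read(sector_size)
            if not record:
                break  # no more sectors are available
            # The last record may be shorter than sector_size.
            yield record
```

Joining the records, as in b"".join(read_sectors(path)), reproduces the whole file, including any bytes that follow an embedded Ctrl-Z.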
The RANDOM program also replaces other unwanted non-printing
characters. See the QBASIC code in RANDOM.BAS for details.
SCAN Program Logic
After the RANDOM program has replaced unwanted characters,
the SCAN program reads the RANDOM output file, and writes one DOS
text file per e-mail message. SCAN recognizes the start of each
message by the string "From " in its first line of text.
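The message-splitting step can be sketched as follows. This is a Python illustration, not the code in SCAN.BAS. The function name is hypothetical, and the sketch assumes that "From " appears at the very start of a message's first line; the text does not state the exact test SCAN applies.

```python
def split_messages(lines):
    """Group lines into messages, starting a new one at each 'From ' line."""
    messages = []
    current = None
    for line in lines:
        if line.startswith("From "):
            # A separator line opens a new message.
            current = [line]
            messages.append(current)
        elif current is not None:
            current.append(line)
        # Lines before the first separator are ignored in this sketch.
    return messages
```

Each element of the returned list holds the lines of one message, ready to be written to its own file.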
The SCAN program creates sequential sub-directories, in
order to prevent the poor performance which results from having a
single DOS directory with an unusually large number of files
(e.g. 5,000 or more).
SCAN allocates at most 100 message files per DOS directory,
then creates another directory to store the next 100 messages,
and continues in this fashion until all messages have been
processed. For redundancy, the name of each new directory is
embedded in the name of each file stored in it, as follows:
Message "msg00154.htm" is stored in directory "box001"
Message "msg02154.htm" is stored in directory "box021"
and so on ....
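The naming scheme in these examples can be sketched as follows. This is a Python illustration with a hypothetical function name, not the code in SCAN.BAS. The zero-padded widths (three digits for the directory, five for the message) are inferred from the two examples above.

```python
MESSAGES_PER_DIRECTORY = 100  # limit stated in the text

def message_location(number):
    """Return (directory, filename) for a message number."""
    # Integer division by 100 gives the directory number.
    directory = "box%03d" % (number // MESSAGES_PER_DIRECTORY)
    filename = "msg%05d.htm" % number
    return directory, filename
```

With this rule, message_location(154) returns ("box001", "msg00154.htm") and message_location(2154) returns ("box021", "msg02154.htm"), matching the examples.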
SCAN writes Hyper-Text Markup Language ("HTML") which is
compatible with Netscape and Internet Explorer. A prolog section
of code begins each message file, and an epilog section of code
ends each message file. These sections of code are found in the
DOS files "prolog.htm" and "epilog.htm", respectively.
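Wrapping a message between the prolog and the epilog can be sketched as follows. This is a Python illustration with a hypothetical function name, not the code in SCAN.BAS. The sketch assumes plain concatenation; whether SCAN transforms the message text in any other way is not stated here.

```python
def build_message_page(prolog, body_lines, epilog):
    """Return one HTML page: prolog, then the message lines, then epilog."""
    body = "".join(line + "\n" for line in body_lines)
    return prolog + body + epilog
```

In practice the prolog and epilog strings would be the contents of "prolog.htm" and "epilog.htm", read once and reused for every message file.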
SCAN also eliminates most of the routing data which is found
in many e-mail messages.
The MAKEIN ("make index") program simply converts the INDEX
file, written by the SCAN program, into HTML. Of course,
MAKEIN expects the INDEX file to have the record layout written
by the SCAN program.
Any competent BASIC programmer will be familiar with these
programming details.
Downloading Files
Internet browsers can access the necessary files at the
following URLs:
RANDOM.BAS   http://supremelaw.org/sls/email/random.bas
SCAN.BAS     http://supremelaw.org/sls/email/scan.bas
MAKEIN.BAS   http://supremelaw.org/sls/email/makein.bas
PROLOG.HTM   http://supremelaw.org/sls/email/prolog.htm
EPILOG.HTM   http://supremelaw.org/sls/email/epilog.htm
After your browser has downloaded a file, use its VIEW/SOURCE
option to preview the text, and then use SAVE AS to store it in
a local directory on your computer.
# # #