Supreme Law SCAN Programs: Managing a Large E-Mail Archive

Introduction

The e-mail archive now available at the Supreme Law website was written by software that Paul Andrew Mitchell developed using the Microsoft QBASIC interpreter. The SCAN program was developed to break a large archive file of e-mail, written by Eudora Pro, into one separate DOS file per message.

Special Problems

The e-mail archive written by Eudora Pro contained isolated line feed (LF) characters, i.e. LFs that were NOT preceded by a carriage return (CR). Each isolated LF had to be replaced with a CR, but only if the replacement did not leave two CRs in sequence.

The RANDOM program was developed to read Eudora Pro mailbox files one DOS sector (512 bytes) at a time, using the random-access file logic available in Microsoft QBASIC. Put simply, RANDOM reads through the archive file one record at a time until no more sectors are available. This method bypasses the end-of-file (EOF) character, Ctrl-Z (ASCII 26), which signals an end-of-file condition during sequential INPUT statements; a stray EOF character inside the archive therefore cannot cut the read short. RANDOM also replaces other unwanted non-printing characters. See the QBASIC code in RANDOM.BAS for details.

SCAN Program Logic

After RANDOM has replaced the unwanted characters, the SCAN program reads the RANDOM output file and writes one DOS text file per e-mail message. Each e-mail message is identified by the text "From " in its first line.

SCAN creates sequential subdirectories to avoid the poor performance that results when a single DOS directory holds an unusually large number of files (e.g. 5,000 or more). SCAN stores at most 100 message files per DOS directory, then creates another directory for the next 100 messages, and continues in this fashion until all messages have been processed.
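The RANDOM and SCAN steps described above can be sketched in Python as follows. This is not the author's QBASIC code (RANDOM.BAS and SCAN.BAS remain the reference); it is a minimal illustration under several assumptions: an isolated LF that cannot become a CR is simply dropped, every control byte other than CR, LF and TAB counts as "unwanted", messages are numbered from 0 so that each box directory holds exactly 100 files, and each message is HTML-escaped inside a `<pre>` element between the prolog and epilog text. The function names (`read_sectors`, `normalize`, `split_messages`, `message_path`, `scan`) are hypothetical, and the removal of routing data is omitted.

```python
import html
from pathlib import Path

SECTOR = 512    # RANDOM reads one DOS sector (512 bytes) per record
PER_BOX = 100   # SCAN stores at most 100 message files per directory
CR, LF, TAB = 0x0D, 0x0A, 0x09


def read_sectors(path):
    """Read the mailbox 512 bytes at a time in binary mode.

    Binary reads do not stop at a Ctrl-Z (ASCII 26) byte the way
    DOS-style sequential text input does.
    """
    data = bytearray()
    with open(path, "rb") as f:
        while block := f.read(SECTOR):
            data += block
    return bytes(data)


def normalize(data):
    """Replace isolated LFs with CR, unless that would put two CRs in a row.

    Assumptions: an isolated LF that cannot become a CR is dropped, and
    every control byte except CR, LF and TAB is "unwanted" and removed.
    """
    out = bytearray()
    for i, b in enumerate(data):
        if b < 0x20 and b not in (CR, LF, TAB):
            continue  # drop an unwanted non-printing byte (e.g. Ctrl-Z)
        if b == LF and (i == 0 or data[i - 1] != CR):
            nxt = data[i + 1] if i + 1 < len(data) else None
            if (out and out[-1] == CR) or nxt == CR:
                continue  # a CR here would create CR CR, so drop the LF
            out.append(CR)  # isolated LF becomes CR
        else:
            out.append(b)
    return bytes(out)


def split_messages(text):
    """Start a new message at each line beginning with "From "."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    messages, current = [], None
    for line in lines:
        if line.startswith("From "):
            current = []
            messages.append(current)
        if current is not None:  # lines before the first "From " are skipped
            current.append(line)
    return messages


def message_path(n):
    """Message n goes to boxNNN/msgNNNNN.htm, where NNN = n // 100."""
    return Path(f"box{n // PER_BOX:03d}") / f"msg{n:05d}.htm"


def scan(mailbox, outdir, prolog="", epilog=""):
    """Normalize the mailbox, then write one HTML file per message."""
    text = normalize(read_sectors(mailbox)).decode("latin-1")
    messages = split_messages(text)
    for n, lines in enumerate(messages):
        path = Path(outdir) / message_path(n)
        path.parent.mkdir(parents=True, exist_ok=True)  # new boxNNN every 100 messages
        body = html.escape("\n".join(lines))
        # Assumption: the message text is wrapped in <pre> between prolog and epilog.
        path.write_text(f"{prolog}<pre>\n{body}\n</pre>\n{epilog}", encoding="latin-1")
    return len(messages)
```

For example, `scan("inbox.mbx", "archive", Path("prolog.htm").read_text(), Path("epilog.htm").read_text())` would write `archive/box000/msg00000.htm` onward and return the number of messages; the mailbox name `inbox.mbx` is hypothetical. Reading the whole file before normalizing keeps the LF test simple, because a CR LF pair that straddles a 512-byte boundary needs no special handling.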
For redundancy, the name of each directory is embedded in the name of every file stored in it (the directory number is the first three digits of the five-digit message number), as follows:

  Message "msg00154.htm" is stored in directory "box001"
  Message "msg02154.htm" is stored in directory "box021"
  and so on.

SCAN writes HyperText Markup Language (HTML) that is compatible with Netscape and Internet Explorer. A prolog section of code begins each message file, and an epilog section of code ends it. These sections are found in the DOS files "prolog.htm" and "epilog.htm", respectively. SCAN also eliminates most of the routing data found in many e-mail messages.

The MAKEIN ("make index") program simply converts the INDEX file written by SCAN into HTML. MAKEIN therefore expects the INDEX file to have the record layout that SCAN writes. Any competent BASIC programmer will be familiar with these programming details.

Downloading Files

Internet browsers can access the necessary files at the following URLs:

  RANDOM.BAS   http://supremelaw.org/sls/email/random.bas
  SCAN.BAS     http://supremelaw.org/sls/email/scan.bas
  MAKEIN.BAS   http://supremelaw.org/sls/email/makein.bas
  PROLOG.HTM   http://supremelaw.org/sls/email/prolog.htm
  EPILOG.HTM   http://supremelaw.org/sls/email/epilog.htm

After your browser downloads a file, use its VIEW/SOURCE option to preview the text, then use SAVE AS to store it in a local directory on your computer.

# # #