NAME
smarc - mailing list web archive generation system
DESCRIPTION
smarc is a system to incrementally generate a web archive for a mailing list and optionally provide search capabilities. The generated archive is a set of HTML files that can be served as-is, while searching requires the use of a FastCGI application. At a higher level, smarc is made of three components:
- smarc(1): to generate the static HTML files from a maildir.
- smingest(1): to import emails into a sqlite3 database.
- msearchd(8): to provide search results.
INITIAL SETUP
Several steps are necessary to initialize the web archive:
- Create and populate the output directory.
- Customize the templates.
- Prepare the maildir.
- Generate the web archive.
- Set up the database for searching.
- Configure the web server.
It is recommended to use a dedicated user. Commands to be run as an unprivileged user are preceded by a dollar sign ‘$’, while commands requiring superuser privileges are preceded by a hash mark ‘#’. Hereafter, it will be assumed that the local user is called ‘smarc’.
1. Create and populate the output directory
The web archive is made of several static files, mostly HTML, that need to be served by a web server like httpd(8). /var/www/smarc is the default location, but a different path can be used. To prepare it, issue:
# mkdir -p /var/www/smarc
# chown smarc /var/www/smarc
Then copy the CSS file, optionally tweaking it.
$ cp /usr/local/share/examples/smarc/style.css /var/www/smarc
Copy any other assets (e.g. logo images) here as well.
2. Customize the templates
The default templates are installed at /etc/smarc. Since these are generic, they need to be tweaked to include information about the mailing list.
Care should be taken when editing these files after generating the archive, since existing pages won't be updated automatically. To apply template changes, the cachedir (see smarc(1)) needs to be deleted and the web archive generated again. msearchd(8) has to be stopped and restarted as well.
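The regeneration after a template change can be sketched as a small shell function. This is only an illustration: the function name regen_archive is hypothetical, and all three paths are passed explicitly because the cachedir location depends on the local setup.

```shell
#!/bin/sh
# Sketch: rebuild the archive from scratch after a template change.
# regen_archive is a hypothetical helper, not part of smarc.
regen_archive() {
	cachedir=$1 maildir=$2 outdir=$3

	# Refuse to run with an empty cachedir so rm -rf cannot go astray.
	if [ -z "$cachedir" ]; then
		echo "regen_archive: missing cachedir" >&2
		return 1
	fi

	# Drop the cache, then let smarc(1) regenerate every page.
	rm -rf -- "$cachedir"
	smarc -c "$cachedir" -m "$maildir" -o "$outdir"
}
```

After it completes, msearchd(8) still has to be restarted with whatever service manager the system uses.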
3. Prepare the maildir
The maildir with the mailing list entries needs to be populated. It is assumed to be at ~/Mail/smarc by default, but a different path can be used.
4. Generate the web archive
smarc(1) can finally be used to generate the web archive. The first run may take a while, depending on the size of the maildir, while subsequent runs will be incremental and take less time.
$ smarc -m path/to/maildir -o path/to/outdir
On multi-processor machines, multiple jobs may be used to save some time with the smarc(1) -j flag.
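One way to choose the job count is to ask the system how many CPUs are online. The helper below is a sketch: ncpu is a hypothetical name, and it assumes that -j takes the number of jobs as its argument (see smarc(1) to confirm).

```shell
#!/bin/sh
# Sketch: pick a job count from the number of online CPUs.
# ncpu is a hypothetical helper, not part of smarc.
ncpu() {
	# getconf works on many systems, sysctl covers OpenBSD, 1 is the fallback.
	n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
	    n=$(sysctl -n hw.ncpuonline 2>/dev/null) ||
	    n=1

	# Anything that is not a positive integer falls back to a single job.
	case $n in
	''|*[!0-9]*) n=1 ;;
	esac
	[ "$n" -ge 1 ] || n=1

	echo "$n"
}

# Usage (assumes -j takes the number of jobs):
#	smarc -j "$(ncpu)" -m path/to/maildir -o path/to/outdir
```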
The generated files may be compressed to save bandwidth:
$ gzip -krq /var/www/smarc </dev/null 2>/dev/null
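As an alternative to recompressing the whole tree, the sketch below only compresses files whose .gz copy is missing or older than the source. compress_outdir is a hypothetical helper, and the -nt test, while widely supported, is not required by POSIX.

```shell
#!/bin/sh
# Sketch: (re)compress only files whose .gz copy is missing or stale.
# compress_outdir is a hypothetical helper, not part of smarc.
compress_outdir() {
	outdir=$1

	# Skip existing archives and our own temporary files.
	find "$outdir" -type f ! -name '*.gz' ! -name '*.gz.tmp' |
	while IFS= read -r f; do
		if [ ! -e "$f.gz" ] || [ "$f" -nt "$f.gz" ]; then
			# Write to a temporary name first so the web server
			# never sees a half-written .gz file.
			gzip -c "$f" >"$f.gz.tmp" && mv "$f.gz.tmp" "$f.gz"
		fi
	done
}
```

It would be called as "compress_outdir /var/www/smarc" after each smarc(1) run.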
5. Set up the database for searching
This is a suggested yet optional step.
msearchd(8) offers full text search capabilities using a sqlite3 database that has to be populated with smingest(1). First, create a directory in the /var/www chroot(8) jail:
# mkdir -p /var/www/msearchd
# chown smarc /var/www/msearchd
Then, populate the database with all emails in the maildir:
$ sqlite3 /var/www/msearchd/mails.sqlite3 \
	</usr/local/share/examples/smarc/schema.sql
$ mlist ~/Mail/smarc | smingest /var/www/msearchd/mails.sqlite3
At this point, msearchd(8) can be started.
6. Configure the web server
The web server needs to serve the contents of the outdir as-is and handle the requests for /search via the msearchd(8) FastCGI server. A sample httpd(8) configuration is provided here for reference:
server "marc.example.com" {
	listen on * port 80
	root "/smarc"
	gzip-static

	# leave out when not using msearchd(8)
	location "/search" {
		fastcgi socket "/run/msearchd.sock"
	}
}
HANDLING NEW MESSAGES
New messages should be fetched periodically using tools like fdm(1) or mbsync(1), the database updated with smingest(1) and the web archive refreshed using smarc(1).
It is recommended to create a script like the following and schedule its execution periodically with cron(8):
#!/bin/sh

set -e

fdm -l fetch
minc ~/Mail/smarc | smingest /var/www/msearchd/mails.sqlite3
smarc

yes | gzip -krq /var/www/smarc/ 2>/dev/null || true
If msearchd(8) is not used, new messages still need to be incorporated (i.e. moved from new/ to cur/) but no database has to be updated. In that case simplify the minc(1) invocation as:
minc -q ~/Mail/smarc
and don't call smingest(1) at all.
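Both variants of the update procedure can be folded into one routine. The following is a sketch under stated assumptions: update_archive is a hypothetical name, fdm(1) is assumed to be the fetch tool, and an empty second argument stands for a setup without msearchd(8).

```shell
#!/bin/sh
# Sketch: one update routine for setups with and without msearchd(8).
# update_archive is a hypothetical helper, not part of smarc.
update_archive() {
	maildir=$1 db=$2

	fdm -l fetch || return 1

	if [ -n "$db" ]; then
		# minc prints the incorporated files, smingest indexes them.
		minc "$maildir" | smingest "$db" || return 1
	else
		# No database: just move messages from new/ to cur/.
		minc -q "$maildir" || return 1
	fi

	smarc -m "$maildir"
}

# With search:    update_archive ~/Mail/smarc /var/www/msearchd/mails.sqlite3
# Without search: update_archive ~/Mail/smarc ""
```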
HANDLING MULTIPLE MAILING LISTS
If the archive for multiple mailing lists needs to be served from the same box, care must be taken to use different directories and database files to avoid mixing messages.
msearchd(8) handles only one database at a time, so multiple instances need to be run, each pointing at the database for only one mailing list. A different FastCGI socket path needs to be used for each instance.
The smarc(1) outdir, maildir and cachedir must be unique per mailing list, i.e. the -c, -m and -o flags must always be provided.
Very likely, each mailing list will need its own set of templates, so those need to be prepared and both smarc(1) and msearchd(8) have to be pointed at the right template directory.
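Regenerating several archives with separate directories can be sketched as a loop. The directory layout used here (~/.cache/smarc/<list>, ~/Mail/<list>, /var/www/<list>) and the name refresh_lists are hypothetical; only the -c, -m and -o flags come from this page. Template directories are left out and would need to be added per list as described in smarc(1).

```shell
#!/bin/sh
# Sketch: regenerate the archives of several lists, one directory set each.
# refresh_lists and the directory layout are hypothetical.
refresh_lists() {
	for list in "$@"; do
		# Every list gets its own cachedir, maildir and outdir.
		smarc -c "$HOME/.cache/smarc/$list" \
		    -m "$HOME/Mail/$list" \
		    -o "/var/www/$list" || return 1
	done
}

# Usage: refresh_lists users devel announce
```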
SEE ALSO
minc(1), smarc(1), smingest(1), sqlite3(1), httpd(8), msearchd(8)