SMARC(7) Miscellaneous Information Manual SMARC(7)

smarc - mailing list web archive generation system

smarc is a system to incrementally generate a web archive for a mailing list and optionally provide search capabilities. The generated archive is a set of HTML files that can be served as-is, while searching requires the use of a FastCGI application. At a high level, smarc is made of three components:

smarc(1)
to generate the static HTML files from a maildir.
smingest(1)
to import emails into a sqlite3 database.
msearchd(8)
to provide search results.

Several steps are necessary to initialize the web archive:

  1. Create and populate the output directory.
  2. Customize the templates.
  3. Prepare the maildir.
  4. Generate the web archive.
  5. Set up the database for searching.
  6. Configure the web server.

It is recommended to use a dedicated user. Commands to be run as an unprivileged user are preceded by a dollar sign ‘$’, while commands requiring superuser privileges are preceded by a hash mark ‘#’. Hereafter, it is assumed that the local user is called ‘smarc’.

The web archive is made of several static files, mostly HTML, that need to be served by a web server like httpd(8). /var/www/smarc is the default location, but a different path can be used. To prepare it, issue:

# mkdir -p /var/www/smarc
# chown smarc /var/www/smarc

Then copy the CSS file, optionally tweaking it.

$ cp /usr/local/share/examples/smarc/style.css /var/www/smarc

Also copy any other assets (e.g. logo images) to this directory.

The default templates are installed at /etc/smarc. Since these are generic, they need to be tweaked to include information about the mailing list.

Care should be taken when editing these files after generating the archive since existing pages won't be automatically updated. The cachedir (see smarc(1)) needs to be deleted and the web archive generated again. msearchd(8) has to be stopped and restarted as well.
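
The regeneration steps just described can be sketched as a small script. This is only an illustration: the cachedir path is a placeholder (use the default documented in smarc(1) or the value passed with -c), and restarting through doas(1) and rcctl(8) assumes that msearchd(8) is managed as an rc.d(8) service on the system. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: cachedir below is a placeholder, not smarc's real default.
# Restarting via doas(1) and rcctl(8) is an assumption about the local setup.
set -e

cachedir=${CACHEDIR:-/path/to/cachedir}
dry_run=${DRY_RUN:-1}	# print by default; set DRY_RUN=0 to execute

# Print the command in dry-run mode, otherwise run it.
run() {
	if [ "$dry_run" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

run rm -rf "$cachedir"		# delete the cachedir
run smarc -m "$HOME/Mail/smarc" -o /var/www/smarc	# generate the archive again
run doas rcctl restart msearchd	# stop and restart msearchd(8)
```

Run it once as is to check the printed commands, then again with DRY_RUN=0 once the paths match the local setup.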

The maildir with the mailing list entries needs to be populated. It is assumed to be at ~/Mail/smarc by default, but a different path can be used.

Finally, smarc(1) can be used to generate the web archive. The first run may take a while, depending on the size of the maildir, while subsequent runs are incremental and take less time.

$ smarc -m path/to/maildir -o path/to/outdir

On multi-processor machines, the smarc(1) -j flag can be used to run multiple jobs and save some time.
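
As a rough sketch, the job count could be taken from the number of online CPUs. This assumes that -j accepts a number of jobs, which should be verified against smarc(1); the hw.ncpuonline sysctl(8) variable is specific to OpenBSD, so the example falls back to a single job elsewhere. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: assumes "-j" takes a job count; check smarc(1) first.

# Online CPUs on OpenBSD; fall back to 1 where the sysctl is unavailable.
ncpu=$(sysctl -n hw.ncpuonline 2>/dev/null || echo 1)

# Print the command for review; remove "echo" to run it.
echo smarc -j "$ncpu" -m "$HOME/Mail/smarc" -o /var/www/smarc
```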

The generated files may be compressed to save bandwidth:

$ gzip -krq /var/www/smarc </dev/null 2>/dev/null

This is a suggested yet optional step.

msearchd(8) offers full text search capabilities using a sqlite3 database that has to be populated with smingest(1). First, create a directory in the /var/www chroot(8) jail:

# mkdir -p /var/www/msearchd
# chown smarc /var/www/msearchd

Then, populate the database with all emails in the maildir:

$ sqlite3 /var/www/msearchd/mails.sqlite3 \
	</usr/local/share/examples/smarc/schema.sql
$ mlist ~/Mail/smarc | smingest /var/www/msearchd/mails.sqlite3

At this point, msearchd(8) can be started.

The web server needs to serve the contents of the outdir as-is and handle the requests for /search via the msearchd(8) FastCGI server. A sample httpd(8) configuration is provided here for reference:

server "marc.example.com" {
	listen on * port 80
	root "/smarc"
	gzip-static

	# leave out when not using msearchd(8)
	location "/search" {
		fastcgi socket "/run/msearchd.sock"
	}
}

New messages should be fetched periodically using tools like fdm(1) or mbsync(1), the database updated with smingest(1) and the web archive refreshed using smarc(1).

It is recommended to create a script like the following and schedule its execution periodically with cron(8):

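#!/bin/sh

set -e
# fetch new messages
fdm -l fetch
# incorporate new messages and add them to the search database
minc ~/Mail/smarc | smingest /var/www/msearchd/mails.sqlite3
# refresh the web archive incrementally
smarc
# recompress; yes(1) confirms overwriting existing .gz files,
# and gzip failures must not abort the script under set -e
yes | gzip -krq /var/www/smarc/ 2>/dev/null || true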

If msearchd(8) is not used, new messages still need to be incorporated (i.e. moved from new/ to cur/) but no database has to be updated. In that case simplify the minc(1) invocation as:

minc -q ~/Mail/smarc

and don't call smingest(1) at all.
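
To illustrate the cron(8) scheduling mentioned above, a crontab(5) entry for the ‘smarc’ user might look like the following. The script path and the ten-minute interval are assumptions chosen for the example; neither comes from smarc itself.

```crontab
# minute hour day-of-month month day-of-week command
*/10 * * * * /home/smarc/bin/update-archive
```

Such an entry is typically installed by running crontab -e as the ‘smarc’ user.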

If the archives for multiple mailing lists need to be served from the same machine, care must be taken to use different directories and database files to avoid mixing messages.

msearchd(8) handles only one database at a time, so multiple instances need to be run, each pointing at the database for only one mailing list. A different FastCGI socket path needs to be used for each instance.

The smarc(1) outdir, maildir and cachedir must be unique per mailing list, i.e. the -c, -m and -o flags must always be provided.

Very likely, each mailing list will need its own set of templates, so those need to be prepared and both smarc(1) and msearchd(8) have to be pointed at the right template directory.
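
A minimal sketch of the per-list smarc(1) invocations follows. The list names and the directory layout are hypothetical; only the -c, -m and -o flags come from the text above. The option that selects the template directory is deliberately left out and should be looked up in smarc(1). The script prints one command per list instead of running it.

```shell
#!/bin/sh
# Sketch only: list names and directory layout are hypothetical.
set -e

lists="misc tech"	# hypothetical mailing list names

# Print the smarc(1) invocation for one list, giving it its own
# cachedir (-c), maildir (-m) and outdir (-o).
smarc_cmd() {
	printf 'smarc -c %s -m %s -o %s\n' \
	    "$HOME/.cache/smarc/$1" "$HOME/Mail/$1" "/var/www/smarc/$1"
}

for l in $lists; do
	smarc_cmd "$l"	# review the output, then pipe it to sh(1) to run it
done
```

Deriving all three paths from the list name keeps the directories distinct without maintaining them by hand. The same naming scheme could be applied to the database file of each list.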

minc(1), smarc(1), smingest(1), sqlite3(1), httpd(8), msearchd(8)

September 4, 2023 OpenBSD 7.4