[GoLUG] Mailman archive restoration from subscriber/participant emails (was: Web host discussion, 7/5/2023)

Rick Moen rick at linuxmafia.com
Mon Jul 10 14:29:01 EDT 2023


Quoting Syeed Ali (syeedali at syeedali.com):

> Noted, but I don't think I can do that myself as I'm using a
> point-and-click host.  I may be able to ask them for help for this and
> provide it on some particular URL like golug.org/golug.mbox

Sy, gosh, I am sorry to disappoint, or to discourage you, but I really
have no experience at all trying to be a Mailman siteadmin without
command-line access and at least sudo-mediated ability to run the
utilities in /var/lib/mailman/bin/ as privileged user "list".

There was exactly one occasion on which I agreed to be a volunteer
listadmin for a couple of dozen mailing lists and for all SMTP
operations at a domain on shared hosting, such that every administrative
action could be done only via an admin Web site "control panel".  That
was for the 2015 World Science Fiction Convention in Spokane, Washington.
All of that work was gratuitously slow and convoluted, and many
desirable tasks, such as working on antispam and occasionally redacting
something from an archived posting, were kept out of my reach.

I'm reluctant to ever do that again.  It was just too frustrating 
to have to work through clicky-clicky tools that were so limited
and inefficient.

Anyway, literally all of my other experience has been as a root-wielding 
user of the underlying host, or as someone with ability to run needed
command-line tools with user "list" authority via /etc/sudoers .

> > Other bits of metadata, such as subscriber passwords and options, and
> > per-mailing list administrative settings, are IMO fine points whose
> > backup can be put off for a later day.
> 
> Oh I didn't even realize this.  I don't know if there's a way to export
> those data with the web admin interface.

Er, I guess you could use a scripting language and screen-scrape the
information from the admin WebUI.  Well, some of it.  The admin WebUI
doesn't give, for example, any access to see or alter a subscribed
user's per-user & per-list subscription password.

> I definitely need this figured out at some point, because resurrecting
> this mailing list has required a person willing to manually email a
> notice to recent participants to re-subscribe (which is not a good
> idea, expanded on later).

FYI, Mailman permits a listadmin to send "invite" mails to prospective
subscribers, instead of force-subscribing them.  It's on the same admin
WebUI screen:  Membership Management, Mass Subscription, and then
"Subscribe these users now or invite them" is a radio button that does
either "Subscribe" or "Invite", with Subscribe as the default.

"Invite" sends out an invitation text of your choosing to a roster of
potential members' addresses that you fill into the "Enter one address
per line below.." box.  Each potential member can optionally visit an
individually hashed URL to subscribe with one click, or can respond to
the invite e-mail to instantly subscribe.

The advantage of "invite" over force-subscribe should be obvious:  As 
the user retains control over the subscribe-or-not decision, and as
the process validates bidirectional ability to receive mail, there can
be no "You're behaving like a spamhaus" objection, and annoyed invitees
cannot get your host's IP added to a spammer DNSBL.


> It's theoretically possible for me to script something that'll crawl
> through a mailing list archive and drop participants in a date-sorted
> list.

Yep.  But painful to work out.
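
A minimal sketch of such a crawl, assuming you already hold the list
traffic as an mbox file with unobfuscated From: headers (the pipermail
HTML/TXT pages rewrite addresses as "user at host", so they are a
poorer input).  The function name participants_by_last_post is made up
for illustration; only the Python standard library is used.

```python
import mailbox
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime


def participants_by_last_post(mbox_path):
    """Return [(address, last_post_datetime), ...], most recent poster first."""
    last_seen = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Take the bare address out of the From: header and normalize case.
            _, addr = parseaddr(str(msg.get("From", "")))
            addr = addr.lower()
            if not addr:
                continue
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # missing or unparseable Date: header
            if when.tzinfo is None:
                # Assumption: treat zone-less dates as UTC so they can be compared.
                when = when.replace(tzinfo=timezone.utc)
            if addr not in last_seen or when > last_seen[addr]:
                last_seen[addr] = when
    finally:
        box.close()
    return sorted(last_seen.items(), key=lambda item: item[1], reverse=True)
```

Messages with a missing or malformed Date: header are skipped rather
than guessed at, so the result may undercount very old or mangled
posts.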

> I had help from my hosting provider.  Their tech support was prompt and
> happy to take the couple of minutes to help.

That's good.  Implies that someone in their NOC knows his/her way around 
pipermail and the /var/lib/mailman/bin/* utilities.

> Urk.  I don't think I understand.  If this were to happen, would this be
> seen in with a web browser looking at the mailman list archives?

Yes.  Definitely.  Just open a Web browser, look at the first and last
months of the pipermail-generated Web archive.  If you see several
garbage posts that appear to start right in the middle of an actual 
post's message body but lack any header information, that's produced
by this syndrome.

Fortunately, it's not at all difficult to fix:  Find all
message-body lines that have "From " in columns 1-5 of the line,
and prepend ">" to that word, turning it into ">From " as columns
1-6.  But you must take care to _not_ make that transform to the SMTP
envelope-header lines, which also all start with a flush-left "From ".

There are even canned utilities ancillary to Mailman that people have
crafted to parse through an mbox and carry out that operation on every
such message-body line found.  
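
To make the transform concrete, here is a rough sketch of the idea.
It is not the code of any of those utilities.  It assumes a genuine
envelope "From " line follows a blank line (or starts the file) and
looks like "From sender Mon Jul 10 14:29:01 2023"; the SEPARATOR
pattern and the function name escape_body_from_lines are illustrative
assumptions, and real mboxes may contain separator variants this
pattern misses.

```python
import re

# Assumed shape of a real separator: "From sender Mon Jul 10 14:29:01 2023".
SEPARATOR = re.compile(
    r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d\d:\d\d:\d\d \d{4}\s*$"
)


def escape_body_from_lines(lines):
    """Return a copy of lines with stray body 'From ' lines turned into '>From '."""
    fixed = []
    prev_blank = True  # the start of the file counts as "after a blank line"
    for line in lines:
        is_separator = prev_blank and SEPARATOR.match(line)
        if line.startswith("From ") and not is_separator:
            line = ">" + line
        fixed.append(line)
        prev_blank = line.strip() == ""
    return fixed
```

Header lines such as "From: someone" are untouched, because they have
a colon rather than a space in column 5.  Work on a copy of the mbox,
and check the output by eye before regenerating any archive from it.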

BTW, it is also a semi-common practice by SMTP-processing software (but
not all such software) to pre-emptively "munge" any such line in a
message-body text, to _prevent_ subsequent misparsing of that line as an
SMTP envelope-header "From " line.

> I'll have to research and note the deduplication efforts by others,
> even if I don't use any of them.

If the messages you collected all possess a Message-ID: header, that 
is required by the RFCs to be globally unique to that message, and
could be used to identify dupes.  However, quite a lot of end-user MUAs
(Mail User Agents = mail clients) deliberately truncate headers 
instead of doing The Right Thing, which IMO is to suppress _display_
of headers the user doesn't want to see.
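
As a sketch of that idea, assuming the collected messages sit in one
mbox file and kept their Message-ID: headers, something like the
following would copy each message once.  The function name
dedupe_by_message_id and both path parameters are hypothetical;
messages lacking a Message-ID: are kept, since there is nothing to
compare them by.

```python
import mailbox


def dedupe_by_message_id(src_path, dst_path):
    """Copy src mbox to dst, dropping repeats of an already-seen Message-ID."""
    seen = set()
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if absent; appended to if present
    try:
        for msg in src:
            msg_id = str(msg.get("Message-ID", "")).strip()
            if msg_id and msg_id in seen:
                dropped += 1
                continue
            if msg_id:
                seen.add(msg_id)
            dst.add(msg)  # messages lacking a Message-ID are always kept
            kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept, dropped
```

The first copy encountered wins, so if several subscribers contribute
the same post, the surviving copy carries whichever Received: lines
that first contributor's mail path added.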


> When I look at the physical files referenced by my email client (that
> is, the raw source), I see lots of non-conversation information.  For
> example, when I look at the email which the mailing list sent to me, and
> which is your reply, I see information about my ISP.  Were I to give
> that email to mailman, I wouldn't want that ISP information "getting
> out".  The question is, would the mailman import discard those data or
> would they remain in its own internal archives and become visible on a
> future export/backup?

Are you talking, here, about the many SMTP headers normally not
displayed to end-users, like all of the Received: and Message-ID: 
lines?

Mailman keeps _all_ of those lines -- which are important! -- in the
cumulative mbox file for the mailing list, as the master record of past
mailing list traffic.  But /var/lib/mailman/bin/arch , when it generates
the HTML and TXT pipermail archive, drastically suppresses almost all of
them, showing _only_ the From:, Subject:, and Date: lines (without their 
keywords).
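
To see for one message what falls on each side of that line, a small
sketch along these lines may help.  It does not reproduce what
bin/arch does internally; it only sorts a raw message's headers into
the three named above and everything else.  The names SHOWN and
split_headers are made up for illustration.

```python
from email import message_from_string

SHOWN = {"from", "subject", "date"}  # the headers named above


def split_headers(raw_message):
    """Return (shown, hidden) lists of (name, value) header pairs."""
    msg = message_from_string(raw_message)
    shown = [(k, v) for k, v in msg.items() if k.lower() in SHOWN]
    hidden = [(k, v) for k, v in msg.items() if k.lower() not in SHOWN]
    return shown, hidden
```

Everything in the "hidden" list, Received: lines included, would still
be present in the cumulative mbox.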

If you wish to compare displayed header information versus the full
information in the archive's master cumulative mbox file, compare your
choice of postings shown on http://linuxmafia.com/pipermail/conspire/
with the publicly-accessible mbox at
http://linuxmafia.com/pipermail/conspire.mbox/conspire.mbox .

As a reminder of what I mentioned before, the latter file is available
to the public because, as Mailman siteadmin, I toggled a setting in 
Mailman's global options to enable public access to per-list mbox'es.

Your shared-hosting provider may or may not have that option set to
"yes".  It may or may not turn it on, at your request.


> > Be aware that, when you edit the mbox and regenerate the archive, 
> > often bin/arch will renumber some of the archived messages, changing 
> > their URLs.
> 
> Ick.
> 
> I consider this a really offensive bug.

It's one of the things people _most_ wanted the Mailman project to fix
with the new Mailman3 codebase -- and they did.  However, this step
forward was, in the opinion of many, counterbalanced by so many hated
misfeatures in the Mailman3 design that many Mailman admins are
continuing to limp forward old Mailman 2.1.x installations despite their
unfixable dependency on Python 2.7.x, which has now been orphaned for
years and is becoming a problem.

I will personally soon have to get serious about my migration path from
Mailman 2.1.x.  The landscape of choices is a bit grim.

> I do not have the necessary elevated commandline, but the functionality
> is *supposed* to be available via the web by visiting a specified URL
> and logging in as the list admin.  I should be able to figure it out;
> presumably it's just me not understanding things.

Sure.  Get to know the [domain]/mailman/admin/[listname] admin WebUI
pages.  They're reasonably self-explanatory, I always found.  However,
I'd be glad to help with any questions.



