[GoLUG] Mailman archive restoration from subscriber/participant emails (was: Web host discussion, 7/5/2023)
Syeed Ali
syeedali at syeedali.com
Thu Jul 6 07:59:16 EDT 2023
This is verbose, not necessarily for you, Rick, but for future
generations.
On Thu, 6 Jul 2023 02:51:29 -0700
Rick Moen <rick at linuxmafia.com> wrote:
> Steve Litt <slitt at troubleshooters.org> wrote:
>
> > On another subject, GoLUG's mailing list died, so we made a new,
> > opt-in mailing list.
>
> It's truly tragic whenever a longstanding public mailing list not only
> dies, but also loses years of history, for no better reason than
> nobody ever having bothered to do basic backups of primary data.
I offered help, and even though I have no idea what I'm doing we have
success.
I've learned how to convert a participant's email archives into what
Mailman can use for its archives. It's tested and works, and I have a
chunk of data stretching back to 2016 with more to come. There are four
considerations which I will have to address:
1. Filtering emails with X-No-Archive headers.
https://en.wikipedia.org/wiki/X-No-Archive
Mailman has an option for users to have their emails thereafter be
excluded from future Mailman archives. Presumably that's indicated in
the email with X-No-Archive, although I don't know what to specifically
look for yet.
Users still get the email, and so if those emails are handed to me I'd
have to figure that out and filter them out.
(I don't understand why this is an option given that people reply to one
another, and the quotes are themselves archived.)
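
As far as I know, Mailman 2.1's archiver skips a message when it carries
any X-No-Archive header (whatever its value) or an X-Archive header set
to "no"; that is worth confirming against the installed version. A
minimal Python sketch of the same check, standard library only. The
function name is_no_archive is made up for illustration.

```python
from email.message import Message


def is_no_archive(msg: Message) -> bool:
    """Return True if the message asks to be kept out of archives.

    Assumption: the rule mirrors what Mailman 2.1 is believed to do,
    namely any X-No-Archive header, or X-Archive with the value "no".
    """
    # The mere presence of X-No-Archive counts, regardless of its value.
    if msg.get("X-No-Archive") is not None:
        return True
    # X-Archive: no is the other opt-out spelling (case-insensitive).
    return str(msg.get("X-Archive", "")).strip().lower() == "no"
```

Messages for which this returns True would be set aside before import
and then confirmed by hand.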
2. Filtering off-list conversations. Some are stored alongside
mailing list emails and might be sent to me, and those must not be
uploaded into Mailman archives.
This is probably straightforward to figure out by checking that only
emails which have the mailing list in To or Cc are included. I
think the problem would be identifying which variations of which domain
names constitute "the mailing list"; there seems to be more than one
generation of this one:
tech at golug.org
golugtech at diypython.us
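
A minimal sketch of that To/Cc check in Python, standard library only.
The set LIST_ADDRESSES holds the two addresses above with " at " turned
back into "@"; it is an assumption that these are the only generations,
and the names LIST_ADDRESSES and is_list_mail are made up for
illustration.

```python
from email.message import Message
from email.utils import getaddresses

# Assumption: the two known generations of the list address, written
# with "@" restored. Add any older addresses that turn up.
LIST_ADDRESSES = {"tech@golug.org", "golugtech@diypython.us"}


def is_list_mail(msg: Message, list_addresses=LIST_ADDRESSES) -> bool:
    """Return True if any To or Cc recipient is a known list address."""
    headers = [str(h) for h in msg.get_all("To", []) + msg.get_all("Cc", [])]
    # getaddresses splits "Name <addr>, other@host" into (name, addr) pairs.
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return not recipients.isdisjoint(list_addresses)
```

Copies that actually passed through Mailman usually also carry a List-Id
header, which may be a sturdier signal than To/Cc where it is present;
that is a suggestion to test, not something verified on this data.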
3. Deduplication. Maybe. Claws Mail can trivially deduplicate.
However, when I export and then re-import them, some emails are
duplicated. It's not a display problem; they are real emails. I don't
know if importing into Mailman will automatically deduplicate those. I
do know there are ways to deduplicate other than with this email
client. Deduplication would have to be sorted out if multiple users
provided emails from their separate subscriptions.
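
One common way to deduplicate outside the mail client is to keep only
the first message seen for each Message-ID, since copies of the same
list post delivered to different subscribers normally share that header.
A minimal Python sketch under that assumption; the function name
dedupe_by_message_id is made up for illustration.

```python
from email.message import Message
from typing import Iterable, Iterator


def dedupe_by_message_id(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield each message once, keyed on its Message-ID header."""
    seen = set()
    for msg in messages:
        msg_id = str(msg.get("Message-ID", "")).strip()
        if not msg_id:
            # No Message-ID to compare on: keep it and review by hand.
            yield msg
            continue
        if msg_id in seen:
            continue  # a copy of something already kept
        seen.add(msg_id)
        yield msg
```

This says nothing about whether Mailman's own import deduplicates; it
only removes repeats before the import.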
4. Personal information. People who give me their emails are also
giving me personal information embedded within them. Testing does not
show any personal information appearing in archives, so I'm confident
there. However, I wonder if exporting the Mailman archives and then
looking within them would show anything.
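
One way to look inside an exported mbox is to count which header names
occur in it, then read through the list for anything that should not be
public. A minimal Python sketch; the function name header_census is made
up, and which headers count as personal is a judgment call (Delivered-To
and Received are typical places where a donor's own address can appear).

```python
import mailbox
from collections import Counter


def header_census(mbox_path: str) -> Counter:
    """Count, per header name, how many messages in the mbox carry it."""
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Use a set so a header repeated within one message counts once.
            counts.update({name.lower() for name in msg.keys()})
    finally:
        box.close()
    return counts
```

Printing the result of header_census(...).most_common() gives a short
list to audit by eye.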
-
Ultimately what I'm going to do is make sure everything gets put into a
simple one-textfile-per-email format (MH) and then iterate with a shell
script (because that's all I know) to set aside some emails as needed,
and then confirm those by hand. I'll use a test email list, then cross
my fingers and use intuition when randomly auditing.
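
The same set-aside pass can be sketched in Python instead of shell,
since its standard mailbox module reads MH folders and writes mbox files
directly. This is an illustration under assumptions, not the procedure
actually used: the function name mh_to_mbox and the keep callback are
made up, and the paths are whatever the local setup uses.

```python
import mailbox
from email.message import Message
from typing import Callable


def mh_to_mbox(mh_dir: str, mbox_path: str,
               keep: Callable[[Message], bool] = lambda m: True) -> int:
    """Copy messages from an MH folder into an mbox file.

    Only messages for which keep(msg) is true are written.
    Returns the number of messages written.
    """
    source = mailbox.MH(mh_dir, create=False)
    dest = mailbox.mbox(mbox_path)
    written = 0
    try:
        # MH keys are the numeric file names; walk them in order.
        for key in sorted(source.keys()):
            msg = source[key]
            if keep(msg):
                dest.add(msg)
                written += 1
    finally:
        dest.close()  # flushes pending writes to disk
        source.close()
    return written
```

The keep callback is where checks such as X-No-Archive or To/Cc filtering
would plug in. When a message has no "From " separator line of its own,
the mailbox module writes a placeholder one, so the result should still
be checked against what the Mailman import expects.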
-
I'll probably make documentation for future generations to learn from
this. At the least it would be for my reference, because going forward
I and many other people can host any LUG for free, piggybacked on
existing hosting; no LUG website or mailing list needs to vanish
because of similar hosting concerns.
Many hosting plans allow:
- Point-and-click Mailman setup.
- Point whatever domain name at it, and it just works. Or just use a
free subdomain.
- Functionally limitless subdomains, email addresses, storage, and
bandwidth (these are LUG mailing lists after all)
I already have to pay for a hosting plan and at least one of my domain
names anyway, so all this is free for my lifetime. It's all also
trivially exportable as you say below.
(I do presently have a problem with exporting Mailman's complete
archive. I did it once but can't figure out how to do it again, and
this is worrying.)
> Was /var/lib/mailman/archives/private/tech.mbox/tech.mbox periodically
> copied off-system as archival storage, from the old mailing list host?
> Given any archival copy of that file, _all_ prior mailing list
> traffic can be easily re-generated at the new host, and merged into
> the new host's pipermail archive.
>
> If it wasn't, why wasn't this Backup 101 step taken?
No, it wasn't backed up. It was protected by a Douglas Adams NMP field,
and the previous list admin wasn't a hawk on the mailing list to see my
advice.
And I quote myself:
On Fri, 14 Apr 2023 15:25:02 -0700
Syeed Ali <syeedali at syeedali.com> wrote:
> Exporting a mailing list's archive (list administrator):
>
> https://diypython.us/mailman/private/golugtech_diypython.us.mbox/golugtech_diypython.us.mbox
>
>
> Importing (system administrator-ish):
>
> https://wiki.list.org/DOC/How%20do%20I%20import%20an%20archive%20into%20a%20new%20mailing%20list%3F