[GoLUG] Mailman archive restoration from subscriber/participant emails (was: Web host discussion, 7/5/2023)

Alex Finkel alex.finkel at gmail.com
Thu Jul 6 08:08:55 EDT 2023


I was not able to attend the meeting last night; was a root cause
identified for the loss of the previous mailing list hosting?

Alex


On Thu, 2023-07-06 at 04:59 -0700, Syeed Ali wrote:
> This is verbose, not necessarily for you, Rick, but for future
> generations.
> 
> 
> 
> On Thu, 6 Jul 2023 02:51:29 -0700
> Rick Moen <rick at linuxmafia.com> wrote:
> 
> > Steve Litt <slitt at troubleshooters.org> wrote:
> > 
> > > On another subject, GoLUG's mailing list died, so we made a new,
> > > opt-in mailing list.  
> > 
> > It's truly tragic whenever a longstanding public mailing list not
> > only dies, but also loses years of history, for no better reason
> > than nobody ever having bothered to do basic backups of primary data.
> 
> I offered to help, and even though I have no idea what I'm doing, we
> have had success.
> 
> I've learned how to convert a participant's email archives into
> something Mailman can use for its archives.  It's tested and works,
> and I have a chunk of data stretching back to 2016, with more to come.
> There are four considerations I will have to address:
> 
> 
> 1.  Filtering emails with X-No-Archive headers.
> 
> https://en.wikipedia.org/wiki/X-No-Archive
> 
> Mailman has an option that lets users exclude their future emails
> from the Mailman archives.  Presumably that's indicated in the email
> with an X-No-Archive header, although I don't yet know exactly what to
> look for.
> 
> Subscribers still receive those emails, so if they are handed to me
> I'd have to identify and filter them out.
> 
> (I don't understand why this is an option, given that people reply to
> one another and the quotes are themselves archived.)
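A minimal Python sketch of the X-No-Archive filter, assuming each message has already been parsed with the standard `email` package. It treats the mere presence of the header as an opt-out, whatever its value; that conservative rule is an assumption, not something checked against Mailman's own behavior. The function names are hypothetical.

```python
from email.message import Message


def wants_no_archive(msg: Message) -> bool:
    """Return True if the message carries an X-No-Archive header.

    Assumption: any value counts as an opt-out, so "yes", "Yes",
    "true", etc. are all excluded.
    """
    # Header lookup in the email package is case-insensitive.
    return msg.get("X-No-Archive") is not None


def archivable(messages):
    """Yield only the messages that did not opt out of archiving."""
    for msg in messages:
        if not wants_no_archive(msg):
            yield msg
```

For example, `list(archivable(msgs))` keeps everything except the opted-out posts; quotes of those posts inside other people's replies are left untouched, which is exactly the oddity noted above.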
> 
> 
> 2.  Filtering off-list conversations.  Some are stored alongside
> mailing list emails and might be sent to me, and those must not be
> uploaded into Mailman archives.
> 
> This is probably straightforward: include only emails that are
> addressed To or Cc the mailing list.  The hard part is identifying
> which addresses, across which domain names, constitute "the mailing
> list"; there seems to be more than one generation of this one:
> 
> tech at golug.org
> golugtech at diypython.us
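A sketch of that To/Cc check in Python, assuming the two addresses above are the only ones the list has ever used (older generations would need to be added to the set). It lowercases addresses before comparing, which is a pragmatic choice rather than a strict reading of the mail RFCs. `LIST_ADDRESSES` and `sent_to_list` are hypothetical names.

```python
from email.message import Message
from email.utils import getaddresses

# Assumption: every address the list has used.  Add older generations
# here as they are discovered.
LIST_ADDRESSES = {"tech@golug.org", "golugtech@diypython.us"}


def sent_to_list(msg: Message, list_addresses=LIST_ADDRESSES) -> bool:
    """True if any To or Cc recipient is one of the list's addresses."""
    # get_all() returns every occurrence of the header, or [] if absent.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _display_name, address in getaddresses(headers):
        if address.lower() in list_addresses:
            return True
    return False
```

Copies that came through the list normally also carry a List-Id header, but a participant's own sent copies may not, which is why the sketch looks at To and Cc instead.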
> 
> 
> 3.  Deduplication (maybe).  Claws Mail can trivially deduplicate.
> However, when I export and then re-import them, some emails are
> duplicated.  It's not a display problem; they are real emails.  I
> don't know whether importing into Mailman will automatically
> deduplicate them.  I do know there are ways to deduplicate other than
> this email client.  Deduplication would have to be sorted out if
> multiple users provided emails from their separate subscriptions.
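One common way to deduplicate outside a mail client is to key each message on its Message-ID header, on the assumption that every subscriber's copy of a post carries the same one. A Python sketch under that assumption. The fallback hash over From, Date and Subject for messages lacking a Message-ID is also an assumption, and the function names are hypothetical.

```python
import hashlib
from email.message import Message


def dedup_key(msg: Message) -> str:
    """Key that identifies the same post across subscribers' copies."""
    message_id = msg.get("Message-ID")
    if message_id:
        return str(message_id).strip()
    # Fallback (an assumption): hash headers that should be identical in
    # every recipient's copy of the same post.
    parts = [str(msg.get(name, "")) for name in ("From", "Date", "Subject")]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(messages):
    """Yield each message once, keeping the first copy seen."""
    seen = set()
    for msg in messages:
        key = dedup_key(msg)
        if key not in seen:
            seen.add(key)
            yield msg
```

Because it keeps the first copy seen, feeding in one contributor's mail ahead of another's decides whose copy survives.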
> 
> 
> 4.  Personal information.  People who give me their emails are also
> giving me personal information embedded within them.  Testing does
> not show any personal information appearing in the archives, so I'm
> confident there.  However, I wonder whether exporting the Mailman
> archives and looking inside them would reveal anything.
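One way to check an export, sketched in Python under the assumption that the archive can be downloaded as a single mbox file: search every message for strings that should never be public, such as a contributor's own address, which can leak through per-recipient headers like Delivered-To. `find_leaks` and its `needles` argument are hypothetical names.

```python
import mailbox


def find_leaks(mbox_path, needles):
    """Return (message index, needle) pairs for every needle found.

    `needles` are strings that must not appear in a public archive,
    e.g. a contributor's own address.  Matching is case-insensitive
    over the full raw message, headers included.
    """
    hits = []
    box = mailbox.mbox(mbox_path)
    try:
        for index, msg in enumerate(box):
            # Decode leniently: old mail often has stray 8-bit bytes.
            text = msg.as_bytes().decode("utf-8", errors="replace").lower()
            for needle in needles:
                if needle.lower() in text:
                    hits.append((index, needle))
    finally:
        box.close()
    return hits
```

An empty result only means none of the listed strings turned up; it says nothing about personal details nobody thought to search for.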
> 
> -
> 
> Ultimately, I'm going to make sure everything is put into a simple
> one-text-file-per-email format (MH) and then iterate with a shell
> script (because that's all I know) to set aside emails as needed, and
> then confirm those by hand.  I'll use a test mailing list, then cross
> my fingers and rely on intuition when randomly auditing.
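Python's standard `mailbox` module can handle both the MH conversion and the "set aside" step. A sketch, assuming the messages start out in an mbox file. The folder paths and the predicate are hypothetical, and moved messages still need the by-hand review described above.

```python
import mailbox


def mbox_to_mh(mbox_path: str, mh_path: str) -> int:
    """Copy every message in an mbox file into a new MH folder.

    MH keeps one message per numbered file, which is easy to grep and
    to audit by hand.  Returns the number of messages copied.
    """
    src = mailbox.mbox(mbox_path)
    dst = mailbox.MH(mh_path, create=True)
    count = 0
    try:
        for msg in src:
            dst.add(msg)
            count += 1
    finally:
        dst.close()
        src.close()
    return count


def set_aside(mh_path: str, review_path: str, predicate) -> int:
    """Move messages for which predicate(msg) is true into a review folder."""
    src = mailbox.MH(mh_path)
    review = mailbox.MH(review_path, create=True)
    moved = 0
    try:
        # Snapshot the keys first, since messages are removed as we go.
        for key in list(src.keys()):
            msg = src[key]
            if predicate(msg):
                review.add(msg)
                src.remove(key)
                moved += 1
    finally:
        review.close()
        src.close()
    return moved
```

For example, `set_aside("golug-mh", "golug-review", lambda m: m.get("X-No-Archive") is not None)` would pull opted-out posts aside for checking. The folder names here are made up.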
> 
> -
> 
> I'll probably write documentation for future generations to learn
> from.  At the least it would be for my own reference, because going
> forward I, and many others, can host any LUG for free by piggybacking
> on existing hosting; no LUG website or mailing list needs to vanish
> because of similar hosting concerns.
> 
> Many hosting plans allow:
> 
> - Point-and-click Mailman setup.
> - Point whatever domain name at it, and it just works.  Or just use a
>   free subdomain.
> - Functionally limitless subdomains, email addresses, storage, and
>   bandwidth (these are LUG mailing lists after all)
> 
> I already have to pay for a hosting plan and at least one of my domain
> names, so all of this is free for my lifetime.  It's all also
> trivially exportable, as you say below.
> 
> (I do presently have a problem with exporting Mailman's complete
> archive.  I did it once but can't figure out how to do it again, and
> this is worrying.)
> 
> 
> > Was /var/lib/mailman/archives/private/tech.mbox/tech.mbox
> > periodically copied off-system as archival storage, from the old
> > mailing list host?  Given any archival copy of that file, _all_ prior
> > mailing list traffic can be easily re-generated at the new host, and
> > merged into the new host's pipermail archive.
> > 
> > If it wasn't, why wasn't this Backup 101 step taken?
> 
> No, it wasn't backed up.  It was protected by a Douglas Adams NMP
> field, and the previous list admin wasn't closely watching the mailing
> list to see my advice.
> 
> And I quote myself:
> 
> 
> On Fri, 14 Apr 2023 15:25:02 -0700
> Syeed Ali <syeedali at syeedali.com> wrote:
> 
> > Exporting a mailing list's archive (list administrator):
> > 
> > https://diypython.us/mailman/private/golugtech_diypython.us.mbox/golugtech_diypython.us.mbox
> > 
> > 
> > Importing (system administrator-ish):
> > 
> > https://wiki.list.org/DOC/How%20do%20I%20import%20an%20archive%20into%20a%20new%20mailing%20list%3F
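The linked wiki page covers the import itself. If the reconstructed history arrives as several mbox files (for example, one per contributor), they could first be merged into one file in date order. A Python sketch, assuming the inputs are already filtered and deduplicated and fit in memory. Placing messages with a missing or unparseable Date header first is an arbitrary choice, and `merge_mboxes` is a hypothetical name.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def date_key(msg) -> float:
    """Sort key: the Date header as a UTC timestamp."""
    parsed = parsedate_tz(str(msg.get("Date", "")))
    # Missing or unparseable dates sort first (an arbitrary choice).
    return float(mktime_tz(parsed)) if parsed else 0.0


def merge_mboxes(inputs, output) -> int:
    """Write every message from the input mbox files to `output`, oldest first."""
    messages = []
    for path in inputs:
        box = mailbox.mbox(path)
        try:
            messages.extend(box)  # reads every message into memory
        finally:
            box.close()
    # Note: mailbox.mbox appends if `output` already exists.
    out = mailbox.mbox(output)
    try:
        for msg in sorted(messages, key=date_key):
            out.add(msg)
    finally:
        out.close()
    return len(messages)
```

The sort is stable, so posts with identical timestamps keep the order in which their files were listed.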
> 
> 



