Archives of other dead forums

  • In addition to hosting the archive of, it would be good to host the archives of election-related Yahoo Groups (and maybe Google Groups, if the CES group is going away?)

    I got some of the data from each of these, but I haven't gone through to see how complete they are:

    • ApprovalVoting [Citizens For Approval Voting]
    • AR-NewsWI ["Animal Right News- Wisconsin", not sure why listed]
    • AVFA [Approval Voting Free Association]
    • btpnc-talk [Libertarian Boston Tea Party Free Association, not sure why listed]
    • Condorcet [Membership approved]
    • electionmethods
    • EMIG-Wikipedia [Wikipedia Election Methods Interest Group]
    • instantrunoff-freewheeling
    • InstantRunoffCA [Membership approved]
    • InstantRunoffWI
    • RangeVoting [Automatically rejected]
    • stv-voting

    The total size of my dumps is probably <1.5 GB with all the duplicated content removed.
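
    For anyone else holding overlapping dumps, one way to find duplicated files is to hash their contents and group paths by digest. This is only a minimal sketch, assuming the dumps are ordinary files under one directory; `find_duplicates` is a hypothetical helper name, and this is not necessarily how the duplicates mentioned above were removed:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under `root` by the SHA-256 digest of their contents.

    Returns a dict mapping digest -> sorted list of paths, keeping only
    digests shared by two or more files.
    """
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

    Reading each file fully into memory is fine at this scale (under a couple of GB in total), but very large files would be better hashed in chunks.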

    Kristofer listed a few more groups that he downloaded. ArchiveTeam maybe got some that were not listed? Maybe others are floating around out there?

    Unfortunately, I think some of these were lost forever, because they were accessible only to members, and no members archived them (unless they did so without posting about it).

  • Reconciling different formats and relational schemata and schemes of identification should be interesting, but probably possible.

    The archive from Discourse embodies a strict hierarchy. At the root, it has "categories", and within a category, "topics", and within a topic, posts. No item has multiple tags or categories associated with it. What data schema would unify the archives coming from different origins?
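
    One possible answer, sketched under assumptions: a flat message record with a few optional fields could hold both the Discourse hierarchy (category, topic, post) and mailing-list style messages. All names here (`Message`, `source`, `container`, `thread_id`, and so on) are hypothetical and not taken from any existing schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    source: str        # hypothetical origin label, e.g. "discourse" or "yahoo"
    container: str     # Discourse category, or mailing-list group name
    thread_id: str     # Discourse topic, or mailing-list thread
    message_id: str    # unique within its source and container
    author: str
    sent: datetime
    body: str
    parent_id: Optional[str] = None                  # reply target, if known
    attachments: list = field(default_factory=list)  # file paths, possibly empty

    def key(self):
        """Identifier intended to stay unique across all sources."""
        return f"{self.source}:{self.container}:{self.message_id}"
```

    Prefixing the key with the source and container avoids having to reconcile the different identification schemes up front; each archive keeps its own IDs.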

    What is the most human-friendly way to URL-encode arbitrary trees of constraints linked with logical operators for a query?
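
    As one candidate, a prefix notation like `and(a:1,or(b:2,c:3))` stays fairly readable, and parentheses, commas and colons are all permitted in the query component of a URL. A rough sketch, assuming a query tree made of (field, value) leaves and (operator, children) nodes; the `encode` function and the field names `group` and `text` are made up for illustration:

```python
from urllib.parse import quote


def encode(node):
    """Encode a query tree as a compact prefix expression for use in a URL.

    A node is either a (field, value) leaf, where both are strings, or an
    (operator, [children]) pair, where operator is e.g. "and", "or", "not".
    """
    head, rest = node
    if isinstance(rest, list):  # operator node: recurse into the children
        return f"{head}({','.join(encode(child) for child in rest)})"
    # Leaf: percent-encode both parts so that commas, colons and
    # parentheses inside a value cannot be mistaken for syntax.
    return f"{quote(head, safe='')}:{quote(rest, safe='')}"


query = ("and", [
    ("group", "RangeVoting"),
    ("or", [("text", "Condorcet"), ("text", "Schulze method")]),
])
print(encode(query))
# and(group:RangeVoting,or(text:Condorcet,text:Schulze%20method))
```

    Decoding would need a small recursive parser, which is left out here.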

  • @Jack-Waugh said in Archives of other dead forums:

    What data schema would unify the archives coming from different origins?

    Do they need to be in a unified format?

    I've gone through all my files and removed duplicates, made a list of which contain what, etc. The dumps I have are in two different formats:

    • Official dumps generated from the getmydata URL
    • Scraped data using the yahoo-group-archiver tool

    The getmydata dumps all contain messages, and most contain links and files as well. There are apparently bugs that make these dumps incomplete, however. (reddit, github)

    Structure is:

    • Messages
      • Series of .mbox files, each containing many messages
    • Links
      • .url files and folders of .url files
    • Files
      • Folders of uploaded files, usually one folder per user or project
      • Doesn't include attachments
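
    The .mbox files under Messages can be read with Python's standard `mailbox` module, which may be enough for a first look at what a dump contains. A minimal sketch; `list_headers` is a hypothetical helper name, and the path is whatever .mbox file you point it at:

```python
import mailbox


def list_headers(mbox_path):
    """Return (date, sender, subject) header tuples for every message in an .mbox file."""
    # create=False: fail on a wrong path instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg["Date"], msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

    A missing header comes back as `None`, so incomplete messages show up in the output rather than raising an error.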

    The yahoo-group-archiver didn't do well in groups I wasn't a member of. The ApprovalVoting group contains nothing except the About description, for instance. In other groups it got more than the getmydata link did.

    Structure is described on ArchiveTeam along with a list of viewers:

    • about
      • .json files and photo
    • attachments
      • Folders of files, with .json metadata in each
    • calendar
      • .json files
    • databases
      • .json and .csv files
    • email
      • .json files for metadata of many messages
      • .json files, one per message
    • files
      • Same folder names and files as "Files" above, though each also contains .json metadata
    • links
      • .json files
    • members
      • .json files
    • photos
      • Image files and .json metadata; these don't seem to be included in the "Files" folder of the other format.
    • polls
      • .json files
    • topics
      • .json files for metadata of many messages
      • .json files, one per message
      • Folders containing attachments

    These can be converted into .eml or .mbox files using Yahoo Groups Archive Tools.
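
    Since the scraper left some groups nearly empty, a per-folder file count is a quick way to see which dumps are worth a closer look. A minimal sketch, assuming each group's dump is a directory containing the folder names listed above; `SECTIONS` and `inventory` are hypothetical names:

```python
from pathlib import Path

# Top-level folder names of a yahoo-group-archiver dump, as listed above.
SECTIONS = ["about", "attachments", "calendar", "databases", "email", "files",
            "links", "members", "photos", "polls", "topics"]


def inventory(group_dir):
    """Count files (recursively) in each expected section of one group's dump.

    Missing folders are reported as 0, which makes near-empty dumps easy to spot.
    """
    root = Path(group_dir)
    counts = {}
    for name in SECTIONS:
        section = root / name
        if section.is_dir():
            counts[name] = sum(1 for p in section.rglob("*") if p.is_file())
        else:
            counts[name] = 0
    return counts
```

    Running this over every group directory and comparing the `email` and `topics` counts against the message counts in the getmydata .mbox files would give a rough completeness check.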

  • Sounds like a lot of work, if the messages are to be presented via a browser.

  • @Jack-Waugh In that case we might want to figure out which of these forums are most worth the effort of archiving. (At least, if each forum would require separate work.)

    If some of these forums weren't open to the public, we might want to be careful about just publishing them, though.

  • @Marylander They would require separate work iff they are in separate formats.

    Maybe we need to ask stakeholders for a rating of the value of each type of archive, grouping them by format if any two of them have the same format.

  • Kristofer set up a browseable archive:

    The forums I've archived have at least one message with at least one of
    the terms "center squeeze", "Condorcet", "d'Hondt", "favorite betrayal",
    "monotonicity", "Range voting", "Ranked Pairs", "Sainte Lague", "Schulze
    method" or "Score voting".
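
    A selection rule like this can be sketched in a few lines. This is an illustration only, not the code actually used for the archive; the names `TERMS` and `is_relevant` are hypothetical, and case-insensitive substring matching is an assumption:

```python
# The terms listed above.
TERMS = ["center squeeze", "Condorcet", "d'Hondt", "favorite betrayal",
         "monotonicity", "Range voting", "Ranked Pairs", "Sainte Lague",
         "Schulze method", "Score voting"]


def is_relevant(messages, terms=TERMS):
    """True if at least one message body contains at least one term.

    `messages` is an iterable of message body strings; matching is a
    case-insensitive substring test (an assumption, see above).
    """
    lowered = [t.lower() for t in terms]
    return any(t in body.lower() for body in messages for t in lowered)
```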

    The browseable parts have the /web/ foldername:
