[Next] Archiving Story Games

edited May 28 in Story Games
Hi there! After conferring with @Andy and learning some hilarious things about Bing, I've stopped blocking crawlers. This means that archive.org should be able to see our site again.

I just tested, and it can now save discussions on here for posterity, see:
https://web.archive.org/web/20190528123330/http://story-games.com/forums/discussion/21763

There are extensions to make this easier (for example, the Chrome extension: https://chrome.google.com/webstore/detail/wayback-machine/fpnmgdkabkmnadcjpehmlllkndpkmiak )

There are also various software libraries that might automate the whole process (nothing guaranteed):
https://github.com/buren/wayback_archiver

@Paul_T, I know this was of particular interest to you.

Comments

  • edited May 28
    Some notes:

    * I know archive.org does a lot of automatic crawling, so given time it may index the site on its own. That's not certain, though they do provide an API to check whether a page is indexed.
    * A complexity of this project is that you need to account for multi-page threads.
    * Another complexity is making sure that somebody does it once it's no longer possible to coordinate via this site. (Basically just making sure the last week or so gets saved.)
    * The upside, though, is that you'll have more or less a year after the site closes to get it perfect.
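
A rough sketch of the first two notes, not a tested tool: it builds a query for the Wayback Machine availability endpoint (https://archive.org/wayback/available) and lists the pages of a multi-page thread. The `/p2`, `/p3` page suffix is an assumption about how this forum paginates, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url):
    """Build the availability-API request URL for one forum page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(api_response):
    """Return the closest snapshot URL from a decoded API response, or None."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def thread_page_urls(thread_url, page_count):
    """List every page of a thread.

    ASSUMPTION: pages after the first live at <thread_url>/p2, /p3, ...
    Check this against the live forum before relying on it.
    """
    pages = [thread_url]
    pages += [f"{thread_url}/p{n}" for n in range(2, page_count + 1)]
    return pages


def is_archived(page_url, fetch=urlopen):
    """Ask the Wayback Machine whether it holds a snapshot of page_url."""
    with fetch(availability_query(page_url)) as response:
        return closest_snapshot(json.load(response)) is not None
```

Something like `[u for u in thread_page_urls(thread, 4) if not is_archived(u)]` would then give the pages of a four-page thread that still need saving.
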
  • FYI - I have copied a few of my favorite OPs (my own) from here to the forums at fictioneers.net. If you want to keep some conversations going (or simply want to archive the clever things you posted here), you are all invited to do the same.

  • Thanks, @James_Stuart and @AsIf.

    (For what it's worth, that web archive link doesn't actually go to anything; just says that "this page is not archived, but is available on the web".)

    Is anyone knowledgeable enough about this process to know if there's a way to archive the entire forum? Has anyone ever automated this kind of process?
  • I've read about archivebot, and no more.
  • Of course it can be done - there might even be a Vanilla plugin specifically for that purpose - but it would require cooperation between the current server team and the destination server team. FTP access to the avatars folder and a dump of the database would be a start. Given those things, the content could be "massaged" into the format of whatever forum you decide to move it to.

  • @asif: you definitely don't need anything even approaching the database to archive via the wayback machine.

    If somebody went to the wayback machine and typed in every URL for every thread (once for every page of comments), it would all be archived and job done.

    So, to not have that job, you want to automate the URL submissions.

    So you need a list of URLs, and then a way to submit them to the archive.org archiver (with little delays so you don't trip their rate limiter).

    You can get the URLs by just scraping the site, and then more or less any language that can make an HTTP request can send those requests.
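
The submission loop described above could look roughly like this. It is a minimal sketch, assuming the Wayback Machine's "Save Page Now" address form (https://web.archive.org/save/ followed by the page URL). The 15-second delay is a guess, not a documented limit, and `submit_all` and its parameters are hypothetical names.

```python
import time
from urllib.request import urlopen

SAVE_PREFIX = "https://web.archive.org/save/"


def save_request_url(page_url):
    """Address that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def fetch_status(request_url):
    """Make the HTTP request and return its status code."""
    with urlopen(request_url, timeout=60) as response:
        return response.status


def submit_all(page_urls, fetch=fetch_status, delay_seconds=15, sleep=time.sleep):
    """Submit each URL in turn, pausing between requests.

    delay_seconds is an ASSUMED polite pause, not a known rate limit.
    Returns a dict mapping each page URL to a status code or the error raised.
    """
    results = {}
    for index, page_url in enumerate(page_urls):
        if index:  # no pause before the very first request
            sleep(delay_seconds)
        try:
            results[page_url] = fetch(save_request_url(page_url))
        except OSError as error:  # urllib's URLError and HTTPError are OSErrors
            results[page_url] = error
    return results
```

Errors are recorded rather than raised so that one failed page does not stop the run; the failed URLs can be collected from the result and retried later.
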
  • edited May 28
    I hear ya. That's just a whole lot more labor-intensive than copying the DB, and each thread would come in as a single entity, i.e. without data formatting.
  • edited May 29
    As for me, I intend to finish the archive indexes I started and copy them as text files. Is there a legal issue with the copying part?
    I don't think the verbatim exchanges are what matters. What is of value is the technology, tools and MO. It's an occasion for me to learn while collecting, and to keep the tradition alive.
    I don't believe the Internet needs Gigs of storygamers rants. On the contrary, it makes search difficult in the noise. I believe what will be forgotten is what was not worth saving.
    I don't have good visibility into the stakes and events in pages 50-100, though, so if the elders don't spill the beans or get to work, some good pages may be lost in this noise forever.
  • I think the technical lift in submitting to the Internet Archive isn't that big. In addition, I'll just be honest:

    * There's zero chance I'd hand out the database without doing a bunch of work to sanitize it and remove a lot of personal data. There's probably a near-nil chance that I'd hand out even a sanitized version, because propping up another forum site with this data doesn't seem like a great idea for a number of reasons.

    * Even if I did undertake the work of sanitization, it wouldn't actually accomplish @Paul_T's goal (which @andy and I both agree is a good goal), which is preservation. The Internet Archive is a good digital repository for long-term preservation. Another forum site isn't.



  • Yes, my suggestion went beyond mere archiving, I was thinking about how to preserve threads "in motion." But your position is totally understandable.

  • DeReel said:


    I don't believe the Internet needs Gigs of storygamers rants. On the contrary, it makes search difficult in the noise. I believe what will be forgotten is what was not worth saving.

    That’s somewhat true... but, on the other hand, I’ve found the Story Games archives to be incredibly useful over the years:

    For instance, every time I play a new game, I can do a search for that game here in the forums and some threads with truly excellent advice, house rules, and ideas come up.

    It’s remarkable how much good discussion went on and is still useful today.

    I’ve been able to find reliably good information on almost any topic by searching old threads on this forum.

  • I agree with Paul. Sometimes you don't know that you wanted something until you're looking for it.
  • edited May 29
    I'm now imagining a big-ass list of URLs, linked to a script that submits one or a batch at a time, and then marks them off. And then we crowdsource the clicking effort.

    ETA: Then I realize that's exactly what James said yesterday. Except for the crowdsourcing part. :-)
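
The "big list that gets marked off" idea could be sketched as below. This is only an illustration: the JSON checklist file and all function names are hypothetical, and it handles one person's local copy of the list, not coordination between several volunteers.

```python
import json
from pathlib import Path


def load_checklist(path):
    """Read the checklist (URL -> done flag), or start an empty one."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {}


def add_urls(checklist, urls):
    """Add new URLs as not-yet-done; URLs already listed keep their state."""
    for url in urls:
        checklist.setdefault(url, False)


def next_batch(checklist, size):
    """Return up to `size` URLs that have not been marked off yet."""
    pending = [url for url, done in checklist.items() if not done]
    return pending[:size]


def mark_done(checklist, urls, path):
    """Mark the given URLs as submitted and write the checklist back to disk."""
    for url in urls:
        checklist[url] = True
    Path(path).write_text(json.dumps(checklist, indent=2))
```

Each run would call `next_batch`, submit those URLs by whatever means, then call `mark_done`, so an interrupted session can pick up where it left off.
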
  • I would participate!
  • I was a pretty good clicker in my time. Count me in.
  • You absolutely don't want Story Games to go the way of the Gaming Outpost, which in the late 90s and very early 00s was the site of a lot of discussion by the people who then moved over to the Forge. Those discussions would have been useful to trace the intellectual history of TRPG discourse, but now they are gone. A Story Games archive will help some grad student five or ten years down the road.
  • I’ve been talking about this with a few friends, and we’re all concerned that so many valuable links will be lost.

    There are whole games and subsystems and discussions here which get linked to all over the net, as well as actual play reports (by people advertising their games, for instance).

    Will all that be gone and have to be rebuilt from scratch somehow? It seems like a real waste.
  • edited May 31
    [post removed for careful consideration while processing possible solutions]
  • I've written a scraper and am archiving the entirety of Story Games. When I'm done, I'll throw it up on GitHub.

    Lemme know if I DDOS the site or something!
  • Ooh!
  • https://github.com/jeffschecter/storygames

    Got the first year or so's raw HTML up.

    It also has the scraper script, for anyone else who wants to replicate.
  • @Jeph you beat me to it! Good on ya! And in reverse chron order? Great idea.