this post was submitted on 08 Aug 2024

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago

cross-posted from: https://beehaw.org/post/15404535

Data: https://archive.org/details/gamefaqs_txt

Mirror upload for faster download, 1 Mbit (expires in 30 days): https://ufile.io/f/r0tmt

GameFAQs at https://gamefaqs.gamespot.com hosts user-created FAQs and documents. Unfortunately, they are embedded in the HTML pages and cannot be downloaded on their own. I scraped a lot of pages and extracted those documents as plain TXT files. Because of the sheer amount of data, I only focused on a few systems.
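A minimal sketch of the extraction step, using only Python's standard `html.parser`. It assumes (this is not stated in the post) that a document's text sits inside a `<pre>` element of the page; `PreTextExtractor` and `extract_faq_text` are hypothetical names, and downloading the pages is left out.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collects the text inside <pre> elements (assumed to hold the FAQ body)."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._depth = 0   # > 0 while the parser is inside a <pre> element
        self._buf = []    # text chunks of the current <pre> element
        self.blocks = []  # one finished string per <pre> element

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        # Ignore navigation, ads, footers etc. outside <pre>
        if self._depth > 0:
            self._buf.append(data)


def extract_faq_text(html: str) -> str:
    """Return the text of all <pre> blocks in the page, joined by newlines."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.blocks)
```

The result can then be written out as a `.txt` file. Pages whose documents are real HTML rather than preformatted text would need a different conversion.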

In 2020, a Reddit user named "prograc" archived FAQs for all systems at https://archive.org/details/Gamespot_Gamefaqs_TXTs , so most of it is already preserved. My approach differs in how the files and folders are organized. Here are a few notes about my attempt:

  • only 17 selected systems are included, so it's incomplete
  • system folders use the long name instead of the short one, e.g. Playstation instead of ps
  • similarly, game titles use their full name with spaces, and a leading "The" is moved to the end of the name for sorting, e.g. "King of Fighters 98, The"
  • in addition to the document id, the filename also contains the category (such as "Guide and Walkthrough"), the short system name (such as "(GB)") and the author's name, e.g. "Guide and Walkthrough (SNES) by BSebby_6792.txt"
  • the FAQ documents contain an additional header taken from the HTML page, including a version number, the date of the last update, the filename described above, and a web address (URL) of the original publication
  • HTML documents are also included, but with a very poor and simple conversion and only their first page, so multi-page HTML FAQs are still incomplete
  • no ZIP archives or images are included. Note: the 2020 archive from "prograc" contains misnamed .txt files that are in reality .zip and other files included by mistake, such as nes/519689-metroid/faqs/519689-metroid-faqs-3058.txt; my archive correctly excludes those files
  • I also included the same collection in an alternative arrangement, where games are listed without system folders. Because the same document is linked from many systems and was therefore downloaded multiple times, this has the side effect of removing duplicates (by system: 67,277 files vs. by title: 55,694 files)
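The naming and filtering rules in the notes above can be sketched in Python. This is an illustration under assumptions, not the author's actual script: all function names are hypothetical, the text check is a simple heuristic (ZIP signature plus NUL bytes), and duplicates are assumed to be identified by the GameFAQs document id.

```python
def sort_title(title: str) -> str:
    """Move a leading "The " to the end for sorting, e.g. "King of Fighters 98, The"."""
    if title.startswith("The "):
        return title[len("The "):] + ", The"
    return title


# Local file header signature that a normal ZIP archive begins with.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_text(head: bytes) -> bool:
    """Heuristic check on the first bytes of a file named *.txt (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return False  # a ZIP archive that was renamed to .txt
    # NUL bytes almost never appear in plain ASCII/Latin-1 text,
    # but are common in images and other binary files
    return b"\x00" not in head


def dedupe_by_doc_id(entries):
    """Keep the first (title, doc_id) seen for each document id.

    `entries` is an iterable of (system, title, doc_id) tuples; this layout is a
    hypothetical stand-in for however the scraped index is actually stored.
    """
    seen = set()
    unique = []
    for _system, title, doc_id in entries:
        if doc_id not in seen:
            seen.add(doc_id)
            unique.append((title, doc_id))
    return unique
```

For example, `sort_title("The King of Fighters 98")` returns `"King of Fighters 98, The"`, while `"Theme Park"` is left unchanged because only the whole word "The" followed by a space is moved. Reading just the first few kilobytes of each file is enough input for `looks_like_text`.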