1
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/_________-- on 2024-04-09 22:15:18.

2
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/barris59 on 2024-04-09 17:18:01.

3
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/smiba on 2024-04-09 11:02:02.


Hey everyone,

If you've been using Scaleway Glacier for storing backups and files, I would greatly recommend trying to download your files right now while you still have a way of replacing them!

I've been storing backups of my pictures at Scaleway (~1.5TB) since 2020 or so, and I unfortunately needed to pull them from backup.

When I tried restoring them from GLACIER to STANDARD tier, some files did not come out of GLACIER tier.

There were also about 4 files that could not be downloaded from STANDARD tier (they did not get stuck in GLACIER, but trying to download them resulted in an unexpected EOF error).

I contacted Scaleway about this, and they confirmed some of my data has been permanently lost, and I got a whole 50 euros (in Scaleway vouchers) for it... If this was just a single file I'd say, sure, shit happens, but they lost 5 full-size pictures (a mix of .jpg and raw files)? I don't want a voucher, I want my photos lol

Looking online I found another user reporting the exact same thing happening to them:

The same issue with data that appeared to be just fine, until they tried to pull it out of GLACIER.

I've been in contact with Scaleway, and they don't want to produce a post mortem because it's an "isolated incident", even though it obviously isn't, as at least two users (me and Ruben) have spoken out about this happening to them. Considering the majority of users are probably not touching their GLACIER data, I'd argue there are likely a lot of other users affected by this without knowing it.

If you're using GLACIER, I would highly recommend trying to download all your files once. To test downloading, do it through a VM at Scaleway so you don't get charged egress fees.

It can also take up to 24 hours to get everything from GLACIER to STANDARD; you can do so through rclone.

Keep in mind "restore" fees are 0.009 euros per GB since Q3 2023.
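As a quick sanity check on cost, a sketch of the fee arithmetic (whether Scaleway bills decimal or binary gigabytes is my assumption to hedge here, so treat both figures as estimates):

```python
# Estimate Glacier-to-STANDARD restore fees at 0.009 EUR per GB.
RATE_EUR_PER_GB = 0.009

def restore_cost_eur(size_tb: float, binary: bool = False) -> float:
    """Restore cost for size_tb terabytes; binary=True counts GiB instead of GB."""
    gb = size_tb * (1024 if binary else 1000)
    return round(gb * RATE_EUR_PER_GB, 2)

print(restore_cost_eur(1.5))               # 1500 GB, decimal
print(restore_cost_eur(1.5, binary=True))  # 1536 GiB, binary
```

Either way, pulling the ~1.5TB archive described above back out costs around 13-14 euros before any egress.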

Commands:

GLACIER to STANDARD: rclone backend restore s3:bucket/ -o priority=Standard

Download to /tmp (make sure this isn't memory-backed; you may want to set up nullfs etc.): rclone copy scaleway:bucket/ /tmp/
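Once the download finishes, it's worth verifying that files actually came back intact rather than trusting a clean exit code. A minimal sketch; the manifest format (one "sha256 path" pair per line) is my own assumption, not something Scaleway or rclone produces:

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest: str, root: str) -> list[str]:
    """Return paths under root whose SHA-256 does not match the manifest,
    or which are missing entirely (e.g. truncated by an EOF error)."""
    bad = []
    for line in Path(manifest).read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        f = Path(root) / name
        if not f.is_file() or hashlib.sha256(f.read_bytes()).hexdigest() != expected:
            bad.append(name)
    return bad
```

rclone check scaleway:bucket/ /tmp/ does a similar remote-vs-local comparison, but an independent local hash list also gives you something to verify against next year.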

If you find out you have data loss, make sure to open a ticket with them and please report it here as well so I can pressure them into making a post mortem.

4
2
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Puzzleheaded-Soup362 on 2024-04-09 12:28:53.


OK, so it's really tailored for game devs with no technical knowledge, but it's easy to change bitrates and such if you know ffmpeg. Open source; you can run/build it from Node or run the prepacked exe that requires nothing.

Just give it the input and output and this tool does the rest. Supports WAV, MP3, OGG, FLAC, AIFF, and M4A, including Vorbis and Opus. Tags were a pain, so it dumps all but the common ones and loop data. Finds the closest sample rate to the input (Opus...) and converts loop data timing too!

Detailed error logs can find corruption in files that convert and play just fine. You would never know otherwise.

Multi-threading and very low overhead make this the fastest way I know to convert audio. I have tested with 25k-file batches on an old machine; the UI still doesn't lag and it will run for hours without any problem except disk space.

If you like it, please drop a star/like.

itch.io

Source and Release on GitHub

5
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/T0rg0spizza on 2024-04-09 00:09:27.


I'm working on a side-project that I literally inherited from my father a little over a year ago. Back in 1998, he was in charge of shutting down one of the GameTek studios (which is an entire story by itself). Amongst his boxes was a set of five DDS2 backup tapes and the prototype N64 cartridge of Robotech: Crystal Dreams. The game was never released, but my son and I did a dump of the prototype cartridge and released it.

I've started to focus on the backup tapes and seeing what secrets they may contain. This is very similar to the Frogger 2 restoration project. My goal is to extract the data and post it in a similar manner. I'm aware of the potential legalities, but I'm operating from a preservationist perspective. I had to do a bit of investigative work first. GameTek used a collection of Mac Quadra 840avs and PowerPC 8100/100avs (of which I have two still working), somewhere an IRIX Indigo server was in the mix, and the file servers used the NetWare File System. I had to source a DDS2-compatible tape drive and got an STD2401LW drive from eBay. I'm using a 68-pin to 50-pin SCSI adapter to my Adaptec 2940 card and put the drive into narrow mode. Using tapeimgr I was able to do a dump of the first tape and figure out that the backup software used was ARCserve 6 with a block size of 512k.

Thanks to the Internet Archive I got the closest version that I could (ARCserve 9) running on a Windows XP machine that my dad had lying around. I started a merge operation on the first tape and I've verified that indeed - this ***is*** a backup of the source code repository. However, the bad news is that the merge operation is failing about 34 minutes in, on session #3, which is where it got interesting: there were some design files for another prototype, Cajun Racing.

The errors are all media-related "unrecoverable data error" failures. Even after running a clean, I'll get E6052 Sequential Positioning Error, E6092, E3713, E3803 Invalid trailer signature, E3802 Invalid header signature, etc. The error varies by merge session. I've tried to merge the other four tapes using ARCserve, but the software reports that it can't find the 4th session, so nothing happens.

At the moment, I'm a bit stuck as to what to do. ARCserve isn't the most user-friendly piece of software and I can't figure out a way to tell it to just ignore read errors and continue onwards. I'm trying to be cautious as well since the tapes are old, despite being kept in a dark, temperature-controlled room. I was able to restore by tree for session 2 and as much as I could get from session 3, but the file sizes are all 1K (NetWare to NTFS issue?). How would I get the remaining files if I can't get past the merge error on the first tape? There was one registry hack, "DonotUseCatalogMerge", but that doesn't seem to help. I'm more than happy to provide any additional information.
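One avenue worth trying before fighting ARCserve further is a raw, error-tolerant dump of the whole tape (what dd conv=noerror,sync or ddrescue do for disks), then carving the sessions out of the image offline. A minimal sketch of the skip-and-zero-fill idea; the 512k block size comes from the tapeimgr finding above, everything else here is assumption:

```python
def salvage(read_block, out, n_blocks, block_size=512 * 1024):
    """Copy n_blocks from read_block(i) to out, zero-filling any block
    that raises IOError instead of aborting the whole dump."""
    bad = []
    for i in range(n_blocks):
        try:
            data = read_block(i)
        except IOError:
            bad.append(i)  # remember which blocks were unreadable
            data = b"\x00" * block_size
        out.write(data)
    return bad  # offsets of the unrecoverable blocks
```

With a full image on disk, a hex editor can hunt for the session header/trailer signatures ARCserve complains about without re-reading the fragile tape.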

6
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/AlternateWitness on 2024-04-08 20:30:19.

7
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/retrac1324 on 2024-04-08 16:32:09.

8
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Deep-Egg-6167 on 2024-04-07 21:21:50.


I remember reading many years ago that Samsung was working on stacked SSD storage, so their 2TB would become 4, 8, 16, 32 and 64TB in time. I'm not sure if they are still working on that tech or gave up on it. I realize you can pay a fortune for commercial SSDs, but I'd love to build my first SSD array for home use.

I have a couple of arrays now, both over 100GB, but I'd love a near-silent one that didn't require so much power or so many fans. Granted, I've slowed my fans, but still, it would be much nicer if affordable large SSDs were available.

There's always someone saying something like "consumers don't NEED this or that" - pretty sure it is up to the consumer to decide what they need. The consumer doesn't NEED a computer if you think about it, or hot showers, indoor plumbing, etc.

9
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Leeroy909 on 2024-04-07 18:40:48.


TL;DR What's the easiest way to test my backed up files against current versions for corruption and to make sure everything is there?

Evening folks, I'm looking for the easiest way to test my backup protocol on Windows by checking the backup against my current files for corruption and making sure everything is identical and up to date.

What would you suggest?

Thanks

10
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Icy_Occasion_3105 on 2024-04-06 21:10:29.


Recently I have been digitizing my old home movies, and in the process I grabbed a bunch of mid-80s to mid-90s VHS tapes my parents used to record TV shows on. With streaming and digital files now, you can pretty much find any old TV show you can think of, so the shows are not necessarily the reason to save them. But it is kind of cool to see the old commercials and even TV news broadcasts. We also have tons of complete NBA games (mostly the Chicago Bulls from their playoff runs).

Is there really any need to save this stuff? Does anyone care? Is YouTube the best place to dump it, or maybe the Internet Archive? The hoarder in me hates to just throw it out without at least saving it, but is it really useful to anyone but me? Data and physical storage are not an issue.

11
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/cheater00 on 2024-04-03 08:53:38.

12
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/endiZ on 2024-04-02 23:56:03.


Up to this point, I haven't had to purge any content as I've just added more drives (usually one 8TB drive per year). I just added a few more drives, which maxed out my chassis (12 drives). So over the next few years I either have to come up with a method to purge older ISOs or start replacing the 8TBs with larger drives... and tbh I don't like the sound of either option.

Life of a data hoarder, I know.. but what does everyone else do to get over the hump?

edit I just realized I walked into a room of alcoholics and asked for sobriety tips. Godspeed everyone.

13
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/TimothyRoderick88 on 2024-04-01 21:31:10.


Is there a way to have a personal wiki that's laid out similar to Wikipedia where I can make my own custom entries? I've been using text files up until now but I want something that looks better.

14
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Skarmory113 on 2024-04-02 03:35:09.

Original Title: We all know that SD cards and flash drives can’t reliably hold data for many years. However, can an SD card/USB drive be considered reliable after it has been unused for say nine years but then you format it?


Now I know that SD cards and USB drives can't really be guaranteed to hold data for more than a year or two without being used. But what if you find an old USB drive from nine years ago, format it, and then want to store data on it for, say, six months? Are you likely safe, even though it initially wasn't powered on for nine years? Will it still be as reliable a drive even though it sat untouched for nine years?

15
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/SirEDCaLot on 2024-04-01 09:11:39.


I was in another thread discussing how IA is a single point of failure, but that got me thinking about how to better distribute data to make it censorship-resistant or delete-resistant.

Many years ago, when people left their computers on 24/7 (before sleep worked well and power was expensive), there were screen savers: animations that would play on the screen to prevent CRT burn-in. Some were basic, some were quite elaborate and required significant CPU power. A few researchers got in on that, and the result was distributed computing projects like Seti@Home (analyzed radio signals for signs of aliens), Folding@Home (ran protein folding simulations to try and find cures for diseases), etc. There were a bunch.

As computer sleep modes started to work well and power got more expensive, these fell out of fashion, as people didn't leave their computers running 24/7. But the idea was a good one.

People like us almost all have a server or NAS running 24/7, usually with no spin-down time. We're all (hopefully) running some sort of redundant RAID. So why not have a way to donate either free array space or a hot spare drive itself? This software would ensure that the user's needs come first, and distributed data would always be dumped in favor of user data. Or, for a hot spare drive: if the main array lost a drive, the hot spare would instantly kick in and rebuild, so the real array overwrites the distributed-storage data.
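The user-comes-first policy could be as simple as this sketch (the function name and threshold scheme are made up for illustration, not part of any existing software):

```python
def donatable_bytes(free: int, reserve: int) -> int:
    """Space we may lend to the distributed pool: whatever is free beyond
    the user's own reserve. As user data grows and free space dips toward
    the reserve, the donated amount shrinks (possibly to zero), i.e.
    distributed data gets evicted before user data is ever at risk."""
    return max(0, free - reserve)
```

Re-evaluating this on every write is what makes the guarantee "100% safe": the pool only ever holds space the user was not using anyway.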


The key with this is it has to be 100% safe, and 100% brain dead easy.

I think such a system should probably start with the guts of BitTorrent or something like it so others can retrieve the data without the user needing to forward a port.

There should be a way to keep it anonymous, i.e. only allow download requests via Tor.

It should be secure: the whole thing runs in a sandbox.

And it should be user-controlled: the operating user can optionally select what sort of stuff they want to host. So for example I could say I want to host early Internet images not subject to copyright disputes, I want to allow 5 Mbps upload maximum to anyone who wants the data, I want to allocate my whole 22TB hot spare, etc. The central server would know who has copies of what and who's willing to share what with whom.


The immediate result of this could be mirroring large parts of IA itself. However the system overall could also be used by other websites or groups (including businesses) as a sort of reliable distributed storage system.

Thoughts?

16
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/moonbasemaria on 2024-04-01 21:28:30.

17
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/ArguingMaster on 2024-04-01 08:25:22.


So as many of you probably know, the Internet Archive has an extensive selection of books available through both its publicly available, fully downloadable texts and its "CDL" lending library. As many of you also likely know, in 2020 they were sued by an alliance of corporate publishers, a lawsuit which last year they lost. Appeals are ongoing, but I feel like everyone should know that the settlement isn't likely to improve; in fact, the publishers want to make it worse.

When they lost their case initially, there was a single concession the judge made in favor of the IA, which is that he limited the scope to works currently being commercially exploited by the publishers. This meant that arguably the most valuable books in the archive, those which are NOT commercially available as eBooks (and in most cases not as physical books either), are still available for the time being. The corporate lawyers were NOT happy about that, and part of their appeal specifically asks to have that exception removed. The injunction they are asking for is a complete dismantling of the IA's CDL system, meaning any book that is currently in the "Books to Borrow" library on IA would immediately become unavailable.

If there is a book in that section that you are interested in, that you think you might be interested in, or that you think might be useful to a hobby space you're in in the future - if you think you might want to access that book for any reason: GO DOWNLOAD IT NOW, DON'T WAIT.

Stop reading, go download it. There are two scripts currently available for downloading borrowed books, which download the raw page images which you can easily assemble into a PDF.

  • Option 1: This is a bookmarklet that lets you download it. It's somewhat annoying to use because you have to inspect the page source while in a certain view of the book and find a link in the code. This is what I'm using currently.
  • Option 2: That is a Violentmonkey script; I can't test it as I am a Firefox user and it only supports Chromium-based browsers, and I refuse to install that dogshit browser on my system.

Honestly: I could not give less of a fuck about the books that are commercially available as eBooks. If I want access to a book badly enough I can scrounge up $15 to go buy it (assuming it is not *ahem* available elsewhere). What concerns me is all the collectible books, obscure/very old technical manuals, limited-print-run books, etc. that are available on Archive.org, because thanks to eBay scalpers spamming listings like "VERY RARE ONLY 2 PRINT RUNS OUT OF PRINT L@@K" a lot of those books are artificially inflated to $50-100+ and I will not pay that for a book. Books are also one of the most difficult forms of media for the average person to archive. You either need an extremely expensive book scanning setup and lots of time, or to destroy the original by removing its binding and running it through an automatic document feeder. So once the IA downloads are gone, if no one else reuploads them, a lot of these are likely to just disappear from digital availability.

Ideally (and maybe there already is such a project that I am not aware of) someone would go through with a more powerful, customized ripping tool and grab everything they can from the IA. Theoretically the data storage requirements shouldn't be too insane; a PDF at a reasonable resolution is basically negligible in file size in 2024.

ONTO MY SECOND POINT: PLEASE STOP USING SOLELY ARCHIVE.ORG TO HOST YOUR PRESERVATION PROJECTS.

The number of times I see a website has gone down, and I ask "well, did anyone save the files?" and the answer is "Yeah, they are right here at *insert archive.org link*" is driving me insane. In 2024, with the ongoing legal battles and the uncertain effects they will have on the archive, the Internet Archive cannot and must not be considered a safe long-term data storage solution for unique and valuable data. As I stated, the outcomes of these legal battles are only likely to get worse. The book publishing industry obviously wants the IA to have 0 books available on its website, and US copyright law, being heavily biased towards corporate profit interests, supports them fully. The judge in the case made it very clear that if even $1 was lost from the publishers' bottom line, that outweighs any and all public interest under fair use.

Read this next sentence carefully: what I am about to say is NOT my opinion of what is right or wrong in this case, it is my (admittedly non-lawyer) interpretation of the legal situation Archive.org has brought upon itself.

Controlled Digital Lending, and the activities of the Internet Archive, are brazenly, openly illegal acts of copyright infringement. Why they ever thought this would fly in a country where corporations basically own the legal and legislative systems (I should note, I do not believe the US is a democracy of people anymore; I believe it is a democracy of corporations, so my views come from that premise) and consumer protections are basically non-existent is beyond me. IMO CDL flew under the radar for as long as it did because they intentionally limited its scope, and the negative PR associated with going after a non-profit served as a serious deterrent to potential lawsuit claimants. Over the last decade the Internet Archive expanded and accelerated that program, slowly widening the scope at which it operated, culminating in the tremendously stupid decision to implement the National Emergency Library, allowing unlimited borrowing of every eBook in the Internet Archive's collection. At that point, the IA essentially began operating as a piracy website. There was functionally no difference between it and shadypdffiles4free.biz or any of the dozens of other sources for downloading PDFs of books.

What I suspect but cannot confirm is that they knew this lawsuit was coming sooner or later, and purposefully decided to fire the opening salvo at a time when public support for such an effort would be maximized; but by the time this reached the court system the pandemic was functionally over for most people as far as impacts on their day-to-day life, and they got steamrolled by the publishing industry. What Archive.org was almost certainly hoping to achieve was a change in law to legalize their CDL concepts. IMO that was hopeless in the US, where both political parties, though indeed different in social policy, are very much on the side of neoliberal capitalist economic policy. If they had played their cards differently I think they could have flown under the radar for a good deal longer, but instead they played their hand, lost their entire bet, and are now probably coming out worse off than when they entered the game.

There are almost certainly going to be more lawsuits.

Now that the book publishers' lawsuit is nearing finalization (I don't see this making it up to the Supreme Court, and even if it does, the current Supreme Court is probably the most corporate-friendly court in history) and there has been almost nothing in the way of meaningful public outcry (no, normal people do not care about random people/bots screaming on Twitter from their moms' basements), we are going to see more lawsuits from other industries which feel like they have been harmed in some way by the Internet Archive. One which I PROMISE is coming, and I am amazed it hasn't yet, is a lawsuit from the video game publishing industry. Archive.org has, over the last decade or so, become a hub for hosting ROMs for basically every video game platform ever made. The IA, at one time, was very good about quickly removing things like Redump ROM sets, but has over the years seemingly embraced hosting them. I cannot fathom why they thought that was a good idea, or necessary. Retro gaming isn't a niche hobby anymore; it's a billion-dollar business they've put themselves firmly in the crosshairs of. Gaming corporations are some of the most litigious corporations on the face of the earth, and the kicker is these files are not in any danger at all. Literally any commercially released game for a commercially released video game platform has 10000 websites hosting those files, and those websites continue to exist because they get enough traffic to be profitable through ad revenue, are easy enough to quickly dismantle in the event of a cease and desist, and then spring back up 10 days later under a new name with a slightly different layout. The IA does not have that luxury.

What I am worried about is all the different software, computer games (ranging from the earliest Apple II games up to 1990s PC games), prototypes, etc. that are only available on the Internet Archive getting caught up in something stupid like a lawsuit from video game publishers because the IA was found to be hosting 20 different copies of every Xbox 360 game ever made. I've already seen a small-scale version of this happen when TheIsoZone imploded and took its decade-plus-old archive of digitized PC games, homebrew software, etc. with it. A lot of games ...


Content cut off. Read original on https://old.reddit.com/r/DataHoarder/comments/1bswhdj/if_there_is_a_book_on_internet_archive_your/

18
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/--Arete on 2024-03-31 13:11:08.


In honor of World Backup Day today (March 31st) I decided to share a training scenario I use every year to make sure my backups are solid. It's a worst-case scenario, but it helps to put things into perspective.

Training scenario: Ransomware

  1. You come back from vacation and access your computer only to realize you have a ransom virus that has managed to encrypt all your data.
  2. You have local backups and redundancy, but these files are also encrypted since they were connected to the computer at the time of infection.
  3. You check your offline backup on an external disk, but this backup is outdated since you haven't had time to make backups.
  4. You check your cloud backup, but the cloud backup has been overwritten with corrupted/encrypted files.
  5. You check the file history (retention) of the cloud backup. But even these files are encrypted because you were on vacation and didn't realize your computer was infected until you came back a month later.
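Step 5 is the one that catches people: retention only saves you if the window reaches back past the infection date at the moment you finally notice. A toy check of that relationship (all names are my own, not from any backup product):

```python
from datetime import date, timedelta

def clean_copy_survives(infection: date, detected: date, retention_days: int) -> bool:
    """True if, on the day you detect the ransomware and stop syncing,
    the cloud retention window still reaches back before the infection,
    so at least one pre-encryption version of each file still exists."""
    return detected - timedelta(days=retention_days) < infection
```

With a month of vacation between infection and detection, 14 or 30 days of file history is already too short; the scenario above implies retention needs to exceed your worst-case detection delay.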

What have you learned?

I would love to hear about other training scenarios or if this one could be better.

19
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/LoganJFisher on 2024-03-30 21:56:09.


3 copies of your data

2 types of media

1 of the copies being off-site

I get the point of "3" and "1", but I'm failing to understand what "2" actually does to help make my data safer.

Perhaps I'm misunderstanding what is meant by "2 types of media", so let's start there. What does that mean to you? I interpret it as meaning I shouldn't have all copies on drives (e.g. HDD or SSD), but rather should diversify and have at least one copy on a different form of media like a CD, flash drive, or tape. Is that right? What's the point of that? Why not instead just make the "1" off-site copy offline too, so there's no risk of all copies being simultaneously erased or otherwise modified?
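The three conditions of the rule as the question interprets them can be written down mechanically (a sketch; the tuple encoding is my own, and it deliberately says nothing about the offline question, which the rule itself indeed leaves open):

```python
def satisfies_321(copies: list[tuple[str, bool]]) -> bool:
    """copies is a list of (media_type, is_offsite) pairs.
    3 copies total, at least 2 distinct media types, at least 1 off-site."""
    return (len(copies) >= 3
            and len({media for media, _ in copies}) >= 2
            and any(offsite for _, offsite in copies))
```

For example, [("hdd", False), ("hdd", False), ("tape", True)] passes, while three on-site HDDs fail on both the media and off-site conditions - which shows the "2" is a separate requirement from the "1", not implied by it.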

20
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/MJtheMC on 2024-03-30 12:06:47.

21
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/piefanart on 2024-03-30 04:23:56.

22
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Free_Factor_3501 on 2024-03-29 22:11:09.

23
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/WindowlessBasement on 2024-03-29 21:06:04.

24
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/giratina143 on 2024-03-29 17:04:28.

25
1
submitted 5 months ago by [email protected] to c/[email protected]
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Impossible_Gas5151 on 2024-03-29 07:31:21.


It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 1 year ago