this post was submitted on 15 Jun 2023
189 points (100.0% liked)

Technology

This is something that keeps me up at night. Unlike other historical artefacts like pottery, vellum manuscripts, or stone tablets, information on the Internet can simply blink out of existence the moment the server hosting it goes offline. That makes it difficult for future anthropologists who want to study our history and document the different epochs of the Internet. For my part, I always try to submit any news article I read to an archival site (like archive.ph) to help collectively preserve our present so it can still be seen by others in the future.

(page 2) 30 comments
[–] [email protected] 2 points 1 year ago (6 children)

Ultimately this is a problem that's never going away until we replace URLs. The HTTP approach of locating documents by URL, i.e. server/path, is fundamentally brittle. It doesn't matter how careful you are or how much best practice you follow: that URL is going to be dead in a few years. DNS makes the problem worse, since domains cost money and expire.

There are approaches like IPFS, which use content-based addressing (i.e. fancy file hashes), but that's not enough either, as it provides no good way to update a resource.
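Roughly the idea, sketched in Python with a plain SHA-256 digest standing in for real IPFS CIDs (which also encode a codec and hash type):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself (a simplified
    stand-in for IPFS-style content addressing)."""
    return hashlib.sha256(data).hexdigest()

v1 = content_address(b"hello, world")
v2 = content_address(b"hello, world!")  # one byte changed

# The address is stable for identical bytes...
assert v1 == content_address(b"hello, world")
# ...but any edit yields a brand-new address, which is exactly why
# plain content addressing gives you no way to "update" a document:
# links to the old hash can never point at the new version.
assert v1 != v2
```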

The best™ solution would be some kind of global blockchain thing that keeps a record of what people publish, giving each document a unique ID, a hash, and some way to update that resource non-destructively (i.e. the version history is preserved). Hosting itself would still need to be done by other parties, but a global log file listing everything humans have published would make mirroring it all much easier and more reliable.

The end result should be “Internet as globally distributed immutable data structure”.
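A toy sketch of that "stable ID plus non-destructive updates" idea (all the names here are invented for illustration; a real system would need signatures, consensus, etc.):

```python
import hashlib
from collections import defaultdict

# Append-only global log: each entry maps a stable document ID to
# the hash of a new version, never overwriting older entries.
log = []
versions = defaultdict(list)

def publish(doc_id: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    log.append((doc_id, digest))      # global, append-only record
    versions[doc_id].append(digest)   # per-document version chain
    return digest

publish("my-article", b"first draft")
publish("my-article", b"second draft")

# The ID stays stable across updates, and the full history survives:
assert len(versions["my-article"]) == 2
```

A mirror operator could replay such a log to discover everything ever published and fetch each version by its hash from whoever still holds a copy.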

Bit frustrating that this whole problem isn’t getting the attention it deserves.

[–] [email protected] 1 points 1 year ago (1 children)

Even beyond what you said: even if we had a global blockchain-based browsing system, that wouldn't make it any easier to keep the content ONLINE. If a website goes offline, the knowledge and the reference are still lost; whether it's a URL or a blockchain entry, it would still point at a dead resource.

[–] [email protected] 0 points 1 year ago (1 children)

It would make it much easier to keep content online, since everybody could mirror content with close to zero effort. That's the opposite of today, where mirroring is essentially futile: all the links still refer to the original source and still turn into 404s when that source goes down. The fact that the file might still exist on another server is largely meaningless when you have no easy way to discover it and no way to tell whether it's even the right file.

The problem we have today is not storage, but locating the data.
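That "is it even the right file" part is what content addressing fixes: any copy from any mirror can be verified against the address itself, with no trust in the host. A minimal sketch (SHA-256 as the address, names invented for illustration):

```python
import hashlib

def verify_mirror(expected_hash: str, data: bytes) -> bool:
    """Check a fetched copy against its content address, so any
    untrusted mirror can serve it."""
    return hashlib.sha256(data).hexdigest() == expected_hash

original = b"some archived page"
addr = hashlib.sha256(original).hexdigest()

assert verify_mirror(addr, original)        # faithful mirror
assert not verify_mirror(addr, b"tampered") # corrupted or wrong file
```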

[–] [email protected] 1 points 1 year ago

Why would people mirror somebody else's stuff?

Maybe you'd personally do it for a small number of things you found interesting, but I don't see that happening at any wide scale.

[–] [email protected] 2 points 1 year ago

It sucks that we already have internet lost media

[–] [email protected] 1 points 1 year ago

To be realistic, we need to pick and choose what to keep and expend effort/resources on those chosen things.

Without a technological breakthrough in data storage, at some point some kind of triage has to be done. We all generate more information now than ever before, and the trend keeps accelerating. With things like AI, XR, the metaverse, or similar concepts, the amount of data we generate will only grow more extreme. It's not realistic at the moment, technologically or financially, to keep all of it in multiple geographically distributed copies, in a format that will last forever. For a lot of people or organizations it's not even feasible to keep one copy, due to cost.

To do otherwise we would need a breakthrough that enables insanely cheap, infinitely scalable storage, that is immune to corruption (physical or digital) and optionally immutable to prevent modification. It would have to function in such a way that any reasonably advanced civilization can use the basic laws of physics to figure out how it works and consume the contents without any context of what the devices are. It would also have to work regardless of how fragmented it is, to use terms of today's technology if they only find one hard drive out of what used to be a pool of 100, it still needs to work on some level.

It's an interesting thought experiment and hopefully there's some ridiculously smart people working on it.

[–] [email protected] 1 points 1 year ago

One of the most interesting aspects of historic preservation of anything is that it's an extremely new concept. The modern view of it is barely a single lifetime old, dating back to the early 20th century. Before that, historic structures were nothing but old buildings, torn down and their materials repurposed as soon as there was a better use for the land or materials. Most historic buildings dating to the 19th century and earlier are still standing not because people invested significant time and money into maintaining a historic structure as it originally was, but because people kept living, working, socializing, or worshipping in it.

Preservation is entering a very interesting new phase right now, particularly in transportation, as many of the vehicles in preservation have now spent significantly longer in preservation than they did in active service. There are locomotives preserved in the '50s and '60s whose early days in preservation are themselves now part of their history. There are new-built replicas of locomotives from a hundred years earlier that are now a hundred years old themselves. In railroad preservation there's also the challenge of steam locomotives being so old and so costly to maintain that some museums are turning to building brand-new locomotives from the original blueprints.

[–] [email protected] 1 points 1 year ago (1 children)

Other historical artefacts like pottery, vellum writing, or stone tablets

I mean, I could just smash or burn those things, and plenty of important physical artifacts were smashed and burned over the years. I don't think easy destructibility is unique to data. As far as archaeology is concerned (and I'm no expert on the matter!), fragile artefacts are not an unprecedented challenge. What's scary IMO is the public perception that data, especially data in the cloud, is somehow immune from eventual destruction. That's the impulse that leads people (myself included) to be sloppy about archiving our data, specifically by trusting the corporations that run cloud services to keep our data safe, as if out of the kindness of their hearts.

[–] [email protected] 1 points 1 year ago

Yeah, it's somewhat ironic that in the "information age" information has never been so volatile.
