Online Content Is Disappearing (www.pewresearch.org)
submitted 4 months ago by [email protected] to c/[email protected]
[-] [email protected] 13 points 4 months ago

This is why we need the internet archive

[-] [email protected] 5 points 4 months ago* (last edited 4 months ago)

Yes. And wikis, too.
We (people in general) have a tendency to share stuff in forums, like Lemmy. That's fine in the short term, but in the long term this stuff should be sorted, organised, and preferably mirrored. Wikis are perfect for that, while the internet archive is more like "bulk" storage.

[-] [email protected] 5 points 4 months ago

This is why Discord is poison to our shared pool of knowledge. It's a black hole for many games and software projects (especially, ironically enough, open source ones) that use it in lieu of decent docs.

[-] [email protected] 3 points 4 months ago* (last edited 4 months ago)

Ugh!

The worst part is, after wasting a bunch of time tracking down the correct Discord server to ask a question about a piece of software, you generally get lambasted by the "regulars" of that server to "just use the search feature, that's what it's for!"

Yeah, no. I don't want to wade through a reverse chronology of a bunch of conflicting back-and-forth conversations - just gimme a FAQ or some actual documentation!!!

[-] [email protected] 2 points 4 months ago

Thanks for giving me another bone to pick against Discord ¬¬
Seriously. Fuck Discord.

[-] [email protected] 1 points 4 months ago* (last edited 4 months ago)

Wikis are not really a defense against this issue. They are by nature a secondary or (occasionally by policy) a tertiary source of information. Once the source they are recording dies, so does the value of that page on the wiki. From the OP:

54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.

[-] [email protected] 1 points 4 months ago

There's nothing intrinsically non-primary in the format. At the end of the day they're collaborative writing projects, split into pages with internal and external links; it's just that the biggest one out there happens to be tertiary.

And I believe that they could help a lot with this issue if people migrated/copied meaningful info from forums (like Lemmy) to wikis. Forums are good for discussion, but they tend to accumulate a lot of trash; having the good content sieved and sorted in a wiki makes it more accessible for everyone.

[-] [email protected] 12 points 4 months ago* (last edited 4 months ago)

Donate to the internet archive!

[-] [email protected] 2 points 4 months ago

This is important. I signed up a week ago.

[-] [email protected] 10 points 4 months ago

God bless archive.org. Fuck the turds trying to bring it down.

[-] [email protected] 3 points 4 months ago

Donate and contact your government rep

[-] [email protected] 2 points 4 months ago

Run the ArchiveTeam Warrior to help them out, donate, or pressure your government to stop it from being killed.

[-] [email protected] 10 points 4 months ago

This content has been moving from the free, accessible internet into the walled gardens of social media. We did it ourselves. Blogs and forums disappeared, copycat farms and SEO made maintaining a blog or a community forum a waste of time, and everyone is just tiktoking and looking to monetise every bit of content they put on the internet.

[-] [email protected] 9 points 4 months ago

The biggest crime against shared knowledge ever committed is photobucket fucking off with the pictures in every "how to fix this car problem" forum post.

[-] [email protected] 5 points 4 months ago

There are some old Reddit posts like this too: advice threads where the person who posted a solution went back and overwrote their comments during the boycott last year. I know why they did it, but we still lost some information in the grand scheme of things.

[-] [email protected] 3 points 4 months ago

Most of reddit was already archived before: https://the-eye.eu/redarcs/

[-] [email protected] 2 points 4 months ago

And that is why I criticized the decision every time I read about it. I got mixed responses each time, but ultimately more downvotes than upvotes.
It's also a reason I participate(d) in the ArchiveTeam Warrior Reddit project.

[-] [email protected] 2 points 4 months ago

And all the "Thanks! Took two minutes to fix after seeing your post" comments just to rub it in.

[-] [email protected] 8 points 4 months ago

This allows the ruling class to write history as they see fit.

[-] [email protected] 2 points 4 months ago

In most cases, this is because an individual page was deleted or removed on an otherwise functional website.

How is this news? I bet a lot of pages were also added in the same time frame, very likely orders of magnitude more.

[-] [email protected] 3 points 4 months ago

I’ve heard the early Internet age referred to as the future dark ages. When all the work, information and content is digitized, it’s prone to being lost to history forever.

[-] [email protected] 1 points 4 months ago

My partner works in historical archiving for science and medicine. Museum work, basically. He's told me so much of the archives are donated collections of notes, letters, journals, and so on from important doctors, researchers, scientists, etc. Donated by the subject themselves in their later years or by their families.

He's told me there is a growing issue with those people starting to donate entirely digital collections, but even worse than that, are all the documents that are not being stored on a physical hard drive, but on web services and clouds. By the time these people are willing to start donating their things, so much of it has just been deleted forever without them realizing it. Or worse, they die, and their families no longer have access.

Working in IT, I told him about Microsoft's growing push to eliminate Outlook and PST files, make it all web based email, and he wasn't surprised, but he was still bummed to hear it. Apparently a not insignificant amount of those donations are locally stored emails.

[-] [email protected] 1 points 4 months ago* (last edited 4 months ago)

Early Internet - yes, but then there's the middle Internet (or the high Internet if you like, like high Middle Ages) which was in large part scraped by archive.org, and also people generally still knew about offline backups in both eras, and then there's the late Internet, which moved to siloed services and at the same time most people using it were and are oblivious about preserving data elsewhere. That's the worst one.

[-] [email protected] 1 points 4 months ago

And those added pages were probably just as worthless as the ones they replaced.

[-] [email protected] -1 points 4 months ago

Because those pages had information that wasn't on the new pages?

Just from my own experience, WotC migrated the Magic the Gathering site to a new one, and while some articles were brought over there were a whole lot of stories, strategies and event coverage that were lost or are only available thanks to Archive.org

[-] [email protected] 1 points 4 months ago

Yes. The whole post is a trick with statistics. Web pages have a limited lifespan. You can do the same trick with human life spans.

"50% of humans that lived 60 years ago are now dead". You would tweak the numbers to be factual but something like that makes sense to me.

If you only keep the samples you started out with, of course it's going to decline over time. The data is guaranteed to not grow since nothing is ever added.
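
To put rough numbers on that argument, here is a minimal sketch. Every figure in it (the 10% yearly loss rate, the starting count, the pages added per year) is a made-up assumption for illustration, not data from the Pew report. It only shows that a fixed sample can shrink every year while the overall total still grows.

```python
def surviving_fraction(annual_loss_rate, years):
    """Fraction of a fixed starting sample still online after `years`,
    assuming a constant (hypothetical) yearly loss rate."""
    return (1 - annual_loss_rate) ** years


def total_pages(initial, annual_loss_rate, added_per_year, years):
    """Total page count when old pages are lost at the same rate
    but a constant (hypothetical) number of new pages is added each year."""
    total = float(initial)
    for _ in range(years):
        total = total * (1 - annual_loss_rate) + added_per_year
    return total


if __name__ == "__main__":
    # Hypothetical inputs: 10% of pages lost per year, over 10 years.
    print(surviving_fraction(0.10, 10))
    # Same loss rate, but 200 new pages per year on top of 1000 initial pages.
    print(total_pages(1000, 0.10, 200, 10))
```

The first number can only go down over time, because nothing is ever added to the original sample. The second depends entirely on how many pages are added, which a fixed-sample measurement does not capture.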

[-] [email protected] 2 points 4 months ago* (last edited 4 months ago)

The internet is dying. Everyone knows it. Capitalists ruined it and now AI is propping up a decaying corpse.

[-] [email protected] 2 points 4 months ago* (last edited 4 months ago)

I’ve often wondered what the implications of the internet will be for future historians. On the one hand, there is now an enormous body of writings from not just the educated elite as in the past but from all sorts of ordinary people, which is something that has never really existed before.

On the other hand, how and for how long will these writings be retained? If we stop writing things on paper, will these digital writings become completely inaccessible at some point? Could we have a situation where there are almost no writings from a certain period down the road? That would be unfortunate.

[-] [email protected] 1 points 4 months ago

A lot of stuff is already one hard drive failure away from being lost forever. Companies don't care about preserving content, so it's largely up to random people happening to have saved a copy of something for it to still exist at all.

[-] [email protected] 1 points 4 months ago

And National Libraries and similar institutions around the world, for example https://www.nb.no/en/digital-preservation/

[-] [email protected] 1 points 4 months ago

Freely licensed works will be preserved a lot better because there will be more copies of them.

Likewise the fediverse is a step in that direction: this message will be federated to hundreds of servers so is more likely to survive longer than if I posted it to reddit.

[-] [email protected] 2 points 4 months ago

It is all the more important to archive things you like if you've got the space, even if you don't plan on using them for a long time.

[-] [email protected] 1 points 4 months ago

I believe it's often because nobody runs their own website anymore but instead uses managed services, e.g. Medium. Or bits of information that would've been worth a blog post some while ago end up on sites like StackOverflow, Reddit, etc. And once these services want to monetise this content, they usually start by limiting public access.

And OTOH TikTok, Instagram Reels and YouTube Shorts are doing everything they can to further limit people's attention spans and get them addicted to those services. So the people capable of and/or interested in producing proper "content" are dwindling, too.

[-] [email protected] 1 points 4 months ago

That's pretty interesting. It looks like they define inaccessible links as URLs that return a 404 or whose server doesn't resolve.

I wonder if there are any real implications of this. We seem to know it and work around it in some cases, e.g. StackOverflow saying answers need to contain quotes from pages they reference.
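
For anyone who wants to check their own links against the definition described above (a 404 response, or a host that does not resolve), here is a rough sketch using only the Python standard library. The function names `is_inaccessible` and `check` are made up for this example, and this is a guess at the rule, not the report's actual methodology. Treating every other failure (timeouts, 5xx errors, refused connections) as "not counted" is also an assumption.

```python
import socket
import urllib.error
import urllib.request


def is_inaccessible(status=None, resolved=True):
    """Apply the narrow rule: a link counts as inaccessible only if the
    host did not resolve or the server answered with HTTP 404."""
    if not resolved:
        return True
    return status == 404


def check(url, timeout=10):
    """Fetch `url` and apply is_inaccessible() to the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_inaccessible(status=resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return is_inaccessible(status=err.code)
    except urllib.error.URLError as err:
        # A DNS lookup failure surfaces as socket.gaierror in err.reason.
        dns_failed = isinstance(err.reason, socket.gaierror)
        return is_inaccessible(resolved=not dns_failed)
    except OSError:
        # Timeouts and other network errors: not counted under this rule (assumption).
        return False
```

Calling `check("https://example.com/some/page")` returns `True` or `False` under this rule. Note that it misses "soft 404s", where a site returns a normal page saying the content is gone.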

[-] [email protected] 1 points 4 months ago

Yeah, just like most material that was ever printed or carved into a clay tablet. It's the way of things.

[-] [email protected] -1 points 4 months ago

The difference is that most of that content lasted for at least a few decades, if not centuries before being lost to time. As content on the internet is 'destroyed' if no one hosts it any more, a lot of valuable content is being lost in just a few years after being created. Archiving needs to be more widespread and better supported if the resources and culture of the internet as it has evolved over time are to be preserved for posterity.

[-] [email protected] 1 points 4 months ago

The content isn't significantly disappearing. It is being consolidated and monetized.

[-] [email protected] 0 points 4 months ago

I remember a small RPG maker game that I no longer can find on the web, let alone anything that used to be hosted on FreewareFiles or Raymond.cc...

[-] [email protected] 1 points 4 months ago

I used to be on the rpg maker forums back in the day, can you loosely describe it to me?

[-] [email protected] 2 points 3 months ago

Sorry no luck! I really wish we had an archive of that site. So much high quality original content.

[-] [email protected] 2 points 3 months ago

No worries. Someone found me the site it was originally on back in 2017, but I lost it due to a total drive failure and only started looking for it again recently.

[-] [email protected] 0 points 4 months ago

If Ian's shoelace site disappears, I'ma bounce too.

this post was submitted on 18 May 2024