this post was submitted on 18 Dec 2024
Fediverse
[–] scrubbles@poptalk.scrubbles.tech 3 points 12 hours ago (3 children)

Anyone know:

  1. How to rip a wiki from something like Fandom and save it in a format that could be uploaded to this, and
  2. Whether that's legal in the first place?

[–] Cochise@lemmy.eco.br 4 points 9 hours ago

It's legal if credit is given and it's shared under CC-BY-SA.

https://www.fandom.com/licensing

Except where otherwise permitted, the text on Fandom communities (known as “wikis”) is licensed under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC BY-SA).

[–] Nothing4You@programming.dev 3 points 11 hours ago

you might find some inspiration from https://breezewiki.com/ - either its codebase directly or using it as an intermediary while scraping
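
Fandom wikis run on MediaWiki, so another route to the raw article source is the standard MediaWiki `api.php` query interface rather than scraping rendered HTML. A minimal sketch, assuming the wiki exposes the API at something like `https://example.fandom.com/api.php` (that URL and the function names are placeholders, so check the wiki you are targeting):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(api_base: str, title: str) -> str:
    """Build a MediaWiki API URL asking for the latest wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_base}?{urlencode(params)}"


def extract_wikitext(response: dict) -> dict:
    """Map page title -> wikitext, skipping pages that have no revisions."""
    result = {}
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing or unreadable page
        result[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return result


def fetch_wikitext(api_base: str, title: str) -> dict:
    """Network call; api_base is an assumed URL, not a verified endpoint."""
    with urlopen(build_query_url(api_base, title)) as resp:
        return extract_wikitext(json.load(resp))
```

Whatever you export this way still needs the attribution and CC BY-SA terms to travel with it.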

[–] PhilipTheBucket@ponder.cat 0 points 12 hours ago (1 children)
  1. Paste each article's raw source to ChatGPT, ask it to do it for you. If there are too many, you can automate it through the API for a negligible cost.
  2. Is it not.
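
A rough sketch of what automating step 1 could look like. `send_prompt` stands in for whatever client call you end up using (it is a placeholder, not a real library function), and the prompt wording and the `Markdown` target format are assumptions. Each article goes in its own request so one bad response can't contaminate the others:

```python
from typing import Callable, Dict

# Assumed prompt wording; adjust to taste.
PROMPT_TEMPLATE = (
    "Convert the following wiki article source to {target_format}. "
    "Keep all text verbatim and change only the markup.\n\n{source}"
)


def convert_articles(
    articles: Dict[str, str],
    send_prompt: Callable[[str], str],
    target_format: str = "Markdown",
) -> Dict[str, str]:
    """Convert each article separately; send_prompt is a hypothetical model call."""
    converted = {}
    for title, source in articles.items():
        prompt = PROMPT_TEMPLATE.format(target_format=target_format, source=source)
        converted[title] = send_prompt(prompt)
    return converted
```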
[–] ComradeMiao@lemmy.dbzer0.com 5 points 11 hours ago (1 children)

Maybe also wget the website.

I’d be careful with using “AI.” Sometimes ChatGPT makes up answers even when you provide it with the data. Source: it lies to me all the time.

[–] PhilipTheBucket@ponder.cat 1 points 10 hours ago (1 children)

Converting from one format to another, it can do like gangbusters. I wouldn't trust it to summarize stuff from its training data, and it does only a little better summarizing stuff you give it, but at mechanically finding the text and putting it verbatim into a different markup, it's pretty capable.

[–] ComradeMiao@lemmy.dbzer0.com 3 points 9 hours ago (1 children)

Even reformatting has caused me issues. My best example: I gave it 100 citations in a non-standardized format and asked for MLA. It returned 100 in MLA, but 10 of the books were made up. It deleted ten of the ones I sent, at random, and invented replacements instead of just giving me back what I sent.
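
One way to catch that kind of swap is to check that every entry coming back still shares most of its words with some entry that went in. This is only a heuristic sketch: the function names and the 0.6 threshold are arbitrary assumptions, and it would miss an invented entry that happens to reuse the same words.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased runs of 3+ letters or digits, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def flag_suspect_entries(
    inputs: List[str], outputs: List[str], threshold: float = 0.6
) -> List[str]:
    """Return output entries whose words mostly appear in no single input entry."""
    input_words = [_words(entry) for entry in inputs]
    suspects = []
    for entry in outputs:
        words = _words(entry)
        if not words:
            suspects.append(entry)
            continue
        # Best fraction of this entry's words found in any one input entry.
        best = max((len(words & iw) / len(words) for iw in input_words), default=0.0)
        if best < threshold:
            suspects.append(entry)
    return suspects
```

Anything it flags still needs a human look; it only narrows down where to check.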

[–] PhilipTheBucket@ponder.cat 1 points 9 hours ago (1 children)

Oh... yeah, you might have a point. Beyond a certain size of repeated things, it sometimes goes haywire, I've seen that.
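
If size is the trigger, the obvious workaround is to send the items in small batches instead of all at once. A minimal sketch (the helper name is made up, and the right batch size is a guess you would have to tune):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 100 citations and a batch size of 10, that is 10 requests, and each response can be checked against just the 10 items that went in.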

[–] ComradeMiao@lemmy.dbzer0.com 2 points 9 hours ago (1 children)

I didn’t consider length! That’s a good point too

Another goofy example: I asked it a Python question about a specific package I was importing and sent a big chunk of code. It answered using a package I wasn’t even importing, breaking everything. It could never figure it out either lol
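
That particular failure is easy to detect mechanically with the standard `ast` module: compare the packages imported in the code you sent against the ones in the answer. A small sketch (the function names are made up; it only looks at import statements and needs both snippets to parse):

```python
import ast
from typing import Set


def top_level_modules(code: str) -> Set[str]:
    """Collect the top-level package names imported anywhere in the code."""
    modules = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def unexpected_imports(original: str, suggested: str) -> Set[str]:
    """Packages the suggested code imports that the original code did not."""
    return top_level_modules(suggested) - top_level_modules(original)
```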

[–] PhilipTheBucket@ponder.cat 2 points 8 hours ago

Yeah, that kind of thing requires reasoning, and it goes awry almost immediately. It's still pretty useful for generating snippets of boilerplate or finding stuff in big chunks of code, but I more or less gave up on having it actually create anything nontrivial in code.