Maybe search for this on Kaggle? Or scrape Wikipedia?
You don’t even need to scrape Wikipedia. Simply download a text-only Wikipedia database dump and match on the articles you need. It’s only about 20 GB, or even less for certain dumps.
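For anyone going the dump route, here's a rough sketch of what "match on articles" could look like. It assumes the standard MediaWiki XML export (e.g. a `pages-articles` `.xml.bz2` file), and it assumes company articles use an `{{Infobox company}}` template with a `website` field, which won't hold for every article. The function names and regexes are mine, not from any library.

```python
import bz2
import re
import xml.etree.ElementTree as ET

# Assumption: the infobox has a line like "| website = {{URL|example.com}}".
WEBSITE_RE = re.compile(r"^\s*\|\s*(?:website|homepage)\s*=\s*(.+)$",
                        re.IGNORECASE | re.MULTILINE)
# Loose hostname matcher; good enough for a first pass, not a URL validator.
HOST_RE = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.IGNORECASE)


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def iter_company_websites(stream):
    """Yield (article title, hostname) pairs from a MediaWiki XML export stream."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # grab the root so finished pages can be freed
    title = None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
            if "{{infobox company" in text.lower():
                field = WEBSITE_RE.search(text)
                host = HOST_RE.search(field.group(1)) if field else None
                if host:
                    yield title, host.group(1).lower()
        elif name == "page":
            root.clear()  # keep memory flat while streaming a multi-GB dump


def open_dump(path):
    """Open a bz2-compressed dump for streaming (path is whatever you downloaded)."""
    return bz2.open(path, "rb")
```

Usage would be something like `for title, host in iter_company_websites(open_dump(path)): ...`. Streaming with `iterparse` means the dump never has to fit in memory.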
Thank you for recommending Kaggle! I found just what I needed there (actually way more data than I needed!). Here's what I went with, if it's useful to anyone else: https://www.kaggle.com/datasets/mfrye0/bigpicture-company-dataset
A lot of public libraries offer access to ReferenceUSA through your library card. I vaguely remember that queries are pretty customizable on there, and exportable to various formats. Despite the generic name, it's specifically for businesses. Would that work?
Thank you for this suggestion, I had never heard of it! I was able to access it via my library and easily searched for companies with websites, but then it made me manually click which results to download (it wouldn't let me export the results as one large file). Keeping it bookmarked though, seems like a great resource for other use cases.
Could take a subset and do ICANN lookups. Not sure about doing that at scale though.
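If "ICANN lookups" is read as registration-data (RDAP) queries, a minimal sketch might look like the following. The `rdap.org` redirector URL, the one-second delay, and all function names are assumptions on my part; the `ldhName`, `events`, and `status` fields are part of the standard RDAP domain response. The fixed delay is there because registries rate-limit, which is the "at scale" problem.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: rdap.org is used as a public redirector to the right registry.
RDAP_BASE = "https://rdap.org/domain/"


def summarize_rdap(record):
    """Reduce an RDAP domain record (a dict) to the few fields we care about."""
    events = {e.get("eventAction"): e.get("eventDate")
              for e in record.get("events", [])}
    return {
        "domain": (record.get("ldhName") or "").lower(),
        "registered": events.get("registration"),
        "status": record.get("status", []),
    }


def lookup(domain, timeout=10):
    """Return a summary for one domain, or None if the server reports 404."""
    try:
        with urllib.request.urlopen(RDAP_BASE + domain, timeout=timeout) as resp:
            return summarize_rdap(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def lookup_many(domains, delay=1.0):
    """Look up a small subset sequentially, pausing between requests."""
    for domain in domains:
        yield domain, lookup(domain)
        time.sleep(delay)
```

At one request per second, a subset of a few thousand domains takes about an hour, so this only makes sense for spot checks, not the whole dataset.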