Not sure this is the right sub for this, but I figured this community would appreciate the purpose of the script.
I recently downloaded ~80k epubs (one big zip folder, no way to pre-select what I wanted). I didn't want to keep ALL of them, but I also didn't want to go through them one by one. I spent the last few days working with ChatGPT to get a working script, and now I want to make it more efficient. Right now it takes about 3 hours to process 1,000 books (roughly 11 seconds per book), so all 80k would take around 10 days at that rate.
In the readme I outline the flow of the script. It uses an LLM to clean up the filenames, searches Goodreads with the cleaned titles, parses the genres from the results, and saves them to txt files. Those txt files are then used by a separate GUI script to filter, delete, and move the epubs by genre.
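Very roughly, the per-book flow looks something like this (a simplified sketch; the helper names here are placeholders, not the actual functions from the repo):

```python
# Rough sketch of the per-book flow described above.
# clean_title() and lookup_genres() are placeholders, not the repo's real code.
from pathlib import Path


def clean_title(raw_name: str) -> str:
    # Placeholder for the LLM cleanup step; here it just drops the extension
    # and swaps underscores for spaces.
    return Path(raw_name).stem.replace("_", " ")


def lookup_genres(title: str) -> list[str]:
    # Placeholder for the Goodreads scrape (the real script uses Selenium +
    # BeautifulSoup for this part).
    return ["fantasy"]


def process_folder(epub_dir: Path, txt_dir: Path) -> None:
    txt_dir.mkdir(parents=True, exist_ok=True)
    for epub in epub_dir.glob("*.epub"):
        title = clean_title(epub.name)
        genres = lookup_genres(title)
        # One txt file per book; the separate GUI script later filters,
        # deletes, and moves the epubs based on these genres.
        (txt_dir / f"{epub.stem}.txt").write_text("\n".join(genres), encoding="utf-8")
```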
From what I can tell, the main slowdown comes from the way the Selenium webdriver and BeautifulSoup are being used.
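To make that concrete, here's a hypothetical sketch of the kind of per-title loop I suspect is eating the time. This is not the actual grsearch.py code, and the Goodreads CSS selectors are guesses; the point is that each book costs two full browser page loads plus parsing:

```python
# Hypothetical sketch of the suspected bottleneck: one Selenium-driven
# Goodreads lookup per title. Selectors are guesses, not the real script.
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from selenium import webdriver


def genres_for_title(driver: webdriver.Chrome, title: str) -> list[str]:
    # Full browser page load #1: the Goodreads search results
    driver.get(f"https://www.goodreads.com/search?q={quote_plus(title)}")
    soup = BeautifulSoup(driver.page_source, "html.parser")
    first = soup.select_one("a.bookTitle")  # guessed selector for the top result
    if first is None:
        return []

    # Full browser page load #2: the book page itself
    driver.get("https://www.goodreads.com" + first["href"])
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Guessed selector for the genre links on the book page
    return [a.get_text(strip=True) for a in soup.select("a.bookPageGenreLink")]


if __name__ == "__main__":
    driver = webdriver.Chrome()
    try:
        print(genres_for_title(driver, "The Hobbit"))
    finally:
        driver.quit()
```

If each `driver.get()` takes a few seconds, two of them per book would account for most of the ~11 seconds/book I'm seeing, which is why I'm hoping there's a better way to structure this part.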
Here is the GitHub repo -
https://github.com/secretlycarl/epub_filter_tool
And the specific file I'm looking for advice on - https://github.com/secretlycarl/epub_filter_tool/blob/main/grsearch/grsearch.py