this post was submitted on 02 Oct 2024
10 points (81.2% liked)

Hey, how would you go about this:

I have, let’s say, hundreds of files, most of which contain emoji characters in their filenames. How would I script going through all these files and removing those … idiotic characters? Or if an app can do it, great!

Not for nothing but yeah Unicode is great, lots of languages yada yada, but emojis? Fucking emojis??? Ian Malcolm thinks just because we could, doesn’t mean we should!

So yeah. Back in the day when I did some developing in VB, I guess I’d load the filenames into memory as strings and then do an instring replacement to null of any character that is within the char() range. So… if I could find out the range wherein lay the damned-to-hades-for-eternity emoji character set, I’d null those out or replace them each with an E for Evil.
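The range-replacement idea above can be sketched in Python. This is only a minimal sketch under stated assumptions: the names `EMOJI_RANGES`, `INVISIBLE_MODIFIERS`, and `strip_emoji` are made up for illustration, and the ranges cover the main emoji blocks rather than every pictographic character, so some symbols may slip through.

```python
# Code point ranges of the main emoji-bearing Unicode blocks.
# Assumption: this list is illustrative and not exhaustive.
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
    (0x1F1E6, 0x1F1FF),  # Regional indicator symbols (flag pairs)
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
]

# Invisible code points that glue emoji sequences together. They are dropped
# outright, never replaced. Caveat: the zero-width joiner is also used
# legitimately in some scripts.
INVISIBLE_MODIFIERS = {0x200D, 0xFE0F}


def strip_emoji(name: str, replacement: str = "") -> str:
    """Return name with every character inside EMOJI_RANGES replaced."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp in INVISIBLE_MODIFIERS:
            continue
        if any(lo <= cp <= hi for lo, hi in EMOJI_RANGES):
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)
```

Calling `strip_emoji(filename)` removes the matched characters, and `strip_emoji(filename, "E")` swaps each one for an E instead.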

So… anyone know the easy approach to scripting this? Or is there an app that will already do it?

I’m gonna look through all the options in krename, kfind, and the like, but I doubt any of them has this.

Anyway thanks if you have any ideas. Especially something I can save and just use on a directory of files anytime.

top 6 comments
[–] [email protected] 5 points 1 month ago (1 children)

I've got a couple sources that can probably explain it better than I can, but here's the gist:

  • Walk through each file you want to read
  • Walk through each line you want to read
  • Replace emojis using RegEx

The tricky part seems to be defining the range of characters you want to replace. Most emoji are in a particular block of Unicode, but there could be a few outliers, particularly if you want to remove kaomoji.
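As a rough illustration of the walk-and-replace steps, here is a minimal Python sketch. Since the question is about filenames rather than file contents, it applies the regex to each name in a directory; the same `EMOJI_RE.sub` call would work on each line of a text file. The pattern, `EMOJI_RE`, and `rename_without_emoji` are assumptions made up for this example, and the character class covers the main emoji blocks only, not kaomoji or every outlier.

```python
import re
from pathlib import Path

# Assumed character class: main emoji blocks plus the invisible joiner
# and variation selector that hold emoji sequences together.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U0001F1E6-\U0001F1FF"  # Regional indicator symbols
    "\u2600-\u27BF"          # Miscellaneous Symbols and Dingbats
    "\uFE0F\u200D"           # Variation selector-16, zero-width joiner
    "]+"
)


def rename_without_emoji(directory, dry_run=True):
    """Return (old_name, new_name) pairs for files whose names contain emoji.

    Files are renamed on disk only when dry_run is False.
    """
    changes = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = EMOJI_RE.sub("", path.name)
        if new_name == path.name:
            continue  # nothing matched
        target = path.parent / new_name
        if not new_name or target.exists():
            continue  # skip rather than clobber an existing file
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

Running it with the default `dry_run=True` first shows what would change; passing `dry_run=False` performs the renames. Names that would collide with an existing file, or end up empty, are left alone.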

Here's a Stack Overflow link where there's some discussion on what that range will be.

Here's a Python script that shows how you might do it for one file.

Good luck!

[–] [email protected] 4 points 1 month ago

That Python script is awesome. Thanks for sharing. I might try to add that to my LLM text post processor.

It is nice to make an AI assistant that is more conversational, but if it adds a single emoji, all bets are off. The LLM emoji cancer is terminal once started.

[–] [email protected] 3 points 1 month ago* (last edited 1 month ago) (1 children)

Any renamer can do this with a trivial regular expression. What have you tried so far?

[–] [email protected] 0 points 1 month ago

I haven’t tried anything yet because I don’t know exactly how to go about it. I described the ideas I think could conceivably work, but I don’t know the specifics to make it happen.

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

See if the command line utility “detox” removes it. I use it in a bash file that tidies everything up for me before I do my daily backup.

Go to the directory you want to clean up and type:

detox *

If you don't want to clean up everything, you can limit it. For example, if I only want to clean up PDF files, I'd do:

detox *.pdf

Detox replaces all the ugly characters with underscores. I personally hate underscores for most everything and use dashes instead. So after detox runs I also run:

rename 'y/_/-/' *

Then, because I want every file named only with lowercase letters, I also run:

rename 'y/A-Z/a-z/' *
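For anyone who would rather fold those two transliteration steps into a script, here is a minimal Python sketch of the same character mapping. The name `tidy_name` is hypothetical, and the function only computes the new name; it does not touch the disk or reproduce what detox itself does.

```python
import string

# One translation table covering both steps: underscore to dash, and
# ASCII uppercase to lowercase (non-ASCII letters are left untouched,
# matching the A-Z range in the rename expression).
_TIDY_TABLE = str.maketrans(
    "_" + string.ascii_uppercase,
    "-" + string.ascii_lowercase,
)


def tidy_name(name: str) -> str:
    """Return name with underscores turned into dashes and A-Z lowercased."""
    return name.translate(_TIDY_TABLE)
```

For example, `tidy_name("My_Report_2024.PDF")` gives `my-report-2024.pdf`.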

(edit to remove duplicate info)

[–] [email protected] 1 points 1 month ago

Detox. Cool I’ll check it out. Thank you!