submitted 1 week ago by [email protected] to c/[email protected]
[-] [email protected] 19 points 1 week ago* (last edited 1 week ago)

Case-sensitive is easier to implement; it's just a string of bytes. Case-insensitive requires a lot of code to get right, since it has to interpret symbols that make sense to humans. So, something I've wondered about:

That's not hard for ASCII, but what about Unicode? Is the precomposed ç treated the same lexically and by the API as Latin small letter c + combining cedilla? Does the OS normalize all of one form to the other? Is ß the same as SS? What about alternate glyphs, like half-width or full-width forms? Is it i18n-sensitive, so that, say, E and É are treated the same in French localization? Are Katakana and Hiragana characters equivalent?
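To make those questions concrete, here's a minimal sketch using only Python's standard `unicodedata` module. The `same_name` helper is hypothetical: it shows one possible equivalence rule (NFKC normalization plus case folding), not what any particular filesystem actually does.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Hypothetical rule: compare names after NFKC normalization and case folding."""
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFKC", s).casefold()
        # Normalize again, since case folding can produce unnormalized output.
        return unicodedata.normalize("NFKC", folded)
    return key(a) == key(b)


precomposed = "\u00e7"   # ç as a single code point
decomposed = "c\u0327"   # small letter c + combining cedilla

# As raw code points (or bytes) the two spellings differ.
print(precomposed == decomposed)                                # False
# Canonical normalization (NFC) makes them identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Full case folding treats ß and SS as equal.
print(same_name("\u00df", "SS"))                                # True
# Compatibility normalization folds full-width forms into ASCII.
print(same_name("\uff21\uff22\uff23", "abc"))                   # True

# Accents and kana type are NOT folded by this rule; that would need
# locale-aware collation, which the standard library does not provide here.
print(same_name("\u00c9", "E"))                                 # False
print(same_name("\u3042", "\u30a2"))                            # False (hiragana a vs katakana a)
```

Even this one rule already makes choices that not every language would agree with, which is kind of the point.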

I dunno, as a long-time Unix and Linux user, I haven't tried these things, but it seems odd to me to build a set of character equivalences into the filesystem code, unless you're going to do all of them. (But then, they're idiosyncratic and may conflict between languages, like how ö is its own letter in the Swedish alphabet.)

[-] [email protected] 2 points 1 week ago

This thread is giving me flashbacks to the times before Unicode, when swapping files between Windows and Linux partitions would have a good chance of fucking up every non-ASCII character in their names.

There were ways to set it up so the ISO character sets would match, but it was still a giant pain to deal with different ones.

Blessed be Unicode.

[-] [email protected] 2 points 1 week ago

A related issue I still see very often, even with files newly created just this year, is extracting zip files on my Linux systems that contain non-ASCII filenames and were created on Windows systems, especially ones with apparently non-English locales like Japanese. I need to trial-and-error the locale I give to unzip, and sometimes hack together fixed names with iconv until the mojibake goes away.
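For anyone curious what that fix-up amounts to, here's a rough sketch in Python. It assumes the extracting tool decoded the name bytes as CP437 (the legacy zip default when the archive doesn't flag its names as UTF-8) and that the real encoding was Shift_JIS; both are guesses you'd have to vary per archive. `repair_name` is a made-up helper, not part of any library.

```python
def repair_name(garbled: str, real_encoding: str = "shift_jis") -> str:
    """Undo a wrong CP437 decode and redo it with the guessed real encoding."""
    # Assumption: the name was mis-decoded as CP437. Encoding back to CP437
    # recovers the original bytes, which are then decoded properly.
    return garbled.encode("cp437").decode(real_encoding)


# Simulate the problem: a Japanese name stored as Shift_JIS bytes,
# then decoded as CP437 on extraction.
original = "\u65e5\u672c\u8a9e.txt"   # 日本語.txt
garbled = original.encode("shift_jis").decode("cp437")

print(garbled != original)               # True: this is the mojibake
print(repair_name(garbled) == original)  # True: round trip restores the name
```

If the guess is wrong, the decode either raises `UnicodeDecodeError` or produces different garbage, so it's still trial and error. As far as I know, Python 3.11+ also lets you pass `metadata_encoding` to `zipfile.ZipFile` when reading, which avoids the bad decode in the first place.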

this post was submitted on 06 Sep 2024
605 points (90.3% liked)
