NFG @NFG

OK let's see if this works.

LINUX QUESTION!

I'm migrating a 15 year old Windows webserver to Linux. That's 15 years of not caring about case sensitivity in filenames, which I predict will result in endless 404s when nginx & Linux take over.

What's a nice elegant way to handle this, short of editing and renaming 15 years of stuff until the 404s go away?


@NFG There should be some quality bulk file naming tools out there.

If you're feeling really IT adminy you could put together a script in Python/Ruby/anything that's NOT Bash and walk the file system looking for collisions.

I'd recommend a little googling to see what already exists.

You could also toss your hands up and just eyeball the logs in nginx for a few years 😉
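
A minimal Python sketch of the "walk the file system looking for collisions" idea. The function names `walk_paths` and `case_collisions` are made up for illustration, and the sketch assumes plain `str.lower()` is a good enough notion of "same name", which may not hold for non-ASCII filenames.

```python
import os
import sys
from collections import defaultdict


def walk_paths(root):
    """Yield every file and directory under root, relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def case_collisions(paths):
    """Group paths that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Only lower-cased keys shared by more than one real path are collisions.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for group in case_collisions(walk_paths(root)):
        print(" <-> ".join(group))
```

A tree copied straight off Windows can't contain such collisions, so this is mostly useful as a check before and after any bulk rename on the Linux side.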

@kemonine @NFG Quick and dirty: stackoverflow.com/questions/15

... but this solves only half of the problem. It won't fix broken links in your web pages.

@thomas @NFG I'd STRONGLY recommend NOT using regex to fix up any URLs in HTML code...

I've attempted it before and, well, you don't want to do that.

I'd look at Beautiful Soup and other libraries to rework the HTML links. They cover a LOT of corner cases you won't want to be figuring out as you fix up URLs.
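
To illustrate the parser-over-regex point, here is a small sketch that collects link targets with Python's standard-library `html.parser`, swapped in for Beautiful Soup only so the example has no third-party dependency. `LinkCollector` and `collect_links` are hypothetical names, and looking only at `href` and `src` is an assumption about which attributes matter on this site.

```python
from html.parser import HTMLParser

# Assumption: only these attributes carry links worth checking.
LINK_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Collect href/src values from real tags, ignoring comments and text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases attribute names and strips the quotes.
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.links.append(value)


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The parser already copes with upper-case attribute names, single or double quotes, and links that only appear inside HTML comments, which are the kind of corner cases a hand-written regex tends to miss.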

@thomas Sharing my non-recursive mvlow.sh:

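#!/bin/sh
# Lower-case the names of regular files in the current directory only.

for f in ./*; do
    # Skip directories and anything else that is not a regular file.
    [ -f "$f" ] || continue
    t=$(printf '%s\n' "$f" | tr '[:upper:]' '[:lower:]')

    if [ "$f" = "$t" ]
    then
        echo "$f is already lower-case."
    elif [ -e "$t" ]
    then
        echo "$t already exists."
    else
        # Quoting keeps names with spaces in one piece.
        mv -v "$f" "$t"
    fi
done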

It shouldn't be that hard to extend it to rename recursively.
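
One way the recursive version could look, sketched in Python rather than sh. `lowercase_tree` and its `dry_run` flag are made-up names, and the sketch assumes it runs on a case-sensitive file system such as the Linux target. It walks bottom-up so that a directory is only renamed after everything inside it has been handled.

```python
import os


def lowercase_tree(root, dry_run=True):
    """Lower-case every file and directory name under root, deepest first.

    Returns the list of (source, destination) renames that were planned.
    """
    planned = []
    taken = set()  # destinations already claimed during this run
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            lower = name.lower()
            if name == lower:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, lower)
            if os.path.lexists(dst) or dst in taken:
                print(f"{dst} already exists, skipping {src}")
                continue
            taken.add(dst)
            planned.append((src, dst))
            if not dry_run:
                os.rename(src, dst)
    return planned
```

Calling it with the default `dry_run=True` first only reports what would be renamed, which seems prudent on 15 years of content.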

@NFG Back in the Apache days, there was a mod for that. nginx may have an equivalent, maybe searching on nginx mod_speling might help? httpd.apache.org/docs/current/

@zigg @NFG if you go with case insensitivity on the URL side won't that limit options going forward?

@kemonine @NFG Technically, yes, but I wouldn't really care to serve content containing both "fOO" and "Foo" referring to entirely different resources

@zigg @NFG I agree, probably not something too concerning.

However, if there are php-fpm or uwsgi apps that get lit up under the hood or behind the nginx config, case sensitivity will likely be a lot more important suddenly.

Might be smart to migrate to a more typical case sensitive approach and front-load the pain as part of the migration rather than being bitten in some creative way further down the line.

@kemonine @NFG I wouldn't expect a mod that could find close matches for static content out of a directory of files to interfere with the operation of any proxy-style connector.

Mind, I won't say it could never happen 😄 but I'd consider that buggy behavior. Trying to fix casing and spelling in requests requires a list of possibilities to match to, something that a directory full of static content supplies and a proxy-style connector does not.

@kemonine @zigg

Ultimately I think the fix has to be part of the 404 handling, otherwise it'll be a massive overhead and/or time vampire changing every file and link everywhere.

I guess I just have to do the migration and see how bad the problem will actually be.

@NFG @zigg That might work. I can see that being somewhat resource-intensive over time.

Could you bake a rename and re-serve operation into the search? It'll prevent problems going forward and self-heal in time.

@kemonine @zigg

The problem with this self-healing is that when a file is linked under two different names, it'll be renamed back and forth forever.

Which is not really a problem as such, I honestly don't expect traffic to be so demanding it'll cause trouble... 😊

@NFG @zigg Fair point, however you could keep an eye out for flapping and just symlink / update a link at that point. I'd imagine that the flapping would be pretty minimal relative to just case insensitive problems.

@zigg
@kemonine
@galaxis
@thomas

Thank you all for the suggestions, you've been incredibly helpful. <3

I suspect the number of things to fix will be fewer if I don't slam every file into lower case, because that's going to break -every- mixed case link on dozens of scripts and platforms.

Perhaps something in the 404 processing that scans the dir into an array, then strtolowers both the request and the filenames, and returns the match..?
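
A rough Python sketch of that lookup, not tied to any particular 404 handler. `resolve_case_insensitive` is a hypothetical name. The sketch assumes the request path has already been URL-decoded, and it gives up when two entries match equally well.

```python
import os


def resolve_case_insensitive(root, request_path):
    """Map a request path to an on-disk path, ignoring case, or return None."""
    current = root
    for part in request_path.strip("/").split("/"):
        if part in ("", ".", ".."):
            return None  # refuse empty or directory-traversal segments
        try:
            entries = os.listdir(current)
        except OSError:
            return None  # missing directory, or a file used as a directory
        if part in entries:
            match = part  # an exact match always wins
        else:
            wanted = part.lower()
            candidates = [e for e in entries if e.lower() == wanted]
            if len(candidates) != 1:
                return None  # nothing found, or ambiguous
            match = candidates[0]
        current = os.path.join(current, match)
    return current
```

Each miss costs one directory listing per path segment, so caching the results or logging them for a later rename pass may be worth considering if the 404 volume turns out to be high.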

@NFG The solutions I have seen usually use the nginx Perl module to rewrite the request URL to all lowercase (provided you have renamed all the files accordingly and there are no collisions).

@galaxis What he said ^

Run rename 'y/A-Z/a-z/' * to rename everything to lower case, then use the Perl module to do something ugly, like the Stack Overflow examples at stackoverflow.com/questions/36.

Then fix it properly. This might work, but it's a dirty ugly hack, and dirty ugly hacks always break eventually.