The story behind fixnames.sh is worth telling. While I was working as a network and system administrator in a small IT company near Lausanne, my boss informed me that I was to stay for an entire weekend: from Friday morning to Sunday evening, and possibly until Monday morning, because a client had decided to switch from MacOS systems to Windows. While nobody in their right mind would ever want to do this, the boss’s plan was to use the MacOS file manager to drag and drop files from one network share to another, after asking the client to stop all activity from Friday noon onwards. The reason it would take so long is that most employees were basically computer illiterate, had no file naming convention whatsoever, and would frequently insert pseudo-random characters that made M$ Windows go berserk (it’s quite restrictive about filenames). Apple’s file manager (at the time, Windows Explorer behaved the same way) would just throw an obscure “copy failed” error and halt with no further details, so the boss’s procedure was to enter the directory recursively (and manually) and copy each and every file and subdirectory until he found which filename was causing the error.
My heart almost stopped.
At this point, it was Monday and I had a few days ahead, so I told him I was putting aside all my current tasks to write a script to automate that. He was very dubious and tried to object, but I told him I would refuse to stay for the weekend if I didn’t get a chance to try.
So over the next few days, I wrote this script. What took me the longest was running it on a copy of the client’s 12k or so files and discovering the absurdities users would come up with (such as duplicating files, which this script sort of fixes) so I could write a suitable regex.
On Thursday afternoon, after a final check run on other clients’ backups, the script was ready and demonstrated to the boss: more than 12k files were checked and renamed at a rate of around 1k files a minute, freeing some disk space along the way. From then on, my boss never objected to my suggestions, and he made me join the dev team.
The script follows; its features are:
- recursively run a regexp on dirnames, then filenames to rename them
- check for collisions
- delete file duplicates
- use an alternate name if the content is not the same
- dry-run option (with some limitations)
- verbose output and logging
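The collision handling listed above can be sketched as a small decision function. This is a simplified illustration and not an excerpt from fixnames.sh: `normalize_name` and `plan_rename` are hypothetical names, and the normalisation only imitates the simplified rules the script ends up using (lower-case, spaces turned into underscores, trailing underscore dropped).

```shell
# Hypothetical helpers for illustration only; they are not part of fixnames.sh.

# Print a cleaned-up version of a single name (assumed rules: lower-case,
# runs of spaces become one underscore, no trailing underscore).
normalize_name() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/  */_/g' -e 's/__*/_/g' -e 's/_$//'
}

# Decide what to do with one rename from $1 (current path) to $2 (wanted path).
# Prints one of: keep, rename:<path>, duplicate:<path>, alt:<path>_, error
plan_rename() {
  src=$1
  new=$2
  if [ "$src" = "$new" ]; then
    echo "keep"                       # nothing to fix
  elif [ ! -e "$new" ]; then
    echo "rename:$new"                # no collision
  elif [ -f "$src" ] && [ -f "$new" ] && cmp -s "$src" "$new"; then
    echo "duplicate:$new"             # same content, one copy can go
  elif [ ! -e "${new}_" ]; then
    echo "alt:${new}_"                # different content, use the alternate name
  else
    echo "error"                      # the alternate name is taken as well
  fi
}
```

For a file `Report Final.doc`, the wanted name would be computed with `normalize_name 'Report Final.doc'` and then passed to `plan_rename` together with the current path. A dry run only has to print the decision instead of acting on it.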
Script content
(the above title was added because of an issue while parsing the current page’s .md file)
#!/bin/bash
#
# superb magic script to fix file names that can cause problems
# with Windows but are valid on mac
#
# created for netatalk -> samba migrations
#
# author: david.lutolf@adbin.ch
# date: 2008-08-20
# licence: GPL v2
# changes:
# 2008-08-21, david@adbin.ch
#   small fix for spaces at the end of a name
# 2008-08-22, david@adbin.ch
#   use $REPLY for spaces at the end of a name
#   replace ' with `
#   --pretend option
#   remove useless logs
#   check whether the destination exists (stopping for directories)
#   pretty colours for the output
# 2008-08-25, david@adbin.ch
#   multiple spaces
#   try an additional name if the first one is unavailable
#   check whether files with a similar name are identical
# 2009-06-21, david@lutolf.net
#   fix the error and file counters
#
#
# arguments:
#   --pretend  only simulate?
#   target directory to recurse in
#
# limitations/bugs:
#   problems if the file name consists only of an invalid character
#   the --pretend option will output more results than a real run, simply because
#   invalid but unmodified dir names will get detected in file paths
#   does not replace \
#   does not check for case-insensitive name duplicates (on win, Foo = foo)
#   (use casefix.sh for that) - actually I think it does, please check
#   path given in argument must not contain spaces or script will break
#   the target directory's name must not be changed by the rules (eg. Tmp > tmp)
#
# list of things that cause problems in names
# characters to replace with -
#   :2f \
# characters to remove:
#   ? * ' : <leading/trailing spaces> <trailing .> <multiple spaces>
# odd characters that must be left as they are
#   :2e :2f2e
# characters that cause problems:
#   \
# string used by sed for the substitutions. NOT CURRENTLY USED, EDIT THE STRING FURTHER DOWN IN THE CODE
SEDARGS1="-e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/\ \ */\ /g -e s/\!//g -e s/\?//g -e s/[*]//g -e s/://g -e s/[\ ]$// -e s/[.]*$// -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g"
#SEDARGS1="-e s/:2f2e/KEEP_2F2E_KEEP/g"
#SEDARGS1="-e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/!//g -e s/\?//g -e s/[*]//g -e s/://g -e s/[\ ]$// -e s/[.]*$// -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g"
# in case a file name already exists, we try with this instead: (most characters are replaced with _ rather than removed)
SEDARGS2="-e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/\ \ */\ /g -e s/!//g -e s/\?/_/g -e s/[*]/_/g -e s/:/_/g -e s/[\ ]$/_/ -e s/[.]*$/_/ -e s/[\/][\ ]/\\\_// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g"
#SEDARGS2="-e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/!/_/g -e s/\?/_/g -e s/[*]/_/g -e s/:/_/g -e s/[\ ]$/_/ -e s/[.]*$/_/ -e s/[\/][\ ]/\\\_// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g"
#echo "$SEDARGS1"
#echo "$SEDARGS2"
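if [ $# -lt 1 ] || [ $# -gt 2 ]
then
  # usage goes to stderr so it never ends up in the redirected log
  echo "usage: fixnames [--pretend] target > logfile" 1>&2
  exit 1
fi
# quote $1 so that an empty or spaced argument does not break the test
if [ "$1" = '--pretend' ]
then
  PRETEND=true
  TARGET=$2
else
  PRETEND=false
  TARGET=$1
fi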
# files used for the conversion
TMPLIST=/tmp/fixnames_tmp
ERRLOG=/tmp/fixnames.err
TIMESTART=`date +%s`
# step 1: start by looking for directories recursively
echo -e "\033[1;33m*\033[0;37m starting..." 1>&2
DEPTH=1
DIRTOT=0
DIRMOD=0
DIRERR=0
DIRSIM=0
echo -ne "\033[1;33m*\033[0;37m processing directories, level: " 1>&2
# grep fails on empty input, which ends the loop once no directory is left at this depth
while find "$TARGET" -maxdepth $DEPTH -mindepth $DEPTH -type d | grep . > $TMPLIST
do
  echo -n "$DEPTH " 1>&2
  # -r keeps backslashes in names intact; the line ends up in $REPLY
  while IFS= read -r
  do
    DIRTOT=`expr $DIRTOT + 1`
    #NEWNAME=`echo "$REPLY" | sed "$SEDARGS1"`
    # ORIGIN/WORKING: NEWNAME=`echo "$REPLY" | sed -e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/\ \ */\ /g -e s/:2f/-/g -e s/\?//g -e s/[*]//g -e s/://g -e s/[\ ]$// -e s/[.]*$// -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g`
    # very simplified version:
    NEWNAME=`echo "$REPLY" | sed -e s/__*/_/g -e s/\ \ */\ /g -e s/\ /_/g -e s/[A-Z]/"\L&"/g -e s/_$//`
    #'` # color fix
    # only run the following tests if the name was modified
    if [ "$REPLY" != "$NEWNAME" ]
    then
      # check whether a directory with the same name exists
      if test -d "$NEWNAME"
      then
        # the name already exists, try the alternate name
        DIRSIM=$(($DIRSIM+1))
        #NEWNAME=`echo "$REPLY" | sed "$SEDARGS2"`
        #NEWNAME=`echo "$REPLY" | sed -e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/\ \ */\ /g -e s/\?/_/g -e s/[*]/_/g -e s/://g -e s/[\ ]$/_/ -e s/[.]*$/_/ -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g`
        NEWNAME=`echo "$REPLY" | sed -e s/__*/_/g -e s/\ \ */\ /g -e s/\ /_/g -e s/[A-Z]/"\L&"/g`_
        #'` # color fix
        if test -d "$NEWNAME"
        then
          # the alternate name exists as well (very unlikely)
          echo -e "\n\033[0;31mE: could not move '$REPLY'\033[0;37m" 1>&2
          echo "alt name already exists for '$REPLY'" >>$ERRLOG
          DIRERR=`expr $DIRERR + 1`
        else
          # ok, we can rename
          if [ $PRETEND == true ]
          then
            echo "'$REPLY' -> '$NEWNAME'"
            DIRMOD=`expr $DIRMOD + 1`
          else
            if mv -v "$REPLY" "$NEWNAME" 2>>$ERRLOG
            then
              DIRMOD=`expr $DIRMOD + 1`
            else
              DIRERR=`expr $DIRERR + 1`
            fi
          fi
        fi
      else
        # ok, rename
        if [ $PRETEND == true ]
        then
          echo "'$REPLY' -> '$NEWNAME'"
          DIRMOD=`expr $DIRMOD + 1`
        else
          if mv -v "$REPLY" "$NEWNAME" 2>>$ERRLOG
          then
            DIRMOD=`expr $DIRMOD + 1`
          else
            DIRERR=`expr $DIRERR + 1`
          fi
        fi
      fi
    fi
  done < $TMPLIST
  # stop if some directories still have to be fixed by hand
  if [ $DIRERR != 0 ] && [ $PRETEND != true ]
  then
    break
  fi
  DEPTH=`expr $DEPTH + 1`
done
# step 2: file names
FILEERR=0
FILECUR=0
FILEMOD=0
FILEREM=0
FILESIM=0
if [ $DIRERR == 0 ] || [ $PRETEND == true ]
then
  echo -en "\n* processing files: " 1>&2
  find "$TARGET" -type f > $TMPLIST
  # step 3: rename the files by parsing the list of names
  FILETOT=`wc -l $TMPLIST | cut -f 1 -d ' '`
  while IFS= read -r
  do
    #NEWNAME=`echo "$REPLY" | sed "$SEDARGS1"`
    # ORIGINAL/WORKING: NEWNAME=`echo "$REPLY" | sed -e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/\ \ */\ /g -e s/\?//g -e s/[*]//g -e s/://g -e s/[\ ]$// -e s/[.]*$// -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g`
    # very simplified version:
    NEWNAME=`echo "$REPLY" | sed -e s/__*/_/g -e s/\ \ */\ /g -e s/\ /_/g -e s/[A-Z]/"\L&"/g -e s/_$//`
    #'` # color fix
    # only run the following tests if the name was modified
    if [ "$REPLY" != "$NEWNAME" ]
    then
      if test -f "$NEWNAME"
      then
        # the name already exists, check whether the files are identical
        FILESIM=$(($FILESIM+1))
        #echo -ne "\n - diff '$REPLY' '$NEWNAME'" >&2
        if diff "$REPLY" "$NEWNAME" > /dev/null
        then
          # same content: drop the badly named copy and keep the one that
          # already has a valid name; a dry run only reports the duplicate
          # echo " SAME FILE, REMOVING" >&2
          if [ $PRETEND == true ]
          then
            echo "'$REPLY' is a duplicate of '$NEWNAME'"
          else
            rm "$REPLY"
          fi
          FILEREM=$(($FILEREM+1))
        else
          # not the same content, try the alternate name
          # echo -n " DIFFERENTS!" >&2
          #NEWNAME=`echo "$REPLY" | sed "$SEDARGS2"`
          #NEWNAME=`echo "$REPLY" | sed -e s/:2f2e/KEEP_2F2E_KEEP/g -e s/:2e/KEEP_2E_KEEP/g -e s/:2f/-/g -e s/\ \ */\ /g -e s/\?/_/g -e s/[*]/_/g -e s/://g -e s/[\ ]$/_/ -e s/[.]*$/_/ -e s/[\/][\ ]/\\\// -e s/\'/\\\`/g -e s/[\\\]/-/g -e s/KEEP_2F2E_KEEP/:2f2e/g -e s/KEEP_2E_KEEP/:2e/g`
          #'` # color fix
          NEWNAME=`echo "$REPLY" | sed -e s/__*/_/g -e s/\ \ */\ /g -e s/\ /_/g -e s/[A-Z]/"\L&"/g`_
          if test -f "$NEWNAME"
          then
            # the alternate name exists as well (very unlikely)
            echo -en "\r\033[0;31mE: could not move '$REPLY'\033[0;37m\n" 1>&2
            echo "alt name already exists for '$REPLY'" >>$ERRLOG
            FILEERR=`expr $FILEERR + 1`
          else
            # echo " ALTERNATIVE OK!" >&2
            # ok, we can rename
            if [ $PRETEND == true ]
            then
              echo "'$REPLY' -> '$NEWNAME'"
              FILEMOD=`expr $FILEMOD + 1`
            else
              if mv -v "$REPLY" "$NEWNAME" 2>>$ERRLOG
              then
                FILEMOD=`expr $FILEMOD + 1`
              else
                FILEERR=`expr $FILEERR + 1`
              fi
            fi
          fi
        fi
      else
        # ok, rename
        if [ $PRETEND == true ]
        then
          echo "'$REPLY' -> '$NEWNAME'"
          FILEMOD=`expr $FILEMOD + 1`
        else
          if mv -v "$REPLY" "$NEWNAME" 2>>$ERRLOG
          then
            FILEMOD=`expr $FILEMOD + 1`
          else
            FILEERR=`expr $FILEERR + 1`
          fi
        fi
      fi
    fi
    FILECUR=`expr $FILECUR + 1`
    echo -en "\r\033[1;33m*\033[0;37m processing files: $FILECUR / $FILETOT" 1>&2
  done < $TMPLIST
fi
# FINAL REPORT #
rm $TMPLIST
TIMESTOP=`date +%s`
echo -e "\n\033[1;33m*\033[0;37m done. Operation completed in $(($TIMESTOP-$TIMESTART)) seconds" 1>&2
echo " $DIRTOT directories processed, $DIRMOD renamed, $DIRSIM with similar names, $DIRERR errors." 1>&2
echo " $FILECUR files processed, $FILEMOD renamed, $FILESIM with similar names, $FILEREM removed, $FILEERR errors." 1>&2
if [ $DIRERR -ge 1 ]
then
echo -e " \033[0;31msome directories could not be renamed. fix manually and rerun script. see '$ERRLOG' for details\033[0;37m" 1>&2
exit 3
fi
if [ $FILEERR -ge 1 ]
then
echo -e " \033[0;31msome files could not be renamed. see '$ERRLOG' for details\033[0;37m" 1>&2
exit 2
fi
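Two of the limitations listed in the script header (a target path without spaces, and a target directory name that the rules must not touch) come from passing the whole path through sed. One possible way around them, shown here only as a sketch and not as the author's method, is to let `find` hand each path to a small shell snippet that rewrites the last path component only, deepest entries first. The function name `fix_tree`, its `apply` argument and the simplified rules are assumptions made for this example.

```shell
# Hypothetical alternative to the two loops in fixnames.sh (not the author's code).
# fix_tree DIR [apply]: normalise every name below DIR, deepest entries first.
# Without "apply" the planned renames are only printed (dry run).
fix_tree() {
  find "$1" -depth -mindepth 1 -exec sh -c '
    mode=$1
    shift
    for path in "$@"; do
      dir=${path%/*}      # parent directory, still under its old name here
      base=${path##*/}    # last path component, the only part that changes
      new=$(printf "%s\n" "$base" |
        tr "[:upper:]" "[:lower:]" |
        sed -e "s/  */_/g" -e "s/__*/_/g" -e "s/_\$//")
      # skip names that are already clean or that would become empty
      if [ "$base" = "$new" ] || [ -z "$new" ]; then
        continue
      fi
      if [ -e "$dir/$new" ]; then
        echo "collision, skipped: $path" >&2
      elif [ "$mode" = apply ]; then
        mv -- "$path" "$dir/$new"
      else
        printf "%s -> %s\n" "$path" "$dir/$new"
      fi
    done
  ' sh "${2:-pretend}" {} +
}
```

Calling `fix_tree /some/share` prints the planned renames, and `fix_tree /some/share apply` performs them. Because children are handled before their parents, a parent path is still valid when its entries are renamed. Collisions are only reported in this sketch; the duplicate check and the alternate name from the script would have to be added on top.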