python os.walk() umlauts u'\u0308'

I'm on an OS X machine running Python 2.7. I'm trying to run os.walk on an SMB share.

import os
import re

for root, dirnames, filenames in os.walk("./test"):
    for filename in filenames:
        print filename
        # try to match names containing an "ö"
        matchObj = re.match(r".*ö.*", filename, re.UNICODE)

Expected:

filename.jpeg
filename_ö.jpg

Of course the regex fails with that. If I hardcode the filename like:

re.match(r".*ö.*", 'filename_ö', re.UNICODE)

but gives me:

return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0308' in position 10: ordinal not in range(128)
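The u'\u0308' in the traceback is COMBINING DIAERESIS, which suggests that the share returns the name in decomposed form ("o" followed by U+0308), while the pattern holds the precomposed "ö" (U+00F6). A minimal sketch of one way to handle this, assuming Python 3 (where os.walk yields text strings) and a hypothetical helper named matches_umlaut: normalise the filename to NFC before matching.

```python
import re
import unicodedata

# Pattern written with the precomposed character U+00F6 ("ö").
PATTERN = re.compile(".*\u00f6.*")


def matches_umlaut(filename):
    """Return True if the NFC-normalised filename contains 'ö'."""
    composed = unicodedata.normalize("NFC", filename)
    return PATTERN.match(composed) is not None


# Decomposed name ("o" + U+0308), as the directory listing may return it.
decomposed = "filename_o\u0308.jpg"
print(matches_umlaut(decomposed))       # True
print(matches_umlaut("filename.jpeg"))  # False
```

In Python 2.7 the names would first have to be text rather than byte strings; passing a unicode path such as os.walk(u"./test") makes os.walk return unicode names, which can then be normalised the same way.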

Suggestion : 3

I use Linux Mint 13 (based on Ubuntu 12.04 LTS with kernel 3.2.0-23), so this is far from antique. And as I already wrote, I also tried those files on a Windows 7 VM. But of course I don't know what the person who created the zip files used. – cider, Jan 13, 2013 at 15:59

I tried to fix this with "detox" but couldn't find a way to chain characters together. Based on the answer of @S2VpdGgA I made this compendium.

The name encoding of the zip files themselves on the FAT32 system is most likely not going to change or be fixed when you copy them to a filesystem with proper support, but the subdirectories should be fine once decompressed.

A quick proof, if you read Ruby, that displays as expected in my UTF-8 terminal:

$ ruby -e 'puts "u\xCC\x88"' | iconv -f cp437 -t utf-8
u╠ê
$ ruby -e 'puts "u\xCC\x88"'
ü
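The same check can be sketched in Python 3, assuming the name contains "u" followed by COMBINING DIAERESIS (U+0308), whose UTF-8 encoding is the two bytes 0xCC 0x88. Read as code page 437, each byte becomes a separate character. Read as UTF-8 and composed to NFC, they collapse into a single "ü".

```python
import unicodedata

raw = b"u\xcc\x88"  # 'u' + UTF-8 bytes of U+0308 COMBINING DIAERESIS

# Misread as code page 437, each byte becomes its own character.
as_cp437 = raw.decode("cp437")

# Read as UTF-8, the bytes form 'u' plus a combining mark; NFC merges them.
as_utf8 = raw.decode("utf-8")
composed = unicodedata.normalize("NFC", as_utf8)

print(as_cp437)                     # u╠ê
print(len(as_utf8), len(composed))  # 2 1
```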

But really, somebody may want to do this properly. There may be loads of other cases like "é", "à", "è", etc.

######
# Preparation and tests
# You may need to extract your own character group from your filename,
# if this gets lost via this web form.
# My reconstruction as follows:

# note: the echo sends the chain of chars, copied from the console.

echo 'ä' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'ä' | sed -e 's/a\xcc\x88/ä/'

echo 'ö' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'ö' | sed -e 's/o\xcc\x88/ö/'

echo 'ü' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'ü' | sed -e 's/u\xcc\x88/ü/'

echo 'Ä' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'Ä' | sed -e 's/A\xcc\x88/Ä/'

echo 'Ö' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'Ö' | sed -e 's/O\xcc\x88/Ö/'

echo 'Ü' | perl -pe 's/([^x\0-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
echo 'Ü' | sed -e 's/U\xcc\x88/Ü/'

# Final version
# test all at once
echo 'Ä' | sed -e 's/a\xcc\x88/ä/' | sed -e 's/o\xcc\x88/ö/' | sed -e 's/u\xcc\x88/ü/' | sed -e 's/A\xcc\x88/Ä/' | sed -e 's/O\xcc\x88/Ö/' | sed -e 's/U\xcc\x88/Ü/'

# wrap into a recursion
# note: not recursive as-is because folder can change

cd /path/to/dir
find . -maxdepth 1 |
while read FILE;
do
    newfile="$(echo ${FILE} | sed -e 's/a\xcc\x88/ä/' | sed -e 's/o\xcc\x88/ö/' | sed -e 's/u\xcc\x88/ü/' | sed -e 's/A\xcc\x88/Ä/' | sed -e 's/O\xcc\x88/Ö/' | sed -e 's/U\xcc\x88/Ü/')";
    echo mv -T "${FILE}" "${newfile}";
done

# (remove the 'echo ' to actually make changes)
#######
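For the other cases mentioned earlier ("é", "à", "è" and so on), Unicode NFC normalisation composes any base letter plus combining mark that has a precomposed form, so no per-character table is needed. A minimal sketch in Python 3, assuming the names are already valid UTF-8 and merely decomposed; to_nfc and plan_renames are hypothetical helper names, and the code only reports what it would rename.

```python
import os
import unicodedata


def to_nfc(name):
    """Return the name with base letter + combining mark pairs composed."""
    return unicodedata.normalize("NFC", name)


def plan_renames(top):
    """Yield (old_path, new_path) pairs for names that are not in NFC form."""
    # topdown=False visits the contents of a directory before the
    # directory itself, so entries are planned before their parent.
    for root, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames + dirnames:
            composed = to_nfc(name)
            if composed != name:
                yield os.path.join(root, name), os.path.join(root, composed)
```

Iterating over plan_renames("/path/to/dir") and printing each pair gives a dry run comparable to the echo mv loop; calling os.rename(old, new) on each pair instead would apply the changes.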
