String handling broken?
by Fuzzier
Jun 23 2007 11:13PM
RE: String handling broken?
I'm currently working on the Japanese version of Windows XP.
Yesterday, I tried to use Python's (version 2.5.1.1) regular expression module to batch rename files in a directory, but a problem occurred: Python failed to match the file names.
Then I wrote some small test scripts and found that Python can't handle strings very well (or maybe this is intended?).
The test environment:
1. An empty directory is made.
2. A file named "パイソン.txt" is created in the directory.
3. The Python scripts are placed in and run from the directory.
4. All Python scripts are written in Notepad. Because Python cannot read Notepad's "Unicode" (UTF-16) text files, I tested two file encodings: ANSI and UTF-8.
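For anyone who wants to reproduce steps 1 and 2 today, here is a minimal Python 3 sketch, assuming a filesystem that accepts Unicode file names; `make_test_dir` is a hypothetical helper name, not part of the original scripts. In Python 3, passing a `str` path makes `os.listdir` return `str` (Unicode) names rather than encoded bytes.

```python
import os
import tempfile

def make_test_dir(base):
    """Hypothetical helper: recreate steps 1 and 2 of the test setup under base."""
    test_dir = os.path.join(base, "test")
    os.mkdir(test_dir)  # step 1: an empty directory
    # Step 2: an empty file with the Japanese name.
    open(os.path.join(test_dir, "パイソン.txt"), "w").close()
    return test_dir

with tempfile.TemporaryDirectory() as base:
    test_dir = make_test_dir(base)
    # A str path makes os.listdir return str (Unicode) names,
    # not bytes in the ANSI code page.
    names = os.listdir(test_dir)
```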
################ Script ############################################
# -*- encoding: shift_jis -*-
# script1.py (saved as ANSI text file)
import os, re
def rename():
    pattern = 'パイソン\.txt'  # ANSI
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'
rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!
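One plausible reason script1 fails, offered as a hedged diagnosis: in cp932 (the ANSI code page of Japanese Windows) the katakana ソ is the two bytes 0x83 0x5C, and 0x5C is the ASCII backslash. With a byte-string pattern, `re` reads that trail byte as an escape for the following byte, so the compiled pattern no longer matches the file name's bytes. The Python 3 sketch below uses `bytes` objects, which behave like Python 2 byte strings here.

```python
import re

name = "パイソン.txt"
sjis_name = name.encode("cp932")                  # the file name as ANSI bytes
sjis_pattern = r"パイソン\.txt".encode("cp932")    # script1's byte-string pattern

# ソ encodes to 0x83 0x5C, and 0x5C is the backslash byte.
print("ソ".encode("cp932"))                        # b'\x83\\'

# re treats that 0x5C as an escape for the next byte (0x83),
# so the byte-string pattern does not match the byte-string name.
print(re.match(sjis_pattern, sjis_name))           # None

# A Unicode pattern against a Unicode name has no such problem.
print(re.match(r"パイソン\.txt", name) is not None)  # True
```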
################ Script ############################################
# -*- encoding: shift_jis -*-
# script2.py (saved as ANSI text file)
import os, re
def rename():
    pattern = u'パイソン\.txt'  # Unicode
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'
rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!
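In script2 the pattern is Unicode, but Python 2's `os.listdir('.')` returns byte strings when given a byte-string path (on Japanese Windows, cp932 bytes), so a Unicode pattern is compared against encoded bytes. Python 3 refuses that mix outright, which makes the mismatch visible. A sketch, assuming cp932 as the encoding of the names; `decoded_match` is a hypothetical helper:

```python
import re

pattern = re.compile(r"パイソン\.txt")     # Unicode pattern, as in script2
raw_name = "パイソン.txt".encode("cp932")   # the name as cp932 bytes, like Python 2.5's listdir('.')

try:
    pattern.match(raw_name)
except TypeError as exc:
    # Python 3: "cannot use a string pattern on a bytes-like object"
    print("TypeError:", exc)

def decoded_match(pat, name_bytes, encoding="cp932"):
    """Hypothetical helper: decode the byte name before matching."""
    return pat.match(name_bytes.decode(encoding))

print(decoded_match(pattern, raw_name) is not None)  # True
```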
################ Script ############################################
# script3.py (saved as UTF-8 text file)
import os, re
def rename():
    pattern = 'パイソン\.txt'  # ANSI
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'
rename()
################# Output ###########################################
pattern: 繝代う繧ｽ繝ｳ\.txt
パイソン.txt : doesn't match!
(The pattern is shown as unreadable characters.)
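The unreadable pattern in script3 is consistent with the byte-string literal holding the UTF-8 bytes of the source file, which a cp932 console then renders as cp932 text. A Python 3 sketch of that round trip, assuming cp932 as the console code page:

```python
pattern_bytes = r"パイソン\.txt".encode("utf-8")   # what script3's byte-string literal holds
shown = pattern_bytes.decode("cp932")             # how a cp932 console renders those bytes

print(shown == r"繝代う繧ｽ繝ｳ\.txt")               # True
# The bytes themselves are intact; only their interpretation differs.
print(shown.encode("cp932") == pattern_bytes)      # True
```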
################ Script ############################################
# script4.py (saved as UTF-8 text file)
import os, re
def rename():
    pattern = u'パイソン\.txt'  # Unicode
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'
rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!
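In script4 both the source and the pattern are Unicode, so the remaining mismatch is the file names: Python 2's `os.listdir` returns Unicode names only when its argument is Unicode, e.g. `os.listdir(u'.')`. Below is a Python 3 sketch of the same loop, where a `str` path already yields Unicode names. `report_matches` is a hypothetical helper, and the hard-coded list stands in for what `os.listdir('.')` would return.

```python
import re

def report_matches(pattern, names):
    """Hypothetical Python 3 port of rename(): pair each name with a match flag."""
    regex = re.compile(pattern)
    return [(name, regex.match(name) is not None) for name in names]

# Stand-in for os.listdir('.'), which returns str names in Python 3.
names = ["パイソン.txt", "script4.py"]
results = report_matches(r"パイソン\.txt", names)
```

Each pair can then be printed with `': match!'` or `": doesn't match!"` as in the original scripts.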