ASPN Mail Archive: activepython
String handling broken?
by Fuzzier
Jun 23 2007 11:13PM

I'm currently working on the Japanese version of Windows XP.

Yesterday I tried to use Python's (version 2.5.1.1) regular expression module to batch-rename files in a directory, but ran into a problem: Python failed to match the file names.

I then wrote some small test scripts and found that Python doesn't handle strings very well (or maybe this is intended?).

The test environment:

1. An empty directory is created.

2. A file named “パイソン.txt” is created in the directory.

3. The Python scripts are placed and run in that directory.

4. All Python scripts are written in Notepad. Because Python cannot read source files saved with Notepad's “Unicode” (UTF-16) option, I tested two file encodings: ANSI and UTF-8.
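
The same file name turns into different byte sequences depending on which encoding is used. The following minimal Python 3 sketch is not from the original post; in it, `bytes` plays the role of a Python 2 `str` and `str` the role of a Python 2 `unicode`. It assumes that Notepad's “ANSI” on Japanese Windows XP means code page 932 (Shift_JIS) and prints the three encodings side by side:

```python
# Python 3 sketch (str = Unicode text, bytes = raw bytes).
# Assumption: Notepad's "ANSI" on Japanese Windows XP is code page 932.
NAME = "\u30d1\u30a4\u30bd\u30f3.txt"  # パイソン.txt

ansi = NAME.encode("cp932")       # Notepad "ANSI"
utf8 = NAME.encode("utf-8")       # Notepad "UTF-8" (Notepad also writes a BOM at file start)
utf16 = NAME.encode("utf-16-le")  # Notepad "Unicode"

for label, data in [("cp932", ansi), ("utf-8", utf8), ("utf-16-le", utf16)]:
    print(f"{label:10} {data.hex(' ')}")

# UTF-16 is not ASCII-compatible: even ".txt" becomes 2e 00 74 00 78 00 74 00.
print(b"\x00" in utf16)
```

The cp932 line reads `83 70 83 43 83 5c 83 93 2e 74 78 74`. The `5c` is the second byte of ソ, and it is the same value as an ASCII backslash.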

 

################ Script ############################################

# -*- encoding: shift_jis -*-
# script1.py (saved as ANSI text file)

import os, re

def rename():
    pattern = 'パイソン\.txt'     # ANSI
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt
パイソン.txt : doesn't match!
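
One likely explanation for script1: in cp932 the second byte of ソ is 0x5C, the ASCII backslash, so the regular expression engine reads it as the start of an escape sequence and the compiled pattern no longer describes the real file name. The Python 3 sketch below reproduces this with a `bytes` pattern, standing in for a Python 2 byte `str`. It assumes that `os.listdir('.')` returns cp932-encoded names on Japanese Windows. It also shows two ways around the problem: `re.escape`, or working with text (Unicode) patterns instead.

```python
import re

# Python 3 sketch of script1; bytes stands in for a Python 2 str.
NAME = "\u30d1\u30a4\u30bd\u30f3.txt"  # パイソン.txt
name_bytes = NAME.encode("cp932")      # assumed form of os.listdir('.') output

# Build the pattern the way script1 does: raw cp932 bytes followed by \.txt
stem_bytes = "\u30d1\u30a4\u30bd\u30f3".encode("cp932")
naive = re.compile(stem_bytes + rb"\.txt")
print("naive:", naive.match(name_bytes))  # the 0x5C byte escaped the byte after it

# re.escape turns the stray 0x5C into an escaped, literal backslash byte.
escaped = re.compile(re.escape(stem_bytes) + rb"\.txt")
print("escaped:", escaped.match(name_bytes) is not None)

# Matching text against text avoids byte-level surprises altogether.
text = re.compile("\u30d1\u30a4\u30bd\u30f3\\.txt")
print("text:", text.match(NAME) is not None)
```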

 

################ Script ############################################

# -*- encoding: shift_jis -*-
# script2.py (saved as ANSI text file)

import os, re

def rename():
    pattern = u'パイソン\.txt'    # Unicode
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt
パイソン.txt : doesn't match!
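
Script2 compiles a Unicode pattern but passes the byte-string path `'.'` to `os.listdir`, which then returns byte-string names, so the pattern and the names are different kinds of string. In Python 2, passing a Unicode path such as `u'.'` makes `os.listdir` return `unicode` names. Python 3 keeps the same rule (str in, str out; bytes in, bytes out) and refuses to apply a text pattern to bytes. The following minimal Python 3 sketch uses a temporary directory in place of the poster's test directory. The helper `matching_names` is hypothetical.

```python
import os
import re
import tempfile

NAME = "\u30d1\u30a4\u30bd\u30f3.txt"  # パイソン.txt
PATTERN = re.compile("\u30d1\u30a4\u30bd\u30f3\\.txt")

def matching_names(directory):
    """Return entries of `directory` matched by PATTERN (hypothetical helper).

    `directory` is a str, so os.listdir returns str (Unicode) names,
    the same type as the pattern.
    """
    return [f for f in os.listdir(directory) if PATTERN.match(f)]

with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file with the Japanese name, as in the test setup.
    open(os.path.join(tmp, NAME), "w").close()
    found = matching_names(tmp)

    # A bytes path makes os.listdir return bytes; applying a str pattern
    # to bytes raises TypeError in Python 3.
    byte_names = os.listdir(os.fsencode(tmp))
    try:
        PATTERN.match(byte_names[0])
        mixed_ok = True
    except TypeError:
        mixed_ok = False

print(found)
print(mixed_ok)
```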

 

################ Script ############################################

# script3.py (saved as UTF-8 text file)

import os, re

def rename():
    pattern = 'パイソン\.txt'     # ANSI
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  繝代�繧ス繝ウ\.txt
パイソン.txt : doesn't match!

(the pattern is displayed as garbled characters)
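
Script3 has no coding declaration, but Notepad writes a UTF-8 byte order mark, and PEP 263 specifies that Python 2 treats that mark as a UTF-8 declaration. The byte-string literal therefore holds UTF-8 bytes, while the file name and the console presumably use cp932. As a result, the pattern cannot match, and it prints as garbage. The following Python 3 sketch (bytes standing in for a Python 2 `str`, with cp932 as an assumption) shows the mismatch and shows that decoding each side with its own encoding gives equal text:

```python
# Python 3 sketch: the same four characters as UTF-8 bytes (script3's literal)
# and as cp932 bytes (the file name on Japanese Windows, assumed).
STEM = "\u30d1\u30a4\u30bd\u30f3"  # パイソン

utf8_literal = STEM.encode("utf-8")
cp932_name = (STEM + ".txt").encode("cp932")

# Byte-level comparison fails: the two encodings differ.
print(cp932_name.startswith(utf8_literal))

# Showing UTF-8 bytes on a cp932 console amounts to decoding them as cp932.
garbled = utf8_literal.decode("cp932", errors="replace")
print(garbled != STEM)

# Decoding each side with its own encoding yields identical text.
print(utf8_literal.decode("utf-8") + ".txt" == cp932_name.decode("cp932"))
```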

 

################ Script ############################################

# script4.py (saved as UTF-8 text file)

import os, re

def rename():
    pattern = u'パイソン\.txt'    # Unicode
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt
パイソン.txt : doesn't match!
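
Script4 combines a correctly decoded Unicode pattern with the byte-string names that `os.listdir('.')` returns, as script2 does. For the original goal of batch renaming, one approach that sidesteps these mismatches is to keep everything as text: a text pattern, a text directory path, text names, and `os.rename` on the joined paths. The following Python 3 sketch illustrates that approach. The function `batch_rename` and its `dry_run` flag are hypothetical names, not something from the post. In Python 2 the equivalent would be to pass `u'.'` to `os.listdir`.

```python
import os
import re
import tempfile

def batch_rename(directory, pattern, replacement, dry_run=True):
    """Rename entries of `directory` whose whole name matches `pattern`.

    Hypothetical helper: all arguments are str, so os.listdir returns str
    names and bytes are never mixed with text. Returns (old, new) pairs.
    """
    regex = re.compile(pattern)
    renames = []
    for name in sorted(os.listdir(directory)):
        if regex.fullmatch(name):
            new_name = regex.sub(replacement, name)
            renames.append((name, new_name))
            if not dry_run:
                os.rename(os.path.join(directory, name),
                          os.path.join(directory, new_name))
    return renames

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "\u30d1\u30a4\u30bd\u30f3.txt"), "w").close()
    plan = batch_rename(tmp, r"(.+)\.txt", r"\1.bak")                # preview only
    done = batch_rename(tmp, r"(.+)\.txt", r"\1.bak", dry_run=False)
    after = os.listdir(tmp)

print(plan)
print(after)
```

Defaulting to `dry_run=True` lets the caller inspect the planned renames before any file is touched.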

 
© ActiveState Software Inc. All rights reserved