Issue
In the "string" module of the standard library,
string.ascii_letters ## Same as string.ascii_lowercase + string.ascii_uppercase
is
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Is there a similar constant that includes everything considered a letter in Unicode?
Solution
You can construct your own constant of Unicode upper and lower case letters with:
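# Python 2 only: unichr and xrange do not exist in Python 3.
import unicodedata as ud
# Every code point in the Basic Multilingual Plane (U+0000 to U+FFFF).
all_unicode = ''.join(unichr(i) for i in xrange(65536))
# Keep only uppercase (Lu) and lowercase (Ll) letters.
unicode_letters = ''.join(c for c in all_unicode
                          if ud.category(c) in ('Lu', 'Ll'))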
This produces a string 2153 characters long on a narrow Unicode Python build; the exact count depends on the Unicode version bundled with the interpreter. For membership tests such as letter in unicode_letters, a set is faster than a string:
unicode_letters = set(unicode_letters)
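The code above targets Python 2 and keeps only the Lu and Ll categories, while the question asks about everything Unicode treats as a letter. A minimal sketch of the same idea for Python 3 might look like the following. It assumes that "letter" means any general category starting with L (Lu, Ll, Lt, Lm, Lo); the names all_code_points and cased_letters are illustrative and not part of the original answer.

```python
import sys
import unicodedata

# Every code point from U+0000 through sys.maxunicode (U+10FFFF on Python 3).
all_code_points = (chr(i) for i in range(sys.maxunicode + 1))

# Assumption: a "letter" is any character whose general category starts
# with "L", i.e. Lu, Ll, Lt, Lm or Lo.
unicode_letters = frozenset(
    c for c in all_code_points if unicodedata.category(c).startswith("L")
)

# The narrower definition used in the answer: uppercase and lowercase only.
cased_letters = frozenset(
    c for c in unicode_letters if unicodedata.category(c) in ("Lu", "Ll")
)

print("é" in unicode_letters)   # membership test against the broad set
print("中" in cased_letters)     # CJK ideographs are Lo, so not in the narrow set
```

The sizes of both sets depend on the Unicode version shipped with the interpreter, so no counts are given here. If only a per-character check is needed, str.isalpha() tests the same five L categories without building a set.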
Answered By - Mark Tolonen