Issue
I've got strings that look something like this:
a = "testing test<U+00FA>ing <U+00F3>"
The format will not always be like that, but those bracketed Unicode code points will be scattered throughout the strings. I want to turn them into the actual Unicode characters they represent. I tried this function:
def replace_unicode(s):
    uni = re.findall(r'<U\+\w\w\w\w>', s)
    for a in uni:
        s = s.replace(a, f'\u{a[3:7]}')
    return s
This successfully finds all of the <U+> unicode strings, but it won't let me put them together to create a unicode escape in this manner.
  File "D:/Programming/tests/test.py", line 8
    s = s.replace(a, f'\u{a[3:7]}')
                     ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
How can I create a unicode escape character using an f-string, or via some other method with the information I'm getting from strings?
Solution
You can use an f-string to create an appropriate argument to int, whose result the chr function can use to produce the desired character.
for a in uni:
    s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
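For context, here is a minimal sketch of how that loop might slot back into the function from the question. The original attempt fails because Python resolves \u escapes when it parses the string literal, before the f-string substitutes a[3:7]; building the character with chr and int at run time avoids this. The sketch passes the hex digits to int directly, without the 0x prefix or the f-string, which should be equivalent here. It keeps the question's regex and so assumes every placeholder holds exactly four hex digits.

```python
import re

def replace_unicode(s):
    # Find every <U+XXXX> placeholder (same pattern as in the question).
    uni = re.findall(r'<U\+\w\w\w\w>', s)
    for a in uni:
        # a[3:7] is the hex part, e.g. '00FA'; int(..., 16) turns it into
        # a code point and chr() turns that into the character.
        s = s.replace(a, chr(int(a[3:7], 16)))
    return s

a = "testing test<U+00FA>ing <U+00F3>"
print(replace_unicode(a))  # testing testúing ó
```

Note that \w also matches non-hex characters, so a placeholder such as <U+WXYZ> would make int raise a ValueError.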
Answered By - chepner
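As an alternative that is not part of the answer above, the find-then-replace loop could be folded into a single re.sub call with a callback. This is only a sketch: the names PLACEHOLDER and replace_unicode_sub are made up for illustration, and the pattern assumes placeholders carry 4 to 6 hex digits so that code points beyond U+FFFF are covered too.

```python
import re

# Assumption: placeholders look like <U+00FA> or <U+1F600> (4 to 6 hex digits).
PLACEHOLDER = re.compile(r'<U\+([0-9A-Fa-f]{4,6})>')

def replace_unicode_sub(s):
    # The callback receives each match object and returns the replacement:
    # group(1) holds the hex digits, which int() and chr() convert.
    return PLACEHOLDER.sub(lambda m: chr(int(m.group(1), 16)), s)

print(replace_unicode_sub("testing test<U+00FA>ing <U+00F3>"))  # testing testúing ó
```

Restricting the pattern to hex digits means malformed placeholders are left untouched rather than raising an error, although chr would still raise a ValueError for a value above 0x10FFFF.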