Friday, December 10, 2021

[FIXED] Character-to-substring translation using vanilla Python libraries

Labels: built-in, python-2.x

Issue

Question

Using only modules and functions that are built into vanilla Python 2.x (where x is >= 7), and without rolling my own Python function to do so (see below), how do I translate characters to strings the way urllib.quote() does, but limited to a subset of the characters that urllib.quote() translates?

The example below does URL encoding, but my question is more basic and general than that. In the general case, I would like to translate arbitrarily specified characters into arbitrarily specified strings, not just those required for RFC-compliant URL encoding.

Example

In the example below, I am translating only the balanced paren characters (square brackets, curly brackets, and parentheses), and not all the other characters that need to be quoted for URL query strings. This is not web-related at all; it is for paren navigation of the output the Python script produces.

I thought I would be able to use something both concise and efficient that is built into vanilla Python 2.x (vanilla meaning that no additional modules need to be installed), such as string translate. But the latter requires the translation table to map single characters to single characters, whereas I want to map a single character to a multi-character string.
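
To make that limitation concrete, here is a small sketch. The helper name try_two_arg_table is hypothetical, and the import fallback is only there so the snippet also runs on Python 3 (an assumption, not part of the original question). The two-argument form of maketrans accepts only strings of equal length, so a single character cannot be paired with a three-character replacement:

```python
try:
    from string import maketrans   # Python 2
except ImportError:
    maketrans = str.maketrans      # Python 3 fallback, used here only for portability


def try_two_arg_table(src, dst):
    """Return a translation table, or None if maketrans rejects the pair."""
    try:
        return maketrans(src, dst)
    except ValueError:
        # Raised when src and dst differ in length.
        return None


# A one-to-one mapping is accepted.
print('f(x)'.translate(maketrans('()', '[]')))

# A one-to-many mapping is rejected, so the helper returns None.
print(try_two_arg_table('(', '%28'))
```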

So I wrote my own below:

def urlencode_parens(line):
    """URL-encode parens as urllib.quote would, but only the parens."""
    trans_table = {
        '(': '%28',
        ')': '%29',
        '{': '%7B',
        '}': '%7D',
        '[': '%5B',
        ']': '%5D',
    }
    retval = []
    for char in line:
        if char in trans_table:
            char = trans_table[char]
        retval.append(char)
    return "".join(retval)
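
For comparison, the same loop can be written more compactly with dict.get, which returns the character itself when it has no entry in the table. This is only a sketch of an equivalent formulation; the names PAREN_CODES and urlencode_parens_compact are made up for illustration:

```python
PAREN_CODES = {
    '(': '%28',
    ')': '%29',
    '{': '%7B',
    '}': '%7D',
    '[': '%5B',
    ']': '%5D',
}


def urlencode_parens_compact(line):
    """Replace each paren character with its percent-encoded form."""
    # get(char, char) leaves characters without a table entry unchanged.
    return "".join(PAREN_CODES.get(char, char) for char in line)


print(urlencode_parens_compact('dict={} tuple=() list=[]'))
```

It is still a hand-rolled function, so it does not answer the question, but it shows that the per-character lookup is the whole algorithm.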

Although rudimentary timing analysis showed the above to be blazingly fast, it bothers me that I had to code that up in the first place (because now I have to stash that away in my own set of personal modules and maintain it).

Things I tried

I investigated how to force urllib.quote() into only translating the above mentioned characters, but it seems it internally hardcodes the translation of those characters without any way to extend/customize it.

I can do this using re.sub(), but I would have to chain the calls, given the immutability of strings in Python. That resulted in code that looked more like Lisp than Python (not that Lisp is bad, just that "When in Rome ... etc."). And given that regexp translation probably involves repeated recompilation of the regexps, I gave up on that, thinking it might be less performant than what I cooked up above.
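
For reference, chaining is not strictly required: re.sub() also accepts a function as the replacement, and a pattern compiled once with re.compile() can be reused across calls. The following is a minimal sketch under those assumptions, not a benchmarked alternative; the names PAREN_CODES, PAREN_PATTERN and urlencode_parens_re are hypothetical:

```python
import re

PAREN_CODES = {
    '(': '%28',
    ')': '%29',
    '{': '%7B',
    '}': '%7D',
    '[': '%5B',
    ']': '%5D',
}

# Compiled once at import time; the character class matches any single paren.
PAREN_PATTERN = re.compile(r'[(){}\[\]]')


def urlencode_parens_re(line):
    """Replace every paren in one pass using a replacement function."""
    # The function receives a match object and returns the replacement text.
    return PAREN_PATTERN.sub(lambda match: PAREN_CODES[match.group(0)], line)


print(urlencode_parens_re('dict={} tuple=() list=[]'))
```

Whether this is faster or slower than the explicit loop would need to be measured on real input.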

Update #1: maketrans documentation string is misleading and/or incorrect

Looking at string.maketrans I see:

That gives no indication that the to argument is optional. Nor does it say anything about how the from argument can be used, as was helpfully pointed out in the https://stackoverflow.com/a/51481561/257924 answer.

Solution

In Python 3 you can do this: if you pass only one argument to str.maketrans (a mapping dictionary), the replacement values can be more than one character long:

trans_dict = {
    '(': '%28',
    ')': '%29',
    '{': '%7B',
    '}': '%7D',
    '[': '%5B',
    ']': '%5D',
}

trans_table = str.maketrans(trans_dict)

print('dict={} tuple=()'.translate(trans_table))
# dict=%7B%7D tuple=%28%29
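
If the replacements should always agree with what the standard library produces, the dictionary can be generated instead of typed by hand. The sketch below is Python 3 only and assumes urllib.parse.quote with safe='' as the source of the percent-encodings; make_quote_table is a hypothetical helper name:

```python
from urllib.parse import quote


def make_quote_table(chars):
    """Build a str.translate table mapping each character to its percent-encoding."""
    # safe='' stops quote() from exempting '/', its default safe character.
    return str.maketrans({ch: quote(ch, safe='') for ch in chars})


paren_table = make_quote_table('(){}[]')
print('dict={} tuple=()'.translate(paren_table))
```

Note that quote() never encodes letters, digits, or the characters '_.-~', so passing those to the helper maps them to themselves.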

In Python 2.7 you might try unicode.translate, which takes a dictionary keyed by Unicode ordinals whose values may be strings of any length:

trans_dict = {
    '(': '%28',
    ')': '%29',
    '{': '%7B',
    '}': '%7D',
    '[': '%5B',
    ']': '%5D',
}

trans_table = {ord(char): unicode(repl) for char, repl in trans_dict.items()}
print(u'dict={} tuple=()'.translate(trans_table))
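
Since the question works with ordinary Python 2 byte strings, one way to use this is to decode, translate, and encode again. The following is a sketch only: the wrapper name translate_parens, the text_type shim, and the UTF-8 default are assumptions for illustration, and the version check is there so the same snippet also runs on Python 3:

```python
import sys

# `unicode` exists only on Python 2; on Python 3 the text type is `str`.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821

PAREN_CODES = {
    '(': '%28',
    ')': '%29',
    '{': '%7B',
    '}': '%7D',
    '[': '%5B',
    ']': '%5D',
}

# translate() on text strings expects ordinals as keys.
ORDINAL_TABLE = dict((ord(ch), text_type(repl)) for ch, repl in PAREN_CODES.items())


def translate_parens(line, encoding='utf-8'):
    """Translate parens in a text or byte string, returning the same type."""
    was_bytes = isinstance(line, bytes)
    text = line.decode(encoding) if was_bytes else line
    result = text.translate(ORDINAL_TABLE)
    return result.encode(encoding) if was_bytes else result


print(translate_parens(u'dict={} tuple=()'))
```

The decode and encode round trip adds some overhead for byte-string input, so it may be worth timing against the hand-written loop.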

Answered By - hiro protagonist

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
