Tuesday, December 28, 2021

[FIXED] Python TypeError: expected a string or other character buffer object when importing text file

Labels: io, python, python-2.7, python-2.x, typeerror

Issue

I am pretty new to Python. For this task, I am trying to import a text file, add <s> and </s> to each line, and remove punctuation from the text. I tried the method from "How to strip punctuation from a text file".

import string

def readFile():
    translate_table = dict((ord(char), None) for char in string.punctuation)
    with open('out_file.txt', 'w') as out_file:
        with open('moviereview.txt') as file:
            for line in file:
                line = ' '.join(line.split(' '))
                line = line.translate(translate_table)
                out_file.write("<s>" + line.rstrip('\n') + "</s>" + '\n')

    return out_file

However, I get an error saying:

TypeError: expected a string or other character buffer object

My thought is that after I split and join the line, I get a list of strings, so I cannot use str.translate() to process it. But it seems like everyone else does the same thing and it works, e.g. the example code starting at line 13 in https://appliedmachinelearning.blog/2017/04/30/language-identification-from-texts-using-bi-gram-model-pythonnltk/.

So I am really confused, can anyone help? Thanks!

Solution

On Python 2, only unicode objects have a translate method that takes a dict; str.translate expects different arguments, which is why passing it a dict raises the TypeError. If you intend to work with arbitrary text, the simplest solution here is to just use the Python 3 version of open on Py2; it will seamlessly decode your inputs and produce unicode instead of str.
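
As a rough illustration of the dict form of translate on text (unicode on Py2, str on Py3), here is a minimal sketch. The sample sentence is made up, and the u prefix is only there so the literal is unicode on Py2 as well:

```python
import string

# Map each punctuation code point to None so translate() deletes it.
translate_table = dict((ord(char), None) for char in string.punctuation)

# Hypothetical sample text; the u prefix makes it unicode on Python 2 too.
text = u"Hello, world! It's fine."

# Works because text is a unicode/text string, not a Python 2 byte str.
cleaned = text.translate(translate_table)
print(cleaned)
```

Calling the same translate(translate_table) on a Py2 str is what triggers the TypeError from the question.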

On Python 2.6 and later, replacing the normal built-in open with the Python 3 version is simple. Just add:

from io import open

to the imports at the top of your file. You can also remove line = ' '.join(line.split(' ')); that's definitionally a no-op (it splits on single spaces to make a list, then rejoins on single spaces). You may also want to add:

from __future__ import unicode_literals

to the very top of your file (before all of your code); that will make all of your uses of plain quotes automatically unicode literals, not str literals (prefix actual binary data with b to make it a str literal on Py2, bytes literal on Py3).
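
Putting those pieces together, the question's function might look something like the sketch below. The function name strip_punctuation_file, the utf-8 encoding, and the temp-file demo data are assumptions for illustration, not part of the original code:

```python
from __future__ import unicode_literals

import os
import string
import tempfile
from io import open  # Python 3 style open() on Py2.6+; same as the builtin on Py3

# Map each punctuation code point to None so translate() deletes it.
TRANSLATE_TABLE = dict((ord(char), None) for char in string.punctuation)


def strip_punctuation_file(in_path, out_path, encoding="utf-8"):
    """Copy in_path to out_path, dropping punctuation and wrapping lines in <s>...</s>."""
    with open(in_path, encoding=encoding) as in_file, \
            open(out_path, "w", encoding=encoding) as out_file:
        for line in in_file:
            # line is unicode here, so the dict-based translate works.
            cleaned = line.rstrip("\n").translate(TRANSLATE_TABLE)
            out_file.write("<s>" + cleaned + "</s>\n")


# Hypothetical demo input, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "moviereview.txt")
out_path = os.path.join(tmp_dir, "out_file.txt")
with open(in_path, "w", encoding="utf-8") as f:
    f.write("Great movie, loved it!\nNot bad... really.\n")

strip_punctuation_file(in_path, out_path)
with open(out_path, encoding="utf-8") as f:
    result = f.read()
print(result)
```

The encoding is passed explicitly because io.open otherwise falls back to the locale's default, which may not match the input file.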

The above solution is best if you can swing it, because it will make your code work correctly on both Python 2 and Python 3. If you can't do it for whatever reason, then you need to change your translate call to use the API Python 2's str.translate expects, which means removing the definition of translate_table entirely (it's not needed) and just doing:

line = line.translate(None, string.punctuation)

Python 2's str.translate takes two arguments: the first is a 256-character translation table that maps every byte value from 0 to 255 inclusive (or None if no mapping is needed), and the second is a string of characters to delete (which string.punctuation already provides).
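
For a quick check of that two-argument form, here is a small sketch on a byte string. It uses a b literal and encodes string.punctuation so that the same lines also run on Py3, where bytes.translate keeps this API; the sample text is made up:

```python
import string

# Hypothetical sample data; b"..." is a plain str on Python 2 and bytes on Python 3.
raw = b"Hello, world! It's fine."

# string.punctuation is pure ASCII, so encoding it is safe on both versions.
punctuation_bytes = string.punctuation.encode("ascii")

# First argument None: no byte-to-byte mapping. Second argument: bytes to delete.
cleaned = raw.translate(None, punctuation_bytes)
print(cleaned)
```

On Py2 alone, the encode step is unnecessary and line.translate(None, string.punctuation) is enough.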

Answered By - ShadowRanger

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
