Issue
I'm coming to Python from a SAS background.
I've imported a SAS version 5 transport file (XPT) into Python using:
df = pd.read_sas(r'C:\mypath\myxpt.xpt')
The file is a simple SAS transport file, converted from a SAS dataset created with the following:
DATA myxpt;
DO i = 1 TO 10;
y = "XXX";
OUTPUT;
END;
RUN;
The file imports correctly and I can view the contents using:
print(df)
[Screenshot: printed dataframe]
However, when I view the file using the variable explorer, all character columns are shown as blank.
[Screenshot: dataframe viewed through the variable explorer]
I've also tried reading the data as a SAS dataset instead of a transport file and importing that into Python, but I get the same problem.
When I create a dataframe containing character columns directly in Python, it displays correctly in the variable explorer.
Any suggestions what's going wrong?
Thanks in advance.
Solution
Column Y holds byte strings (bytes objects) rather than text, so you have to decode it first. The variable explorer cannot guess the correct encoding and apparently does not display byte strings. If you do not know the encoding, you will have to guess. Try:
df['utf8'] = df.Y.str.decode('utf8')
and check whether the decoded values make sense.
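When the encoding is unknown, one way to guess is to decode a sample value with a few candidate encodings and compare the results by eye. The following is only a sketch: the helper name try_encodings, the candidate list, and the sample bytes are assumptions made for illustration and do not come from the original file.

```python
def try_encodings(raw, candidates=("utf8", "latin1", "cp1252")):
    """Decode `raw` (a bytes object) with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: the word "café" stored as Latin-1 bytes.
sample = "café".encode("latin1")
guesses = try_encodings(sample)

for name, text in guesses.items():
    print(name, repr(text))
```

Note that latin1 maps every possible byte to a character, so it never raises an error. A successful decode therefore does not prove the guess is right; you still need to check that the text looks sensible.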
As you have noted, it is possible to specify the encoding in the import function:
df = pd.read_sas(r'C:\mypath\myxpt.xpt', encoding='utf8')
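If the dataframe has already been loaded without an encoding, the byte columns can also be decoded afterwards. The sketch below assumes that every non-missing value in an affected column is a bytes object; the helper name decode_bytes_columns and the small stand-in dataframe are hypothetical and only mimic what read_sas returns for the file in the question.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf8"):
    """Return a copy of df with bytes-valued object columns decoded to text."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if col.dtype != object:
            continue  # numeric columns are left untouched
        non_null = col.dropna()
        # Decode only columns whose non-missing values are all bytes.
        if not non_null.empty and non_null.map(lambda v: isinstance(v, bytes)).all():
            out[name] = col.str.decode(encoding)
    return out


# Stand-in for the imported data: a numeric column and a bytes column.
raw = pd.DataFrame({"i": [1.0, 2.0, 3.0], "Y": [b"XXX", b"XXX", b"XXX"]})
decoded = decode_bytes_columns(raw)
print(decoded)
```

Passing encoding to read_sas, as shown above, is simpler when you know the encoding up front; a helper like this is mainly useful for dataframes that are already in memory.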
As a side note, you should always be aware of the encodings in use, and preferably state them explicitly, to avoid major headaches.
For a list of all available encodings and their aliases, see the "Standard Encodings" table in the Python codecs module documentation.
Answered By - Francio Rodrigues