Wednesday, November 2, 2022

[FIXED] Store 2 different encoded data in a file in python

Tags: encoding, pandas, python-3.x

Issue

I have two pieces of data in different encodings:

ibm037 encoded: a single delimiter variable whose value is @@@
UTF-8 encoded: a pandas dataframe with hundreds of columns

Example dataframe:

Date Time

1    2

My goal is to write this data to a file from Python. Each line should have the format:

@@@  1    2

In this way, every row of the dataframe should be written to the file, with each line starting with the delimiter @@@.

I tried adding the delimiter to the dataframe as a new first column and then writing the dataframe to the file, but that throws an error saying that two different encodings can't be written to one file.

I also tried writing it row by row, where df_orig_data is the pandas dataframe and Record_Header is the encoded delimiter:

    f = open("_All_DelimiterOfRecord.txt", "a")
    for row in df_orig_data.itertuples(index=False):
        f.write(Record_Header)
        f.write(str(row))
    f.close()

This doesn't work either.

Is this kind of write even possible? How can I write data in these two encodings to a single file?

Edit:

from io import StringIO

import pandas as pd

StringData = StringIO(
    """Date,Time
1,2
1,2
"""
)

df_orig_data = pd.read_csv(StringData, sep=",")

Record_Header = "2 "

f = open("_All_DelimiterOfRecord.txt", "a")
for index, row in df_orig_data.iterrows():
    f.write(
        "\t".join(
            [
                # str() of a bytes object yields its repr (b'...'), not the raw bytes
                str(Record_Header.encode("ibm037")),
                str(row["Date"]),
                str(row["Time"]),
            ]
        )
    )
f.close()

Solution

I would suggest doing the encoding yourself and writing bytes objects to the file. This isn't a situation where you can rely on the file's built-in encoding to do it, because a file opened in text mode applies a single encoding to everything written to it.

That means the program opens the file in binary mode ("ab"), all of the constants are byte strings, and it works with byte strings wherever possible.
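
To illustrate why byte strings matter here, the following minimal sketch compares the encoded header with what str() makes of it. It reuses the sample delimiter "2 " from the question; the variable names are only illustrative.

```python
header = "2 "  # the sample delimiter from the question

# Encoding to IBM037 (an EBCDIC code page) yields a bytes object.
header_bytes = header.encode("ibm037")

# str() on a bytes object returns its printable repr, including the b'...'
# wrapper. Writing this string in text mode stores the repr, not the raw bytes.
as_text = str(header_bytes)

print(header_bytes)                     # the raw encoded bytes
print(as_text)                          # the repr as ordinary text
print(len(header_bytes), len(as_text))  # the repr is longer than the data
```

Writing header_bytes directly to a file opened in binary mode avoids the repr entirely.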

The question doesn't say, but I assumed you wanted a UTF-8 newline after each line rather than an IBM037 newline.
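
The two newlines really are different bytes, which a short sketch can show. Which one the program reading the file expects is an assumption that should be checked against that program.

```python
# The same newline character encoded with each codec.
utf8_newline = "\n".encode("utf8")
ibm_newline = "\n".encode("ibm037")

print(utf8_newline)                 # newline byte as UTF-8
print(ibm_newline)                  # newline byte as IBM037
print(utf8_newline == ibm_newline)  # the two encodings disagree
```

If the consumer expects IBM037 line endings instead, writing "\n".encode("ibm037") in place of b'\n' would be the corresponding change.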

I also replaced the file handling with a context manager, since that makes it impossible to forget to close a file after you're done.

import io
import pandas as pd
StringData = io.StringIO(
    """Date,Time
1,2
1,2
"""
)

df_orig_data = pd.read_csv(StringData, sep=",")
Record_Header = "2 "


with open("_All_DelimiterOfRecord.txt", "ab") as f:
    for index, row in df_orig_data.iterrows():
        f.write(Record_Header.encode("ibm037"))
        row_bytes = [str(cell).encode('utf8') for cell in row]
        f.write(b'\t'.join(row_bytes))
        # Note: this is a UTF-8 newline, not an IBM newline.
        f.write(b'\n')
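
One way to check the result is to read the bytes back and decode each part with its own codec. The sketch below is only an illustration: it uses an in-memory io.BytesIO buffer in place of the file and plain tuples in place of dataframe rows, and write_records and read_records are hypothetical helper names. It also assumes that neither the header bytes nor the cell values contain the newline byte.

```python
import io

RECORD_HEADER = "2 ".encode("ibm037")  # sample delimiter from the question


def write_records(stream, rows):
    """Write each row as IBM037 header + tab-separated UTF-8 cells + newline."""
    for row in rows:
        stream.write(RECORD_HEADER)
        stream.write(b"\t".join(str(cell).encode("utf8") for cell in row))
        stream.write(b"\n")


def read_records(data):
    """Split the bytes into (header, cells), decoding each part separately."""
    records = []
    for line in data.split(b"\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        header = line[: len(RECORD_HEADER)].decode("ibm037")
        cells = line[len(RECORD_HEADER):].decode("utf8").split("\t")
        records.append((header, cells))
    return records


buffer = io.BytesIO()  # stands in for the file opened with "ab"
write_records(buffer, [(1, 2), (1, 2)])
records = read_records(buffer.getvalue())
print(records)
```

The reader relies on the header having a fixed byte length, which holds for a constant delimiter like the one in the question.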

Answered By - Nick ODell

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
