Sunday, February 6, 2022

[FIXED] Reorder xmin, xmax, ymin, and ymax for each column in CSV file into new columns


Issue

I am new to Python and am struggling with a calculation. I have several thousand rows of data in a CSV table in the following format:

Link to image table

This data is in the wrong format: several of my xmin/ymin values are higher than the corresponding xmax/ymax values (examples can be seen in the image link above). I need to create new columns, using either numpy or pandas, that hold the values in the correct order, with something like this code:

import numpy as np

xmin_new = np.min(xmin, xmax)
xmax_new = np.max(xmin, xmax)
ymin_new = np.min(ymin, ymax)
ymax_new = np.max(ymin, ymax)

The problem is that I don't know how to refer to a column in a CSV file or how to iterate through its rows to do this. Can anyone suggest how I could modify this script to accomplish this?

import pandas
import numpy as np
import os
import csv

#Set cwd
os.chdir("C:\\Users\\desired_directory")

#Open desired csv file
v = open("train.csv")
r = csv.reader(v)
row0 = r.next()

#print header to look at file
print row0

row0.append('xmin_new')
row0.append('xmax_new')
row0.append('ymin_new')
row0.append('ymax_new')

#Check appends
print row0

xmin_new = np.min(xmin, xmax)
xmax_new = np.max(xmin, xmax)
ymin_new = np.min(ymin, ymax)
ymax_new = np.max(ymin, ymax)

#Errors occur here saying that the "xmin_new" column is undefined.
#Also looking to save the file to the directory, but unsure of how to do this properly.

Solution

If you are looking for speed, numpy is a good way to go. I assume you know how to read the whole data into a DataFrame (look up pandas.read_csv()).
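As a minimal sketch of that reading step: the snippet below parses CSV text with pandas.read_csv(). The in-memory text is only a stand-in for the real file, and its column names and values are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for the contents of train.csv; the column names and values
# here are assumptions for illustration only.
csv_text = """xmin,xmax,ymin,ymax,foo
5,0,3,3,a
7,9,3,5,d
"""

# With a real file you would pass its path instead,
# e.g. pd.read_csv("train.csv")
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv.columns.tolist())
```

Once the data is in a DataFrame, each CSV column is available by name (for example df_from_csv['xmin']), so no manual row iteration is needed.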

import numpy as np
import pandas as pd

# First, make a reproducible example
# In your case, you would read the df instead

n = 6
np.random.seed(0)
cols = 'xmin xmax ymin ymax'.split()
df = pd.DataFrame(
    np.random.randint(0, 10, (n,4)),
    columns=cols,
).assign(foo=np.random.choice(list('abcd'), n))

>>> df
   xmin  xmax  ymin  ymax foo
0     5     0     3     3   a
1     7     9     3     5   d
2     2     4     7     6   a
3     8     8     1     6   d
4     7     7     8     1   b
5     5     9     8     9   c

Then, the actual bit:

# reorder min/max for both x and y
#
# Note: cols must be ['xmin', 'xmax', 'ymin', 'ymax']
# or ['ymin', 'ymax', 'xmin', 'xmax']

z = df[cols].values.reshape(-1, 2)
df[cols] = np.c_[z.min(1), z.max(1)].reshape(-1, 4)

And now:

>>> df
   xmin  xmax  ymin  ymax foo
0     0     5     3     3   a
1     7     9     3     5   d
2     2     4     6     7   a
3     8     8     1     6   d
4     7     7     1     8   b
5     5     9     8     9   c
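To see why the reshape trick works, it may help to walk through it on a single row. The values below are made up for this sketch; the steps mirror the two lines used above.

```python
import numpy as np

# One row of (xmin, xmax, ymin, ymax) values with both pairs out of order
a = np.array([[5, 0, 8, 1]])

# Each row of z is one (min, max) pair: first the x pair, then the y pair
z = a.reshape(-1, 2)          # [[5, 0], [8, 1]]

lo = z.min(axis=1)            # smaller value of each pair: [0, 1]
hi = z.max(axis=1)            # larger value of each pair:  [5, 8]

# np.c_ places lo and hi side by side as columns, giving sorted pairs;
# reshaping back to 4 columns restores the original row layout
fixed = np.c_[lo, hi].reshape(-1, 4)
print(fixed)                  # [[0 5 1 8]]
```

This is also why the column order matters: the reshape pairs up adjacent columns, so each min column must sit directly before its matching max column.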

Note: if you want to create new columns instead, as per your question, consider this:

cols_new = [f'{k}_new' for k in cols]
z = df[cols].values.reshape(-1, 2)
df[cols_new] = np.c_[z.min(1), z.max(1)].reshape(-1, 4)
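For a version closer to the code in the question: np.min(xmin, xmax) does not compare two arrays, because the second positional argument of np.min is the axis. The element-wise functions are np.minimum and np.maximum. A minimal sketch, using a small made-up frame rather than the real data:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df = pd.DataFrame({
    "xmin": [5, 7, 2],
    "xmax": [0, 9, 4],
    "ymin": [3, 3, 7],
    "ymax": [3, 5, 6],
})

# np.minimum / np.maximum compare two arrays element by element
df["xmin_new"] = np.minimum(df["xmin"], df["xmax"])
df["xmax_new"] = np.maximum(df["xmin"], df["xmax"])
df["ymin_new"] = np.minimum(df["ymin"], df["ymax"])
df["ymax_new"] = np.maximum(df["ymin"], df["ymax"])

print(df)
```

This form is more explicit than the reshape approach and does not depend on column order, at the cost of one line per new column.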

There is also a slightly more verbose, pandas-only way:

df = df.assign(
    xmin=df[['xmin', 'xmax']].min(axis=1),
    xmax=df[['xmin', 'xmax']].max(axis=1),
    ymin=df[['ymin', 'ymax']].min(axis=1),
    ymax=df[['ymin', 'ymax']].max(axis=1),
)

As before, if you intend to create new columns instead of overwriting the existing ones, use df.assign(xmin_new=...) and so on.
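The question also asks how to save the result. One option is DataFrame.to_csv(). The sketch below writes a small made-up frame to a temporary directory and reads it back; the file name train_fixed.csv is hypothetical, and in practice you would choose your own output path.

```python
import os
import tempfile
import pandas as pd

# Illustrative frame standing in for the reordered data
df = pd.DataFrame(
    {"xmin": [0, 7], "xmax": [5, 9], "ymin": [3, 3], "ymax": [3, 5]}
)

# A temporary directory is used so the sketch runs anywhere;
# in practice you would write to a path of your choosing.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "train_fixed.csv")  # hypothetical file name

# index=False keeps the row index from being written as an extra column
df.to_csv(out_path, index=False)

# Read the file back to confirm the round trip
check = pd.read_csv(out_path)
print(check.equals(df))
```

Writing to a new file rather than overwriting the input keeps the original data available in case the reordering needs to be rechecked.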

Answered By - Pierre D

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
