Issue
I have multiple files with two columns: one for minutes and the other for the number of events. Each file contains the data for 19-22 minutes. For example, I have 00:14, 00:33, 00:54, 01:12... The number of minutes varies from file to file. I wrote code that reads all the files in a loop, applies the same processing to each one, and merges them with pd.concat at the end. This code works correctly. The problem is that in the files that cross the end of an hour, the 45-59 minutes are in the last rows and the 00-10 minutes are in the first rows. This error is already present in the files themselves. Like this (minute, events):
00, 24711,
01, 24795,
02, 24507,
03, 24523,
04, 24460,
05, 24697,
06, 24482,
07, 24689,
08, 24504,
09, 24682,
10, 24763,
52, 24320,
53, 24605,
54, 24659,
55, 24705,
56, 24928,
57, 24718,
58, 24620,
59, 24704
How can I fix the order of these files and incorporate it into the loop as a condition (if minutes > 40, then reorder the dataframe)? I don't really care about the Minute column, so I tried to preserve the order by changing the 0x minutes to 60s and the 1x minutes to 70s, but str.replace replaces every matching character, so it obviously changed 00 to 66 and 11 to 77:
counts.columns = ['Minute', 'Events']
mask = counts['Minute'].str[0] == '0'
counts.loc[mask, 'Minute'] = counts.loc[mask, 'Minute'].str.replace('0','6')
mask = counts['Minute'].str[0] == '1'
counts.loc[mask, 'Minute'] = counts.loc[mask, 'Minute'].str.replace('1','7')
I was thinking that, once these minutes were converted correctly, I could just sort by Minute and reset the index, so that the 40-59 minutes come first, followed by the 60-70 minutes.
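A minimal sketch of how the first-character replacement above can be made to work: passing n=1 to Series.str.replace changes only the first occurrence, so "00" becomes "60" rather than "66". It uses the sample rows from the question, wraps the fix in the "minutes > 40" condition the question asks for, and assumes (as in the sample) that a wrapped file only needs its 0x and 1x minutes shifted. Note that resetting the index alone does not reorder rows, so the sketch sorts first.

```python
import pandas as pd

# Sample rows from the question: minutes 00-10 were written before 52-59.
minutes = [f"{m:02d}" for m in [*range(0, 11), *range(52, 60)]]
events = [24711, 24795, 24507, 24523, 24460, 24697, 24482, 24689, 24504,
          24682, 24763, 24320, 24605, 24659, 24705, 24928, 24718, 24620, 24704]
counts = pd.DataFrame({"Minute": minutes, "Events": events})

# The condition from the question: only files that also contain minutes
# above 40 have wrapped past the top of the hour.
if (counts["Minute"].astype(int) > 40).any():
    # n=1 replaces only the first occurrence, so "00" -> "60" (not "66")
    # and "10" -> "70"; regex=False treats the pattern as a literal string.
    mask = counts["Minute"].str[0] == "0"
    counts.loc[mask, "Minute"] = counts.loc[mask, "Minute"].str.replace("0", "6", n=1, regex=False)
    mask = counts["Minute"].str[0] == "1"
    counts.loc[mask, "Minute"] = counts.loc[mask, "Minute"].str.replace("1", "7", n=1, regex=False)

# reset_index alone does not move rows: sort by the shifted minutes first.
counts["Minute"] = counts["Minute"].astype(int)
counts = counts.sort_values("Minute").reset_index(drop=True)
print(counts["Minute"].tolist())
```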
Another solution I thought of, which seemed more complicated, was to match the filename, which contains the starting minute, find that number in the Minute column and make it the first row. However, I don't know how to move the following minutes to the top as well, and I didn't understand the fnmatch documentation.
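A sketch of the filename idea: rotate the rows so that the start minute and every row after it move to the top, followed by the rows that were written before it. It uses re rather than fnmatch, because fnmatch only reports whether a name matches a shell-style wildcard pattern and cannot extract the digits. The naming scheme (start minute as the two digits before ".txt", e.g. counts_52.txt) and the function name rotate_to_start are hypothetical.

```python
import re
import pandas as pd

def rotate_to_start(counts: pd.DataFrame, filename: str) -> pd.DataFrame:
    """Move the row for the file's starting minute, and every row after it,
    to the top while keeping the original relative order."""
    # HYPOTHETICAL naming scheme: the start minute is the two digits right
    # before ".txt", e.g. "counts_52.txt". Adapt the pattern to the real names.
    match = re.search(r"(\d{2})\.txt$", filename)
    if match is None:
        return counts  # no start minute in the name: leave the file alone
    start = int(match.group(1))
    positions = (counts["Minute"].astype(int) == start).to_numpy().nonzero()[0]
    if len(positions) == 0:
        return counts  # start minute not present in the data
    pos = positions[0]
    # Rows from the start minute onward first, then the wrapped-around rows.
    return pd.concat([counts.iloc[pos:], counts.iloc[:pos]], ignore_index=True)

# Usage with the sample minutes from the question (00-10, then 52-59);
# the Events values here are placeholders.
minutes = [*range(0, 11), *range(52, 60)]
counts = pd.DataFrame({"Minute": minutes, "Events": list(range(len(minutes)))})
fixed = rotate_to_start(counts, "counts_52.txt")
print(fixed["Minute"].tolist())
```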
Thank you so much for any help!
Solution
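Try the following. It converts Minute to integers, keeps minutes above 40 as they are, adds 60 to the rest (so 00-10 become 60-70) and then sorts by Minute: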
counts["Minute"] = counts["Minute"].astype(int)
counts["Minute"] = counts["Minute"].where(counts["Minute"].gt(40), counts["Minute"].add(60))
counts = counts.sort_values("Minute")
Answered By - not_speshal
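One caveat that follows from the answer's code: the fixed threshold of 40 also shifts minutes 38-40 in a file that does not wrap (for example a 22-minute file covering 38-59), turning them into 98-100 and sorting them after 59. A sketch of a wrap-aware variant, applied inside the read-and-concat loop described in the question, assuming no file spans more than about 22 minutes. The function name fix_order, the file names and the in-memory file contents are hypothetical stand-ins for the real files.

```python
import io
import pandas as pd

def fix_order(counts: pd.DataFrame) -> pd.DataFrame:
    """Sort one file's rows into time order, handling a wrap past minute 59."""
    counts = counts.copy()
    counts["Minute"] = counts["Minute"].astype(int)
    # ASSUMPTION: a file covers at most ~22 consecutive minutes, so a spread
    # above 30 between its smallest and largest minute can only mean it wrapped.
    if counts["Minute"].max() - counts["Minute"].min() > 30:
        # Minutes after the top of the hour (00-29) move to 60-89.
        counts["Minute"] = counts["Minute"].where(counts["Minute"].ge(30), counts["Minute"].add(60))
    return counts.sort_values("Minute")

# Hypothetical stand-ins for two input files: one that does not wrap (38-41)
# and one that does (58-59 stored after 00-01, as in the question).
files = {
    "file_a.txt": "38, 1\n39, 2\n40, 3\n41, 4\n",
    "file_b.txt": "00, 5\n01, 6\n58, 7\n59, 8\n",
}

frames = []
for name, text in files.items():
    counts = pd.read_csv(io.StringIO(text), header=None, names=["Minute", "Events"],
                         skipinitialspace=True)
    frames.append(fix_order(counts))

result = pd.concat(frames, ignore_index=True)
print(result)
```

Sorting on the shifted minutes keeps each Events value attached to its own row; if the original 0-59 values are needed afterwards, result["Minute"] % 60 restores them.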