Issue
I have a read.csv DataFrame which is continuously updating with addition of one new row on every run of my script which looks like ....
df = pd.read_csv(file_path)
print(df.to_string(index=False))
timestamp Puts Calls PutCh CallCh ChDiff
09:41:12 AM 2027891 1820724 280101 200974 79127
09:48:51 AM 2075976 1862053 328186 242303 85883
09:58:48 AM 2091487 1885842 343697 266092 77605
10:08:21 AM 2091879 1918592 344089 298842 45247
02:26:00 PM 1995234 1941917 247444 322167 -74723
02:44:36 PM 1990071 1934874 242281 315124 -72843
02:56:17 PM 1970892 1938472 223102 318722 -95620
Now I want the difference of each succeeding row from previous one for which I have read about df.diff(). So I dropped the timestamp column to get a new datafame as df1 and wrote my script ...
df1.diff()
and got my output as ....
Puts Calls PutCh CallCh ChDiff
NaN NaN NaN NaN NaN
48085.0 41329.0 48085.0 41329.0 6756.0
15511.0 23789.0 15511.0 23789.0 -8278.0
392.0 32750.0 392.0 32750.0 -32358.0
-96645.0 23325.0 -96645.0 23325.0 -119970.0
-5163.0 -7043.0 -5163.0 -7043.0 1880.0
-19179.0 3598.0 -19179.0 3598.0 -22777.0
Here I want these difference values are added to my original DataFrame(df) in bracket for every column. More elaborately my output should looks like(here timestamp column also should be there as in my df)....
Puts Calls PutCh CallCh ChDiff
2027891 1820724 280101 200974 79127
2075976 1862053 328186 242303 85883
(48085) (41329) (48085) (41329) (6756)
2091487 1885842 343697 266092 77605
(15511) (23789) (15511) (23789) (-8278)
2091879 1918592 344089 298842 45247
(392) (32750) (392) (32750) (-32358)
Is there any way to do same plz ??
Solution
Convert the output of diff
to string, add the parentheses and concat
back to the original, finally sort_index
to reorganize the rows in order:
tmp = (df.drop(columns='timestamp').diff()
.iloc[1:]
.apply(lambda s: '('+s.astype(str)+')')
)
out = pd.concat([df, tmp]).sort_index()
Output:
timestamp Puts Calls PutCh CallCh ChDiff
0 09:41:12 AM 2027891 1820724 280101 200974 79127
1 09:48:51 AM 2075976 1862053 328186 242303 85883
1 NaN (48085.0) (41329.0) (48085.0) (41329.0) (6756.0)
2 09:58:48 AM 2091487 1885842 343697 266092 77605
2 NaN (15511.0) (23789.0) (15511.0) (23789.0) (-8278.0)
3 10:08:21 AM 2091879 1918592 344089 298842 45247
3 NaN (392.0) (32750.0) (392.0) (32750.0) (-32358.0)
4 02:26:00 PM 1995234 1941917 247444 322167 -74723
4 NaN (-96645.0) (23325.0) (-96645.0) (23325.0) (-119970.0)
5 02:44:36 PM 1990071 1934874 242281 315124 -72843
5 NaN (-5163.0) (-7043.0) (-5163.0) (-7043.0) (1880.0)
6 02:56:17 PM 1970892 1938472 223102 318722 -95620
6 NaN (-19179.0) (3598.0) (-19179.0) (3598.0) (-22777.0)
variant with termcolor
:
from termcolor import colored
tmp = (df.drop(columns='timestamp').diff()
.iloc[1:]
.applymap(lambda x: colored(f'({x})',
'green' if x>0 else 'red')
)
)
out = (pd.concat([df.applymap(lambda x: colored(x, 'white')), tmp])
.fillna(colored('', 'white'))
.sort_index(kind='stable')
.rename(columns=lambda x: colored(x, 'white', None))
)
print(out.to_string(index=False))
Output:
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.