Issue
I'm working on an ML model that needs RSSI values from a BLE device. For this, I've written a Mac application that stores a dictionary of type <K: Date, V: Int> in a text file. See the sample below:
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'
Here, the negative values are the RSSI readings. For example, the first two entries are:
Date | RSSI |
---|---|
2021-10-17 06:52:00 +0000 | -47 |
2021-10-17 06:52:04 +0000 | -50 |
To perform calculations, I need the data as the Python equivalent of <Date, Int>. How can I convert the string above into a pandas DataFrame so I can work with it? I hope this provides enough information. Thank you in advance.
Solution
You can use re.findall with a small regex to extract the (date, RSSI) pairs, then build whichever structure you need.
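The pattern used in all the variants below can be read piece by piece. The following sketch is the same regex rewritten with re.VERBOSE so each part can carry a comment, applied to a two-entry excerpt of the sample (the names PATTERN, sample and pairs are just for illustration):

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each part can be commented.
PATTERN = re.compile(r"""
    ([^,]+)      # group 1: the timestamp, i.e. a run of non-comma characters before ': '
    :\           # literal ': ' separating key and value (space escaped because VERBOSE ignores whitespace)
    (-?\d+)      # group 2: the RSSI value, an optionally negative integer
    (?:,\ )?     # optional ', ' between entries, consumed so the next match starts on the date
""", re.VERBOSE)

sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'
# sample[1:-1] drops the surrounding square brackets before matching
pairs = PATTERN.findall(sample[1:-1])
print(pairs)
```

Because group 1 is greedy, it backtracks to the last ': ' before each comma, so the colons inside the time (06:52:00) are never mistaken for the key/value separator; they are not followed by a space anyway.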
As a DataFrame:
import re
import pandas as pd
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'
# string[1:-1] drops the brackets; each match is a (timestamp, rssi) tuple of strings.
# The raw string r'...' keeps '\d' from being treated as an (invalid) string escape.
records = re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])
df = pd.DataFrame.from_records(records, columns=['Date', 'RSSI'])
# Parse the dates explicitly, including the +0000 offset; astype('datetime64') without a unit raises an error in pandas 2.x.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S %z', utc=True)
df['RSSI'] = df['RSSI'].astype(int)
output:
Date RSSI
0 2021-10-17 06:52:00+00:00 -47
1 2021-10-17 06:52:04+00:00 -50
2 2021-10-17 06:52:03+00:00 -50
3 2021-10-17 06:52:02+00:00 -47
4 2021-10-17 06:52:08+00:00 -46
5 2021-10-17 06:51:57+00:00 -50
6 2021-10-17 06:52:09+00:00 -48
7 2021-10-17 06:52:05+00:00 -49
8 2021-10-17 06:52:01+00:00 -48
9 2021-10-17 06:51:58+00:00 -50
10 2021-10-17 06:51:59+00:00 -50
11 2021-10-17 06:52:06+00:00 -47
12 2021-10-17 06:52:07+00:00 -48
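Because a Swift Dictionary has no defined order, the timestamps in the sample arrive shuffled. Before time-based calculations it is usually worth sorting by date and using it as the index. The sketch below does that on a shortened copy of the sample; the 3-second rolling mean is only an assumed example of "calculations", not something from the question:

```python
import re
import pandas as pd

# Shortened copy of the question's sample; entries are out of order, as in the original.
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:02 +0000: -47]'

df = pd.DataFrame.from_records(
    re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]),
    columns=['Date', 'RSSI'],
)
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S %z', utc=True)
df['RSSI'] = df['RSSI'].astype(int)

# Sort chronologically and turn the timestamps into a DatetimeIndex.
ts = df.sort_values('Date').set_index('Date')['RSSI']

# Assumed example calculation: mean RSSI over a trailing 3-second window.
# Time-based windows need a monotonic index, hence the sort above.
rolling_mean = ts.rolling('3s').mean()
print(rolling_mean)
```

Time-based windows are right-closed by default, so each value averages the readings in (t - 3s, t].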
As a dictionary (keys and values stay strings):
import re
dict(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]))
output:
{'2021-10-17 06:52:00 +0000': '-47',
'2021-10-17 06:52:04 +0000': '-50',
'2021-10-17 06:52:03 +0000': '-50',
'2021-10-17 06:52:02 +0000': '-47',
'2021-10-17 06:52:08 +0000': '-46',
'2021-10-17 06:51:57 +0000': '-50',
'2021-10-17 06:52:09 +0000': '-48',
'2021-10-17 06:52:05 +0000': '-49',
'2021-10-17 06:52:01 +0000': '-48',
'2021-10-17 06:51:58 +0000': '-50',
'2021-10-17 06:51:59 +0000': '-50',
'2021-10-17 06:52:06 +0000': '-47',
'2021-10-17 06:52:07 +0000': '-48'}
As a dictionary with proper types (Timestamp keys, int values):
import re
import pandas as pd
{pd.to_datetime(k): int(v)
 for k, v in re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])}
output:
{Timestamp('2021-10-17 06:52:00+0000', tz='UTC'): -47,
Timestamp('2021-10-17 06:52:04+0000', tz='UTC'): -50,
Timestamp('2021-10-17 06:52:03+0000', tz='UTC'): -50,
Timestamp('2021-10-17 06:52:02+0000', tz='UTC'): -47,
Timestamp('2021-10-17 06:52:08+0000', tz='UTC'): -46,
Timestamp('2021-10-17 06:51:57+0000', tz='UTC'): -50,
Timestamp('2021-10-17 06:52:09+0000', tz='UTC'): -48,
Timestamp('2021-10-17 06:52:05+0000', tz='UTC'): -49,
Timestamp('2021-10-17 06:52:01+0000', tz='UTC'): -48,
Timestamp('2021-10-17 06:51:58+0000', tz='UTC'): -50,
Timestamp('2021-10-17 06:51:59+0000', tz='UTC'): -50,
Timestamp('2021-10-17 06:52:06+0000', tz='UTC'): -47,
Timestamp('2021-10-17 06:52:07+0000', tz='UTC'): -48}
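If you only need the Python equivalent of <Date, Int> and want to avoid both regex and pandas, plain string splitting with datetime.strptime also works for this exact format. This is a swapped-in technique, not the answer's method, and the function name parse_rssi_log is hypothetical; it assumes entries are always separated by ', ' and requires Python 3.9+ for the dict[...] annotation:

```python
from datetime import datetime


def parse_rssi_log(text: str) -> dict[datetime, int]:
    """Hypothetical helper: parse '[<timestamp>: <rssi>, ...]' into {datetime: int}."""
    body = text.strip()[1:-1]  # drop the surrounding [ ]
    result: dict[datetime, int] = {}
    for entry in body.split(', '):  # assumes entries are separated by ', '
        if not entry:  # an empty log '[]' yields a single empty entry
            continue
        # Split on the last ': ', the key/value separator (the time's colons have no trailing space).
        key, value = entry.rsplit(': ', 1)
        # %z accepts the '+0000' offset and yields a timezone-aware datetime.
        result[datetime.strptime(key, '%Y-%m-%d %H:%M:%S %z')] = int(value)
    return result


sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'
parsed = parse_rssi_log(sample)
print(parsed)
```

The keys are standard-library datetime objects rather than pandas Timestamps, which is closer to Swift's Date if the data never needs to go into a DataFrame.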
Answered By - mozway