Issue
I have two simplified csv files below (uploaded as images and link in google drive) and I would like to ask how can I import both files in a python list format where I can iterate over the "templates.csv" and find the best match (if no perfect match exist).
https://drive.google.com/drive/folders/1dOv-8fQgeGEsaPMXYU8kbo2mY4B2KkDH?usp=sharing
test.csv = enter image description here
templates.csv = enter image description here
My current issue is how can I import the csv files in such a way that it will be read as a list where: test.csv file becomes:
test = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'DC_CURRENT', 'DC_VOLTAGE', ....]
templates.csv file becomes a list of templates:
template1 = ['AC_CURRENT', 'AC_CURRENT_15MIN', 'AC_CURRENT_A', 'AC_CURRENT_B', 'AC_CURRENT_C',...]
template2 = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'DC_CURRENT', 'DC_VOLTAGE', ....]
template3 = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'AC_VOLTAGE', 'DC_CURRENT', ....]
templateN = ....
The first issue I have is reading both files as a dataframe where it will be converted as a list above then I can perform the iteration, finding a good match, and if no perfect match exists, it will give me a recommendation in percentage which template gives the most attributes match and will also list the missing attributes from my test.csv file that didnt have any match.
Take note that I cannot manipulate the excel file above in a easy format to pull as a list as we natively pull this from a software where these templates sits.
Thanks a lot for the help
I only have this line of codes for now which just reads the files as dataframe using pandas:
import pandas as pd
df1 = pd.read_csv('test.csv', sep=',')
df2 = pd.read_csv('templates.csv', sep=',')
print(df1)
print(df2)
Solution
Take this example:
import pandas as pd
df = pd.DataFrame({"Parent": ["x", "x", "y", "y", "z"], "Name": [1, 2, 3, 4, 5]})
> Parent Name
0 x 1
1 x 2
2 y 3
3 y 4
4 z 5
# group by 'Parent', aggregate group values to lists for each column, then get column 'Name'
grouped_as_series = df.groupby('Parent').agg(list)['Name']
> Parent
x [1, 2]
y [3, 4]
z [5]
# to access single values
print(grouped_as_series["x"])
> [1, 2]
So for your case:
test = df1.groupyby('Parent').agg(list)['Name']['test']
series_of_templates = df2.groupyby('Parent').agg(list)['Name']
Note that I'm not sure if I understand what you plan to do with the data afterwards so a series might not be the best way to store the extracted data. E.g. a dictionary or list might be better.
Edit:
If I understand it correctly you want to then check the percentage of match between the test list and each template:
test = [6, 4]
best_template_percent = 0.0
best_template_index = -1
for ind, val in grouped_as_series.items():
match_percent = len([v for v in test if v in val])/len(test)
if match_percent > best_template_percent:
best_template_percent = match_percent
best_template_index = ind
print(best_template_index)
> y
print(grouped_as_series[best_template_index])
> [3, 4]
Or if you want it shorter:
import numpy as np
best_template_index = grouped_as_series.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in grouped_as_series.items()])]
For your data:
import numpy as np
best_template_index = series_of_templates.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in series_of_templates.items()])]
best_template_values = series_of_templates[best_template_index]
Answered By - ewz93
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.