Wednesday, July 27, 2022

[FIXED] Import csv/xlsx file into python as a list and find a template match

July 27, 2022 csv, dataframe, pandas, python No comments

Issue

I have two simplified csv files below (uploaded as images and link in google drive) and I would like to ask how can I import both files in a python list format where I can iterate over the "templates.csv" and find the best match (if no perfect match exist).

https://drive.google.com/drive/folders/1dOv-8fQgeGEsaPMXYU8kbo2mY4B2KkDH?usp=sharing

test.csv = enter image description here

templates.csv = enter image description here

My current issue is how can I import the csv files in such a way that it will be read as a list where: test.csv file becomes:

test = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'DC_CURRENT', 'DC_VOLTAGE', ....]

templates.csv file becomes a list of templates:

template1 = ['AC_CURRENT', 'AC_CURRENT_15MIN', 'AC_CURRENT_A', 'AC_CURRENT_B', 'AC_CURRENT_C',...]

template2 = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'DC_CURRENT', 'DC_VOLTAGE', ....]

template3 = ['AC_CURRENT', 'AC_POWER', 'AC_POWER.MAX', 'AC_VOLTAGE', 'DC_CURRENT', ....]

templateN = ....

The first issue I have is reading both files as a dataframe where it will be converted as a list above then I can perform the iteration, finding a good match, and if no perfect match exists, it will give me a recommendation in percentage which template gives the most attributes match and will also list the missing attributes from my test.csv file that didnt have any match.

Take note that I cannot manipulate the excel file above in a easy format to pull as a list as we natively pull this from a software where these templates sits.

Thanks a lot for the help

I only have this line of codes for now which just reads the files as dataframe using pandas:

import pandas as pd

df1 = pd.read_csv('test.csv', sep=',')
df2 = pd.read_csv('templates.csv', sep=',')
print(df1)
print(df2)

enter image description here

Solution

Take this example:

import pandas as pd

df = pd.DataFrame({"Parent": ["x", "x", "y", "y", "z"], "Name": [1, 2, 3, 4, 5]})
>   Parent  Name
  0      x     1
  1      x     2
  2      y     3
  3      y     4
  4      z     5

# group by 'Parent', aggregate group values to lists for each column, then get column 'Name'
grouped_as_series = df.groupby('Parent').agg(list)['Name']
> Parent
  x    [1, 2]
  y    [3, 4]
  z       [5]

# to access single values
print(grouped_as_series["x"])
> [1, 2]

So for your case:

test = df1.groupyby('Parent').agg(list)['Name']['test']
series_of_templates = df2.groupyby('Parent').agg(list)['Name']

Note that I'm not sure if I understand what you plan to do with the data afterwards so a series might not be the best way to store the extracted data. E.g. a dictionary or list might be better.

Edit:

If I understand it correctly you want to then check the percentage of match between the test list and each template:

test = [6, 4]

best_template_percent = 0.0
best_template_index = -1
for ind, val in grouped_as_series.items():
    match_percent = len([v for v in test if v in val])/len(test)
    if match_percent > best_template_percent:
        best_template_percent = match_percent
        best_template_index = ind

print(best_template_index)
> y
print(grouped_as_series[best_template_index])
> [3, 4]

Or if you want it shorter:

import numpy as np

best_template_index = grouped_as_series.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in grouped_as_series.items()])]

For your data:

import numpy as np

best_template_index = series_of_templates.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in series_of_templates.items()])]
best_template_values = series_of_templates[best_template_index]

Answered By - ewz93

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, July 27, 2022

[FIXED] Import csv/xlsx file into python as a list and find a template match

Issue

Solution

Edit:

0 comments:

Post a Comment

Popular Posts

Labels