Saturday, January 1, 2022

[FIXED] how to filter a particular column with python pandas?

Labels: dataframe, pandas, python-3.x

Issue

I have an Excel file with two columns, 'Name' and 'size'. The 'Name' column contains several file types (.apk, .dat, .vdex, .ttc, etc.), but I only want to keep the files whose extension is .apk. I do not want any other file type in the new Excel file.

I have written the code below:

import pandas as pd
import json

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
        df = pd.DataFrame(data)
        new_df = df[df.columns.difference(['SHA256'])]
        new_xl = new_df.to_excel('abc.xlsx')
        return new_xl

def filter_apk():  # MODIFIED CODE
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')

The program above does the following:

json_to_excel() takes a JSON file, converts it to .xlsx format and saves it.
filter_apk() is supposed to create multiple Excel files based on the file extensions present in the "Name" column.

The first function does what I intend.
The second function does nothing, and it does not throw any error either. I have followed this weblink.

Below are a few samples from the "Name" column:

/system/product/<Path_to>/abc.apk
/system/fonts/wwwr.ttc
/system/framework/framework.jar
/system/<Path_to>/icu.dat
/system/<Path_to>/Normal.apk
/system/<Path_to>/Tv.apk

How can I get this working? Or is there a better way to achieve the objective?

Please suggest.

ERROR

    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>

Note:

I have all the files at the same location.

modified code:

import pandas as pd
import json

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
        df = pd.DataFrame(data)
        new_df = df[df.columns.difference(['SHA256'])]
        new_df.to_excel('abc.xlsx')

def filter_apk():
    json_to_excel()
    old_xl = pd.read_excel('abc.xlsx')
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')


t = filter_apk()
print(t)

New error:

Traceback (most recent call last):
  File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 89, in <module>
    t = filter_apk()
  File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 84, in filter_apk
    data = pd.read_excel(old_xl)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 344, in __init__
    filepath_or_buffer, _, _, _ = get_filepath_or_buffer(filepath_or_buffer)
  File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 243, in get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

Solution

There is a difference between your use case and the one shown in the weblink. You want to apply a single filter (.apk files), whereas the example you followed applies several filters one after another (multiple species).

This will do the trick.

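def filter_apk():
    # to_excel() returns None, so json_to_excel() has nothing useful to return;
    # call it for its side effect and read the file it wrote by path.
    json_to_excel()
    data = pd.read_excel('abc.xlsx')
    # Raw string avoids an invalid-escape warning; $ anchors the match to the end of the name.
    a = data[data["Name"].str.contains(r"\.apk$")]
    a.to_excel("<path_to_new_excel>\\new_excel_name.xlsx")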
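The filter itself can be tried without any Excel file. The following is a minimal sketch, assuming an in-memory DataFrame built from the sample paths in the question; the helper name keep_extension and the size values are made up for illustration. It uses str.endswith rather than a regex, which may be a simpler fit when only the extension matters.

```python
import pandas as pd


def keep_extension(df: pd.DataFrame, ext: str = ".apk", column: str = "Name") -> pd.DataFrame:
    """Return only the rows whose `column` value ends with `ext` (hypothetical helper)."""
    # str.endswith does a literal match, so the dot needs no escaping;
    # na=False treats missing names as non-matches.
    mask = df[column].str.endswith(ext, na=False)
    return df[mask]


# Sample rows modelled on the paths in the question; the sizes are invented.
sample = pd.DataFrame({
    "Name": [
        "/system/product/<Path_to>/abc.apk",
        "/system/fonts/wwwr.ttc",
        "/system/framework/framework.jar",
        "/system/<Path_to>/icu.dat",
        "/system/<Path_to>/Normal.apk",
        "/system/<Path_to>/Tv.apk",
    ],
    "size": [10, 20, 30, 40, 50, 60],
})

apk_only = keep_extension(sample)
print(apk_only["Name"].tolist())
```

Writing the result out would then be a single call such as apk_only.to_excel('zybg.xlsx', index=False), which needs an Excel writer engine (for example openpyxl) to be installed.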

Answered By - Kunal Shah

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
