Friday, February 25, 2022

[FIXED] How can I filter a csv file based on its columns in python?


Issue

I have a CSV file with over 5,000,000 rows of data that looks like this (except that it is in Farsi):

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766140,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,1050000,7266.44,5,Concrete,13890108,5166884645
766146,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,700000,4844.29,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
770822,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,50,500000,1730.1,5,Concrete,13890114,5166884645

I would like to write code that treats the first row as the header, extracts the rows for two specific cities (Kish and Qeshm), and saves them to a new CSV file. Something like this:

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645

It's worth mentioning that I'm very new to Python. I've written the following block to define the headers, but this is as far as I've gotten.

import pandas as pd

path = '/Users/Desktop/sample.csv'

df = pd.read_csv(path, header=[0])
df.head()

Solution

You don't need to use header=... because the default is to treat the first row as the header, so

df = pd.read_csv(path)

Then, to keep only the rows whose City is Kish or Qeshm:

df2 = df[df['City'].isin(['Kish', 'Qeshm'])]
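To see what the filter does, here is a minimal sketch that runs the same isin test on three of the sample rows from the question, loaded from an in-memory string instead of a file. The trimmed column set and the names sample and mask are only for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Three rows from the question, trimmed to a few columns for illustration.
sample = io.StringIO(
    "Contract Code,City,Price\n"
    "765720,Kish,570000\n"
    "766134,Qeshm,1070000\n"
    "766140,Tabriz,1050000\n"
)
df = pd.read_csv(sample)

# isin() returns a boolean Series: True where City is one of the listed values.
mask = df['City'].isin(['Kish', 'Qeshm'])

# Indexing the DataFrame with the mask keeps only the rows marked True.
df2 = df[mask]
print(df2)
```

The Tabriz row is dropped, and the two remaining rows keep their original order.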

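And you can save it with to_csv, passing index=False so that the row index is not written as an extra first column:

df2.to_csv(another_path, index=False)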
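With over 5,000,000 rows, reading the whole file at once may use a lot of memory. One option, which is not part of the original answer, is to read the file in pieces with the chunksize parameter of read_csv and append each filtered piece to the output file. A minimal sketch, assuming a hypothetical helper named filter_cities and an arbitrary chunk size of 100,000 rows:

```python
import pandas as pd

def filter_cities(src_path, dst_path, cities, chunksize=100_000):
    """Copy rows whose City is in `cities` from src_path to dst_path.

    Hypothetical helper for illustration: it reads the CSV in chunks so
    the whole file never has to fit in memory at once.
    """
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        kept = chunk[chunk['City'].isin(cities)]
        # Write the header only with the first chunk, then append without it.
        kept.to_csv(dst_path, mode='w' if first else 'a',
                    header=first, index=False)
        first = False

# Example call (paths are placeholders):
# filter_cities('/Users/Desktop/sample.csv', another_path, ['Kish', 'Qeshm'])
```

Because pandas infers column types separately for each chunk, numeric columns could be formatted slightly differently between chunks; passing an explicit dtype to read_csv would avoid that if it matters.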

Answered By - Raymond Kwok

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
