Issue
I am learning how to write Python classes and use method chaining. Basically, I want a Python (2.7) class that holds my data and has chainable methods that let me filter the data without mutating the original. I have done some Googling, and it seems the answer might have something to do with return self, but I am not sure how to implement it so that the methods do not mutate my original data.
Let's say I have data stored in an Excel file called file, as follows:
+--------+-----+-------+
| Person | Sex | Score |
+--------+-----+-------+
| A | M | 10 |
| B | F | 9 |
| C | M | 8 |
| D | F | 7 |
+--------+-----+-------+
I would like to write a class called MyData so that I can do some basic data access and filtering.
This is what I have so far:
class MyData:
    def __init__ (self, file):
        import pandas as pd
        self.data = pd.read_excel (file)
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def male_only(self):
        self.data = self.data[self.Sex=="M"]
        self.Person = self.Person[self.Sex=="M"]
        self.Score = self.Score[self.Sex=="M"]
        self.Sex = self.Sex[self.Sex=="M"]
        return self

    def female_only(self):
        self.data = self.data[self.Sex=="F"]
        self.Person = self.Person[self.Sex=="F"]
        self.Score = self.Score[self.Sex=="F"]
        self.Sex = self.Sex[self.Sex=="F"]
        return self
This seems to work, but sadly my original data is permanently mutated with this code. For example:
>>> Data = MyData(file)
>>> Data.data
Person Sex Score
0 A M 10
1 B F 9
2 C M 8
3 D F 7
>>> Data.male_only().data
Person Sex Score
0 A M 10
2 C M 8
>>> Data.data
Person Sex Score
0 A M 10
2 C M 8
I would like a class that returns the same answers for Data.male_only().Person and Data.Person.male_only(), or for Data.male_only().data and Data.data.male_only(), without permanently mutating Data.data or Data.Person.
Solution
I agree with @Demi-Lune.
I changed the OP's code so that the male_only() and female_only() methods always return a copy of the object they are called on. I also changed the __init__() method, because I think you don't want to call pd.read_csv() every time a new object is created. Since male_only() and female_only() always return a new object, they have no effect on other objects.
import pandas as pd
# Used to create an in-memory file.
import io

# The u prefix keeps io.StringIO happy on Python 2.7 and is harmless on Python 3.
csv = u'''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

class MyData:
    def __init__ (self, file=None, data=None):
        # Read from a file only when one is given; otherwise wrap an
        # existing DataFrame so that copies never touch the file again.
        if file is not None:
            self.data = pd.read_csv(file)
        else:
            self.data = data
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def copy_d(self):
        # Return a new MyData built from a copy of the underlying DataFrame.
        return MyData(data=self.data.copy())

    def male_only(self):
        # Filter on the copy; self is left unchanged.
        d = self.copy_d()
        d.data = self.data[self.Sex=="M"]
        d.Person = self.Person[self.Sex=="M"]
        d.Score = self.Score[self.Sex=="M"]
        d.Sex = self.Sex[self.Sex=="M"]
        return d

    def female_only(self):
        # Filter on the copy; self is left unchanged.
        d = self.copy_d()
        d.data = self.data[self.Sex=="F"]
        d.Person = self.Person[self.Sex=="F"]
        d.Score = self.Score[self.Sex=="F"]
        d.Sex = self.Sex[self.Sex=="F"]
        return d

d = MyData(file)
print(d.female_only().data)
# Person Sex Score
# 3 p4 F 4
# 4 p5 F 5
# 5 p6 F 6
print(d.male_only().data)
# Person Sex Score
# 0 p1 M 1
# 1 p2 M 2
# 2 p3 M 3
print(d.data)
# Person Sex Score
# 0 p1 M 1
# 1 p2 M 2
# 2 p3 M 3
# 3 p4 F 4
# 4 p5 F 5
# 5 p6 F 6
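The class above keeps four attributes in step by hand in every filter method. As a rough sketch of one way to tighten that (not the original answer's code), the column attributes could be derived from self.data through properties, and a single helper could build the new object. The names from_csv, _where and min_score are hypothetical and only here for illustration; the sample rows are the ones from the question.

```python
import io
import pandas as pd


class MyData(object):
    """Wrapper whose filter methods return a new MyData and never modify self."""

    def __init__(self, data):
        # Assumption: `data` is an already-loaded DataFrame.
        self.data = data

    @classmethod
    def from_csv(cls, file):
        # Hypothetical constructor so that filtering never re-reads the file.
        return cls(pd.read_csv(file))

    # Columns are looked up on self.data each time, so they cannot
    # fall out of sync with it.
    @property
    def Person(self):
        return self.data["Person"]

    @property
    def Sex(self):
        return self.data["Sex"]

    @property
    def Score(self):
        return self.data["Score"]

    def _where(self, mask):
        # Boolean indexing builds a new DataFrame; self.data is untouched.
        return MyData(self.data[mask])

    def male_only(self):
        return self._where(self.Sex == "M")

    def female_only(self):
        return self._where(self.Sex == "F")

    def min_score(self, threshold):
        # Hypothetical extra filter, only here to demonstrate chaining.
        return self._where(self.Score >= threshold)


csv_text = u"""Person,Sex,Score
A,M,10
B,F,9
C,M,8
D,F,7"""

d = MyData.from_csv(io.StringIO(csv_text))
chained = d.male_only().min_score(9)

print(chained.Person.tolist())  # ['A']
print(len(d.data))              # 4, the original object is unchanged
```

Because every method returns a fresh MyData, calls can be chained in any order, and d.male_only().Person works without any extra bookkeeping.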
But if you are just working with a pandas.DataFrame, another approach is to use the bare pandas.DataFrame directly. First, in most cases a pandas.DataFrame object already exposes each column as an attribute with the same name as the column (this works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method). So you don't actually need to define attributes like Person, Sex, and Score, because they already exist on the DataFrame object.
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.eye(3,3), columns=['Person', 'Sex', 'Score'])
# `df` already has these attributes.
df.Person
df.Sex
df.Score
# In [986]: df.Person
# Out[986]:
# 0 1.0
# 1 0.0
# 2 0.0
# Name: Person, dtype: float64
# In [987]: df.Sex
# Out[987]:
# 0 0.0
# 1 1.0
# 2 0.0
# Name: Sex, dtype: float64
# In [988]: df.Score
# Out[988]:
# 0 0.0
# 1 0.0
# 2 1.0
# Name: Score, dtype: float64
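The "in most cases" caveat is worth a quick illustration. In the sketch below, the column names count and Test Score are made up for the example: the first collides with the existing DataFrame.count() method and the second is not a valid Python identifier, so both need bracket access.

```python
import pandas as pd

# Hypothetical frame: "count" collides with DataFrame.count(),
# and "Test Score" is not a valid Python identifier.
df = pd.DataFrame({
    "Person": ["A", "B"],
    "count": [3, 5],
    "Test Score": [10, 9],
})

person = df.Person         # attribute access works for this column
counts = df["count"]       # df.count is still the method, so use brackets
scores = df["Test Score"]  # df.Test Score would be a syntax error

print(callable(df.count))  # True
print(counts.tolist())     # [3, 5]
```

Bracket access works for every column, so it is the safer choice when column names come from an external file.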
So your male_only() and female_only() methods can be written as plain functions, as follows.
import pandas as pd
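# Used to create an in-memory file.
import io

# The u prefix keeps io.StringIO happy on Python 2.7 and is harmless on Python 3.
csv = u'''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

def male_only(df):
    # Boolean indexing returns a new DataFrame; df itself is not modified.
    return df[df.Sex=='M']

def female_only(df):
    return df[df.Sex=='F']
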
df = pd.read_csv(file)
male_only(df)
# In [1034]: male_only(df)
# Out[1037]:
# Person Sex Score
# 0 p1 M 1
# 1 p2 M 2
# 2 p3 M 3
female_only(df)
# In [1038]: female_only(df)
# Out[1041]:
# Person Sex Score
# 3 p4 F 4
# 4 p5 F 5
# 5 p6 F 6
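If you still want the chained style with plain functions, DataFrame.pipe passes the frame to a function and returns the result, so filters can be strung together. The following is a minimal sketch under that assumption; score_at_least is a hypothetical second filter added only to show a chain of two steps.

```python
import io
import pandas as pd

csv_text = u"""Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6"""

df = pd.read_csv(io.StringIO(csv_text))


def male_only(df):
    return df[df.Sex == "M"]


def score_at_least(df, threshold):
    # Hypothetical extra filter, only here to demonstrate chaining.
    return df[df.Score >= threshold]


# pipe() calls func(df, *args) and returns whatever func returns.
result = df.pipe(male_only).pipe(score_at_least, 2)

print(result.Person.tolist())  # ['p2', 'p3']
print(len(df))                 # 6, the original frame is unchanged
```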
I hope it will help you.
Answered By - Kei Minagawa