Thursday, February 10, 2022

[FIXED] Method-chaining without permanently mutating the object

Tags: method-chaining, pandas, python, python-2.x

Issue

I am learning how to write a Python class and use method chaining. Basically, I want a Python (2.7) class that holds my data and has chainable methods that let me filter the data without mutating the original. From some Googling, it seems the answer has something to do with return self, but I am not sure how to implement it so that the methods do not mutate my original data.

Let's say I have data stored in an Excel file called file, as follows:

+--------+-----+-------+
| Person | Sex | Score |
+--------+-----+-------+
| A      | M   |    10 |
| B      | F   |     9 |
| C      | M   |     8 |
| D      | F   |     7 |
+--------+-----+-------+

I would like to write a class called MyData such that I can do some basic data calling and filtering.

This is what I have so far:

class MyData:
    def __init__ (self, file):
        import pandas as pd
        self.data = pd.read_excel (file)
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def male_only(self):
        self.data = self.data[self.Sex=="M"]
        self.Person = self.Person[self.Sex=="M"]
        self.Score = self.Score[self.Sex=="M"]
        self.Sex = self.Sex[self.Sex=="M"]
        return self

    def female_only(self):
        self.data = self.data[self.Sex=="F"]
        self.Person = self.Person[self.Sex=="F"]
        self.Score = self.Score[self.Sex=="F"]
        self.Sex = self.Sex[self.Sex=="F"]
        return self

This seems to work, but sadly my original data is permanently mutated with this code. For example:

>>> Data = MyData(file)
>>> Data.data
  Person Sex  Score
0      A   M     10
1      B   F      9
2      C   M      8
3      D   F      7

>>> Data.male_only().data
  Person Sex  Score
0      A   M     10
2      C   M      8

>>> Data.data
  Person Sex  Score
0      A   M     10
2      C   M      8

I would like a class that returns the same answers for Data.male_only().Person and Data.Person.male_only() or for Data.male_only().data and Data.data.male_only() without permanently mutating Data.data or Data.Person.

Solution

I agree with @Demi-Lune.

I changed the OP's code so that male_only() and female_only() always return a new object instead of modifying the object they are called on. I also changed __init__() to accept an existing DataFrame, because you probably don't want to call pd.read_csv() every time a new object is created. Since male_only() and female_only() always return a new object, they have no effect on other objects.

import pandas as pd

# In-memory CSV standing in for the data file.
# The u prefix is needed on Python 2.7, where io.StringIO only accepts unicode text.
import io
csv = u'''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

class MyData:
    def __init__ (self, file=None, data=None):
        # Read from a file only when one is given;
        # otherwise wrap the DataFrame that was passed in.
        if file is not None:
            self.data = pd.read_csv(file)
        else:
            self.data = data
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def copy_d(self):
        # Return a new MyData built from a copy of the data.
        return MyData(data=self.data.copy())

    def male_only(self):
        # Filter on a new object so that self is left untouched.
        d = self.copy_d()
        d.data = self.data[self.Sex=="M"]
        d.Person = self.Person[self.Sex=="M"]
        d.Score = self.Score[self.Sex=="M"]
        d.Sex = self.Sex[self.Sex=="M"]
        return d

    def female_only(self):
        # Same as male_only(), but keeps the rows where Sex is "F".
        d = self.copy_d()
        d.data = self.data[self.Sex=="F"]
        d.Person = self.Person[self.Sex=="F"]
        d.Score = self.Score[self.Sex=="F"]
        d.Sex = self.Sex[self.Sex=="F"]
        return d

d = MyData(file)
print(d.female_only().data)
#   Person Sex  Score
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6

print(d.male_only().data)
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3

print(d.data)
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6
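
Because each filter only needs to build a new object from a filtered frame, the class can be condensed. The following is a minimal sketch of that idea, not the code from the answer: the _where helper and the high_score filter are hypothetical names added for illustration, the column attributes are turned into properties so they always follow self.data, and the in-memory CSV stands in for the data file.

```python
import io

import pandas as pd

# In-memory CSV standing in for the data file (same sample rows as above).
CSV = u"""Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6"""


class MyData(object):
    def __init__(self, file=None, data=None):
        # Read from a file only when one is given; otherwise wrap the DataFrame.
        self.data = pd.read_csv(file) if file is not None else data

    # Columns are exposed as properties, so they always reflect self.data.
    @property
    def Person(self):
        return self.data["Person"]

    @property
    def Sex(self):
        return self.data["Sex"]

    @property
    def Score(self):
        return self.data["Score"]

    def _where(self, mask):
        # Hypothetical helper: boolean indexing returns a new DataFrame,
        # so self.data is never modified.
        return MyData(data=self.data[mask])

    def male_only(self):
        return self._where(self.Sex == "M")

    def female_only(self):
        return self._where(self.Sex == "F")

    def high_score(self, minimum):
        # Hypothetical extra filter, added only to show chaining.
        return self._where(self.Score >= minimum)


d = MyData(io.StringIO(CSV))
chained = d.male_only().high_score(2)

print(chained.Person.tolist())
# ['p2', 'p3']

print(len(d.data))  # the original object still holds all rows
# 6
```

Every method returns a fresh MyData, so calls can be chained in any order and the starting object keeps its full data.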

But if you are only wrapping a pandas.DataFrame, another approach is to use the bare pandas.DataFrame directly. In most cases, a pandas.DataFrame object already exposes each column as an attribute with the same name as the column, provided the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. So you don't need to define properties like Person, Sex, and Score, because they already exist on the DataFrame object.

For example:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.eye(3,3), columns=['Person', 'Sex', 'Score'])

# `df` already has these properties.
df.Person
df.Sex
df.Score
# In [986]: df.Person
# Out[986]: 
# 0    1.0
# 1    0.0
# 2    0.0
# Name: Person, dtype: float64

# In [987]: df.Sex
# Out[987]: 
# 0    0.0
# 1    1.0
# 2    0.0
# Name: Sex, dtype: float64

# In [988]: df.Score
# Out[988]: 
# 0    0.0
# 1    0.0
# 2    1.0
# Name: Score, dtype: float64
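
One caveat to the attribute shortcut, shown here as a small sketch: a column whose name matches an existing DataFrame attribute or method cannot be reached with dot notation, while bracket indexing always works. The column name count is a made-up example chosen because DataFrame already has a count() method.

```python
import pandas as pd

# "count" is a made-up column name that collides with DataFrame.count().
df = pd.DataFrame({"Person": ["p1", "p2"], "count": [10, 20]})

# Dot access and bracket access return the same column.
same = df.Person.equals(df["Person"])

# df.count is the DataFrame method, not the column.
is_method = callable(df.count)

# Bracket indexing still reaches the column.
values = df["count"].tolist()

print(same, is_method, values)
# True True [10, 20]
```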

So your male_only() and female_only() methods can be written as plain functions that take a DataFrame and return a filtered one, like this:

import pandas as pd

# In-memory CSV standing in for the data file.
# The u prefix is needed on Python 2.7, where io.StringIO only accepts unicode text.
import io
csv = u'''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

def male_only(df):
    return df[df.Sex=='M']

def female_only(df):
    return df[df.Sex=='F']

df = pd.read_csv(file)
male_only(df)
# In [1034]: male_only(df)
# Out[1037]: 
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3

female_only(df)
# In [1038]: female_only(df)
# Out[1041]: 
#   Person Sex  Score
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6
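
If the chained style from the question is still wanted with plain functions, DataFrame.pipe can pass the frame through each function in turn. This is a small sketch using the same sample data; high_score is a hypothetical extra filter added only to show a second link in the chain.

```python
import io

import pandas as pd

# In-memory CSV standing in for the data file (same sample rows as above).
CSV = u"""Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6"""


def male_only(df):
    return df[df.Sex == "M"]


def high_score(df, minimum):
    # Hypothetical extra filter; pipe() forwards `minimum` as an argument.
    return df[df.Score >= minimum]


df = pd.read_csv(io.StringIO(CSV))

# Each pipe() call hands the current frame to the function and returns its result.
result = df.pipe(male_only).pipe(high_score, 2)

print(result.Person.tolist())
# ['p2', 'p3']

print(df.shape)  # the original frame is unchanged
# (6, 3)
```

Boolean indexing returns a new DataFrame at every step, so df itself is never modified.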

I hope it will help you.

Answered By - Kei Minagawa

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
