Friday, November 24, 2023

[FIXED] AttributeError on End-to-End ML Project - 'DataTransformation' object has no attribute 'data_transformation_config'

data-transform, machine-learning, pipeline, python

Issue

I am following the Wine Quality Prediction End-to-End ML Project tutorial on Krish Naik's YouTube channel to build a Flight Fare Prediction project.

I ran this data transformation pipeline cell in 03_data_transformation.ipynb:

try:
    config = ConfigurationManager()
    data_transformation_config = config.get_data_transformation_config()
    data_transformation = DataTransformation(config=data_transformation_config)
    # data_transformation.train_test_spliting()
    # New Line
    data_transformation.initiate_data_transformation()
except Exception as e:
    raise e

I get this error:

AttributeError: 'DataTransformation' object has no attribute 'data_transformation_config'

Here is the traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 9
      7     data_transformation.initiate_data_transformation()
      8 except Exception as e:
----> 9     raise e

g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 7
      4     data_transformation = DataTransformation(config=data_transformation_config)
      5     # data_transformation.train_test_spliting()
      6     # New Line
----> 7     data_transformation.initiate_data_transformation()
      8 except Exception as e:
      9     raise e

g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 8
     78 logger.info(f' transformed df data head: \n{df.head().to_string()}')
     80 # df.to_csv(self.data_transformation_config.transformed_data_file_path, index = False, header= True)
     81 # New Line
---> 82 df.to_excel(self.data_transformation_config.transformed_data_file_path, index = False, header= True)
     83 logger.info("transformed data is stored")
     84 df.head(1)

AttributeError: 'DataTransformation' object has no attribute 'data_transformation_config'

Here is the code of the data transformation cell, which references data_transformation_config:

class DataTransformation:

    # New Function Added
    # https://github.com/yash1314/Flight-Price-Prediction/blob/main/src/utils.py
    def convert_to_minutes(self, duration):
        try:
            hours, minute = 0, 0
            for i in duration.split():
                if 'h' in i:
                    hours = int(i[:-1])
                elif 'm' in i:
                    minute = int(i[:-1])
            return hours * 60 + minute
        except :
            return None 

    def __init__(self, config: DataTransformationConfig):
        self.config = config

    
    ## Note: You can add different data transformation techniques such as Scaler, PCA and all
    #You can perform all kinds of EDA in ML cycle here before passing this data to the model

    # I am only adding train_test_spliting cz this data is already cleaned up

    # New Code Added Start
    def initiate_data_transformation(self):
        ## reading the data
        # df = pd.read_csv(self.config.data_path)
        # New Line
        df = pd.read_excel(self.config.data_path)

        logger.info('Read data completed')
        logger.info(f'df dataframe head: \n{df.head().to_string()}')

        ## dropping null values
        df.dropna(inplace = True)

        ## Date of journey column transformation
        df['journey_date'] = pd.to_datetime(df['Date_of_Journey'], format ="%d/%m/%Y").dt.day
        df['journey_month'] = pd.to_datetime(df['Date_of_Journey'], format ="%d/%m/%Y").dt.month

        ## encoding total stops.
        df.replace({'Total_Stops': {'non-stop' : 0, '1 stop': 1, '2 stops': 2, '3 stops': 3, '4 stops': 4}}, inplace = True)

        ## encoding airline, source, and destination
        df_airline = pd.get_dummies(df['Airline'], dtype=int)
        df_source = pd.get_dummies(df['Source'],  dtype=int)
        df_dest = pd.get_dummies(df['Destination'], dtype=int)

        ## dropping first columns of each categorical variables.
        df_airline.drop('Trujet', axis = 1, inplace = True)
        df_source.drop('Banglore', axis = 1, inplace = True)
        df_dest.drop('Banglore', axis = 1, inplace = True)

        df = pd.concat([df, df_airline, df_source, df_dest], axis = 1)
       
        ## handling duration column
        # df['duration'] = df['Duration'].apply(convert_to_minutes)
        # New Line Added
        df['duration'] = df['Duration'].apply(self.convert_to_minutes)
        upper_time_limit = df.duration.mean() + 1.5 * df.duration.std()
        df['duration'] = df['duration'].clip(upper = upper_time_limit)

        ## encoding duration column
        bins = [0, 120, 360, 1440]  # custom bin intervals for 'Short,' 'Medium,' and 'Long'
        labels = ['Short', 'Medium', 'Long'] # creating labels for encoding

        df['duration'] = pd.cut(df['duration'], bins=bins, labels=labels)
        df.replace({'duration': {'Short':1, 'Medium':2, 'Long': 3}}, inplace = True)
        
        ## dropping the columns
        cols_to_drop = ['Airline', 'Date_of_Journey', 'Source', 'Destination', 'Route', 'Dep_Time', 'Arrival_Time', 'Duration', 'Additional_Info', 'Delhi', 'Kolkata']

        df.drop(cols_to_drop, axis = 1, inplace = True)

        logger.info('df data transformation completed')
        logger.info(f' transformed df data head: \n{df.head().to_string()}')

        # df.to_csv(self.data_transformation_config.transformed_data_file_path, index = False, header= True)
        # New Line
        df.to_excel(self.data_transformation_config.transformed_data_file_path, index = False, header= True)
        logger.info("transformed data is stored")
        df.head(1)
        ## splitting the data into training and target data
        X = df.drop('Price', axis = 1)
        y = df['Price']
        
        ## accessing the feature importance.
        select = ExtraTreesRegressor()
        select.fit(X, y)

        # plt.figure(figsize=(12, 8))
        # fig_importances = pd.Series(select.feature_importances_, index=X.columns)
        # fig_importances.nlargest(20).plot(kind='barh')
    
        # ## specify the path to the "visuals" folder using os.path.join
        # visuals_folder = 'visuals'
        # if not os.path.exists(visuals_folder):
        #     os.makedirs(visuals_folder)

        # ## save the plot in the visuals folder
        # plt.savefig(os.path.join(visuals_folder, 'feature_importance_plot.png'))
        # logger.info('feature imp figure saving is successful')

        ## further Splitting the data.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42, shuffle = True) 
        logger.info('final splitting the data is successful')
        

        ## returning splitted data and data_path.
        return (
            X_train, 
            X_test, 
            y_train, 
            y_test,
            self.data_transformation_config.transformed_data_file_path
        )

Here is the code of the configuration manager, which contains the get_data_transformation_config() function:

class ConfigurationManager:
    def __init__(
        self,
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH,
        schema_filepath = SCHEMA_FILE_PATH):

        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        self.schema = read_yaml(schema_filepath)

        create_directories([self.config.artifacts_root])


    
    def get_data_transformation_config(self) -> DataTransformationConfig:
        config = self.config.data_transformation

        create_directories([config.root_dir])

        data_transformation_config = DataTransformationConfig(
            root_dir=config.root_dir,
            data_path=config.data_path,
        )

        return data_transformation_config

Here is my file in GitHub.

My file encoding is UTF-8.

Would you please help me to fix this issue?

Solution

The problem is a wrong attribute name. DataTransformation.__init__ stores the configuration as self.config, so the instance never has a data_transformation_config attribute; that name exists only as a local variable in the notebook cell.

Instead of

df.to_excel(self.data_transformation_config.transformed_data_file_path, index = False, header= True)

it should be:

df.to_excel(self.config.transformed_data_file_path, index = False, header= True)

Make the same change in the return statement, from:

return (
    X_train,
    X_test,
    y_train,
    y_test,
    self.data_transformation_config.transformed_data_file_path
)

to:

return (
    X_train,
    X_test,
    y_train,
    y_test,
    self.config.transformed_data_file_path
)
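To see why the attribute lookup fails, here is a minimal, self-contained sketch that reproduces the error outside the project. It is not the project's actual code: the DataTransformationConfig dataclass, its fields, the file paths, and the two helper methods are hypothetical stand-ins. Note in particular that get_data_transformation_config() as posted in the question passes only root_dir and data_path, so transformed_data_file_path is assumed here to be a field on the config; if it is not, the corrected line would presumably fail with a similar AttributeError until that field is added.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataTransformationConfig:
    # Hypothetical stand-in for the project's config entity.
    # transformed_data_file_path is an assumed field (see the note above).
    root_dir: Path
    data_path: Path
    transformed_data_file_path: Path


class DataTransformation:
    def __init__(self, config: DataTransformationConfig):
        # The only instance attribute created here is `config`.
        self.config = config

    def broken_output_path(self) -> Path:
        # Fails: no attribute named `data_transformation_config` was ever set on self.
        return self.data_transformation_config.transformed_data_file_path

    def fixed_output_path(self) -> Path:
        # Works: reads the path from the attribute that __init__ actually created.
        return self.config.transformed_data_file_path


# Illustrative paths only; they are not taken from the project.
config = DataTransformationConfig(
    root_dir=Path("artifacts/data_transformation"),
    data_path=Path("artifacts/data_ingestion/data.xlsx"),
    transformed_data_file_path=Path("artifacts/data_transformation/transformed.xlsx"),
)
step = DataTransformation(config=config)

# vars() lists the attributes that really exist on the instance.
print(list(vars(step)))

try:
    step.broken_output_path()
except AttributeError as err:
    print(err)

print(step.fixed_output_path())
```

When an AttributeError like this appears, checking vars(obj) (or the __init__ body) against the name in the traceback is usually the quickest way to spot the mismatch.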

Answered By - ewokx

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
