Sunday, October 9, 2022

[FIXED] Python and scope

October 09, 2022 python, python-3.x No comments

Issue

So I've got this program I've been working on and I'm stuck with the scope of it. Basically, I'm taking csv's which are coming in from the field and scrubbing them of bad rows. One of the checks I need to write into the program is a date calculation. I have a function for this, but as my program is now, I have to call this function after I've already written the data to a new file. I am having a hard time wrapping my head around the scope of this problem. Here is my program:

import csv
import glob
import os
import stat
import shutil
from os import path
from datetime import datetime, timedelta


# path to files -- parent folder is source.
source = r'C:\Users\klucas\Desktop\Current Project\Data Cleanup Script\Raw Data Files'
destination = r'C:\Users\klucas\Desktop\Current Project\Data Cleanup Script\Completed Data Files'

data = []

# list of all csv files in path
TD_files = glob.glob(os.path.join(source, "*.csv"), recursive=True)


# A function to test whether a file is read only. If it is this function will change the file's attribute to writeable.
def IsFileReadOnly(f):
    file_att = os.stat(f)[0]

    if not file_att & stat.S_IWRITE:  # if the file is read only
        os.chmod(f, stat.S_IWRITE)  # it needs to be made writeable


# A function to decide if a file is a Trend file or not. If it is not a Trend File, it is simply erased.
def IsTrendFile(f):
    with open(f, newline='', encoding='utf-8') as g:  # Open file as read
        r = csv.reader(g)  # Declare read variable for list
        is_trend = next(r)[0] == 'TD'  # Initialize isTrend variable to cell A1
    if not is_trend:  # If file is not a trend file
        os.remove(f)  # Erase the file


# A function which moves files from source to destination. This function will overwrite any existing file with new data.
def MoveFiles():
    directory_exists = os.path.exists(destination)

    if not directory_exists:  # If the directory does not exist
        os.makedirs(destination)  # Make the directory at the destination
    files = os.listdir(source)  # List all csv files in the source directory
    for file in files:  # FOR EACH TREND FILE IN ALL TREND FILES:
        shutil.move(os.path.join(source, file),
                    os.path.join(destination, file))  # Move files from source to destination


def CheckDates(f):
    with open(f, 'r', newline='') as src:
        row_0 = src.readline()
        tokens = row_0.strip().split(',')
        orig_time = tokens[1] + ' ' + tokens[2]
        base_time = datetime.strptime(orig_time, '%m/%d/%Y %H:%M:%S')

        src.readline()

        for line in src:
            tokens = line.strip().split(',')
            row_time = datetime.strptime(tokens[0], '%m/%d/%Y %H:%M:%S')

            td = row_time - base_time

            if td < timedelta(0):
                pass


# A function which cleans the data of null and truncated rows.
def CleanUpData(f):
    IsTrendFile(f)  # check to see if file is a Trend File
    IsFileReadOnly(f)  # check to see if file is read only
    with open(f, newline='', encoding='utf-8') as g:  # open file as read
        r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while stripping nulls
        trend_header_tuple = next(r)  # get trend header
        machine_header_tuple = next(r)  # get machine header
        data = [line for line in r
                if len(line) == len(trend_header_tuple)
                or len(line) == len(machine_header_tuple)]
    WriteData(f, data, trend_header_tuple, machine_header_tuple)  # write the data to the file
    CheckDates(f)


# A function which writes data to .csv files.
def WriteData(f, data, trend_header_tuple, machine_header_tuple):
    with open(f, 'w', newline='') as g:  # open file as write
        w = csv.writer(g)  # declare write variable
        w.writerow(trend_header_tuple)  # write the trend header tuple to file
        w.writerow(machine_header_tuple)  # write the machine header tuple to file
        w.writerows(data)  # write rows to file


# A function which does all the work.
def DoWork():
    for f in TD_files:  # FOR ALL TREND FILES
        CleanUpData(f)  # Clean up the data
    MoveFiles()  # Move the files

How would I structure this program so that CheckDates() is part of the main with open() loop in the CleanUpData() function, or something which is equal to this? When I put CheckDates() inside of the data variable as part of the if... or statements, only a certain amount of data was written to the files. And moving the call around the program proved to be useless as well.

Here is my main:

import DataCleanupScript
import getpass


# A function which prints a welcome statement to the user.
def welcome_screen():
    print(f'Hello', get_name(), 'please wait while the script cleans the data. This may take a few moments depending'
                                ' on network speed and the amount of files.')


# A function which gets the users name.
def get_name():
    return getpass.getuser()


# Main
if __name__ != '__main__':
    pass
else:
    welcome_screen()
    DataCleanupScript.DoWork()
    print('Complete! Press Enter to continue.')
    input()

This is an example csv file with messed up data. you can see row 11 is messed up. This is supposed to be like this. Some rows come in from the field like this for some reason.

    TD,08/24/2021,14:14:08,21012,223,0,1098,0,031,810,12,01,092,048,0008,02
        Date/Time,G120010,M129000,G110100,M119030,G112070,G112080,G111030,G127020,G127030,G120020,G120030,G121020,G111040,G112010,P102000,G112020,G112040,G112090,G110050,G110060,G110070,T111100
        08/27/2021 00:00:00,75,249.75,0,0,12.61895,0,58.04886,64,87.6,1,2,5,41.5,5,686,2,239,2700,0,154,0,5
        08/27/2021 00:00:02,75,249.75,0,0,12.61895,0,58.04743,64,87.6,1,2,5,41.5,5,686,2,239,2700,0,154,0,5
        08/27/2021 00:00:04,75,249.75,0,0,12.61895,0,58.05036,64,87.6,1,2,5,41.5,5,686,2,239,2700,0,154,0,5
        08/31/2021 08:05:48,100,333,0,0,12.9439,0,0,0,0,0,0,5,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:05:50,100,333,0,0,12.9439,0,0,0,0,0,0,5,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:05:52,100,333,0,0,12.9439,0,0,0,0,0,0,5,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:05:54,100,333,0,0,12.9439,0,0,0,0,0,0,5,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:05:56,100,333,0,0,12.9439,0,0,0,0,0,0,5,42.5,5,0,0,233,0,1,154,0,5
                                                                                          08/31/2021 08:21:41,100,333,0,0,12.9439,0,0,0,0,0,0,1,0,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:43,100,333,0,0,12.9439,0,0,0,0,0,0,1,0,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:45,100,333,0,0,12.9439,0,0,0,0,0,0,1,0,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:47,100,333,0,0,12.9439,0,0,0,0,0,0,1,0,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:49,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:51,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:53,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:55,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:57,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:21:59,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:01,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:03,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:05,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:07,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:09,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:11,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:13,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
        08/31/2021 08:22:15,100,333,0,0,12.9439,0,0,0,0,0,0,1,42.5,5,0,0,233,0,1,154,0,5
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN

EDIT:

I have incorporated suggestions from the community and moved some of the hardcoded things to main(), and restructured my data variable to be more dynamic. Here is the new main().

import DataCleanupScript
import getpass
import glob
import os

# path to files -- parent folder is source.
source = r'C:\Users\klucas\Desktop\Current Project\Data Cleanup Script\Raw Data Files'
destination = r'C:\Users\klucas\Desktop\Current Project\Data Cleanup Script\Completed Data Files'

# list of all csv files in path
TD_files = glob.glob(os.path.join(source, "*.csv"), recursive=True)


# A function which prints a welcome statement to the user.
def welcome_screen():
    print(f'Hello', get_name(), 'please wait while the script cleans the data. This may take a few moments depending'
                                ' on network speed and the amount of files.')


# A function which gets the users name.
def get_name():
    return getpass.getuser()


# Main
if __name__ != '__main__':
    pass
else:
    welcome_screen()
    DataCleanupScript.DoWork(TD_files, source, destination)
    print('Complete! Press Enter to continue.')
    input()

And the functions: sorry I haven't had time to comment any of the new code, but the new CheckDate() loops through each row of a csv, and compares the date in that row to the date in the trend_header_tuple. If the date is before the date in the tuple, it needs to strip the row. Right now I have solved the problem of the data variable running off the end of the array when the csv ended in a row of bunk information like in the example I have provided. It is not dropping the found indices though.

import csv
from datetime import datetime, timedelta
import os
import stat
import shutil


# A function to test whether a file is read only. If it is this function will change the file's attribute to writeable.
def IsFileReadOnly(f):
    file_att = os.stat(f)[0]

    if not file_att & stat.S_IWRITE:  # If the file is read only
        os.chmod(f, stat.S_IWRITE)  # It needs to be made writeable


# A function to decide if a file is a Trend file or not. If it is not a Trend File, it is simply erased.
def IsTrendFile(f):
    with open(f, newline='', encoding='utf-8') as g:  # Open file as read
        r = csv.reader(g)  # Declare read variable for list
        is_trend = next(r)[0] == 'TD'  # Initialize is_trend variable
    if not is_trend:  # If file is not a trend file
        os.remove(f)  # Erase the file


# A function which moves files from source to destination. This function will overwrite any existing file with new data.
def MoveFiles(source, destination):
    directory_exists = os.path.exists(destination)

    if not directory_exists:  # If the directory does not exist
        os.makedirs(destination)  # Make the directory at the destination

    files = os.listdir(source)  # List all csv files in the source directory

    for file in files:  # FOR EACH TREND FILE IN ALL TREND FILES:
        shutil.move(os.path.join(source, file), os.path.join(destination, file))  # Move files from source to destination


def CheckDates(f, line: str, trend_header_tuple):
    row_time = datetime.strptime(line[0], '%m/%d/%Y %H:%M:%S')
    orig_time = trend_header_tuple[1] + ' ' + trend_header_tuple[2]
    base_time = datetime.strptime(orig_time, '%m/%d/%Y %H:%M:%S')
    # print(f'recovered this base time: {base_time}')

    td = row_time - base_time

    line = line[0].strip()

    if td < timedelta(0):
        print('this line is before the base time:')
        print(f'    {line}')
        print(f'    {f}')
        return False
    return True


# A function which cleans the data of null and truncated rows.
def CleanUpData(f, source, destination):
    IsTrendFile(f)  # check to see if file is a Trend File
    IsFileReadOnly(f)  # check to see if file is read only
    with open(f, newline='', encoding='utf-8') as g:  # open file as read
        r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while stripping nulls
        trend_header_tuple = next(r)  # get trend header
        machine_header_tuple = next(r)  # get machine header
        data = []

        for line in r:
            if line:
                valid = False

                if len(line) == len(trend_header_tuple):
                    valid = True
                elif len(line) == len(machine_header_tuple):
                    valid = True

                if CheckDates(f, line, trend_header_tuple):
                    valid = True

                if valid:
                    data.append(line)

    WriteData(f, data, trend_header_tuple, machine_header_tuple)  # write the data to the file
    # MoveFiles(source, destination)

# A function which writes data to .csv files.
def WriteData(f, data, trend_header_tuple, machine_header_tuple):
    with open(f, 'w', newline='') as g:  # open file as write
        w = csv.writer(g)  # declare write variable
        w.writerow(trend_header_tuple)  # write the trend header tuple to file
        w.writerow(machine_header_tuple)  # write the machine header tuple to file
        w.writerows(data)  # write rows to file


# A function which does all the work.
def DoWork(TD_files, source, destination):
    for f in TD_files:  # FOR ALL TREND FILES
        CleanUpData(f, source, destination)  # Clean up the data
    MoveFiles(source, destination)  # Move the files

Terminal output:

C:\Users\klucas\PycharmProjects\DataCleanupScript\venv\Scripts\python.exe C:\Users\klucas\PycharmProjects\DataCleanupScript\main.py 
Hello klucas please wait while the script cleans the data. This may take a few moments depending on network speed and the amount of files.
this line is before the base time:
    08/31/1521 00:00:00
    C:\Users\klucas\Desktop\Current Project\Data Cleanup Script\Raw Data Files\2022_09_06_08_00_10_554_IF1Rockwell-Trend_SN2231098_20210831.csv

Solution

I don't think I'm 100% clear on the problem, but I'll try my best to help. For this section of your code:

with open(f, newline='', encoding='utf-8') as g:  # open file as read
    r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while stripping nulls
    trend_header_tuple = next(r)  # get trend header
    machine_header_tuple = next(r)  # get machine header
    data = [line for line in r
            if len(line) == len(trend_header_tuple)
            or len(line) == len(machine_header_tuple)]
WriteData(f, data, trend_header_tuple, machine_header_tuple)  # write the data to the file
CheckDates(f)

I would suggest that you instead declare data as an empty list to begin, then iterate over r and append line to data if it meets your criteria. This way, instead of using a list comprehension, you can specify individual conditions and add more advanced logic easily:

def CheckDate(line: str):
    line = line.strip() # process line however you need
    if LINE_VALIDATION_HERE:
        return True
    return False

with open(f, newline='', encoding='utf-8') as g:  # open file as read
    r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while stripping nulls
    trend_header_tuple = next(r)  # get trend header
    machine_header_tuple = next(r)  # get machine header
    data = []
    for line in r:
        valid = False
        
        # Trend headers
        if len(line) == len(trend_header_tuple):
            valid = True
        # Machine headers
        elif len(line) == len(machine_header_tuple):
            valid = True
        # Dates (only if valid)
        elif CheckDate(line):
            valid = True

        if valid:
            data.append(line)
WriteData(f, data, trend_header_tuple, machine_header_tuple)  # write the data to the file

This restructuring might be a bit overboard, but hopefully it makes everything easier to understand and extend in the future.

Again, I'm not exactly sure what CheckDates does, so this code assumes each line can be checked independent of the others. If this is not true, I'd suggest adding additional parameters to CheckDate.

Answered By - Raj K

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, October 9, 2022

[FIXED] Python and scope

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels