Saturday, November 27, 2021

[FIXED] Loading multiple csv files from a directory and storing the columns in an array


Issue

I have a directory that contains multiple CSV files named in a similar pattern, e.g. '1000 x 30.csv', '1000 y 30.csv', or '1111 z 60.csv'. Each file has two columns, x-axis and y-axis values, which I want to store in separate arrays. I want to enter an input like 1000 x 30 so that the program fetches the columns of the file 1000 x 30.csv and stores them in arrays. My current code works when I enter the path of one particular file; instead, I want it to loop through the directory and give me the array values when I enter the file name. Any suggestions would really help me.

import csv
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
import lmfit
import glob

# reading the x/y/z values from the respective csv files
xData = []
yData = []
path = r'C:\Users\angel\OneDrive\Documents\CSV_FILES_NV_LAB\1111 x 30.csv'
with open(path, "r") as f_in:
    reader = csv.reader(f_in)
    next(reader)

    for line in reader:
        try:
            float_1, float_2 = float(line[0]), float(line[1])
            xData.append(float_1)
            yData.append(float_2)
        except ValueError:
            continue

Solution

I think the solution below should get you started. I've commented the code where needed, and pointed to a couple of SO questions/answers.

Note, please provide some pruned and sanitized sample input files for your next question. I had to guess a bit as to what the exact input was. Remember, the better your question, the better your answer.

input files, generated by keyboard mashing

path/to/csv/files/1111 x 30.csv

x,y
156414.4189,84181.46
16989.177,61619.4698974

path/to/csv/files/11 z 205.csv

x,z
3.123123,56.1231
123.6546,645767.654
65465.4561989,97946.56169

Actual code:

main.py

import os
import csv


def get_files_from_path(path: str) -> list:
    """return list of files from path"""
    # see the answer on the link below for a ridiculously 
    # complete answer for this. I tend to use this one.
    # note that it also goes into subdirs of the path
    # https://stackoverflow.com/a/41447012/9267296
    result = []
    for subdir, dirs, files in os.walk(path):
        for filename in files:
            filepath = os.path.join(subdir, filename)
            # only return .csv files
            if filename.lower().endswith('.csv'):
                result.append(filepath)
    return result


def load_csv(filename: str) -> list:
    """load a CSV file and return it as a list of dict items"""
    result = []
    # note that if you open a file for reading, you don't need
    # to pass 'r'; newline='' is what the csv module docs recommend
    with open(filename, newline='') as infile:
        reader = csv.reader(infile)
        # get the column names
        # https://stackoverflow.com/a/28837325/9267296
        # doing this as you state that you're dealing with
        # x/y and x/z values
        column0, column1 = next(reader)

        for line in reader:
            try:
                result.append({column0: float(line[0]), 
                               column1: float(line[1])})
            except Exception as e:
                # I always print out error messages
                # in case of random weird things
                print(e)
                continue

    return result


def load_all(path: str) -> dict:
    """loads all CSV files into a dict"""
    result = {}
    csvfiles = get_files_from_path(path)
    for filename in csvfiles:
        # extract the filename without extension
        # and use it as key name
        # since we only load .csv files we can just
        # remove the last 4 characters from filename
        # https://stackoverflow.com/a/57770000/9267296
        keyname = os.path.basename(filename)[:-4]
        result[keyname] = load_csv(filename)
    return result


from pprint import pprint
# named all_data so the built-in all() is not shadowed
all_data = load_all('path/to/csv/files')
pprint(all_data)
print('\n--------------------\n')
pprint(all_data['11 z 205'])

output

{'11 z 205': [{'x': 3.123123, 'z': 56.1231},
              {'x': 123.6546, 'z': 645767.654},
              {'x': 65465.4561989, 'z': 97946.56169}],
 '1111 x 30': [{'x': 156414.4189, 'y': 84181.46},
               {'x': 16989.177, 'y': 61619.4698974}]}

--------------------

[{'x': 3.123123, 'z': 56.1231},
 {'x': 123.6546, 'z': 645767.654},
 {'x': 65465.4561989, 'z': 97946.56169}]
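
The question asked for the two columns in separate arrays, while the code above returns a list of dicts per file. A minimal sketch of one way to bridge that gap follows. Both function names (split_columns and load_columns) are hypothetical and not part of the original answer. split_columns assumes every row dict has the same two keys, as load_csv produces. load_columns assumes the entered name matches the file name without its .csv extension and that the file sits directly in the given directory.

```python
import csv
import os


def split_columns(rows: list) -> tuple:
    """Turn a list of {column: value} dicts into two parallel lists.

    Assumes every dict has the same two keys, in the same order,
    as produced by load_csv() in main.py.
    """
    if not rows:
        return [], []
    first_key, second_key = list(rows[0])
    first = [row[first_key] for row in rows]
    second = [row[second_key] for row in rows]
    return first, second


def load_columns(directory: str, name: str) -> tuple:
    """Hypothetical helper: read '<directory>/<name>.csv' directly
    and return its two columns as separate lists of floats."""
    filepath = os.path.join(directory, name + '.csv')
    first, second = [], []
    with open(filepath, newline='') as infile:
        reader = csv.reader(infile)
        next(reader)  # skip the header row
        for line in reader:
            try:
                a, b = float(line[0]), float(line[1])
            except (ValueError, IndexError):
                # skip rows that do not hold two numbers
                continue
            first.append(a)
            second.append(b)
    return first, second
```

With the data structure from main.py, something like split_columns(load_all('path/to/csv/files')['1111 x 30']) should give the x and y lists. If only one file is needed, load_columns('path/to/csv/files', input('file name: ')) avoids reading the whole directory, at the cost of not searching subdirectories.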

Answered By - Edo Akse

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
