Issue
I want to load multiple CSV files and run the same set of tests with pytest for each of them individually. Can anyone help me how to do this?
Many thanks in advance!
I tried to use a for loop, but that didn't work as intended.
import pandas as pd
file_names = ['file1.csv', 'file2.csv']
class TestCSV:
def __init__(self, data: pd.DataFrame) -> None:
self.data = pd.DataFrame
def test_na():
for col in self.data.columns:
assert self.data[col].isna().sum() == 0
for file in file_names:
df = pd.read_csv(file)
tests = TestCSV(df)
Solution
I have a feeling that you want each set of data[col]
to be a separate test case. If that is true, here is one way to approach this test:
#!/usr/bin/env python3
import pandas as pd
import pytest
def generate_data(file_names):
for file_name in file_names:
data = pd.read_csv(file_name)
for col_number, col in enumerate(data.columns):
yield pytest.param(data[col], id=f"file: {file_name}, col: {col_number}")
@pytest.mark.parametrize(
"data_col",
generate_data(["file1.csv", "file2.csv"]),
)
def test_na(data_col):
assert data_col.isna().sum() == 0
Output:
$ pytest -v
...
test_na.py::test_na[file: file1.csv, col: 0] PASSED
test_na.py::test_na[file: file1.csv, col: 1] PASSED
test_na.py::test_na[file: file2.csv, col: 0] PASSED
test_na.py::test_na[file: file2.csv, col: 1] PASSED
Notes
- The
generate_data()
function will generate every columns in every files - The
pytest.param()
provide a test case ID, which is useful to identify which file, which column failed - I don't know how to generate the column number, so I use the
enumerate()
function for that purpose
Answered By - Hai Vu
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.