Issue
I have data files in a directory, named as follows:
IU.WRT.00.MTR.1999.081.081015.txt
IU.WRT.00.MTS.2007.229.022240.txt
IU.WRT.00.MTR.2007.229.022240.txt
IU.WRT.00.MTT.1999.081.081015.txt
IU.WRT.00.MTS.1999.081.081015.txt
IU.WRT.00.MTT.2007.229.022240.txt
First, I want to group the files into sets of three that share the same pattern (differing only in R, S, T), like this:
IU.WRT.00.MTR.1999.081.081015.txt
IU.WRT.00.MTS.1999.081.081015.txt
IU.WRT.00.MTT.1999.081.081015.txt
and then apply some operations to that group.
After that, I want to read
IU.WRT.00.MTT.2007.229.022240.txt
IU.WRT.00.MTS.2007.229.022240.txt
IU.WRT.00.MTR.2007.229.022240.txt
and apply the same operations to it.
I want to continue the same process for millions of data sets.
I tried the following script:
import os
import glob
import matplotlib.pyplot as plt
from collections import defaultdict

def groupfiles(pattern):
    files = glob.glob(pattern)
    filedict = defaultdict(list)
    for file in files:
        parts = os.path.basename(file).split(".")
        # group key is year.day.time, e.g. '1999.081.081015'
        filedict[".".join(parts[4:7])].append(file)
    for filegroup in filedict.values():
        yield filegroup

for relatedfiles in groupfiles('*.txt'):
    print(relatedfiles)
    for filename in relatedfiles:
        print(filename)
However, the script handles the files one by one, but I need to read 3 files at a time in sequence: first the first three files, then the next three, and so on. I hope the experts here can help me. Thanks in advance.
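Judging from the script above, the grouping itself already produces lists of three related files; it is the inner "for filename in relatedfiles" loop that handles them one at a time, and the group order follows glob's arbitrary order. A minimal sketch of the same defaultdict idea that returns each group as a unit in sorted order, assuming every name has the same dot-separated layout as the examples (year, day and time in fields 4 to 6). The names group_by_timestamp and files, and the skip-incomplete-groups policy, are hypothetical.

```python
from collections import defaultdict

def group_by_timestamp(filenames):
    """Yield (key, files) pairs, one per year.day.time key, in sorted key order.

    Assumes names shaped like 'IU.WRT.00.MTR.1999.081.081015.txt', where the
    dot-separated fields at indices 4-6 (year, day, time) identify a group.
    """
    groups = defaultdict(list)
    for name in filenames:
        parts = name.split(".")
        groups[".".join(parts[4:7])].append(name)  # key e.g. '1999.081.081015'
    # Zero-padded, fixed-width fields make plain string sorting chronological.
    for key in sorted(groups):
        # For names sharing the same prefix, sorting puts MTR, MTS, MTT in order.
        yield key, sorted(groups[key])

files = [
    "IU.WRT.00.MTR.1999.081.081015.txt",
    "IU.WRT.00.MTS.2007.229.022240.txt",
    "IU.WRT.00.MTR.2007.229.022240.txt",
    "IU.WRT.00.MTT.1999.081.081015.txt",
    "IU.WRT.00.MTS.1999.081.081015.txt",
    "IU.WRT.00.MTT.2007.229.022240.txt",
]

for key, group in group_by_timestamp(files):
    if len(group) != 3:
        # Hypothetical policy: skip timestamps that lack one of the three files.
        continue
    r_file, s_file, t_file = group  # the three related files, handled together
    print(key, r_file, s_file, t_file)
```

In a real directory the list would come from glob or os.listdir instead of being hard-coded.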
Solution
- Sort your list of files on multiple keys.
import os
files = [f for f in os.listdir("C:/username/folder") if f.endswith(".txt")]
# sort by (year, day, time) first, then by the component field (MTR, MTS, MTT)
grouped = sorted(files, key=lambda x: (x.split(".")[4:7], x.split(".")[3]))
>>> grouped
['IU.WRT.00.MTR.1999.081.081015.txt',
'IU.WRT.00.MTS.1999.081.081015.txt',
'IU.WRT.00.MTT.1999.081.081015.txt',
'IU.WRT.00.MTR.2007.229.022240.txt',
'IU.WRT.00.MTS.2007.229.022240.txt',
'IU.WRT.00.MTT.2007.229.022240.txt']
- Iterate through the sorted list in threes using the grouper recipe from itertools.
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

for f in grouper(grouped, 3):  # f is a tuple of three file names
    # your file operations here
    pass
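The two steps can be combined into one generator. This is a sketch, assuming every file name has the same dot-separated layout as the examples; timestamp, sort_key, triplets and names are hypothetical names. Because grouper simply cuts the sorted list every three items, a single missing file would silently shift every later group, so the sketch checks that each chunk shares one timestamp and raises ValueError otherwise.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def timestamp(name):
    # Fields 4-6 of 'IU.WRT.00.MTR.1999.081.081015.txt' are year, day and time.
    return tuple(name.split(".")[4:7])

def sort_key(name):
    # Timestamp first, then the component field ('MTR', 'MTS', 'MTT').
    return (timestamp(name), name.split(".")[3])

def triplets(filenames):
    """Yield sorted file names three at a time, rejecting misaligned chunks."""
    ordered = sorted(filenames, key=sort_key)
    for chunk in grouper(ordered, 3):
        if None in chunk:
            raise ValueError(f"file count is not a multiple of 3: {chunk}")
        if len({timestamp(name) for name in chunk}) != 1:
            raise ValueError(f"chunk mixes timestamps (missing file?): {chunk}")
        yield chunk

names = [
    "IU.WRT.00.MTR.1999.081.081015.txt",
    "IU.WRT.00.MTS.2007.229.022240.txt",
    "IU.WRT.00.MTR.2007.229.022240.txt",
    "IU.WRT.00.MTT.1999.081.081015.txt",
    "IU.WRT.00.MTS.1999.081.081015.txt",
    "IU.WRT.00.MTT.2007.229.022240.txt",
]

for r_file, s_file, t_file in triplets(names):
    # your file operations on the three related files go here
    print(r_file, s_file, t_file)
```

If missing files are expected in a collection of millions, grouping by timestamp (for example with itertools.groupby on the same sorted list) avoids the alignment problem instead of raising an error.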
Answered By - not_speshal