Issue
I am now training my LSTM neural network with my GPU.
The problem is:
My training set consists of 23,000 .csv files, each with shape (40, 76). Every time I load a batch of 64 samples, reading the 64 .csv files takes about 1 s, while computing the loss and updating the parameters takes only about 0.08 s. When I checked my GPU's power draw and utilization, it was running at low efficiency. How can I improve the organization of my training data?
Here is my own dataset class (shared as a screenshot in the original post).
Solution
Combine the CSV files into a single file, or load the data from the CSVs once and save it in some other form. That way, you only have to read one file instead of 23,000. Reading many small files is slow because each one requires system calls: your program has to ask the operating system to open, read, and close every file.
The easiest thing to do is to combine the CSVs and save the result as a new file, then load your data from that single file. I would bet most of your code's run time comes from opening and closing the 64 files per batch.
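A minimal sketch of this preprocessing, assuming the (40, 76) shape from the question; the file names and directory here are made up for illustration, with a few small random CSVs standing in for the 23,000 real ones:

```python
import glob
import os
import tempfile

import numpy as np

# Stand-ins for the real training files: a few CSVs, each of shape (40, 76).
tmpdir = tempfile.mkdtemp()
for i in range(5):
    np.savetxt(os.path.join(tmpdir, f"sample_{i}.csv"),
               np.random.rand(40, 76), delimiter=",")

# One-time preprocessing: read every CSV once and stack the results into a
# single array of shape (num_samples, 40, 76), then save it in binary form.
files = sorted(glob.glob(os.path.join(tmpdir, "*.csv")))
data = np.stack([np.loadtxt(f, delimiter=",") for f in files])
np.save(os.path.join(tmpdir, "train.npy"), data)

# At training time, load the whole training set with a single file read.
train = np.load(os.path.join(tmpdir, "train.npy"))
print(train.shape)  # (5, 40, 76)
```

Your dataset class's `__getitem__` then becomes a plain index into the in-memory array (`train[i]`) instead of a file read, so each batch of 64 costs no system calls at all. At 23,000 × 40 × 76 float64 values the full array is only a few hundred MB, so it fits comfortably in RAM.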
Answered By - ICW