Issue
I have an object detection dataset consisting of images (.jpg) and corresponding .xml files that contain the bounding boxes.
I want to write a script that randomly splits the dataset into a train set and a test set. This means I have to make sure that each .jpg ends up in the same directory as its corresponding .xml file.
How should I edit the following code to do this?
Also, is this the "best" way to do it, or is it better to split the dataset after the XML-to-CSV conversion, or after the CSV-to-TFRecord conversion?
import shutil, os, glob, random
# List all files in a directory using os.listdir
basepath = '/home/bis/hans/bis/workspace/images/Synced_dataset'
filenames = []
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        #print(entry)
        filenames.append(entry)
filenames.sort() # make sure that the filenames have a fixed order before shuffling
random.seed(230)
random.shuffle(filenames) # shuffles the ordering of filenames (deterministic given the chosen seed)
split = int(0.8 * len(filenames))
train_filenames = filenames[:split]
test_filenames = filenames[split:]
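One way to adapt the code above is to shuffle only the image names and then derive each annotation name from its image name, so that a pair can never end up in different sets. This is an alternative to the index-based approach in the answer. It is a minimal sketch that assumes every "name.jpg" has a matching "name.xml" in the same directory. The function name split_pairs and its defaults are illustrative.

```python
import os
import random


def split_pairs(basepath, train_ratio=0.8, seed=230):
    """Split image/annotation pairs by shuffling only the .jpg names.

    Hypothetical helper: assumes each 'name.jpg' has a matching 'name.xml'
    in the same directory.
    """
    # Keep only the images; the XML names are derived from them below.
    images = sorted(
        entry for entry in os.listdir(basepath)
        if entry.lower().endswith(".jpg")
        and os.path.isfile(os.path.join(basepath, entry))
    )
    rng = random.Random(seed)  # local generator, deterministic given the seed
    rng.shuffle(images)
    split = int(train_ratio * len(images))

    def with_xml(names):
        # Pair each image with the annotation that shares its base name.
        return [(name, os.path.splitext(name)[0] + ".xml") for name in names]

    return with_xml(images[:split]), with_xml(images[split:])
```

Each returned element is a (jpg, xml) tuple, so both files of a pair always travel together.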
Solution
The best option, in my opinion, is to create two lists of files, filenames for the .jpg files and xmlnames for the .xml files, in matching order (so that filenames[i] and xmlnames[i] belong to the same image), and one list of indices, indices=[i for i in range(len(filenames))].
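Building the two lists "in matching order" can be sketched by grouping the files by base name and sorting those names, assuming each image and its annotation share a base name and use lowercase .jpg/.xml extensions. The helper name paired_lists is hypothetical; it raises an error for images without an annotation instead of silently misaligning the lists.

```python
import os


def paired_lists(basepath):
    """Return (filenames, xmlnames) so that index i refers to the same image.

    Hypothetical helper: assumes 'name.jpg' and 'name.xml' share a base name
    and that extensions are lowercase.
    """
    exts_by_stem = {}
    for entry in os.listdir(basepath):
        stem, ext = os.path.splitext(entry)
        exts_by_stem.setdefault(stem, set()).add(ext)

    # Refuse to continue if any image has no annotation.
    missing = sorted(s for s, e in exts_by_stem.items() if ".jpg" in e and ".xml" not in e)
    if missing:
        raise ValueError(f"images without annotations: {missing}")

    # Sorting the base names gives both lists the same, fixed order.
    paired = sorted(s for s, e in exts_by_stem.items() if {".jpg", ".xml"} <= e)
    filenames = [stem + ".jpg" for stem in paired]
    xmlnames = [stem + ".xml" for stem in paired]
    return filenames, xmlnames
```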
Then shuffle the list of indices:
random.seed(230)
random.shuffle(indices)
Finally, create the train and test sets for both the .jpg and the .xml files:
split = int(0.8 * len(filenames))
file_train = [filenames[idx] for idx in indices[:split]]
file_test = [filenames[idx] for idx in indices[split:]]
xml_train = [xmlnames[idx] for idx in indices[:split]]
xml_test = [xmlnames[idx] for idx in indices[split:]]
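To actually place each image and its annotation in the same directory, the split lists can be copied with shutil. This is a minimal sketch: the helper name copy_split and the train/test directory layout are assumptions, and shutil.move could be used instead of shutil.copy2 if the files should be moved rather than copied.

```python
import os
import shutil


def copy_split(basepath, destdir, file_list, xml_list):
    """Copy each listed image and annotation into one destination directory.

    Hypothetical helper; 'destdir' (e.g. a 'train' or 'test' folder) is
    created if it does not already exist.
    """
    os.makedirs(destdir, exist_ok=True)
    for name in list(file_list) + list(xml_list):
        # copy2 also preserves file metadata such as modification time.
        shutil.copy2(os.path.join(basepath, name), os.path.join(destdir, name))
```

For example, copy_split(basepath, os.path.join(basepath, 'train'), file_train, xml_train) followed by the same call with 'test', file_test and xml_test. Because file_train and xml_train were built from the same indices, each .jpg lands next to its .xml.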
Answered By - Joseph Budin