Issue
I need to parse a directory of nested XML files and combine the results into a single dataframe.
For a single file it works. Here is a sample XML file from the directory:
<annotation>
  <folder>VOC2007</folder>
  <filename>361_0_00020.jpg</filename>
  <size>
    <width>800</width>
    <height>800</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>361</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>338</xmin>
      <ymin>361</ymin>
      <xmax>430</xmax>
      <ymax>430</ymax>
    </bndbox>
  </object>
  <object>
    <name>361</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>24</xmin>
      <ymin>16</ymin>
      <xmax>240</xmax>
      <ymax>156</ymax>
    </bndbox>
  </object>
</annotation>
And here is the Python code that reads it into a dataframe:
import pandas as pd
import xml.etree.ElementTree as et
tree = et.parse("/content/drive/MyDrive/361_0_00020.xml")
root = tree.getroot()
filename = root.find('filename').text
obj = root.find('object')
bnb = obj.find('bndbox')
xmin = bnb.find('xmin').text
ymin = bnb.find('ymin').text
xmax = bnb.find('xmax').text
ymax = bnb.find('ymax').text
list_1 = [filename, xmin, ymin, xmax, ymax]
df_cols = ['filename', 'xmin', 'ymin', 'xmax', 'ymax']
df = pd.DataFrame([list_1], columns=df_cols)
df
And the result looks like this:
filename | xmin | ymin | xmax | ymax |
---|---|---|---|---|
361_0_00020.jpg | 381 | 316 | 443 | 348 |
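One thing worth noting about the single-file version: the sample file contains two <object> elements, but root.find('object') returns only the first match, so the result has a single row. The following is a minimal sketch of reading both with findall. It assumes a trimmed copy of the sample XML held in a string named sample_xml (a name made up for this illustration) rather than a file on disk.

```python
import xml.etree.ElementTree as et

# Trimmed copy of the sample annotation above (only the tags used here).
# sample_xml is an illustrative name, not part of the original code.
sample_xml = """
<annotation>
  <filename>361_0_00020.jpg</filename>
  <object>
    <name>361</name>
    <bndbox><xmin>338</xmin><ymin>361</ymin><xmax>430</xmax><ymax>430</ymax></bndbox>
  </object>
  <object>
    <name>361</name>
    <bndbox><xmin>24</xmin><ymin>16</ymin><xmax>240</xmax><ymax>156</ymax></bndbox>
  </object>
</annotation>
"""

root = et.fromstring(sample_xml.strip())
filename = root.find('filename').text

# find() stops at the first match; findall() returns every direct
# child element named 'object'.
rows = []
for obj in root.findall('object'):
    bnb = obj.find('bndbox')
    coords = [bnb.find(tag).text for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
    rows.append([filename] + coords)

print(len(rows))  # 2, one row per <object>
```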
Next I wrote a for-loop to iterate over the directory and used df.append to add each row to an initially empty dataframe at the end of each iteration:
import os
import pandas as pd
import xml.etree.ElementTree as et
df_cols = ['filename', 'xmin', 'ymin', 'xmax', 'ymax']
df = pd.DataFrame([], columns=df_cols)
path = '/content/drive/MyDrive/Annotations'
for filename in os.listdir(path):
    if not filename.endswith('.xml'): continue
    fullname = os.path.join(path, filename)
    tree = et.parse(fullname)
    root = tree.getroot()
    for child in root:
        fnm = root.find('filename').text
        obj = root.find('object')
        bnb = obj.find('bndbox')
        xmin = bnb.find('xmin').text
        ymin = bnb.find('ymin').text
        xmax = bnb.find('xmax').text
        ymax = bnb.find('ymax').text
        list_2 = [[fnm, xmin, ymin, xmax, ymax]]
        df.append(pd.DataFrame(list_2))
The loop runs through every file, but the dataframe is still empty. What am I missing?
Solution
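I hope this helps. df.append does not change df in place; it returns a new dataframe, and the loop in the question throws that return value away, so df stays empty. (append was also deprecated in pandas 1.4 and removed in 2.0.) I changed the code to use concat and assign the result back to df, and it seems to work: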
import os
import pandas as pd
import xml.etree.ElementTree as et
df_cols = ['filename', 'xmin', 'ymin', 'xmax', 'ymax']
df = pd.DataFrame([], columns=df_cols)
path = 'C:/Users/rober/CursoPython/'
for filename in os.listdir(path):
    if not filename.endswith('.xml'):
        continue
    fullname = os.path.join(path, filename)
    tree = et.parse(fullname)
    root = tree.getroot()
    # Image name stored inside the XML, not the name of the .xml file
    fnm = root.find('filename').text
    # findall returns every <object>; find would return only the first
    for obj in root.findall('object'):
        bnb = obj.find('bndbox')
        xmin = bnb.find('xmin').text
        ymin = bnb.find('ymin').text
        xmax = bnb.find('xmax').text
        ymax = bnb.find('ymax').text
        # Nested list plus column names, so this is one row with five columns
        list_2 = [[fnm, xmin, ymin, xmax, ymax]]
        df_temp = pd.DataFrame(list_2, columns=df_cols)
        # concat returns a new dataframe, so assign it back to df
        df = pd.concat([df, df_temp], ignore_index=True)
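As an alternative to concatenating inside the loop, the rows can be collected in a plain Python list and turned into a dataframe once at the end, which avoids copying the growing dataframe on every iteration. This is a minimal sketch, not part of the original answer: the helper name annotations_to_dataframe is made up here, it uses findall so that every <object> in a file gets its own row, and it assumes the coordinates are whole numbers so they can be converted with int.

```python
import os
import xml.etree.ElementTree as et

import pandas as pd

DF_COLS = ['filename', 'xmin', 'ymin', 'xmax', 'ymax']


def annotations_to_dataframe(directory):
    """Hypothetical helper: one row per <object> in every .xml file in directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.xml'):
            continue
        root = et.parse(os.path.join(directory, name)).getroot()
        fnm = root.find('filename').text
        # findall yields every <object>, so files with several boxes
        # contribute several rows.
        for obj in root.findall('object'):
            bnb = obj.find('bndbox')
            # Assumption: coordinates are integers, as in the sample file.
            coords = [int(bnb.find(tag).text) for tag in DF_COLS[1:]]
            rows.append([fnm] + coords)
    # Build the dataframe once, after the loop, instead of once per row.
    return pd.DataFrame(rows, columns=DF_COLS)
```

With the directory from the question, this would be called as df = annotations_to_dataframe('/content/drive/MyDrive/Annotations').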
Answered By - Loxley