Issue
I am looking at the ShanghaiTech A and B crowd-counting datasets, which can be found at https://github.com/desenzhou/ShanghaiTechDataset. I notice that each image is accompanied by a .mat file and a .hdf5 file.
.mat file
The .mat file contains the coordinates of each head as well as the ground-truth count. For example, for image 1 the coordinates are
[[ 29.6225116 472.92022152]
[ 54.35533603 454.96602305]
[ 51.79045053 460.46220626]
...
[597.89732076 688.27900015]
[965.77518336 638.44693908]
[166.9965574 628.1873971 ]]
and the ground-truth count is 1546.
.hdf5 file
On the other hand, the .hdf5 file contains three keys:
['attention', 'density', 'gt']
Using these keys, I extract the data like this:
import h5py
import numpy as np
f = h5py.File('A1.h5', 'r')
# extracting data
attention_data = f.get('attention')
print("attention shape:", attention_data.shape)
attention_data = np.array(attention_data)  # convert the HDF5 dataset to a NumPy array
print("sum of attention data:", attention_data.sum())
These turn out to be 768x1024 arrays, as illustrated below:
- Attention contains only the values 0 and 1 (its sum, 132021.0, is a whole number)
- Density contains decimal values ranging from 0 to about 0.05 (its sum, 1545.0001, is close to the ground truth)
attention shape: (768, 1024)
sum of attention data: 132021.0
density shape: (768, 1024)
sum of density data: 1545.0001
density_data * attention_data IS 1530.4147
gt is 1546
gt is of type <class 'numpy.ndarray'>
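In crowd counting, a density map is commonly produced by placing one unit of mass at every annotated head and blurring it with a Gaussian, so that the whole map sums to approximately the number of heads. That matches the sums printed above, where the density total of 1545.0001 is within one of the ground truth of 1546. The sketch below illustrates the idea under stated assumptions: `density_from_points` is a hypothetical helper, and the fixed `sigma=4.0` is an arbitrary choice; the script that actually generated these files may choose the kernel width differently (for example, per head).

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def density_from_points(points, shape, sigma=4.0):
    """Hypothetical helper: turn (x, y) head coordinates into a density map.

    Each head contributes one unit of mass, spread out by a Gaussian with a
    fixed standard deviation `sigma` (an illustrative assumption).
    """
    height, width = shape
    impulses = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        # Coordinates are (x, y) = (column, row); skip points outside the image.
        col, row = int(x), int(y)
        if 0 <= row < height and 0 <= col < width:
            impulses[row, col] += 1.0
    # mode="reflect" (the default) folds mass near the border back into the
    # image, so the total stays equal to the number of in-bounds heads.
    return gaussian_filter(impulses, sigma=sigma, mode="reflect")


# A few head positions taken from the .mat example above.
points = np.array([[29.6225116, 472.92022152],
                   [54.35533603, 454.96602305],
                   [965.77518336, 638.44693908]])
density = density_from_points(points, (768, 1024))
print(density.sum())  # close to 3.0, one unit per head
```

With `mode="constant"` instead, mass from heads near the edge would spill outside the map and be lost, which is one way a density sum can end up slightly below the annotated count.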
Questions:
- How should I interpret the attention and density values?
- Why does element-wise multiplication of the two not yield the ground truth?
- How can I label additional images to add to the dataset?
Posts I have consulted to help decipher the dataset:
- explain ground-thruth .mat file of an image for CNN
- https://github.com/desenzhou/ShanghaiTechDataset
Edit (regarding question 3): I believe I have found how the .hdf5 files were generated; I had mistakenly thought they were hand-labelled. https://www.kaggle.com/code/tthien/shanghaitech-a-train-density-gen/script
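For the third question, new images can be stored in the same layout once you have head coordinates for them (for example, clicked by hand in an annotation tool) and a density map built from those coordinates. The sketch below covers only the storage step with h5py. `write_sample` is a hypothetical helper, and deriving `attention` by thresholding the density map at `1e-3` is an assumption consistent with what this post observes (a 0/1 attention map, a density map that sums to roughly the count), not a rule taken from the linked script, which should be checked for the exact procedure.

```python
import os
import tempfile

import h5py
import numpy as np


def write_sample(path, density, head_count, attention_threshold=1e-3):
    """Hypothetical helper: save one image's targets under the keys
    'attention', 'density' and 'gt'.

    The attention mask is derived by thresholding the density map; both the
    rule and the default threshold are assumptions made for illustration.
    """
    density = np.asarray(density, dtype=np.float32)
    attention = (density > attention_threshold).astype(np.float32)
    with h5py.File(path, "w") as f:
        f.create_dataset("density", data=density)
        f.create_dataset("attention", data=attention)
        f.create_dataset("gt", data=head_count)  # scalar dataset


# Toy example: one head whose mass is spread over two pixels.
density = np.zeros((8, 8), dtype=np.float32)
density[2, 3] = 0.6
density[2, 4] = 0.4
path = os.path.join(tempfile.mkdtemp(), "new_image.h5")
write_sample(path, density, head_count=1)

with h5py.File(path, "r") as f:
    print(sorted(f.keys()))  # ['attention', 'density', 'gt']
    print(f["gt"][()])       # 1
```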
Solution
OK, I found out. To parse and understand the .mat file:
from scipy.io import loadmat
import cv2
# specify your file paths here
img_dir = "A1.jpg"
matfile_dir = 'GT_IMG_1.mat'
# open the base image to draw on later
input_image = cv2.imread(img_dir)
# load the .mat file using scipy
matContent = loadmat(matfile_dir)
# the annotation is nested inside 'image_info':
# ['image_info'][0][0][0][0][x], where
#   x = 0 gives the (x, y) coordinates of the heads
#   x = 1 gives the ground-truth crowd count
coordinates = matContent['image_info'][0][0][0][0][0]  # extracts coordinates of heads
print("coordinates are", coordinates)
ground_truth = matContent['image_info'][0][0][0][0][1]  # extracts the crowd count
print("ground truth is", ground_truth)
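The script above opens `input_image` to draw on but never does so. Drawing or checking the annotations requires turning each (x, y) pair into a pixel index. The first column has to be x, the image column: in the sample output above it reaches about 966, more than the 768-pixel height of the maps in the .hdf5 file. A minimal NumPy-only sketch follows; `head_pixels` is a hypothetical helper, and marking single pixels is only for illustration (`cv2.circle` would give more visible markers).

```python
import numpy as np


def head_pixels(coordinates, height, width):
    """Hypothetical helper: convert (x, y) head coordinates to pixel indices.

    Column 0 of `coordinates` is x (the image column) and column 1 is y
    (the image row). Points falling outside the image are dropped.
    """
    coords = np.asarray(coordinates, dtype=float)
    cols = np.floor(coords[:, 0]).astype(int)
    rows = np.floor(coords[:, 1]).astype(int)
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    return rows[inside], cols[inside]


# Stand-ins for input_image and coordinates from the script above.
image = np.zeros((768, 1024, 3), dtype=np.uint8)
coordinates = np.array([[29.6225116, 472.92022152],
                        [965.77518336, 638.44693908],
                        [1030.0, 10.0]])  # last point is deliberately out of bounds
rows, cols = head_pixels(coordinates, *image.shape[:2])
image[rows, cols] = (0, 0, 255)  # red, in OpenCV's BGR channel order
print(len(rows), "of", len(coordinates), "points are inside the image")
# 2 of 3 points are inside the image
```

Comparing `len(rows)` with the count stored at index 1 is a quick sanity check: if some annotated points fall outside the image, the in-bounds total will be lower than that count.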
To parse and understand the .hdf5 files:
import numpy as np
import h5py
f = h5py.File('A1.h5', 'r')
print(list(f.keys()))
for item in f:
    print(item)  # prints each key
# you will see the keys attention, density and gt
# extracting data
attention_data = f.get('attention')
print("attention shape:", attention_data.shape)
attention_data = np.array(attention_data)  # convert to a NumPy array
print("sum of attention data:", attention_data.sum())
#print(attention_data)
density_data = f.get('density')
print("density shape:", density_data.shape)
density_data = np.array(density_data)  # convert to a NumPy array
print("sum of density data:", density_data.sum())
#print(density_data)
density_times_attention = density_data * attention_data
total = density_times_attention.sum()
print("density_data * attention_data IS", total)
gt_data = f.get('gt')
gt_data = np.array(gt_data)  # convert to a NumPy array
print("gt is", gt_data)
print("gt is of type", type(gt_data))
f.close()
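On the second question: the numbers printed by this script suggest that `density` is the density map (its sum, 1545.0001, is within one of the ground truth of 1546) and `attention` is a binary 0/1 mask (its sum, 132021.0, is a whole number, i.e. a count of pixels). Multiplying a non-negative density map by a 0/1 mask does not weight anything; it only zeroes the density outside the mask, so the product (1530.4147) is a partial sum of the density and can never exceed the density total. The value to compare with `gt` is `density_data.sum()` on its own. The toy sketch below reproduces the effect, assuming the mask is obtained by thresholding the density map; that rule and the `1e-3` threshold are assumptions, not taken from the dataset's generation script.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy density map: two heads, each contributing one unit of mass.
density = np.zeros((64, 64))
density[20, 20] = 1.0
density[40, 45] = 1.0
density = gaussian_filter(density, sigma=3.0)

# Assumed rule (for illustration only): attention is 1 wherever the
# density exceeds a small threshold, and 0 elsewhere.
attention = (density > 1e-3).astype(density.dtype)

print("count from density:", density.sum())           # about 2.0
print("attention sum:", attention.sum())               # a whole number of pixels
print("masked density:", (density * attention).sum())  # below 2.0: the tails are cut off
```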
Answered By - fatbringer