Issue
I am trying to use a loop that will increment two variables so I can generate a heatmap plot that will reveal the similarity of the files in a simple form.
The idea is that if I have 100 files, I would like to compare each of them to every other one. Currently I repeat comparisons (i.e. I compare files 1 & 2 and then files 2 & 1), which is very inefficient. The stripped-down script I currently have is shown below:
for fileX in range(1,4):
    for fileY in range(1,4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))
The output I obtain is something like this:
X is 1, Y is 1
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 1
X is 2, Y is 2
X is 2, Y is 3
X is 3, Y is 1
X is 3, Y is 2
X is 3, Y is 3
Whereas what I am looking for is something like this:
X is 1, Y is 1 << not necessary since it is always 100 %
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 2 << not necessary since it is always 100 %
X is 2, Y is 3
X is 3, Y is 3 << not necessary since it is always 100 %
The reason is that I have already compared files 1 & 2, 1 & 3 and 2 & 3 in earlier iterations. For a short list of a couple of files the repetition is not too bad, but for a hundred files it roughly doubles the computation. Skipping the repeated pairs would speed up the comparison considerably, especially since the files I am comparing are usually pretty large (~500K lines each).
Solution
You can use the current value of the outer loop as the starting value of the inner loop's range:
for fileX in range(1,4):
    for fileY in range(fileX,4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))
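To also skip the pairs where both values are equal, start the inner range one higher:
for fileX in range(1,4):
    for fileY in range(fileX+1,4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))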
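As an alternative sketch (not taken from the answer itself), the same unique pairs can be produced with itertools.combinations from the standard library, and each result can be mirrored into a symmetric matrix for the heatmap. The names unique_pairs, similarity_matrix and dummy_compare are hypothetical, and the sketch assumes the real comparison is symmetric and that a file compared with itself scores 1.0 (100 %):

```python
from itertools import combinations


def unique_pairs(n):
    """Return every pair (x, y) with 1 <= x < y <= n, each exactly once."""
    return list(combinations(range(1, n + 1), 2))


def similarity_matrix(n, compare):
    """Build an n x n matrix, calling compare() once per unique pair.

    Assumes compare(x, y) == compare(y, x), so each score is mirrored
    across the diagonal instead of being recomputed.
    """
    # Start with 1.0 everywhere; the diagonal keeps it (file vs. itself),
    # every off-diagonal cell is overwritten below.
    matrix = [[1.0] * n for _ in range(n)]
    for x, y in unique_pairs(n):
        score = compare(x, y)
        matrix[x - 1][y - 1] = score
        matrix[y - 1][x - 1] = score  # mirrored entry, no second comparison
    return matrix


def dummy_compare(x, y):
    # Hypothetical stand-in for the real file comparison.
    return 1.0 / (1 + abs(x - y))


print(unique_pairs(3))          # [(1, 2), (1, 3), (2, 3)]
print(len(unique_pairs(100)))   # 4950
```

For 100 files this makes 4950 comparisons instead of the 10000 done by the full double loop, and the resulting matrix can be passed to whatever plotting routine draws the heatmap.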
Answered By - Niki van Stein