Issue
Team, I have two files that share some duplicate lines. I want to print, or build a new list of, the unique ones. However, my list is printed empty and I am not sure why.
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        if lineA != lineB:
            print("lineA not equal to lineB", lineA, lineB)
        else:
            unique.append(lineB)
print(unique)
Output
lineA not equal to lineB node789
node321
lineA not equal to lineB node789
node12345
[]
Expected
lineA not equal to lineB node789
node321
lineA not equal to lineB node789
node12345
[node321,node12345]
Second approach: following the comments, the list now gets populated, but only with empty strings, and the actual strings are not recognized.
[~] $ cat ~/backup/2strings.log
restr1
restr2
[~] $ cat ~/backup/4strings.log
restr1
restr2
restr3
restr4
import os

file2 = os.environ.get('HOME') + '/backup/2strings.log'
file1 = os.environ.get('HOME') + '/backup/4strings.log'
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        # if lineA.rstrip() != lineB.rstrip():
        if lineA.strip() != lineB.strip():
            print("lineA not equal to lineB", lineA, lineB)
        else:
            print("found uniq")
            unique.append(lineB.rstrip())
print(unique)
print(len(unique))
Output
found uniq
lineA not equal to lineB restr1
restr2
lineA not equal to lineB restr1
['', '', '', '', '']
5
Solution
I recommend a different and simpler approach: use the set data structure.
Link - https://docs.python.org/3/tutorial/datastructures.html#sets
Pseudo code
unique = []
# read each file once and close it afterwards
with open(file1) as f1, open(file2) as f2:
    items01 = {line.strip() for line in f1}
    items02 = {line.strip() for line in f2}
# items in file1 that are not present in file2
print(list(items01 - items02))
unique += list(items01 - items02)
# items in file2 that are not present in file1
print(list(items02 - items01))
unique += list(items02 - items01)
# all items that appear in only one of the two files
print(unique)
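The two set differences can also be combined in a single step with the symmetric difference operator `^`. The following is only a minimal sketch: `unique_lines` is a hypothetical helper name, and `io.StringIO` objects stand in for the two log files from the question so the example runs without touching the disk. Blank lines are skipped, which is an assumption about what the asker wants.

```python
import io


def unique_lines(stream_a, stream_b):
    """Return the lines that appear in exactly one of the two streams, sorted."""
    # Strip whitespace and skip blank lines (assumption: blanks are not wanted)
    items_a = {line.strip() for line in stream_a if line.strip()}
    items_b = {line.strip() for line in stream_b if line.strip()}
    # ^ is the symmetric difference: items in either set but not in both
    return sorted(items_a ^ items_b)


# Stand-ins for 4strings.log and 2strings.log from the question
four = io.StringIO("restr1\nrestr2\nrestr3\nrestr4\n")
two = io.StringIO("restr1\nrestr2\n")

result = unique_lines(four, two)
print(result)  # ['restr3', 'restr4']
```

Sets do not keep the original line order, so the sketch sorts the result to make the output predictable.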
In your code, you are using file01 as the reference to check items in file02. You need to do the reverse of it too. The second problem is time complexity: the nested loops compare every line of one file with every line of the other. Python sets use hashing internally, which makes membership tests and set differences fast, so use sets.
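One more detail probably contributes to the empty list in the question: a file object can be read to the end only once, so calling `f2.readlines()` inside the outer loop returns lines on the first pass and an empty list on every later pass. The sketch below illustrates this with `io.StringIO` standing in for the opened files (an assumption made to keep the example self-contained), and then shows one way to read each file a single time up front while keeping the original line order.

```python
import io

# Stand-in for f2 = open(file2, 'r')
f2 = io.StringIO("restr1\nrestr2\n")
first_read = f2.readlines()
second_read = f2.readlines()  # the stream is already at end-of-file
print(first_read)   # ['restr1\n', 'restr2\n']
print(second_read)  # []

# Read each source once, then compare in memory
lines_a = [s.strip() for s in io.StringIO("restr1\nrestr2\nrestr3\nrestr4\n")]
lines_b = {s.strip() for s in io.StringIO("restr1\nrestr2\n")}

# A list comprehension keeps the order in which lines appear in the first file
only_in_a = [s for s in lines_a if s not in lines_b]
print(only_in_a)  # ['restr3', 'restr4']
```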
Answered By - sam