Issue
I want to add new items to an existing CSV file. The Id in the file increases by 1 each time an item is added, as shown below.
Id | Name |
---|---|
0 | Alpha |
1 | Beta |
2 | Gamma |
3 | Delta |
I want to add the following list
items = ["Epsilon", "Beta", "Zeta"]
to the original CSV file and drop duplicates, so that the result looks like this:
Id | Name |
---|---|
0 | Alpha |
1 | Beta |
2 | Gamma |
3 | Delta |
4 | Epsilon |
5 | Zeta |
I tried it with pandas, but the Id column becomes NaN for some reason.
import pandas as pd
items = ["Epsilon", "Beta", "Zeta"]
df = pd.read_csv('original.csv', index_col='Id')
for i in range(len(items)):
    df = df.append({'Id': len(df), 'Name': items[i]}, ignore_index=True)
df = df.drop_duplicates(['Name'], ignore_index=True)
df
I would appreciate it if you could help me with this problem.
Solution
Try:
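import pandas as pd
items = ["Epsilon", "Beta", "Zeta"]
# Read "Id" as an ordinary column (no index_col), as the printed output implies.
df = pd.read_csv('original.csv')
# Append the new names, then keep only the first occurrence of each name.
df = pd.concat([df, pd.DataFrame({"Name": items})]).drop_duplicates(subset="Name")
# Renumber Id from 0 so the new rows get consecutive values.
df["Id"] = range(len(df))
print(df)
# df.to_csv('out.csv', index=False)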
Prints:
   Id     Name
0   0    Alpha
1   1     Beta
2   2    Gamma
3   3    Delta
0   4  Epsilon
2   5     Zeta
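The NaN values in the question most likely come from reading the file with index_col='Id' and then appending with ignore_index=True: the original Id index is discarded, and the 'Id' key in the appended row creates a new column that is empty for the existing rows. The sketch below reproduces that for a single row. It is an illustration under assumptions, not the asker's exact run: the CSV text is an in-memory stand-in for original.csv, and pd.concat stands in for DataFrame.append, which was removed in pandas 2.0.

```python
import io

import pandas as pd

# In-memory stand-in for original.csv (rows taken from the question).
csv_text = "Id,Name\n0,Alpha\n1,Beta\n2,Gamma\n3,Delta\n"

# As in the question: "Id" becomes the index, so "Name" is the only column.
df = pd.read_csv(io.StringIO(csv_text), index_col="Id")

# Stand-in for df.append(..., ignore_index=True) on one new row.
new_row = pd.DataFrame([{"Id": len(df), "Name": "Epsilon"}])
result = pd.concat([df, new_row], ignore_index=True)

# The four original rows have no value in the new "Id" column.
print(result)
print(result["Id"].isna().sum())
```

Reading the file without index_col, as in the answer, keeps Id as a regular column, and reassigning it afterwards overwrites any missing values.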
Answered By - Andrej Kesely
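For reuse, the same idea can be wrapped in a small function. This is a minimal sketch rather than part of the original answer: the name append_unique_names is hypothetical, the CSV is read from an in-memory string instead of original.csv, and renumbering from 0 assumes the existing Ids are already consecutive from 0, as in the question.

```python
import io

import pandas as pd


def append_unique_names(df, items):
    """Append names, keep the first occurrence of each, renumber Id from 0."""
    combined = pd.concat([df, pd.DataFrame({"Name": items})], ignore_index=True)
    combined = combined.drop_duplicates(subset="Name", ignore_index=True)
    combined["Id"] = range(len(combined))
    return combined[["Id", "Name"]]


# In-memory stand-in for original.csv (rows taken from the question).
csv_text = "Id,Name\n0,Alpha\n1,Beta\n2,Gamma\n3,Delta\n"
df = pd.read_csv(io.StringIO(csv_text))

out = append_unique_names(df, ["Epsilon", "Beta", "Zeta"])

# index=False keeps the row index out of the written file.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Passing ignore_index=True to both calls gives the result a clean 0..n-1 row index, which avoids the repeated index labels visible in the printed output above.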