Issue
I have a csv file where one field should be normalized over two records:
+-----+---------+
| id | field |
+-----+---------+
| 1 | A-a,B-b |
| 2 | C-c |
+-----+---------+
so some records are comma separated with two tuples to become different records
+-----+---------+
| id | field |
+-----+---------+
| 1 | A-a |
| 1 | B-b |
| 2 | C-c |
+-----+---------+
and then split over two fields
+-----+---------+---------+
| id | field_1 | field_2 |
+-----+---------+---------+
| 1 | A | a |
| 1 | B | b |
| 2 | C | c |
+-----+---------+---------+
I have this solution for the last step
df[['field_1', 'field_2']] = df['field'].str.split('-', expand = True)
but I'm missing the first step. Can you help?
Solution
df.field = df.field.str.split(',')
df1 = df.explode('field')
df1[['field_1', 'field_2']] = df1.field.str.split('-', expand = True)
df1
id field field_1 field_2
0 1 A-a A a
0 1 B-b B b
1 2 C-c C c
Answered By - onyambu
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.