Issue
I have written code using pandas and graphviz to generate a family tree from a CSV file. The data looks like this:
ID | S | First name | Last name | DoB | DoD | FatherID | MotherID | SpouseID | Place of birth | Job |
---|---|---|---|---|---|---|---|---|---|---|
JoS1 | M | John | S | 1111 | 2222 | | | MaS1 | India | Job-1 |
MaS1 | F | Mary | S | 1112 | | | | JoS1 | India | Job-2 |
JaS | M | Jacob | S | 1113 | | JoS1 | MaS1 | KeS | India | Job-3 |
JoS2 | M | Joe | S | 1114 | 2225 | JoS1 | MaS1 | AnS | India | Job-4 |
MaS2 | F | Macy | D | 1115 | | JoS1 | MaS1 | AnD | India | Job-5 |
KeS | F | Keysha | S | 1116 | | | | JaS | India | Job-6 |
AnD | M | Andy | D | 1117 | | | | MaS2 | India | Job-7 |
AnS | F | Anna | S | 1118 | | | | JoS2 | India | Job-8 |
MiS | M | Mike | S | 1119 | | JaS | KeS | | India | |
SaS | M | Sam | S | 1120 | | JaS | KeS | | India | |
MaS3 | F | Matt | S | 2345 | | JoS2 | AnS | | India | |
The code:
from graphviz import Digraph
import pandas as pd
import numpy as np

rawdf = pd.read_csv('/content/drive/MyDrive/ftdata.csv', keep_default_na=False)  # Change file path

# Build an edge list with one row per (child, parent) pair
el1 = rawdf[['ID', 'MotherID', 'SpouseID']].copy()
el2 = rawdf[['ID', 'FatherID', 'SpouseID']].copy()
el1.columns = ['Child', 'ParentID', 'SpouseID']
el2.columns = el1.columns
el = pd.concat([el1, el2])

# Treat empty strings as missing, then fill the gaps with placeholder IDs
el = el.replace('', np.nan)
t = pd.DataFrame({'tmp': ['no_entry' + str(i) for i in range(el.shape[0])]})
el['ParentID'] = el['ParentID'].fillna(t['tmp'])
el['SpouseID'] = el['SpouseID'].fillna(t['tmp'])

df = el.merge(rawdf, left_index=True, right_index=True, how='left')
# Join first and last name into one display name
df['name'] = df[['First name', 'Last name']].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
df = df[['ID', 'name', 'S', 'DoB', 'DoD', 'Place of birth', 'Job', 'ParentID']]

f = Digraph('neato', format='jpg', encoding='utf8', filename='testfile', node_attr={'style': 'filled'}, graph_attr={"concentrate": "true", "splines": "ortho"})
f.attr('node', shape='box')

# One node per person, colored by sex
for index, row in df.iterrows():
    f.node(row['ID'],
           label=str(row['name'])
           + '\n' + str(row['Job'])
           + '\n' + str(row['DoB'])
           + '\n' + str(row['Place of birth'])
           + '\n†' + str(row['DoD']),
           _attributes={'color': 'lightpink' if row['S'] == 'F' else 'lightblue' if row['S'] == 'M' else 'lightgray'})

# One edge from each parent (or placeholder) to the child
for index, row in df.iterrows():
    f.edge(str(row["ParentID"]), str(row["ID"]), label='')

f.view()
The result: [image of the generated family tree]
Now I want to place spouses next to each other using clusters or groups, but I can't find a way to do it. How can I achieve this?
Solution
I found the solution and was able to cluster the couples together using the code given below:
# Dictionary of cluster subgraphs, keyed by the shared SpouseID
spouse_clusters = {}

for index, row in df.iterrows():
    # node_id, node_label and node_color hold the ID, label text and
    # color that the original loop passed to f.node
    # Check if the node has a spouse
    if str(row['SpouseID']) != '':
        spouse_id = str(row['SpouseID'])
        # Check if the spouse cluster exists, if not create one
        if spouse_id not in spouse_clusters:
            spouse_clusters[spouse_id] = Digraph('cluster_' + spouse_id)
            spouse_clusters[spouse_id].attr(label='Couple', color='lightgreen', style='filled')
        # Add the node to the spouse cluster
        spouse_clusters[spouse_id].node(node_id, label=node_label, color=node_color)
    else:
        # Add nodes without spouses directly to the main Digraph
        f.node(node_id, label=node_label, color=node_color)

# Add nodes and clusters to the main Digraph
for cluster_id, cluster in spouse_clusters.items():
    f.subgraph(cluster)
This code was added to the for loop that creates the nodes, and a dictionary, 'spouse_clusters', was created before that loop. Each key of the dictionary is a SpouseID and each value is a subgraph holding the nodes that are spouses of each other. After the loop, every subgraph is added to the main digraph.
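To make the grouping concrete, here is a minimal sketch that builds the DOT text by hand instead of going through the graphviz package. It only illustrates the idea of collecting nodes under a shared key and wrapping each group in a `subgraph cluster_...` block. The function name `couples_to_dot` and the plain-dict input are assumptions for illustration, not part of the answer's code, and the sketch assumes the shared key contains only letters, digits, or underscores.

```python
from collections import defaultdict

def couples_to_dot(people):
    """Build DOT text that wraps each couple in a `subgraph cluster_*` block.

    `people` is a list of dicts with an 'ID' and a 'SpouseID' key, where both
    spouses of a couple carry the same SpouseID value (hypothetical input shape).
    """
    clusters = defaultdict(list)  # shared couple key -> node IDs
    singles = []
    for person in people:
        key = person.get('SpouseID', '')
        if key:
            clusters[key].append(person['ID'])
        else:
            singles.append(person['ID'])

    lines = ['digraph family {', '    node [shape=box style=filled]']
    for key, members in clusters.items():
        # Graphviz only draws a box around subgraphs whose name starts with "cluster"
        lines.append(f'    subgraph cluster_{key} {{')
        lines.append('        label="Couple" color=lightgreen style=filled')
        lines.extend(f'        "{member}"' for member in members)
        lines.append('    }')
    lines.extend(f'    "{single}"' for single in singles)
    lines.append('}')
    return '\n'.join(lines)

people = [
    {'ID': 'JoS1', 'SpouseID': 'JoMa'},
    {'ID': 'MaS1', 'SpouseID': 'JoMa'},
    {'ID': 'MiS', 'SpouseID': ''},
]
dot_text = couples_to_dot(people)
print(dot_text)
```

The printed text can be pasted into any DOT renderer to check that JoS1 and MaS1 end up inside one box while MiS stays outside.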
Note: This requires changing the values in the SpouseID field of the data so that both spouses share the same SpouseID. I achieved this by taking the first two letters of each spouse's first name and joining them to form the SpouseID, e.g. John S - Mary S --> JoMa.
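The shared key could also be derived in code rather than edited by hand. The following is a rough sketch under a few assumptions: rows are plain dicts (for example from `csv.DictReader` or `df.to_dict('records')`), the result is stored in a hypothetical `CoupleID` field, and the two IDs are sorted so that both spouses produce the same key. Two couples with the same leading letters would collide, so real data may need a longer key.

```python
def add_couple_keys(people):
    """Return copies of the rows with a 'CoupleID' shared by both spouses.

    The key joins the first two letters of each spouse's first name, ordered
    by sorted ID so that both rows of a couple produce the same value.
    'CoupleID' is a hypothetical field name used for illustration.
    """
    first_names = {person['ID']: person['First name'] for person in people}
    keyed = []
    for person in people:
        spouse = person.get('SpouseID', '')
        if spouse and spouse in first_names:
            a, b = sorted([person['ID'], spouse])
            key = first_names[a][:2] + first_names[b][:2]
        else:
            key = ''  # no spouse recorded
        keyed.append({**person, 'CoupleID': key})
    return keyed

people = [
    {'ID': 'JoS1', 'First name': 'John', 'SpouseID': 'MaS1'},
    {'ID': 'MaS1', 'First name': 'Mary', 'SpouseID': 'JoS1'},
    {'ID': 'MiS', 'First name': 'Mike', 'SpouseID': ''},
]
couple_ids = [row['CoupleID'] for row in add_couple_keys(people)]
print(couple_ids)  # ['JoMa', 'JoMa', '']
```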
Answered By - Kshitij Khandelwal