Issue
Below is a small example of what I am trying to do in Python. I am working with a network of 15,000 distinct nodes. The data comes from a pandas DataFrame:
Node Target Node_Attrib
mom dad 0.2
mom grandmother 0.12
mom grandfather 0.24
mom Lucy 0.2
dad mom 0.4
dad Lucy 0.3
Lucy mom 0.1
Lucy dad 0.3
Lucy Mark 0.1
Lucy grandmother 0.2
Lucy grandfather 0.1
The network is created as follows:
G = nx.from_pandas_edgelist(df, 'Node', 'Target', ['Node_Attrib'])
where nx is networkx. Since I would like to perform some analysis, I need an adjacency matrix. I am thinking of using crosstab for that:
adj = pd.crosstab(df.Node, df.Target)
idx=adj.columns.union(adj.index)
adj=adj.reindex(index=idx,columns=idx,fill_value=0)
I am wondering whether this is the best way to get the adjacency matrix in Python, given the number of nodes in the network. Do you know a different approach that copes better with thousands of nodes (and edges)?
Solution
First of all, nx.from_pandas_edgelist() will create an undirected graph by default. That means it first sets the value of the edge (mom, Lucy) to 0.2, as it's the first time this edge is encountered in your table. But when you parse (Lucy, mom), the same edge will be updated to the new value.
>>> G.get_edge_data('mom', 'Lucy')
{'Node_Attrib': 0.1}
For a directed graph, change the line to
G = nx.from_pandas_edgelist(df, 'Node', 'Target', ['Node_Attrib'], create_using=nx.DiGraph())
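To see the difference side by side, here is a minimal sketch that rebuilds the table from the question and creates both graph types. The names G_undirected and G_directed are only illustrative and are not part of the original code.

```python
import networkx as nx
import pandas as pd

# The edge table from the question.
df = pd.DataFrame(
    [
        ("mom", "dad", 0.2),
        ("mom", "grandmother", 0.12),
        ("mom", "grandfather", 0.24),
        ("mom", "Lucy", 0.2),
        ("dad", "mom", 0.4),
        ("dad", "Lucy", 0.3),
        ("Lucy", "mom", 0.1),
        ("Lucy", "dad", 0.3),
        ("Lucy", "Mark", 0.1),
        ("Lucy", "grandmother", 0.2),
        ("Lucy", "grandfather", 0.1),
    ],
    columns=["Node", "Target", "Node_Attrib"],
)

# Undirected (the default): (mom, Lucy) and (Lucy, mom) are the same edge,
# so the later row overwrites the value set by the earlier one.
G_undirected = nx.from_pandas_edgelist(df, "Node", "Target", ["Node_Attrib"])

# Directed: every row becomes its own edge and keeps its own value.
G_directed = nx.from_pandas_edgelist(
    df, "Node", "Target", ["Node_Attrib"], create_using=nx.DiGraph()
)

print(G_undirected.number_of_edges(), G_directed.number_of_edges())
print(G_undirected.get_edge_data("mom", "Lucy"))
print(G_directed.get_edge_data("mom", "Lucy"))
print(G_directed.get_edge_data("Lucy", "mom"))
```

If the number of edges in the undirected graph is smaller than the number of rows in your table, some rows have been merged this way.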
NetworkX has the function nx.adjacency_matrix(), which creates a scipy sparse matrix. This saves memory when most pairs of nodes have no edge between them.
>>> adj = nx.adjacency_matrix(G, weight='Node_Attrib')
>>> adj[0,1] # (mom, dad) edge as the node ordering is taken from `G.nodes`
0.2
>>> array = adj.todense() # if for some reason you need the whole matrix
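Relying on the implicit order of `G.nodes` is easy to get wrong, so here is a sketch that fixes the ordering explicitly through the nodelist argument. It assumes a directed graph; the sorted ordering and the `index` lookup dict are my own illustrative choices, and the graph is built by hand from a few of the question's rows to keep the example short.

```python
import networkx as nx

# A few rows from the question's table, as (source, target, value) triples.
edges = [
    ("mom", "dad", 0.2),
    ("mom", "Lucy", 0.2),
    ("dad", "mom", 0.4),
    ("dad", "Lucy", 0.3),
    ("Lucy", "mom", 0.1),
    ("Lucy", "Mark", 0.1),
]

G = nx.DiGraph()
G.add_weighted_edges_from(edges, weight="Node_Attrib")

# Assumption: any fixed ordering works; sorted() is just a reproducible one.
nodes = sorted(G.nodes)
index = {name: i for i, name in enumerate(nodes)}  # node name -> row/column

# Sparse adjacency matrix with rows and columns in the order of `nodes`.
adj = nx.adjacency_matrix(G, nodelist=nodes, weight="Node_Attrib")

print(adj.shape, adj.nnz)  # matrix size and number of stored entries
print(adj[index["mom"], index["dad"]])  # value of the (mom, dad) edge

# Only convert to a dense array if you really need every cell in memory.
dense = adj.toarray()
```

Keep in mind that a dense float64 matrix for 15,000 nodes holds 225 million entries, roughly 1.8 GB, whereas the sparse version only stores the edges that exist.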
As the documentation of that function states, you can also create a pure Python equivalent of a sparse matrix with a dict-of-dicts. But if you want to perform some analysis, I suspect the array option from above will be more suitable for you.
>>> adj = nx.convert.to_dict_of_dicts(G)
>>> adj['mom']['Lucy']['Node_Attrib']
0.2
This would require a bit of clean-up so that adj[node1][node2] gives you the edge value directly. You'd also need to access it with adj.get(node1, {}).get(node2, 0.) to avoid running into a KeyError.
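One possible way to do that clean-up is sketched below. The helper edge_value and the variable names nested and flat are hypothetical and not part of networkx, and the graph is built by hand from three of the question's rows.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from(
    [("mom", "Lucy", 0.2), ("Lucy", "mom", 0.1), ("dad", "Lucy", 0.3)],
    weight="Node_Attrib",
)

# nested[u][v] is the attribute dict of the edge (u, v).
nested = nx.to_dict_of_dicts(G)

# Flatten it so that flat[u][v] is the edge value itself.
flat = {
    u: {v: data["Node_Attrib"] for v, data in neighbours.items()}
    for u, neighbours in nested.items()
}


def edge_value(adj, u, v, default=0.0):
    """Return the value of edge (u, v), or `default` if there is no such edge."""
    return adj.get(u, {}).get(v, default)


print(edge_value(flat, "mom", "Lucy"))  # existing edge
print(edge_value(flat, "mom", "dad"))   # missing edge falls back to the default
```

The chained .get() calls cover both a missing source node and a missing target node, so the lookup never raises.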
Answered By - Reti43