Issue
I am currently trying to compute shortest paths between every pair of connected nodes in a very large weighted graph (160K nodes, 15 million edges).
I would like to find the number of edges in the shortest path between every two nodes, when all edges with a weight >= the weight of the edge between the two nodes are deleted. If a path cannot be found, I will take the shortest path as 1 (i.e. the deleted edge connecting the nodes).
Is it possible to use igraph
v0.9.11 to do this (essentially GraphBase.shortest_paths(), with the option to ignore edges with a weight above a given threshold)? I would rather not hand code a solution given the size of the graph (I'm guessing the igraph implementation would be better and hopefully written in a faster non-python language). Ideally, it would look something like this:
g = ig.Graph.TupleList(data_all.itertuples(index=False),
directed=False,
edge_attrs = ['percent', 'length_difference']
)
for edge in g.es:
threshold_weight = edge.attributes()['length_difference']
shortest = g.my_pretend_shortest_paths(source = edge.source, target = edge.target, weights = 'length_difference', mode = 'all', ignore_weights_over = threshold_weight) #should not alter original graph obj
if shortest = []:
shortest_paths += [(edge.source, edge.target, threshold_weight, 1)]
else:
shortest_paths += [(edge.source, edge.target, threshold_weight, len(shortest))]
My current solution is below, but I'm hoping there's a better way than copying a big graph and doing costly graph operations outside of shortest_paths
, 15 million times. This takes 6-700 seconds to analyse 500 edges within a test graph with only ~250K edges (giving a predicted runtime of ~230 days for 15 million edges):
g = ig.Graph.TupleList(df.itertuples(index=False),
directed=False,
edge_attrs = ['percent', 'length_difference']
)
shortest_paths = []
for edge in g.es:
temp_g = copy.deepcopy(g)
threshold_weight = edge.attributes()['length_difference']
#delete edges in the graph copy with a length_difference edge attribute >= current edge's.
#https://stackoverflow.com/a/28951383/11357695
temp_g.es.select(length_difference_ge = threshold_weight).delete()
shortest = temp_g.shortest_paths(source = edge.source, target = edge.target, weights = 'length_difference', mode = 'all')
if shortest = []:
shortest_paths += [(edge.source, edge.target, threshold_weight, 1)] #will plot these downstream
else:
shortest_paths += [(edge.source, edge.target, threshold_weight, len(shortest))] #will plot these downstream
Ideally, this will complete within about 12 hours max - I can leave it overnight but beyond that it gets a bit awkward.
Thanks, Tim
EDIT
I have optimised it further to remove the copying of the large graph and removing redundant processing, but it still takes about a minute to process 1-2000 nodes (i.e. ~1 week for 15 million nodes assuming everything scales linearly). Any better solutions would be appreciated!
def ec_list_to_g(es):
edges_formatted = []
for e in es:
att = e.attributes()
edges_formatted += [(e.source, e.target, att['percent'], att['length_difference'])]
return ig.Graph.TupleList(edges_formatted,
directed=False,
edge_attrs = ['percent', 'length_difference']
)
g = ig.Graph.TupleList(df.itertuples(index=False),
directed=False,
edge_attrs = ['percent', 'length_difference']
)
shortest_paths = []
for edge in g.es:
threshold_weight = edge.attributes()['length_difference']
if threshold_weight <= 1:
continue
target_es = g.es.select(length_difference_lt = threshold_weight)
sub_g = ec_list_to_g(target_es)
shortest = sub_g.shortest_paths(source = edge.source, target =
edge.target, weights = 'length_difference', mode = 'all')#follow on as before
Solution
As of igraph 0.10 (and all earlier versions), graph modifications such as deleting edges is expensive. It causes the graph to be re-built from scratch, thus it takes time proportional to the graph size, even if you remove only a single edge.
When you perform weighted shortest path calculations, an efficient way to effectively disable an edge without actually removing it is to set its weight to infinity (float('inf')
). Using this technique is likely to significantly improve performance.
Answered By - Szabolcs
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.