One of the domains for which data lends itself well to be represented as a graph is trade. We can take any tradable good, represent the trading actors as nodes, and represent the (amount of) traded goods as (properties of) edges. As an example, the graph below displays the international trade of fish in 1998 (source data), where the nodes represent countries, the sizes of nodes represent the total exports per country, and the edges represent the fact that the edge's source country exported a given amount of fish to the destination country:
The simple solution: filtering on volumeAs a first experiment, we could try to apply a filter that drops all connections for which the trading volume does not belong to the top 1 %. This leads to the following result (after re-applying layout for the graph):
- Based on the chosen threshold it either completely discards most of the network, or adds enough complexity to the connected part to make it hard to find the preferred routes,
- The choice of a particular threshold requires domain knowledge in the best case and is fully arbitrary in the worst case.
A more advanced (but elegant) solution: link salienceLast year, a paper that addresses above issues was published in Nature Communications. It describes a method for automatically finding "salient" links in networks by using the following approach:
- Choose a property that defines the weight of an edge in a given network,
- Define the distance between two nodes as 1 / the weight,
- Using above distance measure, take a node and compute the shortest path to all other nodes in the network,
- Combine all edges from the previous step into a set, which is called the Shortest Path Tree (SPT),
- For each edge in the SPT, increase a "salience" counter,
- Repeat steps 3-5 for every other node in the network,
- For each edge, divide its salience counter by the total number of nodes in the network, leading to a salience property with a value in the range [0..1].
As we can see, salience-based filtering has two distinct advantages compared to filtering on absolute edge weights:
- It doesn't require any parameters, the only domain knowledge required is when choosing which property to use as edge weight for computing salience,
- On a natural network as discussed here, taking the top few percent of the edges with highest salience leads to a nearly fully connected network where most routes between any two nodes are unique.