Network smoothing and locational measures
People, commodities, information, diseases, and money move between locations, and these spatial interactions naturally form location-to-location networks. Extracting and understanding flow structures and locational characteristics is crucial for a wide range of study areas such as migration, transportation, disease spread, and information diffusion. A location-to-location network is a type of geo-social network in which a node represents a location or an area, and a link represents an interaction (flow) between two locations. Below is a flow map illustrating migration flows between regions of the U.S.
Locational measures (network or graph measures) such as net flow, centrality, and entropy are often derived to understand the structural characteristics and the roles of locations in spatial interaction networks. However, because of the small-area problem and the dramatic differences in location sizes (such as population), derived locational measures often exhibit spurious variations that may conceal the underlying spatial and network structures.
Source: Guo, 2009
To demonstrate the problem and the network smoothing approach, we focus on the county-to-county migration data for 1995-2000 in the US and derive the net migration rate for each county from the flow data. The net migration rate of an area is the difference between its in-migration (inflow) and out-migration (outflow) over a period of time, divided by the population of the area.
The figure on the left shows the original net migration rate for the 25-29 age group. Regions of attraction and depletion are difficult to distinguish because the rates are unstable, owing to the dramatic population differences among counties and the small-area problem in the data.
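As a small illustration of this instability, the Python sketch below applies the definition above to two hypothetical counties. The function name and all numbers are made up for illustration and are not taken from the migration data. The same ten extra in-migrants shift the small county's rate a thousand times more than the large county's.

```python
def net_migration_rate(inflow, outflow, population):
    """Net migration rate as defined in the text: (inflow - outflow) / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (inflow - outflow) / population

# Two hypothetical counties, each gaining the same 10 extra in-migrants.
small_change = (net_migration_rate(30, 20, 500)
                - net_migration_rate(20, 20, 500))
large_change = (net_migration_rate(20_010, 20_000, 500_000)
                - net_migration_rate(20_000, 20_000, 500_000))
print(small_change, large_change)  # 0.02 2e-05
```

A handful of movers in a county of a few hundred people is enough to produce an extreme rate, which is exactly the kind of spurious variation visible on the map.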
Several studies have applied existing spatial kernel smoothing methods to remove spurious data variations (Porta et al., 2009; Sohn & Kim, 2010). These studies treat a locational measure (e.g., centrality) as a regular attribute and apply a traditional spatial kernel smoothing method directly to the derived measure values. However, directly smoothing the measure values may generate unreliable or even misleading results for two main reasons. First, the original measure values may be unstable because of varying unit sizes and small flows between units. Second, traditional smoothing methods do not differentiate between flows within a neighborhood and flows that cross its boundary, so directly smoothing the original locational measures is inappropriate. For example, the net flow ratio (i.e., net flow / total flow) for a neighborhood (a group of contiguous spatial units) cannot be calculated as the average of the unit-level net flow ratios within the neighborhood.
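The net flow ratio example can be made concrete with a minimal Python sketch. The units A, B, X, and Y, their flows, and the helper `net_flow_ratio` are all hypothetical. Treating A and B as one neighborhood drops the internal A-to-B flow, and the neighborhood ratio differs clearly from the average of the two unit-level ratios.

```python
def net_flow_ratio(flows, units):
    """Net flow / total flow for a set of units.

    Flows with both origin and destination inside `units` are internal
    to the group and are ignored.
    """
    inflow = sum(v for o, d, v in flows if d in units and o not in units)
    outflow = sum(v for o, d, v in flows if o in units and d not in units)
    total = inflow + outflow
    return (inflow - outflow) / total if total else 0.0

# Hypothetical flows (origin, destination, volume); A and B are contiguous.
flows = [("A", "B", 10), ("X", "A", 2), ("B", "Y", 1)]

ratio_a = net_flow_ratio(flows, {"A"})              # (2 - 10) / 12, about -0.667
ratio_b = net_flow_ratio(flows, {"B"})              # (10 - 1) / 11, about 0.818
average_of_units = (ratio_a + ratio_b) / 2          # about 0.076
neighborhood_ratio = net_flow_ratio(flows, {"A", "B"})  # (2 - 1) / 3, about 0.333
print(average_of_units, neighborhood_ratio)
```

The large A-to-B flow dominates both unit-level ratios even though it never leaves the neighborhood, which is why averaging unit values misrepresents the neighborhood as a whole.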
Methodology
We introduce a new approach to smoothing locational measures in spatially embedded networks. The approach consists of four steps.
1. For a location (node) s in a spatial interaction network, identify its spatial neighborhood Ns based on either a geographic distance threshold (fixed bandwidth) or a size threshold such as a minimum population (adaptive bandwidth). The figure illustrates the bandwidth selection process. With an adaptive bandwidth, the neighborhood Ns of a location s is the smallest set of nearest neighbors whose total population P(Ns) is greater than a given population threshold p, which is 100 in this example. The map shows the neighborhoods of three locations r, s, and t.
2. Temporarily remove the flows within the neighborhood, i.e., those whose origin and destination are both in the same neighborhood. These flows are excluded only for this specific neighborhood and remain eligible for consideration in other neighborhoods. Then weight the flows from/to the nodes in the neighborhood (including s) based on their distances to location s. The result is a smoothed sub-graph in which the flows to/from location s are adjusted to take into account the flows to/from its neighbors. The figure below illustrates the smoothed sub-graph of a location s: flows within Ns are removed, and flows to/from Ns (shown by dashed lines) are weighted and partially counted as flows to/from location s.
3. Calculate the required network measure for location s on the smoothed sub-graph. In other words, the weighted flows to/from the neighborhood are used in calculating the network measure for the location.
4. Repeat the above three-step process for each location (node). After the measure is obtained for a location, the smoothed flows are discarded and the original flows are restored. In other words, the smoothing in Step 2 is only temporary for each neighborhood.
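The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the adaptive (population) bandwidth from Step 1, assumes a Gaussian kernel whose bandwidth equals the distance from s to its farthest neighborhood member, uses Euclidean distance on planar coordinates, and takes the net flow ratio (net flow / total flow) as the example measure. All function names, coordinates, populations, and flows are hypothetical.

```python
import math

def adaptive_neighborhood(s, coords, population, p_threshold):
    """Step 1: smallest set of nearest neighbors of s (s first) whose
    total population exceeds p_threshold."""
    ordered = sorted(coords, key=lambda n: (n != s, math.dist(coords[s], coords[n])))
    members, total = [], 0.0
    for n in ordered:
        members.append(n)
        total += population[n]
        if total > p_threshold:
            break
    return members

def kernel_weights(s, members, coords):
    """Step 2 weights. ASSUMPTION: Gaussian kernel with bandwidth h equal to
    the distance from s to its farthest neighborhood member."""
    h = max(math.dist(coords[s], coords[n]) for n in members)
    if h == 0:
        return {n: 1.0 for n in members}
    return {n: math.exp(-0.5 * (math.dist(coords[s], coords[n]) / h) ** 2)
            for n in members}

def smoothed_net_flow_ratio(s, coords, population, flows, p_threshold):
    members = adaptive_neighborhood(s, coords, population, p_threshold)
    weights = kernel_weights(s, members, coords)
    inflow = outflow = 0.0
    for origin, dest, volume in flows:
        origin_inside, dest_inside = origin in weights, dest in weights
        if origin_inside and dest_inside:
            continue  # Step 2: internal flow, skipped for this neighborhood only
        if dest_inside:
            inflow += weights[dest] * volume     # weighted inflow credited to s
        elif origin_inside:
            outflow += weights[origin] * volume  # weighted outflow credited to s
    total = inflow + outflow
    # Step 3: the measure is computed on the smoothed sub-graph
    return (inflow - outflow) / total if total else 0.0

def smooth_all(coords, population, flows, p_threshold):
    """Step 4: repeat for every node. `flows` is never mutated, so every
    node starts from the original network."""
    return {s: smoothed_net_flow_ratio(s, coords, population, flows, p_threshold)
            for s in coords}

# Hypothetical toy network: coordinates, populations, (origin, dest, volume)
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (10, 0)}
population = {"A": 50, "B": 60, "C": 1000, "D": 2000}
flows = [("A", "B", 10), ("B", "A", 2), ("C", "A", 5), ("A", "D", 1), ("D", "B", 3)]

smoothed = smooth_all(coords, population, flows, p_threshold=100)
print(smoothed)
```

Because each neighborhood is evaluated against the unmodified `flows` list, the temporary removal and reweighting in Step 2 never leak into other neighborhoods, which mirrors the restore operation in Step 4. Small nodes A and B pool into one neighborhood, while the populous nodes C and D each form their own.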
Results
Net Migration Rate
The figure below compares the two results (conventional approach vs. our approach) for smoothed net migration rates for the total population. To allow comparison, both methods use the same bandwidth (i.e., one million) and the same spatial kernel function (i.e., Gaussian). The overall patterns are similar in both maps. However, with the conventional approach the effect of small base populations can still be observed in many places, such as the counties surrounding Salt Lake City, UT, Albuquerque, NM, and Houston, TX (left), where the smoothed rates are affected by the original unstable rates and by the flows within the neighborhood. Our approach eliminates the effect of small base populations by treating the neighborhood as a whole, removing internal flows, and calculating the measure on the smoothed network.
Figure panels: Conventional smoothing (left) and Network smoothing (right)
Inflow entropy
The effect of small base populations is even more dramatic for the entropy measure: small areas receive very low entropy values because of the sparse flows to/from them. This is easy to see in the original measure as well as in the result of the conventional smoothing approach (left), which produces large clusters of low entropy values that are highly correlated with the presence of small counties and their unstable values. Our approach, on the other hand, first smooths the network related to a neighborhood and then calculates the entropy measure. As a result, it reduces the effect of the small-area problem and reveals spatial clusters of low inflow entropy values, indicating places that draw focused in-migration flows (right). This pattern is dramatically different from the result of the conventional approach.
Figure panels: Conventional smoothing (left) and Network smoothing (right)
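To see why sparse flows depress entropy, the Python sketch below computes inflow entropy for two hypothetical counties. The text does not give the entropy formula, so this assumes Shannon entropy (natural logarithm) over the shares of a location's inflow coming from each origin; other normalizations are possible. The function name and flows are illustrative only.

```python
import math
from collections import defaultdict

def inflow_entropy(flows, node):
    """ASSUMPTION: Shannon entropy (natural log) of the shares of `node`'s
    inflow contributed by each origin."""
    by_origin = defaultdict(float)
    for origin, dest, volume in flows:
        if dest == node and origin != node:
            by_origin[origin] += volume
    total = sum(by_origin.values())
    if total == 0:
        return 0.0
    shares = (v / total for v in by_origin.values() if v > 0)
    return sum(-p * math.log(p) for p in shares)

# Hypothetical counties: a small one whose 3 in-migrants share one origin,
# and a larger one receiving equal inflows from four origins.
flows = [("X", "small", 3),
         ("A", "large", 50), ("B", "large", 50), ("C", "large", 50), ("D", "large", 50)]
print(inflow_entropy(flows, "small"))  # entropy is 0 for a single origin
print(inflow_entropy(flows, "large"))  # log(4), about 1.386
```

A county with only a few in-migrants can draw them from only a few origins, so its entropy is capped at a low value by sample size alone. Pooling flows over a neighborhood before computing the measure removes that artificial cap.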
Network Measure Mapper Beta
Network Measure Mapper introduces an online generic framework to calculate and smooth network measures using the network smoothing approach. A beta version of this online application will soon be released to calculate and smooth network (locational) measures using county-to-county migration datasets in the US. Network Measure Mapper helps reveal the underlying patterns by removing the influence of size in spatially embedded networks and can answer research questions such as "What places are the most attractive for young migrants?". In the beta version, you will be able to calculate and smooth (given a set of parameters) net, in-, and out-migration rates for the US county-to-county migration datasets for 1995-2000 and 2005-2010. Later, the full version of this tool will be publicly available and will allow users to upload their own datasets and calculate and smooth a number of locational measures (net flow ratio, entropy, and centrality measures). Below is the current prototype of Network Measure Mapper.