The way in which a network is connected plays a large part in how we analyze and interpret it. When analyzing connectedness and clustering, we are asking how integrated or fractured the overall network system is, how these different major sub-systems are distributed out and their local characteristics. A graph can be said to be connected if for any node in the graph there is a path to any other node. When the graph is not connected then there will be a number of what we call components to it. A component is a subset of nodes and edges within a graph that are fully connected.1 Thus, for a node to be part of a component it must be connected to all the other nodes in that component.
A cluster is simply a subset of the nodes and edges in a graph that possess certain common characteristics or relate to each other in a particular way, forming some domain-specific structure.2 Whereas a component is simply referring to whether a given set of nodes are all connected or not, a cluster is referring to how they are connected and how much they are connected, that is the frequency of links between a given subset of nodes. In order to model the degree of clustering of a subset of nodes, we simply take a node and look at how connected a node it links to is to other nodes that it is also connected to. For example, if this was a social network of friends, we would be asking how many of your friends know your other friends. The more your friends are interconnected the more clustered the subset is said to be. This clustering within social networks is also called a clique. A clique is a group of people who interact with each other more regularly and intensely than others in the same setting.
Within this social context, clustering can be correlated to homophily, where homophily describes the phenomenon where people tend to form connections with those similar to themselves, as captured in the famous saying “birds of a feather flock together.” We might think of clustering coming from the fact that the interaction between nodes with similar attributes will often require fewer resources than interactions between nodes with different attributes.3 For example, between two cultures there may be a language barrier, or between different devices on a network that might have different protocols. Equally, clustering may be due to physical constraints of the resource expenditure required to maintain them over a greater distance, thus resulting in a clustering around a geographic neighborhood.
Understanding the different local conditions that have created clustering within a network are important for understanding why the network is distributed out into the topology that it has, how you can work to integrate it or disintegrate it, and how something will propagate across the network, as each one of these clusters will have its own unique set of properties within the whole, making it particularly receptive or resistant to a given phenomenon. For example, we might be analyzing a political network, with each cluster in this network representing a different set of ideologies, social values, and policy agendas that are receptive to different messages.
Or as another example, by understanding that different clustering groups on a computer network may represent different operating systems, we will be able to better understand why a virus has rapidly spread in one part of the network but not in another. And also by understanding this local clustering condition, we will be able to better approach integrating them into the broader network. The clustering coefficient of a node is then a method for measuring the degree of a local cluster. There are a number of such methods for measuring this but they are essentially trying to capture the ratio of existing links connecting a node’s neighbors to each other relative to the maximum possible number of such links that could exist between them. A high clustering coefficient for a network is an indication of the small-world phenomenon.4