# The Affiliation Graph Model | Stanford University

our general plan of attack today will be to think about generative models for networks so our goal will be the following actually first I will start TED talk telling you about how do we generate networks right so if I have some model how can i generate a network a graph from this model right where you know nodes nodes represent some entities or people and edges represent let’s say friendships and so on and the our goal in some sense will be that we will want to define a model that is able to generate realistic looking graphs and then in the second step what we will do is to say actually given a real graph imagine given like a facebook graph what we will want to do is we will want to go and fit or find a generative model that has generated our graph and by doing so we will go and detect communities so this is the overall idea for today our next thing now is to actually say what is a good generative models model for networks so let me tell you how to think about this right our goal is to say we want to define a generative model for networks and in some sense our model will have a set of parameters right that within kind of later want to estimate given given our real data and by estimating those parameters we will in some sense implicitly actually detect this social communities that I was talking about so the question is given a set of nodes how do communities generate the edges of the underlying social network so we need to kind of define the way the edges arise given a set of nodes the model we will be talking about is called the community affiliation graph model and the way this works is the following right so our goal is we want to define the model that will generate a network so the model is specified the following we have a set of nodes we these are the nodes of my underlying social network imagine here they are and then I have also another set of nodes that I will talk and I will call them communities I call this set C ok and then what I will do is the following I will say that every node can be a member of any of the communities so in some sense I will have these edges here that I that are basically community memberships right so what this will tell me is that the blue node is member of community a green nodes are members of community B and the four red nodes they belong both to community a and to community C so this is now my specification of my model now what I want to do is given this model I will want to generate the network so the affiliation graph model is a generic model for networks and the model can be specified as a with kind of four types of parameters I need to know the set of nodes in the network this is the set week I need to know the set of communities that is the set C I need to know the set of edges the membership edges between nodes and the corresponding communities and then what we will also have is every community has a single parameter associated with it and we call this parameter P sub a P sub B and in general there is a set of dis parameters that we will call P sub C okay so the model above is uniquely defined by this couple of four four different sets now kind of what we want to do next is to specify how does this model generate the network right we need to define what is the process from going from this affiliation network here on the left to actually the edges of the underlying social network for example one thing we already know is that that the set V the number of nodes at the bottom here is exactly the number of nodes of our social network what we need to do now is define how are these edges of the social network created or how do they arise out of our model so the way we do this is to think to think about the affiliation graph model in the following way we can think that every of the communities that a pair of nodes shares generates an edge with some probability so in some sense the idea is if we both belong to soccer community and we also both attend the same University then because we are playing soccer together that has some probability of us being friends and also because we had no attend the same University that has also some probability for us becoming friends so the idea is that each of such social communities in some sense creates an opportunity for a pair of nodes to meet and create a friend connection so the idea is that every community let’s call this community a has a has a single parameter P sub a and this parameter tells us with how likely with what probability does a pair of node nodes connect if they both belong to this community a okay so now we can ask okay given a pair of nodes and all the communities they belong to how likely is this pair of nodes to connect with each other and the way we will do this is we will say they connect if at least one of the communities they have in common creates an edge and the formula that describes this I have it down here all we are saying is the probability that node U and we are connected is 1 minus the product over all the communities they have in common 1 minus the probability of each given community creating a connection the way to think about this formula is the following write 1 minus P sub C basically tells us what is the probability that the pair of nodes does not connect because they have a community in common so the product over this probabilities tells us what is the probability that all all of the communities they have in common all of this said no I don’t want a connection so 1 minus that is the probability that at least one of the communities created a connection so one way to think about this expression here is that in some sense this is an or function a pair of nodes will connect if at least one of the communities they have in common actually creates creates an edge what is also interesting here is that we already see that this is an increasing function in the number of common communities why does it increase is because the more communities we have in common the more numbers are in this product each of the elements of this product is a number smaller than 1 so the more numbers that are smaller than 1 we multiply together the smaller the total product so 1 minus a small number is is a bigger number right so this means that the more community’s a pair of nodes has in common the more likely they will be to link with each other so now we specified basically how to create edges of our network right the way we create edges here is that for every pair of nodes we ask what are the communities you have in common and we apply this formula down here and say that this way we computed the total probability of a connection flip a coin and if the coin says yes we create a connection right the nodes that have multiple communities in common they will basically get multiple chances in some sense to create an edge so what is now what we have done so far is that we have our affiliation graph model of networks we specify the model so here is a different specification of the model where we have three communities each of these communities has different linking parameter and given such model we can now generate a network right so from the model we can go to the network here is a picture of the network that would arise from the model above what we nicely see here is that each of these communities we can think of it as a tile we see that for example the red nodes here in the overlap of all three communities they are very well connected we see that for example the pink nodes also connect heavily with each other and then we see that other parts of the communities were kind of there is no overlaps they are less well connected but the model that we specified here is able to generate us the network below so what is good about the affiliation graph model and the way I describe the model so far is that this model is able to express a variety of different structures of networks so for example if I want communities social communities or clusters that don’t overlap with each other this is how I could represent this using our model right so the idea is I have a community a I have a community B and imagine I have maybe a few edges across these two communities the way I would represent this in terms of our model the affiliation graph model is basically to say I have one community node a I have another community node B and then a set of nodes is member of a and the other set of nodes is a member of B of course given the way I define the formula the on the previous slide the idea is if a pair of nodes has no community in common then they link with a small probability epsilon so that we still get a few edges crossing the two communities so this is the idea for how to model non overlapping or this kind of partition based communities if you want we already know how to model overlapping communities the idea here is I have communities a and B they overlap in this yellow area so how does the model GMR look like here is the model basically the idea is that I have these two yellow notes that actually belong to both communities to community a and community B and what’s interesting we can even like think about nested or hierarchically nested communities where we could have you know the mud the mother community be the den has two smaller communities a and C embedded in it the way we would model this using our affiliation network model is to say okay we have three communities the nodes at the bottom every node is a member of the community B but then some subset of them is also a member of a and another subset is a member of C so this way we nicely model the hierarchical structure of the network so basically the bottom line is that this model for generating networks is very flexible and if somebody gives us the structure of this bipartite graph this affiliation network below we are able to generate the network of course what we will do next is actually turn the problem around we will say given a network we want to find the model so this will be the topic of our next lecture

## Be the first to comment on "The Affiliation Graph Model | Stanford University"