I have 43 networks where some of the actors overlap, and what I would like to do is to visualise this overlap.
The problem is that these networks combined will have more than 40,000 nodes, so it will be very messy to stick everything into one network graph.
So the next best thing is to treat each network as a single node, count the actors that each pair of networks shares, and use that count as the weight of the edge between them.
In other words, if you have three networks (g1, g2, and g3), and there are 5 actors that are part of both g1 and g2, you could represent those two networks with two nodes (one for each graph) linked by an edge with weight 5 (the number of overlapping actors). Like so…
From the graph we can then tell that g1 and g3 have 1 overlapping actor, while g3 and g2 have the highest number of overlapping actors, 6.
This way, you are not worrying about the 1,000+ actors who are only active in one network; instead you are emphasising those who are moving across boundaries.
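To make the idea concrete, here is a minimal in-memory sketch of the three-network example. The actor names (a1, b1, and so on) and the overlap_edges helper are hypothetical, made up only so that the overlaps come out as 5, 1 and 6; they are not taken from the real data.

```python
from itertools import combinations

# Hypothetical actor sets, chosen so the overlaps match the example above.
networks = {
    "g1": {"a1", "a2", "a3", "a4", "a5", "c1", "x1"},
    "g2": {"a1", "a2", "a3", "a4", "a5", "b1", "b2", "b3", "b4", "b5", "b6"},
    "g3": {"b1", "b2", "b3", "b4", "b5", "b6", "c1"},
}


def overlap_edges(networks):
    """Return (name_a, name_b, shared_count) for every pair of networks."""
    edges = []
    for name_a, name_b in combinations(sorted(networks), 2):
        # Actors that appear in both networks
        shared = networks[name_a] & networks[name_b]
        edges.append((name_a, name_b, len(shared)))
    return edges


edges = overlap_edges(networks)
for name_a, name_b, weight in edges:
    print(name_a, name_b, weight)
# g1 g2 5
# g1 g3 1
# g2 g3 6
```

Each tuple is one weighted edge in the "network of networks"; x1, who only appears in g1, never shows up in any edge.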
The massive problem is that you will have to compare the list of actors in two networks and pick the ones that overlap – over and over again. So for three networks you will have to do three comparisons:
1) between g1 and g2,
2) between g1 and g3, and
3) between g2 and g3.
With 3 networks it should be fine, but as the number of networks increases, so does the number of comparisons.
For the 43 networks I want to compare, I will have to do 903 comparisons (43 × 42 / 2).
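The count follows from the number of unordered pairs among n networks, n(n-1)/2. A quick sketch to check the arithmetic (pair_count is just an illustrative helper name, not part of the script):

```python
from itertools import combinations
from math import comb  # math.comb needs Python 3.8 or later


def pair_count(n):
    """Number of unordered pairs among n networks: n * (n - 1) / 2."""
    return n * (n - 1) // 2


print(pair_count(3))   # 3
print(pair_count(43))  # 903

# Cross-check against the standard library.
assert pair_count(43) == comb(43, 2) == len(list(combinations(range(43), 2)))
```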
Not going to happen – not manually anyway.
My laziness, therefore, propelled me to write a Python script. I built the engine this morning, which calculates the different combinations (all 903 of them).
Those combinations are then fed through a function that places all the network actors in two sets, intersects them, then appends the results to an output file with the corresponding network names.
The output file for the above example would read
g1 g2 5
g1 g3 1
g2 g3 6
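Since each output line is just two network names and a count separated by spaces, it is easy to read back as a weighted edge list. A rough sketch, assuming network names contain no spaces (parse_overlaps is a hypothetical helper, not part of the script):

```python
def parse_overlaps(lines):
    """Turn 'g1 g2 5' style lines into (source, target, weight) tuples."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = parts
        edges.append((source, target, int(weight)))
    return edges


# The three lines from the example above; in practice these would come
# from iterating over the open output file.
sample = ["g1 g2 5\n", "g1 g3 1\n", "g2 g3 6\n"]
edges = parse_overlaps(sample)

# The pair of networks with the most shared actors
strongest = max(edges, key=lambda edge: edge[2])
print(strongest)  # ('g2', 'g3', 6)
```

The resulting tuples can be handed to whatever graph tool you use for the final visualisation.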
To get the script to work, you should already have the list of actors in a text file that ends with _uList.txt (one file for each network).
For the above example, your data folder should contain g1_uList.txt, g2_uList.txt and g3_uList.txt.
Each line inside these files contains the name of an actor and that actor's degree, separated by a comma.
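As an illustration of that format, the sketch below pulls the actor names out of a few name,degree lines. The sample names and degrees are invented, and actor_names is a hypothetical helper; it keeps everything before the first comma, which is also what the script does with its regular expression, except that this version skips empty lines.

```python
def actor_names(lines):
    """Collect the actor name (the text before the first comma) from each line."""
    names = set()
    for line in lines:
        name = line.rstrip("\n").split(",", 1)[0]
        if name:  # ignore blank lines
            names.add(name)
    return names


# Made-up sample lines in the name,degree format
sample_lines = ["alice,12\n", "bob,3\n", "carol,7\n"]
print(sorted(actor_names(sample_lines)))  # ['alice', 'bob', 'carol']
```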
When running the code, you have to cd to the directory containing the files first.
Here is the Python script…
# # # # # # # # # # # # # # # # # #
# Jose Christian
# Batch comparison
# input: *_uList.txt
# output: _fullGStats.txt
# # # # # # # # # # # # # # # # # #

from re import sub
from os import listdir

def getIntersection(threadOne,threadTwo):
    # reads the uList.txt files
    rFileOne=open(threadOne+"_uList.txt","r")
    rFileTwo=open(threadTwo+"_uList.txt","r")
    wFileOut=open("_fullGStats.txt","a")
    # populates the first set (keeps only the name before the first comma)
    dataFileOne=set()
    for line in rFileOne:
        line1=sub("\n","",line)
        lineF=sub(",.*$","",line1)
        dataFileOne.add(lineF)
    # populates the second set
    dataFileTwo=set()
    for line in rFileTwo:
        line1=sub("\n","",line)
        lineF=sub(",.*$","",line1)
        dataFileTwo.add(lineF)
    # intersects the two sets
    inter=dataFileOne.intersection(dataFileTwo)
    # finds the number (rather than names) of actors
    numb=len(inter)
    # formats output lines
    fullOutPut=threadOne+" "+threadTwo+" "+str(numb)+"\n"
    # appends line to output file
    wFileOut.write(fullOutPut)
    # just so you know what's going on
    print(fullOutPut, end="")
    # house-keeping
    wFileOut.close()
    rFileOne.close()
    rFileTwo.close()

# creates a list of all your uList files (sorted, so the output order is stable)
uFileList=[]
dirList=sorted(listdir("."))
for eachFile in dirList:
    if eachFile.endswith("_uList.txt"):
        theID=sub(r"_uList\.txt$","",eachFile)
        uFileList.append(theID)

# creates the combinations and feeds them one at a time to the function above
listLength=len(uFileList)
for i in range(1,listLength):
    for i2 in range(i,listLength):
        getIntersection(uFileList[i-1],uFileList[i2])