From nCol to Pajek 1/2 – Python

This process takes nCol data (one link per line, written as two user names separated by a space) and creates the *Vertices and *Edges sections of a Pajek file.

I pulled all the nCol data from the database and saved it as links.txt, which reads:

# links.txt
Jim Elle
Andy Shaniqua
Shaniqua Elle

I pulled all the user names from the database and saved them in a temporary file, usrTemp.txt.

# usrTemp.txt
Andy
Jim
Andy
Elle
Shaniqua
Jim
Jim

I removed all duplicates in the terminal and saved the result as usrIDs.txt.

# This arranges all lines alphabetically, so duplicates end up next to each other
sort -o usrTemp.txt usrTemp.txt

# This removes adjacent duplicate lines and writes the result to usrIDs.txt
uniq usrTemp.txt usrIDs.txt

Then, using Vim, I prefixed each line with its line number:

# To add a unique number to every user name.
%s/^/\=printf('%d ', line('.'))

So now, the usrIDs.txt file reads:

1 Andy
2 Elle
3 Jim
4 Shaniqua

This can be used as the *Vertices section.
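The sort, uniq, and Vim steps above could also be collapsed into a few lines of Python. This is only a sketch: the function name number_users is made up for illustration, and Python's sorted() orders by code point, which may differ from the terminal's locale-aware sort for names with mixed case or accents.

```python
def number_users(names):
    """Return (id, name) pairs for the unique names, sorted, with IDs starting at 1."""
    # A set drops duplicates; strip() removes newlines and skips blank lines.
    unique = sorted({name.strip() for name in names if name.strip()})
    return list(enumerate(unique, start=1))


# The same names as usrTemp.txt above. With a real file,
# pass the open file object instead of this list.
names = ["Andy", "Jim", "Andy", "Elle", "Shaniqua", "Jim", "Jim"]
for user_id, name in number_users(names):
    print(user_id, name)
```

For the sample names this prints the same four numbered lines as usrIDs.txt.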

Now, use Python to replace the nCol user names with the global IDs:

#!/usr/bin/python

# To split each line on spaces (shlex follows shell-style quoting rules)
import shlex

# Open nCol file and...
rLinks=open("links.txt","r")
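# global IDs (the numbered usrIDs.txt file created above)
rIDs=open("usrIDs.txt","r")
# output (opened in append mode, so delete output.txt before re-running)
aOutput=open("output.txt","a")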

# create dictionary
theKey={}

# split the usrIDs.txt file to populate dictionary
for line in rIDs:
	idEnt=shlex.split(line)
	theKey[idEnt[1]]=int(idEnt[0])

# match dictionary entry with nCol
# and save to output.txt
for line in rLinks:
	nodes=shlex.split(line)
	nFrom=nodes[0]
	nTo=nodes[1]
	idFrom=theKey[nFrom]
	idTo=theKey[nTo]
	edge=str(idFrom)+" "+str(idTo)+"\n"
	aOutput.write(edge)
	
rIDs.close()
rLinks.close()
# close the output too, so every edge is flushed to disk
aOutput.close()

The final output file now reads

# output.txt
3 2
1 4
4 2

and can be used as the *Edges section.
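To show how the two pieces fit together, here is a minimal sketch that joins them into one Pajek-style network. The function name to_pajek is hypothetical, the data is inlined rather than read from usrIDs.txt and output.txt, and the layout (a "*Vertices N" header, quoted labels, then "*Edges") is my understanding of the usual Pajek convention, so check it against the Pajek documentation before relying on it.

```python
def to_pajek(vertices, edges):
    """Build Pajek-style text from (id, name) pairs and (from_id, to_id) pairs."""
    lines = ["*Vertices %d" % len(vertices)]
    for vertex_id, name in vertices:
        # Assumption: labels are wrapped in double quotes.
        lines.append('%d "%s"' % (vertex_id, name))
    lines.append("*Edges")
    for id_from, id_to in edges:
        lines.append("%d %d" % (id_from, id_to))
    return "\n".join(lines) + "\n"


# Sample data copied from usrIDs.txt and output.txt above.
vertices = [(1, "Andy"), (2, "Elle"), (3, "Jim"), (4, "Shaniqua")]
edges = [(3, 2), (1, 4), (4, 2)]
print(to_pajek(vertices, edges), end="")
```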

PROBLEM

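The Python script can’t handle non-alphanumeric characters (an apostrophe in a name, for example, makes shlex.split fail with a "No closing quotation" error), so they have to be deleted from every input file by running this in Vim: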

%s/[^a-zA-Z0-9_: ]//g
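Deleting characters changes the names, so an alternative worth considering is to avoid shlex altogether. The sketch below assumes names never contain spaces (which the nCol layout already requires) and uses str.split(), which splits on whitespace without interpreting quotes. The helper name build_edges and the sample name O'Brien are invented for illustration.

```python
def build_edges(id_lines, link_lines):
    """Map each 'from to' name pair to an (id_from, id_to) tuple."""
    ids = {}
    for line in id_lines:
        parts = line.split()  # plain whitespace split, quotes are left alone
        if len(parts) >= 2:
            ids[parts[1]] = int(parts[0])

    edges = []
    for line in link_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or incomplete lines
            edges.append((ids[parts[0]], ids[parts[1]]))
    return edges


# Hypothetical sample data containing an apostrophe.
id_lines = ["1 Andy", "2 Elle", "3 O'Brien"]
link_lines = ["O'Brien Elle", "Andy O'Brien"]
print(build_edges(id_lines, link_lines))
```

A name that is missing from the ID list still raises a KeyError here, which is probably the right behaviour, since it points at a user that never made it into usrIDs.txt.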
