This process takes nCol data and creates the *Vertices and *Edges sections of a Pajek file.
I pulled all the nCol data from the database and saved it as links.txt, which reads:
# links.txt
Jim Elle
Andy Shaniqua
Shaniqua Elle
I pulled all the user names from the database and saved them in a temporary file, usrTemp.txt.
# usrTemp.txt
Andy
Jim
Andy
Elle
Shaniqua
Jim
Jim
I removed all duplicates in the terminal and saved the result as usrIDs.txt.
# This arranges all lines alphabetically
sort usrTemp.txt -o usrTemp.txt
# This removes duplicate lines
uniq usrTemp.txt usrIDs.txt
Then, using Vim, I added a number to the start of each line:
# To add a unique number to every user name.
%s/^/\=printf('%d ', line('.'))
So now, the usrIDs.txt file reads:
1 Andy
2 Elle
3 Jim
4 Shaniqua
This can be used as the *Vertices section.
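The sort, uniq, and Vim numbering steps above could also be done in one pass in Python. This is only a minimal sketch, assuming one user name per line; the function name build_vertices is hypothetical. Note that Python sorts by code point, so the order may differ from the locale-aware sort command when names mix upper and lower case.

```python
def build_vertices(names):
    """Return 'id name' lines for the sorted, de-duplicated names."""
    # Drop blank entries, de-duplicate with a set, then sort alphabetically
    unique = sorted({n.strip() for n in names if n.strip()})
    # Number from 1, the same way line('.') numbers lines in Vim
    return ["%d %s" % (i, name) for i, name in enumerate(unique, start=1)]


# The sample names from usrTemp.txt
sample = ["Andy", "Jim", "Andy", "Elle", "Shaniqua", "Jim", "Jim"]
for line in build_vertices(sample):
    print(line)
```

With the sample names this prints the same four lines as usrIDs.txt.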
Now, use Python to replace the nCol user names with their global IDs:
#!/usr/bin/python
# To split by space
import shlex

# Open nCol file and...
rLinks=open("links.txt","r")
# global ids (the numbered file created above)
rIDs=open("usrIDs.txt","r")
# output
aOutput=open("output.txt","a")

# create dictionary
theKey={}

# split the usrIDs.txt file to populate dictionary
for line in rIDs:
    idEnt=shlex.split(line)
    theKey[idEnt[1]]=int(idEnt[0])

# match dictionary entry with nCol
# and save to output.txt
for line in rLinks:
    nodes=shlex.split(line)
    nFrom=nodes[0]
    nTo=nodes[1]
    idFrom=theKey[nFrom]
    idTo=theKey[nTo]
    edge=str(idFrom)+" "+str(idTo)+"\n"
    aOutput.write(edge)

# close all three files so the output is flushed to disk
rIDs.close()
rLinks.close()
aOutput.close()
The final output file now reads:
# output.txt
3 2
1 4
4 2
and can be used as the *Edges section.
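To show how the two sections might fit together, here is a rough sketch that joins them into a single Pajek network text. It assumes the usual .net layout, where *Vertices is followed by the vertex count and each label is quoted. The function name build_pajek is hypothetical, and the quoting of labels is my assumption rather than something this post requires.

```python
def build_pajek(vertex_lines, edge_lines):
    """Combine 'id name' lines and 'from to' lines into Pajek .net text."""
    # Ignore blank lines and trailing newlines from the input files
    vertices = [v.strip() for v in vertex_lines if v.strip()]
    edges = [e.strip() for e in edge_lines if e.strip()]

    # Header line carries the number of vertices
    out = ["*Vertices %d" % len(vertices)]
    for v in vertices:
        # Split only on the first space: the ID, then the label
        vid, label = v.split(" ", 1)
        out.append('%s "%s"' % (vid, label))

    out.append("*Edges")
    out.extend(edges)
    return "\n".join(out) + "\n"


# Sample data matching usrIDs.txt and output.txt
print(build_pajek(["1 Andy", "2 Elle", "3 Jim", "4 Shaniqua"],
                  ["3 2", "1 4", "4 2"]))
```

In practice the two arguments could be the open usrIDs.txt and output.txt file objects, since iterating over a file yields its lines.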
PROBLEM
The Python script can't handle non-alphanumeric characters, so they have to be deleted from all the files by running this in Vim:
%s/[^a-zA-Z0-9_: ]//g
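A possible cause, not confirmed against the original data, is that shlex.split applies shell-style quoting rules, so a name with an unbalanced apostrophe such as O'Neil raises a ValueError. As a sketch of an alternative to deleting characters, plain str.split() splits on whitespace only and leaves punctuation alone. The function name map_edges and the sample name O'Neil are hypothetical, and user names are assumed to contain no spaces.

```python
def map_edges(id_lines, link_lines):
    """Translate 'name name' pairs into 'id id' pairs using 'id name' lines."""
    key = {}
    for line in id_lines:
        parts = line.split()  # whitespace split, no quote handling
        if len(parts) == 2:   # skip blank or malformed lines
            key[parts[1]] = int(parts[0])

    edges = []
    for line in link_lines:
        nodes = line.split()
        if len(nodes) != 2:   # skip blank or malformed lines
            continue
        # A name missing from the ID file raises KeyError here
        edges.append("%d %d" % (key[nodes[0]], key[nodes[1]]))
    return edges


# Hypothetical sample containing an apostrophe
ids = ["1 Andy", "2 O'Neil", "3 Jim"]
links = ["Jim O'Neil", "Andy Jim"]
print(map_edges(ids, links))
```

This keeps the names in the output files intact, at the cost of not supporting quoted names that contain spaces.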