## Get a Frequency Count

The user file looks like this:

    userID,user,posts
    1,user1,581
    2,user2,281
    3,user3,196
    ...
    2002,usern-2,1
    2003,usern-1,1
    2004,usern,1

The first thing is to read the file:

    > df<-read.csv('path/to/file.csv')

Then we get the frequency count:

    # first we figure out what the names
    # of the columns are
    > names(df)
    [1] "userID" "user" "posts"
    # we want to count the posts...so
    > postFreqCount<-data.frame(table(df['posts']))

The frequency count should now return something like this:

    > postFreqCount
       Var1 Freq
    1     1  723
    2     2  314
    3     3  186
    ...
    84  196    1
    85  281    1
    86  851    1
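To try the counting step without the original file, here is a minimal sketch on a made-up data frame. The name `toy` and its values are invented for illustration and are not the data shown above.

```r
# Hypothetical toy data standing in for the user file
toy <- data.frame(
  userID = 1:6,
  user   = paste0("user", 1:6),
  posts  = c(5, 2, 2, 1, 1, 1)
)

# Count how many users share each post count.
# Var1 holds the distinct post counts, Freq the number of users with that count.
toyFreq <- data.frame(table(toy$posts))
print(toyFreq)
```

Three users have 1 post, two have 2 posts and one has 5, so `Freq` should come back as 3, 2 and 1.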

## Building the Scatter Plot

We need to use `Freq` as the *x* coordinates and `Var1` as the *y* coordinates.

    > x<-postFreqCount$Freq
    # Var1 is a factor, so convert its labels back to numbers
    > y<-as.numeric(as.character(postFreqCount$Var1))
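Because `table()` hands the counted values back as a factor, `Var1` needs some care before it is treated as numbers. The sketch below uses a made-up factor (`var1` is a hypothetical name) to show what can go wrong and one common way around it.

```r
# Hypothetical post counts as they come out of table(): a factor
var1 <- factor(c(1, 2, 9, 100))

# as.integer() returns the level codes, not the post counts
codes <- as.integer(var1)
print(codes)

# max() on the labels compares text, so "9" sorts after "100"
textMax <- max(as.character(var1))
print(textMax)

# Going through as.character() first recovers the actual numbers
y <- as.numeric(as.character(var1))
print(max(y))
```

This matters later on, since `max(y)` is used to set the upper limit of the axis.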

Now a simple scatter plot can be made like so:

    > plot(x,y)

On linear axes this does not look that great, so we will have to apply log scales.

A simple way of doing it is like this:

    > plot(log(x),log(y))

This does look a bit better, except that the scales on the axes run from 0 to 6 (the logged values) instead of the real values.
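The 0 to 6 range comes from `log()` itself, which computes the natural logarithm by default. A small sketch with made-up values (`vals` is a hypothetical name) shows the mapping:

```r
# Hypothetical values spanning the same orders of magnitude as the counts above
vals <- c(1, 10, 100, 1000)

# Natural log: these are the numbers the axis ticks were showing
logged <- log(vals)
print(logged)

# Undoing the log recovers the real values
print(exp(logged))
```

So the points are in the right places, but the tick labels are logs rather than counts.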

## Applying the log scales

To get the scales right we need to change the way we call the `plot( )` function.

First, we use the `xy.coords( )` function to set the coordinates for the plot.

Then we add the scale range for each axis using `xlim` and `ylim`, both starting from 1 and running to their maximum value (starting from zero will give you an error).

Now we can apply the log to both axes using `log="xy"`.

Finally, we can label both axes with `xlab` and `ylab`.

The final function call with all its parameters looks a bit like this:

    plot(
      xy.coords(x,y),
      xlim=c(1,max(x)),
      ylim=c(1,max(y)),
      log="xy",
      xlab="Frequency",
      ylab="Posts"
    )
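For a version that runs end to end without the original file, here is a minimal sketch on an invented frequency table. `toyFreq` and its numbers are assumptions made for illustration; only the `plot()` arguments follow the steps described above.

```r
# Hypothetical frequency table: Var1 = post count, Freq = number of users
toyFreq <- data.frame(
  Var1 = factor(c(1, 2, 3, 10, 100)),
  Freq = c(700, 300, 150, 20, 1)
)

# Convert the factor labels back to numbers before using them as coordinates
x <- toyFreq$Freq
y <- as.numeric(as.character(toyFreq$Var1))

# log = "xy" needs strictly positive limits, so both ranges start at 1
plot(
  xy.coords(x, y),
  xlim = c(1, max(x)),
  ylim = c(1, max(y)),
  log  = "xy",
  xlab = "Frequency",
  ylab = "Posts"
)
```

With `log="xy"` the tick labels show the original values, while the spacing between them is logarithmic.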

The graph now has log-scaled axes labelled with the real values.

PERFECT!

## Function that saves it to pdf

    logPlot<-function(fileDir,name){
      df<-read.csv(fileDir)
      pfc<-data.frame(table(df['posts']))
      x<-pfc$Freq
      # Var1 is a factor, so convert its labels back to numbers
      y<-as.numeric(as.character(pfc$Var1))
      # paste0 joins the pieces without inserting spaces
      pdfPath<-paste0('Desktop/',name,'.pdf')
      pdf(pdfPath)
      plot(
        xy.coords(x,y),
        xlim=c(1,max(x)),
        ylim=c(1,max(y)),
        log="xy",
        xlab="Frequency",
        ylab="Posts",
        main=name
      )
      dev.off()
    }
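If the `Desktop/` folder is not where you want the output, one possible variant takes the output directory as an argument. This is only a sketch: `logPlotTo` and its `outDir` parameter are hypothetical names, and the CSV it reads here is a made-up file written to a temporary location.

```r
# Hypothetical variant of logPlot(): outDir is an added parameter
logPlotTo <- function(fileDir, name, outDir = tempdir()) {
  df  <- read.csv(fileDir)
  pfc <- data.frame(table(df$posts))
  x <- pfc$Freq
  y <- as.numeric(as.character(pfc$Var1))

  # file.path() builds the path with the right separator
  pdfPath <- file.path(outDir, paste0(name, ".pdf"))
  pdf(pdfPath)
  on.exit(dev.off())  # close the device even if plot() fails

  plot(
    xy.coords(x, y),
    xlim = c(1, max(x)),
    ylim = c(1, max(y)),
    log  = "xy",
    xlab = "Frequency",
    ylab = "Posts",
    main = name
  )
  invisible(pdfPath)
}

# Try it on a made-up CSV written to a temporary file
csvPath <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    userID = 1:6,
    user   = paste0("user", 1:6),
    posts  = c(5, 2, 2, 1, 1, 1)
  ),
  csvPath,
  row.names = FALSE
)
outPath <- logPlotTo(csvPath, "toy-users")
print(file.exists(outPath))
```

Returning the path invisibly makes it easy to check afterwards where the PDF ended up.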