Scatter Plot with Log Scale – R

Get a Frequency Count

The user file looks like this:

userID,user,posts
1,user1,581
2,user2,281
3,user3,196
 ...
2002,usern-2,1
2003,usern-1,1
2004,usern,1

The first step is to read the file:

> df<-read.csv('path/to/file.csv')

Then we get the frequency count:

# first we figure out what the names
# of the columns are
> names(df)
[1] "userID" "user" "posts"

# we want to count the posts, so we pass that column to table();
# passing it as a vector keeps the result columns named Var1 and Freq

> postFreqCount<-data.frame(table(df$posts))

The frequency count should now return something like this

> postFreqCount
      Var1	    Freq
1	    1	    723
2	    2	    314
3	    3	    186
...
84	    196	    1
85	    281	    1
86	    581	    1
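As a minimal sketch of the same step on data you can run without the file, the example below builds a tiny made-up data frame (the six rows are hypothetical, not the real user data) and counts how many users share each post count.

```r
# Hypothetical sample data standing in for the user file
df <- data.frame(
  userID = 1:6,
  user   = paste0("user", 1:6),
  posts  = c(5, 3, 3, 1, 1, 1)
)

# Count how many users share each post count:
# three users have 1 post, two have 3 posts, one has 5 posts
postFreqCount <- data.frame(table(df$posts))
print(postFreqCount)
```

Var1 holds the distinct post counts and Freq holds how many users have that count.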

Building the Scatter Plot

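We need to use Freq as the x coordinates and Var1 as the y coordinates. Var1 comes out of table() as a factor, so it has to be converted back to numbers before it can be logged or passed to max().

> x<-postFreqCount$Freq
> # go through as.character so we get the post counts, not the factor level codes
> y<-as.numeric(as.character(postFreqCount$Var1))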
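One thing worth checking at this step is the type of Var1. The sketch below, using a small hypothetical vector of post counts, shows that table() stores the counted values as factor labels, and that converting a factor directly with as.numeric() returns level codes rather than the original numbers.

```r
# Hypothetical frequency table, built the same way as postFreqCount
counts <- data.frame(table(c(1, 1, 1, 3, 3, 196)))

is.factor(counts$Var1)   # TRUE: the post counts are stored as labels

# Direct conversion returns the level codes: 1 2 3
codes <- as.numeric(counts$Var1)

# Converting through character recovers the real values: 1 3 196
values <- as.numeric(as.character(counts$Var1))
```

Going through as.character() first is the usual way to get the real values back.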

Now a simple scatter plot can be made like so:

> plot(x,y)

It will look like this:

[Image: simple]

That does not look great, so we will apply log scales.

A simple way of doing it is like this:

> plot(log(x),log(y))

This gives you something like this:

[Image: simple2]

That looks a bit better, except that the axes now run from about 0 to 6, the logged values, instead of showing the real values.
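To see where the 0 to 6 range comes from, here is a small check using a few of the frequencies from the table above. log() in R is the natural logarithm, so the largest count, 723, lands at about 6.58 on the axis.

```r
# Frequencies taken from the postFreqCount output above
freq <- c(723, 314, 186, 1)

# log() is the natural logarithm; these are the positions
# that plot(log(x), ...) actually draws
logged <- log(freq)
round(range(logged), 2)   # 0.00 6.58
```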

Applying the log scales

To get the scales right we need to change how we call plot().

First, we use the xy.coords() function to set the coordinates for the plot.

Then we add the range for each axis using xlim and ylim, both running from 1 to the maximum value (a log axis cannot start at zero because log(0) is undefined, so starting from zero gives an error).

Now we can apply the log scale to both axes using log="xy".

Finally, we can label both axes with xlab and ylab.

The final call with all its parameters looks like this:

plot(
	xy.coords(x,y),
	xlim=c(1,max(x)),
	ylim=c(1,max(y)),
	log="xy",
	xlab="Frequency",
	ylab="Posts"
	)

The graph now looks like this:

[Image: simple3]

PERFECT!
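For anyone who wants to try the call without the user file, here is a minimal self-contained sketch. The x and y values are hypothetical stand-ins for the frequency table, and pdf(NULL) opens a null device so nothing is written to disk; both are assumptions added for illustration. It also shows why the limits start at 1 rather than 0.

```r
# log of zero is -Inf, which is why a log axis cannot start at 0
log10(c(0, 1, 10, 100))   # -Inf 0 1 2

# Hypothetical values standing in for Freq (x) and Var1 (y)
x <- c(723, 314, 186, 1, 1, 1)
y <- c(1, 2, 3, 196, 281, 581)

pdf(NULL)   # null device: draws nowhere, handy for a quick check
plot(
  xy.coords(x, y),
  xlim = c(1, max(x)),
  ylim = c(1, max(y)),
  log  = "xy",
  xlab = "Frequency",
  ylab = "Posts"
)
# par() reports whether each axis of the current plot is logarithmic
axesLogged <- c(par("xlog"), par("ylog"))
dev.off()
```

With log="xy" the data stay untouched and only the axes are scaled, so the tick labels show the real values.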

A function that saves the plot to a PDF

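logPlot<-function(fileDir,name){
	df<-read.csv(fileDir)
	# count how many users share each post count
	pfc<-data.frame(table(df$posts))
	x<-pfc$Freq
	# Var1 is a factor: convert via character to recover the post counts
	y<-as.numeric(as.character(pfc$Var1))
	# paste0 avoids the spaces that paste would insert into the path;
	# the path is relative to the current working directory
	pdfPath<-paste0('Desktop/',name,'.pdf')
	pdf(pdfPath)
	plot(
		xy.coords(x,y),
		xlim=c(1,max(x)),
		ylim=c(1,max(y)),
		log="xy",
		xlab="Frequency",
		ylab="Posts",
		main=name
	)
	dev.off()
}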
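As a usage sketch, the variant below takes the output directory as an extra argument so it can be tried anywhere. The name logPlotTo, the outDir parameter and the six-row sample file are all hypothetical and added only for illustration; it is one possible arrangement, not the only one.

```r
# Hypothetical variant of logPlot that takes the output directory as an argument
logPlotTo <- function(fileDir, name, outDir) {
  df  <- read.csv(fileDir)
  pfc <- data.frame(table(df$posts))
  x   <- pfc$Freq
  # Var1 is a factor: convert via character to recover the post counts
  y   <- as.numeric(as.character(pfc$Var1))
  pdfPath <- file.path(outDir, paste0(name, ".pdf"))
  pdf(pdfPath)
  on.exit(dev.off())   # close the device even if plot() fails
  plot(
    xy.coords(x, y),
    xlim = c(1, max(x)),
    ylim = c(1, max(y)),
    log  = "xy",
    xlab = "Frequency",
    ylab = "Posts",
    main = name
  )
  invisible(pdfPath)
}

# Try it on a small made-up file in a temporary directory
csvPath <- file.path(tempdir(), "users.csv")
write.csv(
  data.frame(userID = 1:6, user = paste0("user", 1:6), posts = c(5, 3, 3, 1, 1, 1)),
  csvPath,
  row.names = FALSE
)
outPath <- logPlotTo(csvPath, "posts", tempdir())
file.exists(outPath)
```

Returning the path invisibly makes it easy to check or open the file afterwards.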
