
# Scatter Plot with Log Scale – R

## Get a Frequency Count

The user file looks like this:

```
userID,user,posts
1,user1,581
2,user2,281
3,user3,196
...
2002,usern-2,1
2003,usern-1,1
2004,usern,1
```

The first thing to do is read the file:

```
> df <- read.csv('path/to/file.csv')
```
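If you don't have the file to hand, `read.csv()` can also read CSV text directly through its `text` argument. The sketch below builds a small sample in the same `userID,user,posts` layout; the rows are made up and only there so the later steps can be tried without the real data.

```r
# A tiny, made-up sample in the same layout as the user file
# (hypothetical rows; use read.csv('path/to/file.csv') for the real data)
csv <- "userID,user,posts
1,user1,581
2,user2,281
3,user3,196
4,user4,2
5,user5,2
6,user6,1
7,user7,1
8,user8,1"

df <- read.csv(text = csv)
str(df)
```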

Then we get the frequency count, that is, how many users made each number of posts:

```
# first we check what the names
# of the columns are
> names(df)
[1] "userID" "user" "posts"

# we want to count the posts, so we tabulate the posts column;
# table(df$posts) produces the columns Var1 and Freq
> postFreqCount <- data.frame(table(df$posts))
```

Printing the frequency count should now show something like this:

```
> postFreqCount
   Var1 Freq
1     1  723
2     2  314
3     3  186
...
84  196    1
85  281    1
86  581    1
```
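In this table `Var1` holds each distinct post count (as a factor, sorted by value) and `Freq` holds how many users have that count. A minimal check on a small, made-up `posts` column shows the structure:

```r
# Hypothetical posts column: three users with 1 post, two with 2, one each with 196, 281, 581
df <- data.frame(posts = c(581, 281, 196, 2, 2, 1, 1, 1))
postFreqCount <- data.frame(table(df$posts))
print(postFreqCount)

# Var1 is a factor of the distinct post counts; Freq is an integer count
str(postFreqCount)
```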

## Building the Scatter Plot

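We use `Freq` (the number of users) as the x coordinates and `Var1` (the number of posts) as the y coordinates. Because `table()` returns `Var1` as a factor, we convert it to numbers through `as.character()` first:

```
> x <- postFreqCount$Freq
> y <- as.numeric(as.character(postFreqCount$Var1))
```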
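Why the conversion matters: a factor is stored as level codes, and turning it into a matrix with `as.matrix()` gives text, for which `max()` compares alphabetically. The sketch below, on a made-up factor of post counts, shows both pitfalls and the usual `as.numeric(as.character(...))` fix:

```r
# Var1 as table() produces it: a factor of distinct post counts (made-up values)
Var1 <- factor(c(1, 2, 99, 196, 581))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(Var1)

# as.matrix() on a factor column yields strings, and max() on strings is alphabetical
textMax <- max(as.character(Var1))

# converting through character recovers the real numbers
values <- as.numeric(as.character(Var1))

codes        # 1 2 3 4 5
textMax      # "99"
max(values)  # 581
```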

Now a simple scatter plot can be made like so:

```
> plot(x, y)
```

The result does not look great, because most of the points are bunched up against the axes, so we will have to apply log scales.

A simple way of doing that is to plot the logarithms of the values:

```
> plot(log(x), log(y))
```

This looks a bit better, except that the axes now run from 0 to about 6, the logarithms of the values, instead of showing the real values.
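The axis numbers are natural logarithms, since `log()` uses base *e* by default. Post counts from the sample file, such as 196, 281 and 581, therefore land between roughly 5 and 6.4:

```r
# log() is the natural logarithm by default
logs <- round(log(c(1, 196, 281, 581)), 2)
logs  # 0.00 5.28 5.64 6.36
```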

## Applying the log scales

To get the scales right, we need to change the way we call `plot()`.

First, we use the `xy.coords()` function to build the coordinates for the plot.

Then we set the range of each axis with `xlim` and `ylim`, both running from 1 to their maximum value (a lower limit of zero gives an error, because a log scale cannot include zero).

Next, we put both axes on a log scale with `log="xy"`.

Finally, we label both axes with `xlab` and `ylab`.

The final call with all its parameters looks like this:

```
plot(
  xy.coords(x, y),
  xlim = c(1, max(x)),
  ylim = c(1, max(y)),
  log = "xy",
  xlab = "Frequency",
  ylab = "Posts"
)
```
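`xy.coords()` packs the two vectors into a plain list with `x`, `y`, `xlab` and `ylab` components (converting the values to doubles), and `plot()` unpacks it again, so passing `x` and `y` straight to `plot()` would draw the same points. The sketch below uses made-up vectors modelled on the sample output and writes to a temporary PDF so it also runs without a screen:

```r
# Made-up data: x = number of users (Freq), y = post count (Var1)
x <- c(723, 314, 186, 1, 1, 1)
y <- c(1, 2, 3, 196, 281, 581)

xy <- xy.coords(x, y)
names(xy)  # "x" "y" "xlab" "ylab"

# draw to a temporary PDF so the sketch runs non-interactively
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(
  xy,
  xlim = c(1, max(x)),
  ylim = c(1, max(y)),
  log = "xy",
  xlab = "Frequency",
  ylab = "Posts"
)
dev.off()
```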

The graph now shows the real values on log-scaled axes.

PERFECT!

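## Function that saves the plot to PDF

Putting the steps together, the function below reads the file, builds the frequency count and writes the log-log plot to a PDF named after `name` in a `Desktop/` folder relative to the working directory: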

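```
logPlot <- function(fileDir, name) {
  df <- read.csv(fileDir)
  # frequency count: Var1 = post count (a factor), Freq = number of users
  pfc <- data.frame(table(df$posts))
  x <- pfc$Freq
  # convert the factor through character to get the real post counts
  y <- as.numeric(as.character(pfc$Var1))
  # paste0() avoids the spaces paste() would put into the file name
  pdfPath <- paste0('Desktop/', name, '.pdf')
  pdf(pdfPath)
  # close the PDF device when the function exits, even on error
  on.exit(dev.off())
  plot(
    xy.coords(x, y),
    xlim = c(1, max(x)),
    ylim = c(1, max(y)),
    log = "xy",
    xlab = "Frequency",
    ylab = "Posts",
    main = name
  )
}
```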
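Because the function writes to a fixed `Desktop/` path, it can be handy to pass the output directory in instead. The sketch below is a hypothetical variant, `logPlotTo()`, with an `outDir` argument that is our addition (defaulting to `tempdir()`). It passes `x` and `y` to `plot()` directly rather than through `xy.coords()`, which draws the same points, and returns the path of the PDF it wrote. The usage lines at the bottom run it on a small made-up CSV file.

```r
# Hypothetical variant of logPlot(): outDir is our addition, not part of the original
logPlotTo <- function(fileDir, name, outDir = tempdir()) {
  df <- read.csv(fileDir)
  pfc <- data.frame(table(df$posts))
  x <- pfc$Freq                             # number of users with each post count
  y <- as.numeric(as.character(pfc$Var1))  # the post counts themselves
  pdfPath <- file.path(outDir, paste0(name, ".pdf"))
  pdf(pdfPath)
  on.exit(dev.off())                        # close the device even on error
  plot(
    x, y,
    xlim = c(1, max(x)),
    ylim = c(1, max(y)),
    log = "xy",
    xlab = "Frequency",
    ylab = "Posts",
    main = name
  )
  invisible(pdfPath)
}

# Usage on a made-up CSV written to a temporary file
csvFile <- tempfile(fileext = ".csv")
write.csv(
  data.frame(userID = 1:6, user = paste0("user", 1:6), posts = c(581, 281, 2, 2, 1, 1)),
  csvFile,
  row.names = FALSE
)
pdfFile <- logPlotTo(csvFile, "posts")
file.exists(pdfFile)  # TRUE
```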