
Inequality and Lorenz Curve – R

Inspired by the train-wreck that was yesterday’s post, I found a better way to calculate inequality and plot Lorenz curves: the ineq package in R.

EDIT: gini() in the reldist package also works.

Installing and loading the Library

Download and install

> install.packages('ineq')

Now load the library

> library('ineq')

The Data

I have a frequency list which looks a bit like this

# user_posts.txt
user            posts
jose             2342
BonQuisha        1564
Kisha            1198

...               ...

Takiera             2
Tramicia            1
Watermelondrea      1

so we load the file to a data frame.

> df <- read.csv('path/to_the_file/user_posts.txt',sep='\t')
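If you want to try the loading step without the real file, here is a minimal sketch that reads the same tab-separated layout from a string. The `toy_text` and `toy` names are made up for illustration, and the data is just the three top rows shown above, not the full list.

```r
# Toy stand-in for user_posts.txt: only the three top rows shown above,
# NOT the real file (the elided rows are left out).
toy_text <- "user\tposts\njose\t2342\nBonQuisha\t1564\nKisha\t1198"

# Same call as for the real file, but reading from a string via `text=`.
toy <- read.csv(text = toy_text, sep = "\t")

str(toy)               # column names and types
is.numeric(toy$posts)  # the counts should come in as numbers, not text
```

If `posts` does not come back numeric, the separator is probably not what you think it is.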

The “posts” column contains the data that we want to analyse.

We want to do two things: first, calculate the Gini index (or coefficient), and second, plot a Lorenz curve.

Gini Index

This is as simple as it gets

> ineq(df$posts,type='Gini')
# and that returns
[1] 0.8724686

AWESOME!
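To see what that number means, here is a rough base R sketch of the usual sample Gini formula on sorted values. `gini_by_hand` is a hypothetical helper of mine, not something from ineq, and I am only assuming it lines up with the package's default (uncorrected) result, so check it against your own data.

```r
# Hand-rolled Gini coefficient (hypothetical helper, not part of ineq).
gini_by_hand <- function(x) {
  x <- sort(as.numeric(x))   # ascending order
  n <- length(x)
  # G = (2 * sum(i * x_i) / sum(x) - (n + 1)) / n
  (2 * sum(seq_len(n) * x) / sum(x) - (n + 1)) / n
}

gini_by_hand(c(5, 5, 5, 5))   # everyone posts equally: 0
gini_by_hand(c(0, 0, 0, 1))   # one user writes every post: 0.75
```

So 0 is perfect equality, and the closer the value gets to 1, the more the posts are concentrated in a few users.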

Lorenz Curve Plot

Again, this cannot be any simpler…

> plot(Lc(df$posts))

and that should give us something pretty basic like this

[Figure: basicLorenz]

Which is nice and all, but we can always make it better by changing the labels, title and lines.

> plot(Lc(df$posts),
        xlab="User Percentile",
        ylab="Post Percentage",
        main="Participation Inequality",
        col="blue"
        )

Now we get a much nicer graph

[Figure: finalLorenz]
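For anyone curious what is actually being plotted, a Lorenz curve is just cumulative share of posts against cumulative share of users, with users sorted from least to most active. Below is a small base R sketch of that idea. `lorenz_by_hand` and the toy vector are made up for illustration and are not how ineq necessarily does it internally.

```r
# Lorenz curve points by hand (hypothetical helper, not part of ineq).
lorenz_by_hand <- function(x) {
  x <- sort(as.numeric(x))          # least active users first
  n <- length(x)
  data.frame(
    p = (0:n) / n,                  # cumulative share of users
    L = c(0, cumsum(x) / sum(x))    # cumulative share of posts
  )
}

# Toy data: four users with 1, 1, 2 and 6 posts.
lc <- lorenz_by_hand(c(1, 1, 2, 6))
lc
```

With this toy data the bottom 75% of users account for 40% of the posts. Calling plot(lc$p, lc$L, type="l") followed by abline(0, 1, lty=2) draws the curve with the line of perfect equality for comparison.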

DONE!
