
Inequality and Lorenz Curve – R

After the train wreck that was yesterday’s post, I found a better way to calculate inequality and plot Lorenz curves: the ineq library in R.

EDIT: gini() in the reldist package also works.

Installing and loading the Library

Download and install

> install.packages('ineq')

Now load the library

> library('ineq')

The Data

I have a frequency list which looks a bit like this

# user_posts.txt
user            posts
jose             2342
BonQuisha        1564
Kisha            1198

...               ...

Takiera             2
Tramicia            1
Watermelondrea      1

so we load the file to a data frame.

> df <- read.csv('path/to_the_file/user_posts.txt',sep='\t')

The “posts” column contains the data that we want to analyse.
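
Before computing anything, it is worth confirming that the file parsed the way we expect. The sketch below is a minimal check, assuming a tab-separated file with a header row. The inline `sample_text` and its three rows are made-up stand-ins for the real file, and `df_check` is a hypothetical name chosen so it does not clash with `df`.

```r
# Made-up, tab-separated stand-in for user_posts.txt (illustration only)
sample_text <- "user\tposts\nalice\t10\nbob\t3\ncarol\t1\n"

# Same read.csv call as above, but reading from a string instead of a file path
df_check <- read.csv(text = sample_text, sep = "\t")

# The column we analyse should be numeric and contain no missing values
str(df_check)
stopifnot(is.numeric(df_check$posts), !anyNA(df_check$posts))
```

If `posts` comes back as character or factor, the separator or header is probably not what the call assumes.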

We want to do two things: first, calculate the Gini index (or coefficient), and second, plot a Lorenz curve.

Gini Index

This is as simple as it gets

> ineq(df$posts,type='Gini')
# and that returns
[1] 0.8724686
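
To make that number less of a black box, here is a minimal base-R sketch of the usual sample Gini formula for values sorted in ascending order: G = (2 * sum(i * x_i) / sum(x_i) - (n + 1)) / n. `gini_sketch` is a hypothetical helper, not part of ineq, and the input vectors are made up. I would expect it to agree with the result above for non-negative data without missing values, but treat that as an assumption and compare on your own data.

```r
# Hypothetical helper: sample Gini coefficient in base R (no bias correction)
gini_sketch <- function(x) {
  x <- sort(as.numeric(x))   # order values from smallest to largest
  n <- length(x)
  # Rank-weighted sum relative to the total, rescaled by the sample size
  (2 * sum(x * seq_len(n)) / sum(x) - (n + 1)) / n
}

gini_sketch(c(5, 5, 5, 5))    # everyone posts equally: 0
gini_sketch(c(0, 0, 0, 10))   # one user writes everything: 0.75
```

With this uncorrected formula the maximum for n values is (n - 1)/n, which is why the second call gives 0.75 rather than 1.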


Lorenz Curve Plot

Again, this cannot be any simpler…

> plot(Lc(df$posts))

and that should give us something pretty basic like this


Which is nice and all, but we can always make it better by changing the labels, title and lines.

> plot(Lc(df$posts),
        xlab="User Percentile",
        ylab="Post Percentage",
        main="Participation Inequality")

Now we get a much nicer graph
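
For anyone curious what the curve is made of, the points can be sketched in a few lines of base R: sort the values, then pair the cumulative share of users with the cumulative share of posts. `lorenz_points` is a hypothetical helper written for illustration, not a function from ineq, and the post counts are made up.

```r
# Hypothetical helper: the points of an empirical Lorenz curve in base R
lorenz_points <- function(x) {
  x <- sort(as.numeric(x))            # least active users first
  n <- length(x)
  data.frame(
    p = (0:n) / n,                    # cumulative share of users
    L = c(0, cumsum(x) / sum(x))      # cumulative share of posts
  )
}

lp <- lorenz_points(c(1, 1, 2, 4))    # made-up post counts
print(lp)
```

Plotting `lp$p` against `lp$L` with `type = "l"` and adding the diagonal with `abline(0, 1)` gives the same kind of picture; the further the curve sags below the diagonal, the more unequal the distribution.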


