The list and the problem…
I have a little list, and I would like to get a frequency count.
myList=['jose','jose','jose','shaniqua','shaniqua','lafawnduh']
So that I end up with something like this…
0  jose       3
1  shaniqua   2
2  lafawnduh  1
Constructing the dataframe
Step one is to put the list into a dataframe using the pandas module.
from pandas import DataFrame
theDF=DataFrame(myList)
The dataframe looks something like this…
           0
0       jose
1       jose
2       jose
3   shaniqua
4   shaniqua
5  lafawnduh
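The 0 at the top of that output is the column label. Since we gave no column names, pandas labels the single column with the integer 0, which is why the next step selects it with theDF[0]. Here is a small sketch of that; the namedDF variable and the 'Names' label are my own illustration and are not used in the rest of the walkthrough.

```python
from pandas import DataFrame

myList = ['jose', 'jose', 'jose', 'shaniqua', 'shaniqua', 'lafawnduh']

# No column names are given, so the single column gets the integer label 0.
theDF = DataFrame(myList)
print(list(theDF.columns))    # [0]
print(theDF.shape)            # (6, 1)

# Hypothetical alternative: give the column a name when building the dataframe.
namedDF = DataFrame(myList, columns=['Names'])
print(list(namedDF.columns))  # ['Names']
```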
Getting a frequency count
To get the frequency count we do this…
theSe=theDF[0].value_counts()
This returns a series with the names as its index, which looks like this…
jose         3
shaniqua     2
lafawnduh    1
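If you want to double-check those numbers without pandas, the standard library can do the same count. This is only a cross-check, not part of the method above; the counts variable is my own name.

```python
from collections import Counter

myList = ['jose', 'jose', 'jose', 'shaniqua', 'shaniqua', 'lafawnduh']

# Counter tallies how many times each name appears in the list.
counts = Counter(myList)

# most_common() lists the names from most to least frequent,
# the same ordering value_counts() uses by default.
print(counts.most_common())
# [('jose', 3), ('shaniqua', 2), ('lafawnduh', 1)]
```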
Which is okay, except we want a dataframe with a sequential index.
We can get this by creating a second dataframe with the data in the series, with one column for the names and the other for the frequencies…
theDF2=DataFrame({"Names":theSe.index,"Freq":theSe})
And that looks something like this…
           Freq      Names
jose          3       jose
shaniqua      2   shaniqua
lafawnduh     1  lafawnduh
We can see that the two columns now have names, and that the index is still the names on the list.
The final step is therefore to change the index to sequential numbers starting from zero.
Reindexing the dataframe
First we create a list of sequential numbers using range, with the length of the dataframe as the limit…
newIndexList=list(range(len(theDF2)))
Now we can use that list to reindex the dataframe by first putting the newIndexList into a new column…
theDF2['ni']=newIndexList
Finally, we turn that column into the index.
theDF2=theDF2.set_index('ni')
So the dataframe now looks like this…
    Freq      Names
ni
0      3       jose
1      2   shaniqua
2      1  lafawnduh
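For what it's worth, pandas can also do the reindexing in one call. The sketch below rebuilds theDF2 the same way as above and then uses reset_index with drop=True, which throws away the old name-based index and replaces it with 0, 1, 2. The theDF3 variable is my own name, and the one visible difference is that the new index has no 'ni' label.

```python
from pandas import DataFrame

myList = ['jose', 'jose', 'jose', 'shaniqua', 'shaniqua', 'lafawnduh']

# Same steps as in the walkthrough: count, then build the two-column dataframe.
theSe = DataFrame(myList)[0].value_counts()
theDF2 = DataFrame({"Names": theSe.index, "Freq": theSe})

# drop=True discards the old index (the names) instead of keeping it as a column.
theDF3 = theDF2.reset_index(drop=True)

print(list(theDF3.index))  # [0, 1, 2]
print(theDF3)
```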
DONE!