
Exploring JSON files – Python

Working with JSON files can be freaking horrible, especially if you don’t know what data is in the file.

Let me give you an example of how unreadable it can be.

If you use Apple’s iTunes Search API and look up the artist ID 112018, you get this in return:

{
 "resultCount":1,
 "results": [
{"wrapperType":"artist", "artistType":"Artist", "artistName":"Nirvana", "artistLinkUrl":"", "artistId":112018, "amgArtistId":5034, "primaryGenreName":"Rock", "primaryGenreId":21, "radioStationUrl":""}]
}

Which is crappy, because you can’t tell at a glance what data you have, especially if it’s a large file.

The best and quickest way of sorting that out is the .dumps() function in the json module with its indent argument, which works in a similar way to the .prettify() method in BeautifulSoup.
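To see the effect without touching the network, here is a minimal sketch that pastes the Nirvana response in as a string (the names `raw`, `data` and `pretty` are just for illustration) and pretty-prints it with `json.dumps()`. The `sort_keys=True` argument is an optional extra, not something the script below uses; it orders keys alphabetically, which can help when scanning a big record for one field.

```python
import json

# The one-line Nirvana response, pasted in as a string (illustrative only)
raw = ('{"resultCount":1, "results": [{"wrapperType":"artist", '
       '"artistType":"Artist", "artistName":"Nirvana", "artistLinkUrl":"", '
       '"artistId":112018, "amgArtistId":5034, "primaryGenreName":"Rock", '
       '"primaryGenreId":21, "radioStationUrl":""}]}')

# Parse the text into Python dicts and lists
data = json.loads(raw)

# Serialise it again: one key per line, 2 spaces per nesting level,
# keys sorted alphabetically
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```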

This is how you do it.

import json
import urllib.request as ul_request  # urllib2 in Python 2

# first, the URL for the query goes here
str_url = ''

# the request
ul_req = ul_request.Request(str_url)

# open the connection and read the raw response (bytes)
with ul_request.urlopen(ul_req) as resp:
    str_json = resp.read()

# parse the JSON text into Python dicts and lists
js_d = json.loads(str_json)

# this bit will prettify it,
# it's the indent argument (spaces per nesting level) that does it
str_output = json.dumps(js_d, indent=1)

# save the readable JSON to a file
with open('output.json', 'w') as fw:
    fw.write(str_output)

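If the JSON is already sitting in a file rather than coming from a URL, the same idea works with `json.load()` and `json.dump()`, which read from and write to open file objects directly. A minimal sketch; the helper name `prettify_json_file` and the file names in the usage line are hypothetical:

```python
import json

def prettify_json_file(src_path, dst_path, indent=2):
    """Read a compact JSON file and write an indented copy of it."""
    # json.load parses straight from an open file object
    with open(src_path, encoding="utf-8") as fr:
        data = json.load(fr)
    # json.dump serialises straight into an open file object
    with open(dst_path, "w", encoding="utf-8") as fw:
        json.dump(data, fw, indent=indent)
    return data

# Usage (hypothetical file names):
# prettify_json_file("input.json", "output.json")
```

Python also ships this as a command-line tool: `python -m json.tool input.json output.json` writes an indented copy without any code of your own.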
And the output file looks a bit like this, which is much nicer and easier to read:

{
 "resultCount": 1,
 "results": [
  {
   "artistType": "Artist",
   "amgArtistId": 5034,
   "wrapperType": "artist",
   "artistId": 112018,
   "artistLinkUrl": "",
   "radioStationUrl": "",
   "artistName": "Nirvana",
   "primaryGenreId": 21,
   "primaryGenreName": "Rock"
  }
 ]
}

Now you know what data you have.
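Once the indented output shows you the shape of the data (a top-level object holding `resultCount` and a `results` list of artist objects), pulling fields out is plain dict and list indexing. A minimal sketch, using a hand-trimmed copy of the Nirvana response as stand-in data rather than a live request:

```python
import json

# A trimmed copy of the Nirvana response, standing in for js_d from the script above
js_d = json.loads('{"resultCount": 1, "results": [{"artistName": "Nirvana", '
                  '"artistId": 112018, "primaryGenreName": "Rock"}]}')

# "results" is a list, so loop over it; each item is a dict of artist fields
for artist in js_d["results"]:
    print(artist["artistName"], artist["artistId"], artist["primaryGenreName"])

# .get() returns None instead of raising KeyError when a key is missing;
# this trimmed record has no "releaseDate" key
first = js_d["results"][0]
print(first.get("releaseDate"))
```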
