I was data scraping the other day and saving the output to a JSON file, but the text in one of the entries was coming out wrong.
Instead of “Antal Dovcsák”, it was coming out as “Antal Dovcs\u00e1k”.
For context, my code looked something like this:
import requests
import json
import bs4
...
dicData = {'name': foundData.text, 'url': foundData['href']}
with open('output_file.json', 'w') as fw:
    fw.write(json.dumps(dicData, indent=2))
And the output was this…
{
  "name": "Antal Dovcs\u00e1k",
  "url": "/wiki/Antal_Dovcs%C3%A1k"
}
After some googling, I found this awesome entry on Stack Overflow.
My code now looks like this:
import requests
import json
import bs4
...
dicData = {'name': foundData.text, 'url': foundData['href']}
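# ensure_ascii=False keeps non-ASCII characters such as "á" instead of escaping them
jsonOutput = json.dumps(dicData, indent=2, ensure_ascii=False)
# Open the file as UTF-8 explicitly so the result does not depend on the platform default
with open('output_file.json', 'w', encoding='utf-8') as fw:
    fw.write(jsonOutput)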
This will now give me the following output.
{
  "name": "Antal Dovcsák",
  "url": "/wiki/Antal_Dovcs%C3%A1k"
}
Perfect!
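For what it's worth, the escaped version was never really broken. As far as I can tell, json.dumps escapes every non-ASCII character by default (ensure_ascii=True), and a JSON parser turns \u00e1 back into “á” when it reads the file. Here is a minimal sketch of that behaviour; the record dictionary is a made-up stand-in for my scraped entry, not the real scraper output:

```python
import json

# Hypothetical sample record standing in for the scraped entry.
record = {"name": "Antal Dovcsák", "url": "/wiki/Antal_Dovcs%C3%A1k"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record, indent=2)

# With ensure_ascii=False the characters are kept as they are.
readable = json.dumps(record, indent=2, ensure_ascii=False)

# The two strings look different...
assert "\\u00e1" in escaped
assert "á" in readable

# ...but both are valid JSON and parse back to the same dictionary.
assert json.loads(escaped) == json.loads(readable) == record
```

So ensure_ascii=False mostly matters when a human is going to read the file, which was exactly my case.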