URLs can’t contain certain special characters, such as spaces, so these characters get percent-encoded.
An example would be a URL that contains spaces, such as
http://www.this is an address.com/
Which then gets encoded to
http://www.this%20is%20an%20address.com/
More on URL encoding here:
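What this means is that if you have a list of addresses that are URL-encoded, you may end up with a few characters that you don’t want.
Using Python’s urllib, however, we can both encode and decode them. The examples here use the Python 2 module layout; in Python 3, pathname2url and url2pathname live in urllib.request.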
    # we import pathname2url from urllib
    from urllib import pathname2url
    # this is the page we want to get to
    str_page_address = 'this is the page'
    # we now encode the address
    str_page_address_encoded = pathname2url(str_page_address)
    # this is the full url with the encoded page name
    str_full_url = 'http://www.mysite.com/%s' % str_page_address_encoded
    print(str_full_url)
This should then return
'http://www.mysite.com/this%20is%20the%20page'
    # we import url2pathname from urllib
    from urllib import url2pathname
    # we start with the encoded url
    str_full_url = 'http://www.mysite.com/this%20is%20the%20page'
    # we get rid of the encoding
    str_decode_url = url2pathname(str_full_url)
    print(str_decode_url)
Which should then return
'http://www.mysite.com/this is the page'
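For Python 3, the same round trip can be sketched with urllib.parse. This is a minimal sketch, assuming quote and unquote are acceptable stand-ins for pathname2url and url2pathname (they are aimed at URL components rather than file paths). The helper names build_url and decode_urls and the second address in the list are made up for illustration; the base address is reused from the examples above.

```python
from urllib.parse import quote, unquote

# Base address reused from the examples above (illustrative only)
STR_BASE_URL = 'http://www.mysite.com/'


def build_url(str_page_name):
    """Percent-encode the page name and append it to the base address."""
    return STR_BASE_URL + quote(str_page_name)


def decode_urls(list_urls):
    """Turn percent-encoded characters in each address back into plain text."""
    return [unquote(str_url) for str_url in list_urls]


# Encode a single page name
str_full_url = build_url('this is the page')
print(str_full_url)

# Decode a whole list of encoded addresses in one go
# (the second address is a hypothetical example)
list_decoded = decode_urls([str_full_url, 'http://www.mysite.com/caf%C3%A9'])
print(list_decoded)
```

Only the page name is passed to quote here. Quoting the full address would also encode the colon after http, because by default quote leaves only letters, digits, the characters _ . - ~ and the forward slash untouched.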