Encoding and Decoding URLs – Python

URLs can only contain a limited set of characters, so anything outside that set, such as a space, has to be encoded.

An example would be a URL that contains spaces, such as

http://www.this is an address.com/

Which then gets encoded to

http://www.this%20is%20an%20address.com/
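The %20 above is a percent sign followed by the hexadecimal value of the space character's byte (0x20). A minimal sketch of that rule, assuming Python 3 and UTF-8, with percent_encode_char being a hypothetical helper written for illustration rather than a library function:

```python
# Hypothetical helper: percent-encode a single character by hand.
# Each UTF-8 byte becomes '%' followed by its two-digit hex value.
def percent_encode_char(ch):
    return ''.join('%{:02X}'.format(byte) for byte in ch.encode('utf-8'))

encoded_space = percent_encode_char(' ')
print(encoded_space)  # prints %20
```

In practice you would let the standard library do this rather than encode characters yourself.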

More on URL encoding here:
http://www.w3schools.com/tags/ref_urlencode.asp

This means that if you are working with a list of addresses that have been URL-encoded, you may end up with a few characters, such as %20, that you don’t want.

Using Python’s urllib, however, we can both encode and decode them.

Encoding

# we import urllib
from urllib import pathname2url

# this is the page we want to get to
str_page_address = 'this is the page'

# we now encode the address
str_page_address_encoded = pathname2url(str_page_address)

# this is the full url with the encoded page name
str_full_url = 'http://www.mysite.com/%s' % str_page_address_encoded

print str_full_url

This should then print

http://www.mysite.com/this%20is%20the%20page
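The snippet above is written for Python 2. On Python 3, pathname2url lives in urllib.request instead, and the general-purpose function for escaping part of a URL is quote from urllib.parse. A minimal sketch of the same step, assuming Python 3 and swapping in quote for pathname2url:

```python
from urllib.parse import quote

# the page name we want to put in the url
page_name = 'this is the page'

# quote() percent-encodes unsafe characters; '/' is left alone by default
encoded_page_name = quote(page_name)

# build the full url from the encoded page name
full_url = 'http://www.mysite.com/%s' % encoded_page_name

print(full_url)  # prints http://www.mysite.com/this%20is%20the%20page
```

Only the page name is passed to quote here. Encoding the whole address would also escape the colon after http.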

Decoding

# we import urllib
from urllib import url2pathname

# we start with the encoded url
str_full_url = 'http://www.mysite.com/this%20is%20the%20page'

# we get rid of the encoding
str_decode_url = url2pathname(str_full_url)

print str_decode_url

Which should then print

http://www.mysite.com/this is the page
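For the list-of-addresses case mentioned earlier, the same idea can be applied to every entry at once. A minimal sketch, assuming Python 3 and using unquote from urllib.parse in place of url2pathname (the second address is made up for illustration):

```python
from urllib.parse import unquote

# a list of encoded addresses; the second one is a hypothetical example
encoded_urls = [
    'http://www.mysite.com/this%20is%20the%20page',
    'http://www.mysite.com/q%26a',
]

# unquote() turns each %XX escape back into the character it stands for
decoded_urls = [unquote(url) for url in encoded_urls]

for url in decoded_urls:
    print(url)
```

Here %20 becomes a space and %26 becomes an ampersand.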

Done!!
