|
|
 |
|
Title: Calculating the distance between zip codes
Submitter: Kevin Ryan
(other recipes)Kevin Ryan
(other recipes)
Last Updated: 2006/04/25
Version no: 1.1
Category:
Web
|
|
4 vote(s)
|
|
|
|
Description:
I came across the mention of a formula labeled "The Great Circle Distance Formula" that purported to calculate the distance between any two points on the earth given their longitude and latitude points (the reference was in a Linux Magazine article). So, I looked up some information and cooked up a Python version of the calculation. There are references in the code where you can obtain approximate zip code data for free (e.g., if you wanted to enhance your website by adding a "Search within x mi's" feature), as well as references to the GCDF if you have further interest. Enjoy!
04/25/2006 update: I've decided to update this recipe with an object oriented bent where the information is cached once the object is instantiated. I've also added command line access to automatically download the zipcode file from the census website (just use 'python zips.py -d' and it will download a copy to your harddrive under 'zips.txt'). Lastly, I've added some unit testing so that if any future changes are made this will automatically run and tell me if anything pops out as wrong.
Source: Text Source
import math
class _Zip_Code_Record:
def __init__(self, zip, state, city, longitude, latitude, perform_transformation=True):
self.zip = zip
self.state = state
self.city = city
if perform_transformation:
self.longitude = math.radians(float(longitude))
self.latitude = math.radians(float(latitude))
else:
self.longitude = longitude
self.latitude = latitude
class Zip_Codes:
def __init__(self, zip_file="zips.txt"):
'''Load the zip dB into memory and initialize our object.'''
self.zips = {}
lines = [line.rstrip().replace('"', '') for line in open(zip_file)]
for line in lines:
zip, city, state, lon, lat = line.split(",")[1:-2]
self.zips[zip] = _Zip_Code_Record(zip, city, state, lon, lat)
def get_distance(self, zip1, zip2):
'''Returns the distance between two points on the earth.
Inputs used are:
Longitude (in radians) of the first location,
Latitude (in radians) of the first location,
Longitude (in radians) of the second location, and
Latitude (in radians) of the second location.
To convert to radians (from degrees), use pythons math.radian function (Note: already done
for you in the constructor above). Returns the distance in miles.'''
long_1 = self.zips[zip1].longitude
lat_1 = self.zips[zip1].latitude
long_2 = self.zips[zip2].longitude
lat_2 = self.zips[zip2].latitude
dlong = long_2 - long_1
dlat = lat_2 - lat_1
a = (math.sin(dlat / 2))**2 + math.cos(lat_1) * math.cos(lat_2) * (math.sin(dlong / 2))**2
c = 2 * math.asin(min(1, math.sqrt(a)))
dist = 3956 * c
return dist
def close_zips(self, starting_zip, radius):
'''For any given zip code (assuming it's valid), returns any zip codes within a 'radius' mile radius.'''
zip_long = self.zips[starting_zip].longitude
zip_lat = self.zips[starting_zip].latitude
close_zips = [self.zips[record].zip for record in self.zips if self.get_distance(self.zips[record].zip, starting_zip) < radius]
return close_zips
def __getitem__(self, zip):
'''allows convenient 'dictionary' access to the zips'''
return self.zips[str(zip)]
if __name__ == "__main__":
import sys
if len(sys.argv) > 1 and sys.argv[1] == "-d":
if len(sys.argv) == 2:
filename = 'zips.txt'
else:
filename = sys.argv[2]
import urllib2
zip_url = 'http://www.census.gov/tiger/tms/gazetteer/zips.txt'
print "Downloading %s from the internet to %s on your local machine." % (zip_url, filename)
f = open(filename, 'w')
f.write(urllib2.urlopen(zip_url).read())
f.close()
print "Successfully downloaded zip file. Use %s %s to run unittests." % (sys.argv[0], filename)
sys.exit(1)
if len(sys.argv) > 1:
zip_code_filename = sys.argv[1]
del sys.argv[1:]
else:
zip_code_filename = None
import unittest
class TestDistances(unittest.TestCase):
def setUp(self):
if len(sys.argv) > 1:
self._zc = Zip_Codes(sys.argv[1])
else:
self._zc = Zip_Codes()
def test_dist_measurement(self):
zips = [
('57350', '58368', 286.182054428),
('75217', '11366', 1376.21109349),
('67431', '64631', 227.366924671),
('57106', '27812', 1158.00278947),
('54460', '50518', 254.522422855),
('54451', '28756', 802.75909488),
('32615', '65624', 794.942194475),
('67054', '71827', 439.965551049),
('45686', '36879', 463.978723078),
('37845', '30506', 121.411082293),
('49440', '29847', 699.897793808),
('41331', '12046', 606.944002949),
('05083', '65244', 1093.97636115),
('84069', '70374', 1451.94104395),
('95821', '55760', 1520.98809123)
]
for zip in zips:
self.assertAlmostEqual(zip[2], self._zc.get_distance(zip[0], zip[1]), 7, "%s distance from %s failed" % (zip[0], zip[1]))
def test_getitem(self):
self.assertEqual('PILGRIM GARDENS', self._zc['19026'].city)
self.assertEqual('PILGRIM GARDENS', self._zc[19026].city)
def test_closezips(self):
close_zips = ['90210', '90212', '90035', '90211', '90067', '90069', '90046', '90048', '90077', '90024']
self.assertEqual(close_zips, self._zc.close_zips('90210', 3))
unittest.main()
Discussion:
Instead of using the close_zips function given, you could store your data in a database (such as SQLite or MySQL) and let them do the heavy lifting. But this approach seems to be reasonable if you're only doing a few lookups (i.e., not more than 5 or so). I'm sure you could substantially improve the performance by simply caching the zips (instead of openning the file every time), but I'll leave that up to you ;)
04/25/2006 Update: I no longer leave the caching up to you :)
|
|
Add comment
|
|
Number of comments: 14
German zip codes, Dirk Holtwick, 2005/03/29
Good zip data (aka Postleitzahlen) in various formats and free of payment for Germany, Austria and Switzerland can be found here: http://opengeodb.sourceforge.net/
Add comment
US Zip Code Site Not Apparently Valid, Eric Thompson, 2005/04/04
Unless there's a fee or subscription involved. Get a permission-denied error when I try it.
Add comment
zips.txt, Cody Pisto, 2005/04/04
Rather then posting a direct link,
append zips.txt to the url:
http://www.census.gov/tiger/tms/gazetteer/
and you'll get what you want ;)
Add comment
Kevin Ryan,Kevin Ryan, 2005/04/17
Thank You ... I also noted that they've updated their site. You can get more information from:
http://www.census.gov/geo/www/gazetteer/gazette.html
such as file layout, other data sets, etc.
Add comment
this code is WRONG, L T, 2005/07/29
This code doesn't work correctly. try getting the distance between New York City and San Francisco...
The number will be over 5000, and that's not right even in kilometers.
Correct code is available at from http://www.zachary.com/blog/2005/01/12/python_zipcode_geo-programming
Oh, and a great zip code DB is available at http://www.cfdynamics.com/cfdynamics/zipbase/index.cfm
/LT>
Add comment
Not from what I can see, Kevin Ryan,Kevin Ryan, 2005/10/20
I tried your mythical San Fran vs. NY problem ... in the dB I gave reference to, SF has Long / Lat coordinates (zip 94134) of -122.409577 / 37.718969, respectively. This translates into -2.13645 / 0.65832 (rounded, again respectively) in radians (which the recipe calls for). NY (zip 10011) is -73.99963 / 40.740225 => -1.291537 / 0.711051. Pumping the radian values into the formula I provided (dist) gives me a distance of ~ 2,564. Using the same coordinates in your formula produces 2,601 (diff ~ 37 mi's). Therefore, not sure why you would consider my code as "WRONG" unless your code is wrong as well. So until you can reproduce your error, please don't criticize my work and then try to promote your blog through such weak rhetoric.
Add comment
Peter Kleiweg, 2006/04/26
The code on www.zachary.com is bogus. Calculate the distance between -90/89 and 90/89, two locations near the north pole, and you get the same distance as if it were two locations on opposite sides on the equator.
I don't understand the formula Kevin Ryan is using. I wrote the code below, that gives 2565 miles for the distance between SF and NY, only one mile off from Kevin's result.
import math
class Position:
def __init__(self, longitude, latitude):
'init position with longitude/latitude coordinates'
llx = math.radians(longitude)
lly = math.radians(latitude)
self.x = math.sin(llx) * math.cos(lly)
self.y = math.cos(llx) * math.cos(lly)
self.z = math.sin(lly)
def __sub__(self, other):
'get distance in km between two positions'
d = self.x * other.x + self.y * other.y + self.z * other.z
if d < -1:
d = -1
if d > 1:
d = 1
km = math.acos(d) / math.pi * 20000.0
return km
if __name__ == '__main__':
d = Position(-122.409577, 37.718969) - Position(-73.99963, 40.740225)
print d, 'kilometers'
print d / 1.609, 'miles'
Output:
4127.21644987 kilometers
2565.08169662 miles
Add comment
Peter Kleiweg, 2006/04/26
Actually, the difference is only 0.4 miles.
Add comment
thanks :), Kevin Ryan,Kevin Ryan, 2006/04/27
Awesome, thanks!
Add comment
zachary.com code is not bogus, David Creemer, 2006/05/31
The code on zachary.com is not bogus. Be sure you call the calcDistance function with longitude and latitude passed correctly. If you do, the code on zachary.com gives an answer of about 217 nautical miles for your example, which is entirely reasonable.
Add comment
Bug in the code, John DeRosa, 2006/07/07
There's a bug in the code, but another bug in the code compensates for it.
Executable lines 4-5 of __init__ should be:
zip, state, city, lon, lat = line.split(",")[1:-2]
self.zips[zip] = _Zip_Code_Record(zip, state, city, lon, lat)
John
Add comment
use Google, simple..., Nikolay Bushkov, 2006/10/10
# geopy - http://exogen.case.edu/projects/geopy/
from geopy import geocoders
from geopy import distance
g = geocoders.Google(resource='maps', output_format='js')
#('57350', '58368', 286.182054428), # HURON, SD ... PLEASANT LAKE, ND
place, (lat, lng) = g.geocode("57350")
place1, (lat1, lng1) = g.geocode("58368")
print "%s: %.5f, %.5f" % (place, lat, lng)
print "%s: %.5f, %.5f" % (place1, lat1, lng1)
print "dist: ", distance.distance((lat, lng), (lat1, lng1)).miles
Output:
SD 57350: 44.35902, -98.21629
ND 58368: 48.31726, -99.99895
dist: 286.367102557
Add comment
Simple, but..., Jeremy Dunck, 2006/12/12
The Google EULA excludes many uses. So simple, but sometimes also illegal.
Add comment
Works for me, Marc Draco, 2007/11/22
The database mentioned is pants - many of the lat/longs are copies of nearby neighbours. :-(
Add comment
|
|
|
|
|
 |
|