Welcome, guest | Sign In | My Account | Store | Cart

I came across the mention of a formula labeled "The Great Circle Distance Formula" that purported to calculate the distance between any two points on the earth given their longitude and latitude points (the reference was in a Linux Magazine article). So, I looked up some information and cooked up a Python version of the calculation. There are references in the code where you can obtain approximate zip code data for free (e.g., if you wanted to enhance your website by adding a "Search within x mi's" feature), as well as references to the GCDF if you have further interest. Enjoy!

04/25/2006 update: I've decided to update this recipe with an object oriented bent where the information is cached once the object is instantiated. I've also added command line access to automatically download the zipcode file from the census website (just use 'python zips.py -d' and it will download a copy to your harddrive under 'zips.txt'). Lastly, I've added some unit testing so that if any future changes are made this will automatically run and tell me if anything pops out as wrong.

Python, 169 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169``` ```#!/usr/bin/python # # Author: Kevin T. Ryan, kevryan0701@yahoo.com # Date: 03/29/2005 # # dist computes the distance between two points on the earth, as identified by their # longitude and latitude identifications in MILES. If you would like to change the # distance (to, perhaps, calculate the distance in km) you should change the last line # of the function. For example, for km, use 6399 instead of 3956 (the approximate # distance of the earth). # # Obtained distance formula from "http://www.census.gov/cgi-bin/geo/gisfaq?Q5.1" and # more information is available at "http://en.wikipedia.org/wiki/Great_circle_distance". # # Zip code data obtained from 'http://www.census.gov/tiger/tms/gazetteer/zips.txt' # # Changes: # 04/09/2006: Added a class (Zip_Codes) to cache the zip data in memory for fast lookups and # a convenience class _Zip_Code_Record that gives us named access to zipcode # properties (e.g., longitude). Can also instantiate a Zip_Codes object and then # have access to the zips like zipcode_obj['08083'].city (== SOMERDALE) which gives # fast access to attributes and distance calculations import math class _Zip_Code_Record: def __init__(self, zip, state, city, longitude, latitude, perform_transformation=True): self.zip = zip self.state = state self.city = city if perform_transformation: self.longitude = math.radians(float(longitude)) self.latitude = math.radians(float(latitude)) else: self.longitude = longitude self.latitude = latitude class Zip_Codes: def __init__(self, zip_file="zips.txt"): '''Load the zip dB into memory and initialize our object.''' self.zips = {} lines = [line.rstrip().replace('"', '') for line in open(zip_file)] # Do some cleanup on the lines before setting up our "db" for line in lines: # Default layout is: # The 1990 Zip Code by State file is comma separated values, ASCII text, one # record per line. The field/record layout is as follows: # Field 1 - State Fips Code # Field 2 - 5-digit Zipcode # Field 3 - State Abbreviation # Field 4 - Zipcode Name # Field 5 - Longitude in Decimal Degrees (West is assumed, no minus sign) # Field 6 - Latitude in Decimal Degrees (North is assumed, no plus sign) # Field 7 - 1990 Population (100%) # Field 8 - Allocation Factor (decimal portion of state within zipcode) # (see http://www.census.gov/tiger/tms/gazetteer/zip90r.txt) zip, city, state, lon, lat = line.split(",")[1:-2] self.zips[zip] = _Zip_Code_Record(zip, city, state, lon, lat) def get_distance(self, zip1, zip2): '''Returns the distance between two points on the earth. Inputs used are: Longitude (in radians) of the first location, Latitude (in radians) of the first location, Longitude (in radians) of the second location, and Latitude (in radians) of the second location. To convert to radians (from degrees), use pythons math.radian function (Note: already done for you in the constructor above). Returns the distance in miles.''' long_1 = self.zips[zip1].longitude lat_1 = self.zips[zip1].latitude long_2 = self.zips[zip2].longitude lat_2 = self.zips[zip2].latitude dlong = long_2 - long_1 dlat = lat_2 - lat_1 a = (math.sin(dlat / 2))**2 + math.cos(lat_1) * math.cos(lat_2) * (math.sin(dlong / 2))**2 c = 2 * math.asin(min(1, math.sqrt(a))) dist = 3956 * c return dist def close_zips(self, starting_zip, radius): '''For any given zip code (assuming it's valid), returns any zip codes within a 'radius' mile radius.''' zip_long = self.zips[starting_zip].longitude zip_lat = self.zips[starting_zip].latitude close_zips = [self.zips[record].zip for record in self.zips if self.get_distance(self.zips[record].zip, starting_zip) < radius] return close_zips def __getitem__(self, zip): '''allows convenient 'dictionary' access to the zips''' return self.zips[str(zip)] if __name__ == "__main__": # First, we check if the user wants to download the zip file import sys if len(sys.argv) > 1 and sys.argv[1] == "-d": if len(sys.argv) == 2: filename = 'zips.txt' else: filename = sys.argv[2] import urllib2 zip_url = 'http://www.census.gov/tiger/tms/gazetteer/zips.txt' print "Downloading %s from the internet to %s on your local machine." % (zip_url, filename) f = open(filename, 'w') f.write(urllib2.urlopen(zip_url).read()) f.close() print "Successfully downloaded zip file. Use %s %s to run unittests." % (sys.argv[0], filename) sys.exit(1) # Otherwise, we do some unit testing if len(sys.argv) > 1: zip_code_filename = sys.argv[1] # The following is a hack b/c unittest seems to use sys.argv to grab arguments - which we don't want :) del sys.argv[1:] else: zip_code_filename = None import unittest class TestDistances(unittest.TestCase): def setUp(self): if len(sys.argv) > 1: self._zc = Zip_Codes(sys.argv[1]) else: self._zc = Zip_Codes() def test_dist_measurement(self): zips = [ ('57350', '58368', 286.182054428), # HURON, SD ... PLEASANT LAKE, ND ('75217', '11366', 1376.21109349), # DALLAS, TX ... FRESH MEADOWS, NY ('67431', '64631', 227.366924671), # CHAPMAN, KS ... BUCKLIN, MO ('57106', '27812', 1158.00278947), # SIOUX FALLS, SD ... BETHEL, NC ('54460', '50518', 254.522422855), # OWEN, WI ... BARNUM, IA ('54451', '28756', 802.75909488), # MEDFORD, WI ... MILL SPRING, NC ('32615', '65624', 794.942194475), # SANTA FE, FL ... CAPE FAIR, MO ('67054', '71827', 439.965551049), # GREENSBURG, KS ... BUCKNER, AR ('45686', '36879', 463.978723078), # VINTON, OH ... WAVERLY, AL ('37845', '30506', 121.411082293), # PETROS, TN ... GAINESVILLE, GA ('49440', '29847', 699.897793808), # MUSKEGON, MI ... TRENTON, SC ('41331', '12046', 606.944002949), # HADDIX, KY ... COEYMANS HOLLOW, NY ('05083', '65244', 1093.97636115), # WEST FAIRLEE, VT ... CLIFTON HILL, MO ('84069', '70374', 1451.94104395), # RUSH VALLEY, UT ... LOCKPORT, LA ('95821', '55760', 1520.98809123) # SACRAMENTO, CA ... MC GREGOR, MN ] for zip in zips: self.assertAlmostEqual(zip[2], self._zc.get_distance(zip[0], zip[1]), 7, "%s distance from %s failed" % (zip[0], zip[1])) def test_getitem(self): self.assertEqual('PILGRIM GARDENS', self._zc['19026'].city) self.assertEqual('PILGRIM GARDENS', self._zc[19026].city) def test_closezips(self): close_zips = ['90210', '90212', '90035', '90211', '90067', '90069', '90046', '90048', '90077', '90024'] self.assertEqual(close_zips, self._zc.close_zips('90210', 3)) unittest.main() ```

Instead of using the close_zips function given, you could store your data in a database (such as SQLite or MySQL) and let them do the heavy lifting. But this approach seems to be reasonable if you're only doing a few lookups (i.e., not more than 5 or so). I'm sure you could substantially improve the performance by simply caching the zips (instead of openning the file every time), but I'll leave that up to you ;)

04/25/2006 Update: I no longer leave the caching up to you :)

Dirk Holtwick 16 years, 8 months ago

German zip codes. Good zip data (aka Postleitzahlen) in various formats and free of payment for Germany, Austria and Switzerland can be found here: http://opengeodb.sourceforge.net/

Eric Thompson 16 years, 8 months ago

US Zip Code Site Not Apparently Valid. Unless there's a fee or subscription involved. Get a permission-denied error when I try it.

Cody Pisto 16 years, 8 months ago

zips.txt. Rather then posting a direct link,

append zips.txt to the url:

http://www.census.gov/tiger/tms/gazetteer/

and you'll get what you want ;)

Kevin Ryan (author) 16 years, 7 months ago

Thank You ... I also noted that they've updated their site. You can get more information from: http://www.census.gov/geo/www/gazetteer/gazette.html such as file layout, other data sets, etc.

L T 16 years, 4 months ago

this code is WRONG. This code doesn't work correctly. try getting the distance between New York City and San Francisco... The number will be over 5000, and that's not right even in kilometers.

Correct code is available at from http://www.zachary.com/blog/2005/01/12/python_zipcode_geo-programming

Oh, and a great zip code DB is available at http://www.cfdynamics.com/cfdynamics/zipbase/index.cfm

/LT>

Kevin Ryan (author) 16 years, 1 month ago

Not from what I can see. I tried your mythical San Fran vs. NY problem ... in the dB I gave reference to, SF has Long / Lat coordinates (zip 94134) of -122.409577 / 37.718969, respectively. This translates into -2.13645 / 0.65832 (rounded, again respectively) in radians (which the recipe calls for). NY (zip 10011) is -73.99963 / 40.740225 => -1.291537 / 0.711051. Pumping the radian values into the formula I provided (dist) gives me a distance of ~ 2,564. Using the same coordinates in your formula produces 2,601 (diff ~ 37 mi's). Therefore, not sure why you would consider my code as "WRONG" unless your code is wrong as well. So until you can reproduce your error, please don't criticize my work and then try to promote your blog through such weak rhetoric.

Peter Kleiweg 15 years, 7 months ago

The code on www.zachary.com is bogus. Calculate the distance between -90/89 and 90/89, two locations near the north pole, and you get the same distance as if it were two locations on opposite sides on the equator.

I don't understand the formula Kevin Ryan is using. I wrote the code below, that gives 2565 miles for the distance between SF and NY, only one mile off from Kevin's result.

``````import math

class Position:

def __init__(self, longitude, latitude):
'init position with longitude/latitude coordinates'
self.x = math.sin(llx) * math.cos(lly)
self.y = math.cos(llx) * math.cos(lly)
self.z = math.sin(lly)

def __sub__(self, other):
'get distance in km between two positions'
d = self.x * other.x + self.y * other.y + self.z * other.z
if d &lt; -1:
d = -1
if d &gt; 1:
d = 1
km = math.acos(d) / math.pi * 20000.0
return km

if __name__ == '__main__':

d = Position(-122.409577, 37.718969) - Position(-73.99963, 40.740225)

print d, 'kilometers'
print d / 1.609, 'miles'
``````

Output:

``````4127.21644987 kilometers
2565.08169662 miles
``````
Peter Kleiweg 15 years, 7 months ago

Actually, the difference is only 0.4 miles.

Kevin Ryan (author) 15 years, 7 months ago

thanks :). Awesome, thanks!

David Creemer 15 years, 6 months ago

zachary.com code is not bogus. The code on zachary.com is not bogus. Be sure you call the

``````calcDistance
``````

function with longitude and latitude passed correctly. If you do, the code on zachary.com gives an answer of about 217 nautical miles for your example, which is entirely reasonable.

John DeRosa 15 years, 4 months ago

Bug in the code. There's a bug in the code, but another bug in the code compensates for it.

Executable lines 4-5 of __init__ should be:

``````zip, state, city, lon, lat = line.split(",")[1:-2]
self.zips[zip] = _Zip_Code_Record(zip, state, city, lon, lat)
``````

John

Nikolay Bushkov 15 years, 1 month ago

``````# geopy - http://exogen.case.edu/projects/geopy/
from geopy import geocoders
from geopy import distance

#('57350', '58368', 286.182054428), # HURON, SD ... PLEASANT LAKE, ND

place, (lat, lng) = g.geocode("57350")
place1, (lat1, lng1) = g.geocode("58368")

print "%s: %.5f, %.5f" % (place, lat, lng)
print "%s: %.5f, %.5f" % (place1, lat1, lng1)
print "dist: ", distance.distance((lat, lng), (lat1, lng1)).miles

Output:
SD 57350: 44.35902, -98.21629
ND 58368: 48.31726, -99.99895
dist:  286.367102557
``````
Jeremy Dunck 14 years, 11 months ago

Simple, but... The Google EULA excludes many uses. So simple, but sometimes also illegal.

Marc Draco 14 years ago

Works for me. The database mentioned is pants - many of the lat/longs are copies of nearby neighbours. :-(

 Created by Kevin Ryan on Tue, 29 Mar 2005 (PSF)