Welcome, guest | Sign In | My Account | Store | Cart

I came across the mention of a formula labeled "The Great Circle Distance Formula" that purported to calculate the distance between any two points on the earth given their longitude and latitude points (the reference was in a Linux Magazine article). So, I looked up some information and cooked up a Python version of the calculation. There are references in the code where you can obtain approximate zip code data for free (e.g., if you wanted to enhance your website by adding a "Search within x mi's" feature), as well as references to the GCDF if you have further interest. Enjoy!

04/25/2006 update: I've decided to update this recipe with an object oriented bent where the information is cached once the object is instantiated. I've also added command line access to automatically download the zipcode file from the census website (just use 'python zips.py -d' and it will download a copy to your harddrive under 'zips.txt'). Lastly, I've added some unit testing so that if any future changes are made this will automatically run and tell me if anything pops out as wrong.

Python, 169 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
#!/usr/bin/python
#
# Author:    Kevin T. Ryan, kevryan0701@yahoo.com
# Date:      03/29/2005
#
# dist computes the distance between two points on the earth, as identified by their
# longitude and latitude identifications in MILES.  If you would like to change the 
# distance (to, perhaps, calculate the distance in km) you should change the last line
# of the function.  For example, for km, use 6399 instead of 3956 (the approximate
# distance of the earth).
#
# Obtained distance formula from "http://www.census.gov/cgi-bin/geo/gisfaq?Q5.1" and 
# more information is available at "http://en.wikipedia.org/wiki/Great_circle_distance".
#
# Zip code data obtained from 'http://www.census.gov/tiger/tms/gazetteer/zips.txt'
#
# Changes:
#   04/09/2006: Added a class (Zip_Codes) to cache the zip data in memory for fast lookups and
#               a convenience class _Zip_Code_Record that gives us named access to zipcode
#               properties (e.g., longitude).  Can also instantiate a Zip_Codes object and then
#               have access to the zips like zipcode_obj['08083'].city (== SOMERDALE) which gives
#               fast access to attributes and distance calculations

import math

class _Zip_Code_Record:

    def __init__(self, zip, state, city, longitude, latitude, perform_transformation=True):
        self.zip       = zip
        self.state     = state
        self.city      = city
        if perform_transformation:
            self.longitude = math.radians(float(longitude))
            self.latitude  = math.radians(float(latitude))
        else:
            self.longitude = longitude
            self.latitude  = latitude

class Zip_Codes:

    def __init__(self, zip_file="zips.txt"):
        '''Load the zip dB into memory and initialize our object.'''
        self.zips = {}

        lines = [line.rstrip().replace('"', '') for line in open(zip_file)] # Do some cleanup on the lines before setting up our "db"
        for line in lines:
            # Default layout is:
            # The 1990 Zip Code by State file is comma separated values, ASCII text, one 
            # record per line. The field/record layout is as follows: 

                # Field 1 - State Fips Code 
                # Field 2 - 5-digit Zipcode 
                # Field 3 - State Abbreviation 
                # Field 4 - Zipcode Name 
                # Field 5 - Longitude in Decimal Degrees (West is assumed, no minus sign) 
                # Field 6 - Latitude in Decimal Degrees (North is assumed, no plus sign) 
                # Field 7 - 1990 Population (100%) 
                # Field 8 - Allocation Factor (decimal portion of state within zipcode) 

            # (see http://www.census.gov/tiger/tms/gazetteer/zip90r.txt)

            zip, city, state, lon, lat = line.split(",")[1:-2]
            self.zips[zip] = _Zip_Code_Record(zip, city, state, lon, lat)

    def get_distance(self, zip1, zip2):
        '''Returns the distance between two points on the earth.
        
        Inputs used are:
            Longitude (in radians) of the first location,
            Latitude (in radians) of the first location,
            Longitude (in radians) of the second location, and
            Latitude (in radians) of the second location.
        To convert to radians (from degrees), use pythons math.radian function (Note: already done 
        for you in the constructor above).  Returns the distance in miles.'''

        long_1 = self.zips[zip1].longitude
        lat_1  = self.zips[zip1].latitude

        long_2 = self.zips[zip2].longitude
        lat_2  = self.zips[zip2].latitude

        dlong = long_2 - long_1
        dlat = lat_2 - lat_1
        a = (math.sin(dlat / 2))**2 + math.cos(lat_1) * math.cos(lat_2) * (math.sin(dlong / 2))**2
        c = 2 * math.asin(min(1, math.sqrt(a)))
        dist = 3956 * c
        return dist

    def close_zips(self, starting_zip, radius):
        '''For any given zip code (assuming it's valid), returns any zip codes within a 'radius' mile radius.'''

        zip_long = self.zips[starting_zip].longitude
        zip_lat  = self.zips[starting_zip].latitude

        close_zips = [self.zips[record].zip for record in self.zips if self.get_distance(self.zips[record].zip, starting_zip) < radius]
        return close_zips

    def __getitem__(self, zip):
        '''allows convenient 'dictionary' access to the zips'''
        return self.zips[str(zip)]

if __name__ == "__main__":
    # First, we check if the user wants to download the zip file
    import sys
    if len(sys.argv) > 1 and sys.argv[1] == "-d":
        if len(sys.argv) == 2:
            filename = 'zips.txt'
        else:
            filename = sys.argv[2]

        import urllib2
        zip_url = 'http://www.census.gov/tiger/tms/gazetteer/zips.txt'

        print "Downloading %s from the internet to %s on your local machine." % (zip_url, filename)
        f = open(filename, 'w')
        f.write(urllib2.urlopen(zip_url).read())
        f.close()
        print "Successfully downloaded zip file.  Use %s %s to run unittests." % (sys.argv[0], filename)

        sys.exit(1)

    # Otherwise, we do some unit testing
    if len(sys.argv) > 1:
        zip_code_filename = sys.argv[1]
        # The following is a hack b/c unittest seems to use sys.argv to grab arguments - which we don't want :)
        del sys.argv[1:]
    else:
        zip_code_filename = None

    import unittest

    class TestDistances(unittest.TestCase):
        def setUp(self):
            if len(sys.argv) > 1:
                self._zc = Zip_Codes(sys.argv[1])
            else:
                self._zc = Zip_Codes()

        def test_dist_measurement(self):
            zips = [
                    ('57350', '58368', 286.182054428), # HURON, SD ... PLEASANT LAKE, ND
                    ('75217', '11366', 1376.21109349), # DALLAS, TX ... FRESH MEADOWS, NY 
                    ('67431', '64631', 227.366924671), # CHAPMAN, KS ... BUCKLIN, MO 
                    ('57106', '27812', 1158.00278947), # SIOUX FALLS, SD ... BETHEL, NC 
                    ('54460', '50518', 254.522422855), # OWEN, WI ... BARNUM, IA 
                    ('54451', '28756', 802.75909488),  # MEDFORD, WI ... MILL SPRING, NC 
                    ('32615', '65624', 794.942194475), # SANTA FE, FL ... CAPE FAIR, MO 
                    ('67054', '71827', 439.965551049), # GREENSBURG, KS ... BUCKNER, AR 
                    ('45686', '36879', 463.978723078), # VINTON, OH ... WAVERLY, AL 
                    ('37845', '30506', 121.411082293), # PETROS, TN ... GAINESVILLE, GA 
                    ('49440', '29847', 699.897793808), # MUSKEGON, MI ... TRENTON, SC 
                    ('41331', '12046', 606.944002949), # HADDIX, KY ... COEYMANS HOLLOW, NY 
                    ('05083', '65244', 1093.97636115), # WEST FAIRLEE, VT ... CLIFTON HILL, MO 
                    ('84069', '70374', 1451.94104395), # RUSH VALLEY, UT ... LOCKPORT, LA 
                    ('95821', '55760', 1520.98809123)  # SACRAMENTO, CA ... MC GREGOR, MN 
                ]

            for zip in zips:
                self.assertAlmostEqual(zip[2], self._zc.get_distance(zip[0], zip[1]), 7, "%s distance from %s failed" % (zip[0], zip[1]))

        def test_getitem(self):
            self.assertEqual('PILGRIM GARDENS', self._zc['19026'].city)
            self.assertEqual('PILGRIM GARDENS', self._zc[19026].city)

        def test_closezips(self):
            close_zips = ['90210', '90212', '90035', '90211', '90067', '90069', '90046', '90048', '90077', '90024']
            self.assertEqual(close_zips, self._zc.close_zips('90210', 3))

    unittest.main()

Instead of using the close_zips function given, you could store your data in a database (such as SQLite or MySQL) and let them do the heavy lifting. But this approach seems to be reasonable if you're only doing a few lookups (i.e., not more than 5 or so). I'm sure you could substantially improve the performance by simply caching the zips (instead of openning the file every time), but I'll leave that up to you ;)

04/25/2006 Update: I no longer leave the caching up to you :)

14 comments

Dirk Holtwick 19 years, 1 month ago  # | flag

German zip codes. Good zip data (aka Postleitzahlen) in various formats and free of payment for Germany, Austria and Switzerland can be found here: http://opengeodb.sourceforge.net/

Eric Thompson 19 years, 1 month ago  # | flag

US Zip Code Site Not Apparently Valid. Unless there's a fee or subscription involved. Get a permission-denied error when I try it.

Cody Pisto 19 years, 1 month ago  # | flag

zips.txt. Rather then posting a direct link,

append zips.txt to the url:

http://www.census.gov/tiger/tms/gazetteer/

and you'll get what you want ;)

Kevin Ryan (author) 19 years ago  # | flag

Thank You ... I also noted that they've updated their site. You can get more information from: http://www.census.gov/geo/www/gazetteer/gazette.html such as file layout, other data sets, etc.

L T 18 years, 9 months ago  # | flag

this code is WRONG. This code doesn't work correctly. try getting the distance between New York City and San Francisco... The number will be over 5000, and that's not right even in kilometers.

Correct code is available at from http://www.zachary.com/blog/2005/01/12/python_zipcode_geo-programming

Oh, and a great zip code DB is available at http://www.cfdynamics.com/cfdynamics/zipbase/index.cfm

/LT>

Kevin Ryan (author) 18 years, 6 months ago  # | flag

Not from what I can see. I tried your mythical San Fran vs. NY problem ... in the dB I gave reference to, SF has Long / Lat coordinates (zip 94134) of -122.409577 / 37.718969, respectively. This translates into -2.13645 / 0.65832 (rounded, again respectively) in radians (which the recipe calls for). NY (zip 10011) is -73.99963 / 40.740225 => -1.291537 / 0.711051. Pumping the radian values into the formula I provided (dist) gives me a distance of ~ 2,564. Using the same coordinates in your formula produces 2,601 (diff ~ 37 mi's). Therefore, not sure why you would consider my code as "WRONG" unless your code is wrong as well. So until you can reproduce your error, please don't criticize my work and then try to promote your blog through such weak rhetoric.

Peter Kleiweg 18 years ago  # | flag

The code on www.zachary.com is bogus. Calculate the distance between -90/89 and 90/89, two locations near the north pole, and you get the same distance as if it were two locations on opposite sides on the equator.

I don't understand the formula Kevin Ryan is using. I wrote the code below, that gives 2565 miles for the distance between SF and NY, only one mile off from Kevin's result.

import math

class Position:

    def __init__(self, longitude, latitude):
        'init position with longitude/latitude coordinates'
        llx = math.radians(longitude)
        lly = math.radians(latitude)
        self.x = math.sin(llx) * math.cos(lly)
        self.y = math.cos(llx) * math.cos(lly)
        self.z = math.sin(lly)

    def __sub__(self, other):
        'get distance in km between two positions'
        d = self.x * other.x + self.y * other.y + self.z * other.z
        if d &lt; -1:
            d = -1
        if d &gt; 1:
            d = 1
        km = math.acos(d) / math.pi * 20000.0
        return km

if __name__ == '__main__':

    d = Position(-122.409577, 37.718969) - Position(-73.99963, 40.740225)

    print d, 'kilometers'
    print d / 1.609, 'miles'

Output:

4127.21644987 kilometers
2565.08169662 miles
Peter Kleiweg 18 years ago  # | flag

Actually, the difference is only 0.4 miles.

Kevin Ryan (author) 18 years ago  # | flag

thanks :). Awesome, thanks!

David Creemer 17 years, 11 months ago  # | flag

zachary.com code is not bogus. The code on zachary.com is not bogus. Be sure you call the

calcDistance

function with longitude and latitude passed correctly. If you do, the code on zachary.com gives an answer of about 217 nautical miles for your example, which is entirely reasonable.

John DeRosa 17 years, 10 months ago  # | flag

Bug in the code. There's a bug in the code, but another bug in the code compensates for it.

Executable lines 4-5 of __init__ should be:

zip, state, city, lon, lat = line.split(",")[1:-2]
self.zips[zip] = _Zip_Code_Record(zip, state, city, lon, lat)

John

Nikolay Bushkov 17 years, 6 months ago  # | flag

use Google, simple...

# geopy - http://exogen.case.edu/projects/geopy/
from geopy import geocoders
from geopy import distance

g = geocoders.Google(resource='maps', output_format='js')

#('57350', '58368', 286.182054428), # HURON, SD ... PLEASANT LAKE, ND

place, (lat, lng) = g.geocode("57350")
place1, (lat1, lng1) = g.geocode("58368")

print "%s: %.5f, %.5f" % (place, lat, lng)
print "%s: %.5f, %.5f" % (place1, lat1, lng1)
print "dist: ", distance.distance((lat, lng), (lat1, lng1)).miles

   Output:
           SD 57350: 44.35902, -98.21629
           ND 58368: 48.31726, -99.99895
           dist:  286.367102557
Jeremy Dunck 17 years, 4 months ago  # | flag

Simple, but... The Google EULA excludes many uses. So simple, but sometimes also illegal.

Marc Draco 16 years, 5 months ago  # | flag

Works for me. The database mentioned is pants - many of the lat/longs are copies of nearby neighbours. :-(