In the previous installment of this series you learned how to query the geocoder.us Web service using PHP and its native SOAP extension. This Web service converts mailing addresses into latitude and longitude coordinates, a task that can be very useful when integrating features such as Google’s mapping service into your applications. Because geocoder.us offers both free and commercial service, it is well worth a closer look should you require such information.
However, what if you required your own internal data source for this information, for instance if you were interested in creating your own spatially based product or service? Believe it or not, the street and address data needed to derive coordinates for mailing addresses across the entire United States is freely available from the U.S. Census Bureau. As you might imagine, this is quite a bit of information, so to facilitate retrieval it is divided into numerous text-based files. To use this information efficiently, you should import it into a database, where it can be retrieved and even manipulated as necessary. This article guides you through that process, creating an accurate data source that can serve as the basis for countless fascinating applications.
Prerequisites
Not surprisingly, some enterprising Perl programmers (namely Schuyler Erle and Jo Walsh) have already gone a long way toward facilitating this process with their Geo::Coder::US module, which is freely available via CPAN. The following steps install Geo::Coder::US, its dependencies, and the Archive::Zip module, all of which are used to import and query the data. Note that you’ll need Perl installed to take advantage of this solution. Also, if you haven’t used CPAN before, you’ll first need to run through its configuration prompts.
%>perl -MCPAN -e shell
cpan>install Geo::Coder::US
Perl will proceed with the installation. If you’re missing any dependencies, you’ll be prompted to confirm their download and installation. Next, install the Archive::Zip module:
cpan>install Archive::Zip
Finally, exit the CPAN shell:
cpan>quit
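Before going further, it’s worth confirming that both modules actually load. The short Perl script below is a minimal sketch that tries to require() each module and reports the result; the module_installed() helper is a hypothetical name of my own, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module (e.g. "Archive::Zip") can be loaded, else 0.
# module_installed() is a hypothetical helper, not part of any CPAN module.
sub module_installed {
    my ($module) = @_;
    # require() given a string needs a file path such as "Archive/Zip.pm"
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules installed through the CPAN shell
for my $module (qw(Geo::Coder::US Archive::Zip)) {
    printf "%-16s %s\n", $module,
        module_installed($module) ? 'installed' : 'NOT FOUND';
}
```

Save it under any name you like (check_modules.pl, say) and run it with perl; a module reported as NOT FOUND should be reinstalled from the CPAN shell.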
You should also take some time to learn a bit about the
Next, proceed to the Census Bureau’s website and download the data (known as TIGER/Line):
http://www2.census.gov/geo/tiger/tiger2004se/
Keep in mind that this is quite a bit of data (4GB compressed), and unfortunately it’s divided into thousands of files. Alternatively, you can purchase CD-ROM and DVD versions via the Web site. However, being an impatient man, I wasn’t willing to download these files manually or wait for snail mail. An automated solution was in order.
Automating the Download
Of course, who wants to write a script for such matters when a perfectly capable solution is already available? The Unix command wget easily accomplishes this task. Just execute the following commands:
%>mkdir tigerzips
%>cd tigerzips
%>wget -nd --no-parent -A "*.zip" -r -l2 http://www2.census.gov/geo/tiger/tiger2004se/
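Here -r -l2 tells wget to follow links recursively up to two levels deep, -A "*.zip" keeps only the zip files, --no-parent prevents it from climbing above the starting directory, and -nd saves every file directly into the current directory rather than recreating the server’s directory tree. Keep in mind it could take some time to download all of the zip files, depending on your connection speed. Also, if you’re interested in only one or a few states, change the URL to point to the appropriate state directories. For instance, for the purposes of this tutorial I downloaded only the Ohio-related files, meaning the URL I used was http://www2.census.gov/geo/tiger/tiger2004se/. Even so, at 158MB this is no insignificant download.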
Should the download be disrupted for some reason, there’s no need to download again the files already stored locally. To restart the download, execute the same command with the -nc (no-clobber) option, which tells wget to skip any file that already exists:
%>wget -nd -nc --no-parent -A "*.zip" -r -l2 http://www2.census.gov/geo/tiger/tiger2004se/
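One catch with -nc: wget skips every file that already exists locally, including a zip file that was only half-written when the transfer was interrupted. The following Perl script is a minimal sketch, assuming the Archive::Zip module installed earlier, that lists such incomplete archives by attempting to read each zip file in a directory. The zip_is_complete() and find_incomplete_zips() helpers are hypothetical names of my own, not part of Archive::Zip.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

# Silence Archive::Zip's own warnings; problems are reported below instead.
Archive::Zip::setErrorHandler(sub { });

# Return true if the file reads as a complete zip archive. A download cut
# off midway lacks the directory that every zip file stores at its end,
# so read() reports an error for it. (Hypothetical helper name.)
sub zip_is_complete {
    my ($path) = @_;
    my $zip = Archive::Zip->new;
    return $zip->read($path) == AZ_OK;
}

# Return the paths of all .zip files in $dir that fail the check above.
sub find_incomplete_zips {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!\n";
    my @zips = sort { $a cmp $b } grep { /\.zip\z/i } readdir $dh;
    closedir $dh;
    return grep { !zip_is_complete($_) } map { "$dir/$_" } @zips;
}

# Directory to scan: first command-line argument, else the current one.
my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for find_incomplete_zips($dir);
```

Pass it the download directory (for example, perl find_incomplete_zips.pl tigerzips, where the script name is again just a suggestion), delete any files it lists, and then rerun the wget command to fetch fresh copies. The script only reports and never deletes, so it is safe to run at any time; the trade-off is that it checks each archive’s structure rather than decompressing and checksumming every file inside it.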
While you’re on the site, you should also take some time to review the TIGER/Line documentation. Doing so isn’t necessary for the purposes of this discussion, but a better understanding of the data’s internals could prove useful in the future. The TIGER/Line home page on the Census Bureau’s Web site is a good place to start.
Importing the Files
Once all of the files have been downloaded, it’s time to import them into the database. Geo::Coder::US uses the Berkeley DB database by default. Berkeley DB is a lightweight yet extremely fast and capable database that will serve nicely to handle even this large data set. The import script, import_tiger_zip.pl, resides in the eg/ directory of the Geo::Coder::US distribution, so run the following command from the directory where the distribution was unpacked:
%>perl eg/import_tiger_zip.pl geocoder.db /path/to/zipfiles/*.zip
This presumes you’d like the database to reside in the current directory; if not, prepend geocoder.db with the appropriate path. Keep in mind that, depending on the number of states you downloaded, this could take several hours to complete! On an otherwise idle 1GHz server with just 384MB of RAM, importing the Ohio files (89 in all) into the database required 22 minutes. Incidentally, the database for only the Ohio files occupies 36.5MB.
Using the Data
At this point, you’re ready to begin using the data! The following example uses the Geo::Coder::US module to retrieve the coordinates of The Ohio State University football stadium, which has also been used in the previous installments of this series.
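#!/usr/bin/perl -w
use strict;
use Geo::Coder::US;

# Point the module at the database built by import_tiger_zip.pl
Geo::Coder::US->set_db("geocoder.db");

# geocode() returns a list of possible matches; keep the first one
my ($stadium) = Geo::Coder::US->geocode("411 Woody Hayes Dr, Columbus, OH");
die "No match found for the stadium address\n" unless $stadium;
print "The Ohio State University stadium coordinates: ",
      "Latitude ($stadium->{lat}) Longitude($stadium->{long})\n";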
Returning:
The Ohio State University stadium coordinates: Latitude (40.004761) Longitude(-83.019945)
Comparing these results to the coordinates passed to the Google Maps API in the previous articles, you’ll see they match exactly!
Conclusion
In an age when so much information is readily available and even reusable, the opportunities to create new and truly exciting applications are more plentiful than ever. Just think: in this brief article you created a database that lets you pinpoint the vast majority of mailing addresses throughout the United States, opening up the possibility of creating any number of spatial applications.
In the next article I’ll show you how to use Perl to create a Web service that exposes this data, which will then be queried by a PHP Web service and used in conjunction with the Google Maps API to create a spatially enabled Web site.
About the Author
W. Jason Gilmore (http://www.wjgilmore.com/) is the open source editor for Apress. He’s the author of the best-selling “Beginning PHP 5 and MySQL: Novice to Professional” (Apress, 2004. 758pp.). Along with Robert Treat, Jason is the co-author of the forthcoming “Beginning PHP 5 and PostgreSQL 8: From Novice to Professional”, due out at the conclusion of 2005. Jason loves receiving e-mail, so don’t hesitate to write him at wjATwjgilmore.com.