About a week ago, I made a post about the sad state of direction search on DuckDuckGo. In that post, the query “mountain view, ca to berkeley, ca” gave me nothing useful, while the same kind of search on Google is extremely useful. After writing that post and attending the recent AWS Summit in San Francisco, I started thinking about how hard it would actually be to implement a good response to a direction search query on DDG. That’s when I found out about DuckDuckHack:
Anyone in the world can help improve the search experience on DuckDuckGo. Whether you can code or not, there are many ways for you to contribute. You can get started by suggesting new instant answers to the community, recommending better data sources, or actually hacking away on your own instant answer.
Sounds supercool. I’m going to take a crack at this. I see bloom filters and highly optimized string libraries in my future. My first step is to get some data… namely names… of places.
Some Data Sources
- The United States Geological Survey’s United States Board on Geographic Names maintains downloadable lists of domestic names, foreign names, Antarctic names, undersea names, etc.
- The U.S. Military National Geospatial-Intelligence Agency provides downloadable “files of geographic names information covering countries or geopolitical areas”, which includes a single 1.89 GB text file that contains the entire country dataset.
USGS USBoGN Place-iest Places
To take my loader for a quick test-drive, I extracted this list of the places that appear most frequently in the downloadable USGS USBoGN domestic data for all (60?) states.
In the following table, I’ve linked each place name to the corresponding Wikipedia article.
More Informative HTML Table View
| Count | Place ID | Place Name |
|-------|----------|------------|
| 14 | 205110 | Appalachian National Scenic Trail |
| 11 | 801970 | Lewis and Clark National Historic Trail |
| 8 | 165345 | Tennessee Valley Divide |
| 5 | 558730 | Gulf of Mexico |
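For anyone who wants to reproduce a count like the one above, here is a minimal sketch of the idea. It is not the actual loader. It assumes the per-state downloads are pipe-delimited text files with a header row that includes `FEATURE_ID` and `FEATURE_NAME` columns; the function names `count_places` and `top_places` and the tiny sample rows are made up for illustration.

```python
import csv
from collections import Counter


def count_places(lines):
    """Count how often each (feature ID, feature name) pair occurs.

    `lines` is any iterable of text lines: an open file or a list of strings.
    Assumption: the first line is a header naming FEATURE_ID and FEATURE_NAME.
    """
    # QUOTE_NONE keeps stray quote characters in place names as literal text.
    reader = csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        counts[(row["FEATURE_ID"], row["FEATURE_NAME"])] += 1
    return counts


def top_places(paths, n=4):
    """Merge the counts from several per-state files and return the top n."""
    total = Counter()
    for path in paths:
        # utf-8-sig reads plain UTF-8 and also skips a leading BOM if present.
        with open(path, newline="", encoding="utf-8-sig") as f:
            total.update(count_places(f))
    return total.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample with a reduced set of columns, for illustration only.
    sample = [
        "FEATURE_ID|FEATURE_NAME|STATE_ALPHA",
        "205110|Appalachian National Scenic Trail|GA",
        "205110|Appalachian National Scenic Trail|ME",
        "558730|Gulf of Mexico|TX",
    ]
    for (place_id, name), count in count_places(sample).most_common():
        print(count, place_id, name)
```

A feature that spans several states shows up once in each state's file, which is why long trails and big bodies of water float to the top when the counts are merged.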