Wednesday, December 4, 2013

Watercooler Wednesday #14 - Retrieving the Boundary of a Place

Up until now we have been geocoding collections to a set of coordinates. If a particular collection covers three towns, we would put three coordinates on the map. While this works well for small towns and parishes, it breaks down when applied to counties and countries. For example, the United States maps to a single point in the middle of Kansas, so only queries near that point will return results associated with the United States as a whole.

The solution is to map collections to a coverage area (shape) instead of a point. Then when we query for any point inside the United States, we will see that it's inside of the coverage area for collections that cover the entire United States.
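To make the difference concrete, here is a minimal sketch of a point-in-shape test using the standard ray-casting technique. The function name and the rough bounding box for the contiguous United States are assumptions made up for illustration; a real coverage shape would follow the actual border.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical, very rough box around the contiguous United States (lon, lat).
US_ROUGH_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]

boston = (-71.06, 42.36)
london = (-0.13, 51.51)

print(point_in_polygon(boston, US_ROUGH_BOX))
print(point_in_polygon(london, US_ROUGH_BOX))
```

With a shape, a query for Boston matches a collection that covers the whole country even though Boston is nowhere near the middle of Kansas.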

There are several sources for place boundaries, the largest of them being OpenStreetMap (OSM). But there are several issues with most of these:
  • They were created to enable queries that map to a point or to enable queries that get a list of features (points of interest, like stores or restaurants) in a specific area. We have yet to find a service that allows for a string search to return a shape that represents the place's boundary.
  • They do not take into account the fact that places only exist for a particular length of time, and that the boundaries of these places change over time.
  • There is no reasonably hostable database that a developer can set up and use. OSM's data is available as a planet.osm file that is 31 GB compressed. For most developers that is untenable at best, unusable at worst.
  • There are no publicly available APIs that map a place string to a coverage shape.
What will we do if the data isn't good enough? We have two options. The first option is to use available services and jury rig an insane system. Starting with a place string, we would follow these steps:
  1. Query GeoNames, the Google Places API, or a similar service for a lat/lon coordinate.
  2. Use the coordinate to query Nominatim to get an OSM place hierarchy.
  3. Use that place hierarchy to query Overpass for the relation, way(s) and nodes.
  4. Use this algorithm to convert the OSM data into a GeoJSON shape.
That's at least 4 services to map a place to its coverage shape, assuming nothing goes wrong. And that doesn't work for historical places, misspellings, etc.
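To give a feel for the last step, here is a simplified sketch of turning OSM-style data into a GeoJSON shape. It is not the algorithm referenced above; it only assumes that a boundary arrives as several ways (lists of node ids) that have to be joined end to end into one closed ring. The function names and the sample data are hypothetical, and real boundaries with holes or multiple outer rings would need more work.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat), the order GeoJSON expects


def stitch_ways(ways: List[List[int]]) -> List[int]:
    """Join ways (lists of node ids) end to end into one closed ring of node ids."""
    remaining = [list(w) for w in ways]
    ring = remaining.pop(0)
    while ring[0] != ring[-1]:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                # Way continues the ring in its stored direction.
                ring.extend(way[1:])
            elif way[-1] == ring[-1]:
                # Way continues the ring, but is stored backwards.
                ring.extend(reversed(way[:-1]))
            else:
                continue
            remaining.pop(i)
            break
        else:
            # No remaining way connects to the end of the ring.
            raise ValueError("ways do not form a closed ring")
    return ring


def to_geojson_polygon(ways: List[List[int]], nodes: Dict[int, Coord]) -> dict:
    """Build a single-ring GeoJSON Polygon from ways and a node id -> coordinate map."""
    ring = stitch_ways(ways)
    return {"type": "Polygon", "coordinates": [[list(nodes[n]) for n in ring]]}


# Made-up sample: a unit square split across two ways, one stored backwards.
sample_nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
sample_ways = [[1, 2, 3], [1, 4, 3]]

shape = to_geojson_polygon(sample_ways, sample_nodes)
print(shape)
```

Even this toy version has to handle ways stored in either direction, which hints at how much can go wrong when the same job is spread across several remote services.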

The alternative is to create a place coverage database and associated API that will take in a place string and return a coverage shape for that place.
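As a rough sketch of what such a lookup might look like, the snippet below stores each place with the years its boundary was valid, since boundaries change over time. Everything here is an assumption for illustration: the `PlaceCoverage` record, the `find_coverage` function, and the fictional place "Exampleshire" with its made-up years and shapes. It is not a description of an existing service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceCoverage:
    name: str
    valid_from: int           # first year this boundary applies
    valid_to: Optional[int]   # last year it applies, or None if still current
    shape: dict               # GeoJSON geometry for the coverage area


def find_coverage(records: List[PlaceCoverage], name: str, year: int) -> Optional[PlaceCoverage]:
    """Return the boundary record for a place name that was valid in the given year."""
    wanted = name.strip().lower()
    for record in records:
        if record.name.lower() != wanted:
            continue
        if record.valid_from <= year and (record.valid_to is None or year <= record.valid_to):
            return record
    return None


# Fictional place with two made-up boundary versions.
old_shape = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
new_shape = {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]]}
records = [
    PlaceCoverage("Exampleshire", 1800, 1899, old_shape),
    PlaceCoverage("Exampleshire", 1900, None, new_shape),
]

match = find_coverage(records, "exampleshire", 1850)
print(match.shape if match else "no boundary on record")
```

An exact, case-insensitive name match is the simplest possible choice; handling misspellings and alternate names would need a proper search index.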

We are resigning ourselves to the fact that we need to create said database (though we are asking the experts at GIS.StackExchange to make sure our conclusion is correct). This will slow us down significantly but the resulting service will be valuable for the community.

5 comments:

  1. I've added some suggestions to the GIS:SE site, but I'd just like to mention this overlaps with what I'm working on (a placenames and boundaries site).

    OSM is big (I've loaded it into MongoDB to process it) but at least the data is reusable and combinable.

    The real challenge is that almost all the nice data (historical boundaries) has been produced by academics who automatically slap a "no commercial use" restriction on it (the Irish data even has a No Derivatives clause, making it unusable for anybody, despite being CC!).

    So in many cases the data is going to have to be totally re-done from original sources, with an open license. There's scope here for a crowd sourced tracing project (maybe using OSM tools). But organising something like that (aligning the old maps, adjusting projections etc) is well beyond my GIS knowledge!

  2. Hi - just a quick note to point you at Open Historical Map https://wiki.openstreetmap.org/wiki/Open_Historical_Map (and their mailing list at https://lists.openstreetmap.org/listinfo/historic) in case you hadn't come across it already.

    You might also be interested in on-going conversations about the need for open historical gazetteers and geodata among cultural heritage technologists on lists like MCG (e.g. https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1312&L=mcg&X=3A011001587E223D91#27), Antiquist (e.g. https://groups.google.com/forum/#!searchin/antiquist/gazetteer/antiquist/0U4bLdO9ZbU/U2gWqGSlmVMJ) and at various events e.g. http://spacetimewg.pbworks.com

    Cheers, Mia

    Replies
    1. We were aware of OHM but not their mailing list. That helps since the site has been down for a while. And thanks for the links to other groups.

  3. Agreed, digital historical gazetteers are an idea whose time has come -- a while back, actually. Kudos for launching this! Much energy is being directed at the issue from many domains, as Mia notes. Add to her list Pelagios 3 (http://pelagios-project.blogspot.com/2013/09/pelagios-3-overview.html). This is also a principal motivator for the new GeoHumanities SIG (http://geohumanities.org).

    best
    Karl (@kgeographer)

    Replies
    1. Thanks for the links. We'll be sure to reach out to these other groups and see how we can help each other.
