Wednesday, December 18, 2013

Watercooler Wednesday #16 - Focusing our Direction

Choosing a direction can be hard.

At the end of last week, we had a meeting where we identified several things that we needed to do next:
  1. Get country- and state-level historical boundaries for the United States and Great Britain from the 1700s onward into the Open Place Database
  2. Put a place and shape editor in the Open Place Database
  3. Index Ancestry.com's catalog
  4. Flesh out and refine the experience for the Find-A-Record Chrome Plugin
  5. Polish the findarecord.com/app experience, paying particular attention to the text and format of the results to make them self-explanatory and useful to our target audience.
We had too many things that needed to be done right away and not enough people to do them, so we had to choose and focus on the most important ones. Here is what we did:
  1. Took five minutes and reviewed our overall direction. We affirmed that it was to continue to refine the experience in Find-A-Record until we nailed it.
  2. Discussed and decided what our current shorter-term focus was. We decided that it was to properly index and display data that mapped to large areas (countries, states, and counties). You can read more about this in a previous blog post.
  3. Listed the requirements and their dependencies for accomplishing our short-term focus:
    • Search on and display polygons in our results page at findarecord.com/app
    • Before we can search on polygons, we must have them indexed in Neo4j
    • Before we can index polygons in Neo4j, we must be able to geocode collections to shapes when indexing them
    • Before we can geocode to shapes, we must have them available via the Open Place Database API
    • Because the data currently available for the Open Place Database is either incomplete or poor quality, we need an editor for adding and modifying data
So this week we're creating an editor for the Open Place Database, which will eventually allow us to geocode collections to shapes for an optimal experience at Find-A-Record.
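
The dependency chain above can be written down as a small graph and sorted so that the first item to build comes out first. This is only a sketch: the task names are shorthand labels invented for this example, and it uses Python's standard `graphlib` module (Python 3.9+) rather than anything specific to Find-A-Record.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must be finished first.
# The labels are hypothetical shorthand for the requirements listed above.
dependencies = {
    "search_and_display_polygons": {"index_polygons_in_neo4j"},
    "index_polygons_in_neo4j": {"geocode_collections_to_shapes"},
    "geocode_collections_to_shapes": {"shapes_in_opd_api"},
    "shapes_in_opd_api": {"opd_editor"},
    "opd_editor": set(),
}


def build_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
for step, task in enumerate(order, start=1):
    print(step, task)
```

Because the chain is linear, the editor is the only task with nothing in front of it, which is why it ends up being this week's work.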

Deciding on our short-term focus took some time, but we were finally able to make a decision by reaffirming our long-term company focus and direction.

Wednesday, December 11, 2013

Watercooler Wednesday #15 - Introducing the Open Place Database

Last week we explained how we're not able to geocode to shapes with any currently available services and then discussed a possible solution. After a few days of implementing the solution, we are announcing the Open Place Database.


The Open Place Database (OPD) will contain information about places and administrative boundaries for the entire world, including how they changed over time. The data will be licensed under the Open Database License (ODbL) so that it can be used by anybody in any way that they please. There is an API for light programmatic usage, with the option of installing OPD yourself for heavy usage.


We've only been working on this for three days, so there's much more to come:
  • Regularly updated data dumps
  • Visual editor
  • Much more data (at the time of writing we only have data for US counties)

Wednesday, December 4, 2013

Watercooler Wednesday #14 - Retrieving the Boundary of a Place

Up until now we have been geocoding collections to a set of coordinates. If a particular collection covers three towns, we would put three coordinates on the map. While this works well for small towns and parishes, it breaks down when applied to counties and countries. The United States maps to the middle of Kansas, so only queries near that point will return results associated with the United States as a whole.

The solution is to map collections to a coverage area (shape) instead of a point. Then when we query for any point inside the United States, we will see that it's inside of the coverage area for collections that cover the entire United States.
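
To make the idea concrete, here is a minimal sketch of a point-in-shape test using the standard ray casting technique. Everything in it is an assumption for illustration: the function names, the collection record, and especially the rectangle standing in for the United States, which is nowhere near a real boundary. A real system would more likely lean on a spatial index than loop over shapes in Python.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count the edges crossed by a ray running east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Crude rectangle standing in for the contiguous United States (illustrative only)
US_ROUGH = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]

# Hypothetical collection record with a coverage shape instead of a single point
collections = [
    {"title": "Hypothetical US-wide collection", "coverage": US_ROUGH},
]


def collections_covering(lon, lat, items):
    """Return the titles of all collections whose coverage contains the point."""
    return [c["title"] for c in items if point_in_polygon(lon, lat, c["coverage"])]


print(collections_covering(-111.89, 40.76, collections))  # a point in Utah
print(collections_covering(-0.13, 51.50, collections))    # a point in London
```

With a single point in Kansas, the Utah query would have missed the collection; with a coverage shape, any point inside the shape finds it.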

There are several sources for place boundaries, the largest of them being OpenStreetMap (OSM). But there are several issues with most of these:
  • They were created to enable queries that map to a point or to enable queries that get a list of features (points of interest, like stores or restaurants) in a specific area. We have yet to find a service that allows for a string search to return a shape that represents the place's boundary.
  • They do not take into account the fact that places only exist for a particular length of time, and that the boundaries of these places change over time.
  • There is no reasonably hostable database that a developer can set up and use. OSM's data is available as a planet.osm file that is 31 GB compressed. For most developers that is untenable at best, unusable at worst.
  • There are no publicly available APIs that map a place string to a coverage shape.
What will we do if the data isn't good enough? We have two options. The first option is to use available services and jury-rig an insane system. Starting with a place string, we would follow these steps:
  1. Query GeoNames, the Google Places API, etc. for a lat/lon coordinate.
  2. Use the coordinate to query Nominatim to get an OSM place hierarchy.
  3. Use that place hierarchy to query Overpass for the relation, way(s) and nodes.
  4. Use this algorithm to convert the OSM data into a GeoJSON shape.
That's at least four services to map a place to its coverage shape, assuming nothing goes wrong. And that doesn't work for historical places, misspellings, etc.
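
To give a feel for what the last step involves, here is a simplified sketch of turning OSM-style data into a GeoJSON shape. It is not the algorithm referenced in step 4. It assumes the relation's ways have already been fetched as lists of node IDs, that the nodes map to (lon, lat) pairs, and that the ways form exactly one outer ring. It ignores holes, multiple rings, and broken data. The function names are made up for this example.

```python
def ways_to_ring(ways, nodes):
    """Join ways end to end into one closed ring of [lon, lat] pairs."""
    remaining = [list(w) for w in ways]
    ring_ids = remaining.pop(0)
    while remaining:
        tail = ring_ids[-1]
        for i, way in enumerate(remaining):
            if way[0] == tail:
                # Way continues from the current end of the ring
                ring_ids.extend(way[1:])
            elif way[-1] == tail:
                # Way is stored backwards, so walk it in reverse
                ring_ids.extend(reversed(way[:-1]))
            else:
                continue
            remaining.pop(i)
            break
        else:
            raise ValueError("ways do not form a single connected ring")
    if ring_ids[0] != ring_ids[-1]:
        raise ValueError("ring is not closed")
    return [list(nodes[node_id]) for node_id in ring_ids]


def to_geojson_polygon(ways, nodes):
    """Wrap the stitched ring in a GeoJSON Polygon geometry."""
    return {"type": "Polygon", "coordinates": [ways_to_ring(ways, nodes)]}


# Toy data: a unit square split across three ways, one of them reversed
nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
ways = [[1, 2], [3, 2], [3, 4, 1]]
shape = to_geojson_polygon(ways, nodes)
print(shape)
```

Even this toy version has to deal with ways stored in either direction, which hints at how much can go wrong across four chained services.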

The alternative is to create a place coverage database and associated API that will take in a place string and return a coverage shape for that place.
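
As a rough sketch of what such a lookup might look like, the example below keeps several versions of a place, each valid for a range of years, and returns the shape that applies to a given year. The place name, the years, the shapes, and the `find_coverage` function are all invented for illustration. This is not the Open Place Database schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceVersion:
    name: str
    valid_from: int           # first year this boundary applies
    valid_to: Optional[int]   # last year it applies, or None if still current
    shape: dict               # GeoJSON geometry


def find_coverage(places, name, year):
    """Return the shape for the named place as it existed in the given year."""
    wanted = name.strip().lower()
    for place in places:
        if place.name.lower() != wanted:
            continue
        still_valid = place.valid_to is None or year <= place.valid_to
        if place.valid_from <= year and still_valid:
            return place.shape
    return None


# Entirely made-up place whose boundary grew in 1850
old_shape = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
new_shape = {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
places = [
    PlaceVersion("Exampleshire", 1700, 1849, old_shape),
    PlaceVersion("Exampleshire", 1850, None, new_shape),
]

print(find_coverage(places, "exampleshire", 1800) is old_shape)
print(find_coverage(places, "Exampleshire", 1900) is new_shape)
```

Exact name matching is the weak spot here. Handling misspellings and alternate names would need fuzzy matching on top of this.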

We are resigning ourselves to the fact that we need to create said database (though we are asking the experts at GIS.StackExchange to make sure our conclusion is correct). This will slow us down significantly but the resulting service will be valuable for the community.