Wednesday, February 26, 2014

Watercooler Wednesday #26 - More data = new problems

While loading more data into Open Place Database and Find-A-Record, we have learned the following:

  • When saving very high-resolution polygons (such as Alaska), we discovered that CloudFlare's max request entity size is 100 MB.
  • When creating a snapshot zip file for OPD, and later when processing that data to load it into Find-A-Record, we found that node.js has a max buffer size of 1 GB. We had to rewrite two scripts because of that.
  • Simplifying countries with a lot of islands and crazy coastlines often results in invalid shapes. We ended up using MapShaper for both simplification and repair. Sadly, it doesn't have a documented node API, so we had to learn it by reading the source code.
  • We really are liking the nice weather we've been having. 60+ degrees equals running outside :)
  • Trying to import 1 million collections from the Family History Library Catalog takes a wee bit o' time.
  • Simplifying and saving hundreds of thousands of shapes takes a long time.
  • When making thousands of HTTP requests, chances are at least one of them is going to fail, so you'd better have error handling at the very least, if not retries as well.
  • We found out that Justin's keyboard hates him and that he needs a new one.
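
For the point above about failed HTTP requests, here is a minimal sketch of a retry wrapper with capped exponential backoff. It is not our actual import code: the names `withRetries`, `backoffDelay` and `sleep`, and the default attempt count and delays, are all made up for illustration.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.

/** Delay before retrying after the given failed attempt (1-based): baseMs * 2^(attempt-1), capped at capMs. */
function backoffDelay(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

/** Resolve after roughly `ms` milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

/**
 * Run `task` up to `maxAttempts` times, waiting between failed attempts.
 * Returns the first successful result, or rethrows the last error
 * if every attempt fails.
 */
async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseMs: number = 200,
  capMs: number = 5000
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Only wait if there is another attempt left.
      if (attempt < maxAttempts) {
        await sleep(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

Any function that returns a promise can be passed as `task`, so each HTTP request can be wrapped individually. A real import would probably also want to retry only on errors that are likely to be temporary (timeouts, 5xx responses) and log whatever still fails after the last attempt.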

Wednesday, February 19, 2014

Watercooler Wednesday #25 - Tracing Historical Maps

A lot of quality GIS data about historical boundaries exists. Here are two examples:
Sadly, both of those data sets have non-commercial clauses in their licenses, which makes their data difficult to use for anything but academic research.

A solution to this problem is tracing historical maps. NYPL has a nice project called Map Warper which allows you to georectify digital scans of old maps. However, that's just one step of the process. After properly aligning the map, we want to trace and extract the historical boundaries.

So we're going to create our own map warper. Though it will be heavily inspired by the NYPL Map Warper, we're going to start from scratch. It will be written in node.js and include the second step of tracing the map after it's been georectified.

Sadly, this requires a small refactor of OPD's infrastructure, but we've already finished the first step.

Wednesday, February 12, 2014

Watercooler Wednesday #24 - RootsTech Recap

We are thrilled Find-A-Record was a Developer Challenge Finalist. This gave us great exposure to the press, to bloggers, to potential customers, and to other genealogy companies. Though we didn't win first place, we did meet many people who are ecstatic about Find-A-Record. We were often told, "It solves my biggest problem!" Find-A-Record was validated: it provides a simple solution to a complex problem that many people have.

Here are our short-term plans:
  • Load as much data into Find-A-Record as we can. There are many people interested in using the API. The more data we have, the more useful our API is.
  • Harden our API and back-end infrastructure. It needs to be reliable, fast, and bug-free.
  • Find a way to simplify the search experience. We're concerned that the current experience with the large map is overwhelming and confusing. We will be experimenting with alternative designs.
We have a published list of repositories whose collections we plan to index in the near future. Let us know where else we can find valuable data. You'll notice we're mostly focused on the US and UK right now.

Do you have any ideas about how to improve the search experience? We would love to hear them.