Thursday, May 1, 2014

Asset-rack and Nodemon for Easy Caching in Node.js

Caching of static assets (JavaScript, CSS, images) is one way to increase the performance of your website. Caching decreases the amount of bandwidth used, and when done right, it also decreases the number of HTTP requests that the browser must make to load the page.

One way to cache assets is with ETags. When an asset is requested for the first time, the response includes the ETag header. On subsequent requests, the browser sends the ETag back to the server via the If-None-Match header. The server then compares the ETag it received from the browser to the asset's current ETag. If they match, it responds with a 304 Not Modified and an empty body, which saves drastically on bandwidth.
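Here is a minimal sketch of that exchange in Node.js. The function names (computeEtag, respond) are made up for illustration, and the comparison is simplified: a real server also has to handle If-None-Match values that list several ETags or contain "*".

```javascript
const crypto = require('crypto');

// Build an ETag from the asset contents. ETag values are quoted strings.
function computeEtag(body) {
  const hash = crypto.createHash('md5').update(body).digest('hex');
  return '"' + hash + '"';
}

// Decide how to answer a request, given its (lowercased) headers and the
// current contents of the asset.
function respond(requestHeaders, body) {
  const etag = computeEtag(body);
  if (requestHeaders['if-none-match'] === etag) {
    // The browser already has this version: send no body.
    return { status: 304, headers: { ETag: etag }, body: '' };
  }
  return { status: 200, headers: { ETag: etag }, body: body };
}

const css = 'body { color: #333; }';

// First visit: no If-None-Match header, so the full asset is sent.
const first = respond({}, css);

// Second visit: the browser sends back the ETag it was given.
const second = respond({ 'if-none-match': first.headers.ETag }, css);

// The CSS changed on the server, so the old ETag no longer matches.
const third = respond(
  { 'if-none-match': first.headers.ETag },
  css + ' a { color: red; }'
);

console.log(first.status, second.status, third.status); // 200 304 200
```

Note that the browser still makes a request every time; only the response body is saved.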

The Cache-Control header allows the server to give more complex instructions to the browser about caching behavior. When max-age is specified, the server can tell the browser to keep that asset and not ask about it again for a specified number of seconds. For example, Cache-Control: max-age=3600 tells the browser that the response should be cached for 1 hour. That not only cuts down on bandwidth but also HTTP requests.

There is a danger in telling the browser to cache and not validate whether a resource has been updated recently. If you update the HTML structure of your page but your CSS is being cached for an hour, then your site could look broken until the browser checks to see whether the CSS has been updated.

Fingerprinting solves that problem. Instead of referring to mysite.css in the HTML, we calculate a fingerprint (MD5 hash) of mysite.css and add it to the file name so that now we have mysite-29fe7a48516763c7c3033929e3d4c2a0.css. The fingerprint changes when the CSS changes, which changes the filename. Now we know these files will never go stale and therefore can be cached forever. We can set max-age to one year (31536000 seconds), which is the conventional upper limit for cache lifetimes. Now when we update our HTML and CSS, the filename of the CSS changes so the browser asks for the new file.
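To make the idea concrete, here is a rough sketch of fingerprinting using only Node's built-in modules. It is not how asset-rack is implemented; fingerprintName is a hypothetical helper and the CSS strings are placeholders.

```javascript
const crypto = require('crypto');
const path = require('path');

// Insert the MD5 hash of the contents between the base name and extension.
function fingerprintName(fileName, contents) {
  const hash = crypto.createHash('md5').update(contents).digest('hex');
  const ext = path.extname(fileName);         // ".css"
  const base = path.basename(fileName, ext);  // "mysite"
  return base + '-' + hash + ext;
}

// One year in seconds, the max-age used for fingerprinted files.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

const v1 = fingerprintName('mysite.css', 'body { color: black; }');
const v2 = fingerprintName('mysite.css', 'body { color: navy; }');

console.log(v1); // mysite-<32 hex characters>.css
console.log(v2); // a different name, because the contents differ
console.log('Cache-Control: max-age=' + ONE_YEAR_SECONDS);
```

Because the name is derived from the contents, an unchanged file keeps its name across deploys and stays cached.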

You don't want to calculate these fingerprints on your own and update your HTML by hand to refer to the new versions. You want a framework or build process which does that automagically for you. We use node so we found asset-rack. In our HTML templates we just call assets.url('/mysite.css') and it handles all the magic of fingerprinting for us.

There was a little heartburn during development at first because we had to restart the app every time our CSS changed so that asset-rack could update its own cache (ironically). Nodemon solved that problem for us. We configure it to watch all CSS and JS files and automatically restart the app when anything changes: nodemon -e js,css app.js.

Now we don't have to think about caching.

Friday, April 18, 2014

Find-A-Record is Growing Up

Find-A-Record has grown to the point that it needs its own blog, so we set one up at blog.findarecord.com. This also means we will cease to regularly post on this blog. It's possible that we will continue to post updates about Open Place Database, or share development war stories that don't belong on the Find-A-Record blog, but they won't be regular.

We enjoyed blogging for the past 7 months, and appreciate those of you who followed us most of the way. A special thanks to the die-hard fans that watched the videos. We hope you will continue to follow the progress of Find-A-Record. We expect some great things to happen.

Wednesday, April 9, 2014

Watercooler Wednesday #32 - Free, Online, and Military Search Filters

We added the ability to filter Find-A-Record search results based on whether they are free, paid, online, or offline. The only free collections we have right now are from FamilySearch: the free and online collections are the digital collections at FamilySearch while the free and offline collections are from the FamilySearch Catalog. All other collections we currently have are paid and online.

You'll see that we also added the "Military" record type for searching. We have military records from Fold3 and FamilySearch. We will add more military records from Ancestry, the FamilySearch Catalog, and other places in the future.

Our Repositories page contains the full list of sources for our collections, as well as some stats about our coverage.

Wednesday, April 2, 2014

Watercooler Wednesday #31 - WorldVitalRecords

Today we added collections from WorldVitalRecords. We were able to import about 20% of the 22 thousand collections that WorldVitalRecords has. That percentage will go up over time as we continue to get better insight into our data and indexing process.

We also increased our coverage of Ancestry collections from 7% to 26%. You can see the details on our repositories page.

Stay tuned for many more updates on new collections; we expect to add many more in the next few weeks.

Wednesday, March 26, 2014

Watercooler Wednesday #30 - Collection Details View

Yesterday we released the new collection details view. Each collection now has its own page that allows you to take a peek into how we indexed the collection.


Above is a screenshot from the collection page for the United States Marriage, 1733-1990 index from FamilySearch. The orange area represents its coverage between 1803 and 1812. If you search for marriages in Florida in 1805, this collection will not show up.

These new collection pages are beautiful, fun, and useful. We have already identified and fixed several bugs in our indexing process now that we can easily visualize our data. It's also useful for researchers to understand the coverage of particular collections. For example, when searching for records in Eastern New Mexico, it's easy to see why collections from Texas appear in the results.


These new collection detail pages are available from the search results. Just click the "Details" button to open that collection's detail page in a new tab.


Wednesday, March 19, 2014

Watercooler Wednesday #29 - FAR Chrome Extension and Data Updates

Two things of note for this week.

The first is that we have updated the Find-A-Record Chrome Extension to work with the new search page. We also set up a redirect on the old search page to send you to the new search page. Now that traffic is no longer hitting the old search page and stressing our database, we can continue importing more data and work on creating the new collection details view.

The second is that we figured out why some of the collections available from FamilySearch were being labeled with incorrect record types. First some background. FamilySearch tags their online collections with information about what record types they contain, which is great news for us. They label things as containing birth, marriage, death, census, and several other tags. But they also label things as "vital" records. Based on what we were seeing initially, it seemed as though collections were being labeled as "vital" if they contained birth, marriage, and death records. Well, it turns out that's not the case. Take the collection Minnesota, Marriages, 1849-1950. This is labeled as vital and not marriage. Why? We have no clue. Thankfully these inconsistencies seem to be limited to the vital tag. To fix this, we ignore the vital label and generate the list of record types ourselves by parsing the title, as we do with all other data sources. That properly gives us a tag of marriage for Minnesota, Marriages, 1849-1950. Much better.
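Here is a simplified sketch of what parsing record types out of a title can look like. The keyword table and the recordTypesFromTitle function are assumptions made up for this example; the rules actually used in the indexing process are more involved and are not shown here.

```javascript
// Hypothetical keyword table mapping record types to title words.
const RECORD_TYPE_KEYWORDS = {
  birth: ['birth', 'births', 'baptism', 'baptisms', 'christening', 'christenings'],
  marriage: ['marriage', 'marriages'],
  death: ['death', 'deaths', 'burial', 'burials'],
  census: ['census'],
};

// Split the title into lowercase words and return every record type
// that has at least one keyword among them.
function recordTypesFromTitle(title) {
  const words = title.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  return Object.keys(RECORD_TYPE_KEYWORDS).filter((type) =>
    RECORD_TYPE_KEYWORDS[type].some((keyword) => words.includes(keyword))
  );
}

console.log(recordTypesFromTitle('Minnesota, Marriages, 1849-1950'));
// [ 'marriage' ]
```

Matching on whole words rather than substrings avoids false hits from place names that happen to contain a keyword.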

Wednesday, March 12, 2014

Watercooler Wednesday #28 - A Better Marketing Message

Along with our redesign, we have been working on our branding and messaging. Inspired by Simon Sinek, we are focusing on the "why" of what we do. Here are a few of the brainstorming ideas we came up with:
  • Genealogy should be easier
  • The industry lacks innovation
  • Genealogy shouldn't be an old man's sport
  • We want to show slow-moving incumbents what can be done
  • We believe that there is a solution to the industry's problems
  • (Many others that I won't list here)
So after coming up with that list, we started narrowing it down and refining our ideas. Here are a few of the realizations we came to:
  • People want to connect to other people. In genealogy, those people are our ancestors.
  • We believe that everyone should be able to connect to their ancestors.
  • We help people connect to their ancestors by making genealogy easier, more approachable, and doable.
  • We built Find-A-Record to help you know where to look for your ancestor's records.
  • The number of people interested in connecting with their ancestors (for many reasons) is much larger than those who enjoy the detective and puzzle-like challenge of genealogy research.
We iterated on this smaller list and settled on a simple 3-part message.
  1. We believe that you want to connect to your ancestors, and that connecting to them is important.
  2. To connect to your ancestors, you need to know who they are, what happened to them, the story of their lives, etc.
  3. Find-A-Record shows you where to look to find that information.
Finally we turned it into marketing content for our home page.





Thursday, March 6, 2014

Watercooler Wednesday #27 - Working on a New Design

We've decided to create a new search experience on Find-A-Record. Let's look at the current search page.


There are a few things about this that can be improved.
  1. The collection results are the most important piece of information, but they're relegated to a portion of the left side of the screen. The map, which is primarily used for searching, takes up most of the screen.
  2. The expanse of blue lines covering the US makes little sense to anybody but us. What it's showing is that the collection we hovered over was correctly geocoded to historical boundaries of the US (in this case 1767-1950).
  3. The search can be slow when searching on large complicated polygons like the US.
  4. The autocomplete for place search doesn't function very well.
Here's a peek at the redesign we're working on.


  1. The map will only be used for choosing and displaying the search area.
  2. We have more real estate for displaying information about collections, which is what really matters. We will tell you what record types are included in the collection and whether it's free or behind a paywall and whether it's online or offline.
  3. The "Search" button will take you to an external web page that allows you to search the collection or gives you instructions on how to access it.
  4. The "Details" button will bring up a modal dialog that has more details about the collection, such as a description. It will also allow you to explore all of the different boundaries which the collection was geocoded to. We are making the boundaries of a collection a little more difficult to access because most users don't care and because it significantly improves performance.
  5. We will no longer allow you to search on the borders of a country, state, or county. All searches will be radius searches. If you want to search an entire country you will have to expand the size of the circle to envelop it. This isn't perfect because it's impossible to cover all of Russia without getting China and other nearby countries, but Find-A-Record is most useful when optimized for local searches.
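To illustrate the radius search from the last item, here is a minimal sketch that checks whether a point falls inside a search circle using the haversine formula. This is an illustration rather than the actual search code: it treats a collection as a single point, while real collections are geocoded to boundaries. The function names and the two sample coordinates are chosen for the example.

```javascript
const EARTH_RADIUS_KM = 6371; // mean radius of the Earth

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two { lat, lon } points (haversine formula).
function distanceKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) *
      Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// A point matches the search if it lies within the circle.
function withinRadius(center, radiusKm, point) {
  return distanceKm(center, point) <= radiusKm;
}

// Sample coordinates, roughly 62 km apart.
const saltLakeCity = { lat: 40.7608, lon: -111.891 };
const provo = { lat: 40.2338, lon: -111.6585 };

console.log(withinRadius(saltLakeCity, 100, provo)); // true
console.log(withinRadius(saltLakeCity, 50, provo));  // false
```

A check like this is much cheaper than intersecting against a large, complicated polygon, which fits the performance motivation for the redesign.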
We're also in the early stages of developing a mobile app.



Wednesday, February 26, 2014

Watercooler Wednesday #26 - More data = new problems

While loading more data into Open Place Database and Find-A-Record, we have learned the following:

  • When saving really high resolution polygons (such as Alaska), we discovered that CloudFlare's max request entity size is 100 MB.
  • When creating a snapshot zip file for OPD, and later when processing that data to load it into Find-A-Record, we found that node.js has a max buffer size of 1GB. We had to rewrite two scripts because of that.
  • Simplifying countries with a lot of islands and crazy coastlines often results in invalid shapes. We ended up using MapShaper for simplification and repairing. Sadly, it doesn't have a documented node API so we had to learn that too by reading source code.
  • We really are liking the nice weather we've been having. 60+ degrees equals running outside :)
  • Trying to import 1 million collections from the Family History Library Catalog takes a wee bit o' time.
  • Simplifying and saving hundreds of thousands of shapes takes a long time.
  • When making thousands of HTTP requests, chances are at least one of them is going to fail so you better at least have error handling if not also retries.
  • We found out that Justin's keyboard hates him and that he needs a new one.
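For the lesson above about failed HTTP requests, here is one possible shape for a retry helper. It is a sketch, not the code from our import scripts: withRetries and flakyRequest are names invented for this example, and flakyRequest simulates a request that fails twice instead of touching the network.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run an async task, retrying on failure up to maxAttempts times.
// The wait grows linearly with each attempt (delayMs, 2 * delayMs, ...).
async function withRetries(task, maxAttempts, delayMs) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(delayMs * attempt);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Stand-in for an HTTP request: fails on the first two calls, then succeeds.
let calls = 0;
async function flakyRequest() {
  calls += 1;
  if (calls < 3) {
    throw new Error('simulated network failure #' + calls);
  }
  return 'ok after ' + calls + ' calls';
}

const demo = withRetries(flakyRequest, 5, 10);
demo.then((result) => console.log(result)); // ok after 3 calls
```

When making thousands of requests, it also helps to log which ones exhausted their retries so they can be re-run later.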

Wednesday, February 19, 2014

Watercooler Wednesday #25 - Tracing Historical Maps

A lot of quality GIS data about historical boundaries exists. Here are two examples:
Sadly, both of those data sets have non-commercial clauses in their licenses, which makes their data difficult to use for anything but academic research.

A solution to this problem is tracing historical maps. NYPL has a nice project called Map Warper which allows you to georectify digital scans of old maps. However, that's just one step of the process. After properly aligning the map, we want to trace and extract the historical boundaries.

So we're going to create our own map warper. Though it will be heavily inspired by the NYPL Map Warper, we're going to start from scratch. It will be written in node.js and include the second step of tracing the map after it's been georectified.

Sadly, this requires a small refactor of OPD's infrastructure, but we've already finished the first step.

Wednesday, February 12, 2014

Watercooler Wednesday #24 - RootsTech Recap

We are thrilled Find-A-Record was a Developer Challenge Finalist. This gave us great exposure to the press, to bloggers, to potential customers, and to other genealogy companies. Though we didn't win first place, we did meet many people who are ecstatic about Find-A-Record. We were often told, "It solves my biggest problem!" Find-A-Record was validated: it provides a simple solution to a complex problem that many people have.

Here are our short-term plans:
  • Load as much data into Find-A-Record as we can. There are many people interested in using the API. The more data we have, the more useful our API is.
  • Harden our API and back end infrastructure. It needs to be reliable, fast, and bug free.
  • Find a way to simplify the search experience. We're concerned that the current experience with the large map is overwhelming and confusing. We will be experimenting with alternative designs.
We have a published list of repositories we plan to index collections from in the near future. Let us know where else we can find valuable data. You'll notice we're mostly focused on the US and UK right now.

Do you have any ideas about how to improve the search experience? We would love to hear them.

Thursday, January 30, 2014

Watercooler Wednesday #22 - Boundary searching is live on Find-A-Record

Boundary searching is now live on Find-A-Record. As shown in the Friday video from last week, you can search for collections within an entire county, state, or country all at once. Don't forget to install the Chrome Extension for an enhanced search experience from within FamilySearch and Ancestry trees. Right now we have collections from Ancestry.com, digital collections from FamilySearch.org, and FamilySearch catalog entries for the UK.



Find-A-Record is still in beta. If you have any ideas for new features or find a bug, please let us know.

We are aware that a lot of collections in England are incorrectly mapped to the country level instead of the county level. We have plans to fix that tomorrow.

* This post is a day late because we were heads down preparing for RootsTech yesterday and completely forgot about it.

Wednesday, January 22, 2014

Watercooler Wednesday #21 - Matt, the water god

On November 11th of last year Matt, the local water god, walked through our door with a wondrous contraption. A tall black cylinder that, if supplied with water vessels and electricity, would dispense a never-ending shimmering stream of cool, refreshing H2O. The peasants rejoiced.

Early last week our mystical contraption developed some disturbing symptoms. Its usual gurgley response to filling our tankards was distinctly absent, and its flow was abnormally slow. Being as we are not practiced in the arts of dispensary divination, we cast the bones and summoned Matt. After a few minor incantations and an absolutely amazing decapitation and reattachment, the gurgles returned. All was well again in the kingdom.

But our serenity would not last long. On Monday morning, as we bowed before its shiny black exterior gathering the chilled drops of dew from its small spout, the gurgles ceased. No amount of pleading or cajoling would bring them forth again. Luckily, before complete panic set in, we remembered the appropriate combination and summoned forth a vision of Matt, the water god.

After we explained the dire predicament we found ourselves in, Matt assured us that we would not die of thirst. Mounting his white chariot, Matt raced through the local wilds of Springville to bring relief and salvation to his devout followers. He arrived, and quickly rushed to the stricken tower.

An on-the-spot resurrection was once again attempted, but alas, the patient was too far gone. No matter the incantation or ritual, the gurgles of life refused to return. Upon a plea to know what had befallen the filler-of-beakers-with-clarity-and-coolness, Matt replied that an autopsy would be required before he knew for sure.

We thought all was lost, and that the kingdom would come to an end, but Matt said no, all was not lost. A new non-carnivorous contraption would be dispatched forthwith, and we would once again experience the ecstasy of agua.

At 8 o'clock this morning Matt arrived, toting a massive parcel. Inside was a brand new cooler, ready to fill our cups with life-giving sparkles. Huzzah! Praise the lord and pass the pretzels! It now sits behind me, waiting patiently to respond to my request for a draft with bouncy gurgles of pleasure.

As I close this tale of danger and redemption, I am nursing a tall goblet filled with glorious H2O, cooled through the good graces of Matt, the local water god.

Matt, I raise my glass to you. May your magic always hold true.

Wednesday, January 15, 2014

Watercooler Wednesday #20 - Open Place Database SDK and pre-launch

We are planning on launching openplacedatabase.com this week. In preparation for that, we are working on a JavaScript SDK, documentation for the data API, data schemas, and instructions for installing OPD on your own server.

The JavaScript SDK and documentation are on GitHub. We hope to have the SDK in npm by the end of this week.

The data schema documentation can be found at github.com/openplacedatabase/utils.

The instructions for running a copy can be found at github.com/openplacedatabase/www.

By Friday we will have data available for download at www.openplacedatabase.org/download.

In addition to weekly snapshots, we have a new public changes API that will allow you to keep your own copy of the data up-to-date. Documentation will be available at github.com/openplacedatabase/www/blob/master/API.md once we get an hour or two to update everything.

This is a huge milestone for us because all of the data for geolocating collections in Find-A-Record is coming from the Open Place Database. We plan on adding a massive amount of place data to Open Place Database over the next few months, and all of it will be available under the Open Database License.

We don't know yet when the data will be open to editing by the public, but we'll be sure to let you know.

Wednesday, January 8, 2014

Watercooler Wednesday #19 - 6 Hour Heart Transplant - Neo4j to PostGIS

We learned this week that our core database, Neo4j, wouldn't be able to handle the amount of data we need. We scrambled to find a solution and settled on PostGIS. Amazingly, it only took John 6 hours to implement today -- complete with Couchbase XDCR. We're a little busy tying up loose ends and racing towards the RootsTech Developer Challenge deadline on Friday. Wish us luck!

Wednesday, January 1, 2014

Watercooler Wednesday #18 - We're Going To RootsTech

We will be attending RootsTech on February 6-8 as well as the Innovator Summit on the 5th. Here's what we're looking forward to.

Developer Challenge

We will be entering both Find-A-Record and Open Place Database in the Developer Challenge. The cash prizes are underwhelming but the publicity is priceless. The Developer Challenge put last year's winner, Treelines, on the map, and they weren't promised half the attention that even 3rd place will get this year.

FamilySearch Records API

We're hoping for some big news from the session about the FamilySearch Historical Records API. A historical records API that grants access to a collection as vast as FamilySearch's will initiate an exciting new age in the development of genealogy applications.

FamilySearch Partner Services Session

FamilySearch will be hosting a session titled FamilySearch is Open for Business! – What is Partner Services? How Can it Help my Business? We're hoping to hear some announcements about an improved experience for FamilySearch partners, such as self-service developer keys.

Surprises

RootsTech is always full of surprise announcements from companies large and small. What will happen this time?

Meeting People

We're going to meet some friends from RootsDev for breakfast before the Innovator Summit. We look forward to meeting many others over the course of the four days.

Free Lunch

The Innovator Summit includes free* lunch. Who could pass up an opportunity for free lunch?

* Free after the entrance fee, of course.