Friday, September 27, 2013

Developer Day #4 - Neo4j

Today we explain how we're using Neo4j to perform our spatial queries. That's where the real magic of Find-A-Record happens. Here's the most complicated query we discussed in the video:

start node = node:geom('withinDistance:[{lat},{lon},{rad}]')
MATCH node-[r:COV]-(col)
WHERE all( rel in r
    WHERE rel.from >= {from}
    AND rel.to <= {to}  // left operand is missing in the original; rel.to is an assumption
    AND rel.tag = {tag} )
return col.cbid
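
To make the intent of that query easier to follow, here is a rough Python sketch of the same filtering idea applied to plain in-memory data. It is not how Neo4j or its spatial index works internally. The Coverage record, the haversine distance check, the radius being in kilometres, and the to_year field are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes the search radius is in km


@dataclass
class Coverage:
    """Hypothetical stand-in for one COV relationship plus the place it hangs off."""
    lat: float
    lon: float
    from_year: int
    to_year: int   # assumed field, not shown in the query above
    tag: str
    cbid: str      # id of the collection that the relationship points to


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_collections(coverages, lat, lon, rad, from_year, to_year, tag):
    """Return collection ids inside the circle that match the date range and tag."""
    return [
        c.cbid
        for c in coverages
        if haversine_km(lat, lon, c.lat, c.lon) <= rad
        and c.from_year >= from_year
        and c.to_year <= to_year
        and c.tag == tag
    ]
```

Calling find_collections with a centre point, a radius, a year range, and a tag plays the role of the {lat}, {lon}, {rad}, {from}, {to}, and {tag} parameters. The real query lets the spatial index do the distance work instead of scanning every record.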

This week we got clocked on the head by the 80/20 rule of software development (read more about that rule here). This is why we look exhausted in our video this week. Some of the fun details:
  • In Ubuntu's Upstart conf files, environment variables are not set for an exec command unless you add -l to the su command (i.e., exec su -l -c "<command>" user >> output.log). This is because su starts a non-login shell by default (see this article here, specifically the last couple of paragraphs). And non-login shells don't read /etc/profile, which is what runs the scripts in /etc/profile.d/ that set the environment variables!
  • Know your network topology BEFORE you start making assumptions! We ended up with two separate APIs: a private one for the website and user information, and a public one for the actual data. The problem is that we had already written some code that assumed a single API. Silly us.
  • If you are SSH'd into a server from a terminal session and running nodemon, a one-second network hiccup can freeze the connection in such a way that PuTTY disconnects you but the server never kills your terminal session. Fun fun. "ps ax" and "kill" to the rescue.
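
For the last item, the "ps ax" and "kill" cleanup could be scripted along these lines. This is only a sketch in Python, not something we actually ran. The helper names find_pids and kill_all are made up, and it assumes the usual five-column "ps ax" layout (PID, TTY, STAT, TIME, COMMAND).

```python
import os
import signal


def find_pids(ps_output, needle):
    """Return PIDs from `ps ax` output whose command line contains `needle`."""
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 4)          # PID, TTY, STAT, TIME, COMMAND
        if len(fields) == 5 and fields[0].isdigit() and needle in fields[4]:
            pids.append(int(fields[0]))
    return pids


def kill_all(pids, sig=signal.SIGTERM):
    """Send `sig` to each PID, skipping processes that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process went away on its own


# Possible usage on the server (commented out so nothing is killed by accident):
# import subprocess
# output = subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout
# kill_all(find_pids(output, "nodemon"))
```

Check the list that find_pids returns before handing it to kill_all, since a plain substring match can catch more processes than you expect.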

Wednesday, September 25, 2013

Water Cooler Wednesday #4 - Announcing Find-A-Record

We have two big announcements today.

First, we have finally picked a name for our project: Find-A-Record. It is located at

Second, we are planning a private beta. You can sign up for the beta on Find-A-Record's home page. We expect to start letting people into the beta within a few weeks.

That's all for today. See you in the beta.

Friday, September 20, 2013

Developer Day #3 - A Beautiful Demo

This week we set up Neo4j, filled it with data, and put an API on top of it to perform our geospatial queries. We're confident that you'll love the result as much as we do. And we're just getting started.

Wednesday, September 18, 2013

Water Cooler Wednesday #3 - Screenshots

At last, we have screenshots.

We patterned the general layout after Google Maps with the search controls on the left. The search results on the right are calculated based on the search area designated by the circle on the map.

When a place is selected using the search box, the map automatically centers to that spot and zooms in. The search box even has autocomplete.

The shaded circle represents the search area. It can be dragged across the map.

You can also adjust the size of the search area by dragging the border of the circle in or out.

We have plans to add many more features, such as:
  • Hovering over the search results will highlight their location on the map
  • Adding filters for the year and record type in the upper left
  • Clicking on results will show where they are located within their repositories
  • Ability to view jurisdictional boundaries on the map, including historical boundaries

What do you think? What features would you like to see?

Friday, September 13, 2013

Developer Day #2 - Creating a Place API

Place Data Sets We Tried
Any other suggestions for free place data?

Wednesday, September 11, 2013

Water Cooler Wednesday #2 - Standardized Place Names

A contentious issue in genealogy right now is the standardization of place names. Genealogists are taught to record each event under the name the place had when the event took place. For example, if an event occurred in 1734 in Virginia, the place would be properly recorded as "Colony of Virginia, British Colonial America, United Kingdom". On the other hand, if the event occurred in 1792 then the place should be recorded as "Virginia, United States".

Some modern genealogy programs try to force you to choose a standard name for a place, so you either always use "Colony of Virginia, British Colonial America, United Kingdom" or you always use "Virginia, United States". This can make the source citations appear misleading and frustrate future research, especially if there were changes to the jurisdiction. If a city changed counties or states, you might end up searching for records in the wrong courthouse.

In our Genealogical Repository Index (GENRI) and Raven projects, we won't be using standardized place names. Instead, we will index both the modern and historical names, pointing them to the same underlying location. We will also allow the coverage and jurisdiction boundaries to change over time, removing the need for rigidly enforced standardization. Whether you search for "Virginia" or "Virginia Colony", you will end up at the right place.
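
One way to picture this is a lookup table in which several names, each valid for a span of years, point at the same place identifier. The sketch below is only an illustration in Python. The PlaceIndex class, the identifiers, and the year boundaries in the usage note are hypothetical and not taken from GENRI or Raven.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlaceName:
    name: str
    place_id: str              # hypothetical id of the underlying location
    from_year: Optional[int]   # None means no known start
    to_year: Optional[int]     # None means the name is still in use


class PlaceIndex:
    """Maps every known name, modern or historical, to the places it refers to."""

    def __init__(self):
        self._by_name: Dict[str, List[PlaceName]] = {}

    def add(self, entry):
        # Index case-insensitively so "virginia" and "Virginia" match.
        self._by_name.setdefault(entry.name.lower(), []).append(entry)

    def lookup(self, name, year=None):
        """Return ids of places known by `name`, optionally limited to one year."""
        ids = []
        for entry in self._by_name.get(name.lower(), []):
            if year is not None:
                if entry.from_year is not None and year < entry.from_year:
                    continue
                if entry.to_year is not None and year > entry.to_year:
                    continue
            if entry.place_id not in ids:
                ids.append(entry.place_id)
        return ids
```

With entries such as PlaceName("Virginia Colony", "place-1", 1607, 1776) and PlaceName("Virginia", "place-1", 1788, None), both names resolve to the same place-1 record, and passing a year narrows the match to names that were in use at that time.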

For further reading, James Tanner has blogged about this issue many, many, many, many, many, many times.

Wednesday, September 4, 2013

Water Cooler Wednesday #1 - Origins

Ten years ago I was doing genealogy research with my father on our Oxenbold line. We were tracing the family of Richard Oxenbold from Knighton-on-Teme, Worcestershire, England. According to his 1728 christening record, his parents were William and Elizabeth. There were other children later, but this was the earliest mention of children to that couple. Looking back over the preceding five years, we found two other christenings of children of a William and Jane, and then the death of Jane and both of her children in 1725. We surmised that William had become a widower (upon the death of Jane) and had remarried a few years later to Elizabeth. To prove it we needed to find a marriage record for William and Elizabeth.

There was no marriage record for William and Elizabeth in Knighton-on-Teme. Since it was very common for marriages to take place in the wife's parish, we also looked in the nearby parish of Lindridge, where many of the Oxenbolds had been married, but had no luck there either. Next we tried manually compiling a list of nearby parishes, which was slow going until we found a little-known program that would create a printout for us. That gave us about 35 parishes to work with. We were able to further shrink the list by determining which parishes had marriage records extant for the period of 1725-1728. After looking at maps, Little Hereford was the likeliest candidate due to its location just upriver. A search of the appropriate microfilm revealed what we were searching for: "William Oxenbold & Elizabeth Geers of Knighton upon Teme were married by License the thirteenth of April [1727]".

After going through that process I began wondering why there wasn't a service that would give you the list of microfilm to search. You would give it a date range, a geographic location, and the type of record you are looking for, and it would return a list of records to search for, including where the records are located. This would save an enormous amount of time by automatically generating a research plan.

10 years later I still haven't seen the service I envisioned, but I am finally in a position to create it. And while I'm creating it, I think I will make it available world-wide. And not just limited to microfilm.

And so it begins.