Wednesday, December 18, 2013

Watercooler Wednesday #16 - Focusing our Direction

Choosing a direction can be hard.

At the end of last week, we had a meeting where we identified several things that we needed to do next:
  1. Get country and state level historical boundaries for the United States and Great Britain from 1700s onward in the Open Place Database
  2. Put a place and shape editor in the Open Place Database
  3. Index Ancestry.com's catalog
  4. Flesh out and refine the experience for the Find-A-Record Chrome Plugin
  5. Polish the findarecord.com/app experience, paying particular attention to the text and format of the results to make them self explanatory and useful to our target audience.
We had too many things that needed to be done now and not enough people to do them. So we had to choose and focus on the most important ones. Here is what we did.
  1. Took five minutes and reviewed our overall direction. We affirmed that it was to continue to refine the experience in Find-A-Record until we nailed it.
  2. Discussed and decided what our current shorter-term focus was. We decided that it was to properly index and display data that mapped to large areas (countries, states, and counties). You can read more about this in a previous blog post.
  3. Listed the requirements and their dependencies for accomplishing our short term focus:
    • Search on and display polygons in our results page at findarecord.com/app
    • Before we can search on polygons, we must have them indexed in Neo4j
    • Before we can index polygons in Neo4j, we must be able to geocode collections to shapes when indexing them
    • Before we can geocode to shapes, we must have them available via the Open Place Database API
    • Because the data currently available for the Open Place Database is either incomplete or poor quality, we need an editor for adding and modifying data
So we're creating an editor for the Open Place Database this week which will eventually allow us to geocode collections with shapes for an optimal experience at Find-A-Record.
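The dependency chain above can be written down as a small graph and sorted so that prerequisites come first. This is only an illustrative sketch: the task labels are shortened paraphrases of the list above, and it uses Python's standard `graphlib` module (Python 3.9+), not any tool the team is known to have used.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (labels are paraphrased).
dependencies = {
    "search and display polygons": {"index polygons in Neo4j"},
    "index polygons in Neo4j": {"geocode collections to shapes"},
    "geocode collections to shapes": {"shapes available via OPD API"},
    "shapes available via OPD API": {"OPD editor"},
}

# static_order() yields every task after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())

for step, task in enumerate(order, start=1):
    print(step, task)
```

The first task in the resulting order is the editor, which matches the decision described above.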

Deciding on our short term focus took some time to figure out, but we were able to finally make a decision by reaffirming what our long term company focus and direction was.

Wednesday, December 11, 2013

Watercooler Wednesday #15 - Introducing the Open Place Database

Last week we explained how we're not able to geocode to shapes with any currently available services and then discussed a possible solution. After a few days of implementing the solution, we are announcing the Open Place Database.


The Open Place Database (OPD) will contain information about places and administrative boundaries for the entire world, including how they changed over time. The data will be licensed under the Open Database License (ODbL) so that it can be used by anybody in any way that they please. There is an API for light programmatic usage, with the option of installing OPD yourself for heavy usage.


We've only been working on this for three days, so there's much more to come:
  • Regularly updated data dumps
  • Visual editor
  • Much more data (at the time of writing we only have data for US counties)

Wednesday, December 4, 2013

Watercooler Wednesday #14 - Retrieving the Boundary of a Place

Up until now we have been geocoding collections to a set of coordinates. If a particular collection covers three towns, we would put three coordinates on the map. While this works well for small towns and parishes, it breaks down when applied to counties and countries. The United States maps to the middle of Kansas, so only queries near that point will return results associated with the United States as a whole.

The solution is to map collections to a coverage area (shape) instead of a point. Then when we query for any point inside the United States, we will see that it's inside of the coverage area for collections that cover the entire United States.
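To make the difference concrete, here is a minimal sketch of a point-in-shape test using the standard ray-casting technique. The function name, the rough bounding box standing in for the United States, and the sample coordinates are all assumptions made for illustration; a real coverage shape would be a detailed boundary, and this is not necessarily how Find-A-Record implements the check.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how many edges a horizontal ray from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical, very rough box standing in for a real national boundary.
ROUGH_US_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]

salt_lake_city = (-111.89, 40.76)  # approximate coordinates
london = (-0.13, 51.51)            # approximate coordinates

print(point_in_polygon(salt_lake_city, ROUGH_US_BOX))
print(point_in_polygon(london, ROUGH_US_BOX))
```

With a shape, any query point inside the boundary matches the collection, rather than only points near a single centroid.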

There are several sources for place boundaries, the largest of them being OpenStreetMap (OSM). But there are several issues with most of these:
  • They were created to enable queries that map to a point or to enable queries that get a list of features (points of interest, like stores or restaurants) in a specific area. We have yet to find a service that allows for a string search to return a shape that represents the place's boundary.
  • They do not take into account the fact that places only exist for a particular length of time, and that the boundaries of these places change over time.
  • There is no reasonably hostable database that a developer can set up and use. OSM's data is available as a planet.osm file that is 31 GB compressed. For most developers that is untenable at best, unusable at worst.
  • There are no publicly available APIs that map a place string to a coverage shape.
What will we do if the data isn't good enough? We have two options. The first option is to use available services and jury rig an insane system. Starting with a place string, we would follow these steps:
  1. Query geonames/google place api/etc... for a lat/lon coordinate.
  2. Use the coordinate to query Nominatim to get an OSM place hierarchy.
  3. Use that place hierarchy to query Overpass for the relation, way(s) and nodes.
  4. Use this algorithm to convert the OSM data into a GeoJSON shape.
That's at least 4 services to map a place to its coverage shape, assuming nothing goes wrong. And that doesn't work for historical places, misspellings, etc.
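To give a feel for what the last step involves, here is a simplified sketch of stitching way segments into a single closed ring and wrapping it as a GeoJSON Polygon. It is not the algorithm linked above; it assumes the ways have already been resolved to coordinate lists, that they form exactly one outer ring, and it ignores inner rings and multipolygons. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat), the order GeoJSON uses


def stitch_ways(ways: List[List[Coord]]) -> List[Coord]:
    """Join way segments end to end until the ring closes on itself."""
    remaining = [list(w) for w in ways]
    ring = remaining.pop(0)
    while ring[0] != ring[-1]:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                ring.extend(way[1:])             # way continues in order
            elif way[-1] == ring[-1]:
                ring.extend(reversed(way[:-1]))  # way is stored backwards
            else:
                continue
            remaining.pop(i)
            break
        else:
            raise ValueError("ways do not form a closed ring")
    return ring


def to_geojson_polygon(ways: List[List[Coord]]) -> Dict[str, object]:
    """Wrap the stitched ring in a GeoJSON Polygon geometry."""
    ring = stitch_ways(ways)
    return {"type": "Polygon", "coordinates": [[list(c) for c in ring]]}


# Hypothetical unit square split across three ways, one of them reversed.
sample_ways = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 1.0), (1.0, 0.0)],
    [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)],
]

print(to_geojson_polygon(sample_ways))
```

Even in this toy form, the stitching has to cope with segments stored in either direction, which hints at why chaining several services together gets fragile.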

The alternative is to create a place coverage database and associated API that will take in a place string and return a coverage shape for that place.

We are resigning ourselves to the fact that we need to create said database (though we are asking the experts at GIS.StackExchange to make sure our conclusion is correct). This will slow us down significantly but the resulting service will be valuable for the community.

Wednesday, November 20, 2013

Watercooler Wednesday #12 - Whiteboards

The first thing we did when moving into our office was to install some whiteboards. Being a startup, we went with a slightly unorthodox solution: we used glass tabletops from IKEA. Not only do they look as nice as a $250 whiteboard, they last a lot longer as well. If they get dirty, just use some Windex.
Wall o' Whiteboards
We do most of our large scale brainstorming, planning, and collaboration using these whiteboards. When we're done we take a picture of the whiteboard and erase it. Over the last 12 weeks we have collected quite a few images. Enjoy a few of the more interesting ones.

Some early infrastructure and cost planning
Planning the Map View
Working on our customized R-Tree for Neo4j
Debugging our R-Tree Indexing and Neo4j Graph Queries
Deciding which attributes to sort by in our Neo4j query
Having a nomenclature discussion

Hopefully you enjoyed the small glimpse of what our whiteboards go through on a daily basis. We'll see you in two weeks!

IKEA link: Buy just the tabletop from this set. Article number 601.546.45

Wednesday, November 13, 2013

Watercooler Wednesday #11 - Find-A-Record's Logo

We have chosen a logo for Find-A-Record after running a successful logo design contest at 99designs.


Of the almost 300 entries, this best reflects the values that we wanted in a logo. It has the correct balance of simplicity while still being able to speak to our purpose of finding records. It's not distracting but is unique enough that people will be able to identify us when they see it.

Wednesday, November 6, 2013

Watercooler Wednesday #10 - Simplifying the Experience

The map interface we've created for Find-A-Record is cool, but it's not simple enough for our target audience.

1. The collections being returned are not ordered. We should be able to tell the user which collections they should search first, with priority given to collections that are easy to access and most likely to contain the info the user wants.

2. When doing genealogy, our target audience thinks in terms of people, not places. They are thinking, "I want a record for my great-grandmother." But our map interface expects them to be thinking, "I want a record for my great-grandmother who was born in Adliswil, Switzerland between 1813 and 1816 and most likely attended a church within 5 kilometers of that village."

We can solve both of those problems.

When a person is browsing their family tree and thinks, "I want to find a record for my great-grandmother", they should be able to just click a button and get results from Find-A-Record. The profile for their great-grandmother contains place information and dates. We can process that information and generate the necessary query without having to ask the user anything other than "What ancestor do you want to find records for?"


Above is the profile of Charlies Davies in the FamilySearch Family Tree. This has all of the information we need to generate a query. On the right, in the "Research Help" box there is a "Find-A-Record" link. When the user clicks that link it takes them straight to Find-A-Record and initiates a search.


The results are an ordered list of collections. They represent the best method we could calculate for finding the records you want. Clicking on a collection reveals instructions about the different ways of accessing it.
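As a rough sketch of that flow, the code below turns a profile into a search query and then orders collections by a simple score. Everything here is an assumption for illustration: the field names, the 5 km default radius (borrowed from the example earlier in this post), and the scoring rule of ease of access multiplied by likelihood. The actual ranking used by Find-A-Record is not described in this post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Profile:
    name: str
    birth_place: str
    birth_year_from: int
    birth_year_to: int


@dataclass
class Collection:
    title: str
    access_ease: float  # hypothetical 0..1 score, 1 = free and online
    likelihood: float   # hypothetical 0..1 score, chance it holds the record


def build_query(profile: Profile, radius_km: float = 5.0) -> Dict[str, object]:
    """Derive place, date range, and radius from the profile alone."""
    return {
        "place": profile.birth_place,
        "from": profile.birth_year_from,
        "to": profile.birth_year_to,
        "radius_km": radius_km,
    }


def rank(collections: List[Collection]) -> List[Collection]:
    """Order collections so the easiest, most promising ones come first."""
    return sorted(collections, key=lambda c: c.access_ease * c.likelihood, reverse=True)


ancestor = Profile("Great-grandmother", "Adliswil, Switzerland", 1813, 1816)
query = build_query(ancestor)

# Made-up collections and scores, purely to show the ordering.
results = rank([
    Collection("Parish register on microfilm", 0.3, 0.9),
    Collection("Online indexed church records", 0.9, 0.6),
    Collection("County history book", 0.5, 0.1),
])

print(query)
print([c.title for c in results])
```

The user supplies only the ancestor; the place, dates, and radius come from the profile.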

The integration with the FamilySearch Family Tree is done with a Chrome Extension. We will eventually add integration for many more online family trees such as Ancestry.com, MyHeritage, WikiTree, and WeRelate. Integration with popular PC programs would be much more difficult and therefore isn't really on our radar yet.

We will keep the map interface as an advanced view. We can't throw it away; it's too fun to play with.

Wednesday, October 30, 2013

Watercooler Wednesday #9 - Private Beta

We are excited to announce that we will begin letting people into the private beta starting tomorrow. If you have already signed up for the beta then you will receive your email within the next week or so. If you haven't signed up, you can still do so at https://www.findarecord.com.

To use the app, first fill out the search fields in the upper left hand corner. The Place field is a search box with autocomplete. After selecting a place, set your search radius and optionally limit your results using the from and to fields. Now click the Search button.
Your results will show up as cards on the right side of the screen. If you hover over them, the locations they cover will appear on the map as red markers. You can click on a result card to see what information is included. If you click on the Details button you will see how to access the records in that collection.
Well that's it for today. See you in the Beta!

Wednesday, October 23, 2013

Water Cooler Wednesday #8 - Data and Licenses


Having access to data, especially as developers, is key. That being said, much of the data relating to genealogy is behind a paywall or otherwise restricted. While we believe in being able to make a profit, it should be from providing tools and services, and not for selling access to the raw data itself.

We believe that the data that we are collecting for Find-A-Record should be freely available to other developers. To that end, we will be licensing our database under a Creative Commons License (probably Attribution-ShareAlike). Regular dumps of our collection, repository, and place data will be available for download.

While we are making our raw data freely available, we will be charging for access to our API. Why? Because we are adding value to the data by providing multiple ways of querying it. It will be hosted on our infrastructure that costs money to maintain.

In other news, we are almost done preparing for the beta. We'll have more to announce very soon.

Wednesday, October 16, 2013

Water Cooler Wednesday #7 - Moving Fast - Part 3

This week we finish our three-part series about moving fast. Part 1 was about environment. Part 2 was about mentality.

Part 3 - Talent

Moving fast requires talent.

A small disclaimer. The goal of this post is not to praise ourselves, but to convey what we have learned while working in many different companies and environments.

Skill

If you don't have sufficient skill then nothing else really matters. Here are a few characteristics we see in truly skilled people:
  • Thirst for learning. If someone has a certain body of knowledge, but is unable or unwilling to learn new things, then that person quickly loses their value. True skill is being able to tap a vast body of knowledge while keeping that body updated in an ever changing world. When someone becomes unwilling or unable to learn, their skill quickly becomes obsolete.
  • Know what they don't know. No one can know everything, but only the best are consciously aware of that fact. The ability to say "I don't know", coupled with the knowledge that they can learn anything they need to, is a rare combination.
  • Are not afraid of criticism and are secure with who they are. They are sure of what they know, and are confident in their ability to produce. They don't pay attention to the naysayers, other than to collect information that may be of value.

Passion

Skill without passion is wasted. Here are some qualities we've observed in passionate people:
  • Looks forward to working. This doesn't mean that they get ecstatic about every little thing. It simply means that they enjoy and take pleasure in their work. For the most passionate ones it can get to the point where money becomes secondary.
  • Cares about and takes pride in their work. Passionate people make an emotional investment when they work, and find fulfillment in what they produce. This means that they actually care about the end product and making it work correctly.
  • Will stand up for what they think is right. Because they are emotionally invested, they will stand up for what they believe to be right. The best ones will do this in a tactful and sensitive manner, but all passionate people will try and do what they believe to be right, sometimes regardless of the consequences. The trick is to keep the lines of communication open and let them be heard.
To sum up, the right environment, mentality, and talent have enabled us to move quickly.

Wednesday, October 9, 2013

Water Cooler Wednesday #6 - Moving Fast - Part 2

Last week we began a three-part series about how we move fast. Part 1 was about environment.

Part 2 - Mentality

Moving quickly demands the proper mentality to maintain focus, handle failures, and be decisive.

Focus

Without focus you have no product; with it you can conquer the world. Here are a few of the lessons we have learned about focusing:
  • It takes a certain amount of ramp up time to settle into programming. Turn off your phone, close your email, damn the torpedoes, and full steam ahead.
  • Never lose sight of where the product is going. Define a clear direction for the product and make sure all efforts move the product forward in that direction. You may occasionally change the product's direction (pivot), but if you do be sure to reassess everything you're doing to make sure it fits the new vision. This doesn't mean that everyone needs to know the full details of the plan, just that they know the final destination and what role they play in getting there.
  • Refuse to be distracted for more than a short amount of time. This is best illustrated by the character Dug in the movie Up. Occasionally he sees a squirrel and gets distracted for a moment, but he always comes back to what he was focusing on before. These squirrel moments can be one of your greatest sources of new ideas, but beware of losing focus on your endgame.

Persistence

Any product which has a small amount of complexity will not work 100% right from the start. It takes persistence to get it right.
  • Go home at the end of the day. It takes a lot of energy to come to work and produce crazy new stuff. You can only continue producing if you turn off production and rest. Working more than 40 hours a week should not be a common occurrence, and weekend emergencies should be even more rare (see post here).
  • Enjoy what you are doing. Working is much easier when you love what you do. If you don't, I recommend you take a careful look at where you are in life. It is important to find fulfillment somewhere in your life. If not at work, then at home or in a hobby. It almost doesn't matter where you find it, as long as you do.
  • Make it a habit to keep on trying. I'm getting all meta here, but be persistent about being persistent. It takes a lot of practice, and you will fail. A lot. Trust me. I've failed way more times than I've succeeded, but I've learned to not do some things. Don't take failure too hard, but if you must be angry, use that anger to stoke your determination to try again. And if it is impossible, just let it go.
As Dory said, just keep swimming. That simple fact is so powerful that it bears repeating. Just keep swimming.

Decisiveness

Many decisions require time for investigation so that the different options can be accurately weighed. It is easy to spend inordinate amounts of time researching because there's no easy way to know when you know enough. Here are some tips for spending the right amount of time making decisions:
  • How important is this decision, really? Is it a chocolate vs vanilla ice cream decision (assuming no allergies involved), or is it a who to marry decision? Make sure you work on developing the skills to see the difference. We usually think a given decision is more important than it really is. Here is a small truth: most decisions don't really matter. Objects in the mirror are smaller than they appear. Hindsight will let you see things for what they really are if you are honest with yourself.
  • Learn when to cut bait. You can continue to sink time into so many things, and that time is the most precious of your resources. Learn when to let go.
  • Just make the decision. Take a few seconds, collect yourself, then make it. The decision may change, it may be preliminary, it may be wrong, but at least you made a decision and can now move forward. Most decisions can be changed if you learn later that it was actually wrong.

Wednesday, October 2, 2013

Water Cooler Wednesday #5 - Moving Fast - Part 1

A little background. Over the last couple of weeks we have been talking to many different people and companies. When we show them our alpha version, we frequently hear comments like "Wow, you started this when?", "You are launching this year?", and "How did you build it that fast?". Thus we begin a three-part series in an attempt to answer the question: "How do you move so fast?"

Part 1 - Environment

The right environment is critical to product development. You need the right tools, low administrative overhead, and good understanding of people.

Tools

It is critical that you use the right tool for the right job. Although you can hammer in a screw, it's not optimal. And I suppose you could try and hammer in a nail with a screwdriver, but really, who does that? Here are the most important tools we use on a daily basis.
  • AWS - Because you never know what hardware you need until after you bought it.
  • GitHub - Because you should always use version control, and there is no better service.
  • Puppet - Spin up a new server, and it configures itself. No manual intervention required.
  • Trello - Centralized todo lists. Easy to use, keeps us on track.
A small aside. More of a pet peeve. SQL is not the answer for everything. Here is when we use (and don't use) SQL:
  • SQL - When you have highly structured relational data and the schema is fairly stable. 
  • Neo4j - When your data is actually a graph, use a graph database.
  • Couchbase - When your schema is loose and your data is not relational.
  • Memcache - When you are using key-value pairs that aren't persistent (i.e., session storage)
  • Elasticsearch/Lucene - When you need something more robust than %like%

Overhead

Nothing is more of an anchor than administrative overhead. Every report, every status update, and every request for permission slows things down. To move fast you MUST reduce this to the barest minimum. Here are some of the things we do to keep overhead down.
  • Max of 1 recurring meeting - We only have 1 scheduled meeting per week. It's on Friday, and its purpose is to review the week and schedule out the next week. Every other meeting is called as needed, and only if it is needed.
  • Good delegation - If you hire the right people and you delegate appropriately, asking for permission to do something or to get something done shouldn't happen very often. If everything is working correctly, the majority of communication along these lines is limited to broadcasting the decisions we made, rather than trying to get decisions made.
  • No decisions by committee - For every decision, there is always a person who has final call. This means that no decisions are made via committee or consensus. This doesn't mean that things aren't discussed. This means that after the discussion is over, there is a decision made by one and only one person. And the decision is respected, even if other people may not agree.
  • Let the administrative process build itself - Instead of creating process and structure up front, we take a "let's try this" approach. We take a minute and decide on the process we want to try and use, and then test to see if it works. After we've tried it, we ask ourselves a simple question: Do we use this process next time? If so, we use that process. If not, we try something else. It boils down to having a mindset of "let's figure out what works for us".

People

Every time you involve a person with something, that thing always takes an immediate speed hit. Imagine a referee saying "Add a person, take a penalty". Here are a few things we've realized so far.
  • 1 task, 1 person - For each task, only one person should be responsible for it. Having two people responsible for the same task is asking for trouble. Not only do they have to collaborate on the task itself, but they have to figure out who is ultimately responsible anyway. Just assign it to one person in the first place and give them the authority to get it done. Note that this doesn't mean not collaborating, it just means assigning responsibility to 1 and only 1 person for a task.
  • Outsource - Only hire people for what you can't outsource at a reasonable cost. We have outsourced everything we can. Payroll, accounting, taxes, building maintenance, etc. Just make sure to choose well, because one of the main reasons you outsource is to gain time. The time you are not spending managing is time you can spend creating.
  • Do without - Ask yourself if you can do without before you do. If you can do without something or someone, even for a little while, it reduces the amount of people and things you have to deal with. You can now move faster because you have more time to spend on other things.
In part 2 we discuss the importance of a proper mentality.

Friday, September 27, 2013

Developer Day #4 - Neo4j


Today we explain how we're using Neo4j to perform our spatial queries. That's where the real magic of Find-A-Record happens. Here's the most complicated query we discussed in the video:

START node = node:geom('withinDistance:[{lat},{lon},{rad}]')
MATCH node-[r:COV]-(col)
WHERE all( rel in r
    WHERE rel.from >= {from}
    AND rel.to <= {to}
    AND rel.tag = {tag}
)
RETURN col.cbid
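For readers who don't use Cypher, here is a rough in-memory Python equivalent of what the query above asks for: find places within a radius of a point, then keep the collections whose coverage falls inside the requested years and matches the requested tag. The `Coverage` class, its field names, the sample data, and the assumption that the radius is in kilometers are all hypothetical. The real work is done by the spatial index in Neo4j, not by a linear scan like this.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Coverage:
    collection_id: str  # plays the role of col.cbid
    lat: float
    lon: float
    year_from: int      # plays the role of rel.from
    year_to: int        # plays the role of rel.to
    tag: str            # plays the role of rel.tag


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def find_collections(coverages: List[Coverage], lat: float, lon: float, rad_km: float,
                     year_from: int, year_to: int, tag: str) -> List[str]:
    """Mirror the query: distance filter first, then the date and tag conditions."""
    return [
        c.collection_id
        for c in coverages
        if haversine_km(lat, lon, c.lat, c.lon) <= rad_km
        and c.year_from >= year_from
        and c.year_to <= year_to
        and c.tag == tag
    ]


# Made-up coverage records around a made-up search point.
data = [
    Coverage("col-1", 52.05, -2.5, 1725, 1728, "marriage"),  # matches
    Coverage("col-2", 52.05, -2.5, 1700, 1750, "marriage"),  # years fall outside the range
    Coverage("col-3", 53.00, -2.5, 1725, 1728, "marriage"),  # too far away
    Coverage("col-4", 52.05, -2.5, 1725, 1728, "burial"),    # wrong tag
]

print(find_collections(data, 52.0, -2.5, 10.0, 1720, 1730, "marriage"))
```

Note that, as in the query, a coverage only matches when its whole date range sits inside the requested range.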

This week we got clocked on the head by the 80/20 rule of software development (read more about that rule here). This is why we look exhausted in our video this week. Some of the fun details:
  • In Ubuntu's Upstart conf files, environment variables are not set on an exec command unless you add -l to the su command (i.e., exec su -l -c "<command>" user >> output.log). This is because su runs in a non-login shell by default (See this article here, specifically the last couple of paragraphs). And non-login shells don't run /etc/profile.d/nodejs.sh to set the environment variables!
  • Know your network topology BEFORE you start making assumptions! We ended up with two separate APIs, a private one for the website and User information, and a public one for the actual Data. Problem is, we had some code already written that assumed 1 API. Silly us.
  • If you have a terminal session up and you are SSH'd into a server running nodemon, a 1 second network hiccup will freeze the connection in such a way that the server will not kill your terminal session but you will be disconnected by PuTTY. Fun fun. "ps ax" and "kill" to the rescue.

Wednesday, September 25, 2013

Water Cooler Wednesday #4 - Announcing Find-A-Record

We have two big announcements today.

First, we have finally picked a name for our project: Find-A-Record. It is located at http://www.findarecord.com.

Second, we are planning a private beta. You can sign up for the beta on Find-A-Record's home page. We expect to start letting people into the beta within a few weeks.

That's all for today. See you in the beta.

Friday, September 20, 2013

Developer Day #3 - A Beautiful Demo



This week we set up Neo4j, filled it with data, and put an API on top of it to perform our geospatial queries. We're confident that you'll love the result as much as we do. And we're just getting started.


Wednesday, September 18, 2013

Water Cooler Wednesday #3 - Screenshots

At last, we have screenshots.


We patterned the general layout after Google Maps with the search controls on the left. The search results on the right are calculated based on the search area designated by the circle on the map.


When a place is selected using the search box, the map automatically centers to that spot and zooms in. The search box even has autocomplete.


The shaded circle represents the search area. It can be dragged across the map.


You can also adjust the size of the search area by dragging the border of the circle in or out.

We have plans to add many more features, such as:
  • Hovering over the search results will highlight their location on the map
  • Adding filters for the year and record type in the upper left
  • Clicking on results will show where they are located within their repositories
  • Ability to view jurisdictional boundaries on the map, including historical boundaries

What do you think? What features would you like to see?

Friday, September 13, 2013

Developer Day #2 - Creating a Place API


Place Data Sets We Tried
Any other suggestions for free place data?

Wednesday, September 11, 2013

Water Cooler Wednesday #2 - Standardized Place Names

A contentious issue in genealogy right now is the standardization of place names. Genealogists are taught to record events with the proper name of the place for when the event took place. For example, if an event occurred in 1734 in Virginia, the place would be properly recorded as "Colony of Virginia, British Colonial America, United Kingdom". On the other hand, if the event occurred in 1792 then the place should be recorded as "Virginia, United States".

Some modern genealogy programs try to force you to choose a standard name for a place, so you either always use "Colony of Virginia, British Colonial America, United Kingdom" or you always use "Virginia, United States". This can make the source citations appear misleading and frustrate future research, especially if there were changes to the jurisdiction. If a city changed counties or states, you might end up searching for records in the wrong courthouse.

In our Genealogical Repository Index (GENRI) and Raven projects, we won't be using standardized place names. Instead, we will index both the modern and historical names, pointing them to the same underlying location. We will also allow the coverage and jurisdiction boundaries to change over time, removing the need for rigidly enforced standardization. Whether you search for "Virginia" or "Virginia Colony", you will end up at the right place.
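One way to picture this is a place record that carries several names, each valid for a span of years, with every name indexed to the same place. The sketch below is only an illustration under assumptions: the class and function names are made up, and the 1775/1776 cut-over years are placeholders rather than researched dates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlaceName:
    name: str
    valid_from: Optional[int]  # None means "no known start"
    valid_to: Optional[int]    # None means "still in use"


@dataclass
class Place:
    place_id: str
    names: List[PlaceName]


def build_name_index(places: List[Place]) -> Dict[str, str]:
    """Map every name, modern or historical, to the same underlying place id."""
    index: Dict[str, str] = {}
    for place in places:
        for pn in place.names:
            index[pn.name.lower()] = place.place_id
    return index


def name_in_year(place: Place, year: int) -> Optional[str]:
    """Return the name that was in use for the given year, if any."""
    for pn in place.names:
        started = pn.valid_from is None or pn.valid_from <= year
        not_ended = pn.valid_to is None or year <= pn.valid_to
        if started and not_ended:
            return pn.name
    return None


# Placeholder cut-over years, chosen only to make the example work.
virginia = Place("place-virginia", [
    PlaceName("Virginia Colony", None, 1775),
    PlaceName("Virginia", 1776, None),
])

index = build_name_index([virginia])
print(index["virginia"], index["virginia colony"])
print(name_in_year(virginia, 1734), name_in_year(virginia, 1792))
```

Both spellings resolve to the same place, while the date-aware lookup still lets a citation show the name that was correct for the year of the event.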

For further reading, James Tanner has blogged about this issue many, many, many, many, many, many times.

Wednesday, September 4, 2013

Water Cooler Wednesday #1 - Origins

10 years ago I was doing genealogy research with my father on our Oxenbold line. We were tracing the family of Richard Oxenbold from Knighton-on-Teme, Worcestershire, England. According to his 1728 Christening record his parents were William and Elizabeth. There were other children later, but this was the earliest mention of children to that couple. Looking back over the preceding 5 years, we found 2 other Christenings between a William and Jane, and then the death of Jane and both of her children in 1725. We surmised that William had become a widower (upon the death of Jane) and had remarried a few years later to Elizabeth. To prove it we needed to find a marriage record between William and Elizabeth.

There was no marriage record for William and Elizabeth in Knighton-on-Teme. Since it was very common for marriages to take place in the wife's parish, we also looked in the nearby parish of Lindridge, where many of the Oxenbolds had been married, but had no luck there either. Next we tried manually compiling a list of nearby parishes, which was slow going until we found a little-known program that would create a printout for us. That gave us about 35 parishes to work with. We were able to further shrink the list by determining which parishes had marriage records extant for the period of 1725-1728. After looking at maps, Little Hereford was the likeliest candidate due to its location just up river. A search of the appropriate microfilm revealed what we were searching for: "William Oxenbold & Elizabeth Geers of Knighton upon Teme were married by License the thirteenth of April [1727]".


After going through that process I began wondering why there wasn't a service that would give you the list of microfilm to search. You would give it a date range, a geographic location, and the type of record you are looking for, and it would return a list of records to search for, including where the records are located. This would save an enormous amount of time by automatically generating a research plan.

10 years later I still haven't seen the service I envisioned, but I am finally in a position to create it. And while I'm creating it, I think I will make it available world-wide. And not just limited to microfilm.

And so it begins.