Seven Databases: MongoDB and Cities

A few weeks ago the “nerd club” reading group at we7 moved on to Seven Databases in Seven Weeks. It’s a fun book, and I’ve been enjoying working through the exercises. The chapter on MongoDB has an exercise to use the geospatial indexing feature to search for “cities” near London. After a bit of digging and some pretty pictures I discover that things are not quite right with the supplied data.

The download accompanying the book includes a file with longitude and latitude info for many cities, and describes how to get them imported into a MongoDB service. Here’s the imported entry for London:

mongos> db.cities.find({name: 'London', country: 'GB'}).forEach(printjson)
{
  "_id" : ObjectId("503386b6d866be4ff784c633"),
  "name" : "London",
  "country" : "GB",
  "timezone" : "Europe/London",
  "population" : 7556900,
  "location" : {
    "latitude" : 51.50853,
    "longitude" : -0.12574
  }
}

The text has us create a geospatial index on the location member, which is represented by an object with latitude and longitude attributes. In contrast, the documentation for geospatial indexes starts with a strong recommendation to prefer an array:

In order to use the index, you need to have a field in your object that an array where the first 2 elements are x,y coordinates (or y,x – just be consistent; it might be advisable to use order-preserving dictionaries/hashes in your client code, to ensure consistency).

In the documentation for the spherical model that we’ll use it also says:

[MongoDB] assumes that you are using decimal degrees in (longitude, latitude) order […] The names you assign to a location object (if using an object and not an array) are completely ignored, only the ordering is detected

I’m not sure whether the book deliberately left discovering this as an exercise for the reader (it does advise reading the doc) or whether they thought that the object version would work fine, but I want to try following the project’s documentation (for now). This needs an encoding of the location for each city as an array with [longitude, latitude]. Adding a new property to each city with that array is possible, but each collection is limited to a single geospatial index, and so it seems easier to move back and forth between representations if we duplicate the collection, replacing the location value with the one we need:

function new_collection_with_modified_location(src_collection, target_collection, modifier) {
  db[src_collection].find().snapshot().forEach(
    function(city) {
      var new_location = modifier(city.location);
      city.location = new_location;
      db[target_collection].save(city);
    }
  );
  db[target_collection].ensureIndex({location:'2d'});
}
 
function to_long_lat_array(location) {
  return [location.longitude, location.latitude];
}
 
mongos> new_collection_with_modified_location('cities', 'cities_loc_ary', to_long_lat_array)

Aside: I noticed here is that if we wanted to update the city documents in-place then an update wouldn’t work. The MongoDB update modifier syntax doesn’t offer a way to add/modify a document attribute with a value based on another attribute of that document. Instead, we’d have to retrieve each document, then modify it before re-saving it. Map-reduce doesn’t seem to be an option for this, as MongoDB apparently forbids updating the documents in a collection during a map-reduce run, and so iteration is the only option.

Anyway, now we can try the search:

// units conversion helper
Earth = function() {
  var earth_radius_in_miles = 3959;
 
  return {
    m2r: function(miles) {
      return miles/earth_radius_in_miles;
    },
    r2m: function(radians) {
      return radians*earth_radius_in_miles;
    }
  };
}();
 
centre = db.cities.findOne({name: 'London', country : 'GB'}).location;
r = db.runCommand({
  geoNear: 'cities',
  spherical: true,
  near: [centre.longitude, centre.latitude],
  maxDistance: Earth.m2r(50),
  num: db.cities.count()
})

This gives several hundred results, and poses a problem about how to check that they’re good. A quick scan of the results looks okay, and the most distant one is given as Buckingham, which I reckon is indeed about 50 miles from London. However, I recently I downloaded Tableau Desktop, and this looks like a good excuse to play with it. After few false starts I arrived at the following map, which shows the set of cities found by the above query (if all you see is a blank space below, try refreshing your browser – the embed doesn’t always work on first view for me):

Tableau is pretty good, though the mapping interface is a bit clunky once you’ve got used to Google Maps. You can pan and zoom the map, and hover over points to read the city name, and lat/long info. If you get lost, just set it to its home setting via the little home icon that appears in the top left when you hover over it.

It clearly shows something is wrong. We expect a circular shape around London, but, while the left half looks okay, it’s truncated on the right side. The truncation happens at around zero degrees longitude (i.e. Greenwich), so are there any points with a positive longitude? The top panel of the next map shows the positions of all the cities where country is ‘GB’.

Again, things are obviously not right. As suspected, there are no points near London with a positive longitude. We can now see that they’ve been dumped off the coast of Somalia, with some stragglers thrown into the Mediterranean. To highlight what’s going on I’ve created 3 regions, each shown with different colours:

West of Greenwich
Longitude < 0
Cyprus
Longitude >= 0 and < 40
East of Greewich
Longitude >= 40

You can turn each of these regions on or off with the checkbox controls on the right of the map, and if the map is in its home setting it will scale to give you the smallest view of the regions you have selected. You can see that the East of Greenwich region looks like the east side of the UK, but mirrored, rotated, and, if you look closely, stretched. This suggests that the longitude and latitude values have been transposed, but only for those cities whose longitude is greater that 0. As for the stragglers, web-searching for their names (which don’t sound very British) suggest they should be in Cyprus, even though they appear to be off the coast of Lebanon and Israel. I guess these three cities are just outliers that have been given the wrong country code, but the lat/long transposition seems to have hit them just the same.

Let’s take a look at some of the location values:

mongos> db.cities.find({country:'GB', 'location.longitude':{$gte:0}}, {location:1,_id:0}).limit(5)
{ "location" : { "longitude" : 35.0191, "latitude" : 33.74057 } }
{ "location" : { "longitude" : 35.04167, "latitude" : 33.70833 } }
{ "location" : { "longitude" : 34.67083, "latitude" : 32.92917 } }
{ "location" : { "longitude" : 51.18249, "latitude" : 0.93678 } }
{ "location" : { "longitude" : 51.73011, "latitude" : 0.43001 } }
mongos> db.cities.find({country:'GB', 'location.longitude':{$lt:0}}, {location:1,_id:0}).limit(5)
{ "location" : { "latitude" : 51.64028, "longitude" : -3.24333 } }
{ "location" : { "latitude" : 51.75, "longitude" : -3.78333 } }
{ "location" : { "latitude" : 53.96667, "longitude" : -1.08333 } }
{ "location" : { "latitude" : 50.95, "longitude" : -2.63333 } }
{ "location" : { "latitude" : 50.35, "longitude" : -3.98333 } }

Hmm… it seems that some of the labels have been switched: the value of the first field is always the correct latitude, and the second field is always the correct longitude, but the labels are reversed for those where the longitude is > 0. This doesn’t seem to be a result of the import process into MongoDB. The following snippet of the mongo_cities1000.json file from the code archive that accompanies the book demonstrates the same problem:

{name:"Wroughton", country:"GB", timezone:"Europe/London", population:7003, location: { latitude:51.51667, longitude:-1.78333} }
{name:"Writtle", country:"GB", timezone:"Europe/London", population:4749, location: { longitude:51.73011, latitude:0.43001} }

Let’s fix this. It’s simple to create a new collection of cities with the correct values of latitude and longitude, by doing a partial flip of the lat and long values:

function swap_lat_long(location) {
  var new_long = location.latitude;
  location.latitude = location.longitude;
  location.longitude = new_long;
}
 
function fix_longitude_oddness(location) {
  if (location.longitude > 0) {
    swap_lat_long(location);
  }
  return location;
}
 
mongos> new_collection_with_modified_location('cities', 'cities_fixed', fix_longitude_oddness)

The lower panel of the above visualisation displays the same points but using their locations from the cities_fixed collection. The island off Somalia has been properly mapped onto the east side of the UK, and the 3 Mediterranean stragglers now fall on land, in Cyprus. We can now map the cities with fixed location values, as before, onto a new set where the location is encoded as an array of [longitude, latitude]:

mongos> new_collection_with_modified_location('cities_fixed', 'cities_fixed_loc_ary', to_long_lat_array)

and query this collection to get the correct set of cities, which is shown here:

Finally, we have a circle of all “cities” surrounding London. Before I go and celebrate, it’s worth thinking about what happens if we ignore the advice of the MongoDB documentation and query against the index created in the book. The first thing to remember is that the values in the location object were all arranged latitude before longitude, whereas the spherical index search assumes they are the other way around. So, searching for London by [longitude, latitude] gives no results. But, if we instead supply [latitude, longitude] to the geoNear function we do get some results. By doing this we’ve effectively searched for transpose-cities near transpose-London i.e. we’re searching around off the Somali coast. Also, the coordinates we read out from the results are still subject to the partial flip, and so we have to correct for that when we display the results. The final thing to think about is that the distances between transpose-cities is different than between their right-way-round counterparts: there’s a factor of cosine(latitude) to be accounted for. Near the equator moving 1 degree north means travelling the same distance as going 1 degree east, but that’s not true at other latitudes. (Remember the stretching of the transpose East of Greenwich portion of the UK that we saw earlier?) In transpose-space we’ve just searched in a 50 mile circle around transpose-London, but the shape of transpose-UK is skewed. If we print the results back in normal-space, we straighten out the shape of the UK, but our search circle has become an ellipse, as shown below:

It’s pretty clear that there are many ways to get this wrong, and only following the advice in the MongoDB documentation will keep things straight (literally). Using any representation other than [longitude, latitude] is asking for trouble. Trouble like producing data files to go with your book where a large portion of the data are wrong. It also seems handy to have a way of drawing things on a map when working with this stuff, so I’m very grateful for Tableau.

2 thoughts on “Seven Databases: MongoDB and Cities”

  1. Thanks for letting me know. They should be fixed now – after moving this site to https I was still asking for unsecure scripts from tableau.

Leave a Reply

Your email address will not be published. Required fields are marked *