
Starting with spatial data

Rory Meyer Feb 8, 2021

So I should probably write something about getting interesting AIS data rather than just assuming everyone has a production-ready DB crammed full of indexed AIS messages. I made the assumption that everyone was working from home next to me.

So first off: what is AIS? Well, here’s a pretty good definition; in fact it’s the protocol definition. In short, it’s a self-organising collision avoidance system. Ships listen for time gaps on the AIS frequencies around them, and then broadcast their ID and position to anyone listening. The whole idea is that two ships that are geographically close to each other will each know where the other one is. This breaks down a little in very busy areas, where multiple ships that are unaware of each other might broadcast in the same open gap. In general it works pretty well though.

A pretty map with some overlaid AIS data

Now what does the data actually look like after it’s taken from radio waves and turned into some kind of computer-readable thing? It looks like this:

Not pretty. NMEA encoding

If you’ve got good coverage (worldwide, for example) then you’ll be getting hundreds of these a second. To make sense of the deluge you’ll have to decode, filter (if desired), check for errors, combine with any metadata and insert into the DB, all at hundreds of messages per second. And that’s ignoring things like alerting or machine learning running on the stream.
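
To make the "check for errors" step a bit more concrete, here is a minimal sketch of the very first stage: splitting one NMEA sentence into its fields and verifying the checksum. It assumes the usual AIVDM/AIVDO layout of seven comma-separated fields followed by `*` and a two-digit hex checksum (the XOR of every character between the `!` and the `*`). The names `AisSentence`, `nmea_checksum` and `parse_sentence` are mine, purely for illustration, and the payload is left undecoded.

```python
from functools import reduce
from typing import NamedTuple, Optional


class AisSentence(NamedTuple):
    tag: str              # e.g. "AIVDM" (other vessels) or "AIVDO" (own vessel)
    fragment_count: int   # how many sentences make up the full message
    fragment_number: int  # position of this sentence within the message
    message_id: str       # sequence id linking fragments (often empty)
    channel: str          # radio channel, usually "A" or "B"
    payload: str          # 6-bit ASCII-armoured AIS data, still undecoded
    fill_bits: int        # padding bits at the end of the payload


def nmea_checksum(body: str) -> str:
    """XOR every character of the sentence body, as two uppercase hex digits."""
    return format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0), "02X")


def parse_sentence(line: str) -> Optional[AisSentence]:
    """Return the split fields, or None if the sentence is malformed."""
    line = line.strip()
    if not line.startswith("!") or "*" not in line:
        return None
    body, _, checksum = line[1:].rpartition("*")
    if nmea_checksum(body) != checksum.upper():
        return None  # corrupted in transit, drop it
    fields = body.split(",")
    if len(fields) != 7:
        return None
    try:
        return AisSentence(fields[0], int(fields[1]), int(fields[2]),
                           fields[3], fields[4], fields[5], int(fields[6]))
    except ValueError:
        return None  # a numeric field was not a number
```

Calling `parse_sentence(line)` on each incoming line gives you either a tuple of fields or `None`, so bad sentences can be dropped before any of the more expensive payload decoding happens. A real pipeline would also need to reassemble multi-fragment messages, which this sketch ignores.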

Get the data

AIS data is available from a few different places:

Let’s download some historical data that’s already been decoded. What does this look like?

Not too bad. Lots of data here

There are some timestamps, which is good, and it looks like they’re combining voyage and position reports, because there are things like “vessel type” and “width” alongside the position vector data.
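
Since the file mixes per-vessel details with per-position data, one thing you might want to do early on is pull the two apart. Here is a rough sketch of that idea. The column names and the sample rows are made up for illustration and will almost certainly differ from whatever your download uses, so treat them as placeholders.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names: substitute whatever your download actually uses.
POSITION_COLUMNS = ["timestamp", "mmsi", "lat", "lon", "sog", "cog"]
STATIC_COLUMNS = ["mmsi", "vessel_type", "width"]


def split_reports(
    rows: Iterable[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, Dict[str, str]]]:
    """Split combined rows into position reports and one static record per vessel."""
    positions: List[Dict[str, str]] = []
    vessels: Dict[str, Dict[str, str]] = {}
    for row in rows:
        positions.append({col: row[col] for col in POSITION_COLUMNS})
        # Static details repeat on every row, so keep the latest one per MMSI.
        vessels[row["mmsi"]] = {col: row[col] for col in STATIC_COLUMNS}
    return positions, vessels


# Made-up rows, only to show the shape of the data.
SAMPLE = """timestamp,mmsi,lat,lon,sog,cog,vessel_type,width
2021-01-01T00:00:00,111111111,-33.90,18.42,0.1,270.0,Cargo,32
2021-01-01T00:01:00,111111111,-33.91,18.41,5.2,268.0,Cargo,32
2021-01-01T00:00:30,222222222,-34.10,18.50,9.8,90.0,Fishing,8
"""

positions, vessels = split_reports(csv.DictReader(io.StringIO(SAMPLE)))
print(len(positions), "position reports from", len(vessels), "vessels")
```

For a real file you would pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample. Keeping the static details separate means the vessel information is stored once per ship rather than once per position.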

Build the data store

Now, asking for the best way to ingest and store data is like asking “what’s the best car?”. Sure, NoSQL is pretty great in some cases, but try using a BMW to plough your potato fields. My weapon of choice is PostgreSQL+PostGIS, all packaged into a docker container. This is very particular to my set of needs. For a combined production/R&D project with a handful of developers operating in a pretty relaxed environment, I have the following requirements:

  • A set of software that operates almost identically whether it’s in the cloud, the dev server or the developer’s home machine.
  • An environment that has its own set of libraries/dependencies and is separate from neighbouring environments
  • Off the shelf tools that are easily updated to enable cutting edge features
  • Deployments that are automated and do not add much load to developers
  • Systems that do not require too much maintenance or are easily returned to a known good state

To meet all of these I use docker + docker-compose + swarm (with a little portainer thrown on top). It takes a little extra effort to get docker going, but once you’re familiar with the terminology and mental space you’ll be hard pressed to go back.

So find a docker tutorial and pull any of a hundred postgis+postgres containers. This one is good, so is this one. You could stick these all into a docker-compose.yaml file in order to preserve the container configuration. Luckily there are plenty of examples of this:

version: '2.2'
services:
  # The service name is arbitrary; pick whatever suits your stack
  postgis:
    image: cheewai/postgis
    environment:
      - PGDATA=/var/lib/postgresql/data
    ports:
      - "5432:5432"
    volumes:
      # Mount your data directory so your database may be persisted
      - path/to/data/directory:/var/lib/postgresql/data
      #*** If you have no existing database, comment out the following on the first run
      #*** Empty data directory triggers initdb to be run
      # Customize access control to overwrite the default
      #- path/to/pg_hba.conf:/var/lib/postgresql/data/pg_hba.conf:ro
      # Customize server tuning parameters to overwrite the default
      #- path/to/postgresql.conf:/var/lib/postgresql/data/postgresql.conf:ro
    restart: on-failure:5
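
If you want a quick sanity check that the container is listening, a small sketch like the one below will do. It only uses the Python standard library and simply polls the published port (5432 in the compose file above) until a TCP connection succeeds. The function name `wait_for_port` is mine. Note that an open port only tells you something is listening; it doesn't guarantee PostgreSQL has finished initialising and is ready for queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # the container may still be starting; retry shortly
    return False
```

Something like `wait_for_port("localhost", 5432)` after `docker-compose up -d` is enough to tell you whether it's worth trying to connect with a proper client yet.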

Once you’ve got the database up and running it’s time to get the data stored in there. I’ll tackle that in the next one, since setting up tables can vary from super simple to very complex, especially if this is going to be a production database with years’ worth of AIS data.