OpenAIS

Data Architecture

What does AIS look like?

Raw AIS data arrives as encoded NMEA sentences and isn’t really human readable. Here is a snippet of three messages, spread over four sentences:

!AIVDM,1,1,,A,13aEOK?P00PD2wVMdLDRhgvL289?,0*26
!AIVDM,1,1,,B,16S`2cPP00a3UF6EKT@2:?vOr0S2,0*00
!AIVDM,2,1,9,B,53nFBv01SJ<thHp6220H4heHTf2222222222221?50:454o<`9QSlUDp,0*09
!AIVDM,2,2,9,B,888888888888880,2*2E

The last two lines form a single multi-part message. I strongly recommend that you browse through this fantastic description of the AIS protocol. Several decoders are available that can turn these sentences into a JSON or dict object. Things to take into account include handling multi-part messages in a streaming environment, parsing the message timestamp (if the provider included one), and transforming the output keys into something standardised.
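As a rough illustration of the multi-part handling mentioned above, here is a minimal sketch in Python. The names `split_sentence` and `MultipartAssembler` are made up for this example and do not come from any particular decoder. It assumes bare sentences with no provider prefix or timestamp, and it never expires stale fragments, which a real streaming pipeline would need to do.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of every character between the leading '!' and the '*', as two hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def split_sentence(sentence: str) -> dict:
    """Split one AIVDM sentence into named fields (the field names are illustrative)."""
    body, _, checksum = sentence.strip().lstrip("!").partition("*")
    _talker, total, number, seq_id, channel, payload, fill_bits = body.split(",")
    return {
        "total": int(total),        # number of fragments in the message
        "number": int(number),      # position of this fragment, starting at 1
        "seq_id": seq_id,           # links the fragments of one message
        "channel": channel,         # radio channel, A or B
        "payload": payload,         # the encoded message content
        "fill_bits": int(fill_bits),
        "checksum_ok": nmea_checksum(body) == checksum.upper(),
    }


class MultipartAssembler:
    """Buffers fragments until every part of a message has arrived."""

    def __init__(self):
        self._pending = {}  # (seq_id, channel) -> {fragment number: fields}

    def feed(self, sentence: str):
        """Return (payload, fill_bits) once a message is complete, else None."""
        fields = split_sentence(sentence)
        if fields["total"] == 1:
            return fields["payload"], fields["fill_bits"]
        key = (fields["seq_id"], fields["channel"])
        parts = self._pending.setdefault(key, {})
        parts[fields["number"]] = fields
        if len(parts) < fields["total"]:
            return None
        del self._pending[key]
        ordered = [parts[n] for n in sorted(parts)]
        return "".join(p["payload"] for p in ordered), ordered[-1]["fill_bits"]
```

Feeding the four sentences above through one `MultipartAssembler` yields three payloads: the first two immediately, and the third only once its second fragment arrives. The `checksum_ok` flag is reported rather than enforced, so the caller can decide whether to drop or keep damaged sentences.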

A daily file of AIS data from a coastal receiver network can be several gigabytes in size, at about a hundred messages per second of recording. Luckily, raw AIS compresses very well.
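Because the files are large but compress well, it can be convenient to keep them gzip-compressed and read them lazily. The sketch below assumes one sentence per line in a gzip stream; the function name `iter_sentences` is hypothetical, and other compression formats would need a different reader.

```python
import gzip
from typing import BinaryIO, Iterator


def iter_sentences(raw: BinaryIO) -> Iterator[str]:
    """Yield non-empty lines from a gzip-compressed stream, one at a time."""
    # Text mode decodes on the fly, so the whole file never sits in memory.
    with gzip.open(raw, mode="rt", encoding="ascii", errors="replace") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield line
```

A daily file could then be processed with `with open(path, "rb") as fh: for sentence in iter_sentences(fh): ...`, where `path` points at the compressed file.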

But what does it represent?

AIS messages come in 27 different types. Some are only used by Class A or Class B transceivers, and some are only used by ground stations or search-and-rescue aircraft. In general, two main types of message are of interest to vessel tracking users:

  • Position reports: Vessel Identifier and GPS location data (Latitude/Longitude, Speed, Course etc)
  • Voyage reports: Vessel Identifier, other vessel identifying information (Name, Callsign, class etc), and voyage information (destination, ETA, etc).

Many of the other message types can be of use to users but we’ll leave that for now.
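To make the split between the two families concrete, here is a hedged sketch that reads only the header of a payload: the message type (first 6 bits) and the vessel identifier (bits 8 to 37). The grouping of types 1, 2, 3, 18, 19 and 27 as "position" and types 5 and 24 as "voyage" is a simplification made for this example, and the function names are illustrative.

```python
POSITION_TYPES = {1, 2, 3, 18, 19, 27}  # simplification: position-style reports
VOYAGE_TYPES = {5, 24}                  # simplification: static and voyage data


def payload_to_bits(payload: str) -> str:
    """Undo the 6-bit ASCII armouring: each character carries six bits."""
    bits = []
    for ch in payload:
        value = ord(ch) - 48
        if value > 40:
            value -= 8
        bits.append(f"{value:06b}")
    return "".join(bits)


def message_header(payload: str) -> dict:
    """Read the message type and vessel identifier from the start of a payload."""
    bits = payload_to_bits(payload[:7])  # 7 characters = 42 bits, enough for the header
    msg_type = int(bits[0:6], 2)
    mmsi = int(bits[8:38], 2)            # bits 6-7 are the repeat indicator
    if msg_type in POSITION_TYPES:
        kind = "position"
    elif msg_type in VOYAGE_TYPES:
        kind = "voyage"
    else:
        kind = "other"
    return {"type": msg_type, "mmsi": mmsi, "kind": kind}
```

Applied to the sample sentences earlier on this page, the two single-part payloads come out as position reports and the reassembled multi-part payload as a voyage report. A full decoder goes much further than this, unpacking every field of every message type.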

What other datasets are there available?

No dataset is an island! Making the best use of AIS data requires additional datasets that provide context. A simple one is a set of geometry definitions for oceans, seas, territorial waters, exclusive economic zones, marine protected areas, ports and rivers. Here are some great resources for these:

  • World Port Index: A collection of information on large ports, their location, and dozens of fields on their infrastructure
  • Marine Regions: Geometry definitions for hundreds of maritime regions. Kept up to date by VLIZ!
  • AIS Definitions: A collection of helper tables that tell you exactly what “Navigation Status = 3” means, which country is represented by MID code 203, etc
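As a small illustration of how such helper tables get used, the sketch below hard-codes a few entries. The dictionaries are tiny excerpts typed in for this example, not the full AIS Definitions tables, and `flag_state` is a hypothetical helper. It assumes that only ship-station identifiers, whose first digit is between 2 and 7, start with a country MID.

```python
# Small excerpts only; the real lookup tables are far longer.
NAV_STATUS = {
    0: "Under way using engine",
    1: "At anchor",
    3: "Restricted manoeuvrability",
    5: "Moored",
    7: "Engaged in fishing",
}
MID_COUNTRY = {203: "Austria", 244: "Netherlands"}


def flag_state(mmsi: int):
    """Return the country for a ship-station MMSI, or None if it cannot be resolved."""
    text = f"{mmsi:09d}"
    # Ship stations start with a three-digit MID whose first digit is 2-7.
    if len(text) != 9 or text[0] not in "234567":
        return None
    return MID_COUNTRY.get(int(text[:3]))
```

In practice these tables would be loaded from the published definitions and joined onto decoded messages, rather than typed in by hand.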

Any deeper question, like “Where are fishing vessels active?”, requires a combination of Position and Voyage messages. You could also go further by defining new objects based on the two main types, combined with some of the additional datasets:

  • Trajectory: A collection of position reports, ordered in time, grouped by the Vessel Identifier
  • Vessel Details: A collection of voyage reports, aggregated over time, and grouped by Vessel Identifier.
  • Voyage: A trajectory, split on changes in values in Voyage Report messages
  • Event: A timestamp combined with a status change from regular rule checks. Some examples could be:
    • 2022-01-01 12:34:56: Vessel X entered territorial waters of Nation Y.
    • 2022-01-02 23:45:07: Vessel A changed Nav Status to “Engaged in Fishing”.
    • 2022-02-03 03:50:03: Fishing vessel too close to acoustic sensor mooring site.
  • Aggregates: A combination of Point or Trajectory information, grouped by Voyage or Vessel Information, that returns a single value for a location per time window. There will be many different types of aggregates though… Some examples:
    • The amount of time that cargo vessels spent in port ABC, last month.
    • The fishing effort, as a measure of time, that has occurred in the EEZ
    • What is the likelihood that a vessel at a specific location, travelling NE, is a tanker vessel?
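A few of the objects above can be sketched in plain Python to show how they build on each other. The `PositionReport` fields and the function names are assumptions made for this example, not a fixed schema, and the aggregate shown (seconds spent in a given navigation status) is only one simple possibility.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PositionReport:
    mmsi: int
    timestamp: datetime
    lat: float
    lon: float
    nav_status: int


def build_trajectories(reports):
    """Trajectory: position reports grouped by vessel identifier, ordered in time."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report.mmsi].append(report)
    return {
        mmsi: sorted(points, key=lambda r: r.timestamp)
        for mmsi, points in grouped.items()
    }


def nav_status_events(trajectory):
    """Event: a timestamp plus a status change between consecutive reports."""
    events = []
    for prev, cur in zip(trajectory, trajectory[1:]):
        if cur.nav_status != prev.nav_status:
            events.append((cur.timestamp, cur.mmsi, prev.nav_status, cur.nav_status))
    return events


def seconds_with_status(trajectory, status):
    """Aggregate: total seconds in a status, crediting each gap to its starting report."""
    return sum(
        (cur.timestamp - prev.timestamp).total_seconds()
        for prev, cur in zip(trajectory, trajectory[1:])
        if prev.nav_status == status
    )
```

For example, `seconds_with_status(trajectory, 7)` gives a crude measure of fishing effort as time for one vessel, since navigation status 7 means "Engaged in fishing". Restricting it to a region such as an EEZ would additionally need the geometry datasets mentioned earlier.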

What is this Vessel Identifier that keeps popping up?

AIS messages are associated with a Maritime Mobile Service Identity (MMSI), a nine-digit code that should be unique to each vessel, station or other maritime object and is assigned by the vessel’s flag state. In practice there are often duplicate and malformed MMSI numbers, but the MMSI is still useful for grouping points into higher-level data objects. Some care must be taken not to group points from obviously incorrect MMSIs. For example, if a vessel in the Mediterranean and a vessel off Southern Africa shared an MMSI, the resulting trajectory would show a single vessel travelling the length of Africa every couple of minutes.
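One way to catch the shared-MMSI case is to check the speed implied by consecutive points. The sketch below is an illustration under stated assumptions: points are `(unix_seconds, lat, lon)` tuples already sorted in time, the Earth is treated as a sphere, and the 60 knot ceiling is an arbitrary threshold chosen for this example rather than a recommended value.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_NAUTICAL_MILE = 1.852
MAX_PLAUSIBLE_KNOTS = 60.0  # assumed ceiling, tune for the fleet of interest


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_knots(p, q):
    """Speed needed to travel between two (unix_seconds, lat, lon) points."""
    distance_nm = haversine_km(p[1], p[2], q[1], q[2]) / KM_PER_NAUTICAL_MILE
    hours = abs(q[0] - p[0]) / 3600.0
    if hours == 0:
        return math.inf if distance_nm > 0 else 0.0
    return distance_nm / hours


def is_plausible_track(points, max_knots=MAX_PLAUSIBLE_KNOTS):
    """False if any consecutive pair of points implies an impossible speed."""
    return all(
        implied_speed_knots(p, q) <= max_knots
        for p, q in zip(points, points[1:])
    )
```

A track that fails this check is a candidate for closer inspection, or for splitting into separate vessels, before it is turned into trajectories.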