How AnotherDay and Theodo built a big data geospatial analytics platform
Benjamin Piggin
AnotherDay’s Cascade platform gives businesses a single place to manage risk and intelligence. With risk assessment at the heart of so many international standards, many businesses struggle with competing standards for tracking and mitigating risk. Often, large spreadsheets, unwieldy slide decks or folders holding hundreds of sensitive intelligence files become impossible to manage, with version control and centralisation among the main challenges. Cascade provides both a single centralised platform to collate intelligence and assets and a powerful analytics platform to make sense of this data and extract effective risk mitigation strategies.
Today, an often-used strategy for asset risk quantification is to run accumulations. A radius is selected and, for each of the business's assets, the total value of all the other assets lying within that radius of the asset is calculated. A business can then look for where the most value has accumulated, and therefore where it carries the most financial risk. This form of analysis is often used in areas such as terrorism modelling.
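To make the definition concrete, here is a minimal sketch of an accumulation in plain Python. It is not the Cascade implementation: the `Asset` fields, the use of the haversine formula for distance and the function names are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Asset:
    # Hypothetical asset record; field names are illustrative only.
    asset_id: str
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    value: float  # asset value in a single currency


def haversine_m(a: Asset, b: Asset) -> float:
    """Great-circle distance between two assets, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def accumulate(assets: list[Asset], radius_m: float) -> dict[str, float]:
    """For each asset, sum the value of every OTHER asset within radius_m."""
    totals: dict[str, float] = {}
    for centre in assets:
        totals[centre.asset_id] = sum(
            other.value
            for other in assets
            if other.asset_id != centre.asset_id
            and haversine_m(centre, other) <= radius_m
        )
    return totals
```

Every asset is compared with every other asset, so the work grows with the square of the number of assets. That is fine for a handful of sites but not for the dataset sizes discussed next.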
The Cascade platform needed the ability to run these accumulations across large datasets of more than 1 million assets. As these modelling runs were potentially long-running, it was also desirable that they could be run asynchronously from a queue.
AnotherDay worked with Theodo to build this functionality into the Cascade platform. Drawing on Theodo's expertise in React, Django and AWS, the task was to make this end-to-end functionality production-ready in 5 weeks.
The modelling flow begins by uploading the assets into the Cascade platform. The data was provided in CSV format, which was uploaded to S3 via the frontend. This triggered a Lambda function which performed validation and cleaning of the data before leveraging the aws_s3 extension for RDS PostgreSQL to transfer the data directly into RDS from S3.
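As a rough sketch of what the validation and cleaning step might look like, the function below checks a CSV for required columns and plausible coordinates using only the standard library. The column names, rejection rules, table name, bucket, key and region are all hypothetical. The SQL string shows the general shape of an `aws_s3.table_import_from_s3` call, not the statement Cascade runs.

```python
import csv
import io
import math

# Hypothetical schema; the real upload format is not described in the article.
REQUIRED_COLUMNS = ("asset_id", "latitude", "longitude", "value")


def clean_assets_csv(raw_text: str) -> tuple[str, list[str]]:
    """Return (cleaned CSV text, rejection messages) for an uploaded file."""
    reader = csv.DictReader(io.StringIO(raw_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    errors: list[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        asset_id = (row["asset_id"] or "").strip()
        if not asset_id:
            errors.append(f"line {line_no}: empty asset_id")
            continue
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
            value = float(row["value"])
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: non-numeric field")
            continue
        if not all(math.isfinite(x) for x in (lat, lon, value)):
            errors.append(f"line {line_no}: non-finite number")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            errors.append(f"line {line_no}: coordinates out of range")
            continue
        writer.writerow(
            {"asset_id": asset_id, "latitude": lat, "longitude": lon, "value": value}
        )
    return out.getvalue(), errors


# Shape of the import once the cleaned file is back in S3.
# Table, bucket, key and region below are placeholders.
IMPORT_SQL = """
SELECT aws_s3.table_import_from_s3(
    'assets_staging',
    'asset_id,latitude,longitude,value',
    '(FORMAT csv, HEADER true)',
    aws_commons.create_s3_uri('example-bucket', 'clean/assets.csv', 'eu-west-2')
);
"""
```

Rejecting bad rows before the import keeps the database load a single bulk operation, and the list of messages can be reported back to the user who uploaded the file.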
Asset upload flow
To store the geospatial data in RDS, Theodo used the PostGIS extension for PostgreSQL. A second RDS instance was provisioned alongside the existing database for Cascade in order to protect the main database from the expensive modelling queries.
To trigger an accumulation modelling run, a user creates an accumulation entity in the frontend, specifying parameters such as the radius. Saving this entity sends a JSON message containing the model parameters and run details to an SQS queue.
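The article does not give the message schema, so the following is only a sketch of what such a payload could contain. All field names and the `build_run_message` helper are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_message(accumulation_id: int, dataset_id: int, radius_m: float) -> str:
    """Serialise the parameters of one modelling run as a JSON string."""
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    payload = {
        "run_id": str(uuid.uuid4()),          # unique id for tracking this run
        "accumulation_id": accumulation_id,   # the entity saved in the frontend
        "dataset_id": dataset_id,             # which uploaded asset set to model
        "radius_m": radius_m,                 # accumulation radius in metres
        "requested_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)
```

The returned string would be the message body handed to the queue, for example as the `MessageBody` argument of the boto3 SQS client's `send_message` call. Carrying a run id in the message lets the worker and the frontend refer to the same run.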
Celery (an asynchronous task runner) was provisioned in Fargate, providing a worker pool that subscribes to the SQS queue. The model task is picked up by a Celery worker, which then runs the model SQL against the PostGIS database. A naive implementation of the model scales as N², with an analysis of 20,000 assets taking 15 minutes. Leveraging the R-tree spatial index provided by PostGIS (built on PostgreSQL's GiST indexes) lowered the model runtime to 15 seconds, roughly a 60-fold improvement.
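The model SQL itself is not shown in the article. The sketch below illustrates one way an index-assisted accumulation could be written. It assumes a hypothetical `assets` table with a `geography` column named `geog`, and a DB-API cursor that accepts `%(name)s` parameters (as psycopg2 does). `ST_DWithin` is the PostGIS predicate that can use a GiST index for radius searches.

```python
# Hypothetical table and column names throughout.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS assets_geog_idx
    ON assets USING GIST (geog);
"""

# For each asset, sum the value of all other assets within radius_m metres.
# ST_DWithin lets the planner use the spatial index instead of computing
# the distance for every pair of rows.
ACCUMULATION_SQL = """
SELECT a.asset_id,
       COALESCE(SUM(b.value), 0) AS accumulated_value
FROM assets AS a
LEFT JOIN assets AS b
       ON b.asset_id <> a.asset_id
      AND ST_DWithin(a.geog, b.geog, %(radius_m)s)
GROUP BY a.asset_id
ORDER BY accumulated_value DESC;
"""


def run_accumulation(cursor, radius_m: float):
    """Execute the accumulation query on a DB-API cursor and return all rows."""
    cursor.execute(ACCUMULATION_SQL, {"radius_m": radius_m})
    return cursor.fetchall()
```

In a setup like the one described, a Celery task would call something like `run_accumulation` with the radius taken from the queue message. Without the index, the join has to consider every pair of assets, which is where the N² behaviour comes from.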
Running an accumulation model
The top 50 results (in terms of accumulated value) are saved to the main database for display in the frontend. Theodo used the Mapbox API to build an interactive results map, displaying the numbered results and the accumulation radius. For the rest of the results, the aws_s3 extension is used again to output a CSV to S3, which allows the entire result set to be downloaded.
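The split between a ranked top 50 and a full export can be sketched as follows. The `top_results` helper, the results table, the bucket, the key and the region are hypothetical. The SQL string shows the general form of an `aws_s3.query_export_to_s3` call rather than the statement used in Cascade.

```python
import heapq


def top_results(totals: dict[str, float], n: int = 50) -> list[tuple[int, str, float]]:
    """Return (rank, asset_id, accumulated_value) for the n largest totals."""
    ranked = heapq.nlargest(n, totals.items(), key=lambda item: item[1])
    # Ranks start at 1 so they can be used directly as map labels.
    return [
        (rank, asset_id, value)
        for rank, (asset_id, value) in enumerate(ranked, start=1)
    ]


# Shape of a full-result export to S3; all names below are placeholders.
EXPORT_SQL = """
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT asset_id, accumulated_value FROM accumulation_results',
    aws_commons.create_s3_uri('example-bucket', 'results/run.csv', 'eu-west-2'),
    options := 'format csv'
);
"""
```

Keeping only a small ranked set in the main database keeps the frontend queries cheap, while the export path means the full result set never has to pass through the application server.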
A system design tailored to the business domain, built on AWS and open-source tools, sets AnotherDay up to create a new standard for risk management. Theodo ensured best practices were followed throughout, and the resulting platform opens up many avenues to explore around integrating live intelligence data into the modelling process.