Realtime Data Ingest & Analysis

A client wanted to move beyond batched data loading from their Customer Data Provider (CDP): they needed realtime statistics, with events ingested in minutes rather than hours.

Challenges

The existing ingest pipeline was batched: data became available for analysis between 70 and 130 minutes after the event occurred

The data format was controlled by the CDP and upstream parties, which made the data more complex to query

The client wanted to remove other ‘realtime’ integrations from its web properties because of cost and privacy concerns

The primary dashboard tool was capable but not suited to realtime data

Technologies & Techniques

Lambda for compute, using Python, to perform a number of upfront transformations that reduce query complexity

ECS Fargate for management, ingest and materialisation tasks

Kinesis Data Streams and Kinesis Data Firehose feeding into the existing Redshift data warehouse

Kinesis Data Analytics for realtime aggregate calculations, feeding into Aurora PostgreSQL for aggregation storage

Grafana deployment to display realtime aggregates, protected by SSO integration
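
As an illustration of the kind of upfront transformation a Python Lambda can perform, the sketch below flattens nested event JSON into single-level columns so that warehouse queries do not need to unpack nested structures. It assumes the Lambda is attached as a Firehose data-transformation function, which this write-up does not confirm; the event fields in the usage note and the helper name flatten are hypothetical, and the project's real transformations may have looked quite different.

```python
import base64
import json


def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in obj.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def handler(event, context):
    """Firehose transformation handler: decode, flatten, re-encode each record.

    Assumption: records arrive as base64-encoded JSON objects, which is the
    shape Firehose passes to a transformation Lambda.
    """
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
            # Newline-delimited JSON so each event becomes one row on load.
            line = json.dumps(flatten(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Malformed input is returned unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

With this sketch, a hypothetical event such as {"user": {"id": 7}, "type": "click"} would reach the warehouse with the flat keys user_id and type, so a query can filter on a plain column instead of parsing nested JSON at read time.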

Outcome

Events were available in the warehouse around 5 to 15 minutes after they occurred

Realtime property-usage metrics were available to the organisation