A client wanted to move beyond batched data loading from their Customer Data Provider (CDP): they needed realtime statistics, with events ingested in minutes rather than hours.
The existing ingest pipeline was batched: data became available for analysis between 70 and 130 minutes after the event occurred
The data format was controlled by the CDP and upstream parties, which made the data more complex to query
There was a desire to remove other ‘realtime’ integrations from the web properties, due to cost and privacy concerns
The primary dashboard tool was capable, but not suited to realtime data
Lambda for compute, using Python, performing a number of upfront transformations to reduce query complexity
ECS Fargate for management, ingest and materialisation tasks
Kinesis Data Streams and Kinesis Data Firehose feeding into the existing Redshift data warehouse
Kinesis Data Analytics for realtime aggregate calculations, feeding into Aurora PostgreSQL for aggregate storage
Grafana deployment to display realtime aggregates, protected by SSO integration
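As an illustration of the kind of upfront transformation a Python Lambda can perform in this stack, here is a minimal sketch. It assumes the Lambda is attached to Firehose as a data-transformation function and that incoming events are nested JSON objects. The flattening rule and the `flatten` and `handler` names are hypothetical and are not taken from the client's implementation.

```python
import base64
import json


def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def handler(event, context):
    """Firehose transformation handler: returns one result per input record."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            # Newline-delimited JSON is convenient for loading into a warehouse.
            row = json.dumps(flatten(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(row.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Malformed or non-object payloads are flagged, not silently dropped.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Flattening nested keys up front means warehouse queries can address plain columns such as `user_id` instead of unpacking nested structures at query time.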
Events were available in the warehouse within around 5 to 15 minutes of occurring
Realtime property-usage metrics available to the organisation