Live analytics for streamed data with Kafka & DataPA OpenAnalytics

Live analytics for streamed data with Kafka & DataPA OpenAnalytics

DataPipeApache Kafka™ is a massively scalable publish and subscribe messaging system. Originally developed by LinkedIn and open sourced in 2011, it is horizontally scalable, fault tolerant and extremely fast. In the last few years its popularity has grown rapidly, and it now provides critical data pipelines for a huge array of companies including internet giants such as LinkedIn, Twitter, Netflix, Spotify, Pinterest, Uber, PayPal and AirBnB.


So, when the innovative and award winning ISP Exa Networks approached us to help deliver a live analytics solution that would consume, analyse and visualise over 100 million messages a day (up to 6.5 thousand a second at peak times) from their Kafka™ feed, it was a challenge we couldn’t turn down.


The goal was to provide analytics for schools who used Exa’s content filtering system SurfProtect®. Information on every web request from every user in over 1200 schools would be sent via Kafka™ to the analytics layer. The resulting dashboards would need to provide each school with a clear overview of the activity on their connection, allowing them to monitor usage and identify users based on rejected searches or requests.


The first task was to devise a way of consuming such a large stream of data efficiently. We realised some time ago that our customers would increasingly want to consume data from novel architectures and an ever-increasing variety of formats and structures. So, we built the Open API query, which allows the rapid development and integration of bespoke data connectors. For Exa, we had a data connector built to efficiently consume the Kafka™ feed within a few days.


The rest of the implementation was straight forward. DataPA OpenAnalytics allows the refreshing and data preparation process for dashboards to be distributed across any network, reducing the load on the web server. In Exa’s case, a single web server, and a single processing server are sufficient to allow the dashboards to be constantly refreshed, so data is never more than a few minutes old. To help balance the process, the schools were distributed amongst 31 dashboards, and filtered to a single school as the user logs in.

The final solution gives each school a dashboard, with data never more than a few minutes old, showing figures that accumulate over the day. Each dashboard allows the school to monitor web traffic and any rejected requests or searches on their connection.

Kafka


We're really excited about what we delivered for Exa Networks, and think with the versatility and scalability the latest release of DataPA OpenAnalytics offers, we can achieve even more. If you have large amounts of data, whether from Kafka™ or any other data source, and would like to explore the possibility of adding live analytics, please get in touch, we'd love to show you what we can do.

Written by
Published in Blog Posts