Imperial College London engaged Vokke to develop a community data collection and comparison product, allowing users to contribute, compare and visualise time series data in real time.
The project was sponsored by the School of Mathematics in London and the University of Sydney, with Vokke developing a solution based on academic research published by the team in the Journal of the Royal Society Interface.
This was not a typical data platform: it needed to operate within a research environment, where accuracy, reproducibility and scale are all critical.
Time series data is inherently complex. Formats vary, structures differ, and meaning is often context-dependent.
The objective was to allow users to upload their own data and compare it against a continuously growing dataset, made up of millions of time series and billions of individual data points.
This introduced a core constraint: at this scale, performance and accuracy are often in tension, yet the system needed to achieve both.
Rather than adapting the research to fit a standard product model, the platform was designed around the specific requirements of the problem. This meant working directly with research teams to ensure that data handling, comparison methods and outputs remained consistent with scientific expectations.
Early decisions focused on preserving integrity and interpretability, ensuring that performance improvements did not come at the expense of accuracy or usability.
To establish a meaningful dataset from the outset, Vokke migrated thousands of time series from existing sources, enabling the platform to deliver value immediately.
The platform was deployed on-premises within Imperial College infrastructure, running in a Docker-based environment hosted on bare metal servers, providing the control and performance required while aligning with institutional constraints.
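To make the deployment shape concrete, the sketch below shows how such a Docker-based, bare-metal setup might be declared with Docker Compose. The service names, image names and database choice are all assumptions for illustration; the actual composition of the platform is not described in detail here.

```yaml
# Hypothetical docker-compose.yml for a bare-metal, on-premises deployment.
# All service and image names below are illustrative assumptions.
services:
  web:
    image: vokke/timeseries-web:latest    # assumed front-end image
    ports:
      - "443:8443"
    depends_on:
      - api
  api:
    image: vokke/timeseries-api:latest    # assumed; invokes the optimised C comparison programs
    volumes:
      - series-data:/var/lib/series       # datasets live on local disk, not cloud storage
  db:
    image: postgres:15                    # assumed database choice
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  series-data:
  pg-data:
```

Running everything under Compose on bare metal keeps the stack portable across institutional servers while avoiding any dependency on external cloud services.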
Vokke developed a platform that allows users to upload, process and compare time series data in real time.
Data can be ingested in formats including CSV, Excel and MP3, then normalised and prepared for analysis. Once uploaded, each dataset is compared against a large, evolving database of existing time series.
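As a minimal sketch of the ingestion step, the snippet below parses a two-column CSV and z-normalises the values so that series recorded at different scales become directly comparable. The column names and function names are assumptions; the platform's actual schema and normalisation pipeline may differ.

```python
import csv
import io
import math

def load_series_csv(text):
    """Parse a two-column (timestamp, value) CSV into a list of floats.
    The column name "value" is an assumed schema for illustration."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]

def z_normalise(series):
    """Z-normalise a series (zero mean, unit variance) so series of
    different magnitudes can be compared on a common scale."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: nothing to scale
    return [(x - mean) / std for x in series]

raw = "timestamp,value\n0,10\n1,20\n2,30\n"
series = z_normalise(load_series_csv(raw))
```

Normalising at ingest time means every later comparison operates on a consistent representation, whatever format the data arrived in.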
Similarity is calculated using multiple metrics, executed through optimised C programs running on the platform's servers. Fast search capability is enabled through a modified locality-sensitive hashing algorithm, allowing relevant matches to be identified without requiring full dataset comparisons.
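The idea behind locality-sensitive hashing can be sketched in a few lines. Below, a random-projection LSH buckets fixed-length series by a bit signature, so a query is compared only against the series in its own bucket rather than the full dataset. This is a generic illustration, not the platform's modified algorithm; the dimensions, plane count and synthetic data are all assumptions.

```python
import random
from collections import defaultdict

def lsh_signature(vec, planes):
    """Hash a fixed-length series to a bit signature: one bit per random
    hyperplane, set by the sign of the dot product. Similar series tend
    to fall on the same side of most planes, so they share a bucket."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

random.seed(0)
dim = 16       # fixed-length representation per series (illustrative)
n_planes = 8   # signature length: more planes -> smaller, purer buckets

planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

# Index a synthetic "database" of series by signature, so a query is only
# compared against the series in its bucket, not the entire collection.
database = {f"series_{i}": [random.gauss(0, 1) for _ in range(dim)]
            for i in range(1000)}
index = defaultdict(list)
for name, vec in database.items():
    index[lsh_signature(vec, planes)].append(name)

query = database["series_42"]
candidates = index[lsh_signature(query, planes)]  # includes "series_42"
```

The exact similarity metrics then only need to run over the handful of candidates in the matching bucket, which is what keeps search fast at the scale of millions of series.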
Results are presented through an interactive visual interface, enabling users to explore relationships between datasets in a way that supports both analysis and interpretation.
Researchers are able to work with their own datasets in the context of a much larger body of data, identifying patterns and relationships that would otherwise be difficult to detect.
Analysis becomes more accessible, with results presented in a way that supports interpretation rather than requiring specialist handling.
The platform supports both ongoing research and practical application, bridging the gap between theoretical models and usable systems.