Imperial College London engaged Vokke to develop a community data collection and comparison product, allowing users to contribute, compare and visualise time series data in real time.
The project was sponsored by the School of Mathematics in London and the University of Sydney, with Vokke developing a solution based on academic research published by the team in the Journal of the Royal Society Interface.
This was not a typical data platform: it needed to operate within a research environment, where accuracy, reproducibility and scale are all critical.
Time series data is inherently complex. Formats vary, structures differ, and meaning is often context-dependent.
The objective was to allow users to upload their own data and compare it against a continuously growing dataset, made up of millions of time series and billions of individual data points.
This introduced a core constraint: at this scale, performance and accuracy are often in tension, yet the system needed to achieve both.
Rather than adapting the research to fit a standard product model, the platform was designed around the specific requirements of the problem. This meant working directly with research teams to ensure that data handling, comparison methods and outputs remained consistent with scientific expectations.
Early decisions focused on preserving integrity and interpretability, ensuring that performance improvements did not come at the expense of accuracy or usability.
To establish a meaningful dataset from the outset, Vokke migrated thousands of time series from existing sources, enabling the platform to deliver value immediately.
The platform was deployed on-premises within Imperial College infrastructure, running in a Docker-based environment hosted on bare metal servers, providing the control and performance required while aligning with institutional constraints.
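To make the deployment shape concrete, the sketch below shows how such a Docker-based, bare-metal setup might be declared with Docker Compose. The service names, image names and database choice are all assumptions for illustration; the actual composition of the platform is not described in detail here.

```yaml
# Hypothetical docker-compose.yml for a bare-metal, on-premises deployment.
# All service and image names below are illustrative assumptions.
services:
  web:
    image: vokke/timeseries-web:latest    # assumed front-end image
    ports:
      - "443:8443"
    depends_on:
      - api
  api:
    image: vokke/timeseries-api:latest    # assumed; invokes the optimised C comparison programs
    volumes:
      - series-data:/var/lib/series       # datasets live on local disk, not cloud storage
  db:
    image: postgres:15                    # assumed database choice
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  series-data:
  pg-data:
```

Running everything under Compose on bare metal keeps the stack portable across institutional servers while avoiding any dependency on external cloud services.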
Vokke developed a platform that allows users to upload, process and compare time series data in real time.
Data can be ingested in formats including CSV, Excel and MP3, then normalised and prepared for analysis. Once uploaded, each dataset is compared against a large, evolving database of existing time series.
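As a minimal sketch of the ingestion step, the snippet below parses a two-column CSV and z-normalises the values so that series recorded at different scales become directly comparable. The column names and function names are assumptions; the platform's actual schema and normalisation pipeline may differ.

```python
import csv
import io
import math

def load_series_csv(text):
    """Parse a two-column (timestamp, value) CSV into a list of floats.
    The column name "value" is an assumed schema for illustration."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]

def z_normalise(series):
    """Z-normalise a series (zero mean, unit variance) so series of
    different magnitudes can be compared on a common scale."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: nothing to scale
    return [(x - mean) / std for x in series]

raw = "timestamp,value\n0,10\n1,20\n2,30\n"
series = z_normalise(load_series_csv(raw))
```

Normalising at ingest time means every later comparison operates on a consistent representation, whatever format the data arrived in.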
Similarity is calculated using multiple metrics, executed through optimised C programs running on the platform's servers. Fast search capability is enabled through a modified locality-sensitive hashing algorithm, allowing relevant matches to be identified without requiring full dataset comparisons.
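The idea behind locality-sensitive hashing can be sketched in a few lines. Below, a random-projection LSH buckets fixed-length series by a bit signature, so a query is compared only against the series in its own bucket rather than the full dataset. This is a generic illustration, not the platform's modified algorithm; the dimensions, plane count and synthetic data are all assumptions.

```python
import random
from collections import defaultdict

def lsh_signature(vec, planes):
    """Hash a fixed-length series to a bit signature: one bit per random
    hyperplane, set by the sign of the dot product. Similar series tend
    to fall on the same side of most planes, so they share a bucket."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

random.seed(0)
dim = 16       # fixed-length representation per series (illustrative)
n_planes = 8   # signature length: more planes -> smaller, purer buckets

planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

# Index a synthetic "database" of series by signature, so a query is only
# compared against the series in its bucket, not the entire collection.
database = {f"series_{i}": [random.gauss(0, 1) for _ in range(dim)]
            for i in range(1000)}
index = defaultdict(list)
for name, vec in database.items():
    index[lsh_signature(vec, planes)].append(name)

query = database["series_42"]
candidates = index[lsh_signature(query, planes)]  # includes "series_42"
```

The exact similarity metrics then only need to run over the handful of candidates in the matching bucket, which is what keeps search fast at the scale of millions of series.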
Results are presented through an interactive visual interface, enabling users to explore relationships between datasets in a way that supports both analysis and interpretation.
Researchers are able to work with their own datasets in the context of a much larger body of data, identifying patterns and relationships that would otherwise be difficult to detect.
Analysis becomes more accessible, with results presented in a way that supports interpretation rather than requiring specialist handling.
The platform supports both ongoing research and practical application, bridging the gap between theoretical models and usable systems.