Real-time, searchable database of 10 billion data points
Imperial College London engaged Vokke to develop a community data collection and comparison product that lets users contribute, compare and visualize time series data in real time. The project was sponsored by the School of Mathematics in London and the University of Sydney. Vokke developed the solution based on academic research that the team published in the Journal of the Royal Society Interface.
Users can upload time series data in various formats (MP3, CSV, Excel) and compare it in real time against millions of other time series, covering billions of individual data points. Similarity is scored using multiple metrics, which are computed by C programs that run in the cloud. Vokke implemented the fast search using a modified locality-sensitive hashing algorithm. The resulting comparison is then plotted on an interactive graph for users to explore.
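To give a feel for how locality-sensitive hashing narrows a search over many time series, here is a minimal Python sketch of the standard random-hyperplane variant. It is not Vokke's modified algorithm, which is not described here. The names `z_normalize` and `HyperplaneLSH`, the fixed series length, and the choice of Euclidean distance for ranking are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def z_normalize(series):
    """Rescale a series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(var) or 1.0  # constant series: avoid dividing by zero
    return [(x - mean) / std for x in series]


class HyperplaneLSH:
    """Illustrative index (hypothetical, not the production design)."""

    def __init__(self, length, n_bits=12, seed=0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per signature bit.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(n_bits)
        ]
        self.buckets = defaultdict(list)  # signature -> series names
        self.series = {}                  # name -> normalized series

    def signature(self, series):
        """One bit per hyperplane: which side the series falls on."""
        z = z_normalize(series)
        return tuple(
            sum(p * x for p, x in zip(plane, z)) >= 0.0
            for plane in self.planes
        )

    def add(self, name, series):
        self.series[name] = z_normalize(series)
        self.buckets[self.signature(series)].append(name)

    def query(self, series):
        """Rank only the series that share the query's bucket."""
        z = z_normalize(series)
        candidates = self.buckets.get(self.signature(series), [])
        return sorted(
            candidates,
            key=lambda name: math.dist(z, self.series[name]),
        )
```

Series that point in similar directions after normalization tend to receive the same signature, so a query compares itself only against one bucket rather than the whole collection. A real deployment would typically use several hash tables to reduce missed neighbours and would handle series of differing lengths, both of which this sketch leaves out.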
The project involves significant volumes of scientific data and implements algorithms that are quite specific to the problem space. To fill the database, Vokke migrated thousands of time series into the project from various existing sources. The product was deployed on-premises on Imperial College infrastructure, which Vokke managed. The infrastructure is hosted within a Docker network on a bare-metal server.