The Commonwealth Scientific and Industrial Research Organisation (CSIRO), founded in 1916, is an Australian Government agency responsible for scientific research. The project consisted of developing a search engine for genomic research that can operate 24/7 with massive compute power at low cost.
CSIRO developed a search engine that "lets researchers drill into complex diseases to look at how the genome might influence them". The platform needed to be accessible online at any time and to return results in seconds.
Finding the right sequence to edit among the 3 billion letters of the genome requires massive computational power. Auto-scaling groups were not quick enough to deliver results in seconds, and containers running 24/7 are much less cost-efficient than the serverless solution.
Serverless functions parallelize the work by analyzing chunks of the genome at the same time: "we used SNS topic in order to send the payload of which region in the genome a specific Lambda function should analyze. And then from there the result of that Lambda function was then put into a DynamoDB database sort of in an asynchronous way of collecting all the information and after all of this was done the summary was sent back to the user," explains Denis Bauer, Head of Cloud Computing Bioinformatics.
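The fan-out pattern described in the quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CSIRO's implementation: the chunk size, the payload fields (`start`, `end`), and every function name are hypothetical, the analysis step is a placeholder, and plain in-memory lists stand in for the SNS topic and the DynamoDB table so that the sketch runs without an AWS account.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

GENOME_LENGTH = 3_000_000_000  # roughly 3 billion letters, as stated in the text


def split_into_regions(length: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) regions that together cover [0, length)."""
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)


def fan_out(regions: Iterable[Tuple[int, int]], publish: Callable[[str], None]) -> int:
    """Publish one JSON message per region and return how many were sent.

    `publish` is injected; on AWS it could wrap an SNS client's publish call.
    """
    count = 0
    for start, end in regions:
        publish(json.dumps({"start": start, "end": end}))  # hypothetical payload
        count += 1
    return count


def analyze_region(message: str) -> Dict[str, int]:
    """Stand-in for one worker function; the real analysis is not shown here."""
    region = json.loads(message)
    return {
        "start": region["start"],
        "end": region["end"],
        "letters": region["end"] - region["start"],
    }


def summarize(results: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine the per-region results into the summary returned to the user."""
    return {
        "regions": len(results),
        "letters": sum(r["letters"] for r in results),
    }


if __name__ == "__main__":
    topic: List[str] = []            # stands in for the SNS topic
    table: List[Dict[str, int]] = []  # stands in for the DynamoDB table

    fan_out(split_into_regions(GENOME_LENGTH, 10_000_000), topic.append)
    for message in topic:             # in production these would run in parallel
        table.append(analyze_region(message))
    print(summarize(table))
```

Passing `publish` in as a parameter keeps the region-splitting logic independent of the messaging service, so the same code can be exercised locally and then pointed at a real topic.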
The platform, Variant Spark, is now available on AWS Marketplace. A side-by-side comparison between containers and serverless shows the cost falling from 3000 USD / month to just 15 USD.