Serverless Data Processing

Serverless architecture is well suited to data processing workloads that require multi-source data ingestion and real-time operation. More and more companies depend on multi-source data processing systems to strengthen their data management capabilities. Serverless has enabled them to build robust data engines quickly and at reduced cost.

Cloud-Based Data Management Using Serverless Architecture

Cloud offerings encompass a large range of services that enrich, transform, validate, recognize, process, generate, store, and clean data. Many cloud service providers offer data-based services such as:

  • Data storage, data streaming and data collection,
  • Machine-learning,
  • IoT,
  • Image recognition,
  • Audio streaming,
  • Video transcoding.

Huge volumes of structured and unstructured data, including text, audio files, images, documents, and videos, require robust data handling mechanisms. Serverless architecture serves this purpose well for the following reasons:

  • The overall infrastructure management is handed over to the cloud provider, which reduces the burden on the development team and lets them focus on developing optimized data handling and analysis functions.
  • The infrastructure scales automatically with demand, even for big workloads such as image processing and video transcoding.
  • It can leverage best-in-class components with proven performance records to cope with large amounts of data.
  • It can support both batch and real-time operations.
  • Its functions are stateless and event-driven: the code runs only when a request or event is served, and no state is kept between invocations.
  • Putting the system on hold for deploying new features is not necessary. New features can be easily tested and deployed independently of the other components used in the architecture.
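
The stateless, event-driven model in the list above can be sketched as a single function. This is a minimal illustration, assuming the `handler(event, context)` signature that AWS Lambda uses for Python functions; the `readings` field and the shape of the returned summary are hypothetical.

```python
import json


def handler(event, context):
    """Summarize one batch of numeric readings.

    Everything the function needs arrives in `event`. Nothing is kept in
    module-level variables, so any instance can serve any request.
    The `readings` field is a hypothetical payload shape.
    """
    readings = [float(r) for r in event.get("readings", [])]
    if not readings:
        return {"statusCode": 400, "body": json.dumps({"error": "no readings"})}

    summary = {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }
    return {"statusCode": 200, "body": json.dumps(summary)}
```

Because the function holds no state of its own, the platform is free to run many copies in parallel or none at all; anything that must persist between calls would go to an external store.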

Serverless architectures are well suited to ingesting all kinds of data streams (for validation, cleansing, enrichment, and transformation), including:

  • Business data streams (from external systems),
  • IoT sensor data,
  • Log data,
  • Social media data,
  • Financial data.
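
The four ingestion stages named above (validation, cleansing, enrichment, transformation) can be sketched as small pure functions chained together. This is only an illustration under stated assumptions: the record fields (`device_id`, `value`), the output schema, and the `source` label are hypothetical and not taken from any particular provider.

```python
from datetime import datetime, timezone


def validate(record: dict) -> bool:
    """Accept only records with a non-empty string id and a numeric value."""
    device_id = record.get("device_id")
    value = record.get("value")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(device_id, str) and device_id.strip() != "" and is_number


def cleanse(record: dict) -> dict:
    """Normalize the id and coerce the value to float."""
    return {
        "device_id": record["device_id"].strip().lower(),
        "value": float(record["value"]),
    }


def enrich(record: dict, source: str) -> dict:
    """Attach the stream name and an ingestion timestamp (UTC, ISO 8601)."""
    return {
        **record,
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def transform(record: dict) -> dict:
    """Map the record onto a hypothetical output schema."""
    return {
        "id": record["device_id"],
        "reading": round(record["value"], 2),
        "source": record["source"],
        "ingested_at": record["ingested_at"],
    }


def process(records: list, source: str) -> list:
    """Run every valid record through cleanse, enrich, and transform."""
    return [transform(enrich(cleanse(r), source)) for r in records if validate(r)]
```

Invalid records are silently dropped here to keep the sketch short; a real pipeline would more likely route them to a separate location for inspection.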

An Illustration of Serverless Real-Time Data Processing

Source: AWS
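
As a rough code counterpart to the illustration, the sketch below shows a function consuming a batch of stream records. It assumes the stream is Amazon Kinesis feeding an AWS Lambda function, which the original text does not state; in that setup each record carries its payload base64-encoded under `kinesis.data`. The assumption that payloads are JSON is also hypothetical.

```python
import base64
import json


def decode_record(record: dict):
    """Decode one Kinesis record; the payload arrives base64-encoded."""
    raw = base64.b64decode(record["kinesis"]["data"])
    return json.loads(raw)


def handler(event, context):
    """Decode each record in the batch and report the ones that fail."""
    failures = []
    payloads = []
    for record in event.get("Records", []):
        try:
            payloads.append(decode_record(record))
        except (KeyError, ValueError):
            # Bad base64, bad UTF-8, and bad JSON all raise ValueError subclasses.
            sequence = record.get("kinesis", {}).get("sequenceNumber")
            failures.append({"itemIdentifier": sequence})

    # Hypothetical downstream step: here the batch is only counted.
    print(json.dumps({"decoded": len(payloads), "failed": len(failures)}))

    # Partial batch response; only honored when ReportBatchItemFailures
    # is enabled on the event source mapping.
    return {"batchItemFailures": failures}
```

Returning the failed sequence numbers lets the platform retry only part of the batch instead of all of it, provided that setting is enabled.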

An example: how Netflix uses AWS serverless services

Netflix is one of the most popular online entertainment services available today. Founded in 1997 as a DVD-rental-by-mail company, it has transformed itself into an industry giant that now streams more than 10 billion hours of video to 125 million customers every quarter. That is an enormous amount of data delivered every single day, and Netflix needs very powerful infrastructure to make it possible. Today, Netflix runs on the AWS cloud infrastructure and uses serverless services such as AWS Lambda.

As Netflix continued to grow, it identified several pain points in self-managing massive data centers. For example, thousands of files, amounting to petabytes of data, are written and modified at Netflix daily. Managing all of this, including tedious data management tasks, was difficult and put a lot of strain on the Netflix network. The company found that this limited its ability to innovate and bring in new features. So it decided to move to a cloud-native implementation and adopt serverless architecture using AWS Lambda functions. The move took seven years to complete and has proven beneficial in more than one way.

The BBC took a similar route when it rebuilt its online services. This involved:

  • Breaking each service down into its own component,
  • Removing duplication,
  • Taking a "build first, optimize later" approach.

The BBC used AWS serverless services such as Lambda to create its new online platform. This approach has brought better performance optimization, scalable app development, and an overall reduction in cost.

More on: https://www.bbc.co.uk/blogs/internet/entries/8673fe2a-e876-45fc-9a5f-203c049c9f9c