Scaling on Demand: How Serverless Handles Traffic Spikes with Ease

In the previous article, we discussed two process models for Serverless: run-to-completion and long-running processes. The key difference between these models lies in whether the function instance terminates immediately after execution. Additionally, we explored two scenarios: data orchestration and service orchestration.

You might wonder if these scenarios can be implemented using long-running processes. The answer is yes, but it’s important to note that run-to-completion is the purest form of Serverless. So, what’s the underlying logic?

To fully grasp this, we need to introduce a crucial concept in the evolution of complex internet application architectures: scaling, which is the focus of this article.

Imagine 200 users simultaneously accessing the index.html homepage of your locally developed web application. What happens to your local web server instance?

Let’s describe the state of your PC. First, 200 TCP/IP connections are established between the clients and your PC, which it can barely handle. Then all 200 clients simultaneously send HTTP “GET /” requests. Your web server’s main process creates “number of CPU cores - 1” child processes to handle these requests concurrently. We subtract one from the CPU core count to reserve a core for the main process.

For instance, on a 4-core CPU the main process creates three child processes that handle three client requests concurrently, while the remaining requests wait in a queue. Each child process handles a “GET /” request: it matches the routing rules, enters the corresponding control function, and returns index.html to the client. Once a child process has sent index.html, the main process recycles it and creates a new one to handle the next request, until all requests are processed.
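The queueing behavior in this thought experiment can be sketched in a few lines of Python. This is only an illustrative model, not a real web server: the function name `simulate_queue` is made up for this example, and it assumes every request takes the same amount of time, so the work proceeds in neat "rounds".

```python
from collections import deque


def simulate_queue(num_requests: int, cpu_cores: int) -> int:
    """Return how many rounds it takes to drain the client queue.

    Simplifying assumption: every request takes equally long, so in each
    round every child process finishes exactly one request.
    """
    workers = max(1, cpu_cores - 1)  # one core is reserved for the main process
    queue = deque(range(num_requests))  # waiting client requests
    rounds = 0
    while queue:
        # Up to `workers` requests are served concurrently in this round.
        for _ in range(min(workers, len(queue))):
            queue.popleft()
        rounds += 1
    return rounds


rounds_on_4_cores = simulate_queue(200, 4)
print(rounds_on_4_cores)  # 67, because 200 requests / 3 workers rounds up to 67
```

With three workers, the last clients in the queue wait for dozens of rounds, which is exactly the delay we want to shorten.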

Understanding this, the next question becomes simple. How do we improve the processing speed of our client queue?

Vertical Scaling vs. Horizontal Scaling

An obvious solution is to increase the number of CPU cores. We can achieve this by upgrading the configuration of a single machine, such as from 4 cores to 8 cores, resulting in 7 concurrent child processes.

Besides increasing the CPU cores of a single machine, we can add more machines (each with 4 cores). By distributing 100 clients to each of two machines, we also raise the number of concurrent child processes to 6.

Increasing or decreasing single machine performance is vertical scaling, which often comes with a steep cost curve as performance increases. Therefore, careful consideration is needed when adopting this approach. On the other hand, increasing or decreasing the number of machines is horizontal scaling, a more cost-effective approach and our default scaling method.
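The worker counts quoted for both options follow from the same "cores minus one" rule. A minimal sketch of that arithmetic, where the helper name `concurrent_workers` is hypothetical and chosen only for this illustration:

```python
def concurrent_workers(machines: int, cores_per_machine: int) -> int:
    """Child processes available across all machines.

    Each machine reserves one core for its main process, as in the
    thought experiment above.
    """
    return machines * (cores_per_machine - 1)


baseline = concurrent_workers(machines=1, cores_per_machine=4)    # 3
vertical = concurrent_workers(machines=1, cores_per_machine=8)    # 7
horizontal = concurrent_workers(machines=2, cores_per_machine=4)  # 6

print(baseline, vertical, horizontal)  # 3 7 6
```

Note that horizontal scaling gives up one core per machine to a main process, which is why two 4-core machines yield 6 workers rather than 7.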

Now, let’s add some complexity. While index.html is a single file, what about data? Whether scaling vertically or horizontally, we need to restart machines. In our to-do list example, the data is stored in memory and resets upon each restart. So, how do we preserve our data during scaling?

Stateful vs. Stateless

Nodes in a network topology can be categorized as stateful or stateless based on whether they store state. Stateful nodes retain state, meaning they store data. Therefore, they require extra attention, demanding stability and resistance to frequent changes. For example, databases typically employ a master-slave structure, allowing immediate switching to a slave node if the master node encounters issues, ensuring continuous service availability.

Stateless nodes, on the other hand, do not store any state, or only temporarily hold data that cannot be relied upon. Because they hold no state, stateless nodes can be scaled out horizontally to handle high concurrency and scaled down to zero when there’s no traffic (sound familiar?). Stateful nodes cannot do this. When traffic fluctuates significantly between peak and off-peak hours, stateful nodes must be sized for peak traffic, and we keep paying their operating costs even during periods of low traffic.

A database is a typical stateful node as it persistently stores users’ to-do tasks. Similarly, a load balancer is also stateful, much like the main process maintaining the client queue in our thought experiment. It needs to store client connections to return the results processed by our web application back to the clients.

Returning to our process models, run-to-completion is inherently stateless as it terminates after execution, making it impossible to use it alone for persistent data storage. Long-running processes, however, are naturally stateful because their main process doesn’t exit, allowing it to store some values.

However, in Serverless, even if we store values in the main process of a long-running process, the cloud provider might still reclaim it. Even with reserved instances, the data in the memory of scaled-out nodes remains isolated.

Therefore, to make long-running processes stateless, we need to avoid storing values in the main process or only store temporary variables. Persistent data should be moved to dedicated stateful nodes like databases.
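The difference can be illustrated with a small Python sketch. Every class here is hypothetical and exists only for this example; in particular, `TodoStore` is an in-process stand-in for a dedicated stateful node such as a database, not a real database client.

```python
class InMemoryTodoHandler:
    """Stateful: each instance keeps its own to-do list in memory."""

    def __init__(self):
        self.todos = []

    def add(self, task):
        self.todos.append(task)
        return list(self.todos)


class TodoStore:
    """Stand-in for a dedicated stateful node (for example, a database)."""

    def __init__(self):
        self._todos = []

    def add(self, task):
        self._todos.append(task)

    def all(self):
        return list(self._todos)


class StatelessTodoHandler:
    """Stateless: keeps no data of its own and delegates to the store."""

    def __init__(self, store):
        self.store = store

    def add(self, task):
        self.store.add(task)
        return self.store.all()


# Two scaled-out stateful instances cannot see each other's data.
a, b = InMemoryTodoHandler(), InMemoryTodoHandler()
a.add("buy milk")
isolated = b.add("write report")  # ["write report"]

# Two stateless instances share whatever the external store holds.
store = TodoStore()
c, d = StatelessTodoHandler(store), StatelessTodoHandler(store)
c.add("buy milk")
shared = d.add("write report")  # ["buy milk", "write report"]
```

Any instance of the stateless handler can be reclaimed or duplicated without losing tasks, because the only copy of the data lives in the store.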

By separating data storage from the main process node and ensuring the main process doesn’t retain data, our application becomes stateless. We store the data in a separate, stateful database node. This example transforms into the long-running Serverless scenario discussed in the previous article, where we connect to the database during the main process startup and access data through child processes. However, this approach has a significant drawback: it directly increases cold start time. Is there a better solution?

Let’s consider an alternative approach to data persistence. Why do we have to connect to the database ourselves? Our CRUD (create, read, update, delete) operations on data essentially involve child processes reusing the TCP connection established by the main process, sending database statements, and retrieving data. Imagine if we could send instructions to the database using HTTP requests like POST, DELETE, PUT, and GET. Wouldn’t that enable us to leverage the data and service orchestration concepts from the previous lesson?

What is BaaS

Indeed, all this groundwork leads us to today’s protagonist: BaaSification, that is, exposing backend capabilities as Backend as a Service (BaaS). The data interface operations POST, DELETE, PUT, and GET correspond to the semantic HTTP methods of RESTful APIs. Taking MySQL as an example, POST maps to the INSERT statement, DELETE to DELETE, PUT to UPDATE, and GET to SELECT. This one-to-one semantic correspondence allows us to naturally translate MySQL operations into RESTful API operations.

Traditional database connections reuse a TCP connection and have low communication overhead, so they are faster than HTTP for the same operations. While Serverless functions can connect to databases directly, reaching a traditional database by IP address often proves difficult in cloud environments segmented into VPCs. Therefore, for Serverless database access, we typically rely on BaaS services offered by cloud providers, although many of these services are not yet mature.

Taking it a step further, if Serverless isn’t suitable for stateful nodes, why not externalize all stateful operations as data interfaces? This allows our Serverless functions to use the data orchestration approach discussed in the previous lesson and to scale freely.
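As a rough illustration of this mapping, the sketch below translates the four HTTP methods into parameterized SQL statements, using INSERT as the row-creating statement. It makes several assumptions that are not part of the article: the `todos` table and the `handle` function are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# HTTP method -> parameterized SQL statement for a hypothetical `todos` table.
SQL_BY_METHOD = {
    "POST": "INSERT INTO todos (id, title) VALUES (:id, :title)",
    "GET": "SELECT id, title FROM todos WHERE id = :id",
    "PUT": "UPDATE todos SET title = :title WHERE id = :id",
    "DELETE": "DELETE FROM todos WHERE id = :id",
}


def handle(conn, method, params):
    """Run the SQL statement that corresponds to an HTTP method."""
    cursor = conn.execute(SQL_BY_METHOD[method], params)
    rows = cursor.fetchall()  # empty list for INSERT, UPDATE, and DELETE
    conn.commit()
    return rows


# SQLite in memory stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT)")

handle(conn, "POST", {"id": 1, "title": "buy milk"})
handle(conn, "PUT", {"id": 1, "title": "buy oat milk"})
found = handle(conn, "GET", {"id": 1})  # [(1, "buy oat milk")]
handle(conn, "DELETE", {"id": 1})
gone = handle(conn, "GET", {"id": 1})  # []
```

A real data interface would sit behind an HTTP endpoint and add validation and authentication; the point here is only that each method has a natural SQL counterpart.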

Summary

The run-to-completion model is considered purer than the long-running process model because the latter can be misleading: it tempts us to treat it like PaaS and use it as a stateful node for permanent data storage. However, in Serverless, even with long-running processes, cloud providers can still reclaim our function instances.

In our example, storing data in memory meant it was reset on every restart. By adopting a data orchestration mindset and turning backend database operations into data interfaces, we can offload data storage in Serverless to backend applications and interact with them using data orchestration, as explained in the previous lesson. However, creating data interfaces for backend applications is not enough. We need to embrace BaaSification, which frees backend engineers from server-side operational concerns during development.

For scaling, we can choose between vertical and horizontal scaling. Vertical scaling improves the performance of a single machine but often comes with a steep cost increase, so it should be adopted with caution. Horizontal scaling increases the number of machines, offers a smoother cost curve, and serves as our default scaling method.

Stateful nodes store data, while stateless nodes process data without retaining it. Only stateless nodes can be scaled freely. Stateful nodes, responsible for storing critical data, demand careful handling. If we want our network topology nodes to scale freely, we need to externalize their data operations to dedicated stateful nodes.

When our Serverless functions access stateful nodes, it’s preferable to have these nodes provide data interfaces instead of relying solely on database commands, as database connections introduce additional overhead for Serverless functions. Furthermore, to simplify development for backend engineers, we should strive for BaaSification of stateful nodes. We will delve deeper into BaaSification in subsequent articles.

Originally published at Novita AI
