When evaluating Unleash for enterprise-wide adoption, your primary concerns likely revolve around scalability, performance, and availability. You need assurance that your chosen feature management system can handle potentially tens of millions of users and millions of flags across a number of regions, without compromising user experience or introducing system fragility.
This guide explains how Unleash is architected to meet these demands.
We’ll explore the core architecture of Unleash and the mechanisms that enable effective scaling, ensuring high performance and resilience, even under significant load. We will cover deployment options, reference architectures, and best practices for enterprise environments.
Unleash is fundamentally designed for speed and resilience. Understanding the core principles of the architecture is key to grasping how it scales effectively.

SDKs embedded in your applications fetch feature flag configurations from the Unleash server or Unleash Edge.
Backend SDKs perform flag evaluations locally within the application’s memory. This eliminates network latency for each evaluation, resulting in extremely fast checks.
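To make local evaluation concrete, here is a minimal sketch of a percentage rollout decided entirely from in-memory data. The flag names, the `FLAGS` dictionary, and the `bucket` and `is_enabled` helpers are hypothetical, and SHA-256 is only an illustrative stand-in for whatever hashing a real SDK uses.

```python
import hashlib

# Hypothetical in-memory flag configuration, as an SDK might hold it after a fetch.
FLAGS = {
    "new-checkout": {"enabled": True, "rollout_percent": 25},
    "dark-mode": {"enabled": False, "rollout_percent": 100},
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 1..100.

    SHA-256 is an illustrative stand-in, not the hash a real SDK uses.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 + 1


def is_enabled(flag_name: str, user_id: str, default: bool = False) -> bool:
    """Evaluate a flag from memory only; no network call is made."""
    flag = FLAGS.get(flag_name)
    if flag is None:
        return default  # unknown flag: fall back to the caller's default
    if not flag["enabled"]:
        return False
    # The same user always lands in the same bucket, so the result is stable.
    return bucket(flag_name, user_id) <= flag["rollout_percent"]
```

Because the check is a dictionary lookup plus a hash, its cost does not depend on the network or on how many other applications are evaluating flags at the same time.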
Frontend SDKs rely on Unleash Edge as a fast, lightweight caching and evaluation layer closer to the end-user.
SDKs enhance resilience by caching the last known valid configuration. For optimal performance, this cache is usually kept in memory, but SDKs can also be configured to use persistent backup storage (such as a local file, Redis, or S3). If the Unleash API server or Unleash Edge becomes temporarily unavailable, the SDK continues operating using these cached flags along with predefined default values. Persistent storage keeps your application functional through network interruptions and also lets it recover its flag state after a restart during an upstream outage.
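The fallback order described above (fresh fetch, then in-memory copy, then persistent backup, then defaults) can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: the `FlagCache` class, its `fetch` callable, and the JSON backup file are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Optional


class FlagCache:
    """Keeps the last known valid configuration in memory and in a backup file."""

    def __init__(self, backup_path: str, fetch: Callable[[], Dict[str, bool]]):
        self.backup_path = backup_path
        self.fetch = fetch  # hypothetical callable that contacts the upstream server
        self.flags: Optional[Dict[str, bool]] = None

    def refresh(self) -> None:
        """Try upstream first; on failure keep what we have or load the backup."""
        try:
            self.flags = self.fetch()
            self._write_backup(self.flags)
        except OSError:  # ConnectionError and TimeoutError are subclasses of OSError
            if self.flags is None and os.path.exists(self.backup_path):
                with open(self.backup_path, "r", encoding="utf-8") as handle:
                    self.flags = json.load(handle)

    def _write_backup(self, flags: Dict[str, bool]) -> None:
        """Write to a temporary file, then swap it in so readers never see half a file."""
        directory = os.path.dirname(self.backup_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(flags, handle)
        os.replace(tmp_path, self.backup_path)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        """Answer from the cache; use the caller's default if nothing is known."""
        if self.flags is None:
            return default
        return self.flags.get(name, default)
```

A process that restarts during an upstream outage would call `refresh()`, fail to reach the server, and still serve the flags stored in the backup file.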
For a deeper dive, explore the Unleash architecture overview.
How you set up Unleash to scale depends significantly on the hosting option you choose. Unleash offers flexible deployment models to match your operational preferences and requirements. Let’s recap the three main options.
Let’s look at the key considerations for scaling the Unleash API server, its database, and the Admin UI.
The Unleash API server is a Node.js-based, stateless API. It can be scaled horizontally by running multiple instances behind a load balancer, ideally across different Availability Zones (Multi-AZ).
Use Auto Scaling Groups (ASGs) or similar mechanisms in cloud environments to automatically adjust capacity based on load metrics such as CPU utilization, memory usage, or request count.
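As a rough illustration of metric-based scaling, the sketch below applies a generic proportional rule: scale the instance count by the ratio of the observed metric to its target, then clamp the result. The function name, the default bounds, and the rule itself are assumptions for illustration; your cloud provider's autoscaler has its own policy types and cooldown behavior.

```python
import math


def desired_instances(current: int, observed: float, target: float,
                      minimum: int = 2, maximum: int = 20) -> int:
    """Return an instance count that would bring the metric back to its target.

    `observed` and `target` use the same unit, for example average CPU percent.
    The minimum of 2 reflects running at least one instance in each of two zones.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * observed / target)
    return max(minimum, min(maximum, raw))


# Four instances at 90% average CPU with a 60% target suggests six instances.
print(desired_instances(current=4, observed=90.0, target=60.0))
```

Because the API server is stateless, instances added or removed by a rule like this need no data migration; the load balancer simply starts or stops routing to them.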
The database is PostgreSQL. It stores all configurations, feature flags, strategies, context fields, metrics, and audit logs, and it is the critical stateful component.
The database layer must be performant and highly available. Use managed database services (such as AWS RDS for PostgreSQL) configured for high availability and backups.
The Admin UI is a bundled single-page application for interacting with the Unleash API server. Use a global content delivery network (CDN) to serve the UI and static assets for fast loading times worldwide.
Unleash achieves scalability and high availability by distributing API instances, ensuring database resilience, providing regional disaster recovery, utilizing a CDN for UI assets, and dynamically handling traffic bursts. Let’s look at the key mechanisms at play.

We run Unleash API instances across multiple AWS Availability Zones (Multi-AZ) in each region. Load balancers distribute incoming traffic, and our infrastructure automatically scales compute resources, allowing the number of Unleash instances to adjust based on observed load.
We use AWS RDS for PostgreSQL configured for high availability with Multi-AZ deployments, providing automatic failover within 2 minutes to a hot standby in case the primary database instance or its AZ fails.
Full backups are stored in a separate region. A standby Unleash cluster in the backup region can take over in the rare event of a complete primary region failure.
The Unleash Admin UI is served through a global CDN for fast loading worldwide.
Our managed environment is designed for elasticity. Auto-scaling adjusts resources automatically, and we continuously monitor load and performance metrics so that the system can absorb sudden traffic peaks, such as those from product launches or marketing campaigns, while keeping performance consistent.
When designing for high availability and resilience in a self-hosted environment, we recommend that you implement the following strategies:
We recommend running the Unleash API nodes in at least two different Availability Zones within a single region. Place these instances behind a load balancer that distributes traffic across the healthy instances in different AZs.
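The routing behavior described here can be sketched as a round-robin over whichever instances currently pass health checks. The `Instance` and `RoundRobinBalancer` names and the zone labels are hypothetical. A managed load balancer does this for you; the sketch only shows why losing one zone does not interrupt traffic.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Rotates requests across the instances that are currently healthy."""

    def __init__(self, instances: List[Instance]):
        self.instances = instances
        self._counter = count()

    def pick(self) -> Instance:
        healthy = [inst for inst in self.instances if inst.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        return healthy[next(self._counter) % len(healthy)]


nodes = [Instance("api-1", "zone-a"), Instance("api-2", "zone-b")]
balancer = RoundRobinBalancer(nodes)

nodes[0].healthy = False      # simulate losing zone-a
print(balancer.pick().zone)   # traffic continues in zone-b
```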
Key benefits:
For enterprise-scale deployments, a robust, managed service (such as AWS RDS for PostgreSQL) is recommended.
Configure the following:
Multi-AZ deployment for automatic failover within the primary region.
A cross-region read replica or continuous backups shipped to a secondary region for disaster recovery purposes.
Point-in-time recovery (PITR) to handle accidental data deletion or corruption scenarios.
SDK caching provides resilience against temporary database unavailability for flag evaluations, but we still recommend that you configure high availability for the database to maintain key functionality like flag configuration updates and Admin UI access.
Configure automatic failover for your database (managed services typically handle this). Failover within a region should ideally complete within a few minutes. Ensure full database backups are regularly taken and stored securely, preferably in a separate region.
In the rare event of a complete primary region failure (disaster recovery scenario):
For organizations with globally distributed users and a high number of frontend flag evaluations (in web applications and mobile apps), relying solely on the Unleash API server can introduce latency for end-users and place significant read load on the API server and database.
Unleash Edge acts as a lightweight, globally distributed, read-only caching layer for your feature flag configurations. SDKs connect to the nearest Edge instance instead of directly to the Unleash API server.
Edge was specifically built to handle scenarios where tens of millions of end-users are spread across the globe, requiring localized evaluation of feature flags with minimal latency.
When you require low-latency flag evaluations for a globally distributed user base, we recommend that you use distributed Edge instances across multiple global regions.
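How a client ends up at the "nearest" Edge instance depends on your setup and is often handled by DNS or network-level routing rather than application code. Purely as an illustration, the sketch below assumes the client has already measured round-trip latency to a few candidate endpoints and picks the fastest. The `nearest_edge` function and the example URLs are hypothetical.

```python
from typing import Dict


def nearest_edge(latencies_ms: Dict[str, float]) -> str:
    """Return the endpoint with the lowest measured round-trip latency."""
    if not latencies_ms:
        raise ValueError("no Edge endpoints to choose from")
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical endpoints and measurements, in milliseconds.
measured = {
    "https://edge-eu.example.com": 18.0,
    "https://edge-us.example.com": 95.0,
    "https://edge-ap.example.com": 210.0,
}
print(nearest_edge(measured))
```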

With Unleash Enterprise Edge Cloud, this complexity is handled for you—you simply configure your SDKs with the provided Edge URL. We ensure that:
Even when global scaling and minimizing cross-region latency are not immediate priorities, implementing high availability is crucial for reliable production workloads using Unleash.
A standard approach involves addressing potential single points of failure within a region:
This configuration mitigates single points of failure for the Unleash API server and its database within a region.
Unleash Cloud provides built-in observability dashboards, usage metrics, and system status updates via the Admin UI and a public status page.
Underlying infrastructure monitoring, alerting, and scaling are handled by Unleash SREs.
Use the Connected Edges dashboard to get observability metrics for Unleash Edge instances, such as ID, region, resource usage, and latency.
Unleash is architected with scalability, performance, and resilience in mind, designed specifically for supporting demanding enterprise workloads. With fast, local feature flag evaluations, Unleash Edge for added scalability, and flexible deployment models, organizations can achieve reliable, high-performance feature management at a global scale, enabling faster, safer software delivery.
This FAQ section addresses common questions about scaling Unleash for enterprise use, focusing on performance, high availability, and managing large user and flag volumes.
Choosing a hosting option depends on several factors, such as your organization’s tolerance for operational overhead, requirements for data residency and infrastructure control, regulatory compliance needs (such as FedRAMP), available SRE/DevOps resources and expertise, and desired speed of implementation (time-to-value).
We generally recommend Unleash Cloud for most enterprises due to reduced operational complexity, built-in high availability and disaster recovery, and managed scaling by Unleash experts. Choose self-hosted or hybrid if specific control, network isolation (air-gapped), or strict regulatory requirements mandate it.
Unleash achieves speed primarily through how feature flag evaluation works in the SDKs.
Unleash Cloud is architected on AWS for high availability and elasticity:
Unleash Edge is a lightweight, globally distributable, read-only cache for your feature flag configurations. Think of it as a specialized distribution layer for feature flags. You need it primarily for:
The main challenges include:
No, your application will not stop working. The Unleash SDKs are designed for resilience:
You can monitor the status via our public status page.
While ASGs scale the Unleash API server, Edge solves different problems:
Yes, self-hosted Unleash Edge can operate effectively in restricted or air-gapped environments. Once synced with an accessible Unleash API server (which could also be self-hosted within the restricted network), Edge can serve cached flags locally even if disconnected, synchronizing again when connectivity is restored.
Unleash Edge is the modern, high-performance, recommended successor to the legacy Unleash Proxy. Edge offers significant improvements in performance, scalability, real-time updates (via streaming in Enterprise Edge), and maintainability compared to the older Proxy component. New deployments should use Unleash Edge.
Unleash Cloud is designed for high availability:
Unleash is extremely scalable, designed to handle traffic volumes well beyond typical enterprise needs, as detailed in our recent performance test updates.
The exact capacity depends on the evaluation method: