The dual-cluster upgrade strategy is a Kong Gateway upgrade option used primarily for traditional mode deployments and for control planes in hybrid mode.
This guide refers to the old version as cluster X and the new version as cluster Y.
With a dual-cluster upgrade, you deploy a new cluster running version Y alongside the current cluster running version X, so that both clusters serve requests concurrently during the upgrade process. You then gradually adjust the traffic ratio between the two clusters, shifting traffic from the old cluster to the new one based on your business metrics.
flowchart TD
    DBX[(Current database)]
    DBY[(New database)]
    CPX(Current Kong Gateway X)
    Admin(No admin write operations)
    Admin2(No admin write operations)
    CPY(New Kong Gateway Y)
    LB(Load balancer)
    API(API requests)

    API --> LB & LB & LB & LB
    Admin2 -."X".- CPX
    LB -.90%.-> CPX
    LB --10%--> CPY
    Admin -."X".- CPY
    CPX -.-> DBX
    CPY --pg_restore--> DBY

    style DBX stroke-dasharray:3 !important
    style CPX stroke-dasharray:3 !important
    style Admin fill:none!important,stroke:none!important,color:#d44324 !important
    style Admin2 fill:none!important,stroke:none!important,color:#d44324 !important
    linkStyle 4,7 stroke:#d44324 !important,color:#d44324 !important
    linkStyle 3,6,9 stroke:#b6d7a8 !important
Figure 1: The diagram shows a Kong Gateway upgrade using the dual-cluster strategy. The new Kong Gateway cluster Y is deployed alongside the current Kong Gateway cluster X. A new database serves the new deployment. Traffic is gradually switched over to the new deployment, until all API traffic is migrated.
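The gradual traffic switch described above can be sketched as a small ramp-up routine. This is an illustrative sketch only, not part of Kong Gateway: the step percentages in `RAMP_STEPS`, the `max_error_rate` threshold, and the function names are all hypothetical, and the error rate stands in for whatever business metric you actually monitor. Applying the resulting weights to your load balancer is left out because it depends on the load balancer you use.

```python
# Hypothetical ramp schedule: percentage of traffic sent to cluster Y at each step.
RAMP_STEPS = [10, 25, 50, 75, 100]


def next_weight(current: int, error_rate_y: float, max_error_rate: float = 0.01) -> int:
    """Return the next percentage of traffic to send to cluster Y.

    If the observed error rate on cluster Y exceeds the (assumed) threshold,
    send all traffic back to cluster X by returning 0.
    """
    if error_rate_y > max_error_rate:
        return 0  # roll back: cluster X serves everything again
    for step in RAMP_STEPS:
        if step > current:
            return step  # advance to the next step in the schedule
    return 100  # already fully migrated


def split(weight_y: int) -> dict:
    """Translate the cluster Y percentage into weights for both clusters."""
    return {"X": 100 - weight_y, "Y": weight_y}
```

For example, `split(next_weight(10, 0.001))` moves from the 90/10 split shown in Figure 1 to a 75/25 split, while an error rate above the threshold returns all traffic to cluster X.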
This upgrade strategy is the safest of all available strategies and ensures that there is no planned business downtime during the upgrade process.
This method has limitations on automatically generated runtime metrics that rely on the database. During the upgrade, some runtime metrics (for example, the number of requests) are sent to two databases separately. Since the metrics between the databases are not synced, metrics will not be accurate for the duration of the upgrade.
For example, if the Rate Limiting Advanced plugin is configured to store request counters in the database, the counters between database X and database Y are not synchronized. The impact scope depends on the window_size parameter of the plugin and the duration of the upgrade process.
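To illustrate why unsynchronized counters matter, the following sketch models two clusters that each enforce the same limit against their own counter. It is a deliberately simplified model under stated assumptions: a single fixed window, a deterministic traffic split, and a hypothetical `allowed_requests` helper. It does not reproduce the plugin's actual counting algorithm.

```python
def allowed_requests(total_requests: int, limit: int, percent_to_y: int) -> int:
    """Count requests allowed within one window when clusters X and Y
    keep separate, unsynchronized counters (simplified model)."""
    counters = {"X": 0, "Y": 0}  # one independent counter per database
    allowed = 0
    for i in range(total_requests):
        # Deterministic split: out of every 100 requests,
        # the first `percent_to_y` go to cluster Y, the rest to cluster X.
        cluster = "Y" if (i % 100) < percent_to_y else "X"
        if counters[cluster] < limit:
            counters[cluster] += 1
            allowed += 1
    return allowed


# One cluster only: the limit of 100 is enforced as configured.
print(allowed_requests(300, limit=100, percent_to_y=0))   # 100
# 90/10 split: X allows 100 and Y allows its 30 requests.
print(allowed_requests(300, limit=100, percent_to_y=10))  # 130
# 50/50 split: each cluster allows 100, doubling the effective limit.
print(allowed_requests(300, limit=100, percent_to_y=50))  # 200
```

In this model, a client can exceed the configured limit during the upgrade by up to a factor of two within a window, which is why a longer window or a longer upgrade widens the impact.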
The same limitation applies to Vitals if a large volume of metrics is buffered in PostgreSQL or Cassandra.