Safe Blue Green Deployment with Durable Functions
My customer wanted a safe blue-green deployment pipeline for Durable Functions. We built the pipeline and implemented Event Grid integration for this purpose.
Problem statement
As I shared in my last post, deploying Durable Functions requires care. Durable Functions uses Storage Tables and Queues to store state and to send messages between the orchestrator and activities.
Breaking change
When we change the orchestrator code or an activity interface and deploy it, errors can occur because the layout of the messages (queue) and the history table (storage table) changes. If an activity is running during the deployment, it will stop working. During development it’s OK to delete the queue and storage account; however, if you have long-running activities, this can cause a problem. The customer wanted a safe blue-green deployment of their Durable Functions app. I’d like to share how we solved this.
In short, for a safe deployment you need to make sure that there is no ongoing activity on your Function App. You also need to take care not to overwrite the current application with the new version.
Make sure there is no running Activity
We can use deployment slots to deploy Azure Functions. Let’s walk through a Durable Functions deployment.
Imagine that you already have the Ver. 1 app on the Production slot and the Staging slot is empty. In this case, deploying Ver. 2 is no problem. Even if Ver. 1 has a running task, the Ver. 1 app keeps working after the swap, and the Ver. 2 app starts from a clean slate, so both Ver. 1 and Ver. 2 work. What happens when you then deploy Ver. 3? It could cause a problem: Ver. 3 overwrites Ver. 1 on the Staging slot, so if the Ver. 1 app has running activities, they will stop and cause an error. To avoid this, we need to make sure that there is no running activity on the Staging slot app. Unfortunately, we had no way of knowing whether there is a running activity or not.
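The sequence above can be traced with a tiny simulation. This is only an illustrative sketch: the `Slots` class and its methods are hypothetical and model nothing more than which version sits in which slot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slots:
    """Hypothetical model of a Function App with two deployment slots."""
    production: Optional[str] = None
    staging: Optional[str] = None

    def deploy_to_staging(self, version: str) -> Optional[str]:
        """Deploy to staging and return the version that was overwritten."""
        overwritten = self.staging
        self.staging = version
        return overwritten

    def swap(self) -> None:
        """Exchange the production and staging slots."""
        self.production, self.staging = self.staging, self.production


slots = Slots(production="v1")

# Deploying Ver. 2 overwrites nothing because staging was empty.
first_overwritten = slots.deploy_to_staging("v2")
slots.swap()  # production is now v2, staging is v1 (still finishing its work)

# Deploying Ver. 3 overwrites Ver. 1, which may still have running activities.
second_overwritten = slots.deploy_to_staging("v3")
print(first_overwritten, second_overwritten)
```

The second deployment is the risky one: the value it returns is the version that would be cut off mid-flight.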
Durable Functions Lifecycle notification with Event Grid
We discussed this with Chris, the author of Durable Functions, and together we implemented a new feature: lifecycle notifications for Durable Functions. The feature lets you publish orchestrator lifecycle events to Event Grid. It is currently in a pull request, but it will soon be merged into Durable Functions. Please wait a little while for this feature to be released.
To enable this feature, you need to add some settings to host.json. Add your Event Grid Topic endpoint like this.
host.json
{
  "durableTask": {
    "EventGridTopicEndpoint": "https://some-region.eventgrid.azure.net/api/events",
    "EventGridKeySettingName": "EventGridKey"
  }
}
Then add an App Setting whose name matches the value of “EventGridKeySettingName” and set it to your Event Grid Topic key.
"EventGridKey": "YOUR_EVENT_GRID_TOPIC_KEY_HERE"
Now your Event Grid Topic gets notifications from Durable Functions. Each notification includes information about the orchestrator lifecycle event, and you can filter the events by subject.
Now we are ready to implement blue-green deployment for Durable Functions.
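As a rough sketch of filtering by subject, the Python snippet below keeps only the events whose subject ends in a wanted status. The subject prefix `durable/orchestrator/`, the status names, and the `hubName` / `instanceId` fields in `data` are assumptions made for illustration; check them against the events your topic actually receives.

```python
from typing import Any, Dict, List, Optional, Set

# Assumed subject prefix; verify it against the payload your topic receives.
SUBJECT_PREFIX = "durable/orchestrator/"


def status_from_subject(subject: str) -> Optional[str]:
    """Return the status part of the subject, or None for unrelated events."""
    if not subject.startswith(SUBJECT_PREFIX):
        return None
    return subject[len(SUBJECT_PREFIX):] or None


def filter_events(events: List[Dict[str, Any]], wanted: Set[str]) -> List[Dict[str, Any]]:
    """Keep only events whose subject carries one of the wanted statuses."""
    return [e for e in events if status_from_subject(e.get("subject", "")) in wanted]


# Hypothetical sample events, shaped like generic Event Grid events.
sample = [
    {"subject": "durable/orchestrator/Running",
     "data": {"hubName": "TaskHub101", "instanceId": "abc"}},
    {"subject": "durable/orchestrator/Completed",
     "data": {"hubName": "TaskHub101", "instanceId": "abc"}},
    {"subject": "something/else", "data": {}},
]
finished = filter_events(sample, {"Completed", "Failed", "Terminated"})
print(len(finished))
```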
Architecture
We decided on a deployment architecture in which every deployment uses a new TaskHub name. For example, VSTS provides variables for each Build / Release, such as Build.BuildId (exposed as the environment variable BUILD_BUILDID), and you can use that ID in the TaskHub name. If you don’t know about TaskHub, please refer to this article. Using the new feature, you can now get events from Durable Functions.
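One way to derive a TaskHub name per build is sketched below. It assumes the build ID arrives in the `BUILD_BUILDID` environment variable and that the task hub is set through a `HubName` key under `durableTask` in host.json; both are assumptions to verify for your version. The `TaskHub` prefix and the helper names are hypothetical. The prefix is there so the name starts with a letter.

```python
import json
import os
import re


def task_hub_name(build_id: str, prefix: str = "TaskHub") -> str:
    """Build a task hub name from a build ID, keeping only letters and digits."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", build_id)
    if not cleaned:
        raise ValueError("build ID contains no usable characters")
    return prefix + cleaned


def with_task_hub(host_json: dict, hub_name: str) -> dict:
    """Return a copy of the host.json settings with the task hub name set."""
    updated = json.loads(json.dumps(host_json))  # simple deep copy
    # "HubName" is an assumed key name; check it for your Durable Functions version.
    updated.setdefault("durableTask", {})["HubName"] = hub_name
    return updated


build_id = os.environ.get("BUILD_BUILDID", "0")  # falls back to "0" outside a build
host = {"durableTask": {"EventGridKeySettingName": "EventGridKey"}}
print(json.dumps(with_task_hub(host, task_hub_name(build_id)), indent=2))
```

In a release pipeline, the printed JSON would be written over host.json before the package is deployed to the Staging slot.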
State management
We also need a state management server. To know when the running activities on the Staging slot have completed, we need to know which app version maps to which TaskHub name, and whether that TaskHub has running processes. We developed a simple state management server with Azure Functions.
The state management backend API receives notifications from Durable Functions and saves them to a Storage Table. When an orchestration finishes, you can mark it completed or just remove it. The backend API also provides endpoints that keep track of the Production / Staging TaskHub names and the state of each orchestrator instance.
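The bookkeeping described here can be sketched in a few lines. This is not the actual backend: `DeploymentState` is a hypothetical in-memory stand-in for the Storage Table, and the status names treated as finished are assumptions about the event payload.

```python
from typing import Dict, Optional, Set

# Statuses treated as "finished"; the names are assumptions about the payload.
FINISHED = {"Completed", "Failed", "Terminated"}


class DeploymentState:
    """In-memory stand-in for the Storage Table behind the backend API."""

    def __init__(self) -> None:
        self.production: Optional[str] = None
        self.staging: Optional[str] = None
        self.running: Dict[str, Set[str]] = {}  # task hub -> running instance IDs

    def record_event(self, hub: str, instance_id: str, status: str) -> None:
        """Track an orchestrator instance until it reports a finished status."""
        instances = self.running.setdefault(hub, set())
        if status in FINISHED:
            instances.discard(instance_id)
        else:
            instances.add(instance_id)

    def staging_ready(self) -> bool:
        """Staging can be overwritten when its task hub has nothing running."""
        return self.staging is None or not self.running.get(self.staging)

    def set_staging(self, hub: str) -> None:
        """Register the task hub that is about to be deployed to staging."""
        self.staging = hub

    def swap(self) -> None:
        """Mirror the slot swap by exchanging the two task hub names."""
        self.production, self.staging = self.staging, self.production

    def cleanup(self) -> None:
        """Drop data for task hubs that are in neither slot."""
        live = {self.production, self.staging}
        self.running = {h: i for h, i in self.running.items() if h in live}


state = DeploymentState()
state.set_staging("TaskHub1")
state.swap()                                    # TaskHub1 is now in production
state.record_event("TaskHub1", "a", "Running")
state.set_staging("TaskHub2")
state.swap()                                    # TaskHub1 moves to staging
ready_before = state.staging_ready()            # False: instance "a" is running
state.record_event("TaskHub1", "a", "Completed")
ready_after = state.staging_ready()             # True: nothing left running
```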
Deployment Pipeline
The deployment pipeline needs to update the state of the backend API. For a safe deployment, we start by asking the backend API whether there are running activities on the Staging slot. Here are the basic steps with VSTS.
1. VSTS -> Get Staging Status() -> Backend API -> (Ready or Not Ready)
2. VSTS -> Post Staging(taskhub name) -> Backend API: updates state -> (Ok, Error)
3. VSTS deploy to the Staging Slot
4. VSTS -> Post Swap() -> Backend API: swap the task hub name -> (Ok, Error)
5. VSTS swap slot
6. VSTS -> Cleanup() -> Backend API: remove old task hub data -> (Ok, Error)
If you get a “Not Ready” status at Step 1, just fail the pipeline. If you get an error at Step 3, investigate the root cause and re-run the pipeline. If Step 5 fails, you can swap manually; however, I’ve never experienced a swap failure. :) Finally, I recommend cleaning up the old data.
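The six steps can be sketched as a single function. Everything here is hypothetical scaffolding: `Backend` describes the calls a backend API client might offer, `FakeBackend` is a test double that only records calls, and the deploy and swap actions are passed in as plain callables rather than real VSTS tasks.

```python
from typing import Callable, List, Protocol


class Backend(Protocol):
    """Hypothetical client for the state management backend API."""
    def staging_ready(self) -> bool: ...
    def set_staging(self, hub: str) -> None: ...
    def swap(self) -> None: ...
    def cleanup(self) -> None: ...


def run_pipeline(backend: Backend, hub: str,
                 deploy_to_staging: Callable[[], None],
                 swap_slots: Callable[[], None]) -> None:
    """Run the six steps in order; any exception fails the pipeline."""
    if not backend.staging_ready():            # Step 1: Ready or Not Ready
        raise RuntimeError("staging slot still has running orchestrations")
    backend.set_staging(hub)                   # Step 2: register the new task hub
    deploy_to_staging()                        # Step 3: deploy to the Staging slot
    backend.swap()                             # Step 4: swap the task hub names
    swap_slots()                               # Step 5: swap the slots
    backend.cleanup()                          # Step 6: remove old task hub data


class FakeBackend:
    """Test double that records calls instead of talking to a real API."""

    def __init__(self, ready: bool) -> None:
        self.ready = ready
        self.calls: List[str] = []

    def staging_ready(self) -> bool:
        self.calls.append("status")
        return self.ready

    def set_staging(self, hub: str) -> None:
        self.calls.append(f"staging:{hub}")

    def swap(self) -> None:
        self.calls.append("swap")

    def cleanup(self) -> None:
        self.calls.append("cleanup")


backend = FakeBackend(ready=True)
run_pipeline(backend, "TaskHub102",
             deploy_to_staging=lambda: backend.calls.append("deploy"),
             swap_slots=lambda: backend.calls.append("swap-slots"))
print(backend.calls)
```

Failing fast at Step 1 means a "Not Ready" answer stops the run before anything on the Staging slot is touched.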
Durable Functions Safe Deployment extension and Backend API
For anyone who wants to use this strategy, we are developing a VSTS extension. We are also planning to publish the backend API code, written with Azure Functions, so it can be shared. These will help you implement this strategy quite easily. We are still working on them, so please wait a little while. In the meantime, you can refer to my GitHub. It does not implement everything yet, but with a little extra code you can finish it.
Another Strategy
My colleague Brandon, the Durable Functions guy, suggested an interesting idea: just use API Management to switch the endpoint, and add a new Function App instance each time you deploy a new version. If you have enough Function Apps for the long-running processes to finish, you don’t need to manage the state.
It also works! You can choose either one, but there is a trade-off. If you can afford API Management and several Function Apps and prefer a simple architecture, go with the second one. If you want a cheaper and safer way, choose the first one. You also need to consider how frequently you deploy. The customer needs to handle long-running activities with distributed transaction handling in Durable Functions, and they don’t deploy frequently once in production. In this case, the first approach might fit. If you want frequent and safe deployments, you can increase the number of slots with the first strategy, or go with the second strategy. It’s up to you.
Special Thanks
This achievement is not mine alone; it belongs to a lot of cool developers, including our customer, Sigma Consulting! They are super technical guys. We held a hackfest together, and all of them are MVPs with great ideas and very deep knowledge of Durable Functions. I’ll write a case study post in a couple of weeks covering the other hack elements. :)
Thank you, hackers. You are highly technical, and I was very excited to hack with you. I’m lucky to work with you all! Let’s hack together again!
Sigma Consulting developers!
- Yuka Abuno (yu_ka1984)
- Hiroyuki Kinoshita (kingkino@マンダム)
- Kazunori Hamamoto (twitter: @airish9 )
Thank you to the MSFT guys
- Chris Gillum
- Brandon Hurlburt (twitter: @BrandonH_MSFT)
- Kanio Dimitrov (twitter: @azurekanio )
- Ruka Sakurai