Solving Long, flaky testing with Azure DevOps

6 min read · Mar 3, 2019

I contribute to an open source project. The project is awesome, but we have a testing problem. The integration tests take a long time to run, and some of them are flaky. People gradually stopped running the tests because they are long and flaky, so I decided to solve this problem. The configuration turned out to be simpler than I expected, and I’d like to share what I learned.

Enabling Storage Emulator

We need to enable the Storage Emulator on the Hosted VS2017 agent. Simply create a Command Line task and run this script inline.

sqllocaldb create MSSQLLocalDB
sqllocaldb start MSSQLLocalDB
sqllocaldb info MSSQLLocalDB
"C:\Program Files (x86)\Microsoft SDKs\Azure\Storage Emulator\AzureStorageEmulator.exe" start

Configure Environment Variables

Azure Pipelines lets you define variables. However, they are NOT environment variables. Let’s turn them into environment variables. The variables are available inside a task through the $(VariableName) syntax, so create a PowerShell task that runs the following command to expose one as an environment variable.

[Environment]::SetEnvironmentVariable("DurableTaskTestStorageConnectionString", "$(DurableTaskTestStorageConnectionString)")

Then set the value on the Variables tab of the pipeline.

CA0068 Error

We encountered this error. It happens because FxCop can’t find the .pdb file.

##[error]CA0068 : Debug information could not be found for target assembly 'DurableTask.Core.dll'. For best analysis results, include the .pdb file with debug information for 'DurableTask.Core.dll' in the same directory as the target assembly.

When we searched the project, we found the setting in a props file. For now, we simply skip the FxCop code analysis by setting the build Configuration to Debug.

<!-- Code Analysis Settings -->
<RunCodeAnalysis>True</RunCodeAnalysis>
<RunCodeAnalysis Condition=" '$(Configuration)' == 'Debug' ">False</RunCodeAnalysis>

FxCop code analysis and FxCop analyzers - Visual Studio

FxCop analyzers are based on the .NET Compiler Platform ("Roslyn"). You install them as a NuGet package that's…

docs.microsoft.com

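You can also stop this error by adding `/p:DebugType=pdbonly` to the MSBuild arguments of your Visual Studio Build task. However, that eventually leads to a strong-name signing problem (see “How to: Sign an Assembly with a Strong Name” below). Solving it would take time, so for now I keep using the Debug configuration in this CI.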

Common MSBuild Project Properties - Visual Studio

Specifies the path of the file that is used to generate external User Account Control (UAC) manifest information…

docs.microsoft.com

How to: Sign an Assembly with a Strong Name

There are a number of ways to sign an assembly with a strong name:

docs.microsoft.com

Update the Test Adapter

We encountered this error.

An exception occurred while invoking executor 'executor://mstestadapter/v2': Object '/0f369e97_078e_49a1_8dab_5e11d1ca83d6/fp+dq7nedbbafpjqlevc08sn_19.rem' has been disconnected or does not exist at the server.

According to the issue below, we can solve this by upgrading the NuGet packages MSTest.TestAdapter and MSTest.TestFramework to at least version 1.2.0 in your test project’s csproj file.

<PackageReference Include="MSTest.TestAdapter" Version="1.4.0" />
<PackageReference Include="MSTest.TestFramework" Version="1.4.0" />

Run tests fail intermittently with a disconnected from server exception. · Issue #28 ·…

Description Running a large set of tests sometimes throws the following exception from the Test Platform: "Error: An…

github.com

The first try

Unfortunately, the first run was canceled because it took over 30 minutes. By default, a job on the Hosted agent is canceled after 30 minutes. If you upgrade the plan, a job can run for up to 6 hours. I switched to a Self-Hosted agent running on my machine to test the pipeline.

The Hosted agent cancels the task after 30 minutes by default

Solving error on Self-Hosted agent

Once I changed the agent, I got this error. I hadn’t seen it on the Hosted agent.

Incorrect format for TestCaseFilter Missing Operator '|' or '&'. Specify the correct format and try again. Note that the incorrect format can lead to no test getting executed.

This error happens on the Self-Hosted agent. To prevent it, we can configure the batch size on the test task. You need to enter a number large enough to cover the number of test cases.

But why? The reason is that the private (Self-Hosted) agent uses the VsTest.Console.exe command, and the command doesn’t parse the test filter correctly. If you enable batching, Azure DevOps starts to use the API instead of the command, and the API parses the filter successfully. This is just a workaround.

Rerun of flaky tests failed · Issue #7037 · Microsoft/azure-pipelines-tasks

Environment Server - TFS 2018.2rc on-premises Agent - Private, 2.131.0 on Windows Server 2016 Issue Description Task …

github.com

Microsoft/vstest-docs

Documentation for the Visual Studio Test Platform. - Microsoft/vstest-docs

github.com

VSTest.Console.exe command-line options

VSTest.Console.exe is the command-line command that is used to run tests. You can specify several options in any order…

docs.microsoft.com

Microsoft/vstest

Visual Studio Test Platform is the runner and engine that powers test explorer and vstest.console. - Microsoft/vstest

github.com

Rerun option for flaky testing

Dealing with the flaky tests is quite easy. Just configure the rerun option on the Test task.

In our case, the tests are scenario tests for complex concurrency behavior. They can fail by chance, but if we retry them several times, they eventually pass.

You can find the flaky tests in the test results by using the Passed on rerun checkbox.
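To make the rerun idea concrete, here is a minimal Python sketch of what a rerun policy does conceptually. It is not how the Test task is implemented. The names `run_with_rerun`, `make_flaky_test`, and `max_attempts` are hypothetical, and the "flakiness" is simulated by a fake test that fails a fixed number of times before passing.

```python
from typing import Callable, Tuple


def run_with_rerun(test: Callable[[], None], max_attempts: int = 3) -> Tuple[str, int]:
    """Run `test` up to `max_attempts` times (hypothetical helper).

    Returns (outcome, attempts_used), where outcome is "passed",
    "passed_on_rerun", or "failed".
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except AssertionError:
            continue  # this attempt failed; retry if attempts remain
        return ("passed" if attempt == 1 else "passed_on_rerun", attempt)
    return ("failed", max_attempts)


def make_flaky_test(failures_before_success: int) -> Callable[[], None]:
    """Build a fake test that fails a fixed number of times, then passes."""
    state = {"calls": 0}

    def test() -> None:
        state["calls"] += 1
        assert state["calls"] > failures_before_success, "simulated race condition"

    return test


if __name__ == "__main__":
    print(run_with_rerun(make_flaky_test(0)))  # stable test
    print(run_with_rerun(make_flaky_test(2)))  # flaky test that passes on the third try
    print(run_with_rerun(make_flaky_test(5)))  # test that never passes within 3 attempts
```

The point of the three outcomes is that a test which only passes after a retry is still reported separately, so the flakiness stays visible instead of being silently hidden.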

Long running test issue

Now our pipeline works! However, it takes too long. The tests alone took more than 22 minutes, and that was on my PC, a Surface Book 2, which is a very powerful dedicated machine. The Hosted agent is a Standard_DS2_v2. How can we improve the execution time?

Microsoft-hosted agents for Azure Pipelines - Azure Pipelines

Learn about using the Microsoft-hosted agents provided in Azure Pipelines

docs.microsoft.com

Parallel testing with multiple agents

Azure DevOps has a feature for parallel testing with multiple agents. Let’s configure that.

Yes! It works! You can see three agents working. This feature starts several agents at the same time and splits the tests among them, so each agent runs only its own share. For a public Azure DevOps project, we can use up to 10 parallel agents. For a private project, you can buy additional agents.

It works! However, going from 22m (private PC) to 15m (3 Hosted agents) is not a big improvement. Hmm. Let’s try 5 agents, with more tests added.

Hmm. Not a big difference.

Change the algorithm

When I observed the execution, I noticed one thing. The agents work properly, but 4 of them had already finished while only one kept on working. The execution time differs a lot from test to test, so the load is uneven!

I noticed that we can change the balancing algorithm. The default algorithm simply splits the test cases equally by number. The “Based on past running time of tests” algorithm decides the allocation from the past execution time of each test. How clever!
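Here is a small Python sketch of why the two strategies behave so differently. It is only an illustration under my own assumptions: the durations are made-up numbers, the function names are hypothetical, and the greedy "longest test first" allocation is a simple stand-in, not the actual algorithm Azure DevOps uses.

```python
from typing import Dict, List

# Made-up past durations in minutes (assumption for illustration only).
SAMPLE_DURATIONS: Dict[str, int] = {"A": 10, "B": 1, "C": 1, "D": 8, "E": 1, "F": 1}


def split_by_count(durations: Dict[str, int], agents: int) -> List[List[str]]:
    """Give every agent the same number of tests (round-robin), ignoring time."""
    slices: List[List[str]] = [[] for _ in range(agents)]
    for index, name in enumerate(durations):
        slices[index % agents].append(name)
    return slices


def split_by_time(durations: Dict[str, int], agents: int) -> List[List[str]]:
    """Greedy stand-in: longest test first, always onto the least-loaded agent."""
    slices: List[List[str]] = [[] for _ in range(agents)]
    loads = [0] * agents
    for name in sorted(durations, key=lambda n: durations[n], reverse=True):
        target = loads.index(min(loads))  # agent with the least work so far
        slices[target].append(name)
        loads[target] += durations[name]
    return slices


def wall_clock(slices: List[List[str]], durations: Dict[str, int]) -> int:
    """The run finishes when the slowest agent finishes."""
    return max(sum(durations[name] for name in agent) for agent in slices)


if __name__ == "__main__":
    by_count = split_by_count(SAMPLE_DURATIONS, 3)
    by_time = split_by_time(SAMPLE_DURATIONS, 3)
    print(by_count, wall_clock(by_count, SAMPLE_DURATIONS))
    print(by_time, wall_clock(by_time, SAMPLE_DURATIONS))
```

With this sample data, the equal-count split happens to put both slow tests on the same agent, so that agent keeps working long after the others are done. The time-based split separates the slow tests and packs the short ones together.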

Let’s try 5 agents with this algorithm.

6 minutes! Amazing!

Conclusion

Flaky tests and long-running tests are two big issues in CI. It is said that a proper CI run should take less than 10 minutes to give useful feedback. If it takes longer than that, people give up waiting for the tests to finish.

If I had built this mechanism by myself, it might have taken a very long time to achieve this. I was really surprised at how easy it was. I hope this helps.

Resources

Test in parallel

This page explains the concept, with samples for .NET, JavaScript, and Python.

Run any tests in parallel - Azure Pipelines

Continuous testing. Speed up testing by running tests in parallel using Visual Studio Test task.

docs.microsoft.com

For more detail on the VS Test task (.NET)

Run any tests in parallel - Azure Pipelines

Continuous testing. Speed up testing by running tests in parallel using Visual Studio Test task.

docs.microsoft.com

Flaky test

A very good read about flaky tests. This is an advanced case study by Microsoft.

Eliminating Flaky Tests - Azure DevOps

Eliminating Flaky Tests The next part of ensuring 'Master is Shippable' was eliminating flaky tests. A test is flaky…

docs.microsoft.com

Written by Tsuyoshi Ushio
