Experienced Devs


How much flakiness do you tolerate in end to end tests? (programming.dev)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

22 comments

End to end and smoke tests give a really valuable angle on what the app is doing and can warn you about failures before they happen. However, because they're working with a live app and a live database over a live network, they can introduce a lot of flakiness. Beyond just changes to the app, different data in the environment or other issues can cause a smoke test failure.

How do you handle the inherent flakiness of testing against a live app?

When do you run smokes? On every phoenix branch? Pre-prod? Prod only?

Who fixes the issues that the smokes find?


[–] [email protected] 2 points 1 year ago (3 children)

My experience with E2E testing is that the tools and methods necessary to test a complex app are flaky. Waits, checks for text or selectors, and custom form field navigation all need careful balancing to make the test effective. On top of this, E2E tests are frequently sequential, which compounds these failures: you're at the mercy not just of the worst test, but of the product of every test's pass rate in the sequence.
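
To put rough numbers on that compounding: if each test in a sequence fails independently (an assumption; real failures often correlate), the chance that the whole run passes is the product of the individual pass rates. The 20 tests at 99% below are illustrative figures, not measurements.

```python
from math import prod

def suite_pass_probability(pass_rates):
    """Chance that every test in a sequential run passes, assuming each
    test fails independently with its own per-run pass rate."""
    return prod(pass_rates)

# 20 tests that each pass 99% of the time on their own
p = suite_pass_probability([0.99] * 20)
print(f"{p:.3f}")  # 0.818
```

So a suite of individually "reliable" tests can still go red on roughly one run in five.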

I agree that the tests cause less flakiness in the app itself, but I have found smokes inherently flaky in a way that unit and integration tests are not.

[–] [email protected] 2 points 1 year ago

Okay, I must admit that I don't have much experience with smoke and integration tests. We run only end-to-end tests and skip the other two types entirely, since the end-to-end tests would cover them anyway.

Perhaps I am lucky in that our software doesn't require many waits at all. Most things are synchronous, and those that aren't mostly have API endpoints where the status of the process can be safely queried. So instead of a single wait(1000) and hoping for the best, we can repeat wait(1000) until isFinished() returns true.
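
A minimal sketch of the "wait until isFinished()" pattern as a generic polling helper with a timeout. The name `wait_until` and its defaults are made up for illustration, and the status client in the usage comment is hypothetical.

```python
import time

def wait_until(condition, timeout=30.0, interval=1.0):
    """Poll `condition` every `interval` seconds until it returns True.

    Raises TimeoutError if it is still False after `timeout` seconds,
    so a stuck process fails the test instead of hanging it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)

# Usage, with a hypothetical status client standing in for the real endpoint:
# wait_until(lambda: client.is_finished(job_id), timeout=60, interval=1)
```

The timeout matters as much as the polling: without it, a process that never finishes turns into a hung pipeline rather than a clear failure.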

And yes, for us it is also a mess of errors when one step in a pipeline fails and many tests rely on that single step. I don't know whether there is a neat way to approach this issue. Surely there is a market opportunity there.

[–] [email protected] 1 points 1 year ago (2 children)

I'm a fan of randomizing the test order. That helps catch ordering issues early.
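
A minimal sketch of randomized ordering that stays replayable: pick a seed, shuffle with it, and log it so a failing order can be rerun exactly. The function and test names here are hypothetical, not from any particular test runner.

```python
import random

def shuffled_order(tests, seed=None):
    """Return the tests in a random order, plus the seed used.

    Logging the seed lets a failing order be replayed exactly by
    passing the same seed back in.
    """
    if seed is None:
        seed = random.randrange(2**32)
    order = list(tests)
    random.Random(seed).shuffle(order)  # private RNG, global state untouched
    return order, seed

order, seed = shuffled_order(["test_login", "test_checkout", "test_refund"])
print(f"test order seed: {seed}")  # rerun with shuffled_order(tests, seed)
```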

Also, it's usually valuable to make E2E tests as independent as possible, so that one can't affect another. Have each one spin up the whole system, even though it takes longer. Use more parallelism: run the tests across dozens of VMs, each running a fraction of them, rather than trying to get the sequential time down.

[–] [email protected] 1 points 1 year ago

Wherever possible, this is a good idea. The campsite rule - tests don't touch data they didn't bring with them - helps as well.
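
A minimal sketch of the campsite rule as a context manager: the test creates data only it knows about and removes it afterwards, even on failure. The `store` dict and the "customer" entity are hypothetical stand-ins for whatever API or database the real test would hit.

```python
import uuid
from contextlib import contextmanager

@contextmanager
def own_customer(store):
    """Create a customer only this test uses, then delete it afterwards."""
    customer_id = f"e2e-{uuid.uuid4()}"  # unique, so no other test can collide
    store[customer_id] = {"name": "E2E Test Customer"}
    try:
        yield customer_id
    finally:
        store.pop(customer_id, None)  # clean up even if the test fails

store = {}
with own_customer(store) as cid:
    assert cid in store  # the test works only with data it brought
assert store == {}       # and leaves nothing behind
```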

However, many end-to-end tests exist as a pipeline, especially for entities that are core to the app's business function. Cramming all that sequentiality into single tests gives you all the problems described, but inside one giant test whose result you then have to fish out.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

The problem with randomising the test order is that it compromises the reproducibility of results. If there are ordering issues, then your tests will sometimes fail and sometimes pass, but will developers look at that and think "ah there must be an ordering issue" or will they think "damn these flaky tests, guess I'd better rerun the pipeline"?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

End-to-end tests are basically non-deterministic state machines. Flakiness can come from any point in the test: bad tests, bad state management, conflicting tests, network hiccups, etc.

Your goal is to reduce flakiness at every one of those points, and to keep track of it as you go. Sometimes flakiness in tests is really pointing at flakiness in the product itself.
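
A minimal sketch of "keeping track": flag tests that both passed and failed on the same commit. It assumes CI results can be exported as (commit, test name, passed) records; that shape and the sample data are made up for illustration.

```python
from collections import defaultdict

def flaky_tests(results):
    """Find tests that both passed and failed on the same commit.

    `results` is an iterable of (commit, test_name, passed) tuples.
    Same code with a different outcome is the signature of flakiness,
    whether in the test or in the product.
    """
    outcomes = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})

runs = [
    ("abc123", "test_checkout", True),
    ("abc123", "test_checkout", False),
    ("abc123", "test_login", True),
    ("abc123", "test_login", True),
]
print(flaky_tests(runs))  # ['test_checkout']
```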

Some things that can help reduce that flakiness:

- Dedicated network
- No external dependencies
- Polling instead of static waits/sleeps
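
For "no external dependencies", a minimal sketch of swapping a third-party API for an in-process fake behind a small interface. `RatesClient`, `FakeRatesClient` and `price_in` are hypothetical names. The trade-off is that the real API is no longer exercised by these tests.

```python
from typing import Protocol

class RatesClient(Protocol):
    def get_rate(self, currency: str) -> float: ...

class FakeRatesClient:
    """In-process stand-in for a third-party API: fixed answers, no network."""

    def __init__(self, rates: dict[str, float]):
        self.rates = rates

    def get_rate(self, currency: str) -> float:
        return self.rates[currency]

def price_in(amount_usd: float, currency: str, client: RatesClient) -> float:
    """App code depends only on the interface, so tests can inject the fake."""
    return round(amount_usd * client.get_rate(currency), 2)

print(price_in(10.0, "EUR", FakeRatesClient({"EUR": 0.5})))  # 5.0
```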

[–] [email protected] 1 points 1 year ago

Polling is certainly useful, but past a certain point, making the tests more reliable makes them less effective. I certainly want to know if the app is unreachable over the open internet, and I absolutely need to know if a partner's API is down.