How bad is e2e test performance really? (rtl vs cypress vs playwright vs testcafe)

Andreas Opferkuch
5 min read · Oct 29, 2021


Update 2021-11-02
Big thanks to Pavel Feldman (see comments) for pointing out that if the tests within a playwright suite don’t rely on shared state, one can use describe.parallel to improve the performance!

Before sharing my results with describe.parallel, I need to point out that my benchmark results with playwright varied. The results in the original article, great as they are, are actually the worst case. Of course, I rebooted multiple times and reinstalled dependencies before finishing the article. But the numbers stayed “high”, so I thought the lower ones I initially got were a fluke.

But the lower numbers are back, even though I haven’t reinstalled dependencies. Right now, for the 100x10 short tests, I get 129s instead of 172s.

Using describe.parallel, this decreases further, to 109s.
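To put the three run times quoted above side by side, here is a small sketch of the arithmetic. The helper name reductionPercent is made up for this illustration and is not part of playwright or any other tool; the inputs are simply the measurements from this update.

```typescript
// Relative reduction in run time, in percent, rounded to one decimal place.
// `reductionPercent` is a hypothetical helper, used for illustration only.
function reductionPercent(beforeSeconds: number, afterSeconds: number): number {
  if (beforeSeconds <= 0) {
    throw new Error("beforeSeconds must be positive");
  }
  const ratio = (beforeSeconds - afterSeconds) / beforeSeconds;
  return Math.round(ratio * 1000) / 10;
}

// Measurements quoted in this update (100x10 short tests, playwright).
const worstCaseSeconds = 172; // number from the original article
const rerunSeconds = 129; // same suite, measured again
const parallelSeconds = 109; // with describe.parallel

console.log(reductionPercent(worstCaseSeconds, rerunSeconds));
console.log(reductionPercent(rerunSeconds, parallelSeconds));
console.log(reductionPercent(worstCaseSeconds, parallelSeconds));
```

This prints 25, 15.5 and 36.6. In other words, describe.parallel accounts for roughly a 15% improvement over the 129s run, while the gap to the original worst case is closer to 37%.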

A few weeks ago, a lead frontend engineer at a company told me that he had started questioning the viability of react-testing-library because jsdom is slow.

I checked grafana’s CI and didn’t see that slowness (6248 tests done in 74s). But the conversation also reminded me of the age-old “e2e tests are slow”, and I asked myself whether that could be a perception that’s either wrong or at least outdated. I did stumble across an existing benchmark, but I really wanted to compare against react-testing-library to find out whether the traditional testing pyramid still makes sense from the point of view of performance. (I was also curious about testcafe. There’s a section about why below.)

It depends

These are the results of my experiments with verdaccio, run on an i7-10875H machine with 64 GB RAM. (You can check out my branch here: https://github.com/s-h-a-d-o-w/verdaccio/tree/e2e-experiments)

I run the same simple test many times with cypress, playwright, testcafe and react-testing-library. It tries to be close to reality by mocking a large data set (which admittedly doesn’t really impact rendering, since verdaccio uses react-virtualized) as well as opening and closing a dialog that is animated.

[Chart: test execution time (raw data)]

I’ll leave it up to you to judge the results but one thing needs to be addressed…

What’s going on with cypress?

Strictly speaking, cypress is not particularly slow. It just doesn’t support running tests in parallel on the same machine. So as soon as there is more than one test file, it’s bound to be slower in a single-machine benchmark like mine.

In an age of machines with at least 8 cores and 32 GB RAM (and even on CI, having multiple cores and a decent amount of RAM is common these days), that may seem like a weird architectural decision. But this limitation is one of the major ways cypress gets people to use its cloud service. (Although there is an alternative: https://github.com/sorry-cypress/sorry-cypress )
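The single-machine effect described above can be sketched with a very simplified scheduling model. This is an illustration under stated assumptions, not how any of these tools actually schedule work: estimateWallClock is a hypothetical helper, every file is given a made-up duration of 1 time unit, and startup cost and contention between workers are ignored.

```typescript
// Estimate wall-clock time when test files are spread across parallel workers.
// Assumptions: no startup cost, no contention, each file runs on one worker.
// `estimateWallClock` is a hypothetical helper, used for illustration only.
function estimateWallClock(fileDurations: number[], workers: number): number {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new Error("workers must be a positive integer");
  }
  // Accumulated busy time per worker.
  const load = new Array<number>(workers).fill(0);
  for (const duration of fileDurations) {
    // Hand the next file to whichever worker becomes free first.
    let freeIndex = 0;
    for (let i = 1; i < load.length; i++) {
      if (load[i] < load[freeIndex]) {
        freeIndex = i;
      }
    }
    load[freeIndex] += duration;
  }
  // The run is finished when the busiest worker is finished.
  return Math.max(...load);
}

// 100 test files, each taking a made-up 1 time unit.
const files = new Array<number>(100).fill(1);

console.log(estimateWallClock(files, 1)); // one file at a time
console.log(estimateWallClock(files, 8)); // eight files at a time
```

With these made-up inputs, the model gives 100 units for a single worker and 13 units for eight workers. Real runs won’t scale this cleanly, but it shows why a runner that handles one file at a time falls behind as soon as a suite is split into many files.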

Playwright has no such limitation, but it provides little interactivity: there is no watching of test code out of the box (although there is a 3rd party tool called playwright-watch that mostly just uses chokidar to simplify the workflow), and its trace viewer doesn’t refresh automatically on repeated test failures. I imagine this makes creating complex e2e tests somewhat inconvenient.

Personally, I probably wouldn’t run all e2e tests locally on a regular basis anyway. While it is nice to know that I can if I want to (without needing to go for lunch), it is probably also nice to be able to run individual tests easily through a UI. On the other hand, there is a bit of config overhead with cypress. I think it’s nice that with playwright, you can simply use e.g. a 16-core CI node and call it a day.

Hey, this article is supposed to be about performance, not the overall DX of these tools. But I feel a responsibility to prevent possible kneejerk reactions to the results I show here. 😅

I hope I’ve provided enough context. Let’s move on…

A more complex scenario

By “more complex”, I mean that more operations happen within a single test, as tends to be the case in e2e tests. I did this by simply running the operations and assertions from the test in the repo 20 times in a loop.
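The repetition described above could look roughly like the following sketch. The names repeatBody and openAndCloseDialog are hypothetical; the actual operations and assertions live in the linked branch and are not reproduced here, so a simple counter stands in for them.

```typescript
// A test body is modelled as an async function, since e2e steps are awaited.
type TestBody = () => Promise<void>;

// Run the same body several times inside one test, strictly one after another.
// `repeatBody` is a hypothetical helper, used for illustration only.
async function repeatBody(body: TestBody, times: number): Promise<void> {
  for (let i = 0; i < times; i++) {
    // Awaiting each pass keeps the iterations sequential.
    await body();
  }
}

// Stand-in for the real operations and assertions (assumption: the real
// body opens and closes the animated dialog; here it only counts passes).
let passes = 0;
const openAndCloseDialog: TestBody = async () => {
  passes += 1;
};

repeatBody(openAndCloseDialog, 20).then(() => console.log(passes));
```

The point of the design is that the per-test setup cost is paid once, while the operations themselves run 20 times within the same test.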

As one might expect, the difference between jsdom and a real browser is a lot smaller here.

Why cypress, playwright and… testcafe?

Cypress and playwright are probably obvious.

Testcafe is apparently not popular in the JS community (but note that it has the highest number of write-ins among “other tools” at the bottom of the State of JS 2020 survey), yet it appears to be the only reasonable solution for running tests on device farms (based on my research; opinions may differ). I don’t know what happened: a few years ago, I would hear and read over and over again how important it is to run tests on real devices. Not in recent years, though. But having experienced issues with a number of web apps on my mobile devices, simply as a user, it doesn’t seem to me like it’s suddenly not important any more.

Closing words

If e2e tests truly were slow in the past, it looks to me like that’s not really the case any more, at least if you set them up well. As these benchmarks show, it’s key to not have too many (or too few) operations within a single test suite and to parallelize heavily, whether that’s by using appropriate threading arguments on a single machine or, in the case of cypress, by using their slightly more involved multi-node process.

While 3–4x lower performance is significant, it’s probably negligible in the grand scheme of things, since even if you now were to rely more heavily on e2e tests, you’ll probably still have a lot fewer of them than unit tests.

That’s especially true because, unlike with unit tests, one can’t run only those e2e tests that relate to code changes in a pre-commit hook. At least if you’re like me, you like to fail early and have plenty of tests that make this possible. That said, if a team doesn’t like git hooks and wants to rely exclusively on CI, I now see even less reason not to be enthusiastic about broadening the testing pyramid at the top.
