In this series, we’re discussing basic to advanced techniques for writing fast, efficient, and focused unit tests in the Django web framework. Let’s start with some background.
One of the most important aspects of working on a software team is testing your code. During a big sprint toward a production release on a recent project, my web development team decided to get really serious about testing. After all, we now had code in production. We had to make absolutely sure that future releases fixed bugs instead of introducing them.
We immediately began to focus on beefing up our existing test coverage with thorough tests. But as our coverage increased, something we hadn’t expected started to happen: Our test suite was becoming slow. This was a minor annoyance at first, but then (and even more unexpectedly) it started to become a problem.
Why should I be concerned about unit test speed?
If you’re serious about agile development and agile software testing, then you’re serious about at least two things:
- Having lots of tests for ultimate coverage
- Running those tests frequently
Even on a modestly sized project, a thorough test suite considered “slow” might take an hour or more to run. Unfortunately, this encourages Django developers to avoid running tests frequently, since doing so occupies so much time. And can you blame them? Who wants to spend an hour running tests for a tiny five-minute change that probably didn’t affect the rest of the application?
Slow tests slow down integration. There are few things worse than running a monstrous and slow set of tests on your shiny new code only to find out you made a simple mistake… and you need to run those same tests again after you fix it. If you end up having to run a test for the third time, you’ll probably start freshening up your resume while you wait.
Slow tests also slow down deployment. If your practice involves running tests before deploying code (which it should) you’ll be at the mercy of your slow test suite. This can be particularly problematic in audience-facing and/or production hotfix situations.
What is a unit test?
One of the first considerations in speeding up your tests is identifying what a unit test actually is. Looking around an established project or the Django test suite, you’ll probably encounter two types of tests: unit tests and integration tests. Unit tests are small and nimble – they should test a very small unit of code, such as a single method. Integration tests, on the other hand, are large. They tend to test an application from invocation to response, and lots of methods in between.
A good unit test might look like:
```python
from django.test import SimpleTestCase

class TestCase(SimpleTestCase):
    def test_foo_is_true(self):
        from myapp import foo
        # Assert foo is True for 'bar' and False for anything else
        self.assertTrue(foo('bar'))
        self.assertFalse(foo('something_else'))
```
Where foo’s functionality involves a minimal number of other calls:
```python
def foo(s):
    if s == 'bar':
        return True
    return False
```
An integration test typically tests lots of methods (and probably needs lots of setup data or fixtures too). A dead giveaway for an integration test in Django is use of the Django test client:
```python
from django.test import SimpleTestCase
from django.test.client import Client
from django.urls import reverse

class TestCase(SimpleTestCase):
    def test_myview(self):
        c = Client()
        response = c.get(reverse('my_view'))
        # ...
```
Lots of things may happen in this view, so you may actually end up testing a large amount of code as opposed to a small unit. Further, since the test client is being used, you’re also testing the mechanics of the Django framework itself: URL routing, request middleware, the view, response middleware, and so on.
It’s important to make this distinction because, even though we are discussing unit tests, one of the first problems developers run into with a slow test suite is simply this: they haven’t written very many unit tests to begin with. Most of the test suite is composed of bulky integration tests that cover the same functionality over and over.
I’m convinced; where do we start?
Establish a good ratio of unit tests to integration tests
Integration tests are slow by nature, but that doesn’t make them inherently bad. Quite the contrary, integration tests are a necessary part of your test suite. Unit tests test small bits of functional code (which by themselves probably aren’t very useful outside of the big picture) while integration tests test the contracts between these smaller units of code – that these units work together as expected and prove you have a working application at the end of the day. That said, integration tests should still only comprise a small minority of cases in your test suite.
There’s no rule for the explicit ratio of unit tests to integration tests, but one idea may be that for every class and standalone method of your application, there should exist a unit test, and for every view or page in your application, there should exist an integration test.
Going further, you could isolate those integration tests so that they could be run separately, such that developers could run only unit tests before integration and integration tests only have to be run before a build. It all depends on how you can best apply this to your own workflow.
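One lightweight way to isolate integration tests — a sketch of a convention rather than a standard mechanism, where the `RUN_INTEGRATION` variable and class names are made up for illustration — is a reusable skip decorator driven by an environment variable:

```python
import os
import unittest

# Hypothetical convention: integration tests only run when the
# RUN_INTEGRATION environment variable is set to '1'.
integration_test = unittest.skipUnless(
    os.environ.get('RUN_INTEGRATION') == '1',
    'integration tests are skipped unless RUN_INTEGRATION=1',
)

@integration_test
class MyViewIntegrationTests(unittest.TestCase):
    def test_view_renders(self):
        # ... slow, full-stack assertions would go here ...
        pass
```

With a convention like this, developers get the fast unit-test run by default, while the build server exports `RUN_INTEGRATION=1` before kicking off the full suite.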
Avoid the database
Avoid hitting the database as much as you can. You’ll notice a bias going forward toward unittest.TestCase instead of django.test.TestCase. The plus is that a lot of database machinery isn’t run on test instantiation. The minus is that a lot of database machinery isn’t run on test instantiation. Django has nicely documented features for using test fixtures and for ensuring the database is cleaned out after every test run. When you eschew these in favor of unittest.TestCase, you have to clean up your own mess and ensure your test data is properly torn down.
We think the ends justify the means, though, because using fixtures is a fast route to slow tests. Database writes are naturally expensive, and the fixture mechanism in Django reloads fixtures for every test in the test case. If you’re testing against a complex data set and you have many tests, that’s a lot of slow database writes.
Instead of fixtures, try creating non-persistent data in your tests when possible. For example, if you’re testing a method that performs an operation on a Model instance:
```python
import unittest

class SlowerTestCase(unittest.TestCase):
    def test_model_foo(self):
        instance = MyModel.objects.create(name="Test Instance")
        self.assertTrue(model_foo(instance))

class FasterTestCase(unittest.TestCase):
    def test_model_foo(self):
        instance = MyModel(name="Test Instance")
        self.assertTrue(model_foo(instance))
```
The difference is subtle, but by avoiding the inherent database save in Manager.create() your test performs much faster. Of course, this effect is much more dramatic when your tests depend on many more model instances.
Lastly, while this will not directly optimize the speed of your tests themselves, you can save a lot of time by testing against an in-memory SQLite database. You can do this easily by adding the following to your test settings file:
```python
import sys

if 'test' in sys.argv:
    DATABASES['default'] = {'ENGINE': 'django.db.backends.sqlite3'}
```
Now, if you have an explicit need to run tests against a specific database backend, this may not be an option. But for most projects that don’t stray far from regular ORM operations, this can give a huge performance increase to an already database-intensive test suite.
Use setUp/tearDown methods efficiently
So you get it, fixtures can be bad and are described by many as a testing anti-pattern. They don’t evolve with your database schema. On top of that, there’s not an easy way to use them on a per-test basis, so you end up loading the same fixture data over and over again even though you only need it once. But even so, you still need to simulate lots of relational data to test your project effectively. What now?
The first thing most developers do after realizing this is to begin leveraging the setUp and tearDown methods of TestCases. These well-documented and oft-used methods run before and after every test in your TestCase, and are great to have at your disposal. But precisely because they run around every single test, any expensive operation placed in them is repeated throughout many tests. Always be cautious about expensive operations in setUp. Take the following example:
```python
import unittest

def expensive_data_generator():
    # ... takes 2-3 seconds ...
    return list(some_data)

class LeveragingSetUp(unittest.TestCase):
    def setUp(self):
        self.complex_data = expensive_data_generator()

    def test_a(self):
        self.assertTrue('foo' in self.complex_data)

    def test_b(self):
        self.assertTrue('bar' in self.complex_data)

    def test_c(self):
        a = b = c = 1
        self.assertEqual(a, b)

    def test_d(self):
        a = b = c = 1
        self.assertEqual(a, c)
```
It takes somewhere between 8 and 12 seconds to run, neglecting the time it takes to create the test database. Whereas this example:
```python
import unittest

def expensive_data_generator():
    # ... takes 2-3 seconds ...
    return list(some_data)

class LeveragingSetUp(unittest.TestCase):
    def setUp(self):
        pass

    def test_a(self):
        complex_data = expensive_data_generator()
        self.assertTrue('foo' in complex_data)

    def test_b(self):
        complex_data = expensive_data_generator()
        self.assertTrue('bar' in complex_data)

    def test_c(self):
        a = b = c = 1
        self.assertEqual(a, b)

    def test_d(self):
        a = b = c = 1
        self.assertEqual(a, c)
```
It takes between 4 and 6 seconds to run. That’s because in the first example, expensive_data_generator is run for every test in the TestCase, despite some of the tests not using the data at all. You’re actually better off going against the developer’s Don’t-Repeat-Yourself mantra in this situation to get a test case that is twice as efficient. But we can do better.
In the above example, we’re working with read-only data. That is, our tests aren’t manipulating or transforming the data in any way, so in theory we should only have to generate it once. There are some lesser-known methods for just that: we can set up the data we need in the setUpClass/tearDownClass methods described in the Python unittest documentation.
```python
import unittest

def expensive_data_generator():
    # ... takes 2-3 seconds ...
    return list(some_data)

class LeveragingSetUpClass(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.complex_data = expensive_data_generator()

    def test_a(self):
        self.assertTrue('foo' in self.complex_data)

    def test_b(self):
        self.assertTrue('bar' in self.complex_data)

    def test_c(self):
        a = b = c = 1
        self.assertEqual(a, b)

    def test_d(self):
        a = b = c = 1
        self.assertEqual(a, c)
```
2-3 seconds – much better! These methods are run once per TestCase and are ideal for read-only data, since there isn’t a way to flush the data between each test run. It’s important not to manipulate this data in your individual tests, because the changes will persist into the next test, and depending on your test runner, you may not know which test runs next.
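If a test does need to mutate the shared data, one cheap compromise — a sketch, with the class and data names invented for illustration — is to build the expensive data once in setUpClass and hand each test its own shallow copy in setUp:

```python
import unittest

class MutatingTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Built once for the whole TestCase (imagine this is expensive).
        cls.shared = ['foo', 'bar']

    def setUp(self):
        # A cheap shallow copy per test: mutations hit the copy,
        # never the class-level data.
        self.data = list(self.shared)

    def test_can_mutate_safely(self):
        self.data.append('baz')
        self.assertIn('baz', self.data)

    def test_shared_data_untouched(self):
        self.assertEqual(self.shared, ['foo', 'bar'])
```

The per-test copy is far cheaper than regenerating the data, though note a shallow copy only protects the list itself, not any mutable objects inside it.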
Mock everything
Following these tips, there will still be situations where you’ll need to write data in unit tests. Some methods might require two related objects, the mechanics of which may not function without invoking database machinery (like a ManyToMany relationship with a through table with data populated on save). The Python testing library Mock becomes vastly useful here. It’s so useful, in fact, that we’ll dedicate our next post in the series entirely to it. We’ll dig deep into Mock and how it can be used to focus your tests to the point where you can eliminate nearly all testing overlap and database usage, even in complex testing situations.
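As a small preview — the `rename` helper here is hypothetical, and the mock stands in for a model instance — a Mock object lets you verify that code calls save() without any database ever being touched (the library ships as unittest.mock in modern Python):

```python
import unittest
from unittest import mock

# Hypothetical helper under test: it mutates and saves a model instance.
def rename(instance, new_name):
    instance.name = new_name
    instance.save()  # would normally write to the database
    return instance

class RenameTests(unittest.TestCase):
    def test_rename_sets_name_and_saves(self):
        instance = mock.Mock()  # stands in for a model instance
        rename(instance, 'renamed')
        self.assertEqual(instance.name, 'renamed')
        # The save happened exactly once -- and never hit a database.
        instance.save.assert_called_once_with()
```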
That’s it for part one! Make sure you’re writing unit tests when appropriate instead of integration tests, avoid the database if possible, and be mindful of how you leverage test case setUp methods. Then you can check out part two.