Test automation is based on a simple economic proposition:
· If a manual test costs $X to run the first time, it will cost just about $X to run each time thereafter, whereas:
· If an automated test costs $Y to create, it will cost almost nothing to run from then on.
$Y is bigger than $X. I've heard estimates ranging from 3 to 30 times as big, with the most commonly cited number seeming to be 10. Suppose 10 is correct for your application and your automation tools. Then you should automate any test that will be run more than 10 times.
A classic mistake is to ignore these economics, attempting to automate all tests, even those that won't be run often enough to justify it. What tests clearly justify automation?
· Stress or load tests may be impossible to implement manually. Would you have a tester execute and check a function 1000 times? Are you going to sit 100 people down at 100 terminals?
· Nightly builds are becoming increasingly common. (See [McConnell96] or [Cusumano95] for descriptions of the procedure.) If you build the product nightly, you must have an automated "smoke test suite". Smoke tests are those that are run after every build to check for grievous errors.
· 夜间构建变得越来越普遍了。(参见[McConnell96]或[Cusumano95]可以了解这个过程的描述)。如果在夜间构建产品，就必须有一个自动化的“冒烟测试套件”。 冒烟测试指的是那些在每次构建之后都去检查严重错误的测试。
· Configuration tests may be run on dozens of configurations.
The other kinds of tests are less clear-cut. Think hard about whether you'd rather have automated tests that are run often or ten times as many manual tests, each run once. Beware of irrational, emotional reasons for automating, such as testers who find programming automated tests more fun, a perception that automated tests will lead to higher status (everything else is "monkey testing"), or a fear of not rerunning a test that would have found a bug (thus leading you to automate it, leaving you without enough time to write a test that would have found a different bug).
其他种类的测试不是这个明显。仔细想一下，对于那些多次运行或者运行次数是手工运行次数10倍的自动化测试，你是否只运行一次。要当心实现自动化的不理性的、感情的原因，例如测试员发现程序自动化测试更有趣，认为自动化测试将带来更高的地位(其他测试都是“猴子测试”)，或者是害怕不能重新运行一个会发现 bug 的测试(这导致你将它自动化，使你没有足够的时间编写一个会发现其他 bug 的测试)。
You will likely end up in a compromise position, where you have:
1. a set of automated tests that are run often.
2. a well-documented set of manual tests. Subsets of these can be rerun as necessary. For example, when a critical area of the system has been extensively changed, you might rerun its manual tests. You might run different samples of this suite after each major build.
3. a set of undocumented tests that were run once (including exploratory "bug bash" tests).
Beware of expecting to rerun all manual tests. You will become bogged down rerunning tests with low bug-finding value, leaving yourself no time to create new tests. You will waste time documenting tests that don't need to be documented.
注意不要期望重新运行所有的手工测试。重新运行这些很少能找到 bug 的测试会使你停滞不前，使你自己没有时间创建新的测试。你会把时间浪费在为不需要文档的测试编写文档上。
You could automate more tests if you could lower the cost of creating them. That's the promise of using GUI capture/replay tools to reduce test creation cost. The notion is that you simply execute a manual test, and the tool records what you do. When you manually check the correctness of a value, the tool remembers that correct value. You can then later play back the recording, and the tool will check whether all checked values are the same as the remembered values.
There are two variants of such tools. What I call the first generation tools capture raw mouse movements or keystrokes and take snapshots of the pixels on the screen. The second generation tools (often called "object oriented") reach into the program and manipulate underlying data structures (widgets or controls).
First generation tools produce unmaintainable tests. Whenever the screen layout changes in the slightest way, the tests break. Mouse clicks are delivered to the wrong place, and snapshots fail in irrelevant ways that nevertheless have to be checked. Because screen layout changes are common, the constant manual updating of tests becomes insupportable.
Second generation tools are applicable only to tests where the underlying data structures are useful. For example, they rarely apply to a photograph editing tool, where you need to look at an actual image - at the actual bitmap. They also tend not to work with custom controls. Heavy users of capture/replay tools seem to spend an inordinate amount of time trying to get the tool to deal with the special features of their program - which raises the cost of test automation.
Second generation tools do not guarantee maintainability either. Suppose a radio button is changed to a pulldown list. All of the tests that use the old controls will now be broken.
GUI interface changes are of course common, especially between releases. Consider carefully whether an automated test that must be recaptured after GUI changes is worth having. Keep in mind that it can be hard to figure out what a captured test is attempting to aclearcase/" target="_blank" >ccomplish unless it is separately documented.
As a rule of thumb, it's dangerous to assume that an automated test will pay for itself this release, so your test must be able to survive a reasonable level of GUI change. I believe that capture/replay tests, of either generation, are rarely robust enough.
An alternative approach to capture/replay is scripting tests. (Most GUI capture/replay tools also allow scripting.) Some member of the testing team writes a "test API" (application programmer interface) that lets other members of the team express their tests in less GUI-dependent terms. Whereas a captured test might look like this:
· text $main.accountField "12"
文本 $main.accountField "12"
a script might look like this:
· select-account 12
The script commands are subroutines that perform the appropriate mouse clicks and key presses. If the API is well-designed, most GUI changes will require changes only to the implementation of functions like withdraw, not to all the tests that use them. Please note that well-designed test APIs are as hard to write as any other good API. That is, they're hard, and you shouldn't expect to get it right the first time.
In a variant of this approach, the tests are data-driven. The tester provides a table describing key values. Some tool reads the table and converts it to the appropriate mouse clicks. The table is even less vulnerable to GUI changes because the sequence of operations has been abstracted away. It's also likely to be more understandable, especially to domain experts who are not programmers. See [Pettichord96] for an example of data-driven automated testing.
Note that these more abstract tests (whether scripted or data-driven) do not necessarily test the user interface thoroughly. If the Withdraw dialog can be reached via several routes (toolbar, menu item, hotkey), you don't know whether each route has been tried. You need a separate (most likely manual) effort to ensure that all the GUI components are connected correctly.
Whatever approach you take, don't fall into the trap of expecting regression tests to find a high proportion of new bugs. Regression tests discover that new or changed code breaks what used to work. While that happens more often than any of us would like, most bugs are in the product's new or intentionally changed behavior. Those bugs have to be caught by new tests.
不论你采用的是什么方法，不要陷入期望回归测试发现高比例的新 bug 的陷阱。回归测试是发现以前起作用、但新代码或更改后的代码不起作用的现象。虽然它比我们希望的发生的次数更多，但许多 bug 是产品的新的或故意更改的行为。那些 bug 必须通过新测试来捕捉。
GUI capture/replay testing is appealing because it's a quick fix for a difficult problem. Another class of tool has the same kind of attraction.
The difficult problem is that it's so hard to know if you're doing a good job testing. You only really find out once the product has shipped. Understandably, this makes managers uncomfortable. Sometimes you find them embracing code coverage with the devotion that only simple numbers can inspire. Testers sometimes also become enamored of coverage, though their romance tends to be less fervent and ends sooner.
What is code coverage? It is any of a number of measures of how thoroughly code is exercised. One common measure counts how many statements have been executed by any test. The appeal of such coverage is twofold:
1. If you've never exercised a line of code, you surely can't have found any of its bugs. So you should design tests to exercise every line of code.
如果你从未执行过某一行代码，你当然不能找出它的任何 bug 。所以应当设计一个可以执行每一行代码的测试。
2. Test suites are often too big, so you should throw out any test that doesn't add value. A test that adds no new coverage adds no value.
Only the first sentences in (1) and (2) are true. I'll illustrate with this picture, where the irregular splotches indicate bugs:
句子(1)和(2)中，只有第一句是正确的。我将用下面的图说明，其中的不规则黑点指示的是 bug ：
If you write only the tests needed to satisfy coverage, you'll find bugs. You're guaranteed to find the code that always fails, no matter how it's executed. But most bugs depend on how a line of code is executed. For example, code with an off-by-one error fails only when you exercise a boundary. Code with a divide-by-zero error fails only if you divide by zero. Coverage-adequate tests will find some of these bugs, by sheer dumb luck, but not enough of them. To find enough bugs, you have to write additional tests that "redundantly" execute the code.
如果你仅编写需要满足覆盖率的测试，你会发现 bug 。那些总是失败的代码不论怎样执行，你都肯定能发现它们。但是大多数的 bug 取决于如何执行某一行代码。例如，对于“大小差一”(off-by-one)错误的代码，只有当你执行边界测试时才会失败。只有在被零除的时候，代码才会发生被零除的错误。覆盖率足够的测试会发现这些 bug 中的一部分，全靠运气，但发现得还不够多。要发现足够多的 bug ，你必须编写其他的测试“冗余地”执行代码。
For the same reason, removing tests from a regression test suite just because they don't add coverage is dangerous. The point is not to cover the code; it's to have tests that can discover enough of the bugs that are likely to be caused when the code is changed. Unless the tests are ineptly designed, removing tests will just remove power. If they are ineptly designed, using coverage converts a big and lousy test suite to a small and lousy test suite. That's progress, I suppose, but it's addressing the wrong problem.
同样的原因，因为有些测试不能增加覆盖率而将它们从回归测试套件中去掉也是危险的。关键不是覆盖代码;而是测试那些当代码更改时容易被发现的 bug 。除非测试用例是不熟练的设计，否则去掉测试用例就是去除作用力。如果它们是不熟练的设计，可以使用覆盖率将一个大而粗劣测试用例套件转化成一个小而粗劣的测试用例套件。我想这是进步，但是与这个问题无关。
A grave danger of code coverage is that it is concrete, objective, and easy to measure. Many managers today are using coverage as a performance goal for testers. Unfortunately, a cardinal rule of management applies here: "Tell me how a person is evaluated, and I'll tell you how he behaves." If a person is evaluated by how much coverage is achieved in a given time (or in how little time it takes to reach a particular coverage goal), that person will tend to write tests to achieve high coverage in the fastest way possible. Unfortunately, that means shortchanging careful test design that targets bugs, and it certainly means avoiding in-depth, repetitive testing of "already covered" code.
代码覆盖率的一个重大危险是它是具体、主观而易于衡量的。今天的许多经理都使用覆盖率作为测试员的绩效目标。不幸的是，一个重要的管理规则适用于这里：“告诉我如何评价一个人，然后我才能告诉你他的表现。”如果一个人是通过在给定的时间内覆盖了多少代码(或者是在多么少的时间内达到了特定覆盖目标)来评估的，那么那个人将倾向于以尽可能快的方式达到高覆盖率的测试。不幸的是，这将意味对以发现 bug 为目的的仔细测试设计的偷工减料，这当然也意味着避开了深层次、重复地测试“已经覆盖”的代码。
Using coverage as a test design technique works only when the testers are both designing poor tests and testing redundantly. They'd be better off at least targeting their poor tests at new areas of code. In more normal situations, coverage as a guide to design only decreases the value of the tests or puts testers under unproductive pressure to meet unhelpful goals.
Coverage does play a role in testing, not as a guide to test design, but as a rough evaluation of it. After you've run your tests, ask what their coverage is. If certain areas of the code have no or low coverage, you're sure to have tested them shallowly. If that wasn't intentional, you should improve the tests by rethinking their design. Coverage has told you where your tests are weak, but it's up to you to understand how.
You might not entirely ignore coverage. You might glance at the uncovered lines of code (possibly assisted by the programmer) to discover the kinds of tests you omitted. For example, you might scan the code to determine that you undertested a dialog box's error handling. Having done that, you step back and think of all the user errors the dialog box should handle, not how to provoke the error checks on line 343, 354, and 399. By rethinking design, you'll not only execute those lines, you might also discover that several other error checks are entirely missing. (Coverage can't tell you how well you would have exercised needed code that was left out of the program.)
There are types of coverage that point more directly to design mistakes than statement coverage does (branch coverage, for example). However, none - and not all of them put together - are so accurate that they can be used as test design techniques.
One final note: Romances with coverage don't seem to end with the former devotee wanting to be "just good friends". When, at the end of a year's use of coverage, it has not solved the testing problem, I find testing groups abandoning coverage entirely. That's a shame. When I test, I spend somewhat less than 5% of my time looking at coverage results, rethinking my test design, and writing some new tests to correct my mistakes. It's time well spent.