Magento Automated Testing Standard
There is a variety of ways to implement automated tests. When several developers start collaborating and implementing tests for their code, they discover that without generally accepted conventions for tests the result would be the same mess as code written without a coding standard. The goal of the automated testing standard is to unify approaches to testing and ultimately to establish a high level of quality for Magento products.
This guide is a standard defining quality criteria for Magento automated tests. The standard is adopted by Magento developers and recommended to 3rd-party Magento developers.
Definitions in this standard are grounded in real examples from the practice of the Magento team and do not contradict generally known best practices. Requirements and recommendations are changed or added as a result of issues experienced in practice and conclusions made by Magento developers. These definitions do not encourage abusing or misusing concepts defined by the authors of widely adopted testing frameworks, such as PHPUnit, JMeter, or Selenium.
The intended audience of these requirements is developers and QA engineers. The intended way to use the standard is to follow its requirements and recommendations, or to act at one's own discretion where a case is not covered ("what is not forbidden is allowed").
The variety of automated tests is broken down into types, which represent distinct goals and technical aspects of implementing tests. A type thus determines what to expect from the tests that belong to it.
- Functional tests verify product features as if a user were interacting with the system. Essentially, they automate the manual regression tests normally performed by QA engineers. The goal is to test product use case scenarios, involving multiple pages, from "outside" of the system.
- Unit tests verify certain portions of an algorithm by executing its code with different combinations of arguments and asserting return values. The goal is to maintain high quality of the algorithms being implemented and to keep code coupling low. A unit test must be isolated from the environment (global variables, other classes, external resources such as the database and file system, config settings, environment variables); a test that is not isolated cannot be considered a unit test. Mocking all related objects is typical for unit tests (see the sketch after this list).
- Static code analysis tests scan the code base, assert it against certain coding standards and conventions, and verify whether the code meets the framework requirements. A distinctive characteristic of static code analysis is that it does not execute the code of the product being tested. The goal is to keep the code base compliant with the adopted coding standards and practices and to avoid broken references within the code.
- Integration tests execute code at a higher level than unit tests and are deployed in a close-to-real environment. Integration tests verify the interaction of components with each other and with the system environment (database, file system). The goals are to tackle cross-dependencies between components/modules and environment-related issues. Compared to unit tests, integration tests assert algorithms from "outside" at the code level, while unit tests do so from "inside". An integration test can also be considered a "small" functional test, so preserving functionality is an additional goal. An integration test can involve many classes and can mock the environment only partially, or not at all; a typical integration test uses a real database.
- Performance tests execute certain scenarios multiple times while simulating various environment conditions, typically the number of concurrent users. Performance is determined by measuring samplers, such as execution time, response time, memory footprint, and number of served pages, and applying metrics such as the mean or average value to them. The goals are to monitor performance along with product development, determine bottlenecks, determine practical capacity, and measure scalability.
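To illustrate the isolation requirement mentioned above, here is a minimal PHPUnit sketch of a unit test; the class and interface names are hypothetical and used only for illustration.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical production code, shown here only to keep the sketch self-contained.
interface TaxProviderInterface
{
    public function getRate(): float;
}

class PriceCalculator
{
    public function __construct(private TaxProviderInterface $taxProvider) {}

    public function calculate(float $price): float
    {
        return $price * (1 + $this->taxProvider->getRate());
    }
}

class PriceCalculatorTest extends TestCase
{
    public function testCalculateAddsTax()
    {
        // The only collaborator is mocked, so the test never touches the
        // database, file system, configuration or any other environment resource.
        $taxProvider = $this->createMock(TaxProviderInterface::class);
        $taxProvider->method('getRate')->willReturn(0.25);

        $calculator = new PriceCalculator($taxProvider);

        // Only the return value of the method under test is asserted.
        $this->assertSame(125.0, $calculator->calculate(100.0));
    }
}
```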
Another classification of tests can be distinguished, parallel to the one described above: the family of integrity tests.
- Integrity tests analyze whether code corresponds to specific framework requirements/conventions or simply does not contain accidental mistakes. The goal is to maintain code integrity in places where it is prone to human error. For example, if a file was placed at the wrong location or deleted by accident, the integrity test that expects a certain file in that location detects that it does not exist. Or a test can detect that a certain resource in configuration is accidentally overridden by conflicting components.
- Legacy tests are integrity tests whose goal is more specific: to assert against code that became obsolete as a result of backwards-incompatible changes.
- Ideally, legacy tests serve as self-documentation of all backwards-incompatible changes in the system and automatically enforce that old incompatible code is not allowed. For example, if a class was deleted, the legacy test asserts that this class is no longer used in the source code.
- An entire legacy test suite is relevant only between certain major versions of the product, for example "between 1.x and 2.x" or "2.x and 3.x". After the release of a new major version, the current legacy test suite is to be frozen and a new one created for the new major product version.
The parallelism of integrity/legacy tests means that both classifications can apply at the same time. For example, a test can belong to the "static" and "legacy" types, to "static" and "integrity", or to "integration" and "integrity". Each particular test is classified in this way at the developer's discretion, depending on the specifics of that particular test.
Let's summarize the differences between the types of tests by their goal, subject of inspection, and interaction with the environment:
Type of the test | The Goal | Subject of Inspection | Interaction with Environment
---|---|---|---
Functional | Prevent regressions | Use case scenario | Real environment
Unit | Help develop code of better quality; test a method's interface in all combinations of arguments | Single method | Completely isolated from environment
Static Code Analysis | Maintain a consistent code base | Code in files of the entire code base | OS file system
Integration | Prevent bugs in interactions between components, layers, and environment | Components | Close-to-real environment
Performance | Track and maintain a certain performance level | Use case scenario | Real environment
Integrity, Legacy | Maintain code integrity; in the case of legacy tests, disallow obsolete code | Code base or application framework | Depends on the particular test
Functional tests
Code to cover: must cover any code that implements application features which are exposed in the user interface or represent business value.
Expected coverage: must match either the test cases in the test plan or the use cases in the functional specification.
Unit tests
Code to cover:
- new code must be covered
- modifications to legacy code must be covered; if that is impossible, replace unit tests with "surrogate" integration tests
Expected coverage:
- must cover all methods (100% of lines of code) in a class
- should cover 100% of lines and argument combinations
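One practical way to approach 100% coverage of argument combinations is a PHPUnit data provider, where each data set exercises the same method with a different combination; the validator below is a hypothetical example, not a prescribed implementation.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical production class, included only to keep the sketch self-contained.
class SkuValidator
{
    public function isValid(string $sku): bool
    {
        return $sku !== '' && strlen($sku) <= 64 && preg_match('/^[A-Za-z0-9\-]+$/', $sku) === 1;
    }
}

class SkuValidatorTest extends TestCase
{
    /**
     * @dataProvider skuDataProvider
     */
    public function testIsValid(string $sku, bool $expected)
    {
        $this->assertSame($expected, (new SkuValidator())->isValid($sku));
    }

    public static function skuDataProvider(): array
    {
        // Each row is one combination of arguments with the expected result.
        return [
            'regular sku'        => ['ABC-123', true],
            'empty string'       => ['', false],
            'too long'           => [str_repeat('A', 65), false],
            'illegal characters' => ['ABC 123!', false],
        ];
    }
}
```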
Integration tests
Code to cover:
- must cover code that directly interacts with the operating system environment, the database, or any other 3rd-party system
- must cover code in the "Model" layer which interacts with the database indirectly
- must be used as an alternative to unit tests for legacy code in the following cases:
  - if it is impossible to cover the code with unit tests due to high code coupling
  - if the code had only a minor modification, and covering it with a unit test would require refactoring unrelated to the original modification
Expected code coverage:
- new code -- 100% of lines must be covered
- legacy code modifications -- the affected algorithm must be covered
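As an illustration of the difference from a unit test, here is a sketch of a small integration-style test that uses a real (in-memory SQLite) database through PDO instead of mocking storage; the repository is hypothetical, and an actual Magento integration test would rely on the framework's test fixtures and a real MySQL database.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical repository, included only to keep the sketch self-contained.
class OrderRepository
{
    public function __construct(private PDO $connection) {}

    public function save(float $grandTotal): int
    {
        $statement = $this->connection->prepare('INSERT INTO sales_order (grand_total) VALUES (?)');
        $statement->execute([$grandTotal]);
        return (int)$this->connection->lastInsertId();
    }

    public function load(int $id): float
    {
        $statement = $this->connection->prepare('SELECT grand_total FROM sales_order WHERE id = ?');
        $statement->execute([$id]);
        return (float)$statement->fetchColumn();
    }
}

class OrderRepositoryTest extends TestCase
{
    public function testSaveAndLoad()
    {
        // The environment is mocked only partially: SQL, schema and repository
        // code are exercised together against a real database engine.
        $connection = new PDO('sqlite::memory:');
        $connection->exec('CREATE TABLE sales_order (id INTEGER PRIMARY KEY AUTOINCREMENT, grand_total REAL)');

        $repository = new OrderRepository($connection);
        $id = $repository->save(99.5);

        $this->assertSame(99.5, $repository->load($id));
    }
}
```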
Integrity tests
Code to cover: must cover any code that introduces a convention in the scope of a particular implementation, violation of which would lead to a runtime error.
Expected code coverage: must cover all files applicable to the convention.
For example:
- Scan for all XML files of a certain type and validate them against the appropriate XML schema
- Scan for declarations of templates and invoke the "fallback" mechanism to ensure they resolve
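The first example might be implemented roughly as follows; the directory and schema paths are hypothetical, and a real test would resolve them through the framework, but the shape is the same: walk the code base and validate every applicable file.

```php
<?php
use PHPUnit\Framework\TestCase;

class XmlConfigIntegrityTest extends TestCase
{
    public function testConfigFilesMatchSchema()
    {
        // Hypothetical locations of the schema and of the code base to scan.
        $schema = __DIR__ . '/_files/config.xsd';
        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/config\.xml$/'
        );

        libxml_use_internal_errors(true);
        foreach ($files as $file) {
            $dom = new DOMDocument();
            $dom->load((string)$file);
            // One assertion per file: the test fails for every file violating the convention.
            $this->assertTrue($dom->schemaValidate($schema), "Invalid XML configuration: {$file}");
        }
    }
}
```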
Static code analysis tests
Code to cover: must cover all new code files (or whatever qualifies as "new").
Expected code coverage: must cover all applicable files in the entire code base.
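A static test can be as small as scanning every applicable file for a discouraged construct without ever executing it; the sketch below (hypothetical path, deliberately simplified rule) reports any direct var_dump() call in the code base.

```php
<?php
use PHPUnit\Framework\TestCase;

class DiscouragedFunctionsTest extends TestCase
{
    public function testNoDebugOutputCalls()
    {
        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/\.php$/'
        );

        $violations = [];
        foreach ($files as $file) {
            // The product code is only read, never executed.
            if (preg_match('/\bvar_dump\s*\(/', file_get_contents((string)$file))) {
                $violations[] = (string)$file;
            }
        }

        $this->assertSame([], $violations, 'var_dump() found in: ' . implode(', ', $violations));
    }
}
```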
Legacy tests
Code to cover: must cover any formal backwards-incompatible change at the code level.
A failure in a legacy test must provide a comprehensive explanation of an alternative, if there is one.
Expected code coverage:
- must cover the majority of occurrences of the backwards-incompatible change
- should cover 100% of occurrences
Not all changes can be covered. For example, it is possible to scan a file for literals, but it is unfeasible to analyze string concatenation or any other dynamic way of building a variable.
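For instance, a legacy test guarding a deleted class could look roughly like this (the class name and paths are hypothetical); in line with the note above, it can only detect literal occurrences of the name.

```php
<?php
use PHPUnit\Framework\TestCase;

class RemovedClassesTest extends TestCase
{
    public function testObsoleteClassIsNotUsed()
    {
        // Hypothetical class deleted in a backwards-incompatible change.
        $obsoleteClass = 'Vendor_Module_Helper_Obsolete';

        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/\.php$/'
        );

        foreach ($files as $file) {
            // The failure message explains the alternative, as required above.
            $this->assertStringNotContainsString(
                $obsoleteClass,
                file_get_contents((string)$file),
                "{$obsoleteClass} was removed; inject the replacement service instead. Found in: {$file}"
            );
        }
    }
}
```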
Performance tests
Code to cover: must cover resource-intensive areas of the code.
Expected coverage: must match the test plan scenarios.
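Production performance scenarios are normally driven by a dedicated tool such as JMeter; purely to illustrate the idea of running a scenario repeatedly and applying a metric to the collected samples, here is a hypothetical PHP sketch.

```php
<?php
// Hypothetical scenario; in practice this would be an HTTP request or a product use case.
function runCheckoutScenario(): void
{
    usleep(random_int(1000, 5000)); // stand-in for real work
}

$samples = [];
for ($i = 0; $i < 50; $i++) {
    $start = microtime(true);
    runCheckoutScenario();
    $samples[] = microtime(true) - $start; // one sample of the execution time
}

// Apply a metric to the samples, e.g. the mean value.
$mean = array_sum($samples) / count($samples);
printf("Mean execution time over %d runs: %.4f s\n", count($samples), $mean);
```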
To help understand the requirements better, the glossary below defines criteria for code and tests.
Importance of a feature -- how important a feature is, in the subjective opinion of a product owner. There is always certain code behind a feature.
Grades: critical (must), important (should), less important (could), not important (won't). See also: the "MoSCoW Method".
A hypothetical example for an eCommerce application:
- Critical -- checkout, orders
- Important -- catalog
- Less important -- inline content editing
- Not important -- catalog tags
State of the code -- the current position of certain code in the life cycle of the software product:
- New -- code that is being developed and has never been released yet
- Current -- code that is currently released and maintained (supported) according to current standards
- Legacy -- the same as current, but sub-standard (e.g. not covered with tests or not meeting other quality criteria)
- Deprecated -- code that is declared as no longer maintained and will be discontinued soon
- Obsolete -- code that is no longer usable due to backwards-incompatible changes in the code it relies on
Backwards-incompatible changes at the code level -- code modified in such a way that other code that was potentially using it will no longer work with it. For example: modification of a method's interface (signature, return value), a deleted method, a deleted class, a logically incompatible change of behavior.
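A hypothetical illustration of such a change at the signature level:

```php
<?php
// Hypothetical argument type introduced in the new major version.
class OrderSearchCriteria {}

// 1.x signature was: public function load(int $orderId): array
// In 2.x the parameter type changed, so an existing call such as
// $loader->load(42) no longer works and every call site must be rewritten.
class OrderLoader
{
    public function load(OrderSearchCriteria $criteria): array
    {
        return []; // body omitted; only the signature matters for this example
    }
}
```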
Legacy code which has undergone full or major refactoring qualifies as new code.
Resource intensity -- whether a particular feature implements an algorithm that performs bulk operations, depends on concurrent threads, depends on an environment with connection delays, or performs intensive computations.
Grades: resource-intensive, lightweight.
Examples:
- Resource-intensive: writing to database in a transaction, executing an API-call through network, scanning file system.
- Lightweight: performing calculations with data which is already loaded into memory
Layer -- which application layer the code belongs to: library, framework, "Model", "Service Layer", "Domain Layer", "View", "Controller", etc.
The quality of a test can be determined by numerous criteria, which a developer can observe in a code review or by using specialized tools (for example, a code sniffer, Atlassian Clover, etc.).
Code coverage -- the code coverage percentage gives an idea of whether the code is covered by a test and helps to find untested scenarios. Depending on the type of test, code coverage is measured as a percentage of:
- Lines of code
- Methods in a class, including combinations of possible arguments
- Test cases of a test plan, or use cases of a functional specification
Accuracy -- the ability of a test to fail adequately when an error is introduced into the code it covers. An inaccurate test still passes when the code fails; an accurate test fails if and only if the code fails.
For example, a method paints a button red. A test calls the method, then asserts that the color of the button is red. The mistake (inaccuracy) is not asserting that the button was not red in the first place.
A unit test with code coverage of 100% of lines and methods is considered accurate (however, a unit test will never be as accurate as a functional test).
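Expressed as a test, the accurate variant of that example also asserts the precondition; the Button class is hypothetical and defined here only to keep the sketch self-contained.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical widget under test.
class Button
{
    public function __construct(private string $color) {}

    public function paintRed(): void
    {
        $this->color = 'red';
    }

    public function getColor(): string
    {
        return $this->color;
    }
}

class ButtonTest extends TestCase
{
    public function testPaintRed()
    {
        $button = new Button('blue');

        // Accuracy fix: ensure the button was not red in the first place,
        // otherwise paintRed() could do nothing and the test would still pass.
        $this->assertNotSame('red', $button->getColor());

        $button->paintRed();

        $this->assertSame('red', $button->getColor());
    }
}
```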
Robustness (vs. fragility) -- how change-proof a test is with respect to code changes. A fragile test fails even with minor but correct changes in the code (and has to be fixed), while a robust test fails mostly because of major errors in the code. Some tests are made intentionally fragile in order to protect certain areas of code from even the smallest changes.
Example: a fixture may scan the code base for all available modules according to a certain rule, and therefore return the set of modules regardless of which specific modules are installed in the system. If the list is hard-coded instead, the fixture is targeted at a specific set of modules (a.k.a. a product edition).
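A sketch of the two variants of such a fixture (the directory layout is hypothetical): the scanning version stays valid for any set of installed modules, while the hard-coded list is tied to one specific edition and breaks whenever a module is added or removed.

```php
<?php
// Robust fixture: discover modules from the code base by a rule.
function getInstalledModules(string $codeBaseDir): array
{
    $modules = [];
    foreach (glob($codeBaseDir . '/app/code/*/*', GLOB_ONLYDIR) as $moduleDir) {
        // e.g. app/code/Vendor/Catalog -> "Vendor_Catalog"
        $modules[] = basename(dirname($moduleDir)) . '_' . basename($moduleDir);
    }
    return $modules;
}

// Fragile alternative: a hard-coded list, valid only for one specific
// set of modules (product edition) and broken by any addition or removal.
const EXPECTED_MODULES = ['Vendor_Catalog', 'Vendor_Checkout', 'Vendor_Sales'];
```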
Cost -- the time and effort required to implement and maintain the test. A test can be easy to implement but fragile and in need of fixing over and over again; that would be an expensive test. An ideally low-cost test is one you implement once and forget about (however, it then risks being inaccurate).