Magento Automated Testing Standard
There is a variety of ways to implement automated tests. When several developers start collaborating and implementing tests for their code, they discover that without generally accepted conventions for tests the result would be the same mess as code written without a coding standard. The goal of the automated testing standard is to unify approaches to testing and ultimately to establish a high level of quality for Magento products.
This guide is a standard defining quality criteria for Magento automated tests. The standard is adopted by Magento developers and recommended to 3rd-party Magento developers.
Definitions in this standard are grounded in real examples from the practice of the Magento team and do not contradict generally known best practices. Requirements and recommendations are changed or added as a result of issues experienced in practice and conclusions made by Magento developers. These definitions do not encourage abusing or misusing concepts defined by the authors of widely adopted testing frameworks, such as PHPUnit, JMeter, or Selenium.
The intended audience of these requirements is developers and QA engineers. The intended way to use the standard is to follow its requirements and recommendations, or to act at one's own discretion where a case is not covered ("what is not forbidden is allowed").
The variety of automated tests is broken down into types, which represent distinct goals and technical aspects of implementing tests. A type thus determines what to expect from the tests that belong to it.
- Functional tests verify product features as if a user were interacting with the system. Essentially, they automate the manual regression tests normally performed by QA engineers. The goal is to test product use case scenarios, involving multiple pages, from "outside" of the system.
- Unit tests verify certain portions of an algorithm by executing its code with different combinations of arguments and asserting return values. The goal is to maintain high quality of the algorithms being implemented and to keep code coupling low. A unit test must be isolated from the environment (global variables, other classes, external resources such as the database and file system, config settings, environment variables); a test that is not isolated cannot be considered a unit test. Mocking all related objects is typical for unit tests (see the sketch after this list).
- Static code analysis tests scan the code base, assert it against certain coding standards and conventions, and verify whether the code meets the framework requirements. A distinctive characteristic of static code analysis is that it does not execute the code of the product being tested. The goal is to keep the code base compliant with the adopted coding standards and practices and to avoid broken references within the code.
- Integration tests execute code at a higher level than unit tests and are deployed in a close-to-real environment. Integration tests verify the interaction of components with each other and with the system environment (database, file system). The goals are to tackle cross-dependencies between components/modules and environment-related issues. Compared to unit tests, integration tests assert algorithms from "outside" at the code level, while unit tests do so from "inside". An integration test can also be considered a "small" functional test, so preserving functionality is an additional goal. An integration test can involve many classes and can mock the environment only partially, or not at all; a typical integration test uses a real database.
- Performance tests execute certain scenarios multiple times while simulating various environment conditions, typically the number of concurrent users. Performance is determined by measuring samplers, such as execution time, response time, memory footprint, and number of served pages, and applying metrics such as the mean or average value to them. The goals are to monitor performance along with product development, determine bottlenecks, determine practical capacity, and measure scalability.
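To illustrate the isolation requirement mentioned above, here is a minimal PHPUnit sketch of a unit test; the class and interface names are hypothetical and used only for illustration.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical production code, shown here only to keep the sketch self-contained.
interface TaxProviderInterface
{
    public function getRate(): float;
}

class PriceCalculator
{
    public function __construct(private TaxProviderInterface $taxProvider) {}

    public function calculate(float $price): float
    {
        return $price * (1 + $this->taxProvider->getRate());
    }
}

class PriceCalculatorTest extends TestCase
{
    public function testCalculateAddsTax()
    {
        // The only collaborator is mocked, so the test never touches the
        // database, file system, configuration or any other environment resource.
        $taxProvider = $this->createMock(TaxProviderInterface::class);
        $taxProvider->method('getRate')->willReturn(0.25);

        $calculator = new PriceCalculator($taxProvider);

        // Only the return value of the method under test is asserted.
        $this->assertSame(125.0, $calculator->calculate(100.0));
    }
}
```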
Another classification of tests can be distinguished, parallel to the one described above: the family of integrity tests.
- Integrity tests analyze whether code corresponds to specific framework requirements/conventions or simply does not contain accidental mistakes. The goal is to maintain code integrity in places where it is prone to human error. For example, if a file was placed at the wrong location or deleted by accident, the integrity test that expects a certain file in that location detects that it does not exist. Or a test can detect that a certain resource in configuration is accidentally overridden by conflicting components.
- Legacy tests are integrity tests whose goal is more specific: to assert against code that became obsolete as a result of backwards-incompatible changes.
- Ideally, legacy tests serve as self-documentation of all backwards-incompatible changes in the system and automatically enforce that old incompatible code is not allowed. For example, if a class was deleted, the legacy test asserts that this class is no longer used in the source code.
- An entire legacy test suite is relevant only between certain major versions of the product, for example "between 1.x and 2.x" or "2.x and 3.x". After the release of a new major version, the current legacy test suite is to be frozen and a new one created for the new major product version.
The parallelism of integrity/legacy tests means that both classifications can apply at the same time. For example, a test can belong to the "static" and "legacy" types, to "static" and "integrity", or to "integration" and "integrity". Each particular test is classified in this way at the developer's discretion, depending on the specifics of that particular test.
Let's summarize the differences between the types of tests by their goal, subject of inspection, and interaction with the environment:
Type of the test | The Goal | Subject of Inspection | Interaction with Environment
---|---|---|---
Functional | Prevent regressions | Use case scenario | Real environment
Unit | Help develop code of better quality; test a method's interface in all combinations of arguments | Single method | Completely isolated from environment
Static Code Analysis | Maintain a consistent code base | Code in files of the entire code base | OS file system
Integration | Prevent bugs in interactions between components, layers, and environment | Components | Close-to-real environment
Performance | Track and maintain a certain performance level | Use case scenario | Real environment
Integrity, Legacy | Maintain code integrity; in the case of legacy tests, disallow obsolete code | Code base or application framework | Depends on the particular test
Functional tests
Code to cover: must cover any code that implements application features which are exposed in the user interface or represent business value.
Expected coverage: must match either the test cases in the test plan or the use cases in the functional specification.
Unit tests
Code to cover:
- new code must be covered
- modifications to legacy code must be covered; if that is impossible, replace unit tests with "surrogate" integration tests
Expected coverage:
- must cover all methods (100% of lines of code) in a class
- should cover 100% of lines and argument combinations
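One practical way to approach 100% coverage of argument combinations is a PHPUnit data provider, where each data set exercises the same method with a different combination; the validator below is a hypothetical example, not a prescribed implementation.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical production class, included only to keep the sketch self-contained.
class SkuValidator
{
    public function isValid(string $sku): bool
    {
        return $sku !== '' && strlen($sku) <= 64 && preg_match('/^[A-Za-z0-9\-]+$/', $sku) === 1;
    }
}

class SkuValidatorTest extends TestCase
{
    /**
     * @dataProvider skuDataProvider
     */
    public function testIsValid(string $sku, bool $expected)
    {
        $this->assertSame($expected, (new SkuValidator())->isValid($sku));
    }

    public static function skuDataProvider(): array
    {
        // Each row is one combination of arguments with the expected result.
        return [
            'regular sku'        => ['ABC-123', true],
            'empty string'       => ['', false],
            'too long'           => [str_repeat('A', 65), false],
            'illegal characters' => ['ABC 123!', false],
        ];
    }
}
```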
Integration tests
Code to cover:
- must cover code that directly interacts with the operating system environment, the database, or any other 3rd-party system
- must cover code in the "Model" layer which interacts with the database indirectly
- must be used as an alternative to unit tests for legacy code in the following cases:
  - if it is impossible to cover the code with unit tests due to high code coupling
  - if the code had only a minor modification, and covering it with a unit test would require refactoring unrelated to the original modification
Expected code coverage:
- new code -- 100% of lines must be covered
- legacy code modifications -- the affected algorithm must be covered
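As an illustration of the difference from a unit test, here is a sketch of a small integration-style test that uses a real (in-memory SQLite) database through PDO instead of mocking storage; the repository is hypothetical, and an actual Magento integration test would rely on the framework's test fixtures and a real MySQL database.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical repository, included only to keep the sketch self-contained.
class OrderRepository
{
    public function __construct(private PDO $connection) {}

    public function save(float $grandTotal): int
    {
        $statement = $this->connection->prepare('INSERT INTO sales_order (grand_total) VALUES (?)');
        $statement->execute([$grandTotal]);
        return (int)$this->connection->lastInsertId();
    }

    public function load(int $id): float
    {
        $statement = $this->connection->prepare('SELECT grand_total FROM sales_order WHERE id = ?');
        $statement->execute([$id]);
        return (float)$statement->fetchColumn();
    }
}

class OrderRepositoryTest extends TestCase
{
    public function testSaveAndLoad()
    {
        // The environment is mocked only partially: SQL, schema and repository
        // code are exercised together against a real database engine.
        $connection = new PDO('sqlite::memory:');
        $connection->exec('CREATE TABLE sales_order (id INTEGER PRIMARY KEY AUTOINCREMENT, grand_total REAL)');

        $repository = new OrderRepository($connection);
        $id = $repository->save(99.5);

        $this->assertSame(99.5, $repository->load($id));
    }
}
```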
Integrity tests
Code to cover: must cover any code that introduces a convention in the scope of a particular implementation, violation of which would lead to a runtime error.
Expected code coverage: must cover all files applicable to the convention.
For example:
- Scan for all XML files of a certain type and validate them against the appropriate XML schema
- Scan for declarations of templates and invoke the "fallback" mechanism to ensure they resolve
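The first example might be implemented roughly as follows; the directory and schema paths are hypothetical, and a real test would resolve them through the framework, but the shape is the same: walk the code base and validate every applicable file.

```php
<?php
use PHPUnit\Framework\TestCase;

class XmlConfigIntegrityTest extends TestCase
{
    public function testConfigFilesMatchSchema()
    {
        // Hypothetical locations of the schema and of the code base to scan.
        $schema = __DIR__ . '/_files/config.xsd';
        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/config\.xml$/'
        );

        libxml_use_internal_errors(true);
        foreach ($files as $file) {
            $dom = new DOMDocument();
            $dom->load((string)$file);
            // One assertion per file: the test fails for every file violating the convention.
            $this->assertTrue($dom->schemaValidate($schema), "Invalid XML configuration: {$file}");
        }
    }
}
```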
Static code analysis tests
Code to cover: must cover all new code files (or whatever qualifies as "new").
Expected code coverage: must cover all applicable files in the entire code base.
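A static test can be as small as scanning every applicable file for a discouraged construct without ever executing it; the sketch below (hypothetical path, deliberately simplified rule) reports any direct var_dump() call in the code base.

```php
<?php
use PHPUnit\Framework\TestCase;

class DiscouragedFunctionsTest extends TestCase
{
    public function testNoDebugOutputCalls()
    {
        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/\.php$/'
        );

        $violations = [];
        foreach ($files as $file) {
            // The product code is only read, never executed.
            if (preg_match('/\bvar_dump\s*\(/', file_get_contents((string)$file))) {
                $violations[] = (string)$file;
            }
        }

        $this->assertSame([], $violations, 'var_dump() found in: ' . implode(', ', $violations));
    }
}
```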
Legacy tests
Code to cover: must cover any formal backwards-incompatible change at the code level.
A failure in a legacy test must provide a comprehensive explanation of an alternative, if there is one.
Expected code coverage:
- must cover the majority of occurrences of the backwards-incompatible change
- should cover 100% of occurrences
Not all changes can be covered. For example, it is possible to scan a file for literals, but it is unfeasible to analyze string concatenation or any other dynamic way of building a variable.
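For instance, a legacy test guarding a deleted class could look roughly like this (the class name and paths are hypothetical); in line with the note above, it can only detect literal occurrences of the name.

```php
<?php
use PHPUnit\Framework\TestCase;

class RemovedClassesTest extends TestCase
{
    public function testObsoleteClassIsNotUsed()
    {
        // Hypothetical class deleted in a backwards-incompatible change.
        $obsoleteClass = 'Vendor_Module_Helper_Obsolete';

        $files = new RegexIterator(
            new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator(__DIR__ . '/../../../app/code')
            ),
            '/\.php$/'
        );

        foreach ($files as $file) {
            // The failure message explains the alternative, as required above.
            $this->assertStringNotContainsString(
                $obsoleteClass,
                file_get_contents((string)$file),
                "{$obsoleteClass} was removed; inject the replacement service instead. Found in: {$file}"
            );
        }
    }
}
```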
Performance tests
Code to cover: must cover resource-intensive areas of the code.
Expected coverage: must match the test plan scenarios.
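Production performance scenarios are normally driven by a dedicated tool such as JMeter; purely to illustrate the idea of running a scenario repeatedly and applying a metric to the collected samples, here is a hypothetical PHP sketch.

```php
<?php
// Hypothetical scenario; in practice this would be an HTTP request or a product use case.
function runCheckoutScenario(): void
{
    usleep(random_int(1000, 5000)); // stand-in for real work
}

$samples = [];
for ($i = 0; $i < 50; $i++) {
    $start = microtime(true);
    runCheckoutScenario();
    $samples[] = microtime(true) - $start; // one sample of the execution time
}

// Apply a metric to the samples, e.g. the mean value.
$mean = array_sum($samples) / count($samples);
printf("Mean execution time over %d runs: %.4f s\n", count($samples), $mean);
```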
To help understand the requirements better, the glossary below defines criteria for code and tests.
Importance of a feature -- how important a feature is, in the subjective opinion of a product owner. There is always certain code behind a feature.
Grades: critical (must), important (should), less important (could), not important (won't). See also: the "MoSCoW Method".
A hypothetical example for an eCommerce application:
- Critical -- checkout, orders
- Important -- catalog
- Less important -- inline content editing
- Not important -- catalog tags
State of the code -- the current position of certain code in the life cycle of the software product:
- New -- code that is being developed and has never been released yet
- Current -- code that is currently released and maintained (supported) according to current standards
- Legacy -- the same as current, but sub-standard (e.g. not covered with tests or not meeting other quality criteria)
- Deprecated -- code that is declared as no longer maintained and will be discontinued soon
- Obsolete -- code that is no longer usable due to backwards-incompatible changes in the code it relies on
Backwards-incompatible changes at the code level -- code modified in such a way that other code that was potentially using it will no longer work with it. For example: modification of a method's interface (signature, return value), a deleted method, a deleted class, a logically incompatible change of behavior.
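A hypothetical illustration of such a change at the signature level:

```php
<?php
// Hypothetical argument type introduced in the new major version.
class OrderSearchCriteria {}

// 1.x signature was: public function load(int $orderId): array
// In 2.x the parameter type changed, so an existing call such as
// $loader->load(42) no longer works and every call site must be rewritten.
class OrderLoader
{
    public function load(OrderSearchCriteria $criteria): array
    {
        return []; // body omitted; only the signature matters for this example
    }
}
```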
Legacy code which has undergone full or major refactoring qualifies as new code.
Resource intensity -- whether a particular feature implements an algorithm that performs bulk operations, depends on concurrent threads, depends on an environment with connection delays, or performs intensive computations.
Grades: resource-intensive, lightweight.
Examples:
- Resource-intensive: writing to database in a transaction, executing an API-call through network, scanning file system.
- Lightweight: performing calculations with data which is already loaded into memory
Layer -- which application layer the code belongs to: library, framework, "Model", "Service Layer", "Domain Layer", "View", "Controller", etc.
The quality of a test can be determined by numerous criteria, which a developer can observe in a code review or by using specialized tools (for example, a code sniffer, Atlassian Clover, etc.).
Code coverage -- the code coverage percentage gives an idea of whether the code is covered by a test and helps to find untested scenarios. Depending on the type of test, code coverage is measured as a percentage of:
- Lines of code
- Methods in a class, including combinations of possible arguments
- Test cases of a test plan, or use cases of a functional specification
Accuracy -- the ability of a test to fail adequately when an error is introduced into the code it covers. An inaccurate test still passes when the code fails; an accurate test fails if and only if the code fails.
For example, a method paints a button red. A test calls the method, then asserts that the color of the button is red. The mistake (inaccuracy) is not asserting that the button was not red in the first place.
A unit test with code coverage of 100% of lines and methods is considered accurate (however, a unit test will never be as accurate as a functional test).
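Expressed as a test, the accurate variant of that example also asserts the precondition; the Button class is hypothetical and defined here only to keep the sketch self-contained.

```php
<?php
use PHPUnit\Framework\TestCase;

// Hypothetical widget under test.
class Button
{
    public function __construct(private string $color) {}

    public function paintRed(): void
    {
        $this->color = 'red';
    }

    public function getColor(): string
    {
        return $this->color;
    }
}

class ButtonTest extends TestCase
{
    public function testPaintRed()
    {
        $button = new Button('blue');

        // Accuracy fix: ensure the button was not red in the first place,
        // otherwise paintRed() could do nothing and the test would still pass.
        $this->assertNotSame('red', $button->getColor());

        $button->paintRed();

        $this->assertSame('red', $button->getColor());
    }
}
```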
Robustness (vs. fragility) -- how change-proof a test is with respect to code changes. A fragile test fails even with minor but correct changes in the code (and has to be fixed), while a robust test fails mostly because of major errors in the code. Some tests are made intentionally fragile in order to protect certain areas of code from even the smallest changes.
Example: a fixture may scan the code base for all available modules according to a certain rule, and therefore return the set of modules regardless of which specific modules are installed in the system. If the list is hard-coded instead, the fixture is targeted at a specific set of modules (a.k.a. a product edition).
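A sketch of the two variants of such a fixture (the directory layout is hypothetical): the scanning version stays valid for any set of installed modules, while the hard-coded list is tied to one specific edition and breaks whenever a module is added or removed.

```php
<?php
// Robust fixture: discover modules from the code base by a rule.
function getInstalledModules(string $codeBaseDir): array
{
    $modules = [];
    foreach (glob($codeBaseDir . '/app/code/*/*', GLOB_ONLYDIR) as $moduleDir) {
        // e.g. app/code/Vendor/Catalog -> "Vendor_Catalog"
        $modules[] = basename(dirname($moduleDir)) . '_' . basename($moduleDir);
    }
    return $modules;
}

// Fragile alternative: a hard-coded list, valid only for one specific
// set of modules (product edition) and broken by any addition or removal.
const EXPECTED_MODULES = ['Vendor_Catalog', 'Vendor_Checkout', 'Vendor_Sales'];
```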
Cost -- the time and effort required to implement and maintain the test. A test can be easy to implement but fragile and in need of fixing over and over again; that would be an expensive test. An ideally low-cost test is one you implement once and forget about (however, it then risks being inaccurate).