Test Data Mining and Analytics

Failure Processor Subsystem

(FailureProcessor.pm, failure_processor.pl)

The Failure Processor's responsibility is to mine the test run results store and, for each error encountered, either add a row to test_failure_tbl or increment that row's count. The automation_db.test_failure_tbl can then be mined for meaningful and interesting information about the executed test runs; what that information looks like is determined by the schema of test_failure_tbl as well as by the query used to retrieve it. At a minimum, the information should include a timestamp, step number, reason code, action to take, and count.
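
As a rough illustration, here is a minimal test_failure_tbl sketched as MySQL-flavored DDL issued through Perl's DBI. The column names and types are assumptions; only the five required fields come from this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=automation_db', 'user', 'pass',
                       { RaiseError => 1 });

# Hypothetical schema; the framework's real DDL may differ.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS test_failure_tbl (
        failure_id     INT AUTO_INCREMENT PRIMARY KEY,
        test_timestamp DATETIME     NOT NULL,  -- when the failing run executed
        step_number    INT          NOT NULL,  -- the step # that failed
        reason_code    VARCHAR(64),            -- the 'what' of the failure
        action_item    VARCHAR(255),           -- the action to take
        count          INT DEFAULT 1           -- occurrences seen so far
    )
});
```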

How a 'failure hit' is determined

The list below outlines when, and in which ways, a ‘failure hit’ can be registered by the Failure Processor system.

  • after finalize_run method is called from within test script
  • by FAIL result (hit) from post_processor.pl
  • by FAIL result (hit) from failure_processor.pl

It is important that the finalize_run method is called with the appropriate parameters before analysis of the current test run can take place. The finalize_run method kicks off the post-run processing of all test artifacts collected during the test run and updates test_run_tbl with the PASS or FAIL status this subsystem requires.
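
For illustration, the end of a test script might look like the sketch below. The TestRun module name, its constructor, and the named parameter are assumptions; only finalize_run itself and the PASS/FAIL statuses are documented on this page.

```perl
use strict;
use warnings;
use TestRun;   # hypothetical module; the framework's real package is not named here

my $run = TestRun->new();
my $tests_passed = 1;   # outcome accumulated while executing the test steps

# ... execute test steps, collecting artifacts along the way ...

# finalize_run kicks off post-run processing of the collected artifacts and
# stamps test_run_tbl with PASS or FAIL so the Failure Processor can later
# pick this run up.
$run->finalize_run( status => $tests_passed ? 'PASS' : 'FAIL' );
```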

Determining a ‘failure hit’

The following is the process implemented in the framework to determine when we have a ‘failure hit’. All of the test artifacts listed here are produced by the framework and are normally housed in the automation database (automation_db) as well as in the directory structure of the host PC (the PC running the tests), under C:\Automation\Results by default.

  1. First, query the automation_db.test_run_tbl for unprocessed runs (failure_processed = false AND status = FAIL); a Perl sketch of steps 1–3 follows this list.

  2. Second, query the automation_db.test_results_tbl for steps (test_id) where test_result equals 'not ok'; the result set should include test_timestamp, test_id, and test_case_name.

  3. Then search the C:\Automation\Results\Test_Logs\ directory for test logs whose filenames contain test_timestamp. These will include ERROR and TEST OUTPUT files. You can decide which types of files to search for:

    1 = All

    2 = TEST ERROR

    3 = TEST OUTPUT

  4. The above should return an array of references (@filepaths), each pointing to a file path.

  5. Try to determine the failure reason using the reason-code-to-action-item mapping table.
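
Putting steps 1 through 3 together, the sketch below shows one way this could look in Perl. The connection details, the run_id column, the timestamp join between the two tables, and the filename-matching rule are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use File::Find;

my $dbh = DBI->connect('dbi:mysql:database=automation_db', 'user', 'pass',
                       { RaiseError => 1 });

# Step 1: unprocessed failed runs (run_id is an assumed key column).
my $runs = $dbh->selectall_arrayref(q{
    SELECT run_id, test_timestamp
    FROM   test_run_tbl
    WHERE  failure_processed = FALSE AND status = 'FAIL'
}, { Slice => {} });

for my $run (@$runs) {
    # Step 2: steps in this run whose test_result is 'not ok'
    # (joining runs to results on test_timestamp is an assumption).
    my $steps = $dbh->selectall_arrayref(q{
        SELECT test_timestamp, test_id, test_case_name
        FROM   test_results_tbl
        WHERE  test_result = 'not ok' AND test_timestamp = ?
    }, { Slice => {} }, $run->{test_timestamp});

    # Step 3: collect log files whose filename contains the timestamp.
    my @filepaths;
    find(sub {
        push @filepaths, $File::Find::name
            if -f && /\Q$run->{test_timestamp}\E/;
    }, 'C:/Automation/Results/Test_Logs');   # forward slashes work on Windows

    # @filepaths now feeds steps 4 and 5.
}
```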

Processing the Failures

How the failures are processed depends on whether the framework is aware of the current error. If it is aware of the error, it attempts to mitigate it. If it is not aware (i.e. this is the first time this error has surfaced in the framework), it simply logs the raw error.

A more detailed explanation (sequence of steps) for both known and unknown failures / errors is below:

FOR KNOWN FAILURES / ERRORS

  1. Save the current failure count

  2. Increase the test_failure_reasons_tbl.count field by 1 for the current error

  3. Get the list of applicable action items from the test_failure_reason_code_to_action_item_mapping_tbl

FOR UNKNOWN FAILURES / ERRORS

  1. Open each file in @filepaths

  2. Load its content into a variable

  3. Store it into the automation_db.unknown_test_failure_tbl

For both types of errors, execute each of the mitigation ‘action items’ listed for the error. The above covers only the processing of failures; the actual analysis of the data is covered next and is part of the Failure Analysis subsystem, which is incorporated into the overall post-processing of executed test runs.
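
A sketch of the known/unknown branch described above; the SQL details, the column names in unknown_test_failure_tbl, and the helper's signature are assumptions beyond the table names given on this page.

```perl
use strict;
use warnings;

sub process_failure {
    my ($dbh, $reason_code, @filepaths) = @_;

    # Step 1 (known): save the current failure count; an undefined result
    # means the reason code is not in test_failure_reasons_tbl.
    my ($count) = $dbh->selectrow_array(q{
        SELECT count FROM test_failure_reasons_tbl WHERE reason_code = ?
    }, undef, $reason_code);

    if (defined $count) {
        # Step 2 (known): increase the count field by 1 for this error.
        $dbh->do(q{
            UPDATE test_failure_reasons_tbl
            SET    count = count + 1
            WHERE  reason_code = ?
        }, undef, $reason_code);

        # Step 3 (known): get the applicable action items from the mapping table.
        my $actions = $dbh->selectcol_arrayref(q{
            SELECT action_item
            FROM   test_failure_reason_code_to_action_item_mapping_tbl
            WHERE  reason_code = ?
        }, undef, $reason_code);
        return @$actions;   # executed one by one by the caller
    }

    # Unknown: slurp each log file and store the raw error for later triage
    # (file_path and raw_error are assumed column names).
    for my $path (@filepaths) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $content = do { local $/; <$fh> };
        close $fh;
        $dbh->do(q{
            INSERT INTO unknown_test_failure_tbl (file_path, raw_error)
            VALUES (?, ?)
        }, undef, $path, $content);
    }
    return;
}
```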

NOTE: Initially, all unknown errors are added to the unknown_test_failure_tbl; however, once a failure is analyzed, it is added to the test_failure_tbl for future reference.

Failure Analysis Subsystem

(FailureAnalysis.pm, failure_analysis.pl)

The Failure Analysis subsystem's responsibility is to analyze all artifacts produced by a test run that have been flagged as ERROR HITS by the Failure Processor system.

Analyzing the Failures

The process implemented in the framework to facilitate analysis of the data in the automation database includes the following:

  1. Parse all test artifacts with a specific timestamp and look for errors (e.g. HTTP, Selenium, Environment, 'not ok'); a pattern-matching sketch follows this list.

  2. Use the test_failure_reasons_tbl to determine, where possible, which errors are known (the results are used in step 4 below).

  3. Identify and flag 'New' or 'Unknown' failures/errors ('Unknown' failures are added to unknown_failure_reasons_tbl).

  4. Update test_failure_tbl with the required action_item if a match is made.
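
As referenced in step 1, the sketch below shows how artifact contents might be matched against the error families named above. The regular expressions are purely illustrative; the framework's actual signatures are not documented on this page.

```perl
use strict;
use warnings;

# Hypothetical signature patterns for the error families listed in step 1.
my %error_patterns = (
    HTTP        => qr/HTTP\S*\s+(?:4|5)\d\d/,          # e.g. 404, 500 responses
    Selenium    => qr/NoSuchElementError|StaleElementReferenceError/,
    Environment => qr/connection refused|timed out/i,
    NotOk       => qr/^not ok\b/m,                     # TAP-style step failure
);

# Returns the matching families; an empty list means the error is 'Unknown'
# and belongs in unknown_failure_reasons_tbl.
sub classify_artifact {
    my ($content) = @_;
    my @hits;
    while (my ($family, $re) = each %error_patterns) {
        push @hits, $family if $content =~ $re;
    }
    return @hits;
}
```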

There is a mapping of test failure reason codes to action items. Reason codes are the ‘what’ of the failure (e.g. element not found, 404), while action items describe what needs to be done to potentially mitigate the failure. The potential action items are defined in the test_failure_action_items_tbl, the reason codes reside in the test_failure_reasons_tbl, and the mapping between the two is defined in the reason_code_action_item_mapping_tbl.
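
Assuming conventional integer keys join the three tables (the key column names are not documented here), resolving a reason code to its action items could look like:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=automation_db', 'user', 'pass',
                       { RaiseError => 1 });

my $reason_code = 'element_not_found';   # example reason code
my $actions = $dbh->selectall_arrayref(q{
    SELECT a.action_item
    FROM   test_failure_reasons_tbl            r
    JOIN   reason_code_action_item_mapping_tbl m ON m.reason_id = r.reason_id
    JOIN   test_failure_action_items_tbl       a ON a.action_id = m.action_id
    WHERE  r.reason_code = ?
}, { Slice => {} }, $reason_code);

print "$_->{action_item}\n" for @$actions;
```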

Test Failure Reasons

This table tracks the reasons, known to the framework, why a test script has failed, along with each reason's type and how many times it has occurred. This information, while collected manually at first, is automatically updated with every known error's occurrence count. This gives us the capability of analyzing and then separating failures caused by the framework from errors in the application under test.
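
That separation lends itself to a simple mining query. In the sketch below, the failure_type column and its values are assumptions standing in for whatever ‘type’ field the table actually uses.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=automation_db', 'user', 'pass',
                       { RaiseError => 1 });

# Total occurrences per failure type, e.g. FRAMEWORK vs AUT.
my $summary = $dbh->selectall_arrayref(q{
    SELECT failure_type, SUM(count) AS occurrences
    FROM   test_failure_reasons_tbl
    GROUP  BY failure_type
    ORDER  BY occurrences DESC
}, { Slice => {} });

printf "%-12s %6d\n", $_->{failure_type}, $_->{occurrences} for @$summary;
```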

Test Failure Action Items

Each type of error / failure that is found (and is therefore now known) should have, shortly after it is discovered, some sort of action plan in place to mitigate it. Some of these actions will be human-initiated while others will be executed by a computer. The idea behind the action items table is to list possible solutions to the error in question, to be executed one by one.

Test Failure Reason Code to Action Item Mapping

This is where the known reasons a test failure can occur are mapped to the known action items that either a human or a computer can take in order to mitigate the failure and attempt to re-run the script. There is an n:n mapping between Reason Code and Action Item: a Reason Code can have more than one Action Item associated with it, and an Action Item can be associated with multiple Reason Codes.
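
One conventional way to model an n:n relationship is a junction table with a composite primary key. The DDL below is illustrative; the framework's real column names are not documented on this page.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=automation_db', 'user', 'pass',
                       { RaiseError => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS reason_code_action_item_mapping_tbl (
        reason_id INT NOT NULL,              -- FK to test_failure_reasons_tbl
        action_id INT NOT NULL,              -- FK to test_failure_action_items_tbl
        PRIMARY KEY (reason_id, action_id)   -- each pairing appears exactly once
    )
});
```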

Unknown Failure Reasons

When the framework experiences an error that it is not aware of (i.e. it is not listed in the test failure reasons table), it simply absorbs the raw error and dumps it into the unknown failure reasons table. This table has to be monitored manually, with entries upgraded to reason codes with action items once (and if) a solution to the error is needed and found.