Skip to content

Creating Custom Diffing Strategies

Saleh Yusefnejad edited this page Jul 19, 2021 · 10 revisions

The built-in options/strategies cover a lot of ground and will be enough for many scenarios. However, the diffing engine is pluggable with new strategies to enable custom scenarios. This page shows you have to create your own.

To learn when and how the different types of strategies are used, see the Difference Engine Internals page.

Contents:

Understanding the strategy pipeline

The library comes with a single implementation of the IDiffingStrategy interface: DiffingStrategyPipeline.

The strategy pipeline type allows us to register filters, matchers, and comparers that it will call, one at the time, when its Filter, Match, and Compare methods are called by the diffing engine.

When you register strategies, you have to decide what its category is, i.e. if it is a general or specialized strategy. This affects the order in which the pipeline invokes the strategies in. The invocation rules are as follows:

  1. Strategies within the same category are invoked in the order they are added to the pipeline.
  2. Specialized filters always execute after any generalized filters in the pipeline. That enables them to correct for the generic filters decision.
  3. Specialized matchers always execute before any generalized matchers in the pipeline. This enables the special matchers to handle special matching cases before the more simple generalized matchers process the rest.
  4. Specialized comparers always execute after any generalized comparers in the pipeline. That enables them to correct for the generic comparers decision.

Creating filters strategies

A filter strategy is used during the filtering phase where nodes and attributes are either included or excluded from the comparison.

A filter has to have the signature that is compatible with the FilterStrategy delegate shown below:

delegate FilterDecision FilterStrategy<TSource>(in TSource source, FilterDecision currentDecision);

The strategy you can create must return a FilterDecision, which can be either Keep or Exclude. The recommended structure of a filter looks like this (this is the IgnoreCommentFilter):

public static FilterDecision Filter(in ComparisonSource source, FilterDecision currentDecision)
{
    // Step 1: Check if the current decision already has the outcome the filter would set
    //         if its test passes.
    if (currentDecision.IsExclude()) return currentDecision;

    // Step 2: Inspect the source and decide if the current decision should be changed.
    //         If it should be changed, just return the desired decision.
    if (source.Node.NodeType == NodeType.Comment) return FilterDecision.Exclude;

    // Step 3: If the inspection did not change the current decision, just return it.
    return currentDecision;
}

This is an example of the filter that filters out "diff:"-attributes, i.e. attributes used in control markup to change the behavior of the diffing engine:

private const string DiffAttributePrefix = "diff:";

public static FilterDecision Filter(in AttributeComparisonSource source, FilterDecision currentDecision)
{
    // Step 1: Check if the current decision already has the outcome the filter would set
    //         if its test passes.
    if (currentDecision.IsExclude()) return currentDecision;

    // Step 2: Inspect source and decide if the current decision should be changed.
    //         If it should be changed, just return the desired decision.
    if (source.Attribute.Name.StartsWith(DiffAttributePrefix, StringComparison.OrdinalIgnoreCase))
        return FilterDecision.Exclude;

    // Step 3: If the inspection did not change the current decision, just return it.
    return currentDecision;
}

For a more complex example, see the TextNodeFilter. It removes empty text nodes depending on a custom diff:whitespace attribute and a general option set inside it during construction.

Creating matching strategies

A matching strategy is used to match control and test nodes/attributes with each other for comparison. If a control/test attribute or node is not matched during the matching phase, it is marked as either missing or unexpected, depending on whether it is a control or test attribute/node.

Matchers are a little more involved to create than filters and comparers. They are generally implemented as generators using the yield return keyword to return matches, and must match the following delegates:

// For node matching
delegate IEnumerable<Comparison> MatchStrategy(IDiffContext context, SourceCollection controlSources, SourceCollection testSources)

// For attribute matching
delegate IEnumerable<AttributeComparison> MatchStrategy(IDiffContext context, SourceMap controlSources, SourceMap testSources)

The general behavior for generalized matchers is to attempt to match all unmatched control nodes/attributes with a test node/attribute, but not match it again if it has already been matched before. The simplest example of this is the built-in AttributeNameMatcher, that, as the name suggests, matches attributes of two previously matched elements, by their name:

public static IEnumerable<AttributeComparison> Match(IDiffContext context, SourceMap controlSources, SourceMap testSources)
{
    if (controlSources is null) throw new ArgumentNullException(nameof(controlSources));
    if (testSources is null) throw new ArgumentNullException(nameof(testSources));

    // Note the call to "GetUnmatched()". It returns any source that has not been previously matched.
    foreach (AttributeComparisonSource control in controlSources.GetUnmatched())
    {
        // In this step the testSources map is query for the existence of an attribute
        // with the same name that is NOT previously matched.
        if (testSources.Contains(control.Attribute.Name) && testSources.IsUnmatched(control.Attribute.Name))
            yield return new AttributeComparison(control, testSources[control.Attribute.Name]);
    }

    yield break;
}

The forward searching node matcher is a lot more complex. It only matches nodes that are of the same type, by trying to match nodes at the same index. If a match is found it is returned, otherwise, it tries the node at index + 1, until it reaches the end of the node list and gives up.

public static IEnumerable<Comparison> Match(IDiffContext context,
                                        SourceCollection controlSources,
                                        SourceCollection testSources)
{
    if (controlSources is null) throw new ArgumentNullException(nameof(controlSources));
    if (testSources is null) throw new ArgumentNullException(nameof(testSources));

    // NOTE again the use of GetUnmatched. This makes sure that previously matched nodes are skipped.
    var lastMatchedTestNodeIndex = -1;
    foreach (ComparisonSource control in controlSources.GetUnmatched())
    {
        // The helper method listed below searched the testSources from the last matched index
        // to see if a match can be made.
        var comparison = TryFindMatchingNodes(control, testSources, lastMatchedTestNodeIndex + 1);
        if (comparison.HasValue)
        {
            // If a match is made, it is returned and the last matched index is updated.
            yield return comparison.Value;
            lastMatchedTestNodeIndex = comparison.Value.Test.Index;
        }
    }

    yield break;
}

private static Comparison? TryFindMatchingNodes(in ComparisonSource control, SourceCollection testSources, int startIndex)
{
    // The GetUnmatched methods takes an optional parameter, startIndex,
    // which skips over the unmatched nodes with an index less than start index.
    foreach (ComparisonSource test in testSources.GetUnmatched(startIndex))
    {
        // The check uses the IsSameTypeAs extension method on
        // the INode type to compare the control node with the test node.
        if (control.Node.IsSameTypeAs(test.Node))
        {
            return new Comparison(control, test);
        }
    }
    return null;
}

A specialized matcher, like the CSS selector element matcher, will run before the generalized matchers, and check if for elements with the diff:match attribute. If one is found, it will use the value in the attribute to query the test sources DOM tree for a matching element and return that match.

Creating comparer strategies

A compare strategy is used to evaluate a matched set of nodes or attributes and decide if they are semantically the same or different.

Their result can be either:

  • Same: Use when the two compared nodes or attributes are the same.
  • Different: Use when the two compared nodes or attributes are different.
  • Skip: Use when the comparison should be skipped and any child nodes or attributes skipped as well.

The delegate they have to match is the following, where TComparison is either a Comparison or AttributeComparison.

delegate CompareResult CompareStrategy<TComparison>(in TComparison comparison, CompareResult currentDecision)

Generalized comparers should check the current decision, and not change it if it is either Same or Skip. If it is, a previous comparer has changed it to this. A simple example of this is the built-in element comparer:

public static CompareResult Compare(in Comparison comparison, CompareResult currentDecision)
{
    // First we use the IsSameOrSkip to ensure we do not change a decision made by a more
    // specialized comparer.
    if (currentDecision.IsSameOrSkip()) return currentDecision;

    // Otherwise we perform our verification and return our result.
    return comparison.Control.Node.NodeType == NodeType.Element && comparison.AreNodeTypesEqual()
        ? CompareResult.Same
        : CompareResult.Different;
}

An example of a specialized comparer is the ignore element comparer, which will check if elements have the "diff:ignore" attribute, and set the result to Skip if it does:

private const string DIFF_IGNORE_ATTRIBUTE = "diff:ignore";

public static CompareResult Compare(in Comparison comparison, CompareResult currentDecision)
{
    // Check if the decision is already skipped. If it is, no need to do more work.
    if (currentDecision == CompareResult.Skip) return currentDecision;

    // Use the helper method to check for the "diff:ignore" attribute
    return ControlHasTruthyIgnoreAttribute(comparison)
        ? CompareResult.Skip
        : currentDecision;
}

private static bool ControlHasTruthyIgnoreAttribute(in Comparison comparison)
{
    return comparison.Control.Node is IElement element &&
            element.TryGetAttrValue(DIFF_IGNORE_ATTRIBUTE, out bool shouldIgnore) &&
            shouldIgnore;
}

A simply specialized comparer is the class attribute comparer, which compares the content of the class attribute such that the order of classes defined in its value does not matter, as long as they are all there.

private const string CLASS_ATTRIBUTE_NAME = "class";

public static CompareResult Compare(in AttributeComparison comparison, CompareResult currentDecision)
{
    // Check if current decision is already Same or Skip
    if (currentDecision.IsSameOrSkip()) return currentDecision;
    // Verify that both attributes are the class attribute
    if (!IsClassAttributes(comparison)) return currentDecision;

    // If it is, we get the ClassList from each element and compare them.
    var (ctrlElm, testElm) = comparison.GetAttributeElements();
    if (ctrlElm.ClassList.Length != testElm.ClassList.Length) return CompareResult.Different;

    return ctrlElm.ClassList.All(x => testElm.ClassList.Contains(x))
        ? CompareResult.Same
        : CompareResult.Different;
}

private static bool IsClassAttributes(in AttributeComparison comparison)
{
    return comparison.Control.Attribute.Name.Equals(CLASS_ATTRIBUTE_NAME, StringComparison.OrdinalIgnoreCase) &&
        comparison.Test.Attribute.Name.Equals(CLASS_ATTRIBUTE_NAME, StringComparison.OrdinalIgnoreCase);
}

A more advanced comparer you might look at for inspiration is the built-in TextNodeComparer. It handles all the special cases around whitespace, RegEx, and case insensitive comparison.

Creating strategy libraries/collections

If you are creating a set of strategies and want to make it easy for users to use them during a diffing operation, e.g. via the DiffBuilder type, it is recommended to include an extension method with your library, that handles the registration correctly (e.g. is it a specialized or general strategy).

For example, this extension method adds the standard attribute comparison logic and ensures that all the needed strategies are added in the correct order with the right category:

public static class DiffingStrategyCollectionExtensions
{
    public static IDiffingStrategyCollection AddAttributeComparer(this IDiffingStrategyCollection builder)
    {
        builder.AddMatcher(PostfixedAttributeMatcher.Match, StrategyType.Specialized);
        builder.AddComparer(AttributeComparer.Compare, StrategyType.Generalized);
        builder.AddComparer(IgnoreAttributeComparer.Compare, StrategyType.Specialized);
        return builder;
    }
}