-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harvester regression testing #91
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great
I would like to see some documentation of these tests when you come to productionise
}; | ||
} | ||
|
||
function compareValues(val1, val2, ignoredKeys, path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great.
I would get frustrated having to deal with JS type, null and undefined comparisons 😄
|
||
if (results.includes('improvement')) { | ||
return 'improvement'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the way you've thought about this.
Having these specific categories really helps the person who sees the results
}, | ||
}); | ||
|
||
async function getData(context, days, limitPerType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably very unlikely it would happen.
If the Mongo DB schema was to change, I assume this test would fail and then whoever looked it would have to update.
But I doubt it will change this far into the project
@@ -14,19 +14,16 @@ describe('Validation definitions between dev and prod', function () { | |||
//Rest a bit to avoid overloading the servers | |||
afterEach(() => new Promise(resolve => setTimeout(resolve, definition.timeout / 2))) | |||
|
|||
describe('Validation between dev and prod', function () { | |||
before(() => { | |||
loadFixtures().forEach(([url, definition]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate the reason why loadFixture is removed? Some definitions are different between Dev and Prod for the following reasons:
- Fixes implemented in Dev, but component is not reharvested on Prod. Eventually when the component is eventually correct on Prod, the fixture can be removed.
- Feature implemented in Dev but not yet deployed to Prod, e.g. LicenseRef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the fixtures because their purpose was not clear to me. It's not obvious to me what kind of assurance we expect to get from a test that compares the real (dev) definition to a modified (mocked prod) definition.
If we expect the definitions to be different, due to a bug fix or a new feature, let's assert that in a test, for each definition separately. Mocking and expecting equality, as is done now, seems to work in the opposite direction, glossing over the expected difference.
I admit I might be misunderstanding the purpose of this whole test, so maybe I should leave it as is and create a new one that will do the comparison using the new comparison function? However my initial approach was to modify this test, and I've removed the mocks to make sure I compare the real data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of integration tests is to detect any breaking changes. For example, after the recent ScanCode upgrade was completed, the integration tests showed differences: some were improvements, and some were regressions (See clearlydefined/service#1056 (comment)). For the improvements that are not in the Production deployment, vetted improved definition files can be put into fixtures, allowing for a successful run before the Production deployment is updated. Ideally, after the production deployment is updated, the components whose definitions can be fixed by the new deployment can be re-harvested and the fixture can be removed, as the definitions in dev and prod should be in sync again. @elrayle, Feel free to add if anything is missed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information @qtomlinson. I've decided to keep this test unchanged and add my structured diff comparison as a separate step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like this is where you already landed. I'll add my thoughts for clarity. I like having both, fixture based tests and dynamic tests. Fixed catches regressions where we expect the same results every time. Dynamic provides a broader sweep with a goal of increasing confidence or identify systemic problems with a proposed release.
return JSON.parse(data); | ||
} catch (err) { | ||
// If the file doesn't exist, fetch the data and save it to disk | ||
const response = await fetch('https://cosmos-query-function-app.azurewebsites.net/api/getrecentdefinitions?days=1&limit=1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another source of recently harvested coordinates exists at the status endpoint: https://dev-api.clearlydefined.io/status/recentlycrawled. The response format is:
[
{
"coordinates": "go/golang/github.com%2Fazure/azure-sdk-for-go/v43.3.0+incompatible",
"timestamp": "2024-09-25T22:36:22.017Z"
},
{
"coordinates": "go/golang/github.com%2Fazure%2Fgo-autorest/autorest/v0.11.24",
"timestamp": "2024-09-25T22:10:52.752Z"
}
]
The internal logic is at service/statusService. It utilize application insight, so might be cheaper than cosmo query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info, I wasn't aware of this endpoint. However it's not giving us enough data as you pointed out, so we'll have to rely on some other mechanism.
I think querying the CosmosDB should not increase the cost much if at all, because it's only done rarely and touches a limited amount of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good. Another idea just occurred to me: the search query in the getRecentDefinition is based on _meta.updated, which is also the change publication based off. The changed coordinates published hourly can potentially be used to provide the recent coordinates by day and by type (through sorting). Just thought to mention it as an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I also thought of using the changes notifications mechanism for getting the recent definitions. For the relatively simple use case, as presented here, the data present in the changes notifications would be sufficient.
However I have plans to add some more data for these tests going forward. For example, one of the things we'd be interested in is to see whether we're getting rid of OTHER
and NOASSERTION
license entries when migrating to the ScanCode's LicenseRef
s. For that we'd have to make a more elaborate query, and data from changes notifications mechanism will not be sufficient anymore. We'd have to query the database, using the same approach as presented in this PR (an Azure HTTP Function with CosmosDB access).
Do you have some concerns about the proposed database querying mechanism, @qtomlinson?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes DB query is more extensible. Thanks for the explanation and clarification!
@RomanIakovlev The general approach looks good! It's great to see improvements and enhancements to the definition comparison!
|
d3b5416
to
4361fd6
Compare
@qtomlinson Thanks for your feedback Qing. Regarding the point of including scores into the structured diff, I'd prefer to keep the ignored keys as is for now, mainly for the sake of readability of the output. I agree looking into the scores might be necessary sometimes, but I don't think we need that all the time, only in those special cases when there are other, more significant changes (e.g. copyright detection difference). For those times, we can manually run the diff with another set of keys. It might be worth having the list of ignored keys taken as a workflow parameter for those times, but I'd prefer to add this as a separate change in the future. |
4361fd6
to
601b3a8
Compare
601b3a8
to
f93ec6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the substantial contribution to testing and raising confidence in release candidates. It appears to me that the one question that required resolution has been met.
@@ -14,19 +14,16 @@ describe('Validation definitions between dev and prod', function () { | |||
//Rest a bit to avoid overloading the servers | |||
afterEach(() => new Promise(resolve => setTimeout(resolve, definition.timeout / 2))) | |||
|
|||
describe('Validation between dev and prod', function () { | |||
before(() => { | |||
loadFixtures().forEach(([url, definition]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like this is where you already landed. I'll add my thoughts for clarity. I like having both, fixture based tests and dynamic tests. Fixed catches regressions where we expect the same results every time. Dynamic provides a broader sweep with a goal of increasing confidence or identify systemic problems with a proposed release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work to add more integration tests. Minor edits can be put as a separate PR.
const status = await harvestTillCompletion(components) | ||
const recentDefinitions = await getComponents() | ||
console.info(`Recent definitions: ${recentDefinitions}`) | ||
const status = await harvestTillCompletion(recentDefinitions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: naming? recentDefinitions, these can be static component coordinates.
} | ||
} | ||
|
||
function compareValues(val1, val2, ignoredKeys, path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, naming: val2 seems to be the expected/baseline value, based on which regression or improvement is classified.
} | ||
|
||
function isEmpty(value) { | ||
if (value === null || value === undefined) return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty string?
return false | ||
} | ||
} | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: similar to return isSuperset(setB, setA)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting merging the two methods since they are basically the same? I would definitely see that as a follow-on PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely can be a separate PR. The logic seems to be similar to
function isSubset(setA, setB) {
return isSuperset(setB, setA)
}
return handleLargeArrays(val1, val2, path, 'inconclusive') | ||
} | ||
|
||
function handleLargeArrays(val1, val2, path, result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: naming? other types in addition to arrays are handled here
if (Array.isArray(val1) && Array.isArray(val2)) { | ||
const set1 = new Set(val1.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | ||
const set2 = new Set(val2.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: toLowerCase after stringify?
@@ -0,0 +1,224 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is this file for documentation purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some suggested changes for most of @qtomlinson questions. @RomanIakovlev what do you think about the suggested changes?
} | ||
} | ||
|
||
function compareValues(val1, val2, ignoredKeys, path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps...
function compareValues(val1, val2, ignoredKeys, path) { | |
function compareValues(devActual, expected, ignoredKeys, path) { |
} | ||
|
||
function isEmpty(value) { | ||
if (value === null || value === undefined) return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (value === null || value === undefined) return true | |
if (value === null || value === undefined || value === '') return true |
if (Array.isArray(val1) && Array.isArray(val2)) { | ||
const set1 = new Set(val1.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | ||
const set2 = new Set(val2.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Array.isArray(val1) && Array.isArray(val2)) { | |
const set1 = new Set(val1.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | |
const set2 = new Set(val2.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | |
if (Array.isArray(val1) && Array.isArray(val2)) { | |
const set1 = new Set(val1.map(item => JSON.stringify(item).toLowerCase()) | |
const set2 = new Set(val2.map(item => JSON.stringify(item).toLowerCase()) |
const recentDefinitions = await getComponents() | ||
console.info(`Recent definitions: ${recentDefinitions}`) | ||
const status = await harvestTillCompletion(recentDefinitions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const recentDefinitions = await getComponents() | |
console.info(`Recent definitions: ${recentDefinitions}`) | |
const status = await harvestTillCompletion(recentDefinitions) | |
const targetDefinitions = await getComponents() | |
console.info(`Recent definitions: ${targetDefinitions}`) | |
const status = await harvestTillCompletion(targetDefinitions) |
Didn't check to see if the constant is used beyond these lines.
const set1 = new Set(val1.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | ||
const set2 = new Set(val2.map(item => (typeof item === 'string' ? item.toLowerCase() : JSON.stringify(item)))) | ||
|
||
if (isSuperset(set1, set2)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When comparing arrays in certain cases, superset might not be improvement. For example, the file section in the definitions, the expected outcome is that the exact file paths listed in the production are detected in dev deployment.
Thanks @elrayle and @qtomlinson for the reviews and change suggestions. As those required changes are not blocking, I'd rather merge this PR as it is now and address the changes in a separate one. |
This PR is a work in progress. It aims to fix the #90 by adding a way to automatically harvest and compare definitions between prod and dev environments, summarizing and highlighting inconsistencies for a reviewer.
The overall approach I'm taking is to add more integration tests and modify the existing ones in
tools/integration
. I aim to achieve two improvements over the existing tests:You find more details about my implementation plan below
Making list of tested coordinates dynamic
To add variety to the test data, I plan to add support for querying recently modified definitions from production. Since there's currently no way to query the recently modified data from the ClearlyDefined API, I've added an Azure Function (see
tools/harvester-forwarding/src/functions/getRecentDefinitions.js
in this PR) to query CosmosDB and get a number of definition for each type (npm
,maven
, etc) that were created or updated recently. The function is called over HTTP and takes 2 parameters,days
for how many days back to look, andlimit
for how many records of each type to collect.I plan to use the output of that function in addition to the fixed coordinates list. I think we should have both fixed and dynamic list of coordinates to test, hence I plan to parameterize the existing tests to be able to pick either static or dynamic list in a given test run, and then add support for running existing tests in a matrix fashion for running the same tests on different coordinates lists.
Helping reviewers with definitions' comparison
There are existing tests that compare definitions between prod and dev environments. When results are not identical, the existing tests produce a standard line-by-line diff. It is hard to pinpoint the nature of the change by looking at that diff (whether it is an improvement or regression), and if we want to do this for multiple definitions, we need a better way.
For that I've implemented a comparison logic that classifies the differences between definitions into 3 categories,
regression
,improvement
andinconclusive
. It then groups all the differences into these categories and presents the output in the following way:See details
If you open the details, you may see there are 2 categories of differences:
inconclusive
for pathsdescribed.tools
,licensed.facets.core.discovered.unknown
andlicensed.facets.core.discovered.expressions
improvement
for pathlicensed.facets.core.attribution.parties
, since new attributions were found in dev compared to production.The paths "_meta", "licensed.score", "licensed.toolScore", "described.score", "described.toolScore" were ignored, since changes there are to be expected.
I believe this output format is easier to read and understand. I plan to add a new test into the existing
tools/integration/test/integration/e2e-test-service/definitionTest.js
suite to produce such output for the definitions under test.Please let me know what you think about these changes. If I get a green light I'll proceed with converting this PR into its final version.
CC @qtomlinson @elrayle