Some issues with time based parallel for AQA test in current adoptium jenkins environments #5368

Closed
sophia-guo opened this issue May 31, 2024 · 0 comments

sophia-guo commented May 31, 2024

Related to adoptium/ci-jenkins-pipelines#1032

  • Every aqa-test job (all categories, all levels) generates a **_testList_0 job and runs it, even when the test job can finish within 120 minutes. The number of test jobs is therefore at least doubled, and so are the archived test files. For example, in the win32 sanity job the output shows that a single list is sufficient, but instead of running the tests directly the job still generates and triggers a child job: https://ci.adoptium.net/job/Test_openjdk17_hs_sanity.openjdk_x86-32_windows_testList_0/1/
02:08:18  Total number of tests searched: 36
02:08:18  Number of test durations found: 36
02:08:18  Top slowest tests: 
02:08:18  	14m30s jdk_util_2
02:08:18  	08m25s jdk_security2_2
02:08:18  	07m51s jdk_lang_2
02:08:18  ====================================================================================
02:08:18  
02:08:18  Test target is split into 1 lists.
02:08:18  Reducing estimated test running time from 51m18s to 51m18s.
02:08:18  
02:08:18  -------------------------------------testList_0-------------------------------------
02:08:18  Number of tests: 36
02:08:18  Estimated running time: 51m18s

So the number of jobs is doubled and the test results are archived twice.
This also turns the win32 sanity job into more of a parent job, which sets up and generates the testList, triggers the child job, and archives the child job's test results, than a test job itself.
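One way to picture the behaviour being asked for is a guard that skips the child job when the split yields a single list that fits the time limit. This is only a minimal sketch: `SplitList`, `plan_jobs`, and `TIME_LIMIT_MIN` are hypothetical names, not the pipeline's actual code, and the figures are the ones from the log above.

```python
from dataclasses import dataclass

# 120 minutes is the limit mentioned in this issue; the constant name is hypothetical.
TIME_LIMIT_MIN = 120


@dataclass
class SplitList:
    """One generated test list (hypothetical structure)."""
    name: str
    num_tests: int
    estimated_minutes: float


def plan_jobs(split_lists):
    """Return the child jobs to trigger, or [] to run the tests in the parent job."""
    single = len(split_lists) == 1
    if single and split_lists[0].estimated_minutes <= TIME_LIMIT_MIN:
        # Only one list and it fits the limit: a child job adds nothing
        # except a second queue wait and a second set of archives.
        return []
    return [s.name for s in split_lists]


# win32 sanity job from the log: 36 tests, estimated 51m18s, split into 1 list.
win32_sanity = [SplitList("testList_0", 36, 51 + 18 / 60)]
print(plan_jobs(win32_sanity))  # prints []
```

With a guard of this kind the parent job would run the 36 tests itself, so the job count and the archived results would not be doubled.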

  • Because of the limited number of machines, this may make the tests take longer. Take the win32 sanity job again: run directly, without the time-based change, job 409 took 1 hr 33 min. With the change, job 410 (right after job 409) took 5 hr 32 min, because testList_0 sat in the queue for 4 hr 14 min. I picked this job at random; it may be a common issue, especially during a release, when test jobs for all JDK versions, all levels, and all categories are triggered.

  • How can the test data be made more accurate? Take this job for example: the three parallel jobs each take around 30 minutes (including setup and waiting). So without parallelization the job shouldn't take more than 120 minutes, which means no parallelization is needed, yet the data says it will take longer. An earlier build in Jenkins shows that a successful build takes 1 hr 27 min: https://ci.adoptium.net/view/Test_openjdk/job/Test_openjdk21_hs_sanity.openjdk_x86-64_linux/206/

  • How is the maximum parallel number set? It seems the maximum is the number of available agents, which means one test job may take all available agents?

  [Screenshot, 2024-05-31 at 5:44:05 PM]
  • Would it be possible to also take the build time cost into account (the build target in build.xml)? I checked some builds, and build time normally does not seem to be an issue compared with queue time.
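The timing figures and the agent question in the points above can be worked through in a short sketch. The first part is plain arithmetic on the job 409 and job 410 durations quoted in this issue. The second part, `parallel_count` with its `max_share` parameter, is a hypothetical illustration of capping the parallel number below the full agent count. It is not how the pipeline currently computes it.

```python
import math


def to_minutes(hours, minutes):
    """Convert an 'H hr M min' duration to minutes."""
    return hours * 60 + minutes


direct_run = to_minutes(1, 33)   # job 409, run directly
with_child = to_minutes(5, 32)   # job 410, with the testList_0 child job
queue_wait = to_minutes(4, 14)   # time testList_0 spent in the queue

non_queue = with_child - queue_wait  # time in job 410 not spent queueing
slowdown = with_child - direct_run   # extra wall-clock time versus job 409
print(non_queue, slowdown)  # prints 78 239


def parallel_count(estimated_minutes, limit_minutes, available_agents, max_share=0.5):
    """Hypothetical rule: split only as far as needed, and never claim more
    than max_share of the available agents (assumed value, not from the issue)."""
    needed = math.ceil(estimated_minutes / limit_minutes)
    cap = max(1, math.floor(available_agents * max_share))
    return max(1, min(needed, cap))
```

For job 410 this gives 78 minutes outside the queue and 239 minutes (3 hr 59 min) of extra wall-clock time compared with job 409, so the queue wait accounts for the whole slowdown. Under the assumed rule, a job estimated at 90 minutes with a 120-minute limit would get `parallel_count(90, 120, 10) == 1`, meaning no split at all.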