Extract some product tests into a separate suites #14818

Merged
merged 7 commits into from
Nov 16, 2022

Conversation

nineinchnick
Member

@nineinchnick nineinchnick commented Oct 28, 2022

Description

The total run time of suites 6 and 7 is nearly 1 hour each, so split them up. Check the running times at the end of the product tests step at https://github.com/trinodb/trino/actions/runs/3343271418/jobs/5537463265

Non-technical explanation

n/a

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Oct 28, 2022
@nineinchnick nineinchnick changed the title Extract Kafka product tests into a separate suite Extract some product tests into a separate suites Oct 28, 2022
@nineinchnick nineinchnick force-pushed the extract-kafka-pt-suite branch 3 times, most recently from 6666351 to 85fe883 Compare November 2, 2022 15:48
@nineinchnick nineinchnick force-pushed the extract-kafka-pt-suite branch from 85fe883 to c622af0 Compare November 3, 2022 08:31
@nineinchnick nineinchnick requested a review from hashhar November 4, 2022 08:08
@nineinchnick
Member Author

nineinchnick commented Nov 7, 2022

This branch:
image

master:
image

We can see that suites 6 and 7 improved by over 20 minutes, getting them down to about 30 minutes.

Suites 1 and 2 improved only slightly, by ~10 minutes, getting them just below an hour. If this gets approved, we can continue splitting more tests out of these suites, but I don't want to add more commits in this PR.

@MiguelWeezardo
Member

This branch: image

master: image

We can see that suites 6 and 7 improved by over 20 minutes, getting them down to about 30 minutes.

Suites 1 and 2 improved only slightly, by ~10 minutes, getting them just below an hour. If this gets approved, we can continue splitting more tests out of these suites, but I don't want to add more commits in this PR.

I wonder why both branches show the same PT job count of 24. Shouldn't there be more jobs executed on this branch now that new suites have been created?

@nineinchnick
Member Author

I wonder why both branches show the same PT job count of 24. Shouldn't there be more jobs executed on this branch now that new suites have been created?

This is a coincidence: master runs additional jobs with secrets, like Azure and GCP tests.

@MiguelWeezardo MiguelWeezardo self-requested a review November 7, 2022 10:51
@nineinchnick
Member Author

@hashhar PTAL

@nineinchnick nineinchnick force-pushed the extract-kafka-pt-suite branch from c622af0 to 8168180 Compare November 10, 2022 09:22
@hashhar
Member

hashhar commented Nov 10, 2022

What is the effect on wall-time?

cc: @findepi regarding the direction (not actual changes) since I know you have opinions on this.

@hashhar hashhar requested review from findepi and removed request for hashhar November 10, 2022 10:28
@nineinchnick
Member Author

What is the effect on wall-time?

The overhead of a single PT job is about 50s, if you sum all steps except Product Tests:
image

This PR adds 9 new jobs, so I guess this adds ~10 minutes. But I don't have exact statistics on the total run time of all PT jobs, or on which ones we run most often, since we don't run all of them in every PR.

Maybe this will make it easier to find failures in PTs if the test suites are smaller.
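
The overhead estimate above can be written out as a small calculation. This is only a sketch: the 50s per-job figure and the 9 new jobs come from the comment, while the class and method names are hypothetical, and it assumes every new job pays the same fixed overhead.

```java
import java.time.Duration;

// Hypothetical helper, not part of Trino: estimates the fixed overhead added by extra PT jobs.
final class PtOverheadEstimate
{
    private PtOverheadEstimate() {}

    // Assumes each new job pays the same per-job overhead (all steps except Product Tests).
    static Duration addedOverhead(int newJobs, Duration perJobOverhead)
    {
        return perJobOverhead.multipliedBy(newJobs);
    }

    public static void main(String[] args)
    {
        Duration added = addedOverhead(9, Duration.ofSeconds(50));
        // 9 jobs * 50s = 450s, i.e. 7.5 minutes
        System.out.println(added.getSeconds() + "s = " + added.getSeconds() / 60.0 + " min");
    }
}
```

With these inputs the product is 450s, or 7.5 minutes, which the comment rounds up to ~10 minutes. It is a sum over jobs, so how much of it shows up as wall-time depends on how many jobs run concurrently, which this thread doesn't state.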

@nineinchnick
Member Author

BTW this change was requested by @electrum

@MiguelWeezardo
Member

This might also let us avoid running suite6 and suite7 for PRs with only those plugin changes.

@nineinchnick
Member Author

@findepi PTAL

{
return ImmutableList.of(
testOnEnvironment(EnvMultinode.class)
.withGroups("configured_features", "tpch")
Member

When do I write "configured_features"?

Member Author

When you want to ensure that the test environment (EnvMultinode.class) properly defines all configured features and doesn't have any additional ones (like extra catalogs). If that happened, tests that rely on those features could be skipped in some PRs.

Since we reuse the same environments in multiple suites, nothing terrible would happen if some of them don't run tests from the configured_features group, but it's good to keep it in every suite for consistency. I hope that's why you noticed it's missing.

@findepi
Member

findepi commented Nov 14, 2022

cc: @findepi regarding the direction (not actual changes) since I know you have opinions on this.

I like the direction

@nineinchnick nineinchnick force-pushed the extract-kafka-pt-suite branch from 8168180 to 0facaf3 Compare November 14, 2022 13:18
Member

@hashhar hashhar left a comment

LGTM % can we verify we run the same number of tests?

In the past I've noticed such refactors end up running the same test group as part of multiple suites (which is OK, but the reverse is also possible: some group not being run at all).

We probably write surefire reports for PTs, which would include test counts (with fully qualified names as well), so we could disable impact analysis, do one run before this change and another with it, and compare the results?

@nineinchnick
Member Author

Trino has no functions for parsing XML, so I used regexes to extract test class names from the test reports stored as GitHub run artifacts.

I compared these runs:

with classes as (
  select run_id, array_agg(distinct test_class order by test_class) as classes
  from artifacts
  cross join unnest(regexp_extract_all(from_utf8(contents), 'class name="(.*)"', 1)) as c(test_class)
  where run_id in (3461805232, 3457214593) and name like 'test report pt %'
  group by run_id
)
select array_join(array_except(
  (select classes from classes where run_id = 3461805232),
  (select classes from classes where run_id = 3457214593)), U&'\000A') as missing;

which gives:

                        missing                        
-------------------------------------------------------
 io.trino.tempto.array_functions                       
 io.trino.tempto.binary_functions                      
 io.trino.tempto.hive_tpch                             
 io.trino.tempto.horology_functions                    
 io.trino.tempto.json_functions                        
 io.trino.tempto.map_functions                         
 io.trino.tempto.math_functions                        
 io.trino.tempto.regex_functions                       
 io.trino.tempto.string_functions                      
 io.trino.tempto.url_functions                         
 io.trino.tests.product.TestFunctions                  
 io.trino.tests.product.TestImpersonation              
 io.trino.tests.product.teradata.TestTeradataFunctions 
(1 row)

Most of these look like they should be in suite-functions, I'm looking into it.

@nineinchnick
Member Author

Whoops, I reversed the ids in array_except(). So it looks like the tests in my previous comment are the ones we're not running on master right now.
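
The argument-order slip is easy to reproduce outside SQL, because a set difference is not symmetric. The following is a minimal sketch of the same comparison in Java, not the query used in this thread: the class name, method names, and sample report snippets are all hypothetical, and the pattern uses `[^"]*` instead of the query's `.*` as a conservative choice in case one line carries more than one attribute.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Trino: compares test classes found in two sets of reports.
final class TestClassDiff
{
    // [^"]* stops at the closing quote, so neighbouring attributes are not merged into one match.
    private static final Pattern CLASS_NAME = Pattern.compile("class name=\"([^\"]*)\"");

    private TestClassDiff() {}

    // Collects the distinct class names in a report, sorted for stable output.
    static Set<String> classNames(String reportXml)
    {
        Set<String> names = new TreeSet<>();
        Matcher matcher = CLASS_NAME.matcher(reportXml);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    // Elements of 'left' that are absent from 'right'; swapping the arguments changes the answer.
    static Set<String> except(Set<String> left, Set<String> right)
    {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args)
    {
        // Made-up report fragments, for illustration only.
        Set<String> baseline = classNames("<class name=\"example.TestA\"/><class name=\"example.TestB\"/>");
        Set<String> branch = classNames("<class name=\"example.TestB\"/><class name=\"example.TestC\"/>");
        System.out.println("missing on branch: " + except(baseline, branch)); // [example.TestA]
        System.out.println("extra on branch: " + except(branch, baseline));   // [example.TestC]
    }
}
```

Reporting both directions side by side, as `main` does, makes it harder to misread which run is missing which tests.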

The tests I'm missing in this branch are:

                          missing                          
-----------------------------------------------------------
 io.trino.tests.product.deltalake.TestDeltaLakeGcs         
 io.trino.tests.product.hive.TestAbfsSyncPartitionMetadata 
(1 row)

@nineinchnick
Member Author

And the ones above require secrets, so it makes sense I'm not running them in my fork.

@nineinchnick
Member Author

False alarm: I had a bug in the connector, which was skipping zip file entries when the size was unknown. I made sure I'm now getting all artifacts and reports, and the tests missing here are:

                                         missing                                          
------------------------------------------------------------------------------------------
 io.trino.tests.product.deltalake.TestDatabricksWithGlueMetastoreCleanUp                  
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCheckpointsCompatibility         
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCompatibilityCleanUp             
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCreateTableAsSelectCompatibility 
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCreateTableCompatibility         
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksPartitioningCompatibility        
 io.trino.tests.product.deltalake.TestDeltaLakeGcs                                        
 io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility               
 io.trino.tests.product.hive.TestAbfsSyncPartitionMetadata                                
 io.trino.tests.product.iceberg.TestIcebergOptimize                                       
 io.trino.tests.product.iceberg.TestIcebergPartitionEvolution                             
 io.trino.tests.product.iceberg.TestIcebergProcedureCalls                                 
 io.trino.tests.product.iceberg.TestIcebergSparkCompatibility                             
 io.trino.tests.product.iceberg.TestIcebergSparkDropTableCompatibility                    
(1 row)
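
The connector code behind the zip-entry bug mentioned above is not shown in this thread, so the following is only a sketch of one way to avoid that class of bug in Java. `ZipEntry.getSize()` is documented to return -1 when the size is not known, so the reader below streams each entry to its end and never consults the size. The class and method names and the sample archive are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not the connector from this thread.
final class ZipEntriesSketch
{
    private ZipEntriesSketch() {}

    // Reads every regular entry by streaming until end of entry data,
    // without relying on ZipEntry.getSize(), which may be -1 (unknown).
    static Map<String, String> readAllEntries(byte[] zipBytes)
    {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                // read() returns -1 at the end of the current entry
                while ((read = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                contents.put(entry.getName(), new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    // Builds a small in-memory archive with made-up report content to exercise the reader.
    static byte[] sampleZip()
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("a.xml"));
            zip.write("<class name=\"example.TestA\"/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.xml"));
            zip.write("<class name=\"example.TestB\"/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args)
    {
        System.out.println(readAllEntries(sampleZip()).keySet());
    }
}
```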

Suite 6 total run time is near 1 hour so split it up
Suite 6 total run time is near 1 hour so split it up
Suite 6 total run time is near 1 hour so split it up
Suite 7 total run time is near 1 hour so split it up
Suite 7 total run time is near 1 hour so split it up
Suite 1 total run time is over 1 hour so split it up
Suite 2 total run time is over 1 hour so split it up
@nineinchnick nineinchnick force-pushed the extract-kafka-pt-suite branch from 0facaf3 to ed69894 Compare November 15, 2022 10:57
@nineinchnick
Member Author

Iceberg tests are missing because the job failed. I think I saw it green before, so I rebased and I'm running the CI again.

@nineinchnick
Member Author

I compared it again with https://github.com/trinodb/trino/actions/runs/3470415629. There are no extra tests, and we're missing only the ones that require secrets:

                                         missing                                          
------------------------------------------------------------------------------------------
 io.trino.tests.product.deltalake.TestDatabricksWithGlueMetastoreCleanUp                  
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCheckpointsCompatibility         
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCompatibilityCleanUp             
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCreateTableAsSelectCompatibility 
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksCreateTableCompatibility         
 io.trino.tests.product.deltalake.TestDeltaLakeDatabricksPartitioningCompatibility        
 io.trino.tests.product.deltalake.TestDeltaLakeGcs                                        
 io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility               
 io.trino.tests.product.hive.TestAbfsSyncPartitionMetadata                                
(1 row)

@hashhar hashhar merged commit cc764ed into trinodb:master Nov 16, 2022
@github-actions github-actions bot added this to the 404 milestone Nov 16, 2022
@nineinchnick nineinchnick deleted the extract-kafka-pt-suite branch November 17, 2022 08:12