update_db_packages - Installation failure should be raised as a critical event (which will stop the test) #7160

Closed
avikivity opened this issue Jan 27, 2024 · 3 comments
@avikivity
Member

https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/avi/job/avi-regression-latency-650gb-grow-shrink/8/console

22:37:24  < t:2024-01-27 20:37:18,252 f:cluster.py      l:2118 c:sdcm.cluster_aws     p:WARNING > Node perf-latency-grow-shrink-ubuntu-db-node-42855fcd-1 [54.247.210.237 | 10.4.3.94] (seed: True): All attempts to get ScyllaDB version failed. Looks like there is no ScyllaDB installed.
22:39:21  < t:2024-01-27 20:39:07,508 f:cluster.py      l:2118 c:sdcm.cluster_aws     p:WARNING > Node perf-latency-grow-shrink-ubuntu-db-node-42855fcd-1 [54.247.210.237 | 10.4.3.94] (seed: True): All attempts to get ScyllaDB version failed. Looks like there is no ScyllaDB installed.
22:39:21  < t:2024-01-27 20:39:19,800 f:cluster.py      l:2118 c:sdcm.cluster_aws     p:WARNING > Node perf-latency-grow-shrink-ubuntu-db-node-42855fcd-1 [54.247.210.237 | 10.4.3.94] (seed: True): All attempts to get ScyllaDB version failed. Looks like there is no ScyllaDB installed.

But earlier in the same log it says:

22:33:11  < t:2024-01-27 20:33:06,591 f:es.py           l:35   c:sdcm.es              p:INFO  > BODY: {'test_details': {'sct_git_commit': '698fbf2ed1e49c0ec18871461fa2cb95d5b209d1', 'job_name': 'scylla-staging/avi/avi-regression-latency-650gb-grow-shrink', 'job_url': 'https://jenkins.scylladb.com/job/scylla-staging/job/avi/job/avi-regression-latency-650gb-grow-shrink/8/', 'start_host': 'ip-34-245-175-36', 'test_duration': 3000, 'start_time': 1706387586, 'grafana_snapshots': [], 'grafana_screenshots': [], 'grafana_annotations': [], 'prometheus_data': '', 'test_id': '42855fcd-c9b1-47d0-b971-67032cb7f7b6', 'log_files': {}, 'test_name': 'performance_regression_test.PerformanceRegressionTest.test_latency_mixed_with_nemesis'}, 'setup_details': {'db_type': 'scylla', 'test_duration': 3000, 'ip_ssh_connections': 'private', 'scylla_repo': '', 'manager_version': '3.2', 'manager_scylla_backend_version': '2022', 'scylla_repo_loader': 'https://s3.amazonaws.com/downloads.scylladb.com/rpm/centos/scylla-4.6.repo', 'round_robin': True, 'append_scylla_args': '--blocked-reactor-notify-ms 5 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --abort-on-internal-error 1 --abort-on-ebadf 1', 'append_scylla_setup_args': '', 'db_nodes_shards_selection': 'default', 'scylla_linux_distro': 'ubuntu-focal', 'scylla_linux_distro_loader': 'centos', 'ssh_transport': 'libssh2', 'system_auth_rf': 3, 'monitor_branch': 'branch-4.2', 'space_node_threshold': 644245094, 'cluster_health_check': True, 'add_node_cnt': 1, 'nemesis_class_name': 'GrowShrinkClusterNemesis', 'nemesis_during_prepare': True, 'nemesis_interval': 30, 'nemesis_add_node_cnt': 3, 'nemesis_filter_seeds': True, 'seeds_selector': 'first', 'seeds_num': 1, 'instance_provision': 'spot', 'execute_post_behavior': False, 'post_behavior_db_nodes': 'destroy', 'post_behavior_loader_nodes': 'destroy', 'post_behavior_monitor_nodes': 'keep-on-failure', 'post_behavior_k8s_cluster': 'keep-on-failure', 'cloud_credentials_path': '', 
'use_cloud_manager': False, 'cloud_prom_bearer_token': '', 'backtrace_decoding': False, 'print_kernel_callstack': True, 'update_db_packages': '/tmp/download_from_cloud/17b66becb94d667a7e801b0a42223530/', 'logs_transport': 'syslog-ng', 'rsyslog_imjournal_rate_limit_interval': 60, 'rsyslog_imjournal_rate_limit_burst': 50000, 'store_perf_results': True, 'email_subject_postfix': 'tablets-demo-1', 'collect_logs': False, 'hinted_handoff': 'enabled', 'server_encrypt': False, 'client_encrypt': False, 'scylla_encryption_options': '', 'loader_swap_size': 10240, 'monitor_swap_size': 8192, 'compaction_strategy': 'SizeTieredCompactionStrategy', 'use_preinstalled_scylla': True, 'alternator_enforce_authorization': False, 'alternator_access_key_id': '', 'alternator_secret_access_key': '', 'dynamodb_primarykey_type': 'HASH', 'store_cdclog_reader_stats_in_es': False, 'region_aware_loader': False, 'stop_test_on_stress_failure': True, 'stress_cdc_log_reader_batching_enable': True, 'use_legacy_cluster_init': False, 'internode_encryption': 'all', 'use_mgmt': True, 'manager_prometheus_port': 5090, 'scylla_mgmt_pkg': '', 'skip_download': False, 'authenticator_user': '', 'authenticator_password': '', 'n_test_oracle_db_nodes': 1, 'gemini_seed': 0, 'oracle_scylla_version': '4.6.9', 'append_scylla_args_oracle': '--enable-cache false', 'stress_multiplier': 1, 'stress_multiplier_w': 1, 'stress_multiplier_r': 1, 'stress_multiplier_m': 1, 'keyspace_num': 1, 'cs_user_profiles': [], 'cs_duration': '50m', 'batch_size': 1, 'pre_create_schema': False, 'user_profile_table_count': 1, 'cassandra_stress_population_size': 1000000, 'cassandra_stress_threads': 1000, 'new_scylla_repo': '', 'new_version': '', 'upgrade_node_packages': '', 'test_sst3': False, 'new_introduced_pkgs': '', 'recover_system_tables': False, 'scylla_version': '', 'test_upgrade_from_installed_3_1_0': False, 'target_upgrade_version': '', 'stress_cdclog_reader_cmd': 'cdc-stressor -stream-query-round-duration 30s', 'jepsen_scylla_repo': 
'https://github.com/jepsen-io/scylla.git', 'jepsen_test_cmd': ['test-all -w cas-register --concurrency 10n', 'test-all -w counter --concurrency 10n', 'test-all -w cmap --concurrency 10n', 'test-all -w cset --concurrency 10n', 'test-all -w write-isolation --concurrency 10n', 'test-all -w list-append --concurrency 10n', 'test-all -w wr-register --concurrency 10n'], 'jepsen_test_count': 1, 'jepsen_test_run_policy': 'all', 'max_events_severities': '', 'mgmt_docker_image': '', 'k8s_log_api_calls': False, 'k8s_deploy_monitoring': False, 'k8s_minio_storage_size': '10Gi', 'k8s_n_scylla_pods_per_cluster': 3, 'k8s_loader_run_type': 'dynamic', 'k8s_tenants_num': 1, 'k8s_scylla_utils_docker_image': '', 'scylla_rsyslog_setup': False, 'backup_bucket_region': 'us-east-1', 'events_limit_in_email': 10, 'data_volume_disk_num': 0, 'data_volume_disk_type': 'gp2', 'data_volume_disk_size': 500, 'data_volume_disk_iops': 10000, 'gce_pd_standard_disk_size_db': 0, 'install_mode': 'repo', 'run_db_node_benchmarks': False, 'nemesis_multiply_factor': 6, 'scylla_apt_keys': ['17723034C56D4B19', '5E08FBD8B5D6EC9C', 'D0A112E067426AB2'], 'raid_level': 0, 'bare_loaders': False, 'nemesis_exclude_disabled': True, 'aws_fallback_to_next_availability_zone': False, 'enable_argus': True, 'stress_image': {'ndbench': 'scylladb/hydra-loaders:ndbench-jdk8-20210720', 'ycsb': 'scylladb/hydra-loaders:ycsb-jdk8-20220918', 'nosqlbench': 'scylladb/hydra-loaders:nosqlbench-4.15.49', 'cassandra-stress': '', 'scylla-bench': 'scylladb/hydra-loaders:scylla-bench-v0.1.15', 'gemini': 'scylladb/hydra-loaders:gemini-1.7.8', 'alternator-dns': 'scylladb/hydra-loaders:alternator-dns-0.1', 'cdc-stresser': 'scylladb/hydra-loaders:cdc-stresser-20210630', 'kcl': 'scylladb/hydra-loaders:kcl-jdk8-20210526-ShardSyncStrategyType-PERIODIC', 'harry': 'scylladb/hydra-loaders:cassandra-harry-jdk11-20220816'}, 'service_level_shares': [1000], 'use_hdr_cs_histogram': True, 'stop_on_hw_perf_failure': False, 'custom_es_index': '', 
'run_fullscan': [], 'stress_step_duration': '15m', 'use_placement_group': False, 'spot_max_price': 0.6, 'instance_provision_fallback_on_demand': False, 'region_name': ['eu-west-1'], 'user_credentials_path': '~/.ssh/scylla-qa-ec2', 'instance_type_loader': 'c5.2xlarge', 'instance_type_monitor': 't3.large', 'ami_id_loader': 'ami-042cf1bc21e30ce60', 'ami_id_monitor': 'ami-026e72e4e468afa7b', 'availability_zone': 'a', 'root_disk_size_monitor': 50, 'root_disk_size_db': 30, 'root_disk_size_loader': 30, 'ami_db_scylla_user': 'scyllaadm', 'ami_loader_user': 'centos', 'ami_monitor_user': 'ubuntu', 'aws_instance_profile_name_db': 'qa-scylla-manager-backup-instance-profile', 'ami_id_db_scylla': 'ami-06ed37258d7c8c408', 'ami_id_db_oracle': '', 'backup_bucket_backend': 's3', 'backup_bucket_location': 'manager-backup-tests-us-east-1', 'prepare_write_cmd': ["cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=200 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..162500000", "cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=200 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=162500000..325000000", "cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=200 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=325000000..487500000", "cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=200 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=487500000..650000000"], 'n_db_nodes': 3, 'n_loaders': 4, 'n_monitor_nodes': 1, 'instance_type_db': 'i3en.2xlarge', 'nemesis_sequence_sleep_between_ops': 10, 'user_prefix': 'perf-latency-grow-shrink-ubuntu', 'ami_id_db_scylla_desc': '', 
'use_prepared_loaders': True, 'append_scylla_yaml': 'tablets_initial_scale_factor: 100\nexperimental_features:\n  - tablets\n  - consistent-topology-changes\n', 'config_files': ['test-cases/performance/perf-regression-latency-650gb-grow-shrink.yaml', 'configurations/tablets.yaml'], 'cluster_backend': 'aws', 'test_id': '42855fcd-c9b1-47d0-b971-67032cb7f7b6', 'gce_datacenter': ['us-east1'], 'k8s_scylla_operator_helm_repo': 'https://storage.googleapis.com/scylla-operator-charts/latest', 'k8s_scylla_operator_chart_version': 'latest', 'user_data_format_version': '3', 'ami_tags_db_scylla': [{'branch': 'branch-2024.1', 'build_id': '9', 'scylla_build_sha_id': '05fd1aae7596802712754674280a0e6456758c11', 'scylla-build-sha-id': '0ba0261c79efc8ac49037e881e88b721af703a29', 'scylla_machine_image_version': '2024.1.0~rc4-20240118.e2fe9c1-1', 'Name': 'scylla-enterprise-2024.1.0-rc4-x86_64-2024-01-18T04-22-17', 'build_mode': 'release', 'scylla_jmx_version': '2024.1.0~rc4-0.20240117.0ba0261c79ef-1', 'build-tag': 'jenkins-enterprise-2024.1-promote-release-10', 'scylla_python3_version': '2024.1.0~rc4-0.20240117.0ba0261c79ef-1', 'scylla_tools_version': '2024.1.0~rc4-0.20240117.0ba0261c79ef-1', 'user_data_format_version': '3', 'operating_system': 'ubuntu22.04', 'scylla_version': '2024.1.0~rc4-0.20240117.0ba0261c79ef-1', 'arch': 'x86_64', 'build-id': '10', 'kernel-version': '5.15.0-1051-aws', 'operating-system': 'Ubuntu:22.04', 'scylla-git-commit': '0ba0261c79efc8ac49037e881e88b721af703a29', 'build_tag': 'jenkins-enterprise-2024.1-ami-9', 'owner_id': '158855661827'}], 'packages_updated': True, 'cpu_platform': 'UNKNOWN', 'db_cluster_node_details': {}, 'sysctl_output': []}, 'versions': {'scylla-jmx': {'version': '2024.1.0~rc4', 'date': '20240117', 'commit_id': '0ba0261c79ef', 'build_id': ''}, 'scylla-server': {'version': '2024.2.0~dev', 'date': '20240127', 'commit_id': '5f89014c0f87', 'build_id': '', 'build_mode': None}, 'scylla-tools': {'version': '2024.1.0~rc4', 'date': '20240117', 
'commit_id': '0ba0261c79ef', 'build_id': ''}}, 'results': {'throughput': {}, 'latency_read_99': {}, 'latency_write_99': {}}, 'nemesis': {}, 'errors': [], 'coredumps': {}}

So obviously it knows something about the version.
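
One thing visible in the BODY above is that the reported package versions are not all the same. A minimal sketch of a check that would flag this, using only the values from the `versions` key in that log line; the function name `find_version_mismatches` is hypothetical and not part of SCT:

```python
# Version info copied from the 'versions' key of the BODY logged above.
versions = {
    "scylla-jmx": {"version": "2024.1.0~rc4", "date": "20240117", "commit_id": "0ba0261c79ef"},
    "scylla-server": {"version": "2024.2.0~dev", "date": "20240127", "commit_id": "5f89014c0f87"},
    "scylla-tools": {"version": "2024.1.0~rc4", "date": "20240117", "commit_id": "0ba0261c79ef"},
}


def find_version_mismatches(versions):
    """Group package names by reported version.

    Returns an empty dict when every package reports the same version,
    otherwise a dict mapping each version to the packages reporting it.
    (Hypothetical helper, for illustration only.)
    """
    by_version = {}
    for package, info in versions.items():
        by_version.setdefault(info["version"], []).append(package)
    return by_version if len(by_version) > 1 else {}


mismatches = find_version_mismatches(versions)
for version, packages in sorted(mismatches.items()):
    print(f"{version}: {', '.join(sorted(packages))}")
```

A mismatch like this does not prove anything by itself, but it may be a useful hint that a package update did not cover every package.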

@avikivity
Member Author

I found something: the installation failed, but nothing in the logs reported it.

@avikivity removed their assignment on Jan 28, 2024
@avikivity
Member Author

The installation failed because I forgot to include packages such as scylla-tools-java.

The error messages from the installation should be shown, and the run should terminate immediately after the failed installation.

@roydahan changed the title from "can't find installed scylla even when it is installed" to "update_db_packages - Installation failure should be raised as a critical event (which will stop the test)" on Mar 4, 2024
@fruch assigned dimakr and unassigned soyacz on May 6, 2024
@fruch
Contributor

fruch commented May 6, 2024

We should raise an event if this step fails.
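
A rough sketch of the behavior requested in this thread: show the installer's error output and stop right after a failed installation. This is not SCT code. `CriticalInstallError` and `run_install_step` are hypothetical names standing in for whatever critical event mechanism the framework provides, and the failing command below only simulates an installer.

```python
import subprocess
import sys


class CriticalInstallError(RuntimeError):
    """Hypothetical stand-in for a critical event that stops the test."""


def run_install_step(cmd):
    """Run an installation command and fail loudly if it exits non-zero.

    The captured stdout/stderr are included in the exception so the
    installer's own error messages end up in the test log.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise CriticalInstallError(
            f"command {cmd!r} exited with {result.returncode}\n"
            f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"
        )
    return result.stdout


# Simulated failing installer: writes to stderr and exits with a non-zero code.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('missing dependency\\n'); sys.exit(100)",
]

try:
    run_install_step(failing_cmd)
except CriticalInstallError as exc:
    # In a real run this is where the test would be aborted.
    print(f"CRITICAL: {exc}")
```

Raising an exception instead of logging a warning means later steps, such as the repeated version checks in the log above, never run against a node with a broken installation.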
