Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ability to monitor running/canceled transactions #210

Closed
miabbott opened this issue Feb 1, 2016 · 6 comments
Closed

Improve ability to monitor running/canceled transactions #210

miabbott opened this issue Feb 1, 2016 · 6 comments

Comments

@miabbott
Copy link
Member

miabbott commented Feb 1, 2016

With the introduction of daemon mode, it is possible to start a transaction like rpm-ostree deploy and then cancel out of the running transaction with the likes of ctrl-c. This can leave the system in a state where future transactions run into the following error:

error: System transaction in progress

Unfortunately, there is no way to observe the state of the transaction in progress and the user is forced to repeatedly try the future transaction until the daemon is ready again.

@mbarnes
Copy link
Contributor

mbarnes commented Feb 1, 2016

For starters, a different error message for cancellation cleanup would be helpful.

error: Cleaning up a cancelled system transaction, please try again

If we can make whatever the daemon is getting stuck on more responsive to cancellation then maybe we can sidestep the lack of status information in this state.

@miabbott
Copy link
Member Author

miabbott commented Feb 4, 2016

Debugging a hang with rpm-ostree deploy with @mbarnes, I tried the following:

  • used subman to register to Red Hat production (remote points to cdn.redhat.com)
  • env OSTREE_DEBUG_HTTP=1 /usr/libexec/rpm-ostreed
  • rpm-ostree deploy 7.2.1
  • fire up gdb from rhel-tools container

Stack trace after attaching to pidof rpm-ostreed

(gdb) thread apply all backtrace

Thread 6 (Thread 0x7f95cb38d700 (LWP 2166)):
#0  0x00007f95d2609bcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f95d2d5aa74 in g_main_context_poll (priority=2147483647, n_fds=2, fds=0x7f95c40008e0, 
    timeout=-1, context=0x86a280) at gmain.c:4074
#2  g_main_context_iterate (context=context@entry=0x86a280, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3774
#3  0x00007f95d2d5ab9c in g_main_context_iteration (context=0x86a280, may_block=may_block@entry=1)
    at gmain.c:3840
#4  0x00007f95d2d5abd9 in glib_worker_main (data=<optimized out>) at gmain.c:5587
#5  0x00007f95d2d804e5 in g_thread_proxy (data=0x86b000) at gthread.c:764
#6  0x00007f95d28e6dc5 in start_thread (arg=0x7f95cb38d700) at pthread_create.c:308
#7  0x00007f95d261421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f95ca38b700 (LWP 2168)):
#0  0x00007f95d2609bcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f95d2d5aa74 in g_main_context_poll (priority=2147483647, n_fds=3, fds=0x7f95c00010c0, 
    timeout=-1, context=0x7f95bc00e4a0) at gmain.c:4074
#2  g_main_context_iterate (context=0x7f95bc00e4a0, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3774
#3  0x00007f95d2d5adba in g_main_loop_run (loop=0x7f95bc00e430) at gmain.c:3973
#4  0x00007f95d43162f6 in gdbus_shared_thread_func (user_data=0x7f95bc00e470) at gdbusprivate.c:273
#5  0x00007f95d2d804e5 in g_thread_proxy (data=0x86b6d0) at gthread.c:764
#6  0x00007f95d28e6dc5 in start_thread (arg=0x7f95ca38b700) at pthread_create.c:308
#7  0x00007f95d261421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f95c9255700 (LWP 2192)):
#0  0x00007f95d2609bcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f95d2d5aa74 in g_main_context_poll (priority=2147483647, n_fds=1, fds=0x7f95b4008600, 
    timeout=-1, context=0x7f95b4007d60) at gmain.c:4074
#2  g_main_context_iterate (context=context@entry=0x7f95b4007d60, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3774
#3  0x00007f95d2d5ab9c in g_main_context_iteration (context=context@entry=0x7f95b4007d60, 
    may_block=may_block@entry=1) at gmain.c:3840
#4  0x00007f95d480aaa5 in _ostree_fetcher_request_uri_to_membuf (fetcher=0x8a6f80, uri=0x896f80, 
    uri@entry=0x7f95c9254840, add_nul=add_nul@entry=1, allow_noent=allow_noent@entry=0, 
    out_contents=out_contents@entry=0x7f95c9254698, max_size=max_size@entry=10485760, 
    cancellable=0x89f850, cancellable@entry=0x0, error=error@entry=0x7f95c9254c90)
    at src/libostree/ostree-fetcher.c:1129
#5  0x00007f95d480bec4 in fetch_uri_contents_membuf_sync (pull_data=0x7f95c9254840, 
    pull_data=0x7f95c9254840, error=<optimized out>, cancellable=<optimized out>, 
    out_contents=0x7f95c9254698, allow_noent=0, add_nul=1, uri=0x7f95c9254840)
    at src/libostree/ostree-repo-pull.c:357
#6  fetch_uri_contents_utf8_sync (pull_data=pull_data@entry=0x7f95c9254840, uri=uri@entry=0x896f80, 
    out_contents=out_contents@entry=0x7f95c92547f0, cancellable=cancellable@entry=0x89f850, 
    error=error@entry=0x7f95c9254c90) at src/libostree/ostree-repo-pull.c:381
#7  0x00007f95d480e6e5 in load_remote_repo_config (error=0x7f95c9254c90, cancellable=0x89f850, 
    out_keyfile=<synthetic pointer>, pull_data=0x7f95c9254840) at src/libostree/ostree-repo-pull.c:1368
#8  ostree_repo_pull_with_options (self=self@entry=0x883190, 
    remote_name_or_baseurl=0x7f95b4005240 "rhel-atomic-host-ostree", options=<optimized out>, 
    progress=progress@entry=0x7f95b4004800, cancellable=cancellable@entry=0x89f850, 
    error=error@entry=0x7f95c9254c90) at src/libostree/ostree-repo-pull.c:1945
#9  0x000000000040b2a8 in rpmostreed_repo_pull_ancestry (repo=0x883190, 
    refspec=refspec@entry=0x7f95b4005890 "rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard", 
---Type <return> to continue, or q <return> to quit---
    visitor=visitor@entry=0x40a930 <version_visitor>, visitor_data=visitor_data@entry=0x7f95c9254ba0, 
    progress=progress@entry=0x7f95b4004800, cancellable=cancellable@entry=0x89f850, 
    error=error@entry=0x7f95c9254c90) at src/daemon/rpmostreed-utils.c:295
#10 0x000000000040b584 in rpmostreed_repo_lookup_version (repo=<optimized out>, 
    refspec=refspec@entry=0x7f95b4005890 "rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard", 
    version=0x7f95b4003b90 "7.2.1", progress=progress@entry=0x7f95b4004800, 
    cancellable=cancellable@entry=0x89f850, out_checksum=out_checksum@entry=0x7f95c9254c38, 
    error=error@entry=0x7f95c9254c90) at src/daemon/rpmostreed-utils.c:401
#11 0x000000000041e041 in deploy_transaction_execute (transaction=0x8781a0, cancellable=0x89f850, 
    error=0x7f95c9254c90) at src/daemon/rpmostreed-transaction-types.c:1057
#12 0x000000000040edb4 in transaction_execute_thread (task=0x873ac0, source_object=0x8781a0, 
    task_data=<optimized out>, cancellable=0x89f850) at src/daemon/rpmostreed-transaction.c:282
#13 0x00007f95d42bd325 in g_task_thread_pool_thread (thread_data=0x873ac0, pool_data=<optimized out>)
    at gtask.c:1215
#14 0x00007f95d2d80e6c in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307
#15 0x00007f95d2d804e5 in g_thread_proxy (data=0x86b590) at gthread.c:764
#16 0x00007f95d28e6dc5 in start_thread (arg=0x7f95c9255700) at pthread_create.c:308
#17 0x00007f95d261421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f95c8a54700 (LWP 2193)):
#0  0x00007f95d2609bcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f95d2d5aa74 in g_main_context_poll (priority=2147483647, n_fds=1, fds=0x7f95b8006d40, 
    timeout=-1, context=0x7f95b4005bd0) at gmain.c:4074
#2  g_main_context_iterate (context=0x7f95b4005bd0, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3774
#3  0x00007f95d2d5adba in g_main_loop_run (loop=0x7f95b4005740) at gmain.c:3973
#4  0x00007f95d4809d1e in ostree_fetcher_session_thread (data=0x890cf0)
    at src/libostree/ostree-fetcher.c:458
#5  0x00007f95d2d804e5 in g_thread_proxy (data=0x7f95b40048a0) at gthread.c:764
#6  0x00007f95d28e6dc5 in start_thread (arg=0x7f95c8a54700) at pthread_create.c:308
#7  0x00007f95d261421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f95b2f30700 (LWP 2194)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f95d2d9dc7c in g_cond_wait (cond=cond@entry=0x7f95ac00e788, mutex=mutex@entry=0x7f95ac00e740)
    at gthread-posix.c:1390
#2  0x00007f95d42c4d3b in invoke_closure_wait_and_free (closure=0x7f95ac00e740, error=0x883668)
    at gtlsinteraction.c:170
#3  0x00007f95c8046c55 in g_tls_connection_gnutls_request_certificate (self=self@entry=0x883790, 
    error=0x883668) at gtlsconnection-gnutls.c:1780
#4  0x00007f95c8042f8f in g_tls_client_connection_gnutls_retrieve_function (session=<optimized out>, 
    req_ca_rdn=<optimized out>, nreqs=<optimized out>, pk_algos=<optimized out>, 
    pk_algos_length=<optimized out>, st=0x7f95b2f2f640) at gtlsclientconnection-gnutls.c:265
#5  0x00007f95b3b3db96 in call_get_cert_callback (session=session@entry=0x7f95b82e8990, 
    issuers_dn=issuers_dn@entry=0x7f95ac004860, issuers_dn_length=issuers_dn_length@entry=3, 
    pk_algos=pk_algos@entry=0x7f95b2f2f740, pk_algos_length=pk_algos_length@entry=3) at cert.c:483
#6  0x00007f95b3b3e80a in select_client_cert (pk_algos_length=3, pk_algos=0x7f95b2f2f740, 
    _data_size=<optimized out>, _data=<optimized out>, session=0x7f95b82e8990) at cert.c:694
#7  _gnutls_proc_cert_cert_req (session=0x7f95b82e8990, data=<optimized out>, data_size=<optimized out>)
    at cert.c:1489
#8  0x00007f95b3abea22 in _gnutls_recv_server_crt_request (session=session@entry=0x7f95b82e8990)
    at gnutls_kx.c:487
#9  0x00007f95b3aba28f in handshake_client (session=0x7f95b82e8990) at gnutls_handshake.c:2757
#10 gnutls_handshake (session=0x7f95b82e8990) at gnutls_handshake.c:2549
---Type <return> to continue, or q <return> to quit---
#11 0x00007f95c804527b in handshake_thread (task=0x7f95b807add0, object=object@entry=0x883790, 
    task_data=<optimized out>, cancellable=<optimized out>) at gtlsconnection-gnutls.c:1226
#12 0x00007f95c80455b9 in async_handshake_thread (task=<optimized out>, object=0x883790, 
    task_data=<optimized out>, cancellable=<optimized out>) at gtlsconnection-gnutls.c:1414
#13 0x00007f95d42bd325 in g_task_thread_pool_thread (thread_data=0x7f95b807add0, 
    pool_data=<optimized out>) at gtask.c:1215
#14 0x00007f95d2d80e6c in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307
#15 0x00007f95d2d804e5 in g_thread_proxy (data=0x7f95b4004f70) at gthread.c:764
#16 0x00007f95d28e6dc5 in start_thread (arg=0x7f95b2f30700) at pthread_create.c:308
#17 0x00007f95d261421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f95d5275940 (LWP 2165)):
#0  0x00007f95d2609bcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f95d2d5aa74 in g_main_context_poll (priority=2147483647, n_fds=3, fds=0x871540, timeout=-1, 
    context=0x86a120) at gmain.c:4074
#2  g_main_context_iterate (context=0x86a120, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3774
#3  0x00007f95d2d5adba in g_main_loop_run (loop=0x86a050) at gmain.c:3973
#4  0x00000000004097f8 in main (argc=1, argv=0x7ffd6de8c698) at src/daemon/main.c:382

@mbarnes
Copy link
Contributor

mbarnes commented Feb 9, 2016

This particular hang was fixed by ostreedev/ostree#181, but I think it's still worth addressing Micah's more general point with canceled transactions.

@miabbott miabbott changed the title Improve ability to monitor running transactions Improve ability to monitor running/cancelled transactions Feb 9, 2016
@miabbott
Copy link
Member Author

miabbott commented Feb 9, 2016

I updated the title to be a bit more inclusive.

I think it would be great if the client could get a status of the transaction(s) on the system.

For example, user A starts rpm-ostree upgrade and user B could use something like rpm-ostree transaction status or rpm-ostree transaction list. The output to user B would look something like:

$ rpm-ostree transaction status
Transaction ID: b1012093
Operation:  rpm-ostree upgrade
Status:  Downloading packages 67%

Or in the case of a canceled transaction like rpm-ostree deploy 7.2

$ rpm-ostree transaction status
Transaction ID: 14afa3bf
Operation:  rpm-ostree deploy 7.2
Status:  Canceled; waiting for for cleanup

And the case of no transactions:

$ rpm-ostree transaction status
No running transactions

Not sure if any or all of this data is available or the feasibility of generating such output, but it would improve some of the porcelain.

@miabbott miabbott changed the title Improve ability to monitor running/cancelled transactions Improve ability to monitor running/canceled transactions Feb 9, 2016
@mbarnes
Copy link
Contributor

mbarnes commented Feb 9, 2016

I like that. The basic mechanics are already implemented: if you issue the exact same command from two different consoles (say, rpm-ostree upgrade), the daemon will transmit synchronized progress messages to both consoles.

Don't think it would be too hard to let any client listen in on the messages.

@jlebon
Copy link
Member

jlebon commented Nov 26, 2020

Fixed by #1019.

@jlebon jlebon closed this as completed Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants