Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover null onesignal ID crashes for Operations #2157

Merged
merged 3 commits into from
Aug 5, 2024

Conversation

nan-li
Copy link
Contributor

@nan-li nan-li commented Jul 27, 2024

Description

One Line Summary

Don't uncache invalid Operations that are missing onesignalId in order to prevent repetitive crashes when they go on to be processed.

Details

Motivation

Crash report showing high-volume repetitive crashes happening for a low volume of users.

  • Crash report shows onesignalId is null for an UpdateSubscriptionOperation
  • Recovers those users
  • Does not prevent the first crash. This is rare and only reported by one app, so we don't want to hide any issues if we start receiving increased reports of this crash. This fix will prevent the repetitive nature of the crash and allow recovery.
  • Note: this repetitive crash should not typically happen as this Operation should not be instantiated in the first place due to null checks, but this appears to have happened for the reporting app due to more aggressive minification that removes all Kotlin enforced null checks
Fatal Exception: java.lang.NullPointerException: Attempt to invoke virtual method 'boolean java.lang.String.startsWith(java.lang.String)' on a null object reference
       at kotlin.text.StringsKt__StringsJVMKt.startsWith(StringsJVM.kt:421)
       at kotlin.text.StringsKt__StringsJVMKt.startsWith$default(StringsJVM.kt:419)
       at com.onesignal.common.IDManager.isLocalId(IDManager.kt:28)
       at com.onesignal.user.internal.operations.UpdateSubscriptionOperation.getCanStartExecute(UpdateSubscriptionOperation.kt:87)
        ...

Investigation

  • Not able to reproduce a scenario where UpdateSubscriptionOperation would get created without a onesignalId
  • This could mean a subscription property was changed before initWithContext finishes setting the local onesignalId
  • This could also mean the identity model's onesignalId is set to from nonnull to null at the moment this operation is created
  • This could also be the identity model in the store is overwritten with an identity model that is missing onesignalId or by a blank Identity Model
  • I was not able to recreate any of the above scenarios naturally

Scope

Basically drop any Operation that is invalid and would not be sendable.

Testing

Unit testing

  • Add new unit test that uncaches valid and invalid Operations

Manual testing

Device: Emulator Pixel 3 API 33

  • Limited ability to test manually due to inability to reproduce
  • Attempted multiple paths of reproduction naturally
  • Force reproduction by instantiating blank UpdateSubscriptionOperation directly

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
  • I have reviewed this PR myself, ensuring it meets each checklist item

This change is Reviewable

@nan-li nan-li changed the title [wip] Fix null onesignal ID crash in Update Subscription Operation [wip] Recover null onesignal ID crash in Update Subscription Operation Jul 27, 2024
@nan-li nan-li force-pushed the fix/null_onesignalid_crash branch from f18de03 to 8287a28 Compare July 31, 2024 02:51
* When uncaching an Operation that is not login-related, check for the existence of onesignalId.
* Note: The login-related Operations should still have onesignalId but they can be sent without it. It is also less likely for onesignalId to be null based on when these Operations are created.
* This is to remedy a rare case that an Operation could be missing the onesignalId, which would continuously cause crashes when the Operation is processed. The particular reported Operation is an UpdateSubscriptionOperation.
@nan-li nan-li force-pushed the fix/null_onesignalid_crash branch from 8287a28 to ba88ec2 Compare July 31, 2024 02:59
* Switch Operation type in an existing test now that non-login Operations are expected to have OSID
* The type of Operation used in the test is irrelevant
@nan-li nan-li force-pushed the fix/null_onesignalid_crash branch from 3fb7a38 to 9306f90 Compare July 31, 2024 04:46
@nan-li nan-li changed the title [wip] Recover null onesignal ID crash in Update Subscription Operation [wip] Recover null onesignal ID crashes for Operations Jul 31, 2024
* Add new test file `OperationModelStoreTests`
* Add test called "does not load invalid cached operations" that uncaches valid and invalid operations based on the new and existing checks in the OperationModelStore uncaching logic.
@nan-li nan-li force-pushed the fix/null_onesignalid_crash branch from f9e998b to f7b0e5f Compare July 31, 2024 20:46
@nan-li nan-li changed the title [wip] Recover null onesignal ID crashes for Operations Recover null onesignal ID crashes for Operations Jul 31, 2024
@nan-li
Copy link
Contributor Author

nan-li commented Jul 31, 2024

Thoughts on letting the first crash through when the Operation is first processed in the OpRepo?
Noting again that under normal circumstances, the Operation would not even be instantiated in the first place.

Copy link
Contributor

@jinliu9508 jinliu9508 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and the logic looks good to me and nice job on the testing unit!

I just want to confirm that the 'shouldRePersist' logic under ModelStore.load() will not persist the invalid operation in cases where the operation model store contains only invalid operation(s). If the bad data get to persist and we let the first crash through, I think the bad data won't get drop out of the persistent store and will cause another crash in next launch.

@nan-li
Copy link
Contributor Author

nan-li commented Aug 2, 2024

Code and the logic looks good to me and nice job on the testing unit!

I just want to confirm that the 'shouldRePersist' logic under ModelStore.load() will not persist the invalid operation in cases where the operation model store contains only invalid operation(s). If the bad data get to persist and we let the first crash through, I think the bad data won't get drop out of the persistent store and will cause another crash in next launch.

The bad operation won't be loaded and created so when the operations are all persisted again, it wont be there.

@nan-li nan-li merged commit f7a7a5d into main Aug 5, 2024
2 checks passed
@nan-li nan-li deleted the fix/null_onesignalid_crash branch August 5, 2024 18:26
@jinliu9508 jinliu9508 mentioned this pull request Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants