
An event was sent twice down /sync (and down application service txns?) consistently across clients; ghost notifications #14539

MichaelSasser opened this issue Nov 23, 2022 · 10 comments
Labels: A-Sync (defects related to /sync), O-Occasional (affects or can be seen by some users regularly or most users rarely), S-Minor (blocks non-critical functionality, workarounds exist), T-Defect (bugs, crashes, hangs, security vulnerabilities, or other reported issues)

Comments

@MichaelSasser (Contributor) commented Nov 23, 2022

Description

Ghost notifications

Since synapse v1.70 or v1.71, I've noticed randomly occurring "ghost notifications" on element-web, element-desktop, element-android, and hydrogen. They usually come in small, fast bursts of two to four notifications (I hear an audible sound telling me about new events in a room). When I check for new events, there aren't any; nothing has changed. I have the options Developer mode and Show hidden events in timeline enabled in element, so I would assume I'd see any changes to a room.

Event duplication

Around the same time, I've noticed some events, some from weeks or months ago, being randomly duplicated and pushed as new ones. This happens independently of the "ghost notification" issue. At first, I only noticed this in one particular room, where the same messages were duplicated a few times over a couple of days (example: see "Image 1" below), possibly leading to an unstable room (see "Possible implications"). Today, the same thing happened in our mjolnir control room, where mjolnir (configured as a client and not as an appservice) executed the duplicated command (example: see "Image 2" below). I'm actually not certain my homeserver is the one the issues originate from, as the duplicated events are visible on other instances too, including matrix.org and maybe others as well.

Image 1: Duplicated messages in the python room

[Image: event_duplicated_python_room]

Image 2: Mjolnir bot reacts to duplicated events

The "failed to leave" error from mjolnir is unrelated (and already fixed). The duplication errors happened before the room, it was trying to leave, had issues, too, and might have led to the issue, where the room was becoming unstable on my homeserver.
event_duplicated_mjolnir

Possible implications

After this happened a few times in the Python room #python:matrix.org, the room became unstable on my end, and I was unable to send events to that room any more because of missing state events. I also wasn't able to remove the room with the admin API, so I had to manually remove the room from the database to regain access:

BEGIN; SET CONSTRAINTS ALL DEFERRED;

 DELETE FROM appservice_room_list WHERE room_id = '!roomid:matrix.org';
 DELETE FROM batch_events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM current_state_events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM blocked_rooms WHERE room_id = '!roomid:matrix.org';
 DELETE FROM current_state_delta_stream WHERE room_id = '!roomid:matrix.org';
 DELETE FROM destination_rooms WHERE room_id = '!roomid:matrix.org';
 DELETE FROM device_lists_changes_in_room WHERE room_id = '!roomid:matrix.org';
 DELETE FROM e2e_room_keys WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_auth_chain_to_calculate WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_auth WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_backward_extremities WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_failed_pull_attempts WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_json WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_forward_extremities WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_edges WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_labels WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_push_actions WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_push_summary WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_reports WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_search WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_txn_id WHERE room_id = '!roomid:matrix.org';
 DELETE FROM event_edges WHERE event_id IN (SELECT event_id FROM events WHERE room_id='!roomid:matrix.org');
 DELETE FROM event_auth_chains WHERE event_id IN (SELECT event_id FROM events WHERE room_id = '!roomid:matrix.org');
 DELETE FROM event_relations WHERE event_id IN (SELECT event_id FROM events WHERE room_id = '!roomid:matrix.org');
 DELETE FROM events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM federation_inbound_events_staging WHERE room_id = '!roomid:matrix.org';
 DELETE FROM insertion_event_edges WHERE room_id = '!roomid:matrix.org';
 DELETE FROM insertion_event_extremities WHERE room_id = '!roomid:matrix.org';
 DELETE FROM insertion_events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM local_current_membership WHERE room_id = '!roomid:matrix.org';
 DELETE FROM partial_state_events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM partial_state_rooms_servers WHERE room_id = '!roomid:matrix.org';
 DELETE FROM pusher_throttle WHERE room_id = '!roomid:matrix.org';
 DELETE FROM receipts_linearized WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_account_data WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_aliases WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_depth WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_memberships WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_retention WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_stats_current WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_stats_earliest_token WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_stats_state WHERE room_id = '!roomid:matrix.org';
 DELETE FROM state_groups_state WHERE room_id = '!roomid:matrix.org';
 DELETE FROM rooms WHERE room_id = '!roomid:matrix.org';
 DELETE FROM user_directory WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_tags WHERE room_id = '!roomid:matrix.org';
 DELETE FROM room_tags_revisions WHERE room_id = '!roomid:matrix.org';
 DELETE FROM state_groups WHERE room_id = '!roomid:matrix.org';
 DELETE FROM state_events WHERE room_id = '!roomid:matrix.org';
 DELETE FROM users_in_public_rooms WHERE room_id = '!roomid:matrix.org';
 DELETE FROM users_who_share_private_rooms WHERE room_id = '!roomid:matrix.org';
 DELETE FROM stream_ordering_to_exterm WHERE room_id = '!roomid:matrix.org';
 DELETE FROM receipts_graph WHERE room_id = '!roomid:matrix.org';
 DELETE FROM threads WHERE room_id = '!roomid:matrix.org';
 DELETE FROM partial_state_rooms WHERE room_id = '!roomid:matrix.org';
COMMIT;

REINDEX (VERBOSE) DATABASE synapse;
REINDEX (VERBOSE) SYSTEM synapse;
VACUUM FULL VERBOSE ANALYZE;
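
For anyone ending up in a similar situation: before running a manual purge like the one above, it can help to first enumerate every table that carries a room_id column, so none are missed. A minimal sketch, assuming PostgreSQL and Synapse's default public schema; note that tables such as event_relations and event_auth_chains key on event_id instead and need the subquery form used above:

-- List all tables that reference rooms directly via a room_id column
SELECT table_name
FROM information_schema.columns
WHERE column_name = 'room_id'
  AND table_schema = 'public'
ORDER BY table_name;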

(Because I had to roll back a few times while trying to figure out how to remove the room, I accidentally deleted the logs as well.)

Steps to reproduce

  • Do nothing
  • Duplications / ghost events occur

Homeserver

matrix.org; michaelsasser.org

Synapse Version

matrixdotorg/synapse; tags: v1.71.0, v1.72.0, possibly v1.70.0

Installation Method

Docker (matrixdotorg/synapse)

Database

postgres:15.0-alpine on the same machine. It was recently restored to upgrade to v15 (I'm not sure if the issue happened before that) and restored again while trying to remove the room; the issues happened before the restores.

Workers

Multiple workers

Platform

Environment: CX31 Hetzner Cloud
Using the Ansible playbook: spantaleev/matrix-docker-ansible-deploy
Debian: Linux matrix 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux
Docker: Docker version 20.10.21, build baeda1f
Containerd: containerd containerd.io 1.6.10 770bd0108c32f3fb5c73ae1264f7e503fe7b2661

Configuration

Playbook

Using the Ansible playbook spantaleev/matrix-docker-ansible-deploy; the configuration spans multiple files (result below):

Playbook/config
---
# hookshot
matrix_hookshot_enabled: true  # default
matrix_hookshot_gitlab_enabled: true
matrix_hookshot_permissions:
  # Allow all users to send commands to existing services
  - actor: "*"
    services:
      - service: gitlab
        level: commands
  # Allow users on this domain to log in to gitlab.
  - actor: "!foo:michaelsasser.org"
    services:
      - service: gitlab
        level: login
  # Allow this specific user to do any action
  - actor: "@foo:michaelsasser.org"
    services:
      - service: "*"
        level: admin
---
# Mjolnir
matrix_bot_mjolnir_enabled: true
matrix_bot_mjolnir_access_token: "foo"
matrix_bot_mjolnir_management_room: "!foo:michaelsasser.org"
matrix_bot_mjolnir_configuration_extension_yaml: |
  backgroundDelayMS: 50
  fasterMembershipChecks: true
  recordIgnoredInvites: true
  commands:
    additionalPrefixes:
        - mjolnir
        - bot
  automaticallyRedactForReasons:
    - spam
    - advertising
    - advertisement
    - ad
    - scam
  protections:
    wordlist:
      words:
        - foo
        - more foo
      minutesBeforeTrusting: 1234  # 1234 is not the actual time
---
# ssl

matrix_ssl_lets_encrypt_support_email: foo@MichaelSasser.org
---
# element

# set integration manager
matrix_client_element_integrations_ui_url: "https://scalar.vector.im/"
matrix_client_element_integrations_rest_url: "https://scalar.vector.im/api"
matrix_client_element_integrations_widgets_urls:
  ["https://scalar.vector.im/api"]

# set server room directory
matrix_client_element_roomdir_servers:
  - michaelsasser.org
  - matrix.org

# Extra themes
matrix_client_element_themes_enabled: false

# Default Template
matrix_client_element_default_theme: "dark"

matrix_client_element_configuration_extension_json: |
  {
     "sso_redirect_options": {
        "immediate": true
     }
  }

---
# hydrogen
#
matrix_client_hydrogen_enabled: true
---

matrix_grafana_enabled: true
matrix_grafana_anonymous_access: false
matrix_grafana_default_admin_user: foo # Do not change
matrix_grafana_default_admin_password: foo

matrix_grafana_container_extra_arguments:
  - -e GF_AUTH_GENERIC_OAUTH_NAME="Single Sign-On"
  - -e GF_AUTH_GENERIC_OAUTH_ENABLED="true"
  - -e GF_AUTH_GENERIC_OAUTH_CLIENT_ID="matrix-grafana"
  - -e GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET="foo"
  - -e GF_AUTH_GENERIC_OAUTH_SCOPES="openid"
  - -e GF_AUTH_GENERIC_OAUTH_AUTH_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/auth"
  - -e GF_AUTH_GENERIC_OAUTH_TOKEN_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/token"
  - -e GF_AUTH_GENERIC_OAUTH_API_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/userinfo"
  - -e GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP="true"
  - -e GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH="contains(roles[*], 'Admin') && 'Admin' || contains(roles[*], 'Editor') && 'Editor' || contains(roles[*], 'Viewer') && 'Viewer'|| ''"
  - -e GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_STRICT="true"
  # Disable Login Form
  - -e GF_AUTH_DISABLE_LOGIN_FORM="true"
  - -e GF_AUTH_OAUTH_AUTO_LOGIN="true"
  - -e GF_AUTH_BASIC_ENABLED="false"
---
# email

matrix_mailer_sender_address: "foo@MichaelSasser.org"
matrix_mailer_relay_use: true
matrix_mailer_relay_host_name: "foo.michaelsasser.org"
matrix_mailer_relay_host_port: 587
matrix_mailer_relay_auth: true
matrix_mailer_relay_auth_username: "foo@michaelsasser.org"
matrix_mailer_relay_auth_password: "foo"
---

matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time: 1200
matrix_synapse_rust_synapse_compress_state_compress_room_time: 68400
matrix_synapse_rust_synapse_compress_state_psql_import_time: 68400

# Base Domain Serving
matrix_nginx_proxy_base_domain_serving_enabled: true

matrix_nginx_proxy_connect_timeout: 120
matrix_nginx_proxy_send_timeout: 120
matrix_nginx_proxy_read_timeout: 120
matrix_nginx_send_timeout: 120

# Fix issue where the admin api is not reacheable anymore
matrix_nginx_proxy_proxy_matrix_client_redirect_root_uri_to_domain: ""
matrix_nginx_proxy_proxy_matrix_client_api_forwarded_location_synapse_admin_api_enabled: true
---

matrix_postgres_connection_password: "foo"
matrix_postgres_container_postgres_bind_port: 5432

matrix_postgres_process_extra_arguments:
  [
    "-c max_connections=120",
    "-c shared_buffers=512MB",
    "-c effective_cache_size=1536MB",
    "-c maintenance_work_mem=128MB",
    "-c checkpoint_completion_target=0.9",
    "-c wal_buffers=16MB",
    "-c default_statistics_target=100",
    "-c random_page_cost=1.1",
    "-c effective_io_concurrency=200",
    "-c work_mem=2184kB",
    "-c min_wal_size=1GB",
    "-c max_wal_size=4GB",
    "-c max_worker_processes=2",
    "-c max_parallel_workers_per_gather=1",
    "-c max_parallel_workers=2",
    "-c max_parallel_maintenance_workers=1"
  ]

# Postgres Backups

matrix_postgres_backup_enabled: true
matrix_postgres_backup_keep_months: 12
---
# grafana prometheus

matrix_prometheus_enabled: true
matrix_prometheus_node_exporter_enabled: true
matrix_prometheus_postgres_exporter_enabled: false
matrix_prometheus_postgres_exporter_database_username: "matrix_prometheus_postgres_exporter"
matrix_prometheus_postgres_exporter_database_password: "foo"


matrix_prometheus_node_exporter_process_extra_arguments:
  - "--collector.disable-defaults"
  - "--collector.cpu"
  - "--collector.filesystem"
  - "--collector.os"
  - "--collector.stat"
  - "--collector.meminfo"
  - "--collector.loadavg"
  - "--collector.time"
  - "--collector.uname"  # Mandatory, otherwide the data vanishes from grafana
---
# synapse

# Federation
matrix_synapse_federation_enabled: true
matrix_synapse_allow_public_rooms_over_federation: true

# Controls whether accessing the server's public rooms directory can be done
# without authentication.
# For private servers, you most likely wish to require authentication,
# unless you know what list of rooms you're publishing to the world and
# explicitly want to do it.
matrix_synapse_allow_public_rooms_without_auth: true

# Matrix cache size (ram) default: 0.5
matrix_synapse_caches_global_factor: 1.5 # 0.5, is default

matrix_synapse_event_cache_size: "100K"

# OpenID (Keycloak) (rest see below in matrix_synapse_configuration_extension_yaml)
matrix_nginx_proxy_proxy_matrix_client_api_forwarded_location_synapse_oidc_api_enabled: true

# Extra Synapse homeserver.yaml config
matrix_synapse_configuration_extension_yaml: |
  suppress_key_server_warning: true  # defaults to matrix.org

  ####################
  # room complexity
  ####################
  limit_remote_rooms:
    enabled: true
    complexity: 0.5
    complexity_error: "This room is too complex."
    admins_can_join: true


  ####################
  # blacklist
  ####################
  ip_range_blacklist:
    - "123.123.123.123/32"   # foo.xyz
    - "123.123.123.123/32"  # matrix.foo.com
    - "123.123.123.123/32"    # matrix.foo.sh
    - "123.123.123.123/32"   # matrix.foo.net
    - "123.123.123.123/32"    # matrix.foo.biz (CNAME)

  ####################
  # retention
  ####################
  redaction_retention_period: 30d
  retention:
    enabled: true
    default_policy:
      min_lifetime: 4w
      max_lifetime: 8w
    allowed_lifetime_min: 1d
    allowed_lifetime_max: 24w
  purge_jobs:
   - longest_max_lifetime: 3d
     interval: 12h
   - shortest_max_lifetime: 3d
     interval: 1d

  ####################
  # server notice
  ####################
  server_notices:
    system_mxid_localpart: server
    system_mxid_display_name: "Server Notices"
    system_mxid_avatar_url: "mxc://{{ matrix_domain }}/foo"
    room_name: "Server Notices"

  ####################
  # OIDC
  ####################
  oidc_providers:
    - idp_id: keycloak
      idp_name: "Single Sign-On"
      issuer: "https://foo.michaelsasser.org/realms/MichaelSasser"
      client_id: "synapse"
      client_secret: "foo"
      scopes: ["openid", "profile"]
      allow_existing_users: true
      user_mapping_provider:
        config:
          localpart_template: "{% raw %}{{ user.preferred_username }}{% endraw %}"
          display_name_template: "{% raw %}{{ user.name }}{% endraw %}"

# Media retention
matrix_synapse_media_retention_local_media_lifetime: 8w
matrix_synapse_media_retention_remote_media_lifetime: 1w

# workers
matrix_synapse_workers_enabled: true

matrix_synapse_workers_preset: michaelsasser_org

# https://github.com/matrix-org/synapse/blob/develop/docs/workers.md
matrix_synapse_workers_presets:
  michaelsasser_org:
    generic_workers_count: 1
    pusher_workers_count: 1
    federation_sender_workers_count: 1
    media_repository_workers_count: 1
    appservice_workers_count: 1
    user_dir_workers_count: 1
    background_workers_count: 1
    stream_writer_events_stream_workers_count: 0
    stream_writer_typing_stream_workers_count: 0
    stream_writer_to_device_stream_workers_count: 0
    stream_writer_account_data_stream_workers_count: 0
    stream_writer_receipts_stream_workers_count: 0
    stream_writer_presence_stream_workers_count: 0

# A secret used to protect access keys issued by the server.
# You can put any string here, but generating a strong one is preferred
# (e.g. `pwgen -s 64 1`).
matrix_synapse_macaroon_secret_key: "foo"
matrix_homeserver_generic_secret_key: "{{ matrix_synapse_macaroon_secret_key }}"
matrix_synapse_max_upload_size_mb: 256

matrix_synapse_auto_join_rooms:
  - "#support:{{ matrix_domain }}"
# matrix_synapse_use_presence: true  # Disable on high load

# Shared Secret Provider
matrix_synapse_ext_password_provider_shared_secret_auth_enabled: false # disabled to make sure no m.password authentication is available
matrix_synapse_ext_password_provider_shared_secret_auth_m_login_password_support_enabled: false # disabled to make sure no m.password authentication is available
matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: "foo"
---
devture_timesync_ntpd_package: systemd-timesyncd
devture_timesync_ntpd_service: systemd-timesyncd
# A shared secret (between Coturn and Synapse) used for authentication.
# You can put any string here, but generating a strong one is preferred
# (e.g. `pwgen -s 64 1`).
matrix_coturn_turn_static_auth_secret: "foo"

---

# Enable location sharing
matrix_well_known_matrix_client_configuration_extension_json: |-
  {
    "org.matrix.msc3488.tile_server": {
      "map_style_url": "https://api.maptiler.com/maps/streets/style.json?key=foo"
    },
    "m.tile_server": {
      "map_style_url": "https://api.maptiler.com/maps/streets/style.json?key=foo"
    },
    "setting_defaults": {
      "MessageComposerInput.showStickersButton": false
    },
    "features": {
      "feature_latex_maths": true,
      "feature_pinning": true,
      "feature_thread": true,
      "feature_html_topic": true
    },
    "admins": [{
      "matrix_id": "@foo:michaelsasser.org",
      "email_address": "foo@michaelsasser.org",
      "role": "admin"
    },
    {
      "email_address": "foo@michaelsasser.org",
      "role": "security"
    }]
  }

Results in:

homeserver.yaml

account_threepid_delegates:
  msisdn: ''
alias_creation_rules:
- action: allow
  alias: '*'
  room_id: '*'
  user_id: '*'
allow_guest_access: false
allow_public_rooms_over_federation: true
allow_public_rooms_without_auth: true
allowed_avatar_mimetypes:
- image/png
- image/jpeg
- image/gif
app_service_config_files:
- /hookshot-registration.yml
auto_join_rooms:
- '#foo:michaelsasser.org'
autocreate_auto_join_rooms: true
background_updates: null
caches:
  global_factor: 1.5
  per_cache_factors: null
cas_config: null
daemonize: false
database:
  args:
    cp_max: 10
    cp_min: 5
    database: synapse
    host: matrix-postgres
    password: foo
    port: 5432
    user: synapse
  name: psycopg2
  txn_limit: 0
default_room_version: '9'
email:
  app_name: Matrix
  client_base_url: https://element.michaelsasser.org
  enable_notifs: true
  invite_client_location: https://app.element.io
  notif_for_new_users: true
  notif_from: Matrix <foo@MichaelSasser.org>
  require_transport_security: false
  smtp_host: matrix-mailer
  smtp_port: 8025
enable_media_repo: false
enable_metrics: true
enable_registration: false
enable_registration_captcha: false
enable_registration_without_verification: false
enable_room_list_search: true
encryption_enabled_by_default_for_room_type: 'off'
event_cache_size: 100K
federation_rr_transactions_per_room_per_second: 50
federation_sender_instances:
- matrix-synapse-worker-federation-sender-0
form_secret: foo
include_profile_data_on_invite: true
instance_map: {}
ip_range_blacklist:
- 123.123.123.123/32  # not actually 123...
- 123.123.123.123/32
- 123.123.123.123/32
- 123.123.123.123/32
- 123.123.123.123/32
limit_profile_requests_to_users_who_share_rooms: false
limit_remote_rooms:
  admins_can_join: true
  complexity: 0.5
  complexity_error: This room is too complex.
  enabled: true
listeners:
- bind_addresses:
  - 0.0.0.0
  port: 9100
  type: metrics
- bind_addresses:
  - '::'
  port: 8008
  resources:
  - compress: false
    names:
    - client
  tls: false
  type: http
  x_forwarded: true
- bind_addresses:
  - '::'
  port: 8048
  resources:
  - compress: false
    names:
    - federation
  tls: false
  type: http
  x_forwarded: true
- bind_addresses:
  - 0.0.0.0
  port: 9093
  resources:
  - names:
    - replication
  type: http
log_config: /data/matrix.michaelsasser.org.log.config
macaroon_secret_key: foo
manhole_settings: null
max_spider_size: 10M
max_upload_size: 256M
media_instance_running_background_jobs: matrix-synapse-worker-media-repository-0
media_retention:
  local_media_lifetime: 8w
  remote_media_lifetime: 1w
media_storage_providers: []
media_store_path: /matrix-media-store-parent/media-store
metrics_flags: null
modules: []
notify_appservices_from_worker: matrix-synapse-worker-appservice-0
oembed: null
oidc_providers:
- allow_existing_users: true
  client_id: synapse
  client_secret: foo
  idp_id: keycloak
  idp_name: Single Sign-On
  issuer: https://foo.michaelsasser.org/realms/MichaelSasser
  scopes:
  - openid
  - profile
  user_mapping_provider:
    config:
      display_name_template: '{{ user.name }}'
      localpart_template: '{{ user.preferred_username }}'
old_signing_keys: null
opentracing: null
password_config:
  localdb_enabled: false
  pepper: ''
  policy: null
pid_file: /homeserver.pid
presence:
  enabled: true
public_baseurl: https://matrix.michaelsasser.org/
purge_jobs:
- interval: 12h
  longest_max_lifetime: 3d
- interval: 1d
  shortest_max_lifetime: 3d
push:
  include_content: true
pusher_instances:
- matrix-synapse-worker-pusher-0
rc_admin_redaction:
  burst_count: 50
  per_second: 1
rc_federation:
  concurrent: 3
  reject_limit: 50
  sleep_delay: 500
  sleep_limit: 10
  window_size: 1000
rc_invites:
  per_issuer:
    burst_count: 10
    per_second: 0.3
  per_room:
    burst_count: 10
    per_second: 0.3
  per_user:
    burst_count: 5
    per_second: 0.003
rc_joins:
  local:
    burst_count: 10
    per_second: 0.1
  remote:
    burst_count: 10
    per_second: 0.01
rc_login:
  account:
    burst_count: 3
    per_second: 0.17
  address:
    burst_count: 3
    per_second: 0.17
  failed_attempts:
    burst_count: 3
    per_second: 0.17
rc_message:
  burst_count: 10
  per_second: 0.2
rc_registration:
  burst_count: 3
  per_second: 0.17
recaptcha_private_key: ''
recaptcha_public_key: ''
redaction_retention_period: 30d
redis:
  enabled: true
  host: matrix-redis
  password: null
  port: 6379
registration_requires_token: false
registration_shared_secret: foo
report_stats: false
require_auth_for_profile_requests: false
retention:
  allowed_lifetime_max: 24w
  allowed_lifetime_min: 1d
  default_policy:
    max_lifetime: 8w
    min_lifetime: 4w
  enabled: true
room_list_publication_rules:
- action: allow
  alias: '*'
  room_id: '*'
  user_id: '*'
room_prejoin_state: null
run_background_tasks_on: matrix-synapse-worker-background-0
saml2_config:
  sp_config: null
  user_mapping_provider:
    config: null
send_federation: false
server_name: michaelsasser.org
server_notices:
  room_name: Server Notices
  system_mxid_avatar_url: mxc://michaelsasser.org/foo
  system_mxid_display_name: Server Notices
  system_mxid_localpart: server
signing_key_path: /data/matrix.michaelsasser.org.signing.key
spam_checker: []
sso: null
start_pushers: false
stats: null
stream_writers: {}
suppress_key_server_warning: true
templates: null
tls_certificate_path: null
tls_private_key_path: null
trusted_key_servers:
- server_name: matrix.org
turn_allow_guests: false
turn_shared_secret: foo
turn_uris:
- turn:matrix.michaelsasser.org?transport=udp
- turn:matrix.michaelsasser.org?transport=tcp
ui_auth: null
update_user_directory_from_worker: matrix-synapse-worker-user-dir-0
url_preview_accept_language:
- en-US
- en
url_preview_enabled: true
url_preview_ip_range_blacklist:
- 127.0.0.0/8
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 100.64.0.0/10
- 192.0.0.0/24
- 169.254.0.0/16
- 192.88.99.0/24
- 198.18.0.0/15
- 192.0.2.0/24
- 198.51.100.0/24
- 203.0.113.0/24
- 224.0.0.0/4
- ::1/128
- fe80::/10
- fc00::/7
- 2001:db8::/32
- ff00::/8
- fec0::/10
user_directory: null
user_ips_max_age: 28d
worker_app: synapse.app.homeserver

Workers

# $cat worker.matrix-synapse-worker-*

################################################


worker_app: synapse.app.generic_worker
worker_name: matrix-synapse-worker-appservice-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19300
worker_app: synapse.app.generic_worker
worker_name: matrix-synapse-worker-background-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19700
worker_app: synapse.app.federation_sender
worker_name: matrix-synapse-worker-federation-sender-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19400
worker_app: synapse.app.generic_worker
worker_name: matrix-synapse-worker-generic-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093

worker_main_http_uri: http://matrix-synapse:8008


################################################

worker_listeners:
  - type: http
    bind_addresses: ['::']
    x_forwarded: true
    port: 18111
    resources:
      - names: ["client", "federation"]
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19111
worker_app: synapse.app.media_repository
worker_name: matrix-synapse-worker-media-repository-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: http
    bind_addresses: ['::']
    x_forwarded: true
    port: 18551
    resources:
      - names: ["media"]
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19551
worker_app: synapse.app.pusher
worker_name: matrix-synapse-worker-pusher-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19200
worker_app: synapse.app.generic_worker
worker_name: matrix-synapse-worker-user-dir-0

worker_daemonize: false
worker_log_config: /data/matrix.michaelsasser.org.log.config

worker_replication_host: matrix-synapse
worker_replication_http_port: 9093


################################################


worker_listeners:
  - type: http
    bind_addresses: ['::']
    x_forwarded: true
    port: 18661
    resources:
      - names: ["client"]
  - type: metrics
    bind_addresses: ['0.0.0.0']
    port: 19661

Relevant log output

# Synapse + workers during the time the duplication (see "Image 2") happened:
-- Journal begins at Sat 2022-11-12 20:25:09 CET, ends at Wed 2022-11-23 18:42:30 CET. --
Nov 23 11:33:23 matrix matrix-synapse[1029358]: 2022-11-23 10:33:23,569 - synapse.handlers.device - 1132 - WARNING - _maybe_retry_device_resync-770 - Failed to handle device list update for @kosyne:kaosteam.net: Failed to send request: DNSLookupError: DNS lookup failed: no results for hostname lookup: matrix.kaosteam.net.
Nov 23 11:35:23 matrix matrix-synapse[1029358]: 2022-11-23 10:35:23,700 - synapse.handlers.device - 1132 - WARNING - _maybe_retry_device_resync-774 - Failed to handle device list update for @dries.staelens:rubicon.biz: Failed to send request: DNSLookupError: DNS lookup failed: no results for hostname lookup: rubicon.biz.
Nov 23 11:39:17 matrix matrix-synapse-worker-federation-sender-0[1030801]: 2022-11-23 10:39:17,029 - synapse.federation.sender.per_destination_queue - 368 - WARNING - federation_transaction_transmission_loop-21408 - TX [sameteem.com] Failed to send transaction: Failed to send request: TimeoutError: User timeout caused connection failure.
Nov 23 11:40:07 matrix matrix-synapse-worker-federation-sender-0[1030801]: 2022-11-23 10:40:07,211 - synapse.federation.sender.per_destination_queue - 368 - WARNING - federation_transaction_transmission_loop-21582 - TX [chat.jacobsen.app] Failed to send transaction: Failed to send request: TimeoutError: User timeout caused connection failure.
Nov 23 11:40:22 matrix matrix-synapse-worker-federation-sender-0[1030801]: 2022-11-23 10:40:22,671 - synapse.federation.sender.per_destination_queue - 368 - WARNING - federation_transaction_transmission_loop-21588 - TX [chat.perafree.ml] Failed to send transaction: Failed to send request: TimeoutError: User timeout caused connection failure.
Nov 23 11:45:53 matrix matrix-synapse[1029358]: 2022-11-23 10:45:53,782 - synapse.handlers.device - 1132 - WARNING - _maybe_retry_device_resync-795 - Failed to handle device list update for @dries.staelens:rubicon.biz: Failed to send request: DNSLookupError: DNS lookup failed: no results for hostname lookup: rubicon.biz.

Anything else that would be useful to know?

Full log from the time the duplication (see "Image 2") happened: message_duplication.log

All times & dates are UTC+1.

@DMRobertson (Contributor) commented:

Thank you for your thorough report.

On ghost notifications: it's difficult to say what could be the cause without timestamps or logs to start digging into. I think if someone edits a message you are named or @-mentioned in, then you will hear an audible notification but not receive an obvious prompt as to why. I'm not sure if that explains what you've seen. If it happens again, please submit client logs (e.g. /rageshake on Element Web) and mention this issue in the description.

On event duplication: it's not immediately obvious to me that something has gone wrong in image 1. What makes you say that the two messages from 10th October 2022 are duplicated, rather than appearing out of order? Are you seeing the same message/state event take place with two different event IDs?

It's possible that

  • bekon232 and Szwendacz are hosted on another homeserver (X) to your own (Y).
  • The two homeservers temporarily lost connection.
  • Connectivity was later restored.
  • Something triggered your homeserver Y to retrieve missed messages from the other homeserver X.
    If so, the UX hasn't done a good job of communicating this.

I also note that Element-Web and the mobile client (Element Android?) show a consistent timeline, which suggests that Synapse is presenting the same timeline via /sync to the two clients.

I don't fully follow what's happened in image 2 either. Are Michael and Administrator on the same homeserver as the client taking the screenshot?

After this happened a few times in the Python room #python:matrix.org, the room became unstable on my end, and I was unable to send events to that room any more because of missing state events. I also wasn't able to remove the room with the admin API, so I had to manually remove the room from the database to regain access:

(Because I had to roll back a few times while trying to figure out how to remove the room, I accidentally deleted the logs as well.)

That's very unfortunate. Please reach out to us if you end up in that situation; it's generally very unsafe to manually operate on Synapse's database (I hope you found all the tables that reference a room ID).

@DMRobertson added the X-Needs-Info (this issue is blocked awaiting information from the reporter) label on Nov 24, 2022
@MichaelSasser (Contributor, Author) commented:

Before I get into detail, I want to thank you for your time and effort. I also want to mention that I think the "duplicates" and "ghost notifications" issues are somehow related and have the same origin. This is why I only created one issue.

I think if someone edits a message you are named or @-mentioned in, then you will hear an audible notification but not receive an obvious prompt as to why. I'm not sure if that explains what you've seen.

That was my first thought, too. But with the options Show hidden events in timeline and Developer mode enabled, I should see a { "type": "m.room.message" } event in the timeline, where I could see the event redacting the other, which wasn't the case.

If it happens again, please submit client logs (e.g. /rageshake on Element Web) and mention this issue in the description.

You're right, I should have done that. I'll remember it for next time.

[I]t's not immediately obvious to me that something has gone wrong in image 1. What makes you say that the two messages from 10th October 2022 are duplicated, rather than appearing out of order?

"Duplicated" is probably the wrong word, as those events have the same event ids as those shown previously in the timeline. I used the term "duplicated" because they were shown multiple times at different locations in the timeline, at the same time.

bekon232 and Szwendacz are hosted on another homeserver (X) to your own (Y).

Yes, this is correct. bekon232 and Szwendacz are on the same homeserver, but not on mine. I don't think it should make a difference, but it is maybe worth mentioning that there are > 10k other users in the room, from 524 different servers. The room is not bridged. As far as I know, these are the only messages bekon232 and Szwendacz created in any room any of my users are in. "As far as I know" because I have retention enabled in the homeserver.yml, and I could only find their m.room.member state event and the m.room.message events, as seen in the screenshot.

The two homeservers temporarily lost connection.
Connectivity was later restored.

I haven't checked the logs specifically for their homeserver, but I do regular updates, and I have updated and restarted synapse during that process.

Something triggered your homeserver Y to retrieve missed messages from the other homeserver X. If so, the UX hasn't done a good job of communicating this.

I haven't looked into the details of federation on matrix yet, but I can say the first occurrence of those messages in the timeline was fairly close to the origin_server_ts of the events. I also didn't see anything about missing events in the logs (at least not for the room in question) before I had issues with the room, which was weeks later.

I also note that Element-Web and the mobile client (Element Android?) show a consistent timeline, which suggests that Synapse is presenting the same timeline via /sync to the two clients.

Yes, the timeline seemed to be consistent between element-web, element-desktop, element-android (yes, the mobile client was element-android) and hydrogen. Even after clearing the cache on element-web and the mobile client, I could still see those messages in the timeline, at least in the case image 1 was taken. But I have also seen those messages vanish after clearing the cache.

I don't fully follow what's happened in image 2 either. Are Michael and Administrator on the same homeserver as the client taking the screenshot?

I should have mentioned that. Michael (me), Administrator (mjolnir, a moderation bot), and the person who made the screenshot (also me, from the "Michael" account) are on the same homeserver. They are in another room with 6 more users, some from other homeservers.

On Sunday, one or two days before the screenshot was made, Michael sent a message at 13:24 to the room: "mjolnir room remove [...]", and Administrator answered with the message: "⚠ | Failed to leave [...]". Almost fifty minutes later, I restarted Administrator (mjolnir) and it sent four additional messages to the room at 14:09.
Around the time I took the screenshot, one or two days later and after a few more messages were sent to the room, the same events appeared again in the timeline. As before, they have the same event ids as the ones which are already in the room. Those "duplicates" in the timeline (same event ids) are in the rectangle with the red border marked as "1". For every message, a notification on element-android and on element-web (I haven't checked other clients) was created at 11:46 within a few milliseconds (and yes, notifications are set to "all messages" for that room). Thereafter (also at 11:46), Administrator executed the "duplicated" command (the second rectangle with the red border, marked as "2").
Because the room is federated to/with matrix.org, I was able to check the timeline there, which was consistent with what I saw on my server. This means the original messages were there when they were sent on Sunday, and their "duplicates", with the same event ids, appeared one or two days later. Even after clearing the cache on both ends.

That's very unfortunate. Please reach out to us if you end up in that situation; it's generally very unsafe to manually operate on Synapse's database (I hope you found all the tables that reference a room ID).

I think so. Everything is up and running again, and I could join the room again afterwards. Besides some missing prev_events, everything looks alright. The room was stored in 55 tables, some tricky to find, as they don't have a room_id field, like event_relations, event_auth_chains and event_edges.
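
As a sanity check after such a manual purge, orphaned rows in those event-keyed tables can be counted with an anti-join against events. A minimal sketch, assuming PostgreSQL; repeat per event-keyed table:

-- Count rows referencing event IDs that no longer exist in events
SELECT COUNT(*) AS orphaned
FROM event_auth_chains c
WHERE NOT EXISTS (
  SELECT 1 FROM events e WHERE e.event_id = c.event_id
);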
It would have been nice, though, if the Delete Room API were able to purge a room from the database even though users couldn't leave.

@DMRobertson changed the title from "Random event duplication and ghost notifications" to "An event was sent twice down /sync (and down application service txns?) consistently across clients; ghost notifications" on Nov 29, 2022
@DMRobertson (Contributor) commented:

"As far as I know" because I have retention enabled in the homeserver.yml

That's an amber flag: retention is known to be buggy (e.g. element-hq/element-web#13476). Are you still seeing the duplicate event pattern today (even if sporadically)? If so, I'd suggest disabling retention to see if that stops the problem.

Because the room is federated to/with matrix.org, I was able to check the timeline there, which was consistent with what I saw on my server. This means the original messages were there when they were sent on Sunday, and their "duplicates", with the same event ids, appeared one or two days later.

That is extremely surprising (and somewhat alarming). Are you able to share the room ID so we can cross-reference with matrix.org's logs and database? I can be reached privately at @dmrobertson:matrix.org if you would prefer not to post it publicly.

It actually would have been nice, though, if the Delete Room API was able to purge a room from the database, even though users couldn't leave.

It has a force_purge option for this, IIRC.

@MichaelSasser (Contributor, Author) commented Dec 1, 2022

That's an amber flag: retention is known to be buggy [...]

I've noticed minor issues pretty quickly after I turned it on a year ago or so.

If so, I'd suggest disabling retention to see if that stops the problem.

I've thought about that, too. However, I first wanted to try whether the Purge History API could fix/mitigate the problem, as it usually does with retention-related issues, which it actually did. Since I purged the room history down to the second-to-last event, I haven't noticed any of the issues I described.
Out of caution, I've disabled retention now.

Are you able to share the room ID so we can cross-reference with matrix.org's logs and database?

Sure. The room in image 1 is #python:matrix.org (!iuyQXswfjgxQMZGrfQ:matrix.org) (public, non-e2ee room) and the other room (image 2) is !ReeNuqUuyQhMKliZVe:michaelsasser.org (invite only, non-e2ee room). If you want to join the room, please don't hesitate to ask me @michael:michaelsasser.org, or use my matrix.org account @michaelsasser:matrix.org to /join yourself to the room. We have replaced it with a fresh room.

About the room in image 2: While going through the room history on matrix.org, I've noticed many out-of-order events, which were in order when I verified the situation before opening this issue. While scrolling through the room history, one of the "duplicated" messages vanished after a few minutes. Now, I can only find the message in its original place (when the event was created), surrounded by out-of-order messages.

It has a force_purge option for this, IIRC.

Oh, right, it has... This is embarrassing now 😄. Either it was added after I added the functionality to my CLI tool, or I missed it twice in a row. Thanks for the hint!

@johndball commented Dec 6, 2022

I've been following this thread since it was opened. Both of my servers are experiencing the same issue. I'm happy to share log data as needed.
In a nutshell, after all messages are read, the app (desktop, web, mobile) will still show unread notifications. Additionally, the notification count is way off. For example, one new message will show up as five unread messages, and an active conversation with multiple messages will show counts in the thirties, forties, fifties, etc.

I do have retention enabled on both servers, with a server-specified policy and room policies.

retention:
  # The message retention policies feature is disabled by default. Uncomment the
  # following line to enable it.
  #
  enabled: true

Will continue to be a fly on the wall while y'all work through this thread.
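
For reference, the server-side unread counts described above are tracked in Synapse's event_push_summary table (with per-event entries in event_push_actions). A minimal sketch for inspecting them, assuming the standard schema on a recent Synapse (exact columns vary by version) and a placeholder user ID:

-- What the server currently thinks the per-room badge counts are
SELECT room_id, notif_count, unread_count, stream_ordering
FROM event_push_summary
WHERE user_id = '@user:example.org';  -- placeholder user ID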

@MadLittleMods (Contributor) commented:

Related to #11447

@MichaelSasser (Contributor, Author) commented Dec 27, 2022

I've submitted another /rageshake when I saw the following after opening element (web):

[Image: explorer_XLo46wFcav]

After sending the /rageshake, I noticed that this probably has nothing to do with this issue.
The differences from this issue are:

  • I can't verify the same thing happened on matrix.org
  • I can't verify the same thing happened across different clients
  • the duplicated messages vanished after clearing the cache and reloading
  • I found a timeline pagination error in the console:
    Uncaught Error: Attemting to unpaginate 432 events, but only have 431 in the timeline
        at s.unpaginate (timeline-window.ts:339:19)
        at Object.onUnfillRequest (TimelinePanel.tsx:512:33)
        at ScrollPanel.tsx:467:17
    

I think the console logs are in the zip file; if not, I saved them before reloading the client.


Room: #python:matrix.org (!iuyQXswfjgxQMZGrfQ:matrix.org; public, non-e2ee room)

@H-Shay added the A-Sync, T-Defect, O-Occasional and S-Minor labels and removed the X-Needs-Info label on Feb 6, 2023
@ArrayBolt3 commented Feb 7, 2023

This problem seems to be getting worse as time goes by. I opened up a few rooms hosted on a server affected by this issue (using the Element web app to open the rooms), and proceeded to get hit with approximately 30 notifications per chat message that I was receiving notifications for. And it was sending me notifications for quite a few chat messages even though those messages had been sent quite a bit ago, resulting in probably in excess of 200 notifications having to be closed. (And for some reason Chrome made me close every single notification manually or wait for them to disappear in small batches by themselves, but that's a Chrome problem or a problem with my setup.)

@MichaelSasser (Contributor, Author) commented:

Might be related to element-hq/element-web#897 and element-hq/element-meta#1491.

@colemickens commented:

This has gotten bad lately. I'm getting this very frequently when I'm mentioned (at least in one room in particular; not sure if it's happening in all rooms yet).
