Issue with metrics in bin/check-sqs-messages.rb #380

Open
yoliinyk opened this issue Apr 24, 2020 · 15 comments

Comments

@yoliinyk

Hello.

I'm using bin/check-sqs-messages.rb and would like to monitor my queue with the metric ApproximateAgeOfOldestMessage.

I run check-sqs-messages.rb with the following options:
/opt/sensu/embedded/bin/ruby check-sqs-messages.rb -r us-west-2 -q sqs_test_events_c4-dev_dlq -m ApproximateAgeOfOldestMessage -c 100
and I see this output:
SQSMsgs OK: all queue(s): ["sqs_test_events_c4-dev_dlq"] are OK
but in reality the ApproximateAgeOfOldestMessage metric for this queue reads 350 000.

When I run it without the -m ApproximateAgeOfOldestMessage option, it works correctly:

/opt/sensu/embedded/bin/ruby check-sqs-messages.rb -r us-west-2 -q sqs_test_events_c4-dev_dlq -c 100 
SQSMsgs CRITICAL: 61223 message(s) in sqs_test_events_c4-dev_dlq

@majormoses can you please help investigate/fix this?
Thank you.

@yoliinyk
Author

Any update @majormoses

@yoliinyk
Author

yoliinyk commented May 6, 2020

Any update @majormoses ?

@yoliinyk
Author

yoliinyk commented Jun 9, 2020

any update @majormoses ?

@majormoses
Member

@yoliinyk hey, I don't work for Sensu. I maintain 500+ repositories across multiple orgs in my "free time", and as such I offer no response SLO/SLA.

To help me investigate I could use a bit more information:

  • what version do you see this with?
  • do you see this if you go back x versions? (regression?)
  • is this a first time setup or did it break upon upgrade?

@jnmullen

I've just been trying to get the very same thing to work and found that it doesn't.

It won't ever work, as the SDK doesn't return an attribute called ApproximateAgeOfOldestMessage, which is what the code will be looking for in the response.

The supported attribute names from the SDK are:

attribute_names: ["All"], # accepts All, Policy, VisibilityTimeout, MaximumMessageSize, MessageRetentionPeriod, ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible, CreatedTimestamp, LastModifiedTimestamp, QueueArn, ApproximateNumberOfMessagesDelayed, DelaySeconds, ReceiveMessageWaitTimeSeconds, RedrivePolicy, FifoQueue, ContentBasedDeduplication, KmsMasterKeyId, KmsDataKeyReusePeriodSeconds

This is taken from: https://docs.aws.amazon.com/sdk-for-ruby/v2/api/Aws/SQS/Client.html#get_queue_attributes-instance_method
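To illustrate why a missing attribute could show up as OK rather than as an error, here is a minimal sketch. It uses a plain hash as a stand-in for the attributes returned by get_queue_attributes, and it assumes (not confirmed from the plugin source) that the check converts the looked-up value with to_i. The method name evaluate and the sample values are hypothetical.

```ruby
# Stand-in for the attributes hash in a get_queue_attributes response.
# Note there is no 'ApproximateAgeOfOldestMessage' key, matching the list above.
attributes = {
  'ApproximateNumberOfMessages' => '61223',
  'ApproximateNumberOfMessagesNotVisible' => '0'
}

# Hypothetical evaluation step: look up the metric, convert it, compare it
# with the critical threshold. A missing key yields nil, and nil.to_i is 0.
def evaluate(attributes, metric, critical)
  value = attributes[metric].to_i
  value >= critical ? :critical : :ok
end

puts evaluate(attributes, 'ApproximateNumberOfMessages', 100)
puts evaluate(attributes, 'ApproximateAgeOfOldestMessage', 100)
```

Under that assumption the first call reports critical and the second reports ok, because the unknown metric silently becomes 0.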

@majormoses
Member

majormoses commented Jun 17, 2020

> I've just been trying to get the very same thing to work and found that it doesn't.
>
> It won't ever work, as the SDK doesn't return an attribute called ApproximateAgeOfOldestMessage, which is what the code will be looking for in the response.
>
> The supported attribute names from the SDK are:
>
> attribute_names: ["All"], # accepts All, Policy, VisibilityTimeout, MaximumMessageSize, MessageRetentionPeriod, ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible, CreatedTimestamp, LastModifiedTimestamp, QueueArn, ApproximateNumberOfMessagesDelayed, DelaySeconds, ReceiveMessageWaitTimeSeconds, RedrivePolicy, FifoQueue, ContentBasedDeduplication, KmsMasterKeyId, KmsDataKeyReusePeriodSeconds
>
> This is taken from: https://docs.aws.amazon.com/sdk-for-ruby/v2/api/Aws/SQS/Client.html#get_queue_attributes-instance_method

Odd. I know I wrote, tested, and consumed it, but that was a number of years ago, so I am trying to fill in the memory and/or knowledge gaps. I will also try to reach out to someone at my old employer and see if they have noticed the same (assuming they are still using this). To rule out other issues, can you try going back to 4.0.0, when the change was introduced, and check whether you see the same result? https://github.com/sensu-plugins/sensu-plugins-aws/blob/master/CHANGELOG.md#400---2016-12-27

@majormoses
Member

I am able to reproduce it. I am going to put in a hotfix that will surface false positives like this. I am still not sure why the API is not returning that value, though.

@majormoses
Member

I have opened #381 to solve the first problem (false positives and useful debug info), which will set us up for figuring out a proper fix for this. I suspect the reason is that the metric is exposed in CloudWatch and the console but not in the particular call in question.
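If the value is indeed only published through CloudWatch, one possible direction is to query it there. The sketch below only builds the request parameters for Aws::CloudWatch::Client#get_metric_statistics and does not call AWS. The helper name oldest_message_age_request, the five-minute window, and the choice of the Maximum statistic are assumptions for illustration, not something the plugin is known to do.

```ruby
# Hypothetical helper: builds get_metric_statistics parameters for the
# AWS/SQS ApproximateAgeOfOldestMessage metric of a single queue.
def oldest_message_age_request(queue_name, now: Time.now, window: 300)
  {
    namespace: 'AWS/SQS',
    metric_name: 'ApproximateAgeOfOldestMessage',
    dimensions: [{ name: 'QueueName', value: queue_name }],
    start_time: now - window, # assumed look-back window in seconds
    end_time: now,
    period: 60,               # seconds; must be a multiple of 60
    statistics: ['Maximum']   # assumed statistic for "oldest message"
  }
end

params = oldest_message_age_request('sqs_test_events_c4-dev_dlq')
puts params[:metric_name]

# With the aws-sdk gem and credentials available, the hash could be passed as:
#   Aws::CloudWatch::Client.new(region: 'us-west-2').get_metric_statistics(params)
```

The returned datapoints would then need to be compared against the warning and critical thresholds in place of the queue attribute.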

@majormoses
Member

I am gonna see about getting CR on that and if needed I will self merge it.

A bit of an update: I have started some discussions around this inconsistent behavior. I got an initial response back from the SDK team, but I am pushing back on their assessment of it. Even if we don't get an SDK change, I will still push for some documentation improvements, as their explanation was inconsistent with some of the already exposed metrics.

@yoliinyk
Author

Awesome, thank you.

@majormoses
Member

Sorry, this fell off my radar. I am going to try to find some time next week to dig through my emails and see whether I ever got a good response back from AWS. Please ping me if I don't post something by mid next week.

@yoliinyk
Author

yoliinyk commented Oct 8, 2020

@majormoses any update?

@yoliinyk
Author

@majormoses any update?

@majormoses
Member

Sorry, I was heads down last week. I will reach out to our AWS rep and see if I can get an update on their end. I am not currently using SQS for anything at my org, so part 2 (the proper fix) will likely need someone else to jump in, since there is currently no clear path forward.

@james-mullen-itv

The change made in #381 breaks all my checks that monitor the number of messages in an SQS DLQ.

To me it looks like this bit of code is wrong:

if messages.attributes.key(config[:metric])

The key method expects a value to be passed in and returns the key that maps to it. I assume what you want to use in this situation is has_key? instead, to check whether the key is present in the attributes hash?
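The difference can be seen with a plain hash. This is a small sketch using made-up attribute values, not the actual SDK response object:

```ruby
# Stand-in for messages.attributes; the values are illustrative only.
attributes = { 'ApproximateNumberOfMessages' => '61223' }
metric = 'ApproximateNumberOfMessages'

# Hash#key searches the VALUES for its argument and returns the matching key.
# The metric name is a key, not a value, so this returns nil (falsy).
by_value = attributes.key(metric)

# Passing an actual value returns the key that maps to it.
reverse_lookup = attributes.key('61223')

# Hash#has_key? (alias key?) checks whether the argument is a KEY.
present = attributes.has_key?(metric)

p by_value        # nil
p reverse_lookup  # "ApproximateNumberOfMessages"
p present         # true
```

With key, the condition is falsy even for a metric that does exist, which would explain why the plain message-count checks broke.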
