System completely unresponsive after role execution #145

alexlawrence · 2017-08-09T20:29:38Z

The role/playbook executes without any sign of error.

The system was even responsive for a few minutes after execution. Some minutes later the active SSH connection was interrupted (timeout). After that the system seemed unable to process any incoming requests. SSH connection attempts, ping requests and HTTP requests to Node.js apps behind nginx all hang and eventually time out.

After a restart the system may be responsive again, but only for a few minutes. It then quickly becomes unresponsive again.

Any ideas what is happening in this case?
Do you need any additional information?

alexlawrence · 2017-08-10T06:48:14Z

Some additional information:

OS: Ubuntu 16 LTS (no GUI/desktop)
Custom packages: ZeroMQ, MongoDB, Node.js

speedmann · 2017-08-10T11:25:39Z

It looks like this role configures ufw, which sets defaults to drop everything.
Check your firewall configuration and allow the needed ports.

Edit: I now experience the same issue as you and can say it is not the firewall that is causing it. I will have a closer look and let you know if I find the cause.

speedmann · 2017-08-10T12:39:52Z

Addition: The system stops responding after 10 minutes of uptime, and there are no log entries or anything generating heavy load.
Ubuntu 16.04 with nothing applied other than this role and your ansible-ssh-hardening role.

alexlawrence · 2017-08-10T12:53:17Z

I was about to add that while the ufw defaults were set, I changed them later on and even verified that ufw itself was inactive.

boldandbusted · 2017-08-21T16:19:37Z

If this issue is confirmed, then it should be highlighted in the README.md... Anyone who wants to try this out (like me, in a new job) would be severely burned. :(

alexlawrence · 2017-08-21T19:34:36Z

Agree. It was a rather painful experience for me.

rndmh3ro · 2017-08-21T19:50:26Z

I'm still trying to reproduce this issue with vagrant and the default xenial box.
I think I somehow managed to do it with apt upgrade, but I cannot reproduce it anymore.

@alexlawrence, @speedmann do you see this with vagrant or a vm for that matter? If yes, can you provide your version of the box?

alexlawrence · 2017-08-21T21:43:24Z

@rndmh3ro This happened to me on a dedicated machine. I have re-installed the OS on the machine already. Therefore I cannot provide any further details.

speedmann · 2017-08-22T06:35:37Z

@rndmh3ro I had this issue on real hardware too. I'll try to reproduce it with a virtual system and will let you know if I can reproduce it there.

speedmann · 2017-08-22T08:17:55Z

Update: I executed this role on a virtual system and it did not break the system. It is still available and accessible.
To try to reproduce it on a hardware system, I first need to find suitable hardware where I can get access to the console to see if there are errors.

rndmh3ro · 2017-08-22T08:58:25Z

Thanks to you both!

EDIT: Thanks @speedmann, I did not see your comment in time.

One more thing: It would be really cool if you could also test it with release 4.0.0. If that's not possible, it's totally fine, too!

alexlawrence · 2017-08-22T09:06:10Z

@rndmh3ro Sorry, I cannot do that currently. The only system available is used in production.

argami · 2017-11-04T20:37:02Z

Hi! First of all, thank you for this awesome project.

I'm working with Debian Stretch. In my case the server does not freeze after a while; it is only after I reboot the server that I can't log in again, so I'm not sure whether this is a different issue.

Anyhow, version 4.0.0 works perfectly, but with 4.2.0 I can't log in again after I reboot the server.

I will try 4.1.0 and get back with more info.

argami · 2017-11-04T21:49:30Z

OK, I'm back: version 4.1.0 also works well, so I guess the bug was introduced after that. It would be nice to know whether master also has the issue. I don't have more time today to try this, but maybe after I finish the job I'm doing I will have some more time to look into it.

jokimaki · 2018-05-30T14:37:53Z

I'm considering using this role but this issue got me worried. Looking at https://github.com/dev-sec/ansible-os-hardening/compare/4.1.0...4.2.0?w=1 one significant change is that some filesystems are now disabled by default. This might explain the problem if your server is using one of them?

On UEFI-systems the boot-partition is FAT by default (see [here](https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface/System_partition)). If we disable vfat, these systems become unbootable. This has already bitten some users using ansible-os-hardening (dev-sec/ansible-collection-hardening#162, dev-sec/ansible-collection-hardening#145). Therefore I propose we do not check for a disabled vfat filesystem as vfat is often used on newer systems.
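
Before applying a version of the role that disables filesystems, it may help to check whether the target host depends on vfat at all. The following is a minimal sketch, not part of the role: the function names `vfat_mount_points` and `is_efi_booted` are hypothetical, and it assumes a Linux host where mounts are listed in `/proc/mounts` and where `/sys/firmware/efi` exists only when the kernel was booted via UEFI.

```python
from __future__ import annotations

from pathlib import Path


def vfat_mount_points(mounts_text: str) -> list[str]:
    """Return the mount points whose filesystem type is vfat.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "vfat":
            points.append(fields[1])
    return points


def is_efi_booted(sys_root: Path = Path("/sys")) -> bool:
    """Report whether <sys_root>/firmware/efi exists as a directory."""
    return (sys_root / "firmware" / "efi").is_dir()


def main() -> None:
    mounts_file = Path("/proc/mounts")
    if not mounts_file.exists():
        print("/proc/mounts not found; this check only applies to Linux hosts")
        return
    in_use = vfat_mount_points(mounts_file.read_text())
    if in_use:
        print("vfat is mounted at:", ", ".join(in_use))
    if is_efi_booted():
        print("system was booted via UEFI; the EFI system partition is usually FAT")
    if not in_use and not is_efi_booted():
        print("no vfat mounts found and no UEFI boot detected")


if __name__ == "__main__":
    main()
```

If either check reports that vfat is in use, disabling the vfat module on that host is likely to reproduce the boot problem described above. A clean result from this sketch does not rule out other causes.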

rndmh3ro · 2018-09-02T17:28:01Z

Fixed by #190

rndmh3ro added the bug label Aug 21, 2017

rndmh3ro mentioned this issue Jul 1, 2018

Do not disable vfat by default dev-sec/linux-baseline#96

Merged

rndmh3ro closed this as completed Sep 2, 2018
