
System completely unresponsive after role execution #145

Closed
alexlawrence opened this issue Aug 9, 2017 · 16 comments

@alexlawrence

The role/playbook executes without any sign of error.

The system was even responsive for a few minutes after execution. Some minutes later the active SSH connection was interrupted (timeout). After that the system seemed unable to process any incoming requests: SSH connection attempts, ping requests, and HTTP requests to Node.js apps behind nginx all appear to start but run into timeouts.

After a restart the system may be responsive again, but only for a few minutes; it then quickly becomes unresponsive again.

Any ideas what is happening in this case?
Do you need any additional information?

@alexlawrence
Author

Some additional information:

  • OS: Ubuntu 16 LTS (no GUI/desktop)
  • Custom packages: ZeroMQ, MongoDB, Node.js

@speedmann

speedmann commented Aug 10, 2017

It looks like this role configures ufw, which sets defaults that drop everything.
Check your firewall configuration and allow the needed ports.

Edit: I now experience the same issue as you, and can say it is not the firewall that is causing this. I will have a closer look and let you know if I find anything.
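For anyone hitting a lockout like this, a quick sanity check of the ufw state before and after running the role might look like the following sketch (it assumes ufw is installed; the port numbers are only examples for SSH and an nginx frontend):

```shell
# Inspect the current firewall state, including default policies (requires root)
sudo ufw status verbose

# Before a default-deny policy takes effect, explicitly allow the
# ports you depend on (22 for SSH, 80/443 for nginx in this example)
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Verify the rules were added
sudo ufw status numbered
```

Note that locking in the SSH rule before enabling the firewall is the important step; a default-deny firewall without it produces exactly the "connection attempts time out" symptom described above.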

@speedmann

Addendum: the system stops responding after about 10 minutes of uptime, and there are no log entries or anything generating heavy load.
Ubuntu 16.04 with nothing applied other than this role and your ansible-ssh-hardening role.

@alexlawrence
Author

alexlawrence commented Aug 10, 2017

I was about to add that, while the ufw defaults were set, I changed them later on and even verified that ufw itself was inactive.

@boldandbusted

If this issue is confirmed, then it should be highlighted in the README.md... Anyone who wants to try this out (like me, in a new job) would be severely burned. :(

@rndmh3ro rndmh3ro added the bug label Aug 21, 2017
@alexlawrence
Author

Agree. It was a rather painful experience for me.

@rndmh3ro
Member

I'm still trying to reproduce this issue with vagrant and the default xenial box.
I think I somehow managed to do it with apt upgrade, but I cannot reproduce it anymore.

@alexlawrence, @speedmann do you see this with Vagrant, or a VM for that matter? If yes, can you provide the version of your box?

@alexlawrence
Author

alexlawrence commented Aug 21, 2017

@rndmh3ro This happened to me on a dedicated machine. I have re-installed the OS on the machine already. Therefore I cannot provide any further details.

@speedmann

@rndmh3ro I had this issue on real hardware too. I'll try it on a virtual system and let you know whether I can reproduce it there.

@speedmann

Update: I executed this role on a virtual system and it did not break the system. It is still available and accessible.
To try to reproduce it on a hardware system, I first need to find suitable hardware where I can access the console to see if there are errors.

@rndmh3ro
Member

rndmh3ro commented Aug 22, 2017

Thanks to you both!

EDIT: Thanks @speedmann, I did not see your comment in time.

One more thing: it would be really cool if you could also test release 4.0.0. If that's not possible, that's totally fine too!

@alexlawrence
Author

@rndmh3ro Sorry, I cannot do that currently. The only system available is used in production.

@argami

argami commented Nov 4, 2017

Hi! first of all thank you for this awesome project.

I'm working with Debian Stretch. In my case the server doesn't freeze after a while; instead, after I reboot the server I can't log in again, so I'm not sure whether it might be a different issue.

Anyhow, version 4.0.0 works perfectly, but with 4.2.0 I can't log in again after I reboot the server.

I will try 4.1.0 and get back with more info.

@argami

argami commented Nov 4, 2017

OK, I'm back: version 4.1.0 also works well, so I guess the bug was introduced after that. It would be nice to know whether master also has the issue. I don't have more time today to try this, but maybe after I finish the job I'm doing I will have some more time to look into it.

@jokimaki

I'm considering using this role, but this issue got me worried. Looking at https://github.com/dev-sec/ansible-os-hardening/compare/4.1.0...4.2.0?w=1, one significant change is that some filesystems are now disabled by default. This might explain the problem if your server is using one of them.
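For context, hardening roles commonly disable filesystems by mapping the kernel module's `install` action to `/bin/true` in a modprobe.d fragment, which this role's changelog suggests it started doing in 4.2.0. A minimal sketch of such a fragment (the filename is illustrative, not necessarily what the role writes):

```
# /etc/modprobe.d/dev-sec-example.conf (illustrative filename)
# Any attempt to load these modules runs /bin/true instead,
# so the corresponding filesystem can no longer be mounted.
install cramfs /bin/true
install vfat   /bin/true
```

Before applying a role that writes rules like this, checking `findmnt -t vfat` shows whether any mounted filesystem (such as an EFI boot partition) would be affected.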

rndmh3ro added a commit to rndmh3ro/linux-baseline that referenced this issue Jul 1, 2018
On UEFI-systems the boot-partition is FAT by default (see [here](https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface/System_partition)).

If we disable vfat, these systems become unbootable. This has already bitten some users using ansible-os-hardening (dev-sec/ansible-collection-hardening#162, dev-sec/ansible-collection-hardening#145).

Therefore I propose we do not check for a disabled vfat filesystem as vfat is often used on newer systems.
rndmh3ro added a commit to rndmh3ro/linux-baseline that referenced this issue Jul 10, 2018
On UEFI-systems the boot-partition is FAT by default (see [here](https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface/System_partition)).

If we disable vfat, these systems become unbootable. This has already bitten some users using ansible-os-hardening (dev-sec/ansible-collection-hardening#162, dev-sec/ansible-collection-hardening#145).

Therefore I propose we do not check for a disabled vfat filesystem, if efi is used on these systems
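The UEFI condition these commits describe can be approximated in shell by checking for the `/sys/firmware/efi` directory, which the kernel exposes only when the system booted via UEFI. `boot_mode` is a hypothetical helper, and its optional path argument exists only so the logic can be exercised without real firmware:

```shell
# Hypothetical helper: report whether this host booted via UEFI.
# An optional argument overrides the probed path for testing.
boot_mode() {
  if [ -d "${1:-/sys/firmware/efi}" ]; then
    # Boot partition is almost certainly FAT; do not disable vfat here.
    echo "UEFI"
  else
    echo "BIOS"
  fi
}

boot_mode
```

A role could use a check like this to skip the vfat-disabling task on UEFI hosts instead of dropping the check entirely.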
@rndmh3ro
Member

rndmh3ro commented Sep 2, 2018

Fixed by #190

@rndmh3ro rndmh3ro closed this as completed Sep 2, 2018
rndmh3ro added a commit that referenced this issue Jul 24, 2020
Add support for Amazon Linux
divialth pushed a commit to divialth/ansible-collection-hardening that referenced this issue Aug 3, 2022