System completely unresponsive after role execution #145
Some additional information:
It looks like this role configures ufw which adds defaults to drop everything. Edit: I now experience the same issue as you, and can say it is not the firewall which is causing this. I will have a closer look and let you know if I can find anything causing this |
Addition: the system stops responding after about 10 minutes of uptime, and there are no log entries or processes generating heavy load. |
I was about to add that while the ufw defaults were set I changed them later on and even verified that ufw itself was inactive. |
If this issue is confirmed, then this should be highlighted on the README.md... Anyone who wants to try this out (like me, in a new job), would be severely burned. :( |
Agree. It was a rather painful experience for me. |
I'm still trying to reproduce this issue with vagrant and the default xenial box. @alexlawrence, @speedmann do you see this with vagrant or a vm for that matter? If yes, can you provide your version of the box? |
@rndmh3ro This happened to me on a dedicated machine. I have re-installed the OS on the machine already. Therefore I cannot provide any further details. |
@rndmh3ro I had this issue on real hardware too. I'll try to reproduce it with a virtual system and will let you know if I can reproduce it there. |
Update: I executed this role on a virtual system and it did not break the system. It is still available and accessible. |
Thanks to you both! EDIT: Thanks @speedmann, I did not see your comment in time. One more thing: It would be really cool if you could also test it with release 4.0.0? If that's not possible, it's totally fine, too! |
@rndmh3ro Sorry, I cannot do that currently. The only system available is used in production. |
Hi! First of all, thank you for this awesome project. I'm working with Debian Stretch. In my case the server does not freeze after a while; instead, after I reboot the server I can't log in again, so I'm not sure whether this is a different issue. Anyhow, version 4.0.0 works perfectly, but with 4.2.0 I can't log in again after rebooting the server. I will try 4.1.0 and get back with more info. |
OK, I'm back: version 4.1.0 also works well, so I guess the bug was introduced after that. It would be nice to know whether master also has the issue. I don't have more time to try this today, but maybe after I finish the job I'm doing I will have some time to look into it. |
I'm considering using this role but this issue got me worried. Looking at https://github.com/dev-sec/ansible-os-hardening/compare/4.1.0...4.2.0?w=1 one significant change is that some filesystems are now disabled by default. This might explain the problem if your server is using one of them? |
On UEFI-systems the boot-partition is FAT by default (see [here](https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface/System_partition)). If we disable vfat, these systems become unbootable. This has already bitten some users using ansible-os-hardening (dev-sec/ansible-collection-hardening#162, dev-sec/ansible-collection-hardening#145). Therefore I propose we do not check for a disabled vfat filesystem as vfat is often used on newer systems.
On UEFI-systems the boot-partition is FAT by default (see [here](https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface/System_partition)). If we disable vfat, these systems become unbootable. This has already bitten some users using ansible-os-hardening (dev-sec/ansible-collection-hardening#162, dev-sec/ansible-collection-hardening#145). Therefore I propose we do not check for a disabled vfat filesystem, if efi is used on these systems
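A quick pre-flight check along the lines discussed above could look like the following. This is only a minimal sketch, not part of the role: the function names and the `CANDIDATE_DISABLED` list are hypothetical and chosen for illustration (the role's actual defaults are not quoted in this thread). It assumes a Linux host, where `/proc/mounts` lists the filesystem type in the third column and `/sys/firmware/efi` exists when the system was booted via EFI.

```python
from pathlib import Path

# Hypothetical list for illustration only; check the role's own defaults
# for the filesystems it really disables.
CANDIDATE_DISABLED = {"cramfs", "freevxfs", "jffs2", "hfs", "hfsplus",
                      "squashfs", "udf", "vfat"}


def mounted_fs_types(mounts_text):
    """Return the set of filesystem types found in /proc/mounts-style text."""
    types = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 3:
            types.add(fields[2])
    return types


def conflicts(mounts_text, disabled=CANDIDATE_DISABLED):
    """Return, sorted, the mounted filesystem types that would be disabled."""
    return sorted(mounted_fs_types(mounts_text) & set(disabled))


def is_efi_booted(sys_root="/sys"):
    """Return True if the firmware/efi directory exists under sys_root."""
    return (Path(sys_root) / "firmware" / "efi").is_dir()
```

On a target host, one might call `conflicts(Path("/proc/mounts").read_text())` before applying the role. A non-empty result, or `is_efi_booted()` returning `True` while `vfat` is in the disabled list, would suggest reviewing the filesystem settings first.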
Fixed by #190 |
The role/playbook executes without any sign of error.
The system was even responsive for a few minutes after execution. Some minutes later the active SSH connection was interrupted (timeout). After that the system seemed unable to process any incoming requests. SSH connection attempts, ping requests and HTTP requests to Node.js apps behind an nginx all appear to start but then run into timeouts.
After restarting the server, the system may be responsive again, but only for a few minutes. It then quickly becomes unresponsive again.
Any ideas what is happening in this case?
Do you need any additional information?