Autopause functionality #531

Merged: 30 commits merged on May 20, 2020

Commits:
* `2e2c883` set enviroment and packages (Oekn5w, May 14, 2020)
* `e77fae6` add dependant files in docker (Oekn5w, May 14, 2020)
* `51d73a5` add files (Oekn5w, May 14, 2020)
* `e52ec31` set up autopause functionality (Oekn5w, May 14, 2020)
* `6ce10c3` env variables refinement (Oekn5w, May 14, 2020)
* `1606223` define functions and states scaffolding (Oekn5w, May 14, 2020)
* `424651c` rename resume script (Oekn5w, May 15, 2020)
* `fb74035` compress pause script (Oekn5w, May 15, 2020)
* `6a4807a` build state machine (Oekn5w, May 15, 2020)
* `b7f1a25` update environment variables (Oekn5w, May 15, 2020)
* `dcc6a71` move functions to separate file (Oekn5w, May 15, 2020)
* `bafd8e3` fix function call and output states to the terminal (Oekn5w, May 15, 2020)
* `1a1eaa3` check for knockd failure (Oekn5w, May 15, 2020)
* `6b0e663` add static max-tick check (Oekn5w, May 15, 2020)
* `6a3cdfc` update knockd server port dynamically (Oekn5w, May 15, 2020)
* `3193ffa` fix returns (Oekn5w, May 16, 2020)
* `8830498` fix bash syntax (Oekn5w, May 16, 2020)
* `6652add` add max-tick-time check (Oekn5w, May 16, 2020)
* `5e8b5f1` rename period variable (Oekn5w, May 16, 2020)
* `092258c` add readme entry (Oekn5w, May 16, 2020)
* `356c103` fixes: whitespace and return values (Oekn5w, May 16, 2020)
* `a550427` exclude local connections as valid clients (Oekn5w, May 16, 2020)
* `65c5fb2` also listen for rcon queries to resume server (Oekn5w, May 16, 2020)
* `9bf3d42` save via rcon before suspending process (Oekn5w, May 16, 2020)
* `801b53d` clarify network_mode restriction (Oekn5w, May 18, 2020)
* `9ca44d0` wait for mc-monitor to return before pausing (Oekn5w, May 18, 2020)
* `df6ff76` add healthcheck wrapper script (Oekn5w, May 18, 2020)
* `79d5e5b` typos and formatting (Oekn5w, May 19, 2020)
* `31fc35e` adaption acc to review (Oekn5w, May 19, 2020)
* `053b0eb` add new env var for startup timeout (Oekn5w, May 19, 2020)
21 changes: 15 additions & 6 deletions Dockerfile
@@ -15,15 +15,17 @@ RUN apk add --no-cache -U \
mysql-client \
tzdata \
rsync \
nano

HEALTHCHECK --start-period=1m CMD mc-monitor status --host localhost --port $SERVER_PORT
nano \
sudo \
knock

RUN addgroup -g 1000 minecraft \
&& adduser -Ss /bin/false -u 1000 -G minecraft -h /home/minecraft minecraft \
&& mkdir -m 777 /data \
&& chown minecraft:minecraft /data /home/minecraft

COPY files/sudoers* /etc/sudoers.d

EXPOSE 25565 25575

# hook into docker BuildKit --platform support
@@ -63,14 +65,21 @@ COPY server.properties /tmp/server.properties
COPY log4j2.xml /tmp/log4j2.xml
WORKDIR /data

ENTRYPOINT [ "/start" ]

ENV UID=1000 GID=1000 \
JVM_XX_OPTS="-XX:+UseG1GC" MEMORY="1G" \
TYPE=VANILLA VERSION=LATEST FORGEVERSION=RECOMMENDED SPONGEBRANCH=STABLE SPONGEVERSION= FABRICVERSION=LATEST LEVEL=world \
PVP=true DIFFICULTY=easy ENABLE_RCON=true RCON_PORT=25575 RCON_PASSWORD=minecraft \
LEVEL_TYPE=DEFAULT SERVER_PORT=25565 ONLINE_MODE=TRUE SERVER_NAME="Dedicated Server" \
REPLACE_ENV_VARIABLES="FALSE" ENV_VARIABLE_PREFIX="CFG_"
REPLACE_ENV_VARIABLES="FALSE" ENV_VARIABLE_PREFIX="CFG_" \
ENABLE_AUTOPAUSE=false AUTOPAUSE_TIMEOUT_EST=3600 AUTOPAUSE_TIMEOUT_KN=120 AUTOPAUSE_PERIOD=10

COPY start* /
COPY health.sh /
ADD files/autopause /autopause

RUN dos2unix /start* && chmod +x /start*
RUN dos2unix /health.sh && chmod +x /health.sh
RUN dos2unix /autopause/* && chmod +x /autopause/*.sh

ENTRYPOINT [ "/start" ]
HEALTHCHECK --start-period=1m CMD /health.sh
29 changes: 29 additions & 0 deletions README.md
@@ -179,6 +179,35 @@ You can also query the container's health in a script friendly way:
healthy
```

## Autopause

### Description

> There are various bug reports on [Mojang](https://bugs.mojang.com) about high CPU usage of servers running newer versions, even with few or no clients connected (e.g. [this one](https://bugs.mojang.com/browse/MC-149018)); in fact, this functionality is based on [this comment in the thread](https://bugs.mojang.com/browse/MC-149018?focusedCommentId=593606&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-593606).

An autopause feature has been added to this image. It monitors whether clients are connected to the server: if no client has been connected for a specified time, the Java process is suspended (`SIGSTOP`). When something knocks on the server port (e.g. the in-game Multiplayer server list pinging the server), the process is resumed (`SIGCONT`). From the client's point of view, nothing changes.

Of course, even loaded chunks are not ticked when the process is stopped.

From the server's point of view, a pause makes a single tick last as long as the process was stopped, so the server watchdog may intervene once the process continues, possibly forcing a container restart. To prevent this, set `max-tick-time` in `server.properties` to a very high value, or to `-1` to disable the watchdog (versions 1.8.1 and later).

On startup, the `server.properties` file is checked and, if applicable, a warning is printed to the terminal. When the server is created (no data in the persistent directory yet), the properties file is created with the watchdog disabled.

Autopause is not compatible with Docker's host `network_mode`, because the `knockd` utility cannot properly listen for connections in that mode.

### Enabling Autopause

Enable the Autopause functionality by setting:

```
-e ENABLE_AUTOPAUSE=TRUE
```

There are 3 more environment variables that define the behaviour:
* `AUTOPAUSE_TIMEOUT_EST`, default `3600` (seconds): the time between the last client disconnecting and the process being paused (read as "timeout established")
* `AUTOPAUSE_TIMEOUT_KN`, default `120` (seconds): the time between a knock on the port (e.g. by the main menu ping) and the process being paused, if no client connects in between (read as "timeout knocked"); this timeout also applies from container startup
* `AUTOPAUSE_PERIOD`, default `10` (seconds): the polling period of the daemonized state machine that handles pausing the process (resuming is handled independently)
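As a rough worked example of how `AUTOPAUSE_TIMEOUT_EST` and `AUTOPAUSE_PERIOD` combine, the following sketch simulates the daemon's Established -> Idle -> pause path on a discrete clock. It assumes every loop iteration takes exactly `AUTOPAUSE_PERIOD` seconds and ignores script runtime and the whole-second truncation of `/proc/uptime`; `simulate_pause_time` is a hypothetical name, not part of this PR.

```bash
#!/bin/bash
# Hypothetical simulation of the Established -> Idle -> pause path.
# Assumes each loop iteration lasts exactly $period seconds and that the
# last client disconnects at $disconnect_at (seconds, same clock).
# Prints the simulated time at which the pause would be triggered.
simulate_pause_time() {
  local est=$1 period=$2 disconnect_at=$3
  local t=0 state=E thresh=0
  (( period > 0 )) || return 1   # a zero period would never advance the clock
  while :; do
    case $state in
      E)
        # the first check after the disconnect arms the idle deadline
        if (( t > disconnect_at )); then
          thresh=$(( t + est ))
          state=I
        fi
        ;;
      I)
        # the deadline is only compared on later iterations
        if (( t >= thresh )); then
          echo "$t"
          return 0
        fi
        ;;
    esac
    t=$(( t + period ))
  done
}

# Defaults: a disconnect at t=5 s is noticed at t=10 s, the pause fires at t=3610 s.
simulate_pause_time 3600 10 5
```

Under these assumptions, with the defaults the process is paused roughly one hour after the last disconnect, plus at most about two polling periods (20 s) of detection and loop granularity.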

## Deployment Templates and Examples

### Helm Charts
90 changes: 90 additions & 0 deletions files/autopause/autopause-daemon.sh
@@ -0,0 +1,90 @@
#!/bin/bash

exec 1>/tmp/terminal-mc
Review comment (Owner):

I'm curious what this is doing?

Reply (Contributor Author):

It redirects stdout to the terminal that Docker logs, since this script is launched detached.


. /autopause/autopause-fcns.sh

sudo /usr/sbin/knockd -c /autopause/knockd-config.cfg -d
if [ $? -ne 0 ] ; then
while :
do
if [[ -n $(ps -o comm | grep java) ]] ; then
break
fi
sleep 0.1
done
echo "[Autopause loop] Failed to start knockd daemon."
echo "[Autopause loop] Possible cause: docker's host network mode."
echo "[Autopause loop] Recreate without host mode or disable autopause functionality."
echo "[Autopause loop] Stopping server."
killall -SIGTERM java
exit 1
fi

STATE=K
TIME_THRESH=$(($(current_uptime)+$AUTOPAUSE_TIMEOUT_KN))

while :
do
case X$STATE in
XK)
# Knocked
if java_clients_connected ; then
echo "[Autopause loop] Client connected - waiting for disconnect"
STATE=E
else
if [[ $(current_uptime) -ge $TIME_THRESH ]] ; then
echo "[Autopause loop] No client connected since startup - stopping"
/autopause/pause.sh
Review comment (Owner):

This scenario seemed a bit too "aggressive". I would start up a fresh server (which can take 60+ seconds) and almost immediately it would pause:

[01:25:14] [Server thread/INFO]: Time elapsed: 22734 ms
[01:25:14] [Server thread/INFO]: Done (65.333s)! For help, type "help"
[01:25:14] [Server thread/INFO]: Starting remote control listener
[01:25:14] [RCON Listener #1/INFO]: RCON running on 0.0.0.0:25575
[Autopause loop] No client connected since startup - stopping
[01:25:54] [RCON Listener #1/INFO]: Rcon connection from: /127.0.0.1
[01:25:55] [Server thread/INFO]: [Rcon: Saved the game]
[2020-05-19T01:25:55+0000] [Autopause] Pausing Java process

Would it make sense to have an extra allowance for this startup time threshold? Or would increasing the default for AUTOPAUSE_TIMEOUT_KN be sufficient?

Reply (@CurlyFlow, May 19, 2020):

What about checking for a specific file, like level.dat, before enabling autopause? It should only be there once the map has finished generating.

Or parse the log for [Server thread/INFO]: Done

Reply (Contributor Author):

Hmm, parsing the log in the data directory is possible, but we'd have to parse for something different in every version. level.dat always exists, except during world generation. The port is being listened on from that log entry onwards, so probably after mod initialization, but definitely before world generation/loading. We could use something like 10 min or 4 × the knocked timeout, whichever is higher, after the port starts listening. I'd rather avoid another environment variable and testing all the versions. The knocked timeout is mainly meant for the server being queried but not connected to.

Reply (@CurlyFlow, May 19, 2020):

I can only speak for Vanilla:
[Server thread/INFO]: Done
and Forge:
[Server thread/INFO]: Done

I don't like fixed timeouts, because they don't suit very slow or very fast hosts... (not that I think it's that big of a problem anyway)

Reply (Contributor Author):

@eihns I was thinking more about Minecraft versions, e.g. Vanilla 1.3 only prints:

        at net.minecraft.server.i.run(SourceFile:539)
2020-05-19 12:24:13 [INFO] Done (3.315s)! For help, type "help" or "?"
2020-05-19 12:24:13 [INFO] Starting remote control listener
2020-05-19 12:24:13 [INFO] RCON running on 0.0.0.0:25575
[Autopause loop] No client connected since startup - stopping

which would fail your match, whereas my Forge 1.7.10 server prints (when searching for Done):

[12:36:14] [Server thread/INFO]: Done.
[12:36:24][FINE][noppes.npcs.controllers.LinkedNpcController:63] Done loading Linked Npcs
[12:36:30] [Server thread/INFO]: Done (6.053s)! For help, type "help" or "?"
[12:36:31][FINE][noppes.npcs.controllers.DialogController:34] Done loading Dialogs

Reply (@CurlyFlow, May 19, 2020):

Don't know what you mean; both print some sort of "Done" when finished? Can't we just search for "Done"? Even if it matches a "Done" a few seconds too early, that's better than hard-coding, I guess?! Because whatever happens, the world will be generated at that point...

Reply (Contributor Author):

I opted for a new environment variable that specifies the timeout after the server starts listening.
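For illustration only (the PR went with an environment variable instead), the log lines quoted in this thread can be checked against the two candidate patterns discussed. `naive_done_line` and `is_done_line` are hypothetical helpers, and the stricter `Done (<seconds>s)!` pattern has only been checked against the sample lines shown in this thread, not against every server version.

```bash
#!/bin/bash
# Naive match suggested in the thread: a fixed "[Server thread/INFO]: Done" prefix.
naive_done_line() {
  printf '%s\n' "$1" | grep -Fq '[Server thread/INFO]: Done'
}

# Stricter (hypothetical) match: the "Done (<seconds>s)!" startup summary,
# ignoring the timestamp/thread prefix that differs between versions.
is_done_line() {
  printf '%s\n' "$1" | grep -Eq 'Done \([0-9.]+s\)!'
}

# Sample lines copied from the logs quoted above
vanilla_1_3='2020-05-19 12:24:13 [INFO] Done (3.315s)! For help, type "help" or "?"'
forge_early='[12:36:14] [Server thread/INFO]: Done.'
forge_mod='[12:36:24][FINE][noppes.npcs.controllers.LinkedNpcController:63] Done loading Linked Npcs'
forge_ready='[12:36:30] [Server thread/INFO]: Done (6.053s)! For help, type "help" or "?"'
recent='[01:25:14] [Server thread/INFO]: Done (65.333s)! For help, type "help"'

for line in "$vanilla_1_3" "$forge_early" "$forge_mod" "$forge_ready" "$recent"; do
  naive=no
  strict=no
  naive_done_line "$line" && naive=yes
  is_done_line "$line" && strict=yes
  echo "naive=$naive strict=$strict :: $line"
done
```

On these samples, the fixed prefix misses the Vanilla 1.3 line and fires on Forge's early `Done.`, while the timing-summary pattern matches only the actual startup lines.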

STATE=S
fi
fi
;;
XE)
# Established
if ! java_clients_connected ; then
TIME_THRESH=$(($(current_uptime)+$AUTOPAUSE_TIMEOUT_EST))
echo "[Autopause loop] All clients disconnected - stopping in $AUTOPAUSE_TIMEOUT_EST seconds"
STATE=I
fi
;;
XI)
# Idle
if java_clients_connected ; then
echo "[Autopause loop] Client reconnected - waiting for disconnect"
STATE=E
else
if [[ $(current_uptime) -ge $TIME_THRESH ]] ; then
echo "[Autopause loop] No client reconnected - stopping"
/autopause/pause.sh
STATE=S
fi
fi
;;
XS)
# Stopped
if rcon_client_exists ; then
/autopause/resume.sh
fi
if java_running ; then
if java_clients_connected ; then
echo "[Autopause loop] Client connected - waiting for disconnect"
STATE=E
else
TIME_THRESH=$(($(current_uptime)+$AUTOPAUSE_TIMEOUT_KN))
echo "[Autopause loop] Server was knocked - waiting for clients or timeout"
STATE=K
fi
fi
;;
*)
echo "[Autopause loop] Error: invalid state: $STATE"
;;
esac
if [[ "$STATE" == "S" ]] ; then
# before rcon times out
sleep 2
else
sleep $AUTOPAUSE_PERIOD
fi
done
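The loop above is easier to reason about as a pure transition function. The following sketch (hypothetical `next_state`, not part of the PR) reproduces the K/E/I/S transitions but leaves out the side effects: it does not call `pause.sh`, and it omits the `rcon_client_exists`/`resume.sh` branch of state S.

```bash
#!/bin/bash
# Hypothetical, side-effect-free version of one iteration of the loop above.
# Arguments: STATE (K|E|I|S), CLIENTS (1 if a non-local client is connected,
# else 0), RUNNING (1 if the Java process is not suspended, else 0),
# NOW (uptime in seconds), THRESH (current deadline in seconds).
# Prints: "<new state> <new deadline> <action>", action being none|pause.
next_state() {
  local state=$1 clients=$2 running=$3 now=$4 thresh=$5
  local est=${AUTOPAUSE_TIMEOUT_EST:-3600} kn=${AUTOPAUSE_TIMEOUT_KN:-120}
  case $state in
    K|I)
      # Knocked / Idle: wait for a client or for the deadline
      if (( clients )); then
        echo "E $thresh none"
      elif (( now >= thresh )); then
        echo "S $thresh pause"
      else
        echo "$state $thresh none"
      fi
      ;;
    E)
      # Established: arm the idle deadline once every client has left
      if (( clients )); then
        echo "E $thresh none"
      else
        echo "I $(( now + est )) none"
      fi
      ;;
    S)
      # Stopped: stay here until knockd (or rcon) has resumed the process
      if (( ! running )); then
        echo "S $thresh none"
      elif (( clients )); then
        echo "E $thresh none"
      else
        echo "K $(( now + kn )) none"
      fi
      ;;
    *)
      echo "$state $thresh invalid"
      return 1
      ;;
  esac
}
```

Keeping the transitions side-effect free like this would make them testable without a running server, while the real loop keeps doing the I/O (`netstat`, `ps`, `pause.sh`).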
34 changes: 34 additions & 0 deletions files/autopause/autopause-fcns.sh
@@ -0,0 +1,34 @@
#!/bin/bash

current_uptime() {
echo $(awk '{print $1}' /proc/uptime | cut -d . -f 1)
}

java_running() {
[[ $( ps -a -o stat,comm | grep 'java' | awk '{ print $1 }') =~ ^S.*$ ]]
}

rcon_client_exists() {
[[ -n "$(ps -a -o comm | grep 'rcon-cli')" ]]
}

java_clients_connected() {
local connections
connections=$(netstat -tn | grep ":$SERVER_PORT" | grep ESTABLISHED)
if [[ -z "$connections" ]] ; then
return 1
fi
IFS=$'\n'
connections=($connections)
unset IFS
# check that at least one external address is not localhost
# remember, that the host network mode does not work with autopause because of the knockd utility
for (( i=0; i<${#connections[@]}; i++ ))
do
if [[ ! $(echo "${connections[$i]}" | awk '{print $5}') =~ ^\s*127\.0\.0\.1:.*$ ]] ; then
# not localhost
return 0
fi
done
return 1
}
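`java_clients_connected` filters with `grep ":$SERVER_PORT"`, which matches the port anywhere on the line (for example inside a longer port number or in the foreign-address column). A sketch of a column-aware variant that reads `netstat -tn` output on stdin, so it can be tried without a running server; `external_client_connected` is a hypothetical name and, like the original, only IPv4 loopback is excluded.

```bash
#!/bin/bash
# Hypothetical variant of java_clients_connected reading `netstat -tn` on stdin.
# Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
# Succeeds if an ESTABLISHED connection to local port $1 comes from a peer
# outside 127.0.0.1 (IPv6 loopback is not special-cased, as in the original).
external_client_connected() {
  local port=$1
  awk -v port="$port" '
    $6 == "ESTABLISHED" && $4 ~ (":" port "$") && $5 !~ /^127\.0\.0\.1:/ { found = 1 }
    END { exit !found }
  '
}

# In the container this would be used as:
#   netstat -tn | external_client_connected "$SERVER_PORT"
```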
12 changes: 12 additions & 0 deletions files/autopause/knockd-config.cfg
@@ -0,0 +1,12 @@
[options]
logfile = /dev/null
[unpauseMCServer-server]
sequence = 25565
seq_timeout = 1
command = /sbin/su-exec minecraft:minecraft /autopause/resume.sh
tcpflags = syn
[unpauseMCServer-rcon]
sequence = 25575
seq_timeout = 1
command = /sbin/su-exec minecraft:minecraft /autopause/resume.sh
tcpflags = syn
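This config hard-codes ports 25565 and 25575; `start-configuration` rewrites the first and second `sequence` lines to `$SERVER_PORT` and `$RCON_PORT` using `grep -n`, `awk` and `sed -i`. A single-pass alternative might look like this; `set_nth_sequence` is a hypothetical helper, not part of the PR.

```bash
#!/bin/bash
# Hypothetical helper: set the value of the n-th "sequence = ..." line in a
# knockd config file to the given port, leaving all other lines untouched.
set_nth_sequence() {
  local file=$1 n=$2 port=$3 tmp status=0
  tmp=$(mktemp) || return 1
  awk -v n="$n" -v port="$port" '
    /^[[:space:]]*sequence[[:space:]]*=/ {
      count++
      if (count == n) { print "sequence = " port; next }
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || status=1
  # copying back (instead of mv) keeps the original file's owner and mode
  rm -f "$tmp"
  return "$status"
}

# Usage mirroring start-configuration:
#   set_nth_sequence /autopause/knockd-config.cfg 1 "$SERVER_PORT"
#   set_nth_sequence /autopause/knockd-config.cfg 2 "$RCON_PORT"
```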
19 changes: 19 additions & 0 deletions files/autopause/pause.sh
@@ -0,0 +1,19 @@
#!/bin/bash

if [[ $( ps -a -o stat,comm | grep 'java' | awk '{ print $1 }') =~ ^S.*$ ]] ; then
# save world
rcon-cli save-all >/dev/null

# wait until mc-monitor is no longer connected to the server
while :
do
if [[ -z "$(netstat -nt | grep "127.0.0.1:$SERVER_PORT" | grep 'ESTABLISHED')" ]]; then
break
fi
sleep 0.1
done

# finally pause the process
echo "[$(date -Iseconds)] [Autopause] Pausing Java process" >/tmp/terminal-mc
killall -q -STOP java
fi
6 changes: 6 additions & 0 deletions files/autopause/resume.sh
@@ -0,0 +1,6 @@
#!/bin/bash

if [[ $( ps -a -o stat,comm | grep 'java' | awk '{ print $1 }') =~ ^T.*$ ]] ; then
echo "[$(date -Iseconds)] [Autopause] Knocked, resuming Java process" >/tmp/terminal-mc
killall -q -CONT java
fi
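`pause.sh` and `resume.sh` rely on `SIGSTOP`/`SIGCONT` and on the process state letter reported by `ps` (`S` = sleeping, `T` = stopped). The same mechanics can be observed on a throwaway `sleep` process. This Linux-only sketch reads the state from field 3 of `/proc/<pid>/stat` instead of calling `ps`; `proc_state` and `demo_pause_resume` are hypothetical names.

```bash
#!/bin/bash
# Print the one-letter state of a PID (S = sleeping, T = stopped, ...),
# taken from field 3 of /proc/<pid>/stat (fine for "sleep"; a command
# name containing spaces would shift the fields).
proc_state() {
  awk '{ print $3 }' "/proc/$1/stat"
}

# Stop and continue a background sleep, printing "<paused> <resumed>" states.
demo_pause_resume() {
  sleep 30 >/dev/null 2>&1 &
  local pid=$!
  kill -STOP "$pid"
  sleep 0.2   # give the kernel a moment before sampling the state
  local paused
  paused=$(proc_state "$pid")
  kill -CONT "$pid"
  sleep 0.2
  local resumed
  resumed=$(proc_state "$pid")
  kill "$pid"
  wait "$pid" 2>/dev/null || true
  echo "$paused $resumed"
}

demo_pause_resume
```

While stopped, the process consumes no CPU at all, which is the whole point of autopause; it is also why the health check reports the container as healthy when the state is `T`.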
2 changes: 2 additions & 0 deletions files/sudoers-mc
@@ -0,0 +1,2 @@
%minecraft ALL=(ALL) NOPASSWD:/usr/bin/killall
%minecraft ALL=(ALL) NOPASSWD:/usr/sbin/knockd
11 changes: 11 additions & 0 deletions health.sh
@@ -0,0 +1,11 @@
#!/bin/bash

. /start-utils

if isTrue "${ENABLE_AUTOPAUSE}" && [[ "$( ps -a -o stat,comm | grep 'java' | awk '{ print $1 }')" =~ ^T.*$ ]]; then
echo "Java process suspended by Autopause function"
exit 0
else
mc-monitor status --host localhost --port $SERVER_PORT
exit $?
fi
8 changes: 8 additions & 0 deletions start
@@ -36,6 +36,14 @@ if [ $(id -u) = 0 ]; then
chown -R ${runAsUser}:${runAsGroup} /data
fi

if [[ $(stat -c "%u" /autopause) != $UID ]]; then
log "Changing ownership of /autopause to $UID ..."
chown -R ${runAsUser}:${runAsGroup} /autopause
fi

ln -fs $(tty) /tmp/terminal-mc
chmod 777 /tmp/terminal-mc

if [[ ${SKIP_NSSWITCH_CONF^^} != TRUE ]]; then
echo 'hosts: files dns' > /etc/nsswitch.conf
fi
54 changes: 54 additions & 0 deletions start-configuration
@@ -68,6 +68,60 @@ cd /data || exit 1

export ORIGINAL_TYPE=${TYPE^^}

if isTrue "${ENABLE_AUTOPAUSE}"; then
log "Autopause functionality enabled"

# update server port to listen to
regseq="^\s*sequence\s*=\s*$SERVER_PORT\s*$"
linenum=$(grep -nm1 sequence /autopause/knockd-config.cfg | cut -d : -f 1 | tail -n1)
if ! [[ $(awk "NR==$linenum" /autopause/knockd-config.cfg) =~ $regseq ]]; then
sed -i "${linenum}s/sequence.*/sequence = $SERVER_PORT/" /autopause/knockd-config.cfg
log "Updated server port in knockd config"
fi
# update rcon port to listen to
regseq="^\s*sequence\s*=\s*$RCON_PORT\s*$"
linenum=$(grep -nm2 sequence /autopause/knockd-config.cfg | cut -d : -f 1 | tail -n1)
if ! [[ $(awk "NR==$linenum" /autopause/knockd-config.cfg) =~ $regseq ]]; then
sed -i "${linenum}s/sequence.*/sequence = $RCON_PORT/" /autopause/knockd-config.cfg
log "Updated rcon port in knockd config"
fi

if ! [[ $AUTOPAUSE_PERIOD =~ ^[0-9]+$ ]]; then
AUTOPAUSE_PERIOD=10
export AUTOPAUSE_PERIOD
log "Warning: AUTOPAUSE_PERIOD is not numeric, set to 10 (seconds)"
fi
if [ "$AUTOPAUSE_PERIOD" -eq "0" ] ; then
AUTOPAUSE_PERIOD=10
export AUTOPAUSE_PERIOD
log "Warning: AUTOPAUSE_PERIOD must not be 0, set to 10 (seconds)"
fi
if ! [[ $AUTOPAUSE_TIMEOUT_KN =~ ^[0-9]+$ ]] ; then
AUTOPAUSE_TIMEOUT_KN=120
export AUTOPAUSE_TIMEOUT_KN
log "Warning: AUTOPAUSE_TIMEOUT_KN is not numeric, set to 120 (seconds)"
fi
if ! [[ $AUTOPAUSE_TIMEOUT_EST =~ ^[0-9]+$ ]] ; then
AUTOPAUSE_TIMEOUT_EST=3600
export AUTOPAUSE_TIMEOUT_EST
log "Warning: AUTOPAUSE_TIMEOUT_EST is not numeric, set to 3600 (seconds)"
fi

if [[ -n $MAX_TICK_TIME ]] ; then
log "Warning: MAX_TICK_TIME is non-default, for autopause to work properly, this check should be disabled (-1 for versions >= 1.8.1)"
else
if versionLessThan 1.8.1; then
# 10 years
MAX_TICK_TIME=315360000000
else
MAX_TICK_TIME=-1
fi
export MAX_TICK_TIME
fi

/autopause/autopause-daemon.sh &
fi

log "Resolving type given ${TYPE}"
case "${TYPE^^}" in
*BUKKIT|SPIGOT)
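The numeric checks in `start-configuration` repeat the same pattern for each variable. A compact helper covering both the "not numeric" and the "must not be 0" cases might look like this; `ensure_uint` is hypothetical, and the `10#` prefix is there so that values with leading zeros (e.g. `010`) are not parsed as octal inside `(( ))`.

```bash
#!/bin/bash
# Hypothetical helper, not part of the PR.
# Usage: ensure_uint NAME DEFAULT [nonzero]
# Keeps $NAME if it is a non-negative integer (and, with "nonzero", not 0);
# otherwise warns on stderr and resets it to DEFAULT. Always exports NAME.
ensure_uint() {
  local name=$1 default=$2 nonzero=${3:-}
  local value=${!name}
  if ! [[ $value =~ ^[0-9]+$ ]] || { [[ -n $nonzero ]] && (( 10#$value == 0 )); }; then
    echo "Warning: $name is invalid ('$value'), set to $default (seconds)" >&2
    printf -v "$name" '%s' "$default"
  fi
  export "$name"
}

# Equivalent of the checks in start-configuration:
#   ensure_uint AUTOPAUSE_PERIOD 10 nonzero
#   ensure_uint AUTOPAUSE_TIMEOUT_KN 120
#   ensure_uint AUTOPAUSE_TIMEOUT_EST 3600
```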
9 changes: 9 additions & 0 deletions start-finalSetup04ServerProperties
@@ -180,4 +180,13 @@ else
log "server.properties already created, skipping"
fi

if isTrue "${ENABLE_AUTOPAUSE}"; then
current_max_tick=$( grep 'max-tick-time' "$SERVER_PROPERTIES" | sed -r 's/( )+//g' | awk -F= '{print $2}' )
if (( $current_max_tick > 0 && $current_max_tick < 86400000 )); then
log "Warning: The server.properties for the server doesn't have the Server Watchdog (effectively) disabled."
log "Warning (cont): Autopause functionality resuming the process might trigger the Watchdog and restart the server completely."
log "Warning (cont): Set the max-tick-time property to a high value (or disable the Watchdog with value -1 for versions 1.8.1+)."
fi
fi

exec /start-finalSetup05EnvVariables $@
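One caveat with the check above: if `server.properties` has no `max-tick-time` line, `current_max_tick` is empty and the `(( ... ))` expression hits an arithmetic syntax error instead of cleanly skipping the warning. A slightly more defensive variant could look like this; `read_max_tick_time` and `watchdog_may_trigger` are hypothetical names, and treating a missing key like a disabled watchdog (no warning) is an assumption.

```bash
#!/bin/bash
# Print the last max-tick-time value in a server.properties file,
# with whitespace stripped (prints nothing if the key is absent).
read_max_tick_time() {
  sed -n 's/^[[:space:]]*max-tick-time[[:space:]]*=[[:space:]]*//p' "$1" | tail -n1 | tr -d '[:space:]'
}

# Succeeds when the watchdog may fire after a long pause, using the same
# range as the PR's check: positive and below one day (86400000 ms).
watchdog_may_trigger() {
  local value
  value=$(read_max_tick_time "$1")
  # Missing, negative (-1 disables the watchdog) or malformed: no warning.
  [[ $value =~ ^[0-9]+$ ]] || return 1
  (( 10#$value > 0 && 10#$value < 86400000 ))
}

# Usage in start-finalSetup04ServerProperties:
#   if watchdog_may_trigger "$SERVER_PROPERTIES"; then log "Warning: ..."; fi
```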
18 changes: 14 additions & 4 deletions start-utils
@@ -75,10 +75,20 @@ function versionLessThan {
return 1
fi

if (( activeParts[0] < givenParts[0] )) || \
(( activeParts[0] == givenParts[0] && activeParts[1] < givenParts[1] )); then
return 0
if (( ${#activeParts[@]} == 2 )); then
if (( activeParts[0] < givenParts[0] )) || \
(( activeParts[0] == givenParts[0] && activeParts[1] < givenParts[1] )); then
return 0
else
return 1
fi
else
return 1
if (( activeParts[0] < givenParts[0] )) || \
(( activeParts[0] == givenParts[0] && activeParts[1] < givenParts[1] )) || \
(( activeParts[0] == givenParts[0] && activeParts[1] == givenParts[1] && activeParts[2] < givenParts[2] )); then
return 0
else
return 1
fi
fi
}
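Judging from this hunk, the updated `versionLessThan` only looks at a third component when the active version itself has three parts, so an active version of `1.8` is not considered lower than `1.8.1`. For comparison, here is a generic sketch that pads missing components with 0; `version_less_than` is a hypothetical name and handles purely numeric components only (no `-pre1` style suffixes).

```bash
#!/bin/bash
# Hypothetical helper: succeeds if dotted version $1 is lower than $2.
# Missing components count as 0, so "1.8" < "1.8.1" and "1.8.1" == "1.8.1.0".
version_less_than() {
  local -a a b
  IFS=. read -ra a <<< "$1"
  IFS=. read -ra b <<< "$2"
  local n=${#a[@]} i x y
  if (( ${#b[@]} > n )); then n=${#b[@]}; fi
  for (( i = 0; i < n; i++ )); do
    # 10# avoids octal parsing of components such as "08"
    x=$(( 10#${a[i]:-0} ))
    y=$(( 10#${b[i]:-0} ))
    if (( x < y )); then return 0; fi
    if (( x > y )); then return 1; fi
  done
  return 1   # equal versions are not "less than"
}
```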