Feature/browser db #29
…rd_monitor into feature/browser-db
…s Browser behavior in tests
I noticed a regression when re-opening a browser after closing the GTK window. For some reason, the monitor doesn't start a new browser process. I'm not sure why the tests didn't catch it; I'll have to investigate that.
I have fixed the issue with restarting browsers. It turned out to be a missing |
Absolutely amazing!
Very elegant solutions wherever I look. So much to learn here.
We should reuse some of the native/Docker recipes for ocrd_network deployers. Good idea to simply try the next free port in range.
Also fascinating: the @asynccontextmanager and FastAPI lifespan.
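For readers unfamiliar with the pattern mentioned here: FastAPI accepts an async context manager as its `lifespan` argument, running the code before `yield` at startup and the code after it at shutdown. The sketch below reproduces only the context-manager half with the standard library. `FakeApp`, `lifespan`, `run` and the `events` list are hypothetical names for illustration, not code from this PR.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakeApp:
    """Hypothetical stand-in for the FastAPI application object."""

    def __init__(self) -> None:
        self.events: List[str] = []


@asynccontextmanager
async def lifespan(app: FakeApp) -> AsyncIterator[None]:
    # Startup: acquire resources (e.g. open a database connection).
    app.events.append("startup")
    try:
        yield
    finally:
        # Shutdown: release resources even if the application body raised.
        app.events.append("shutdown")


async def run(app: FakeApp) -> None:
    # A framework would enter the context manager once around its serving loop.
    async with lifespan(app):
        app.events.append("serving")


app = FakeApp()
asyncio.run(run(app))
print(app.events)
```

With FastAPI itself, a function of this shape would be passed as `FastAPI(lifespan=lifespan)`.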
Unfortunately, I met some problems when testing that I cannot immediately explain:
When doing make test, initially (in the first run), the native browser deployer had two failures:
FAILED tests/ocrdbrowser/test_browser_launch.py::test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory]
FAILED tests/ocrdbrowser/test_browser_launch.py::test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory]
When I reran there were no failures anymore.
Next, when running as part of ocrd_kitodo (with additional services ocrd-database and mongo-express, and new volume db-volume, and additional key DB_ROOT_USER in .env), spinning up a browser for a workspace led to the following failure:
INFO: 172.16.4.243:33918 - "GET /workspaces/open/testdata-presentation HTTP/1.1" 200 OK
INFO: 172.16.4.243:33918 - "GET /workspaces/browse/testdata-presentation HTTP/1.1" 200 OK
INFO: 172.16.4.243:33918 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 200 OK
INFO: 172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/ HTTP/1.1" 200 OK
INFO: 172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/broadway.js HTTP/1.1" 307 Temporary Redirect
INFO: 172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/broadway.js/ HTTP/1.1" 200 OK
INFO: ('172.16.4.243', 33932) - "WebSocket /workspaces/view/testdata-presentation/socket" [accepted]
INFO: connection open
WARNING:root:Could not find process with ID 23
INFO: connection closed
Then, clicking retry, I end up with the following infinite loop:
INFO: 172.16.4.243:45862 - "GET /workspaces/open/testdata-presentation HTTP/1.1" 200 OK
INFO: 172.16.4.243:45862 - "GET /workspaces/browse/testdata-presentation HTTP/1.1" 200 OK
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource
INFO: 172.16.4.243:45862 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 502 Bad Gateway
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource
INFO: 172.16.4.243:45862 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 502 Bad Gateway
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource
At this point, in the DB there is a suitable object to reconnect to:
address: 'http://localhost:9000',
owner: 'edf609de-c148-4078-9be5-d8074138c014',
process_id: '55',
workspace: '/data/testdata-presentation'
But when I log in to the Monitor and check which ports are actually bound, there is indeed nothing listening on TCP 9000:
tcp 0 0 127.0.0.11:38267 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN
It does work on other workspaces though.
Could it be that we should restrict the list of workspaces to those in the Manager's internal directory ./ocr-d? The testdata-presentation above was only present on the Kitodo outside (and had no images present locally, only ALTO).
FROM python:3.7

RUN apt-get update \
    && apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
-     && apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
+     && apt-get install -o Acquire::Retries=3 -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
ENV GDK_BACKEND broadway
ENV BROADWAY_DISPLAY :5

EXPOSE 8085
- EXPOSE 8085
+ EXPOSE 8085
+ VOLUME /data
#!/usr/bin/env bash

set -x
nohup broadwayd :5 &
Why nohup? This will (attempt to) create a nohup.out (which at runtime we might not even have write access to). Also, why should broadwayd be kept alive if the shell running init.sh (and browse-ocrd) is gone? (Sounds like an invitation for zombies to me...)
Probably not relevant though, because when browse-ocrd and thus init.sh exits, the whole container should stop as well.
(In contrast, in the subprocess mode, we explicitly kill $! the broadwayd after browse-ocrd exits.)
    depends_on:
      - ocrd-database
    ports:
      - 8081:8081
Shouldn't the external port be controlled by an environment variable (which can be set via .env, esp. in the superrepository)? Or is mongo-express only for testing/debugging, i.e. will never get enabled by ocrd_kitodo/docker-compose.yml anyway?
-       - 8081:8081
+       - ${MONITOR_PORT_DBE:-8081}:8081
see f5db85d
build-browse-ocrd-docker:
	docker build -t ocrd-browser:latest -f docker-browse-ocrd/Dockerfile docker-browse-ocrd
Perhaps we should advertise this in make help and readme?
Also, shouldn't we make this a dependency for make test straight away? (If the image already exists and nothing in the build context changed, then the build cache will skip rebuild.)
EDIT: does not make sense – pytest under docker run precludes docker mode. So conversely, we need this as a dependency when running tests without make test (i.e. via native nox or pytest directly)...
@@ -37,5 +38,29 @@ services:
      # DOZZLE_USERNAME=
      # DOZZLE_PASSWORD=

  ocrd-database:
    image: "mongo:latest"
    container_name: ocrd-database
Why the fixed container_name here? IIUC it should also work having Docker daemon assign a default name (based on the Docker network, as all other services), but still referencing MongoDB by the service name (i.e. Docker should map service to host names).
For me it does work without container_name (keeping the MongoDB URL as is).
Wait. I do get an occasional failure to start mongo-express now (Could not connect to database using connectionString: mongodb://root:1234@ocrd-database:27017/") – despite the condition: service_started...
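In Docker Compose, `condition: service_started` only waits until the container has been started, not until the service inside accepts connections, which would be consistent with the occasional failure described here. One generic way around such a race is to retry until the port is reachable. A minimal sketch, assuming a plain TCP probe is good enough; `wait_for_port` and its parameters are hypothetical names and not part of this PR:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after about timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Demonstration against a throwaway local listener instead of a real database.
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    demo_port = server.getsockname()[1]
    reachable = wait_for_port("127.0.0.1", demo_port, timeout=2.0)
    print(reachable)
```

Within Compose itself, the usual alternative is a `healthcheck` on the database service combined with `condition: service_healthy`.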
@@ -15,6 +15,7 @@ services:
    environment:
      MONITOR_PORT_LOG: ${MONITOR_PORT_LOG}
      CONTROLLER: "${CONTROLLER_HOST}:${CONTROLLER_PORT_SSH}"
      DB_CONNECTION: "mongodb://${DB_ROOT_USER:-root}:${DB_ROOT_PASSWORD:-root_password}@ocrd-database:27017"
I don't want to be picky, but we already use DB_* in top-level .env for the MySQL database in Kitodo, and it clashes on DB_ROOT_PASSWORD (but not DB_ROOT_USER). Do we simply tie these two databases to the same credentials here, or rather create a separate set of variables?
    depends_on:
      - ocrd-database
    ports:
      - 8081:8081
Native port 8081 is occupied on my server. Perhaps we should make this configurable as MONITOR_PORT_DBE?
see f5db85d
Could it be that the JS function setting the workspace interval … ocrd_monitor/ocrdmonitor/server/templates/workspace.html.j2 Lines 74 to 79 in b372d20
  mongo-express:
    image: mongo-express:latest
    depends_on:
      - ocrd-database
-       - ocrd-database
+       ocrd-database
+         condition: service_started
see afdc359
Path(space).relative_to(browser_settings.workspace_dir)
for space in workspace.list_all(browser_settings.workspace_dir)
It does work on other workspaces though.
Could it be that we should restrict the list of workspaces to those in the Manager's internal directory ./ocr-d? The testdata-presentation above was only present on the Kitodo outside (and had no images present locally, only ALTO).
Indeed, that was the problem!
But since we use OCRD_BROWSER__WORKSPACE_DIR=/data (i.e. OcrdBrowserSettings.workspace_dir) both as starting point for list_all here and as reference point for relative paths, we should be safe simply switching to /data/ocr-d (for both).
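The point about using one directory both as the search root and as the reference for relative paths can be illustrated as follows. This is a sketch only: `list_workspaces` is a hypothetical stand-in for `workspace.list_all` (whose implementation is not shown here), and it assumes that a workspace is any directory containing a `mets.xml`.

```python
import tempfile
from pathlib import Path
from typing import List


def list_workspaces(root: Path) -> List[Path]:
    """Hypothetical stand-in for workspace.list_all: directories below root containing mets.xml."""
    return sorted(mets.parent for mets in root.rglob("mets.xml"))


def relative_workspaces(root: Path) -> List[Path]:
    # The same root serves as search start and as reference point, so
    # relative_to() cannot raise ValueError for any listed workspace.
    return [space.relative_to(root) for space in list_workspaces(root)]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    # One workspace inside ocr-d and one outside of it.
    for name in ("ocr-d/inside", "outside"):
        (data / name).mkdir(parents=True)
        (data / name / "mets.xml").touch()

    all_spaces = relative_workspaces(data)
    internal_spaces = relative_workspaces(data / "ocr-d")
    print(all_spaces)
    print(internal_spaces)
```

Narrowing the root to the `ocr-d` subdirectory drops the outside workspace from the listing, and the remaining relative paths no longer carry the `ocr-d` prefix.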
see cb5e065
@@ -20,6 +20,7 @@ fi
export OCRD_BROWSER__MODE=native
export OCRD_BROWSER__WORKSPACE_DIR=/data
This does seem to work:
- export OCRD_BROWSER__WORKSPACE_DIR=/data
+ export OCRD_BROWSER__WORKSPACE_DIR=/data/ocr-d
see cb5e065
I still experience another problem: Browser connections apparently are not independent of each other, neither for different owners (i.e. browser clients) nor for different workspaces. Here's my db:
(wow, that worked via d&d out of the box!) So, whenever I switch to another tab/window, all existing tabs/windows say connection lost. In the logs, it always just says …
… each time.
Wow, that's quite a serious bug. I'll come up with a test for it so it doesn't happen again.
Just tried again (after a
They are all related to the subprocess mode. Those are the stack traces, respectively: test__launching_on_an_allocated_port__raises_unavailable_port_error[create_browser_factory0]
test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory]
test__one_port_allocated__launches_on_next_available[create_browser_factory0]
test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory]
warnings on log output
After pytest suite, I am seeing an orphan from the previous test run (which might explain all the failures above): a broadwayd on :9000 with the parent Now, when I manually kill that process and start test__factory__launches_new_browser_instance[create_browser_factory0]
Thus it seems the browser is not reachable – could that be a timing issue? Also, why is there no |
I noticed the same thing, but stopping the native browsers did not fix the issue.
Do you know what could cause that, @bertsky?
This is from here; it just means that no configuration file could be found (not a problem, this is optional and can be used to set default fileGrps etc).
I guess this must be a problem with DBus communication.
Could you please post or push these?
I changed some stuff in the |
Unfortunately, I wasn't able to fix the issues with launching browsers so far. Since I'll be gone for the next 2+ weeks here is everything I know and have tried to fix the issues, in case anybody would like to try their hand.

Initial work

First off, since both docker and native browsers used a similar for loop to iterate over possible ports, I have introduced the …

class PortBindingError(RuntimeError):
    pass

PortBindingResult = Union[T, PortBindingError]
PortBinding = Callable[[str, int], Awaitable[PortBindingResult[T]]]

class BoundPort(NamedTuple, Generic[T]):
    bound_app: T
    port: int

async def try_bind(
    binding: PortBinding[T], host: str, ports: Iterable[int]
) -> BoundPort[T]:
    for port in ports:
        result = await binding(host, port)
        if isinstance(result, PortBindingError):
            logging.info(f"Port {port} already in use, continuing to next port")
            continue
        return BoundPort(result, port)
    raise NoPortsAvailableError()

Issues with the native browser

The broadway server will launch a process that runs continuously and will only print something to stderr if the port is already in use.

process = await asyncio.create_subprocess_shell(
    # broadway cmd here, shortened for readability,
    "broadway ...",
    stderr=asyncio.subprocess.PIPE,
)
try:
    stderr = cast(asyncio.StreamReader, process.stderr)
    err_output = await asyncio.wait_for(stderr.readline(), 5)
    if b"Address already in use" in err_output:
        return PortBindingError()
except asyncio.TimeoutError:
    logging.info(
        f"The process didn't exit within the given timeout. Assuming browser on port {port} launched successfully"
    )
return process

Note: I also tried to wait for the entire browser process to exit using:

try:
    await asyncio.wait_for(process.wait(), 5)
except asyncio.TimeoutError:
    pass

This unfortunately did not work at all and caused all launch tests to fail and resulted in browsers being launched on the same ports again.

Duplicated windows

On the left we have the browser on port 9000, the right one runs on 9001. The second window on the left appeared when the browser on the right was launched. I could not replicate that behavior when I tried to launch broadway from the command line by hand, so I'm even more confused about why that happens.

Possible workaround

One thing that just came to my mind is that we could try to bind the port in the …

import socket
# ...
async def try_bind(
    binding: PortBinding[T], host: str, ports: Iterable[int]
) -> BoundPort[T]:
    for port in ports:
        try:
            s = socket.socket()
            s.bind((host, port))
            s.close()
        except OSError:
            continue
        result = await binding(host, port)
        if isinstance(result, PortBindingError):
            logging.info(f"Port {port} already in use, continuing to next port")
            continue
        return BoundPort(result, port)
    raise NoPortsAvailableError()

Failing tests in GitHub Actions

I am not sure why the tests are failing in GitHub Actions. They have been consistently working from inside my development Docker container. Maybe this is the failure @bertsky experienced that couldn't be reproduced?
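The socket-based probing idea can be tried out in isolation. The sketch below is not the PR's code: `is_port_free` and `first_free_port` are hypothetical helpers that only probe ports without launching anything. Such a probe is inherently racy, because another process may grab the port between the probe and the actual launch, so the launch itself would still need to handle a failed bind.

```python
import socket
from typing import Iterable, Optional


def is_port_free(host: str, port: int) -> bool:
    """Probe a port by binding to it briefly; the socket is always closed again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except (OSError, OverflowError):
            # OSError: port in use or not permitted; OverflowError: port number out of range.
            return False
        return True


def first_free_port(host: str, ports: Iterable[int]) -> Optional[int]:
    """Return the first port in ports that can currently be bound, or None."""
    for port in ports:
        if is_port_free(host, port):
            return port
    return None


# Occupy one port, then check that the probe skips it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as occupied:
    occupied.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    occupied.listen()
    taken = occupied.getsockname()[1]
    chosen = first_free_port("127.0.0.1", [taken, taken + 1, taken + 2])
    print(taken, chosen)
```

Using the socket as a context manager closes it on every path, including a failed bind.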
This PR introduces MongoDB to store information about running browser processes.
Adding the DB instead of keeping browsers in memory triggered a few design changes in order to keep the application testable:
Previous (simplified) Design:
New Design:
There are a few things that still need improvement:
try... except that tries port numbers until we find one that is available