
Feature/browser db #29

Closed · wants to merge 33 commits

Conversation

@SvenMarcus (Collaborator) commented Jun 7, 2023

This PR introduces MongoDB to store information about running browser processes.
Adding the DB instead of keeping browsers in memory triggered a few design changes in order to keep the application testable:

Previous (simplified) Design:

classDiagram
WorkspaceEndpoint "1" o-- "0..*" OcrdBrowser
OcrdBrowserFactory --> OcrdBrowser: creates
WorkspaceEndpoint --> OcrdBrowserFactory

class OcrdBrowser {
  start()
  stop()
}

New Design:

classDiagram
WorkspaceEndpoint "1" o-- "0..*" OcrdBrowser
WorkspaceEndpoint -- BrowserProcessRepository
OcrdBrowserFactory --> OcrdBrowser: creates and *starts*
BrowserProcessRepository --> RestoringOcrdBrowserFactory
RestoringOcrdBrowserFactory --> OcrdBrowser: restores
WorkspaceEndpoint --> OcrdBrowserFactory
 
class OcrdBrowser {
  void stop()
}

class BrowserProcessRepository {
<<protocol>>
  insert(OcrdBrowser)
  delete(OcrdBrowser)
  find(owner: str, workspace: str): list[OcrdBrowser]
}
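Expressed in Python, the new repository boundary might look roughly like this (a sketch derived from the diagram above; whether the methods are async, and their exact signatures, are assumptions):

from typing import Protocol


class OcrdBrowser(Protocol):
    def stop(self) -> None: ...


class BrowserProcessRepository(Protocol):
    """Persistence boundary for running browser processes (sketch)."""

    async def insert(self, browser: OcrdBrowser) -> None: ...

    async def delete(self, browser: OcrdBrowser) -> None: ...

    async def find(self, owner: str, workspace: str) -> list[OcrdBrowser]: ...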

There are a few things that still need improvement:

  • Replace the current solution for assigning ports with a simple try... except that tries port numbers until it finds one that is available (see the sketch after this list)
  • Persisting browser processes in a DB means we need to verify that all stored browsers are actually still available. This can happen either by running a clean-up action when the monitor starts, or by checking whether a stored browser is still alive when deciding between reusing an existing browser and starting a new one.
  • The additional design work that was necessary to keep things testable made the test setup more complicated. I think there is potential to find a simpler solution again.
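A minimal sketch of that port-probing idea (the function name and error handling here are made up, and the probe is inherently racy: a port can be taken again between the check and the actual launch):

import socket
from contextlib import closing


def find_available_port(ports: range) -> int:
    """Try ports in order until one can be bound (sketch of the idea above)."""
    for port in ports:
        try:
            with closing(socket.socket()) as sock:
                sock.bind(("localhost", port))
            return port  # bind succeeded, so the port was free just now
        except OSError:
            continue  # already in use, try the next one
    raise RuntimeError("no ports available")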

@SvenMarcus (Collaborator, Author) commented Jun 14, 2023

I noticed a regression when re-opening a browser after closing the GTK window: for some reason, the monitor doesn't start a new browser process. I'm not sure why the tests didn't catch it; I'll have to investigate.

@SvenMarcus marked this pull request as ready for review June 19, 2023 14:02
@SvenMarcus (Collaborator, Author)

I have fixed the issue with restarting browsers. It turned out to be a missing try:...except: in the stop() method.
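In sketch form, the fix amounts to tolerating a process that is already gone (the attribute name below is hypothetical; this is not the actual diff):

import os
import signal


class SubProcessOcrdBrowser:
    # ... launch code elided ...

    def stop(self) -> None:
        try:
            os.kill(self._process_id, signal.SIGTERM)
        except ProcessLookupError:
            # the process already exited (e.g. the GTK window was closed),
            # so there is nothing left to stop
            pass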
There is additional work that could be done to simplify the test setup; however, I can address that in a separate PR or just merge it myself later on.
In my opinion, this PR is ready to be reviewed and merged now. @markusweigelt @bertsky

@bertsky (Member) left a comment

Absolutely amazing!

Very elegant solutions wherever I look. So much to learn here.

We should reuse some of the native/Docker recipes for ocrd_network deployers. Good idea to simply try the next free port in range.

Also fascinating: the @asynccontextmanager and FastAPI lifespan.
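(For reference, that pattern looks roughly like this; the startup/shutdown bodies below are stand-ins, not the actual monitor code:)

from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # runs once before the app starts serving, e.g. connecting to MongoDB
    print("connect to the browser-process database")
    yield
    # runs once after the app stops serving
    print("close the database connection")


app = FastAPI(lifespan=lifespan)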

Unfortunately, I ran into some problems while testing that I cannot immediately explain:

When doing make test, initially (in the first run), the native browser deployer had two failures:

FAILED tests/ocrdbrowser/test_browser_launch.py::test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory]
FAILED tests/ocrdbrowser/test_browser_launch.py::test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory]

When I reran, there were no failures anymore.

Next, when running as part of ocrd_kitodo (with additional services ocrd-database and mongo-express, and new volume db-volume, and additional key DB_ROOT_USER in .env), spinning up a browser for a workspace led to the following failure:

INFO:     172.16.4.243:33918 - "GET /workspaces/open/testdata-presentation HTTP/1.1" 200 OK
INFO:     172.16.4.243:33918 - "GET /workspaces/browse/testdata-presentation HTTP/1.1" 200 OK
INFO:     172.16.4.243:33918 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 200 OK
INFO:     172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/ HTTP/1.1" 200 OK
INFO:     172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/broadway.js HTTP/1.1" 307 Temporary Redirect
INFO:     172.16.4.243:33918 - "GET /workspaces/view/testdata-presentation/broadway.js/ HTTP/1.1" 200 OK
INFO:     ('172.16.4.243', 33932) - "WebSocket /workspaces/view/testdata-presentation/socket" [accepted]
INFO:     connection open
WARNING:root:Could not find process with ID 23
INFO:     connection closed

Then, clicking retry I end up with the following infinite loop:

INFO:     172.16.4.243:45862 - "GET /workspaces/open/testdata-presentation HTTP/1.1" 200 OK
INFO:     172.16.4.243:45862 - "GET /workspaces/browse/testdata-presentation HTTP/1.1" 200 OK
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource 
INFO:     172.16.4.243:45862 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 502 Bad Gateway
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource 
INFO:     172.16.4.243:45862 - "GET /workspaces/ping/testdata-presentation HTTP/1.1" 502 Bad Gateway
ERROR:root:Tried to connect to http://localhost:9000
ERROR:root:Requested resource 

At this point, in the DB there is a suitable object to reconnect to:

    address: 'http://localhost:9000',
    owner: 'edf609de-c148-4078-9be5-d8074138c014',
    process_id: '55',
    workspace: '/data/testdata-presentation'

But when I log in to the Monitor and check which ports are actually bound, there is indeed no TCP 9000:

tcp        0      0 127.0.0.11:38267        0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN 

It does work on other workspaces though.

Could it be that we should restrict the list of workspaces to those in the Manager's internal
directory ./ocr-d? The testdata-presentation above was only present on the Kitodo outside (and had no images present locally, only ALTO).

FROM python:3.7

RUN apt-get update \
&& apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \

Suggested change:
- && apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
+ && apt-get install -o Acquire::Retries=3 -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \

ENV GDK_BACKEND broadway
ENV BROADWAY_DISPLAY :5

EXPOSE 8085

Suggested change:
  EXPOSE 8085
+ VOLUME /data

#!/usr/bin/env bash

set -x
nohup broadwayd :5 &

Why nohup? This will (attempt to) create a nohup.out (which at runtime we might not even have write access to). Also, why should broadwayd be kept alive if the shell running init.sh (and browse-ocrd) is gone? (Sounds like an invitation for zombies to me...)

Probably not relevant though, because when browse-ocrd and thus init.sh exits, the whole container should stop as well.

(In contrast, in the subprocess mode, we explicitly kill $! the broadwayd after browse-ocrd exits.)

docker-browse-ocrd/init.sh: resolved comment (collapsed)
depends_on:
- ocrd-database
ports:
- 8081:8081

Shouldn't the external port be controlled by an environment variable (which can be set via .env, esp. in the superrepository)? Or is mongo-express only for testing/debugging, i.e. will never get enabled by ocrd_kitodo/docker-compose.yml anyway?


Suggested change:
- - 8081:8081
+ - ${MONITOR_PORT_DBE:-8081}:8081


see f5db85d

Comment on lines +14 to +15
build-browse-ocrd-docker:
docker build -t ocrd-browser:latest -f docker-browse-ocrd/Dockerfile docker-browse-ocrd
@bertsky (Member) commented Jun 19, 2023

Perhaps we should advertise this in make help and the README?

Also, shouldn't we make this a dependency for make test straight away? (If the image already exists and nothing in the build context changed, then the build cache will skip rebuild.)

EDIT: does not make sense – pytest under docker run precludes docker mode. So conversely, we need this as a dependency when running tests without make test (i.e. via native nox or pytest directly)...

@@ -37,5 +38,29 @@ services:
# DOZZLE_USERNAME=
# DOZZLE_PASSWORD=

ocrd-database:
image: "mongo:latest"
container_name: ocrd-database

Why the fixed container_name here? IIUC it should also work to let the Docker daemon assign a default name (based on the Docker network, as for all other services) while still referencing MongoDB by the service name (i.e. Docker should map service names to host names).


For me it does work without container_name (keeping the MongoDB URL as is).


Wait. I do get an occasional failure to start mongo-express now (Could not connect to database using connectionString: mongodb://root:1234@ocrd-database:27017/") – despite the condition: service_started...

@@ -15,6 +15,7 @@ services:
environment:
MONITOR_PORT_LOG: ${MONITOR_PORT_LOG}
CONTROLLER: "${CONTROLLER_HOST}:${CONTROLLER_PORT_SSH}"
DB_CONNECTION: "mongodb://${DB_ROOT_USER:-root}:${DB_ROOT_PASSWORD:-root_password}@ocrd-database:27017"

I don't want to be picky, but we already use DB_* in top-level .env for the MySQL database in Kitodo, and it clashes on DB_ROOT_PASSWORD (but not DB_ROOT_USER). Do we simply tie these two databases to the same credentials here, or rather create a separate set of variables?

depends_on:
- ocrd-database
ports:
- 8081:8081

Native port 8081 is occupied on my server. Perhaps we should make this configurable as MONITOR_PORT_DBE?


see f5db85d

@bertsky linked an issue Jun 20, 2023 that may be closed by this pull request
@bertsky (Member) commented Jun 20, 2023

Then, clicking retry I end up with the following infinite loop:

Could it be that the JS function setting the workspace interval …

const interval = setInterval(async () => {
  const result = await fetch(pingUrl);
  if (!result.ok) return;
  replaceProgressWithIframe();
  clearInterval(interval);
}, 500);
… does not correctly react to 502 responses?

mongo-express:
image: mongo-express:latest
depends_on:
- ocrd-database

Suggested change:
-   - ocrd-database
+   ocrd-database:
+     condition: service_started


see afdc359

Comment on lines +16 to +17
Path(space).relative_to(browser_settings.workspace_dir)
for space in workspace.list_all(browser_settings.workspace_dir)

It does work on other workspaces though.

Could it be that we should restrict the list of workspaces to those in the Manager's internal directory ./ocr-d? The testdata-presentation above was only present on the Kitodo outside (and had no images present locally, only ALTO).

Indeed, that was the problem!

But since we use OCRD_BROWSER__WORKSPACE_DIR=/data (i.e. OcrdBrowserSettings.workspace_dir) both as starting point for list_all here and as reference point for relative paths, we should be safe simply switching to /data/ocr-d (for both).
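(A quick illustration of why both must be switched together: Path.relative_to raises for anything outside the base directory:)

from pathlib import Path

Path("/data/ocr-d/a_workspace").relative_to("/data/ocr-d")
# -> PosixPath('a_workspace')

Path("/data/testdata-presentation").relative_to("/data/ocr-d")
# raises ValueError: the path is not below the base directory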


see cb5e065

@@ -20,6 +20,7 @@ fi
export OCRD_BROWSER__MODE=native
export OCRD_BROWSER__WORKSPACE_DIR=/data

This does seem to work:

Suggested change:
- export OCRD_BROWSER__WORKSPACE_DIR=/data
+ export OCRD_BROWSER__WORKSPACE_DIR=/data/ocr-d


see cb5e065

@bertsky (Member) commented Jun 20, 2023

I still experience another problem: Browser connections apparently are not independent of each other, neither for different owners (i.e. browser clients) nor for different workspaces. Here's my db:

_id                       address                owner                                 process_id  workspace
64918e50956bd16e73cbc834  http://localhost:9000  edf609de-c148-4078-9be5-d8074138c014  23          /data/ocr-d/3
64918e5f956bd16e73cbc835  http://localhost:9000  edf609de-c148-4078-9be5-d8074138c014  68          /data/ocr-d/data/3
64918e82956bd16e73cbc836  http://localhost:9000  e455a65e-d4f4-4086-8d3e-442c99198984  111         /data/ocr-d/data/3

(wow, that worked via drag & drop out of the box!)

So, whenever I switch to another tab/window, all existing tabs/windows say connection lost.

In the logs, it always just says …

INFO:     ('172.16.4.243', 46748) - "WebSocket /workspaces/view/3/socket" [accepted]
INFO:     connection open
INFO:     connection closed
ERROR:root:
                An exception occurred during communication with the browser <ocrdbrowser._subprocess.SubProcessOcrdBrowser object at 0x7f4100d3ffd0>. 
                The exception was WebSocketDisconnect(1001)

… each time.

@SvenMarcus (Collaborator, Author)

Wow, that's quite a serious bug. I'll come up with a test for it so it doesn't happen again.

@bertsky (Member) commented Jun 20, 2023

When doing make test, initially (in the first run), the native browser deployer had two failures:

FAILED tests/ocrdbrowser/test_browser_launch.py::test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory]
FAILED tests/ocrdbrowser/test_browser_launch.py::test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory]

Just tried again (after a docker compose build) as pytest -vv tests in a native Py3.11 venv, with both a fresh ocrd-browser:latest Docker image and native browse-ocrd in the active venv. I now see 4 failures:

FAILED tests/ocrdbrowser/test_browser_launch.py::test__launching_on_an_allocated_port__raises_unavailable_port_error[create_browser_factory0] - ocrdbrowser._port.NoPortsAvailableError
FAILED tests/ocrdbrowser/test_browser_launch.py::test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory] - Failed: DID NOT RAISE <class 'ocrdbrowser._port.NoPortsAvailableError'>
FAILED tests/ocrdbrowser/test_browser_launch.py::test__one_port_allocated__launches_on_next_available[create_browser_factory0] - ocrdbrowser._port.NoPortsAvailableError
FAILED tests/ocrdbrowser/test_browser_launch.py::test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory] - AssertionError: assert 'http://localhost:9000' == 'http://localhost:9001'

They are all related to the subprocess mode. These are the stack traces, respectively:

test__launching_on_an_allocated_port__raises_unavailable_port_error[create_browser_factory0]

create_browser_factory = functools.partial(<class 'ocrdbrowser._docker.DockerOcrdBrowserFactory'>, 'http://localhost')

    @browser_factory_test
    async def test__launching_on_an_allocated_port__raises_unavailable_port_error(
        create_browser_factory: CreateBrowserFactory,
    ) -> None:
        _factory = create_browser_factory({9000})
>       await _factory("first-owner", "tests/workspaces/a_workspace")

tests/ocrdbrowser/test_browser_launch.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <ocrdbrowser._docker.DockerOcrdBrowserFactory object at 0x7f21bdbf53d0>, owner = 'first-owner'
workspace_path = 'tests/workspaces/a_workspace'

    async def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
        abs_workspace = path.abspath(workspace_path)
    
        container: DockerOcrdBrowser | None = None
        for port in self._ports:
            cmd = await _run_command(
                _docker_run,
                _container_name(owner, abs_workspace),
                abs_workspace,
                port,
            )
    
            return_code = await self.wait_for(cmd)
            if return_code != 0:
                continue
    
            container = DockerOcrdBrowser(
                owner,
                abs_workspace,
                f"{self._host}:{port}",
                await self._read_container_id(cmd),
            )
            self._containers.append(container)
            return container
    
>       raise NoPortsAvailableError()
E       ocrdbrowser._port.NoPortsAvailableError

ocrdbrowser/_docker.py:102: NoPortsAvailableError

test__launching_on_an_allocated_port__raises_unavailable_port_error[SubProcessOcrdBrowserFactory]

create_browser_factory = <class 'ocrdbrowser._subprocess.SubProcessOcrdBrowserFactory'>

    @browser_factory_test
    async def test__launching_on_an_allocated_port__raises_unavailable_port_error(
        create_browser_factory: CreateBrowserFactory,
    ) -> None:
        _factory = create_browser_factory({9000})
        await _factory("first-owner", "tests/workspaces/a_workspace")
    
        sut = create_browser_factory({9000})
>       with pytest.raises(NoPortsAvailableError):
E       Failed: DID NOT RAISE <class 'ocrdbrowser._port.NoPortsAvailableError'>

tests/ocrdbrowser/test_browser_launch.py:95: Failed

test__one_port_allocated__launches_on_next_available[create_browser_factory0]

create_browser_factory = functools.partial(<class 'ocrdbrowser._docker.DockerOcrdBrowserFactory'>, 'http://localhost')

    @browser_factory_test
    async def test__one_port_allocated__launches_on_next_available(
        create_browser_factory: CreateBrowserFactory,
    ) -> None:
        _factory = create_browser_factory({9000})
>       await _factory("other-owner", "tests/workspaces/a_workspace")

tests/ocrdbrowser/test_browser_launch.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <ocrdbrowser._docker.DockerOcrdBrowserFactory object at 0x7f21bdc4fd50>, owner = 'other-owner'
workspace_path = 'tests/workspaces/a_workspace'

    async def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
        abs_workspace = path.abspath(workspace_path)
    
        container: DockerOcrdBrowser | None = None
        for port in self._ports:
            cmd = await _run_command(
                _docker_run,
                _container_name(owner, abs_workspace),
                abs_workspace,
                port,
            )
    
            return_code = await self.wait_for(cmd)
            if return_code != 0:
                continue
    
            container = DockerOcrdBrowser(
                owner,
                abs_workspace,
                f"{self._host}:{port}",
                await self._read_container_id(cmd),
            )
            self._containers.append(container)
            return container
    
>       raise NoPortsAvailableError()
E       ocrdbrowser._port.NoPortsAvailableError

ocrdbrowser/_docker.py:102: NoPortsAvailableError

test__one_port_allocated__launches_on_next_available[SubProcessOcrdBrowserFactory]

create_browser_factory = <class 'ocrdbrowser._subprocess.SubProcessOcrdBrowserFactory'>

    @browser_factory_test
    async def test__one_port_allocated__launches_on_next_available(
        create_browser_factory: CreateBrowserFactory,
    ) -> None:
        _factory = create_browser_factory({9000})
        await _factory("other-owner", "tests/workspaces/a_workspace")
    
        sut = create_browser_factory({9000, 9001})
        browser = await sut("second-other-owner", "tests/workspaces/a_workspace")
    
>       assert browser.address() == "http://localhost:9001"
E       AssertionError: assert 'http://localhost:9000' == 'http://localhost:9001'
E         - http://localhost:9001
E         ?                     ^
E         + http://localhost:9000
E         ?                     ^

tests/ocrdbrowser/test_browser_launch.py:109: AssertionError

warnings on log output

WARNING  asyncio:unix_events.py:1427 Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 112317 is closed
tests/ocrdmonitor/test_sshremote.py::test_ps_over_ssh__returns_list_of_process_status
  /data/ocr-d/ocrd_kitodo/_modules/ocrd_monitor/env311/lib/python3.11/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7f21c0ba74c0>
  
  Traceback (most recent call last):
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/base_subprocess.py", line 126, in __del__
      self.close()
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/base_subprocess.py", line 104, in close
      proto.pipe.close()
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/unix_events.py", line 566, in close
      self._close(None)
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/unix_events.py", line 590, in _close
      self._loop.call_soon(self._call_connection_lost, exc)
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/base_events.py", line 761, in call_soon
      self._check_closed()
    File "/data/ocr-d/Python-3.11.4/Lib/asyncio/base_events.py", line 519, in _check_closed
      raise RuntimeError('Event loop is closed')
  RuntimeError: Event loop is closed
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

@bertsky (Member) commented Jun 20, 2023

After pytest suite, I am seeing an orphan from the previous test run (which might explain all the failures above): a broadwayd on :9000 with the parent /bin/sh -c 'broadwayd :920 & /data/ocr-d/ocrd_kitodo/_modules/ocrd_monitor/env311/bin/browse-ocrd tests/workspaces/a_workspace/mets.xml ; kill $!' – I guess somewhere we forgot to close/kill the browser as part of the teardown.
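(One way to guard against such orphans would be a fixture that stops whatever it launched, even on failure; a sketch assuming pytest-asyncio-style async fixtures and a synchronous stop() as in the class diagram, with the fixture name made up:)

import pytest


@pytest.fixture
async def launched_browser(create_browser_factory):
    factory = create_browser_factory({9000})
    browser = await factory("the-owner", "tests/workspaces/a_workspace")
    try:
        yield browser
    finally:
        # stop the browser even if the test failed, so no broadwayd or
        # browse-ocrd process can outlive the test run
        browser.stop()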

Now, when I manually kill that process and start pytest tests again, I get the exact same failures, but this time also an extra failure: the very first test in the suite:

test__factory__launches_new_browser_instance[create_browser_factory0]

    @contextlib.contextmanager
    def map_httpcore_exceptions() -> typing.Iterator[None]:
        try:
>           yield

env311/lib/python3.11/site-packages/httpx/_transports/default.py:60: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <httpx.AsyncHTTPTransport object at 0x7f95c9bae1d0>, request = <Request('GET', 'http://localhost:9000/')>

    async def handle_async_request(
        self,
        request: Request,
    ) -> Response:
        assert isinstance(request.stream, AsyncByteStream)
    
        req = httpcore.Request(
            method=request.method,
            url=httpcore.URL(
                scheme=request.url.raw_scheme,
                host=request.url.raw_host,
                port=request.url.port,
                target=request.url.raw_path,
            ),
            headers=request.headers.raw,
            content=request.stream,
            extensions=request.extensions,
        )
        with map_httpcore_exceptions():
>           resp = await self._pool.handle_async_request(req)

...

The above exception was the direct cause of the following exception:

create_browser_factory = functools.partial(<class 'ocrdbrowser._docker.DockerOcrdBrowserFactory'>, 'http://localhost')

    @browser_factory_test
    async def test__factory__launches_new_browser_instance(
        create_browser_factory: CreateBrowserFactory,
    ) -> None:
        sut = create_browser_factory({9000})
        browser = await sut("the-owner", "tests/workspaces/a_workspace")
    
        client = browser.client()
>       response = await client.get("/")

tests/ocrdbrowser/test_browser_launch.py:83: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <ocrdbrowser._client.HttpBrowserClient object at 0x7f95ca420590>, resource = '/'

    async def get(self, resource: str) -> bytes:
        try:
            async with httpx.AsyncClient(base_url=self.address) as client:
                response = await client.get(resource)
                return response.content
        except Exception as ex:
            logging.error(f"Tried to connect to {self.address}")
            logging.error(f"Requested resource {resource}")
>           raise ConnectionError from ex
E           ConnectionError

ocrdbrowser/_client.py:78: ConnectionError
------------------------------------------------------------- Captured log call -------------------------------------------------------------
ERROR    root:_client.py:76 Tried to connect to http://localhost:9000
ERROR    root:_client.py:77 Requested resource /

Thus it seems the browser is not reachable – could that be a timing issue?

Also, why is there no stop_browsers for native mode?
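(If it is a timing issue, a test-side helper that polls the browser address before the first request would confirm it; a rough sketch using httpx, with the name made up:)

import asyncio

import httpx


async def wait_until_reachable(address: str, timeout: float = 10.0) -> None:
    """Poll the browser address until it answers, to rule out launch latency."""
    deadline = asyncio.get_running_loop().time() + timeout
    while True:
        try:
            async with httpx.AsyncClient(base_url=address) as client:
                await client.get("/")
            return  # any HTTP response at all means the server is up
        except httpx.TransportError:
            if asyncio.get_running_loop().time() >= deadline:
                raise
            await asyncio.sleep(0.2)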

@SvenMarcus (Collaborator, Author) commented Jun 20, 2023

I noticed the same thing, but stopping the native browsers did not fix the issue.
I added some more logging, and during the tests I'm getting this output from browse-ocrd:

17:16:16.406 INFO ocrd_browser.util.config.SettingsFactory.build_from_files - Read 0 config file(s):
tried /etc/xdg/ocrd-browser.conf
/root/.config/ocrd-browser.conf
/workspaces/ocrd_monitor/ocrd-browser.conf
Bus error

Do you know what could cause that, @bertsky?

@bertsky (Member) commented Jun 20, 2023

I'm getting this output from browse-ocrd

17:16:16.406 INFO ocrd_browser.util.config.SettingsFactory.build_from_files - Read 0 config file(s):
tried /etc/xdg/ocrd-browser.conf
/root/.config/ocrd-browser.conf
/workspaces/ocrd_monitor/ocrd-browser.conf

This is from here; it just means that no configuration file could be found (not a problem: the config file is optional and can be used to set default fileGrps etc.).

Bus error

I guess this must be a problem with DBus communication.

@bertsky (Member) commented Jun 20, 2023

I added some more logging and during the tests

Could you please post or push these?

@bertsky linked an issue Jun 20, 2023 that may be closed by this pull request
@SvenMarcus (Collaborator, Author)

I changed a few things in the SubProcessOcrdBrowserFactory, and the launch tests for the native browsers are passing now. Sometimes I get a warning that the event loop is closed; I think it may be related to https://bugs.python.org/issue41320
However, the tests pass despite that.

@SvenMarcus (Collaborator, Author) commented Jun 23, 2023

Unfortunately, I haven't been able to fix the issues with launching browsers so far. Since I'll be gone for the next 2+ weeks, here is everything I know and have tried, in case anybody would like to try their hand at fixing them.

Initial work

First off, since both the Docker and native browsers used a similar for loop to iterate over possible ports, I have introduced a try_bind() function (in ocrdbrowser/_port.py) that takes care of the looping. It accepts a function that tries to launch some application on a given port; if that function returns the launched application, we return it, otherwise we move on to the next port.

import logging
from typing import Awaitable, Callable, Generic, Iterable, NamedTuple, TypeVar, Union

T = TypeVar("T")


class NoPortsAvailableError(RuntimeError):
    pass


class PortBindingError(RuntimeError):
    pass


# A binding either yields the launched application or a PortBindingError.
PortBindingResult = Union[T, PortBindingError]
PortBinding = Callable[[str, int], Awaitable[PortBindingResult[T]]]


class BoundPort(NamedTuple, Generic[T]):  # generic NamedTuple (Python 3.11+)
    bound_app: T
    port: int


async def try_bind(
    binding: PortBinding[T], host: str, ports: Iterable[int]
) -> BoundPort[T]:
    for port in ports:
        result = await binding(host, port)
        if isinstance(result, PortBindingError):
            logging.info(f"Port {port} already in use, continuing to next port")
            continue

        return BoundPort(result, port)

    raise NoPortsAvailableError()
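To make the contract concrete, here is a toy usage example (launch_app is made up purely for demonstration):

import asyncio


async def launch_app(host: str, port: int) -> PortBindingResult[str]:
    # toy binding: pretend odd ports are already taken
    if port % 2:
        return PortBindingError()
    return f"app@{host}:{port}"


async def main() -> None:
    bound_app, port = await try_bind(launch_app, "localhost", range(9001, 9010))
    print(bound_app, port)  # -> app@localhost:9002 9002


asyncio.run(main())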

Issues with the native browser

The broadway server launches a process that runs continuously and only prints something to stderr if the port is already in use.
My approach is to read from stderr after launching the browser, with a timeout of 5 seconds. If the timeout expires, I assume that broadway has been launched successfully; otherwise we return a PortBindingError. This does work in detecting when broadway exits and correctly launches new browsers on new ports (so it fixes the problem @bertsky experienced), but it turns out that the newly launched window then also appears on all existing browsers.

import asyncio
import logging
from typing import cast

# inside the PortBinding callable, i.e. `host` and `port` are its parameters
process = await asyncio.create_subprocess_shell(
    # broadway cmd here, shortened for readability
    "broadway ...",
    stderr=asyncio.subprocess.PIPE,
)

try:
    # broadwayd prints to stderr almost immediately if it cannot bind;
    # otherwise readline() blocks until the timeout expires
    stderr = cast(asyncio.StreamReader, process.stderr)
    err_output = await asyncio.wait_for(stderr.readline(), 5)
    if b"Address already in use" in err_output:
        return PortBindingError()
except asyncio.TimeoutError:
    logging.info(
        f"The process didn't exit within the given timeout. Assuming browser on port {port} launched successfully"
    )

return process

Note: I also tried to wait for the entire browser process to exit using:

try:
    await asyncio.wait_for(process.wait(), 5)
except asyncio.TimeoutError:
    pass

Unfortunately, this did not work at all: it caused all launch tests to fail and resulted in browsers being launched on the same ports again.

Duplicated windows

In the screenshot (not reproduced here), the browser on the left runs on port 9000 and the one on the right on 9001. The second window on the left appeared when the browser on the right was launched. I could not replicate that behavior when launching broadway from the command line by hand, so I'm even more confused about why it happens.

Possible workaround

One thing that just came to my mind is that we could try to bind the port in the try_bind() function before trying to launch the actual application. This is not entirely safe, though, as it can lead to race conditions due to the time gap between the test binding and the actual app launch.

import socket

# PortBinding, BoundPort, PortBindingError, NoPortsAvailableError as above


async def try_bind(
    binding: PortBinding[T], host: str, ports: Iterable[int]
) -> BoundPort[T]:
    for port in ports:
        try:
            # probe the port by binding and releasing it right away;
            # still racy: it can be taken again before the app binds it
            with socket.socket() as s:
                s.bind((host, port))
        except OSError:
            continue

        result = await binding(host, port)
        if isinstance(result, PortBindingError):
            logging.info(f"Port {port} already in use, continuing to next port")
            continue

        return BoundPort(result, port)

    raise NoPortsAvailableError()

Failing tests in GitHub Actions

I am not sure why the tests are failing in GitHub Actions. They have been passing consistently from inside my development Docker container. Maybe this is the failure @bertsky experienced that couldn't be reproduced?

@bertsky mentioned this pull request Jul 6, 2023
@SvenMarcus (Collaborator, Author)

Since PR #30 is based on this one and already has many improvements on top of it, I'm closing this PR. We'll get everything in here when we merge #30.

@SvenMarcus closed this Jul 24, 2023
Development

Successfully merging this pull request may close these issues.

  • pytest fails, cannot bind to address :7000
  • Adjust browser view width