Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queries are routed to unhealthy backends #199

Open
ahackett opened this issue May 4, 2023 · 9 comments
Open

Queries are routed to unhealthy backends #199

ahackett opened this issue May 4, 2023 · 9 comments

Comments

@ahackett
Copy link

ahackett commented May 4, 2023

Hi

Thank you for making the presto-gateway available to the community, it's a really useful piece of software.

I've recently been evaluating and testing the gateway, and have found that queries are being routed to backend clusters even when the coordinator is stopped, which of course results in query failures.

My config includes:

    modules:
      - com.lyft.data.gateway.ha.module.HaGatewayProviderModule
      - com.lyft.data.gateway.ha.module.ClusterStateListenerModule

    managedApps:
      - com.lyft.data.gateway.ha.GatewayManagedApp
      - com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor

I can see that #183 "filters out unhealthy clusters from queue based routing logic".

However, since our clusters have authentication enabled I get the following errors in the logs:

WARN  [2023-05-04 08:55:42,052] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received non 200 response, response code: 401
ERROR [2023-05-04 08:55:42,053] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received null/empty response for http://trino.example.net:8080/ui/api/stats

So the ActiveClusterMonitor cannnot fetch the metrics which I assume are needed for queue based routing to work.

I was hoping that someone might be able to confirm that the behaviour I'm seeing (queries routed to unhealthy backends) is definitely caused by the failure to fetch the metrics. If confirmed, I could look at adding support for password authentication with the /ui/api/stats endpoint.

Thanks

Austin

@connorlwilkes
Copy link

This seems like something that should be supported, would be good for someone to validate this behavior if possible

@willmostly
Copy link

I've added an option to my fork to query system.runtime.nodes and system.runtime.queries via JDBC with JWT authn instead of using the /ui/api/stats endpoint. In my case we require OAuth for the WebUI, so the gateway can't use any /ui/api endpoints.

I'm happy to contribute back if it is of general interest.

@Felicity-3786
Copy link

Trying to leverage ActiveClusterMonitor and met the same problem, I was thinking if only for health check purpose v1/status or v1/info/state should be enough, but if we want to get /ui/api/stats to work requires adding a certificate? Happy to discuss more.

@ahackett
Copy link
Author

Hi @willmostly

Thanks for the info, and for offering to make you contributions available. This would definitely be of interest to me, and I suspect others in the community are using JWT auth too...

Austin

@ccchenhe
Copy link

ccchenhe commented Aug 1, 2023

I think seems after 389 version, api "ui/api/stats" of health check became an internal api (meaning login is required).
so maybe necessary to modify code of http request in order to get login token.
here is just login demo:

public static String getTrinoClusterToken(String targetUrl, MonitorConfiguration monitorConfiguration){
        HttpURLConnection loginConnection = null;
        try{
            URL loginUrl = new URL(String.format("%s/ui/login", targetUrl));
            loginConnection = (HttpURLConnection) loginUrl.openConnection();
            loginConnection.setRequestMethod("POST");
            loginConnection.setDoOutput(true);
            loginConnection.setInstanceFollowRedirects(false);
            loginConnection.addRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            loginConnection.setRequestProperty("charset", "utf-8");
            String password = "";
            if(!Strings.isNullOrEmpty(monitorConfiguration.getMonitorPassword())){
                password = monitorConfiguration.getMonitorPassword();
            }
            String loginData = String.format("username=%s&password=%s", monitorConfiguration.getMonitorUser(), password);
            byte[] postData = loginData.getBytes( StandardCharsets.UTF_8);
            loginConnection.connect();
            DataOutputStream out = new DataOutputStream(loginConnection.getOutputStream());
            out.write(postData);
            String token = loginConnection.getHeaderField("Set-Cookie");
            out.close();
            // data simple: Trino-UI-Token=xx;Version=1;Path=/ui;HttpOnly
            // so we need split by ;
            return token.split(";")[0];

        }catch (Exception e){
            log.error("Error login trino cluster from {},reason is:  {}", targetUrl, e);
        }finally {
            if (loginConnection != null) {
                loginConnection.disconnect();
            }
        }
        return "";
    }

after obtain login token, change healthy check http request code:

public static String requestTrinoClusterStatsWithToken(String targetUrl, String token){
        HttpURLConnection apiConnection = null;
        try{
            URL apiURL = new URL(String.format("%s/ui/api/stats", targetUrl));
            apiConnection = (HttpURLConnection) apiURL.openConnection();
            apiConnection.setRequestMethod("GET");
            apiConnection.setRequestProperty("Cookie", token);
            int responseCode = apiConnection.getResponseCode();
            if (responseCode == HttpStatus.SC_OK) {
                BufferedReader reader = new BufferedReader(new InputStreamReader((InputStream) apiConnection.getContent()));
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append("\n");
                }
                return sb.toString();
            }
        }catch (Exception e){
            log.error("Error fetching cluster stats from [{}]", targetUrl, e);
        }finally {
            if (apiConnection != null){
                apiConnection.disconnect();
            }
        }
        return null;
    }

and healthy check logic works.

@ccchenhe
Copy link

ccchenhe commented Aug 1, 2023

but if you just want to confirm that cluster is healthy. maybe only change url from "/ui/api/stats" to "/v1/status"

@Chaho12
Copy link

Chaho12 commented Oct 20, 2023

Just a heads up @ccchenhe @ahackett, this issue has been fixed and uses /ui/api/stats for jdbc/http in trinodb/trino-gateway trinodb/trino-gateway@11b2dd2#diff-a3a976d789379cd160fbc2e9281073d96aa6e682e9d9e9ab67883992acaeeab2

For http and jdbc it uses username/pwd set in backendStateConfiguration

@Bigbarry7549
Copy link

uhuh

@Bigbarry7549
Copy link

boobs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants