Expose Tika http status code in errors returned by client methods #24

Closed
tomyl opened this issue Jul 10, 2020 · 3 comments

Comments

@tomyl
Contributor

tomyl commented Jul 10, 2020

For users of tika.Client it can be useful to differentiate between intermittent errors (HTTP status code 500) and content-related errors (e.g. 415 and 422). Currently, however, the client methods just return an opaque error string.

I'm experimenting in my fork https://github.com/tomyl/go-tika with exposing the HTTP status code in the error. Basically:

diff --git a/tika/tika.go b/tika/tika.go
index a6ffdab..8a0cd39 100644
--- a/tika/tika.go
+++ b/tika/tika.go
@@ -29,6 +29,16 @@ import (
        "golang.org/x/net/context/ctxhttp"
 )
 
+// ClientError represents an error response from the Tika server.
+type ClientError struct {
+       // StatusCode is the http status code returned by the Tika server.
+       StatusCode int
+}
+
+func (e ClientError) Error() string {
+       return fmt.Sprintf("response code %d", e.StatusCode)
+}
+
 // Client represents a connection to a Tika Server.
 type Client struct {
        // url is the URL of the Tika Server, including the port (if necessary), but
@@ -107,7 +117,7 @@ func (c *Client) call(ctx context.Context, input io.Reader, method, path string,
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
-               return nil, fmt.Errorf("response code %v", resp.StatusCode)
+               return nil, ClientError{resp.StatusCode}
        }
        return ioutil.ReadAll(resp.Body)
 }

The calling code can then do something like:

func doStuff(input io.Reader, tikaURL string) error {
    client := tika.NewClient(nil, tikaURL)
    s, err := client.Parse(context.Background(), input)
    if isUnsupportedFileFormat(err) {
        return nil
    }
    if err != nil {
        return err
    }
   ...
}

func isUnsupportedFileFormat(err error) bool {
    var tikaErr tika.ClientError

    if errors.As(err, &tikaErr) {
        switch tikaErr.StatusCode {
        // Password protected documents yield StatusUnprocessableEntity
        case http.StatusUnsupportedMediaType, http.StatusUnprocessableEntity:
            return true
        default:
            return false
        }
    }

    return false
}

Thoughts? I'm happy to submit a PR if a change like this would be accepted.

@tbpg
Member

tbpg commented Jul 16, 2020

This looks good to me! Thanks for filing an issue and offering to send a PR.

@tomyl
Contributor Author

tomyl commented Jul 17, 2020

Cool, I submitted PR #25.

@tbpg
Member

tbpg commented Aug 12, 2020

Closing this as the PR has been merged. Thanks!

@tbpg closed this as completed Aug 12, 2020