Expose Tika http status code in errors returned by client methods #25

Merged
merged 6 commits into from
Aug 5, 2020

@tomyl tomyl commented Jul 17, 2020

Users of tika.Client often need to distinguish intermittent errors (HTTP status code 500) from content-related errors (e.g. 415 and 422). Currently, however, the client methods return only an opaque error string.

This PR exposes the HTTP status code in the errors returned by the client methods. Example usage:

func doStuff(input io.Reader, tikaURL string) error {
    client := tika.NewClient(nil, tikaURL)
    s, err := client.Parse(context.Background(), input)
    if isUnsupportedFileFormat(err) {
        return nil
    }
    if err != nil {
        return err
    }
    ...
}

func isUnsupportedFileFormat(err error) bool {
    var tikaErr tika.ClientError

    if errors.As(err, &tikaErr) {
        switch tikaErr.StatusCode {
        // Password protected documents yield StatusUnprocessableEntity
        case http.StatusUnsupportedMediaType, http.StatusUnprocessableEntity:
            return true
        default:
            return false
        }
    }

    return false
}

I considered going with type ClientError int, but I think a struct is more future-proof. It could, for example, carry information about the exact Tika exception in the future.

tomyl added 4 commits July 9, 2020 16:53
Previously it was not possible for calling code to tell the difference
between e.g. errors 422 and 500 without parsing the opaque error string.
@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



tomyl commented Jul 17, 2020

@googlebot I signed it!

@googlebot

CLAs look good, thanks!


@tbpg tbpg left a comment

Thanks! I left a couple minor comments. But, this looks good!

Review threads: tika/tika.go (resolved); tika/tika.go (outdated, resolved)
tomyl and others added 2 commits July 21, 2020 10:01
Co-authored-by: Tyler Bui-Palsulich <26876514+tbpg@users.noreply.github.com>

@tbpg tbpg left a comment

Sorry this fell off my radar. Looks good! Thanks!

@tbpg tbpg merged commit 1e81b65 into google:master Aug 5, 2020