Karate (Thymeleaf -> SAXParser) prints "Fatal Error" when encountering valid HTML5 #1684

Closed
bnutzer opened this issue Jul 16, 2021 · 8 comments

@bnutzer

bnutzer commented Jul 16, 2021

I am attempting to use Karate to authenticate with our service via an OAuth2 authorization code grant. During the process, my test needs to fetch (and parse, using JSoup via Java interop) our service's scope approval web page. This page is valid HTML5 (hopefully), containing <link> and <meta> tags without end tags (both are void elements in HTML5).

As far as I can tell, Karate attempts to parse that page through Thymeleaf, which in turn seems to use a SAXParser to process it. That results in a "Fatal Error" message in the logs (although the test then continues and succeeds).

I am attaching a trivial feature file (html5-fetch.feature.txt) that fetches a stub page from a web site of mine.

Unfortunately, I seem to be unable to produce an English error message; the German equivalent is

[Fatal Error] :7:5: Elementtyp "link" muss mit dem entsprechenden Endtag "</link>" beendet werden.
13:24:03.947 [main] WARN  com.intuit.karate - auto-conversion of response failed: org.xml.sax.SAXParseException; lineNumber: 7; columnNumber: 5; Elementtyp "link" muss mit dem entsprechenden Endtag "</link>" beendet werden.

(seems to translate to org.xml.sax.SAXParseException: Element type "link" must be terminated by a matching end tag "</link>")

I would prefer not to have "Fatal Errors" in my test logs, even when the tests succeed. It does not seem to be a good idea to attempt to parse responses as XML unless the content type is XML.

I am fine if you regard this topic as a non-issue :)

@ptrthomas
Member

ptrthomas commented Jul 16, 2021

Yes, this is a wontfix. This is what I see on my console, BTW. I have no idea how Thymeleaf gets involved for you, but my guess is that the system XML parser has been "hijacked" by JSoup. Here is a related issue for anyone who cares to investigate a fix: #1587

18:29:38.757 [main] DEBUG com.intuit.karate - response time in milliseconds: 547
1 < 200
1 < Date: Fri, 16 Jul 2021 12:59:38 GMT
1 < Server: Apache/2.4.43 (Linux/SUSE)
1 < Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
1 < Last-Modified: Fri, 16 Jul 2021 11:12:02 GMT
1 < ETag: "d2-5c73ba833de64"
1 < Accept-Ranges: bytes
1 < Content-Length: 210
1 < Keep-Alive: timeout=15, max=100
1 < Connection: Keep-Alive
1 < Content-Type: text/html
<!DOCTYPE html>
<html lang="de">
  <head>
    <meta charset="utf-8">
    <link rel="stylesheet" type="text/css" href="./format.css">
    <title>Stub</title>
  </head>
  <body>
    <p>Stub</p>
  </body>
</html>

[Fatal Error] :7:5: The element type "link" must be terminated by the matching end-tag "</link>".
18:29:38.770 [main] WARN  com.intuit.karate - auto-conversion of response failed: org.xml.sax.SAXParseException; lineNumber: 7; columnNumber: 5; The element type "link" must be terminated by the matching end-tag "</link>".
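
For anyone who wants to reproduce this outside Karate, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the auto-conversion boils down to an ordinary XML parse of the response body; the class and method names (`Html5SaxRepro`, `tryParseAsXml`) are made up for illustration and are not Karate code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Karate.
class Html5SaxRepro {

    // The stub page from the log above, line for line.
    static final String STUB_HTML = String.join("\n",
        "<!DOCTYPE html>",
        "<html lang=\"de\">",
        "  <head>",
        "    <meta charset=\"utf-8\">",
        "    <link rel=\"stylesheet\" type=\"text/css\" href=\"./format.css\">",
        "    <title>Stub</title>",
        "  </head>",
        "  <body>",
        "    <p>Stub</p>",
        "  </body>",
        "</html>");

    // Returns the parse error, or null if the text is well-formed XML.
    static SAXParseException tryParseAsXml(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // No custom ErrorHandler is installed here, so the JDK's default
            // handler is what typically writes the "[Fatal Error]" line to stderr.
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            return e;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXParseException e = tryParseAsXml(STUB_HTML);
        System.out.println(e == null
            ? "parsed as XML"
            : "not well-formed XML, line " + e.getLineNumber() + ": " + e.getMessage());
    }
}
```

The message text depends on the JVM locale, which would explain why the same error shows up in German in one log and in English in the other.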

you are welcome to improve the error message and submit a PR !

BTW you can also add this line to your test:

* assert responseType == 'string'

and since you seem to be trying to scrape things out of HTML, see this also: https://stackoverflow.com/a/67331307/143475

@ptrthomas
Member

Actually, on second thought, can @edwardsph comment on this? It is related to #1462.

It does make sense that we only convert to XML based on the response content-type. As always, I just worry about existing tests in the wild (especially those of the many SOAP users), and about more "exotic" content-types that happen to be just XML.

@edwardsph
Contributor

I think it should detect that the content-type is neither JSON, XML nor plain text, and therefore not attempt to SAX parse it, but I will have a closer look next week.
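
A rough sketch of that idea in Java, under stated assumptions: the names `ContentTypeSketch`, `Kind`, `classify` and `mayAttemptXmlParse` are hypothetical, and this is not the change that went into Karate. Checking for a `+xml` suffix is one possible way to cover the more "exotic" content-types such as `application/soap+xml`.

```java
import java.util.Locale;

// Hypothetical sketch, not Karate's actual implementation.
class ContentTypeSketch {

    enum Kind { JSON, XML, TEXT, OTHER }

    // Classifies a Content-Type header value by its media type only.
    static Kind classify(String contentType) {
        if (contentType == null) {
            return Kind.OTHER; // assumption: no header means no auto-conversion
        }
        // Drop parameters such as "; charset=utf-8".
        int semi = contentType.indexOf(';');
        String mediaType = (semi < 0 ? contentType : contentType.substring(0, semi))
            .trim()
            .toLowerCase(Locale.ROOT);
        if (mediaType.equals("application/json") || mediaType.endsWith("+json")) {
            return Kind.JSON;
        }
        if (mediaType.equals("application/xml") || mediaType.equals("text/xml")
                || mediaType.endsWith("+xml")) {
            return Kind.XML;
        }
        if (mediaType.equals("text/plain")) {
            return Kind.TEXT;
        }
        return Kind.OTHER; // e.g. text/html: keep the body as a raw string
    }

    // Assumption: plain text may still be sniffed for XML, everything else is not.
    static boolean mayAttemptXmlParse(Kind kind) {
        return kind == Kind.XML || kind == Kind.TEXT;
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html", "application/soap+xml; charset=utf-8", "application/json"
        };
        for (String sample : samples) {
            Kind kind = classify(sample);
            System.out.println(sample + " -> " + kind
                + ", attempt XML parse: " + mayAttemptXmlParse(kind));
        }
    }
}
```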

ptrthomas self-assigned this Jul 16, 2021
ptrthomas added this to the 1.1.0 milestone Jul 16, 2021
@ptrthomas
Member

@edwardsph okay I took care of it and the diff is interesting. I think it will be fine, but anyway we will do RC5 to see what the feedback is

@bnutzer you convinced us :) reopening - and do let me know if the fix works !

ptrthomas reopened this Jul 16, 2021
@bnutzer
Author

bnutzer commented Jul 16, 2021

@ptrthomas Perfect! Works for me, fixes both the sample request as well as my production code.

Thanks for the hint about the karate.extract() function. It is probably a valuable solution for lots of cases. However, in my case I need to collect the "name" attributes of all HTML elements with a certain "class" attribute. That's 3 lines of readable Java code, but would be a hell of a regex using the "extract" function, I suppose.

The diff looks straightforward and really sensible to me. However, I do understand that this might be a breaking change for some folks.

Thanks again!

bnutzer closed this as completed Jul 16, 2021
@ptrthomas
Member

@bnutzer I'll keep this open until the "final" version; that's what we usually do, especially with (potentially) breaking issues like this one

P.S. thanks for starring the project - you made it into the screenshot: https://twitter.com/KarateDSL/status/1416013547067248640

ptrthomas reopened this Jul 16, 2021
@ptrthomas
Member

1.1.0 released

@ericdriggs

ericdriggs commented Aug 5, 2021

Please re-open. See PR #1705
