Karate (Thymeleaf -> SAXParser) prints "Fatal Error" when encountering valid HTML5 #1684

bnutzer · 2021-07-16T11:42:25Z

I am attempting to use Karate to authenticate with our service via an OAuth2 authorization code grant. During the process, my test needs to fetch (and parse, using JSoup in a Java interop) our service's scope approval web page. This page is valid HTML5 (hopefully), containing unclosed <link> and <meta> tags.

As far as I can tell, Karate attempts to parse that page through Thymeleaf, which in turn seems to use a SAXParser to process it. That results in the error message "Fatal Error" in the logs (although the test then continues and succeeds).

I am attaching a trivial feature file (html5-fetch.feature.txt) that fetches a stub page from a web site of mine.

Unfortunately, I seem to be unable to produce an English error message; the German equivalent is

[Fatal Error] :7:5: Elementtyp "link" muss mit dem entsprechenden Endtag "</link>" beendet werden.
13:24:03.947 [main] WARN  com.intuit.karate - auto-conversion of response failed: org.xml.sax.SAXParseException; lineNumber: 7; columnNumber: 5; Elementtyp "link" muss mit dem entsprechenden Endtag "</link>" beendet werden.

(seems to translate to org.xml.sax.SAXParseException: Element type "link" must be terminated by a matching end tag "</link>")
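For readers who want to reproduce the message outside Karate, here is a minimal sketch using only the JDK's built-in XML parser. It assumes the stub page is exactly the markup shown in the response dump in this thread; the class and method names (`Html5AsXmlDemo`, `parseQuietly`) are made up for illustration and are not part of Karate. It also shows one likely source of the bare `[Fatal Error]` line: the JDK parser's default error handler writes to stderr when no `ErrorHandler` is installed, so installing one keeps the log clean while the exception is still raised.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class Html5AsXmlDemo {

    // The stub page from this issue: acceptable as HTML5, but not well-formed XML,
    // because <meta> and <link> are never closed.
    static final String STUB_HTML = String.join("\n",
        "<!DOCTYPE html>",
        "<html lang=\"de\">",
        "  <head>",
        "    <meta charset=\"utf-8\">",
        "    <link rel=\"stylesheet\" type=\"text/css\" href=\"./format.css\">",
        "    <title>Stub</title>",
        "  </head>",
        "  <body>",
        "    <p>Stub</p>",
        "  </body>",
        "</html>");

    /** Returns null if the text is well-formed XML, otherwise the parse exception. */
    static SAXParseException parseQuietly(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler so that nothing is written to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParseException e = parseQuietly(STUB_HTML);
        System.out.println("line " + e.getLineNumber() + ": " + e.getMessage());
    }
}
```

The wording of the message depends on the JVM's default locale, which would explain the German text above.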

I would prefer not to have "Fatal Error" messages in my test logs, even when the tests succeed. It does not seem to be a good idea to attempt to parse responses as XML unless the content type is XML.

I am fine if you regard this topic as a non-issue :)


ptrthomas · 2021-07-16T13:04:42Z

Yes, this is a wontfix. The output further down is what I see on my console, BTW. I have no idea how Thymeleaf gets involved for you, but my guess is that the system XML parser has been "hijacked" by JSoup. Here is a related issue for anyone who cares to investigate a fix: #1587

18:29:38.757 [main] DEBUG com.intuit.karate - response time in milliseconds: 547
1 < 200
1 < Date: Fri, 16 Jul 2021 12:59:38 GMT
1 < Server: Apache/2.4.43 (Linux/SUSE)
1 < Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
1 < Last-Modified: Fri, 16 Jul 2021 11:12:02 GMT
1 < ETag: "d2-5c73ba833de64"
1 < Accept-Ranges: bytes
1 < Content-Length: 210
1 < Keep-Alive: timeout=15, max=100
1 < Connection: Keep-Alive
1 < Content-Type: text/html
<!DOCTYPE html>
<html lang="de">
  <head>
    <meta charset="utf-8">
    <link rel="stylesheet" type="text/css" href="./format.css">
    <title>Stub</title>
  </head>
  <body>
    <p>Stub</p>
  </body>
</html>

[Fatal Error] :7:5: The element type "link" must be terminated by the matching end-tag "</link>".
18:29:38.770 [main] WARN  com.intuit.karate - auto-conversion of response failed: org.xml.sax.SAXParseException; lineNumber: 7; columnNumber: 5; The element type "link" must be terminated by the matching end-tag "</link>".

you are welcome to improve the error message and submit a PR !

BTW you can also add this line to your test:

* assert responseType == 'string'

and since you seem to be trying to scrape things out of HTML, see this also: https://stackoverflow.com/a/67331307/143475

ptrthomas · 2021-07-16T13:40:15Z

Actually, on second thought, can @edwardsph comment on this? It is related to #1462.

It does make sense that we only convert to XML based on the response content-type. I just worry, as always, about existing tests in the wild (especially those of the many SOAP users), and about more "exotic" content-types whose body happens to be plain XML.

edwardsph · 2021-07-16T14:13:16Z

I think it should detect that the content-type is neither JSON, XML nor plain text and therefore not attempt to SAX parse it but I will have a closer look next week.
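To make the idea concrete, here is a rough sketch of choosing the conversion from the `Content-Type` header alone. It is not Karate's actual logic, and the fix that eventually landed may differ; the class, enum, and method names are hypothetical. The `+xml` suffix check is one way to cover the "exotic" XML types mentioned above (for example `application/soap+xml`), while `text/html` falls through to a plain string and is never handed to a SAX parser.

```java
import java.util.Locale;

final class ResponseTypeSketch {

    enum Kind { JSON, XML, STRING }

    /** Picks a conversion from the Content-Type header only; unknown or missing types stay strings. */
    static Kind kindFor(String contentType) {
        if (contentType == null) {
            return Kind.STRING;
        }
        // Drop parameters such as "; charset=utf-8" and normalise the case.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals("application/json") || mediaType.endsWith("+json")) {
            return Kind.JSON;
        }
        if (mediaType.equals("application/xml")
                || mediaType.equals("text/xml")
                || mediaType.endsWith("+xml")) {
            return Kind.XML;
        }
        // text/html, text/plain and everything else: leave the body untouched.
        return Kind.STRING;
    }

    public static void main(String[] args) {
        System.out.println(kindFor("text/html"));
        System.out.println(kindFor("text/xml; charset=utf-8"));
    }
}
```

A header-only rule like this would be breaking for any test that relies on XML being auto-detected from the body despite a non-XML content type, which is the compatibility concern raised above.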


ptrthomas · 2021-07-16T15:52:57Z

@edwardsph okay I took care of it and the diff is interesting. I think it will be fine, but anyway we will do RC5 to see what the feedback is

@bnutzer you convinced us :) reopening - and do let me know if the fix works !

bnutzer · 2021-07-16T16:26:53Z

@ptrthomas Perfect! Works for me, fixes both the sample request as well as my production code.

Thanks for your hint about the karate.extract() function. It is probably a valuable solution for lots of cases. However, in my case I need to collect the "name" attributes of all HTML elements with a certain "class" attribute. That's 3 lines of readable code in Java, but I suppose it would be a hell of a regex using the "extract" function.

The diff looks straightforward and really sensible to me. However, I do understand that this might be a breaking change for some folks.

Thanks again!

ptrthomas · 2021-07-16T16:50:27Z

@bnutzer I'll keep this open until the "final" version. That's what we usually do, especially with potentially breaking issues like this one.

P.S. thanks for starring the project - you made it into the screenshot: https://twitter.com/KarateDSL/status/1416013547067248640

ptrthomas · 2021-08-04T19:18:13Z

1.1.0 released

ericdriggs · 2021-08-05T20:57:50Z

Please re-open. See PR #1705

ptrthomas added the wontfix label Jul 16, 2021

ptrthomas closed this as completed Jul 16, 2021

ptrthomas self-assigned this Jul 16, 2021

ptrthomas added enhancement and removed wontfix labels Jul 16, 2021

ptrthomas added this to the 1.1.0 milestone Jul 16, 2021

ptrthomas added a commit that referenced this issue Jul 16, 2021

[breaking possibly] auto convert response to xml based on content-type …

88ac26a

…#1684

ptrthomas reopened this Jul 16, 2021

bnutzer closed this as completed Jul 16, 2021

ptrthomas reopened this Jul 16, 2021

ptrthomas added the fixed label Jul 16, 2021

ptrthomas closed this as completed Aug 4, 2021

ericdriggs mentioned this issue Aug 5, 2021

Support parsing valid HTML5 responses #1705

Closed

3 tasks
