fix: Diagnostics lost during JSON serialization #170

Merged
1 change: 1 addition & 0 deletions clients/vscode-hlasmplugin/CHANGELOG.md
@@ -13,6 +13,7 @@
- Fix instruction formats (STNSM, STOSM, CUUTF, CU21, LLI[LH][LH])
- Evaluation of T'&VAR(num), where VAR is type C-type var symbol array
- &SYSMAC should contain only the macro name
- Diagnostics lost during JSON serialization

## [0.14.0](https://github.com/eclipse/che-che4z-lsp-for-hlasm/compare/0.13.0...0.14.0) (2021-08-18)

50 changes: 49 additions & 1 deletion parser_library/src/processing/statement_fields_parser.cpp
@@ -52,6 +52,54 @@ const parsing::parser_holder& statement_fields_parser::prepare_parser(const std:
    return *m_parser;
}

void append_sanitized(std::string& result, std::string_view str)
{
    auto it = str.begin();
    auto end = str.end();
    while (true)
    {
        auto first_complex = std::find_if(it, end, [](unsigned char c) { return c >= 0x80; });
        result.append(it, first_complex);
        it = first_complex;
        if (it == end)
            break;

        unsigned char c = *it;
        auto cs = lexing::utf8_prefix_sizes[c];
        if (cs.utf8 && (end - it) >= cs.utf8
            && std::all_of(it + 1, it + cs.utf8, [](unsigned char c) { return (c & 0xC0) == 0x80; }))
        {
            result.append(it, it + cs.utf8);
            it += cs.utf8;
        }
        else
        {
            static const char hex_digits[] = "0123456789ABCDEF";
            result.append(1, '<');
            result.append(1, hex_digits[(c >> 4) & 0xf]);
            result.append(1, hex_digits[(c >> 0) & 0xf]);
            result.append(1, '>');

            ++it;
        }
    }
}

std::string decorate_message(const std::string& field, const std::string& message)
{
    static const std::string_view prefix = "While evaluating the result of substitution '";
    static const std::string_view arrow = "' => ";
    std::string result;
    result.reserve(prefix.size() + field.size() + arrow.size() + message.size());

    result.append(prefix);
    append_sanitized(result, field);
    result.append(arrow);
    result.append(message);

    return result;
}

std::pair<semantics::operands_si, semantics::remarks_si> statement_fields_parser::parse_operand_field(std::string field,
    bool after_substitution,
    semantics::range_provider field_range,
@@ -64,7 +112,7 @@ std::pair<semantics::operands_si, semantics::remarks_si> statement_fields_parser

    diagnostic_consumer_transform add_diag_subst([&field, &add_diag, after_substitution](diagnostic_op diag) {
        if (after_substitution)
-            diag.message = "While evaluating the result of substitution '" + field + "' => " + std::move(diag.message);
+            diag.message = decorate_message(field, diag.message);
Contributor

Do we want to run the sanitizer in the language_server to sanitize all diagnostics? I think there are some more diagnostics that show names of symbols. Those are probably guaranteed to be 'nice' strings, since they were lexed as ordinary symbols, but sanitizing everything may be a more future-proof solution if we decide to add different diagnostics later.

Contributor

Also, there are more places where we copy source code into JSON messages; see lsp_context::get_macro_documentation, and maybe some other places in lsp_context.

Contributor Author

As far as I can tell, the only case in which we may end up including invalid UTF-8 sequences in diagnostics is when the code is generated (e.g. by substitutions).
In all other cases the text should already be sanitized or even parsed.

Contributor

Yes, you are right, I forgot that we sanitize the source prior to parsing it.

How about values of variables that we send via DAP?

Contributor Author

I'll look into that one... 👍

Contributor Author

Yes, DAP variables are affected as well. We could address that together with the other DAP issues that need to be solved.

        add_diag.add_diagnostic(std::move(diag));
    });
    const auto& h = prepare_parser(field, after_substitution, std::move(field_range), status, add_diag_subst);
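
For readers who want to try the escaping behaviour of `append_sanitized` outside the project, the following is a minimal standalone sketch of the same idea. It is not the PR's code: `utf8_sequence_length` is a hypothetical stand-in for `lexing::utf8_prefix_sizes` (whose exact contents are not shown in this diff), and `sanitize` is a hypothetical wrapper name. The check is structural only (lead byte plus the expected number of continuation bytes); it does not reject overlong encodings or surrogates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for lexing::utf8_prefix_sizes: the sequence length
// announced by a lead byte, or 0 if the byte cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 0; // stray continuation byte or invalid lead byte
}

// Copies structurally complete UTF-8 sequences unchanged and replaces every
// other byte with "<XX>", where XX is the byte value in upper-case hex.
std::string sanitize(std::string_view str)
{
    static const char hex_digits[] = "0123456789ABCDEF";
    std::string result;
    result.reserve(str.size());

    std::size_t i = 0;
    while (i < str.size())
    {
        const unsigned char c = static_cast<unsigned char>(str[i]);
        const std::size_t len = utf8_sequence_length(c);

        // The sequence must fit in the input and every trailing byte must
        // have the form 10xxxxxx.
        const bool complete = len != 0 && str.size() - i >= len
            && std::all_of(str.begin() + i + 1, str.begin() + i + len, [](unsigned char b) {
                   return (b & 0xC0) == 0x80;
               });

        if (complete)
        {
            result.append(str.substr(i, len));
            i += len;
        }
        else
        {
            result += '<';
            result += hex_digits[(c >> 4) & 0xf];
            result += hex_digits[c & 0xf];
            result += '>';
            ++i;
        }
    }
    return result;
}
```

With this sketch, `sanitize("=C'\xC2'")` yields `=C'<C2>'` (the lone lead byte is escaped), while `sanitize("=C'\xC2\x80")` returns its input unchanged because `C2 80` is a complete two-byte sequence. These are the same two inputs the new tests in `parser_model_test.cpp` exercise.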
29 changes: 29 additions & 0 deletions parser_library/test/parsing/parser_model_test.cpp
@@ -240,3 +240,32 @@ TEST(parser, parse_single_apostrophe_literal)
    auto cc = concatenation_point::to_string(model->chain);
    EXPECT_EQ(std::count(cc.begin(), cc.end(), '\''), 4);
}

TEST(parser, sanitize_message_content_replace)
{
    diagnostic_op_consumer_container diag_container;

    range r(position(0, 10), position(0, 15));
    auto [op, rem] = parse_model("=C'\xC2'", r, true, &diag_container);

    ASSERT_EQ(diag_container.diags.size(), 1);

    const auto& msg = diag_container.diags[0].message;

    EXPECT_TRUE(std::all_of(msg.begin(), msg.end(), [](unsigned char c) { return c < 0x80; }));
}

TEST(parser, sanitize_message_content_valid_multibyte)
{
    diagnostic_op_consumer_container diag_container;

    range r(position(0, 10), position(0, 14));
    std::string line = "=C'\xC2\x80";
    auto [op, rem] = parse_model(line, r, true, &diag_container);

    ASSERT_EQ(diag_container.diags.size(), 1);

    const auto& msg = diag_container.diags[0].message;

    EXPECT_NE(msg.find(line), std::string::npos);
}
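
The two tests above check different properties: the first that the message is pure ASCII, the second that a valid multi-byte sequence survives verbatim. Both follow from one invariant, namely that the decorated message is structurally well-formed UTF-8. A predicate along the following lines could assert that directly. This is only a sketch: `is_structurally_valid_utf8` is a hypothetical helper that is not part of this PR, and like the sanitizer it checks byte structure only, not overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical test helper: returns true if every byte of `s` belongs to a
// structurally complete UTF-8 sequence (lead byte followed by the announced
// number of 10xxxxxx continuation bytes).
bool is_structurally_valid_utf8(std::string_view s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);

        std::size_t len = 0;
        if (c < 0x80)
            len = 1;
        else if ((c & 0xE0) == 0xC0)
            len = 2;
        else if ((c & 0xF0) == 0xE0)
            len = 3;
        else if ((c & 0xF8) == 0xF0)
            len = 4;
        else
            return false; // stray continuation byte or invalid lead byte

        if (s.size() - i < len)
            return false; // sequence is cut off by the end of the input

        for (std::size_t k = 1; k < len; ++k)
        {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        }
        i += len;
    }
    return true;
}
```

In a test, `EXPECT_TRUE(is_structurally_valid_utf8(msg))` would then hold for both inputs used above, whereas the raw substituted field `=C'\xC2'` would fail the check before sanitization.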