Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for grpc.tools #17995 & protobuf #7474 (handle UTF-8 paths in argumentfile) #10200

Merged
merged 2 commits into from
Jul 6, 2022

Conversation

tonydnewell
Copy link
Contributor

@tonydnewell tonydnewell commented Jul 1, 2022

Fix for #7474 (and grpc/grpc#17995)

When calling the plugins in subprocess.cc Subprocess::Start() fix to use CreateProcessW instead of CreateProcessA to handle non-ascii paths.

See grpc/grpc#17995 (comment) for a summary of what the actual problem is and why the fix looks like the way it does.



Use CreateProcessW so that non-ascii paths are handled correctly
@google-cla
Copy link

google-cla bot commented Jul 1, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@jtattermusch jtattermusch changed the title Fix for grpc.tools #17995 & protobuf #7474 Fix for grpc.tools #17995 & protobuf #7474 (handle UTF-8 paths in argumentfile) Jul 4, 2022
@jtattermusch
Copy link
Contributor

@acozzette can you please review? This is an important fix for gRPC's codegen support and you seems to be the author of the argumentFile parsing logic in protoc.

@jtattermusch
Copy link
Contributor

@tonydnewell just to clarify:

  • based on the analysis Grpc.Tools: Accented characters in nuget path breaks c# grpc tool compiler grpc/grpc#17995 (comment) and the changes made here, it seems that the only problematic part is if the protoc plugin's full path (grpc_csharp_plugin in our case) contains non-ASCII characters. IIUC if other arguments (e.g. the .proto file names) contain non-ascii characters in their path, it isn't currently a problem (did you verify that?), because the logic for handling UTF-8 file names for other params already exists. Only for the protoc plugin arg, things are different since the invocation of the protoc plugin happens through subprocess.cc
  • Based on your investigation, can you tell whether the problem is only specific to when using the argument file (protoc.exe @test.rsp) as originally reported or if there are other ways of reproducing it (e.g. invoking protoc.exe with args on command line normally)?

@jtattermusch
Copy link
Contributor

CC @jskeet


// Create the process.
PROCESS_INFORMATION process_info;

if (CreateProcessA((search_mode == SEARCH_PATH) ? nullptr : program.c_str(),
(search_mode == SEARCH_PATH) ? command_line : nullptr,
if (CreateProcessW((search_mode == SEARCH_PATH) ? NULL : wprogram.c_str(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change NULL -> nullptr everywhere in the diff (as that seems to be the current codestyle requirement in protobuf codebase?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to use nullptr

@tonydnewell
Copy link
Contributor Author

@tonydnewell just to clarify:

  • based on the analysis Grpc.Tools: Accented characters in nuget path breaks c# grpc tool compiler grpc/grpc#17995 (comment) and the changes made here, it seems that the only problematic part is if the protoc plugin's full path (grpc_csharp_plugin in our case) contains non-ASCII characters. IIUC if other arguments (e.g. the .proto file names) contain non-ascii characters in their path, it isn't currently a problem (did you verify that?), because the logic for handling UTF-8 file names for other params already exists. Only for the protoc plugin arg, things are different since the invocation of the protoc plugin happens through subprocess.cc

It is just the plugin path that is an issue (these are the paths used in subprocess.cc). All other paths/names seem to work ok as they use the functions in io_win32.cc which do the utf-8 to wide string conversion correctly.

I've tested with non-ASCII proto_path and non-ASCII .proto file name. Here is one of my test input rsp files:

--plugin=protoc-gen-grpc=E:\work\grpc\protoctest\test-Dré\tools\grpc_csharp_plugin.exe
--grpc_out=E:\work\grpc\protoctest\test-Dré\grpc_out
--proto_path=E:\work\grpc\protoctest\test-Dré
hélloworld.proto

  • Based on your investigation, can you tell whether the problem is only specific to when using the argument file (protoc.exe @test.rsp) as originally reported or if there are other ways of reproducing it (e.g. invoking protoc.exe with args on command line normally)?

The reported problem was with using an utf-8 encoding in an argument file and this is what I've tested and is what Grpc.Tools uses.

In theory it should be the same with command line arguments as all that reading the argument file does (in ExpandArgumentFile) is push the lines read onto the same vector of command line arguments before the arguments are interpreted. However this does not seem to work - I think the arguments are getting changed before passing to main(int argc, char* argv[]). I've tried with cmd.exe (with chcp 65001), PowerShell and git bash - all fail to pass the argument correctly.

If we want to fix that too then may need to change to using wmain(int argc, wchar_t* argv[]) on Windows - but that has bigger impact as wchar_t will need to be passed everywhere. Maybe just document that if you want to use non-ASCII characters then pass the arguments in a file.

@tonydnewell
Copy link
Contributor Author

As I said above to handle non-ascii arguments on the command line (for Windows) then wmain needs to be used. The code would need to be something like below. I've not included that in the fix as it is probably outside the scope of the issue being fixed:

In main.cc:

#ifndef _WIN32
int main(int argc, char* argv[]) {
  return PROTOBUF_NAMESPACE_ID::compiler::ProtobufMain(argc, argv);
}
#else
int wmain( int argc, wchar_t *wargv[ ]) {

  // convert all arguments to utf-8
  char **argv = (char **)malloc(sizeof(char *) * argc);
  for (int i = 0; i < argc; ++i) {
    std::string utf8str;
    if (!google::protobuf::io::win32::strings::wcs_to_utf8(wargv[i], &utf8str)) {
      //GOOGLE_LOG(FATAL) << "wcs_to_utf8: " << Win32ErrorMessage(GetLastError());
    }
    argv[i] = _strdup(utf8str.c_str());
  }

  // call ProtobufMain with the utf-8 arguments
  int sts = PROTOBUF_NAMESPACE_ID::compiler::ProtobufMain(argc, argv);

  // free memory allocated for arguments
  for (int i = 0; i < argc; ++i)
    free(argv[i]);
  free(argv);
  return sts;
}
#endif

@jtattermusch
Copy link
Contributor

Thanks for the explanation, the fix makes sense to me. Looks like after this fix, all the command line arguments would work correctly with accents when passed via an argument file (e.g. @test.rsp) - most of them did already work correctly, but the plugin path didn't.

What's still a bit puzzling to me is why the command line argument don't seem to work with accents when regular command line arguments (not argument files) are used.
I found that that's something that should have been fixed long time ago by an old PR #4013 (which introduced most of the logic for converting args including as_winows_path).
As you said, fixing the argument processing for regular command line arguments isn't the goal here and should probably be fixed separately (if it indeed doesn't work), but it would still be good to understand what's wrong with it (it looks like it must have worked at some point, so perhaps there was a regression?).

Perhaps relevant:

// Using a malloc'ed string because CreateProcess() can mutate its second
// parameter.
char* command_line =
portable_strdup(("cmd.exe /c \"" + program + "\"").c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another call to portable_strdup in SubProccess::Start. Does that need to be updated as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It is used in the non-Windows code and this is a Windows specific fix.

@jtattermusch
Copy link
Contributor

Looks like this change didn't make it to protobuf's 21.x release. I'd therefore expect it to become available as part of 22.x release (AFAIKT that release cycle hasn't started yet).

@fowles
Copy link
Contributor

fowles commented Sep 8, 2022

If you would like to include this in the next 21.x repease, please send a PR that cherry picks this back

tonydnewell pushed a commit to tonydnewell/protobuf that referenced this pull request Oct 5, 2022
…obuf-7474

Fix for grpc.tools protocolbuffers#17995 & protobuf protocolbuffers#7474 (handle UTF-8 paths in argumentfile)
@tonydnewell
Copy link
Contributor Author

@fowles

If you would like to include this in the next 21.x repease, please send a PR that cherry picks this back

PR #10721

deannagarcia added a commit that referenced this pull request Oct 6, 2022
Merge pull request #10200 from tonydnewell/bugfix/protobuf-7474
mkruskal-google added a commit that referenced this pull request Oct 19, 2022
* Force uninstall protobuf in python macos builds

We are seeing failures in brew uninstall protobuf due to no package. Change this to a force install to avoid the error.

* Fix spelling errors (#10717)

* Merge pull request #10200 from tonydnewell/bugfix/protobuf-7474

Fix for grpc.tools #17995 & protobuf #7474 (handle UTF-8 paths in argumentfile)

* Upgrade to kotlin 1.6

* 21.x No longer define no_threadlocal on OpenBSD

* Upgrade kokoro to Xcode 14 (#10732)

* Upgrade kokoro to Xcode 14

* Fix osx errors

* Merge pull request #10770 from protocolbuffers/googleberg-cl-480629524

Mark default instance as immutable first to avoid race during static initialization of default instances.

* Auto capitalize enums name in Ruby (#10454) (#10763)

This closes #1965.

* Edit toolchain to work with absl dep

* Bump upb to latest version after fixes applied (#10783)

* 21.x 202210180838 (#10785)

* Updating version.json and repo version numbers to: 21.8

* Update changelog

Co-authored-by: Protobuf Team Bot <protobuf-team-bot@google.com>

* Update generated protos

Co-authored-by: deannagarcia <69992229+deannagarcia@users.noreply.github.com>
Co-authored-by: Matt Fowles Kulukundis <matt.fowles@gmail.com>
Co-authored-by: Deanna Garcia <deannagarcia@google.com>
Co-authored-by: Brad Smith <brad@comstyle.com>
Co-authored-by: Jerry Berg <107155935+googleberg@users.noreply.github.com>
Co-authored-by: tison <wander4096@gmail.com>
Co-authored-by: Protobuf Team Bot <protobuf-team-bot@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants