Potential memory leak in dynamic_ot tracers #7647

Closed
lizan opened this issue Jul 19, 2019 · 25 comments
Labels: bug, stale

Comments

@lizan
Member

lizan commented Jul 19, 2019

Description:
Prior to #7555, we didn't statically link libstdc++ in tests, so the test didn't fail; it now fails under Ubuntu 18.04 with gperftools HEAPCHECK. Since we do statically link libstdc++ for the released Envoy binary, I suspect this issue has been living there for a while.

LSAN also detects the leak if we link libstdc++ statically in the ASAN build.

Some error logs caught by Bazel CI:
https://storage.googleapis.com/bazel-untrusted-buildkite-artifacts/69421b65-80ea-4551-bcc7-7094f44dbe62/test/extensions/tracers/dynamic_ot/config_test/attempt_1.log
https://storage.googleapis.com/bazel-untrusted-buildkite-artifacts/69421b65-80ea-4551-bcc7-7094f44dbe62/test/extensions/tracers/dynamic_ot/dynamic_opentracing_driver_impl_test/attempt_1.log

@lizan
Member Author

lizan commented Jul 30, 2019

cc @rnburn for OpenTracing C++ visibility.

I guess this is a bug in either libstdc++ or opentracing-cpp. The FlushSpans failure is fairly critical, as it produced incomplete JSON, and it is likely broken in production too.

There is no perfect solution at the moment; some workarounds:

  • use libc++ by passing --config=libc++ to bazel, though this requires rebuilding the dynamic_ot tracer with libc++ too, since there is no ABI compatibility between libc++ and libstdc++.
  • link libstdc++ dynamically by adding --linkopt=-lstdc++ to bazel; note that //test/exe:static_static_test will fail as a result.
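Because libc++ and libstdc++ are not ABI compatible, a host binary and a dynamically loaded tracer plugin have to be built against the same standard library. A minimal sketch, assuming the usual implementation-identifying macros (`_LIBCPP_VERSION` for libc++, `__GLIBCXX__` for libstdc++), of how either side could report which library it was compiled against; both function names are hypothetical and not part of Envoy or opentracing-cpp:

```cpp
#include <cassert>
#include <cstring>  // any standard header pulls in the library's config macros

// Hypothetical helper: names the C++ standard library this translation unit
// was compiled against.
inline const char* CompiledStdlibName() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#else
  return "unknown";
#endif
}

// Hypothetical check a host could run before trusting a plugin built
// elsewhere: both sides must name the same, known library.
inline bool StdlibsCompatible(const char* host, const char* plugin) {
  assert(host != nullptr && plugin != nullptr);
  return std::strcmp(host, plugin) == 0 && std::strcmp(host, "unknown") != 0;
}
```

Matching libraries is necessary but, per the rest of this thread, not sufficient when both the binary and the plugin link the library statically.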

@htuch
Member

htuch commented Aug 6, 2019

@rnburn friendly ping; this issue is affecting a fair few users (including at Google). From some preliminary poking around, this is not in the Envoy integration but either Dynamic OT or the mock tracer that gets loaded.

@rnburn
Contributor

rnburn commented Aug 6, 2019

The tracer plugins follow the approach described here. They statically link the standard C++ library and use an export map so that the different libraries can coexist independently.

Statically linking libstdc++ in both a binary and a DSO without an export map, or linking the binary statically and a DSO dynamically, isn't supported.

The mocktracer plugin, which was only built for testing, doesn't use an export map or static linking. If all of Envoy's tests now statically link the standard C++ library, maybe the best way to solve this is to add an additional mocktracer target that links to the existing target in opentracing-cpp but statically links libstdc++ (or libc++) and applies the export map.

@jmarantz
Copy link
Contributor

jmarantz commented Aug 6, 2019

Who owns this problem? I assume that's the person who integrated OpenTracing into Envoy?

@rnburn
Contributor

rnburn commented Aug 6, 2019

I can put in a change for this.

But to be clear, the problem was caused when the tests were changed to statically link the standard c++ library. When I originally integrated, the tests were dynamically linking to the standard c++ library and the mocktracer plugin was dynamically linking to the standard c++ library which is an approach that works fine (and is a sensible default for the mocktracer plugin).

@lizan
Member Author

lizan commented Aug 6, 2019

@rnburn

The mocktracer plugin, which was only built for testing, doesn't use an export map or static linking.

It does: the toolchain config applies to all C++ targets built within the Envoy workspace, so the mock tracer plugin statically links libstdc++ in that sense.

 ldd bazel-bin/external/io_opentracing_cpp/mocktracer/libmocktracer_plugin.so
        linux-vdso.so.1 (0x00007ffedc79c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2a9f7a5000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2a9f5a1000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a9f1b0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2a9fb4c000)
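The listing above has no `libstdc++.so` entry, which is what shows that the plugin carries its own static copy of the library. A minimal sketch of automating that check, assuming `ldd`'s usual one-dependency-per-line format where the first token is the soname or path; the function name is hypothetical:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the given `ldd` output lists a
// dynamically linked C++ standard library (libstdc++ or libc++).
// Only the first token of each line is inspected, e.g. "libm.so.6" in
// "libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x...)".
inline bool LddShowsDynamicCxxStdlib(const std::string& ldd_output) {
  std::istringstream lines(ldd_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    std::string soname;
    if (!(tokens >> soname)) continue;  // skip blank lines
    // rfind(prefix, 0) == 0 is a "starts with" test.
    if (soname.rfind("libstdc++.so", 0) == 0 || soname.rfind("libc++.so", 0) == 0) {
      return true;
    }
  }
  return false;
}
```

An empty result from this check only says the library is not loaded as a separate DSO; it does not say whether an export map was applied.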

But to be clear, the problem was caused when the tests were changed to statically link the standard c++ library. When I originally integrated, the tests were dynamically linking to the standard c++ library and the mocktracer plugin was dynamically linking to the standard c++ library which is an approach that works fine (and is a sensible default for the mocktracer plugin).

Again, the toolchain applies to both, so now both the tests AND the mocktracer plugin are statically linked to the standard C++ library (confirm it using ldd). The problem is not test specific: we now link tests exactly the same way we link the envoy-static binary, so this issue may have lived in the envoy-static binary (built on Ubuntu 18.04) even longer than it has been visible in tests.

That said, I think there is a high chance this is due to a libstdc++ bug, because doing the same with libc++ doesn't cause this.

@rnburn
Contributor

rnburn commented Aug 6, 2019

Again, the toolchain applies to both, so now both the tests AND the mocktracer plugin are statically linked to the standard C++ library (confirm it using ldd). The problem is not test specific: we now link tests exactly the same way we link the envoy-static binary, so this issue may have lived in the envoy-static binary (built on Ubuntu 18.04) even longer than it has been visible in tests.

Export maps (like what's done here https://github.com/lightstep/lightstep-tracer-cpp/blob/master/BUILD.bazel#L75 ) are a critical part of making this dynamic loading work.

So, if the linking was changed so that the test binary links libstdc++ statically and the plugin DSO also links it statically without an export map, then, yes, you'll have issues.

Without the export map, you have two copies of libstdc++, but some of the symbols will get resolved to those in the test binary and some will get resolved to those in the DSO.

The export map lets you link two versions of libstdc++ without any of the symbols mixing. See the notes on the SO post: https://stackoverflow.com/questions/47841812/how-to-support-dynamic-plugins-when-statically-linking-in-standard-libraries
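To make the mixing concrete, here is a toy model (not how `ld.so` is implemented) of the rule described above: a reference from the plugin to a default-visibility symbol binds to the first definition in the global search order, which starts with the executable, while a definition made local by a `global: ...; local: *;` export map is bound inside the plugin and cannot be interposed. The `Module` struct, the symbol names, and treating `OpenTracingMakeTracerFactory` as the one exported entry point are illustrative assumptions:

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of ELF symbol binding for a dlopen()ed plugin. Illustrative only:
// the real dynamic linker also handles symbol versions, RTLD_DEEPBIND, etc.
struct Module {
  std::set<std::string> exported;  // symbols visible in the global scope
  std::set<std::string> defined;   // every symbol the module defines
};

enum class Binding { kExecutable, kPlugin, kUnresolved };

// Resolves a reference made from inside `plugin`.
inline Binding Resolve(const std::string& symbol, const Module& executable,
                       const Module& plugin) {
  // A local (non-exported) definition is bound at static link time.
  if (plugin.defined.count(symbol) != 0 && plugin.exported.count(symbol) == 0) {
    return Binding::kPlugin;
  }
  // Otherwise the global search order starts with the executable.
  if (executable.exported.count(symbol) != 0) return Binding::kExecutable;
  if (plugin.exported.count(symbol) != 0) return Binding::kPlugin;
  return Binding::kUnresolved;
}

// Applies a "global: <names>; local: *;" style export map.
inline Module ApplyExportMap(Module m, const std::set<std::string>& global) {
  std::set<std::string> kept;
  for (const auto& s : m.exported) {
    if (global.count(s) != 0) kept.insert(s);
  }
  m.exported = kept;
  return m;
}
```

Without the map, one plugin reference can land in the executable's copy of the library while another lands in the plugin's own copy; after `ApplyExportMap`, every library reference stays inside the plugin.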

@lizan
Member Author

lizan commented Aug 7, 2019

@rnburn I just gave the export map a quick try: https://github.com/envoyproxy/envoy/compare/master...lizan:dynamic_ot?expand=1

This does fix the FlushSpans test failure from the first comment, though the heap checker still complains about the memory leak.

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //test/extensions/tracers/dynamic_ot:dynamic_opentracing_driver_impl_test
-----------------------------------------------------------------------------
WARNING: Perftools heap leak checker is active -- Performance may suffer
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from DynamicOpenTracingDriverTest
[ RUN      ] DynamicOpenTracingDriverTest.formatErrorMessage
[       OK ] DynamicOpenTracingDriverTest.formatErrorMessage (7 ms)
[ RUN      ] DynamicOpenTracingDriverTest.InitializeDriver
[       OK ] DynamicOpenTracingDriverTest.InitializeDriver (1 ms)
[ RUN      ] DynamicOpenTracingDriverTest.FlushSpans
[       OK ] DynamicOpenTracingDriverTest.FlushSpans (2 ms)
[----------] 3 tests from DynamicOpenTracingDriverTest (11 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (11 ms total)
[  PASSED  ] 3 tests.
Have memory regions w/o callers: might report false leaks
Leak check _main_ detected leaks of 145408 bytes in 2 objects
The 1 largest leaks:
*** WARNING: Cannot convert addresses to symbols in output below.
*** Reason: Cannot find 'pprof' (is PPROF_PATH set correctly?)
*** If you cannot fix this, try running pprof directly.
Leak of 145408 bytes in 2 objects allocated from:
        @ 7f8e584eb096


If the preceding stack traces are not enough to find the leaks, try running THIS shell command:

pprof /home/lizan/.cache/bazel/_bazel_lizan/47a7888a6f37751b3be101c298a81af4/sandbox/linux-sandbox/1557/execroot/envoy/bazel-out/k8-fastbuild/bin/test/extensions/tracers/dynamic_ot/dynamic_opentracing_driver_impl_test.runfiles/envoy/test/extensions/tracers/dynamic_ot/dynamic_opentracing_driver_impl_test "/tmp/dynamic_opentracing_driver_impl_test.14._main_-end.heap" --inuse_objects --lines --heapcheck  --edgefraction=1e-10 --nodefraction=1e-10 --gv

If you are still puzzled about why the leaks a
Exiting with error code (instead of crashing) because of whole-program memory leaks
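A hypothesis that the thread itself does not confirm: 145408 bytes in 2 objects is exactly 2 × 72704, and 72704 bytes matches the emergency exception-handling arena that libstdc++ (GCC's `libsupc++/eh_alloc.cc`) allocates at startup on x86_64. If each statically linked copy of libstdc++ inside a dlopen()ed DSO allocates its own arena and keeps the only pointer to it in the DSO's static data, that pointer disappears when `dlclose` unmaps the DSO, and a leak checker would then report the block. The constants below are assumptions from that reading of GCC's sources, not values reported in this issue:

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for GCC's libsupc++ emergency EH pool on x86_64 (GCC 5 and
// later). Treat them as hypotheses, not as figures reported in this issue.
constexpr std::size_t kEmergencyObjSize = 1024;        // EMERGENCY_OBJ_SIZE
constexpr std::size_t kEmergencyObjCount = 64;         // EMERGENCY_OBJ_COUNT
constexpr std::size_t kDependentExceptionSize = 112;   // sizeof(__cxa_dependent_exception)

// Arena size: room for the objects plus one dependent-exception header each.
constexpr std::size_t EmergencyArenaBytes() {
  return kEmergencyObjSize * kEmergencyObjCount +
         kEmergencyObjCount * kDependentExceptionSize;
}

// Figures from the leak report above.
constexpr std::size_t kReportedLeakBytes = 145408;
constexpr std::size_t kReportedLeakObjects = 2;

static_assert(EmergencyArenaBytes() == 72704, "65536 + 7168");
static_assert(kReportedLeakBytes == kReportedLeakObjects * EmergencyArenaBytes(),
              "consistent with one arena per unloaded libstdc++ copy");
```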

@rnburn
Contributor

rnburn commented Aug 7, 2019

Does gperftools' heap checker work for this type of usage with dlopen?

Its docs have this section

In order to catch all heap leaks, tcmalloc must be linked last into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

Could we also see what the output says with this line commented out, so that we can see symbols in the stack trace?

@lizan
Member Author

lizan commented Aug 7, 2019

LeakSanitizer (which doesn't link gperftools) fails too if you comment out BAZEL_LINKOPTS/BAZEL_LINKLIBS in .bazelrc to link libstdc++ statically.

Let me try to comment out that line.

@lizan
Member Author

lizan commented Aug 9, 2019

Hmm, if I comment out dlclose or add RTLD_NODELETE to dlopen, the test passes (in both cases).
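Both workarounds keep the plugin's image mapped until process exit. A minimal sketch of making that choice explicit with the standard `dlopen` flags, assuming a hypothetical `PluginLibrary` wrapper (this is not Envoy's dynamic_ot driver code); `RTLD_NODELETE` is an extension supported by glibc:

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: dlopen() flags for loading a tracer plugin.
// With RTLD_NODELETE, dlclose() does not unmap the library, so pointers held
// in its static data (and the heap blocks they reference) presumably stay
// visible to leak checkers at exit.
inline int PluginDlopenFlags(bool keep_resident) {
  int flags = RTLD_NOW | RTLD_LOCAL;
  if (keep_resident) flags |= RTLD_NODELETE;
  return flags;
}

// Hypothetical RAII wrapper around a dlopen() handle.
class PluginLibrary {
 public:
  PluginLibrary(const std::string& path, bool keep_resident)
      : handle_(dlopen(path.c_str(), PluginDlopenFlags(keep_resident))) {}
  ~PluginLibrary() {
    if (handle_ != nullptr) dlclose(handle_);  // a no-op unmap with RTLD_NODELETE
  }
  PluginLibrary(const PluginLibrary&) = delete;
  PluginLibrary& operator=(const PluginLibrary&) = delete;

  bool ok() const { return handle_ != nullptr; }
  void* Symbol(const char* name) const {
    assert(handle_ != nullptr);
    return dlsym(handle_, name);
  }
  // Returns dlerror() text for the most recent failure, or "" if none.
  static std::string LastError() {
    const char* err = dlerror();
    return err != nullptr ? err : "";
  }

 private:
  void* handle_;
};
```

The trade-off is that a resident plugin can never really be unloaded, which is usually acceptable for a tracer that is loaded once per process.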

@rnburn do you think this is a false positive of LSAN and gperftools profiler?

@rnburn
Contributor

rnburn commented Aug 9, 2019

I think it's likely. I've found the sanitizers to have trouble working with dlopen; I ran into this other false positive with UBSAN: google/sanitizers#911

lizan added a commit that referenced this issue Aug 28, 2019
Description:
Sanitizers don't support static linking, so this reverts #7929 and links lib(std)c++ dynamically in sanitizer runs. Addresses the test issue for #4251. Added a workaround in ASAN for #7647.

Risk Level: Low (test only)
Testing: CI, local libc++ runs
Docs Changes: N/A
Release Notes: N/A
Fixes #7928

@stale

stale bot commented Sep 8, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 8, 2019
@lizan lizan added the no stalebot label and removed the stale label Sep 13, 2019
PiotrSikora pushed a commit to PiotrSikora/envoy that referenced this issue Oct 10, 2019
* ext_authz: add metadata_context to ext_authz filter (envoyproxy#7818)

This adds the ability to specify dynamic metadata (by namespace) to
send with the ext_authz check request. This allows one filter to
specify information that can be then used in evaluating an
authorization decision.

Risk Level: Medium. Optional feature/extension of existing filter
Testing: Unit testing
Docs Changes: Inline in attribute_context.proto and ext_authz.proto

Fixes envoyproxy#7699

Signed-off-by: Ben Plotnick <plotnick@yelp.com>

* fuzz: codec impl timeout fix + speed ups (envoyproxy#7963)

Some speed-ups and validations for codec impl fuzz test:

* validate actions aren't empty (another approach would be to scrub / clean these)
* limit actions to 1024
* require oneofs

Fixes OSS-Fuzz Issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16481
Testing: local asan/libfuzzer exec/sec go from 25 to 50

Signed-off-by: Asra Ali <asraa@google.com>

* docs: more detail about tracking down deprecated features (envoyproxy#7972)

Risk Level: n/a (docs only)
Testing: n/a
Docs Changes: yes
Release Notes: no
envoyproxy#7945

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* Fix the alignment in optval of setsockopt when compiled with libc++. (envoyproxy#7958)

Description:
libc++ std::string may store short data inline, which results in memory that is not
aligned to `void*`. Use a vector instead to store the optval.

Detected by UBSAN with libc++ config. Preparation for envoyproxy#4251

Risk Level: Low
Testing: unittest locally
Docs Changes: N/A
Release Notes: N/A
Fixes envoyproxy#7968 

Signed-off-by: Lizan Zhou <lizan@tetrate.io>
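A small sketch of the alignment point in this entry: libc++'s short-string optimization can place `std::string` characters at an offset inside the string object, whereas a `std::vector<std::uint8_t>` buffer comes from `operator new`, which returns memory aligned for any fundamental type. The helper names are hypothetical; this is not the code from envoyproxy#7958:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a socket option value into a heap buffer whose
// data pointer is aligned for any fundamental type (including void*).
template <typename T>
std::vector<std::uint8_t> MakeOptval(const T& value) {
  std::vector<std::uint8_t> buf(sizeof(T));
  std::memcpy(buf.data(), &value, sizeof(T));
  return buf;
}

inline bool IsAlignedFor(const void* p, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The buffer can then be passed as `setsockopt(fd, level, name, buf.data(), buf.size())`.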

* security: some intra-entity and 3rd party embargo clarifications. (envoyproxy#7977)

* security: some intra-entity and 3rd party embargo clarifications.

These came up in the last set of CVEs.

Signed-off-by: Harvey Tuch <htuch@google.com>

* protobuf: IWYU (envoyproxy#7989)

Include What You Use fix for source/common/protobuf/message_validator_impl.h.

Signed-off-by: Andres Guedez <aguedez@google.com>

* api: add name into filter chain (envoyproxy#7966)

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

* rds: validate config in depth before update config dump (envoyproxy#7956)

Route config needs deep validation (virtual host duplication check, regex check, per-filter config validation, etc.), for which PGV wasn't sufficient.

Risk Level: Low
Testing: regression test
Docs Changes: N/A
Release Notes: N/A

Fixes envoyproxy#7939

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* tls: maintain a free slot index set in TLS InstanceImpl to allocate in O(1… (envoyproxy#7979)

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
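The title of this entry is truncated in the source. As a rough sketch of the idea it names (a hypothetical `SlotAllocator`, not Envoy's TLS `InstanceImpl`): keep released indices in a free list so that allocation reuses one without scanning the slot vector. A stack is used here because it gives O(1) reuse:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot allocator: freed indices are pushed onto a stack, so the
// next allocation reuses one in O(1) instead of scanning for a hole.
class SlotAllocator {
 public:
  std::uint32_t Allocate() {
    if (!free_.empty()) {
      std::uint32_t index = free_.back();
      free_.pop_back();
      in_use_[index] = true;
      return index;
    }
    in_use_.push_back(true);  // amortized O(1) growth
    return static_cast<std::uint32_t>(in_use_.size() - 1);
  }

  void Release(std::uint32_t index) {
    assert(index < in_use_.size() && in_use_[index]);
    in_use_[index] = false;
    free_.push_back(index);
  }

  std::size_t capacity() const { return in_use_.size(); }

 private:
  std::vector<bool> in_use_;         // one entry per slot ever created
  std::vector<std::uint32_t> free_;  // released indices available for reuse
};
```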

* redis: handle invalid ip address from cluster slots and added tests (envoyproxy#7984)

Signed-off-by: Henry Yang <hyang@lyft.com>

* protobuf: report field numbers for unknown fields. (envoyproxy#7978)

Since binary proto won't have field names, report at least the field
numbers, as per
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.unknown_field_set#UnknownField.

Also fix minor typo encountered while doing this work.

Risk level: Low
Testing: Unit tests added/updated.

Fixes envoyproxy#7937

Signed-off-by: Harvey Tuch <htuch@google.com>

* Content in envoy docs does not cover whole page (envoyproxy#7993)

Signed-off-by: Manish Kumar <manishjpiet@gmail.com>

* stats: Add option to switch between fake and real symbol-tables on the command-line. (envoyproxy#7882)

* Add option to switch between fake and real symbol-tables on the command-line.

Signed-off-by: Joshua Marantz <jmarantz@google.com>

* api config: add build rules for go protos (envoyproxy#7987)

Some BUILD files are missing build rules to generate go protos. envoyproxy/go-control-plane depends on these protos, so they should be exposed publicly. Added build rules to generate *.pb.go files.

Risk Level: Low
Testing: These rules were copied to google3 and tested internally. Unfortunately, I am having a bit of trouble with bazel build directly on these targets ("Package is considered deleted due to --deleted_packages"). Please let me know if there is a better way to test this change.

Signed-off-by: Teju Nareddy <nareddyt@google.com>

* test: don't use <experimental/filesystem> on macOS. (envoyproxy#8000)

Xcode 11 requires at least macOS 10.15 (upcoming) in order to use
either <experimental/filesystem> or C++17 <filesystem>.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* event: adding the capability of creating an alarm with a given scope (envoyproxy#7920)

Precursor to envoyproxy#7782
Adding scope tracking functionality to the basic alarm functions.

Risk Level: Medium (should be a no-op but is a large enough refactor)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* ext authz: add dns san support for ext authz service (envoyproxy#7948)

Adds support for DNS SAN in ext authz peer validation

Risk Level: Low
Testing: Added
Docs Changes: Added
Release Notes: N/A

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* accesslog: don't open log file with read flag (envoyproxy#7998)

Description:
File access log shouldn't need read access for a file.

Risk Level: Low
Testing: local in mac, CI
Docs Changes:
Release Notes:
Fixes envoyproxy#7997

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* protobuf: towards unifying PGV, deprecated and unknown field validation. (envoyproxy#8002)

This is part of envoyproxy#7980; basically, we want to leverage the recursive pass
that already exists for the deprecated check. This PR does not implement
the recursive behavior yet for unknown fields though, because there is a
ton of churn, so this PR just has the mechanical bits. We switch
plumbing of validation visitor into places such as anyConvert() and
instead pass this to MessageUtil::validate.

There are a bunch of future followups planned in additional PRs:
* Combine the recursive pass for unknown/deprecated check in
  MessageUtil::validate().
* Add mitigation for envoyproxy#5965 by copying to a temporary before recursive
  expansion.
* [Future] consider moving deprecated reporting into a message
  validation visitor handler.

Risk level: Low
Testing: Some new //test/common/protobuf::utility_test unit test.

Signed-off-by: Harvey Tuch <htuch@google.com>

* http: forwarding x-forwarded-proto from trusted proxies (envoyproxy#7995)

Trusting the x-forwarded-proto header from trusted proxies.
If Envoy is operating as an edge proxy but has a trusted hop in front, the trusted proxy should be allowed to set x-forwarded-proto and its x-forwarded-proto should be preserved.
Guarded by envoy.reloadable_features.trusted_forwarded_proto, default on.

Risk Level: Medium (L7 header changes) but guarded
Testing: new unit tests
Docs Changes: n/a
Release Notes: inline
Fixes envoyproxy#4496

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* build: adding an option to hard-fail when deprecated config is used. (envoyproxy#7962)

Adding a build option to default all deprecated protos off, and using it on the debug build.

Risk Level: Low
Testing: new UT
Docs Changes: inline
Release Notes: n/a
Fixes envoyproxy#7548

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* envoy_cc_library: add export of foo_with_external_headers (envoyproxy#8005)

Add a parallel native.cc_library to envoy_cc_library
for external projects that consume Envoy's libraries. This allows the consuming
project to disambiguate overlapping include paths when repository overlaying is used,
as it can now include envoy headers via external/envoy/...

Risk Level: Low
Testing: N/A

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

* ci: add fuzz test targets to ci (envoyproxy#7949)

Builds fuzz targets with asan+libfuzzer and runs them against their corpora. Our native bazel builds work; this PR integrates the asan+libfuzzer builds into CI. The fuzz target binaries will be in your envoy docker build directory.

Invoke with the following for all fuzz targets, or a specified one.
./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.fuzz'
./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.fuzz //test/common/common:utility_fuzz_test'

Risk level: low
Signed-off-by: Asra Ali asraa@google.com

Signed-off-by: Asra Ali <asraa@google.com>

* tls: support BoringSSL private key async functionality (envoyproxy#6326)

This PR adds a BoringSSL private key API abstraction, as discussed in envoyproxy#6248. All comments and discussion are welcome to get the API sufficient for most private key API tasks.

The PR contains the proposed API and shows how it can be used from ssl_socket.h. There is also some code showing how the PrivateKeyMethodProvider comes from the TLS certificate config. Two example private key method providers are included in the tests.

Description: tls: support BoringSSL private key async functionality
Risk Level: medium
Testing: two basic private key provider implementation
Docs Changes: TLS arch doc, cert.proto doc

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>

* use SymbolTableCreator rather than fakes in a few stray places. (envoyproxy#8006)

stats: use SymbolTableCreator rather than fakes in a few stray places. (envoyproxy#8006)

Signed-off-by: Joshua Marantz <jmarantz@google.com>

* [router] Add SRDS configUpdate impl (envoyproxy#7451)

This PR contains changes on the xRDS side for SRDS impl, cribbed from http://go/gh/stevenzzzz/pull/8/files#diff-2071ab0887162eac1fd177e89d83175a

* Add onConfigUpdate impl for SRDS subscription
* Remove scoped_config_manager as it's not used now.
* Move ScopedConfigInfo to scoped_config_impl.h/cc
* Add a hash to scopeKey and scopeKeyFragment, so we can look up scopekey by hash value in constant time when SRDS has many scopes.
* Add an initManager parameter to the RDS createRdsRouteConfigProvider API interface: when creating a RouteConfigProvider after the listener/server has warmed up, we need to specify a different initManager than the one from factoryContext to avoid an assertion failure. See related: envoyproxy#7617

This PR only latches a SRDS provider into the connection manager, the "conn manager using SRDS to make route decision" plus integration tests will be covered in a following PR.

Risk Level: LOW [not fully implemented].
Testing: unit tests

Signed-off-by: Xin Zhuang <stevenzzz@google.com>

* Fix version history (envoyproxy#8021)

Follow-up for envoyproxy#7995.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

* tools: sync tool for envoyproxy/assignable team. (envoyproxy#8015)

Bulk update of team to match envoyproxy organization. While at it, cleaned up some venv stuff in
shell_utils.sh.

Risk level: Low
Testing: Synced 157 members from envoyproxy to envoyproxy/assignable.

Signed-off-by: Harvey Tuch <htuch@google.com>

* redis: fix onHostHealthUpdate got called before the cluster is resolved. (envoyproxy#8018)

Signed-off-by: Henry Yang <hyang@lyft.com>

* api/build: migrate UDPA proto tree to external cncf/udpa repository. (envoyproxy#8017)

This is a one-time movement of all UDPA content from envoyproxy/envoy to
cncf/udpa. The permanent home of UDPA will be
https://github.com/cncf/udpa.

Risk level: Low
Testing: Added UDPA service entry to build_test.

Signed-off-by: Harvey Tuch <htuch@google.com>

* http: tracking active session under L7 timers (envoyproxy#7782)

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* upstream: remove thread local cluster after triggering call backs (envoyproxy#8004)

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* upstream: Introducing close_connections_on_host_set_change property (envoyproxy#7675)

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>

* upstream: delete stale TODO (envoyproxy#8028)

This was fixed in envoyproxy#7877

Signed-off-by: Matt Klein <mklein@lyft.com>

* Enhance comment about MonotonicTime (envoyproxy#8011)

Depending on the execution environment in which Envoy is run, some of
the assumptions about the clock may not hold as previously commented.
With some sandboxing technologies the clock does not reference the
machine boot time but the sandbox boot time. This invalidates the
assumption that the first update in the cluster_manager will most likely
fall outside the window, and it ends up showing non-intuitive behavior
that is difficult to catch.
This PR simply adds a comment that will allow the reader to consider
this option while reading the code.

Signed-off-by: Flavio Crisciani <f.crisciani@gmail.com>

* build: some missing dep fixups for Google import. (envoyproxy#8026)

Signed-off-by: Harvey Tuch <htuch@google.com>

* introduce safe regex matcher based on re2 engine (envoyproxy#7878)

The libstdc++ std::regex implementation is not safe in all cases
for user provided input. This change deprecates the use of std::regex
in all user facing paths and introduces a new safe regex matcher with
an explicitly configurable engine, right now limited to Google's re2
regex engine. This is not a drop in replacement for std::regex as not
all language features are supported. As such we will go through a
deprecation period for the old regex engine.

Fixes envoyproxy#7728

Signed-off-by: Matt Klein <mklein@lyft.com>

* docs: reorganize configuration tree (envoyproxy#8027)

This is similar to what I did for the arch overview a while ago as
this section is also getting out of control.

Signed-off-by: Matt Klein <mklein@lyft.com>

* build: missing regex include. (envoyproxy#8032)

Signed-off-by: Harvey Tuch <htuch@google.com>

* [headermap] speedup for appending data (envoyproxy#8029)

For debug builds, performance testing and fuzzers reveal that when appending to a header, we scan both the existing value and the data to append for invalid characters. This PR moves the validation check to just the data that is appended, to avoid hangups on re-scanning long header values multiple times.

Testing: Added corpus entry that reveals time spent in validHeaderString

Signed-off-by: Asra Ali <asraa@google.com>
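A minimal sketch of the change described in this entry, with hypothetical names: validate only the bytes being appended rather than the whole accumulated value, since the existing value was already validated when it was stored. The validity rule here (reject NUL, CR and LF) is an assumption for illustration, not Envoy's `validHeaderString`:

```cpp
#include <cassert>
#include <string>

// Assumed validity rule, for illustration only: no NUL, CR or LF bytes.
inline bool IsValidHeaderChunk(const std::string& data) {
  for (char c : data) {
    if (c == '\0' || c == '\r' || c == '\n') return false;
  }
  return true;
}

// Hypothetical header value that stays valid by construction.
class HeaderValue {
 public:
  // Scans only the incoming bytes, so a long value is not re-scanned on
  // every append; value_ was already validated when it was stored.
  bool Append(const std::string& data, const std::string& separator = ",") {
    if (!IsValidHeaderChunk(data) || !IsValidHeaderChunk(separator)) return false;
    if (!value_.empty()) value_ += separator;
    value_ += data;
    return true;
  }

  const std::string& value() const { return value_; }

 private:
  std::string value_;
};
```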

* eds: avoid sending too many ClusterLoadAssignment requests (envoyproxy#7976)

While secondary clusters are being initialized, for each initialized cluster a ClusterLoadAssignment
request is sent to Istio Pilot with the cluster's name appended to the request's resource_names
list. With a huge number of clusters (e.g. 10k clusters), this behavior slows down Envoy's
initialization and consumes a ton of memory. This change pauses the ADS mux for ClusterLoadAssignment to avoid that.

Risk Level: Medium
Testing: tiny change, no test case added

Fixes envoyproxy#7955

Signed-off-by: lhuang8 <lhuang8@ebay.com>

* Set the bazel version to 0.28.1 explicitly (envoyproxy#8037)

In theopenlab/openlab-zuul-jobs#622, OpenLab added the ability to set bazel to a specific version explicitly. This patch adds the bazel role for the envoy job.

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

* Read_policy is not set correctly. (envoyproxy#8034)

Add more integration test and additional checks in the unit tests.

Signed-off-by: Henry Yang <hyang@lyft.com>

* admin: fix /server_info hot restart version (envoyproxy#8022)

Signed-off-by: Matt Klein <mklein@lyft.com>

* test: adding debug hints for integration test config failures (envoyproxy#8038)

Risk Level: n/a (test only)
Testing: manual
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* udp_listener: refactor ActiveUdpListener creation (envoyproxy#7884)

Signed-off-by: Dan Zhang <danzh@google.com>

* accesslog: implement TCP gRPC access logger (envoyproxy#7941)

Description:
Initial implementation for TCP gRPC access logger.

Risk Level: Low (extension only)
Testing: integration test
Docs Changes: Added
Release Notes: Added

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* tracing: add OpenCensus agent exporter support to OpenCensus driver. (envoyproxy#8023)

Signed-off-by: Emil Mikulic <g-easy@users.noreply.github.com>

* Exporting platform_impl_lib headers (envoyproxy#8045)

This allows consuming projects using repository overlaying to disambiguate overlapping include paths when it comes to platform_impl.h by going through envoy/external/...

Addendum to envoyproxy#8005

Risk Level: Low
Testing: N/A

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

* access_log: minimal log file error handling (envoyproxy#7938)

Rather than ASSERT for a reasonably common error condition
(e.g. disk full) record a stat that indicates log file writing
failed. Also fixes a test race condition.

Risk Level: low
Testing: added stats checks
Docs Changes: documented new stat
Release Notes: updated

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* tracing: add grpc-status and grpc-message to spans (envoyproxy#7996)

Signed-off-by: Caleb Gilmour <caleb.gilmour@datadoghq.com>

* fuzz: add bounds to stats flush interval (envoyproxy#8043)

Add PGV bounds to the stats flush interval (greater than 1ms and less than 5000ms) to prevent Envoy from hanging from too small of a flush time.

Risk Level: Low
Testing: Corpus Entry added
Fixes OSS-Fuzz issue
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16300

Signed-off-by: Asra Ali <asraa@google.com>
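The PGV rule in this entry amounts to a range check. A hedged sketch of an equivalent check in plain C++, taking "greater than 1ms and less than 5000ms" literally as exclusive bounds; the function name and the use of `std::chrono::milliseconds` are assumptions, since the real validation is generated from the proto annotations:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical equivalent of the PGV bounds on the stats flush interval:
// strictly greater than 1 ms and strictly less than 5000 ms.
inline bool ValidStatsFlushInterval(std::chrono::milliseconds interval) {
  using std::chrono::milliseconds;
  return interval > milliseconds(1) && interval < milliseconds(5000);
}
```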

* Improve tools/stack_decode.py (envoyproxy#8041)

Adjust tools/stack_decode.py to more obviously be Python 2 (not 3), and to work on stack traces that don't include the symbol names.

Risk Level: Low
Testing: Manually tested on a stack trace that one of our users sent us

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

* build: tell googletest to use absl stacktrace (envoyproxy#8047)

Description:
https://github.com/google/googletest/blob/d7003576dd133856432e2e07340f45926242cc3a/BUILD.bazel#L42

Risk Level: Low (test only)
Testing: CI
Docs Changes:
Release Notes:

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* Update references to local scripts to enable using build container for filter repos (envoyproxy#7907)

Description: This change enables using run_envoy_docker.sh to build envoy-filter-example
Risk Level: Low
Testing: Manually tested building envoy-filter-example using: envoy/ci/run_envoy_docker.sh './ci/do_ci.sh build'
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Santosh Kumar Cheler <scheler@arubanetworks.com>

* bazel: patch gRPC to fix Envoy builds with glibc v2.30 (envoyproxy#7971)

Description: the latest glibc (v2.30) declares its own `gettid()` function (see [0]) and this creates a naming conflict in gRPC which has a function with the same name.

Apply to gRPC [a patch](grpc/grpc#18950) which renames `gettid()` to `sys_gettid()`.

[0] https://sourceware.org/git/?p=glibc.git;a=commit;h=1d0fc213824eaa2a8f8c4385daaa698ee8fb7c92

Risk Level: low
Testing: unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>

* build: link C++ stdlib dynamically in sanitizer runs (envoyproxy#8019)

Description:
Sanitizers don't support static linking, so this reverts envoyproxy#7929 and links lib(std)c++ dynamically in sanitizer runs. Addresses the test issue for envoyproxy#4251. Added a workaround in ASAN for envoyproxy#7647.

Risk Level: Low (test only)
Testing: CI, local libc++ runs
Docs Changes: N/A
Release Notes: N/A
Fixes envoyproxy#7928

* test: cleaning up test runtime (envoyproxy#8012)

Using the new runtime utility to clean up a bunch of test gorp. Yay utils!

Risk Level: n/a (test only)
Testing: tests pass
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* test: improved coverage and handling of deprecated config (envoyproxy#8057)

Making ENVOY_DISABLE_DEPRECATED_FEATURES work for unit tests without runtime configured.
Fixing up a handful of unit tests to remove legacy code or use the handy
DEPRECATED_FEATURE_TEST macro
Adding back coverage of cors.enabled() and redis.catch_all_route()

Risk Level: Low (test only)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a
Fixes envoyproxy#8013
Fixes envoyproxy#7548

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* [Docs typo] Remote Executioon -> Remote Execution (envoyproxy#8061)

Fixes misspelling of `Executioon` -> `Execution`

Signed-off-by: Colin Schoen <schoen@yelp.com>

* api: Fix duplicate java_outer_classname declarations (envoyproxy#8059)

The java_outer_classname is unintentionally duplicated in the new
udp_listener_config and regex proto files. This changes them to unique
names that match the predominant naming scheme.

Signed-off-by: Bryce Anderson <banderson@twitter.com>

* http: making the behavior of the response Server header configurable (envoyproxy#8014)

Default behavior remains unchanged, but now Envoy can override, override iff there's no server header from upstream, or always leave the server header (or lack thereof) unmodified.

Risk Level: low (config guarded change)
Testing: new unit tests
Docs Changes: n/a
Release Notes: inline
Fixes envoyproxy#6716
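A minimal sketch of the three Server header behaviors described above, assuming a simplified header map keyed by lower-case names. `ServerHeaderMode`, `Headers`, and `ApplyServerHeader` are hypothetical names for illustration, not Envoy's API or the exact config enum values.

```cpp
#include <map>
#include <string>

// Illustrative modes for the three behaviors: always override, override only
// if upstream sent no Server header, or leave it untouched.
enum class ServerHeaderMode { Overwrite, AppendIfAbsent, PassThrough };

// Simplified header map keyed by lower-case header names (an assumption).
using Headers = std::map<std::string, std::string>;

void ApplyServerHeader(Headers& response, ServerHeaderMode mode, const std::string& server_name) {
  switch (mode) {
  case ServerHeaderMode::Overwrite:
    response["server"] = server_name;  // replace whatever upstream sent
    break;
  case ServerHeaderMode::AppendIfAbsent:
    response.emplace("server", server_name);  // no-op if upstream already set it
    break;
  case ServerHeaderMode::PassThrough:
    break;  // leave the header (or its absence) unmodified
  }
}
```

The default in the sketch would be `Overwrite`, matching the unchanged default behavior.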

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* use bazelversion for filter-example too (envoyproxy#8069)

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* grpc-httpjson-transcode: Update for RFC2045 support (envoyproxy#8065)

RFC2045 (MIME) Base64 decoding support has been fixed upstream

Description: The gRPC transcoding filter has been updated to support RFC2045 (MIME) based inputs for the protobuf type "Bytes". This is important since Base64 inputs often use the RFC2045 format.
Also see: grpc-ecosystem/grpc-httpjson-transcoding#34

Risk Level: Low
Testing: Integration / Manual Tests
Docs Changes: N/A
Release Notes: N/A
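RFC 2045 (MIME) Base64 wraps encoded output into lines of at most 76 characters, so a decoder has to tolerate line breaks inside the data. The sketch below is a hypothetical standalone decoder (`DecodeMimeBase64`), not the grpc-httpjson-transcoding code: it skips whitespace, rejects other characters outside the Base64 alphabet, and stops at padding.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Decodes Base64 while skipping the CR/LF line breaks (and other whitespace)
// that MIME encoders insert. Any other non-alphabet character is an error.
std::optional<std::string> DecodeMimeBase64(const std::string& in) {
  auto value = [](char c) -> int {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
  };
  std::string out;
  uint32_t buffer = 0;  // accumulated bits; only the low bits are ever read
  int bits = 0;         // number of pending, not-yet-emitted bits
  for (char c : in) {
    if (c == '\r' || c == '\n' || c == ' ' || c == '\t') continue;  // MIME line breaks
    if (c == '=') break;  // padding marks the end of the data
    const int v = value(c);
    if (v < 0) return std::nullopt;
    buffer = (buffer << 6) | static_cast<uint32_t>(v);
    bits += 6;
    if (bits >= 8) {  // a full byte is available
      bits -= 8;
      out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
    }
  }
  return out;
}
```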

Signed-off-by: Hans Viken Duedal <hans.duedal@visma.com>

* stats: Clean up all calls to Scope::counter() et al in production code. (envoyproxy#7842)

* Convert a few more counter() references to use the StatName interface.

Signed-off-by: Joshua Marantz <jmarantz@google.com>

* tls_inspector: inline the recv in the onAccept (envoyproxy#7951)

Description:
As discussed in envoyproxy#7864, this PR attempts to peek the socket when onAccept is invoked.
Usually the client_hello packet is already in the buffer when tls_inspector peeks, so we can save a poll cycle for this connection.

Once we agree on the solution, I can apply it to http_inspector as well.

The latency improvement is expected to be largest when the poll cycle is long.

Benchmark:
Env:
hardware Intel(R) Xeon(R) CPU @ 2.20GHz
envoy: concurrency = 1, tls_inspector as listener filter. One tls filter chain, and one plain text filter chain.
load background: a [sniper](https://github.com/lubia/sniper) client with concurrency = 5 hitting the server with TLS handshakes through the tls filter chain. The qps is about 170/s.
Another load client with concurrency = 1 hits the plain text filter chain, which still goes through tls_inspector.

This PR: 
TransactionTime:              10.3 - 11.0 ms(mean)
Master                
TransactionTime:              12.3 - 12.8 ms(mean)

Risk Level: Med (ActiveSocket code is affected to adopt the side effect of onAccept)
Testing: 
Docs Changes:
Release Notes:
Fixes envoyproxy#7864
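The idea of peeking at accept time can be sketched with a non-blocking `recv` using `MSG_PEEK`, which reads the bytes without consuming them, so the TLS inspector can still parse them later. `PeekOnAccept` and `PeekResult` are hypothetical names, and this assumes Linux/POSIX sockets; it is not Envoy's listener-filter code.

```cpp
#include <sys/socket.h>
#include <sys/types.h>

#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PeekResult { Data, WouldBlock, Closed, Error };

// Peeks at up to `max_bytes` (must be > 0) already buffered on an accepted
// socket without consuming them and without blocking. `Data` means the caller
// can inspect the bytes (e.g. a ClientHello) immediately; `WouldBlock` means it
// should fall back to waiting for read readiness on the next poll cycle.
PeekResult PeekOnAccept(int fd, std::vector<uint8_t>& out, size_t max_bytes) {
  out.resize(max_bytes);
  const ssize_t n = recv(fd, out.data(), out.size(), MSG_PEEK | MSG_DONTWAIT);
  const int err = errno;  // save before any other call can change it
  if (n > 0) {
    out.resize(static_cast<size_t>(n));
    return PeekResult::Data;
  }
  out.clear();
  if (n == 0) {
    return PeekResult::Closed;  // peer closed before sending anything
  }
  if (err == EAGAIN || err == EWOULDBLOCK) {
    return PeekResult::WouldBlock;
  }
  return PeekResult::Error;
}
```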

Signed-off-by: Yuchen Dai <silentdai@gmail.com>

* Fixes gcc 8.3.1 build failure due to FilterChainBenchmarkFixture::SetUp hiding base-class virtual functions (envoyproxy#8071)

Description: I'm seeing "bazel-out/k8-fastbuild/bin/external/com_github_google_benchmark/_virtual_includes/benchmark/benchmark/benchmark.h:1071:16: error: 'virtual void benchmark::Fixture::SetUp(benchmark::State&)' was hidden" when running tests. This resolves the issue with hiding of the base-class functions.
Risk Level: low
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>

* test: fix ups for various deprecated fields (envoyproxy#8068)

Takeaways: we've lost the ability to do empty regex (which was covered in router tests and is proto constraint validated on the new safe regex) as well as negative lookahead (also covered in tests) along with a host of other things conveniently documented as not supported here: https://github.com/google/re2/wiki/Syntax

Otherwise, this splits up a bunch of tests and duplicates and tags others, so cleaning up once we can finally remove deprecated fields again will be an order of magnitude easier.

Also fixing a dup relnote from envoyproxy#8014

Risk Level: n/a (test only)
Testing: yes. yes there is.
Docs Changes: no
Release Notes: no

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* include: add log dependency header to connection_handler.h (envoyproxy#8072)

Signed-off-by: Teju Nareddy <nareddyt@google.com>

* quiche: Update QUICHE dep (envoyproxy#8044)

Update QUICHE tar ball to 4abb566fbbc63df8fe7c1ac30b21632b9eb18d0c.
Add implementations for newly added APIs.

Risk Level: low
Testing: using quiche build in tests.
Part of envoyproxy#2557

Signed-off-by: Dan Zhang <danzh@google.com>

* tools: deprecated field check in Route Checker tool (envoyproxy#8058)

We need a way to run the deprecated field check on the RouteConfiguration. Today the schema check tool validates the bootstrap config. This change will help achieve similar functionality on routes served from rds.
Risk Level: Low
Testing: Manual testing
Docs Changes: included
Release Notes: included

Signed-off-by: Jyoti Mahapatra <jmahapatra@lyft.com>

* tracing: Add support for sending data in Zipkin v2 format (envoyproxy#6985)

Description: This patch supports sending a list of spans as JSON v2 and protobuf message over HTTP to Zipkin collector. [Sending protobuf](https://github.com/openzipkin/zipkin-api/blob/0.2.1/zipkin.proto) is considered to be more efficient than JSON, even compared to the v2's JSON (openzipkin/zipkin#2589 (comment)). This is an effort to rework envoyproxy#6798.

The approach is to serialize the v1 model to both v2 JSON and protobuf.

Risk Level: Low, since the default is still HTTP-JSON v1 based on https://github.com/openzipkin/zipkin-api/blob/0.2.2/zipkin-api.yaml.
Testing: Unit testing, manual integration test with real Zipkin collector server.
Docs Changes: Added
Release Notes: Added
Fixes: envoyproxy#4839

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: José Carlos Chávez <jcchavezs@gmail.com>

* Route Checker tool Fix code coverage bug in proto based schema (envoyproxy#8101)

Signed-off-by: Jyoti Mahapatra <jmahapatra@lyft.com>

* [hcm] Add scoped RDS routing into HCM (envoyproxy#7762)

Description: add Scoped RDS routing logic into HCM. Changes include:

* in ActiveStream constructor latch a ScopedConfig impl to the activeStream if SRDS is enabled
* in the beginning of ActiveStream::decodeHeaders(headers, end_stream), get routeConfig from latched ScopedConfig impl.

This PR is the 3rd in the srds impl PR chain: [envoyproxy#7704, envoyproxy#7451, this].

Risk Level: Medium
Testing: unit test and integration tests.
Release Notes: Add scoped RDS routing support into HCM.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>

* owners: add @asraa and @lambdai to OWNERS. (envoyproxy#8110)

* @asraa is joining Envoy OSS security team.

* @lambdai is joining Friends of Envoy as v2 xDS point.

Signed-off-by: Harvey Tuch <htuch@google.com>

* protobuf: recursively validate unknown fields. (envoyproxy#8094)

This PR unifies the recursive traversal of deprecated fields with that of unknown fields. It doesn't
deal with moving to a validator visitor model for deprecation; this would be a nice cleanup that we
track at envoyproxy#8092.

Risk level: Low
Testing: New nested unknown field test added.

Fixes envoyproxy#7980

Signed-off-by: Harvey Tuch <htuch@google.com>

* Fuzz reuse (envoyproxy#8119)

This PR allows the envoy_cc_fuzz_test rule to be used when pulling in Envoy as an external repository, which can be useful when you're writing filters for Envoy and want to reuse the fuzzing infrastructure Envoy has already built. Other rules already allow for this (see envoy_cc_test in the same file, for example).

Risk Level: Low
Testing:

Testing the Old Rule Still Works

It is possible to test the old rules still work (even without specifying a repository), by simply choosing your favorite fuzz test, and choosing to run bazel test on it. For example: bazel test //test/common/router:header_parser_fuzz_test. Any envoy_cc_fuzz_test rule should do.

Testing New Rules Work

I've done testing inside my own repository, but if you want to create your own test rule you can probably do the following in envoy-filter-example:

Checkout envoy-filter-example, and update the envoy submodule to this pr.
Follow the directions in test/fuzz/README.md to define an envoy_cc_fuzz_test rule. Make sure to add the line repository = "@envoy", which is the new argument being added.
You should be able to run the fuzz test.

Signed-off-by: Cynthia Coan <ccoan@instructure.com>

* Set INCLUDE_DIRECTORIES so libcurl can find local urlapi.h (envoyproxy#8113)

Fixes envoyproxy#8112

Signed-off-by: John Millikin <jmillikin@stripe.com>

* cleanup: move test utility methods in ScopedRdsIntegrationTest to base class HttpIntegrationTest (envoyproxy#8108)

Fixes envoyproxy#8050
Risk Level: LOW [refactor only]

Signed-off-by: Xin Zhuang <stevenzzz@google.com>

* upstream: fix invalid access of ClusterMap iterator during warming cluster modification (envoyproxy#8106)

Risk Level: Medium
Testing: New unit test added. Fix verified via --config=asan.

Signed-off-by: Andres Guedez <aguedez@google.com>

* api:Add a flag to disable overprovisioning in ClusterLoadAssignment (envoyproxy#8080)

* api:Add a flag to disable overprovisioning in ClusterLoadAssignment

Signed-off-by: Jie Chen <jiechen@google.com>

* api:Add [#next-major-version and [#not-implemented-hide to the comment
for field of disable_overprovisioning in ClusterLoadAssignment
Signed-off-by: Jie Chen <jiechen@google.com>

* api:Refine comments for the new added bool flag as suggested.
Signed-off-by: Jie Chen <jiechen@google.com>

* api: clone v2[alpha] to v3alpha. (envoyproxy#8125)

This patch establishes a v3alpha baseline API, by doing a simple copy of
v2[alpha] dirs and some sed-style heuristic fixups of BUILD dependencies
and proto package namespaces.

The objective is to provide a baseline which we can compare the output from
tooling described in envoyproxy#8083 in later PRs, providing smaller visual diffs.

The core philosophy of the API migration is that every step will be
captured in a script (at least until the last manual steps),
api/migration/v3alpha.sh. This script will capture deterministic
migration steps, allowing v2[alpha] to continue to be updated until we
finalize v3.

There are likely to be significant changes, e.g. in addition to the work
scoped for v3, we might want to reduce the amount of API churn by
referring back to v2 protos where it makes sense. This will be done via
tooling in later PRs.

Part of envoyproxy#8083.

Risk level: Low
Testing: build @envoy_api//...

Signed-off-by: Harvey Tuch <htuch@google.com>

* dubbo: Fix heartbeat packet parsing error (envoyproxy#8103)

Description: 
The heartbeat packet may carry data, but it was treated as having no data when the heartbeat was processed, causing that data to remain in the buffer.

Risk Level: low 
Testing: Existing unit test
Docs Changes: N/A
Release Notes: N/A
Fixes envoyproxy#7970 
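A sketch of why the heartbeat body matters, assuming the common Dubbo header layout (16 bytes, event flag `0x20` in the third byte, big-endian body length in the last four bytes). `ParseFrame` and `FrameInfo` are hypothetical names and magic-number validation is omitted; the point is that the frame size always includes the declared body length, heartbeat or not.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed Dubbo header layout (16 bytes):
//   [0-1] magic 0xdabb, [2] flags, [3] status,
//   [4-11] request id, [12-15] body length (big-endian).
constexpr size_t kHeaderSize = 16;
constexpr uint8_t kEventFlag = 0x20;  // set on heartbeat (event) frames

struct FrameInfo {
  bool heartbeat;
  size_t total_size;  // header + body: bytes to drain once the frame is handled
};

// Returns frame info once a complete frame is buffered, std::nullopt otherwise.
// The body length is honored for heartbeats too, so their payload is drained
// instead of being left behind in the buffer.
std::optional<FrameInfo> ParseFrame(const std::vector<uint8_t>& buf) {
  if (buf.size() < kHeaderSize) {
    return std::nullopt;
  }
  const uint32_t body_len = (uint32_t{buf[12]} << 24) | (uint32_t{buf[13]} << 16) |
                            (uint32_t{buf[14]} << 8) | uint32_t{buf[15]};
  const size_t total = kHeaderSize + body_len;
  if (buf.size() < total) {
    return std::nullopt;  // wait for the rest of the body
  }
  return FrameInfo{(buf[2] & kEventFlag) != 0, total};
}
```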

Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>

* stats: Shared cluster isolated stats (envoyproxy#8118)

* shared the main symbol-table with the isolated stats used for cluster info.

Signed-off-by: Joshua Marantz <jmarantz@google.com>

* protodoc: upgrade to Python 3. (envoyproxy#8129)

Risk level: Low
Testing: Rebuilt docs, manual inspection of some example generated files.

Signed-off-by: Harvey Tuch <htuch@google.com>

* protodoc: single source-of-truth for doc protos. (envoyproxy#8132)

This avoids having to list new docs protos in both docs/build.sh and
api/docs/BUILD. This technical debt cleanup is helpful in v3 proto work
to simplify collecting proto artifacts from a Bazel aspect.

Risk level: Low
Testing: docs/build.sh, visual inspection of docs.

Signed-off-by: Harvey Tuch <htuch@google.com>

* api: organize go_proto_libraries (envoyproxy#8003)

Fixes envoyproxy#7982

Defines a package level proto library and its associated internal go_proto_library.

Deletes all existing api_go_proto_library, api_go_grpc_library, and go_package annotations in protos (they are not required and pollute the sources).

I deliberately avoided touching anything under udpa since it's being moved to another repository.

Risk Level: low
Testing: build completes

Signed-off-by: Kuat Yessenov <kuat@google.com>

* api: straggler v2alpha1 -> v3alpha clone. (envoyproxy#8133)

These were missed in envoyproxy#8125.

Signed-off-by: Harvey Tuch <htuch@google.com>

* docs: remove extraneous escape (envoyproxy#8150)

Old versions of bash (e.g. on macOS) don't handle ${P/:/\/} the same way as modern versions. In particular, the expanded parameter on macOS includes a backslash, causing subsequent use of the string as a filename to include a slash (/) instead of treating the slash as a directory separator. Both versions of bash accept ${P/://} as a way to substitute : with /. Verified that this change does not alter the generated docs when running under Linux.

Risk Level: low
Testing: generated docs under linux & macOS

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* Do not 503 on Upgrade: h2c instead remove the header and ignore. (envoyproxy#7981)

Description: When a request comes in on HTTP/1 with "upgrade: h2c", the current behavior is to return a 503. Instead, we should remove the header, ignore the upgrade, and continue processing the request as HTTP/1.
Risk Level: Medium
Testing: Unit test. Hand test with github.com/rdsubhas/java-istio client and server locally.
Docs Changes: N/A
Release Notes:  http1: ignore and remove Upgrade: h2c.
Fixes istio/istio#16391
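A minimal sketch of the new behavior, assuming a simplified header map with lower-case keys. `StripH2cUpgrade` is a hypothetical helper, not Envoy's codec code, and it only handles the `Upgrade` header itself.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Simplified header map keyed by lower-case header names (an assumption).
using Headers = std::map<std::string, std::string>;

std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// If an HTTP/1 request asks to upgrade to h2c, drop the Upgrade header so the
// request continues as plain HTTP/1 instead of failing with a 503.
// Returns true if the header was removed; other upgrades (e.g. websocket) are untouched.
bool StripH2cUpgrade(Headers& headers) {
  auto it = headers.find("upgrade");
  if (it == headers.end() || ToLower(it->second) != "h2c") {
    return false;
  }
  headers.erase(it);
  return true;
}
```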

Signed-off-by: John Plevyak <jplevyak@gmail.com>

* docs: add line on installing xcode for macOS build flow (envoyproxy#8139)

Because of rules_foreign_cc in bazelbuild, Envoy will not compile successfully when following the instructions in the build docs due to how the tools are referenced. One fix for this is installing Xcode from the App Store (see bazel-contrib/rules_foreign_cc#185).

Risk Level: Low
Testing: N/A (docs change)
Docs Changes: see Description
Release Notes: N/A

Signed-off-by: Lisa Lu <lisalu@lyft.com>

* docs: note which header expressions cannot be used for request headers (envoyproxy#8138)

As discussed in envoyproxy#8127, some custom header expressions evaluate as
empty when used in request headers.

Risk Level: low, docs only
Testing: n/a
Docs Changes: updated
Release Notes: n/a

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* api: use traffic_direction over operation_name if specified (envoyproxy#7999)

Use the listener-level field for the tracing direction over the per-filter field. Unfortunately, the per filter did not provide an "unspecified" default, so this appears to be the right approach to deprecate the per-filter field with minimal impact.

Risk Level: low (uses a newly introduced field traffic_direction)
Testing: unit test
Docs Changes: proto docs
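The precedence rule can be sketched as follows. The enums and `EffectiveDirection` are hypothetical stand-ins for the listener-level `traffic_direction` and the per-filter `operation_name`, not the generated API types.

```cpp
// Listener-level direction has an explicit "unspecified" default; the
// per-filter field does not, so it always carries a value.
enum class TrafficDirection { Unspecified, Inbound, Outbound };
enum class OperationName { Ingress, Egress };

// Prefer the listener-level direction when set; otherwise fall back to the
// per-filter operation_name.
OperationName EffectiveDirection(TrafficDirection listener, OperationName per_filter) {
  switch (listener) {
  case TrafficDirection::Inbound:
    return OperationName::Ingress;
  case TrafficDirection::Outbound:
    return OperationName::Egress;
  case TrafficDirection::Unspecified:
    break;
  }
  return per_filter;
}
```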

Signed-off-by: Kuat Yessenov <kuat@google.com>

* add more diagnostic logs (envoyproxy#8153)

Istio sets the listener filter timeout to 10ms by default, but requests fail from time to time, which is very difficult to debug. Even though downstream_pre_cx_timeout_ is exposed to track the number of timeouts, it would be better to have some debug logs.

Description: add more diagnostic logs
Risk Level: low

Signed-off-by: crazyxy <yxyan@google.com>

* http conn man: add tracing config for path length in tag (envoyproxy#8095)

This PR adds a configuration option for controlling the length of the request path that is included in the HttpUrl span tag. Currently, this length is hard-coded to 256. With this PR, that length will be configurable (defaulting to the old value).

Risk Level: Low
Testing: Unit
Docs Changes: Inline with the API proto. Documented new field.
Release Notes: Added in the PR.

Related issue: istio/istio#14563
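A sketch of the configurable truncation, assuming the limit is applied as a simple prefix of the request path. `PathForUrlTag` and `kDefaultMaxPathTagLength` are hypothetical names; the default matches the previously hard-coded value of 256.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Default mirrors the old hard-coded limit.
constexpr uint32_t kDefaultMaxPathTagLength = 256;

// Returns at most `max_length` leading characters of the path for the HttpUrl tag.
std::string PathForUrlTag(std::string_view path, uint32_t max_length = kDefaultMaxPathTagLength) {
  return std::string(path.substr(0, max_length));  // substr clamps to path.size()
}
```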

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>

* cds: Add general-purpose LB policy configuration (envoyproxy#7744)

This PR adds fields to CDS that allow for general-purpose LB policy configuration.

Risk Level: Low
Testing: None (but if anything is needed, please let me know)
Docs Changes: Inline with API protos
Release Notes: N/A

Signed-off-by: Mark D. Roth <roth@google.com>

* thrift_proxy: fix crash on invalid transport/protocol (envoyproxy#8143)

Transport/protocol decoder errors that occur before the connection manager
initializes an ActiveRPC to track the request caused a crash. Modifies the
connection manager to handle this case by terminating the downstream
connection.

Risk Level: low
Testing: test case that triggers crash
Docs Changes: n/a
Release Notes: added

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* api: strip gogoproto annotations (envoyproxy#8163)

Remove gogoproto annotations. They can be replaced with a custom gogoproto compiler (e.g. something like https://github.com/gogo/googleapis/tree/master/protoc-gen-gogogoogleapis). I have an experimental version of it to validate that it's possible to re-apply important annotations in the compiler.

Risk Level: low
Testing: builds

Signed-off-by: Kuat Yessenov <kuat@google.com>

* hotrestart: remove dynamic_resources from server config used by hotrestart_test (envoyproxy#8162)

In the server config file `test/config/integration/server.yaml` used by
//test/integration:hotrestart_test, `dynamic_resources` includes `lds_config`
and `cds_config` definitions, which use the HTTP API to fetch config, but the CDS and
LDS services do not exist, so the initial fetch fails with a
connection failure and the Envoy server then continues startup.

The Envoy server shouldn't continue startup after a connection failure; see
issue envoyproxy#8046.

For this test, `dynamic_resources` is not needed, so this change removes it.

Signed-off-by: lhuang8 <lhuang8@ebay.com>

* clang-tidy: misc-unused-using-decls (envoyproxy#8159)

Description: clang-tidy check to flag unused using statements. There's a lot in test code that's just copy pasta, and it's hard to manually review whether it's being used, especially for things like using testing::_;
Risk Level: low
Testing: existing
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Derek Argueta <dereka@pinterest.com>

* build: curl with c-ares, nghttp2 and zlib (envoyproxy#8154)

Build curl dependency with async DNS resolver c-ares avoiding potential
crashes due to longjmp on modern kernels. Add zlib and nghttp2.
Use Envoy's version of all of the above libraries.

Signed-off-by: Taras Roshko <troshko@netflix.com>

* log: add upstream TLS info (envoyproxy#7911)

Description: add upstream TLS info for logging purposes

Refactor SSL connection info to be a shared pointer.
Use read-only interface.
Cache computed values in the SSL info object (this allows transition to remove the underlying SSL object if necessary).

Risk Level: medium due to use of bssl::SSL to back ConnectionInfo
Testing: unit
Docs Changes: none
Release Notes: add upstream TLS info

Signed-off-by: Kuat Yessenov <kuat@google.com>

* fix windows implementation of PlatformImpl (envoyproxy#8169)

Add missing destructor to class declaration.
Fix copy/paste errors.
These errors were apparently introduced in e1cd4cc.

Risk Level: Low
Testing: Passed Windows testing locally
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: William Rowe <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>

* Update Opencensus SHA (envoyproxy#8173)

Signed-off-by: Pengyuan Bian <bianpengyuan@google.com>

* Outlier Detection: use gRPC status code for detecting failures (envoyproxy#7942)

Signed-off-by: ZhouyihaiDing <ddyihai@google.com>

* fix build (envoyproxy#8177)

Signed-off-by: Derek Argueta <dereka@pinterest.com>

* docs: improving websocket docs (envoyproxy#8156)

Making it clear H2 websockets don't work by default

Risk Level: n/a
Testing: n/a
Docs Changes: yes
Release Notes: no
envoyproxy#8147

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* Upstream WebAssembly VM and Null VM from envoyproxy/envoy-wasm. (envoyproxy#8020)

Description: Upstream from envoyproxy/envoy-wasm the WebAssembly VM support along with the Null VM support and tests. This is the first PR dealing with WebAssembly filter support in envoy.  See https://github.com/envoyproxy/envoy-wasm/blob/master/WASM.md and https://github.com/envoyproxy/envoy-wasm/blob/master/docs/root/api-v2/config/wasm/wasm.rst for details.
Risk Level: Medium
Testing: Unit tests.
Docs Changes: N/A
Release Notes: N/A
Part of envoyproxy#4272 

Signed-off-by: John Plevyak <jplevyak@gmail.com>

* quiche: implement Envoy Quic stream and connection (envoyproxy#7721)

Implement QuicStream|Session|Dispatcher in Envoy. Wire up QUIC stream and connection with HCM callbacks.

Risk Level: low, not in use
Testing: Added unit tests for all new classes
Part of envoyproxy#2557
Signed-off-by: Dan Zhang <danzh@google.com>

* protodoc/api_proto_plugin: generic API protoc plugin framework. (envoyproxy#8157)

Split out the generic plugin and FileDescriptorProto traversal bits from
protodoc. This is in aid of the work in envoyproxy#8082 and envoyproxy#8083, where additional
protoc plugins will be responsible for v2 -> v3alpha API migrations and
translation code generation.

This is only the start really of the api_proto_plugin framework. I
anticipate additional bits of protodoc will move here later, including
field type analysis and oneof handling.

In some respects, this is a re-implementation of some of
https://github.com/lyft/protoc-gen-star in Python. The advantage is that
this is super lightweight, has few dependencies and can be easily
hacked. We also embed various bits of API business logic, e.g.
annotations, in the framework (for now).

Risk level: Low
Testing: diff -ru against previous protodoc.py RST output, identical modulo some
  trivial whitespace that doesn't appear in generated HTML. There are no
  real tests yet, I anticipate adding some golden proto style tests.

Signed-off-by: Harvey Tuch <htuch@google.com>

* adaptive concurrency: Gradient algorithm implementation (envoyproxy#7908)

Signed-off-by: Tony Allen <tallen@lyft.com>

* ext_authz: Check for cluster before sending HTTP request (envoyproxy#8144)

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>

* make getters const-ref (envoyproxy#8192)

Description:
Follow-up to envoyproxy#7911 to make cached values be exposed as const-references, saving on a copy of a string during retrieval.

Risk Level: low
Testing: updated mocks to return references
Docs Changes: none
Release Notes: none
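The pattern can be sketched as a lazily computed, cached value returned by const reference. `CachedConnectionInfo` is a hypothetical class, and the string concatenation stands in for the real extraction from the TLS session.

```cpp
#include <optional>
#include <string>
#include <utility>

class CachedConnectionInfo {
 public:
  explicit CachedConnectionInfo(std::string peer_name) : peer_name_(std::move(peer_name)) {}

  // Computed on first use, then returned by const reference, so repeated
  // retrievals neither recompute nor copy the string.
  const std::string& subjectPeerCertificate() const {
    if (!subject_) {
      ++computations_;
      subject_ = "CN=" + peer_name_;  // stand-in for the real computation
    }
    return *subject_;
  }

  int computations() const { return computations_; }

 private:
  std::string peer_name_;
  mutable std::optional<std::string> subject_;  // cache, filled lazily
  mutable int computations_ = 0;
};
```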

Signed-off-by: Kuat Yessenov <kuat@google.com>

* test: add curl features check (envoyproxy#8194)

Add a test ensuring curl was built with the expected features.

Description: Add a test ensuring curl was built with the expected features.
Risk Level: Low.
Testing: n/a.
Docs Changes: n/a.
Release Notes: n/a.

Signed-off-by: Taras Roshko <troshko@netflix.com>

* subset lb: allow ring hash/maglev LB to work with subsets (envoyproxy#8030)

* subset lb: allow ring hash/maglev LB to work with subsets

Skip initializing the thread aware LB for a cluster when the subset
load balancer is enabled. Also adds some extra checks for LB policies
that are incompatible with the subset load balancer.

Risk Level: low
Testing: test additional checks
Docs Changes: updated docs w.r.t subset lb compatibility
Release Notes: n/a
Fixes: envoyproxy#7651

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* redis: add a request time metric to redis upstream (envoyproxy#7890)

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

* bazel: update bazel to 0.29.1 (envoyproxy#8198)

Description:
Upgrade bazel to 0.29.1 and bazel-toolchains to corresponding version.

Risk Level: Low
Testing: CI
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Lizan Zhou <lizan@tetrate.io>

* upstream: Add ability to disable host selection during panic (envoyproxy#8024)

Previously, when in a panic state, requests would be routed to all
hosts. In some cases it is instead preferable to not route any requests.
Add a configuration option for zone-aware load balancers which switches
from routing to all hosts to no hosts.

Closes envoyproxy#7550.

Signed-off-by: James Forcier jforcier@grubhub.com

Risk Level: Low
Testing: 2 new unit tests written; manual testing
Docs Changes: Note about new configuration option added
Release Notes: added
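A simplified sketch of the panic behavior, assuming a single priority level and a healthy-percentage threshold. `Host`, `EligibleHosts`, and the `fail_traffic_on_panic` parameter are hypothetical names, not Envoy's load balancer API.

```cpp
#include <cstddef>
#include <vector>

struct Host {
  int id;
  bool healthy;
};

// Returns the hosts eligible for selection. Below `panic_threshold_percent`
// healthy hosts, the LB is in panic: it either routes to all hosts (previous
// behavior) or, with the new option, routes to none.
std::vector<Host> EligibleHosts(const std::vector<Host>& hosts, double panic_threshold_percent,
                                bool fail_traffic_on_panic) {
  if (hosts.empty()) {
    return {};
  }
  std::vector<Host> healthy;
  for (const Host& h : hosts) {
    if (h.healthy) {
      healthy.push_back(h);
    }
  }
  const double healthy_percent = 100.0 * healthy.size() / hosts.size();
  if (healthy_percent < panic_threshold_percent) {
    return fail_traffic_on_panic ? std::vector<Host>{} : hosts;
  }
  return healthy;
}
```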

Signed-off-by: James Forcier <jforcier@grubhub.com>

* metrics service: flush histogram buckets (envoyproxy#8180)

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* tracing: fix random sample fraction percent (envoyproxy#8205)

Signed-off-by: Pengyuan Bian <bianpengyuan@google.com>

* stats: Add per-host memory usage test case to stats_integration_test (envoyproxy#8189)

Signed-off-by: Antonio Vicente <avd@google.com>

* router check tool: add flag for only printing failed tests (envoyproxy#8160)

Signed-off-by: Lisa Lu <lisalu@lyft.com>

* fix link to runtime docs (envoyproxy#8204)

Description: Looks like the runtime docs moved under operations/. The PR fixes the link.
Risk Level: low
Testing: existing
Docs Changes: this
Release Notes: n/a

Signed-off-by: Derek Argueta <dereka@pinterest.com>

* config: make SlotImpl detachable from its owner, and add a new runOnAllThreads interface to Slot. (envoyproxy#8135)

See the issue in envoyproxy#7902. This PR makes the SlotImpl detachable from its owner by introducing a Booker object that wraps a SlotImpl and keeps track of all in-flight update callbacks. On its destruction, if there are still in-flight callbacks, the SlotImpl is moved to a deferred-delete queue instead of being destroyed, since destroying it could cause a SEGV.

More importantly, this introduces a new runOnAllThreads(ThreadLocal::UpdateCb cb) API on Slot, which requires the Slot owner not to assume that the Slot or its owner will outlive (on the main thread) the in-flight update callbacks it fires, and therefore not to capture the Slot or its owner in the update_cb.

Picked RDS and config-providers-framework as examples to demonstrate that this change works. {i.e., changed from the runOnAllThreads(Event::PostCb) to the new runOnAllThreads(TLS::UpdateCb) interface. }

Risk Level: Medium
Testing: unit test
Docs Changes: N/A
Release Notes: N/A
Fixes envoyproxy#7902
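The lifetime problem can be illustrated with shared ownership: each posted callback holds its own reference to the slot, so the slot outlives its owner until the last callback has run. This is a swapped-in technique (reference counting) rather than the Booker plus deferred-delete queue the PR describes, and `Slot`, `SlotOwner`, and `CallbackQueue` are hypothetical, single-threaded stand-ins.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a thread-local slot.
struct Slot {
  int value = 0;
};

// Callbacks waiting to run on workers (simulated, single-threaded).
using CallbackQueue = std::vector<std::function<void()>>;

class SlotOwner {
 public:
  explicit SlotOwner(CallbackQueue& queue) : queue_(queue), slot_(std::make_shared<Slot>()) {}

  // Each callback captures the slot (not `this`), so destroying the owner
  // while callbacks are in flight cannot leave them with a dangling pointer.
  void RunOnAllThreads(std::function<void(Slot&)> update) {
    std::shared_ptr<Slot> slot = slot_;
    queue_.push_back([slot, update] { update(*slot); });
  }

  std::weak_ptr<Slot> Observe() const { return slot_; }

 private:
  CallbackQueue& queue_;
  std::shared_ptr<Slot> slot_;
};
```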

Signed-off-by: Xin Zhuang <stevenzzz@google.com>

* test: remove static config from subset lb integration test (envoyproxy#8203)

Build the config programmatically to make future API changes less
onerous.

Risk Level: low (test change only)
Testing: n/a
Doc Changes: n/a
Release Notes: n/a

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

* cleanup: clarify Cluster.filters and Dispatcher::createClientConnection (envoyproxy#8186)

Signed-off-by: Fred Douglas <fredlas@google.com>

* redis: health check is not sending the auth command on its connection (envoyproxy#8166)

Signed-off-by: Henry Yang <hyang@lyft.com>

* redis: mirroring should work when default value is zero, not just greater than zero (envoyproxy#8089)

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>

* tools: regularize pip/venv for format_python_tools.py. (envoyproxy#8176)

As well as being a nice cleanup, this fixes some issues I had with local
Docker use of fix_format as a non-root user.

Signed-off-by: Harvey Tuch <htuch@google.com>

* absl: Absl hash hook in a couple of places rather than hash functors (envoyproxy#8179)

Signed-off-by: Joshua Marantz <jmarantz@google.com>

* Update dependency: jwt_verify_lib (envoyproxy#8212)

Signed-off-by: Daniel Grimm <dgrimm@redhat.com>

* upstream: add failure percentage-based outlier detection (envoyproxy#8130)

Description: Add a new outlier detection mode which compares each host's rate of request failure to a configured fixed threshold.

Risk Level: Low
Testing: 2 new unit tests added.
Docs Changes: New mode and config options described.
Release Notes: ✅
Fixes envoyproxy#8105
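A sketch of the failure-percentage check, assuming a host is only considered once it has handled a minimum request volume. `HostStats`, `ShouldEjectForFailurePercentage`, and its parameters are illustrative names, not the exact config fields.

```cpp
#include <cstdint>

struct HostStats {
  uint64_t success;
  uint64_t failure;
};

// Ejects a host when its failure percentage meets the fixed threshold, but only
// once it has seen enough requests for the percentage to be meaningful.
// Compares failure/total >= threshold/100 using integer math.
bool ShouldEjectForFailurePercentage(const HostStats& s, uint32_t threshold_percent,
                                     uint64_t minimum_request_volume) {
  const uint64_t total = s.success + s.failure;
  if (total == 0 || total < minimum_request_volume) {
    return false;
  }
  return s.failure * 100 >= static_cast<uint64_t>(threshold_percent) * total;
}
```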

Signed-off-by: James Forcier <jforcier@grubhub.com>

* Replace deprecated thread annotations macros. (envoyproxy#8237)

Abseil thread annotation macros are now prefixed by ABSL_.

There is no semantic change; this is just a rename.

Signed-off-by: Yan Avlasov <yavlasov@google.com>

* Update protoc-gen-validate (PGV) (envoyproxy#8234)

This picks up fixes for the Windows build and a C preprocessor defect

Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
Signed-off-by: William Rowe <wrowe@pivotal.io>

* upstream: use named constants for outlier detection config defaults (envoyproxy#8221)

Signed-off-by: James Forcier <jforcier@grubhub.com>

* server: add a post init lifecycle stage (envoyproxy#8217)

Signed-off-by: Jose Nino <jnino@lyft.com>

* docs: document access control conditions and attributes (envoyproxy#8230)

Signed-off-by: Kuat Yessenov <kuat@google.com>

* server: return processContext as optional reference (envoyproxy#8238)

Signed-off-by: Elisha Ziskind <eziskind@google.com>

* Update envoy.yaml in Redis proxy example (envoyproxy#8220)

Description: Make Redis example use catch_all_route.
Risk Level: Low.
Testing: Done. docker-compose up --build brings up envoy proxy and I was able to run Redis commands using redis-cli.

Signed-off-by: Raju Kadam <rkadam@atlassian.com>

* quiche: implement ActiveQuicListener (envoyproxy#7896)

Signed-off-by: Dan Zhang <danzh@google.com>

* srds: allow SRDS pass on scope-not-found queries to filter-chain (issue envoyproxy#8236).  (envoyproxy#8239)

Description: Allow a no-scope request to pass through the filter chain, so that some special queries (e.g., data plane health-check ) can be processed by the customized filter-chain. By default, the behavior is the same (404).
Risk Level: LOW
Testing: unit test and integration test.
Docs Changes: N/A
Release Notes: N/A
Fixes envoyproxy#8236
Signed-off-by: Xin Zhuang <stevenzzz@google.com>

* Updated to new envoyproxy master branch.

Signed-off-by: John Plevyak <jplevyak@gmail.com>

* Remove offending go proto option.

Signed-off-by: John Plevyak <jplevyak@gmail.com>

* Fix format/tidy issues.

Signed-off-by: John Plevyak <jplevyak@gmail.com>
@antoniovicente
Contributor

I have noticed that in my local builds the following tests fail with a dynamic load exception. Is this the same problem? Are there known work-arounds?

Tests:
//test/extensions/tracers/dynamic_ot:config_test
//test/extensions/tracers/dynamic_ot:dynamic_opentracing_driver_impl_test

Failure:
unknown file: Failure
C++ exception with description "opentracing: failed to load dynamic library: /home/avd/.cache/bazel/_bazel_avd/1e20bf3ad6fa0170c07bc768f0392847/sandbox/linux-sandbox/5154/execroot/envoy/bazel-out/k8-fastbuild/bin/test/extensions/tracers/dynamic_ot/config_test.runfiles/io_opentracing_cpp/mocktracer/libmocktracer_plugin.so: undefined symbol: OpenTracingMakeTracerFactory" thrown in the test body.

@PiotrSikora
Contributor

@antoniovicente it's because of libc++ vs libstdc++, see: #4251 (comment).
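For context on the error above: the message suggests the loader opened the plugin and then failed to resolve its `OpenTracingMakeTracerFactory` entry point by name, and the `undefined symbol` text is the dynamic loader's own error string. The following is a minimal sketch of that open-then-lookup sequence using POSIX `dlopen`/`dlsym`; it is not opentracing-cpp's actual code, `ResolvePluginSymbol` is a hypothetical helper, and it assumes glibc on Linux (older glibc versions need `-ldl` at link time).

```cpp
#include <dlfcn.h>

#include <string>

// Opens a shared library and resolves `symbol` in it. Returns an empty string
// on success, otherwise the loader's error text from dlerror().
std::string ResolvePluginSymbol(const std::string& path, const std::string& symbol) {
  void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* err = dlerror();
    return err != nullptr ? err : "dlopen failed";
  }
  dlerror();  // clear any stale error before calling dlsym
  void* sym = dlsym(handle, symbol.c_str());
  const char* err = dlerror();  // non-null if the lookup failed
  std::string result;
  if (sym == nullptr && err != nullptr) {
    result = err;  // copied before dlclose invalidates anything
  }
  dlclose(handle);
  return result;
}
```

An empty result means the entry point resolved; otherwise the returned text is whatever `dlerror()` reported, in the same `<path>: undefined symbol: <name>` shape seen in the logs above.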

@antoniovicente
Contributor

Are there known workarounds to make the two tests in question succeed on local builds?

@PiotrSikora
Contributor

Oops, that would indeed be a more helpful response. Try using --config=libc++:

bazel build --config=libc++ //source/exe:envoy
bazel test  --config=libc++ //test/...

@antoniovicente
Contributor

bazel test --config=libc++ seems to trigger a different set of problems on tests like //test/common/access_log:access_log_formatter_fuzz_test at least on my machine.

/home/avd/.cache/bazel/_bazel_avd/1e20bf3ad6fa0170c07bc768f0392847/sandbox/linux-sandbox/7410/execroot/envoy/bazel-out/k8-fastbuild/bin/test/common/access_log/access_log_formatter_fuzz_test.runfiles/envoy/test/common/access_log/access_log_formatter_fuzz_test: error while loading shared libraries: libc++.so.1: cannot open shared object file: No such file or directory

@jmarantz
Contributor

I've punted on this one... Those two tests always fail for me when running locally, and I just rely on CI at this point.

It's not a great state, and I would love for a solution to exist for however our workstations are configured.

@lizan
Member Author

lizan commented Oct 23, 2019

@antoniovicente you likely didn't install libc++ to a standard location or add its path to your ldconfig configuration.

@PiotrSikora
Contributor

@lizan aren't tests linked statically anymore?

@lizan
Member Author

lizan commented Oct 24, 2019

@PiotrSikora fuzzer targets always link libc++ dynamically.

@aishwaryabk

aishwaryabk commented Feb 26, 2020

@lizan Is this issue solved? I am testing Envoy on RHEL 7.6 ppc64le and am facing similar errors in
//test/extensions/tracers/dynamic_ot:config_test
//test/extensions/tracers/dynamic_ot:dynamic_opentracing_driver_impl_test

Failure:
unknown file: Failure
C++ exception with description "opentracing: failed to load dynamic library: /root/.cache/bazel/_bazel_root/37e3aec351bcd85a6ea8b58e3592ef6e/sandbox/processwrapper-sandbox/5841/execroot/envoy/bazel-out/ppc-fastbuild/bin/test/extensions/tracers/dynamic_ot/config_test.runfiles/envoy/external/io_opentracing_cpp/mocktracer/libmocktracer_plugin.so: undefined symbol: OpenTracingMakeTracerFactory" thrown in the test body.
[ FAILED ] DynamicOtTracerConfigTest.DynamicOpentracingHttpTracer

Thread finding failed with -1 errno=1
Could not find thread stacks. Will likely report false leak positives.
Have memory regions w/o callers: might report false leaks
No leaks found for check "_main_" (but no 100% guarantee that there aren't any): found 12009 reachable heap objects of 942391 bytes

I would appreciate some help understanding this.

@mattklein123 mattklein123 removed the no stalebot Disables stalebot from closing an issue label Jul 9, 2021
@github-actions

github-actions bot commented Aug 9, 2021

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 9, 2021
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.


8 participants