Releases: openziti/ziti

v0.18.3

21 Jan 23:55

Release 0.18.3

What's New

  • Ziti executables that use JSON logging now emit timestamps that include fractional seconds.
    Timestamps remain in the RFC3339 format.
  • Authentication mechanisms now allow appId and appVersion in sdkInfo
  • Improved query performance by caching antlr lexers and parsers. Testing showed 2x-10x performance
    improvement
  • Improved service list time by using indexes to get related posture data
  • Improved service polling
  • Improved service policy enforcement - instead of polling this is now event based, which should
    result in lower CPU utilization on the controller
  • Fixed a bug in service policy PATCH which would trigger when the policy type wasn't sent
  • Support agent utilities (ziti ps) in ziti-tunnel
  • Cleanup ack handler goroutines when links shut down
  • Remove the following fabric metrics timers, as they degraded performance while being of low value
    • xgress.ack.handle_time
    • xgress.payload.handle_time
    • xgress.ack_write_time
    • xgress.payload_buffer_time
    • xgress.payload_relay_time
  • The check-data-integrity operation may now only run a single instance at a time
    • To start the check, ziti edge db start-check-integrity
    • To check the status of a run, ziti edge db check-integrity-status
  • The build date in version info spelling has been fixed from builDate to buildDate
  • A new metric has been added for timing service list requests services.list
  • A bug was fixed in the tunneler which may have led to leaked connections
  • Ziti Edge API configurable HTTP Timeouts
  • Add ziti log-format, or ziti lf for short, for formatting JSON log output as something more
    human readable
  • fabric#151 Add two timeout settings to the
    controller to configure how long route and dial should wait before timeout
    • terminationTimeoutSeconds - how long the router has to dial the service
    • routeTimeoutSeconds - how long a router has to respond to a route create/update message
  • fabric#158 Add a session creation timeout to the
    router. This controls how long the router will wait for fabric sessions to be created. This
    includes creating the route and dialing the end service, so the timeout should be at least as
    long as the controller terminationTimeoutSeconds and routeTimeoutSeconds added together
    • getSessionTimeout is specified in the router config under listeners: options:
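
The relationship between the three timeouts above comes down to simple arithmetic. The following is a minimal sketch, assuming all three values are expressed in seconds; the function names and the example values are hypothetical and are not Ziti defaults.

```python
def min_get_session_timeout(termination_timeout_s: int, route_timeout_s: int) -> int:
    """Smallest router-side session timeout that still covers both
    controller-side waits (routing plus dialing the end service)."""
    return termination_timeout_s + route_timeout_s


def session_timeout_is_sufficient(get_session_timeout_s: int,
                                  termination_timeout_s: int,
                                  route_timeout_s: int) -> bool:
    """True if the router waits at least as long as the controller may take."""
    floor = min_get_session_timeout(termination_timeout_s, route_timeout_s)
    return get_session_timeout_s >= floor


# Hypothetical values, chosen only for illustration.
print(session_timeout_is_sufficient(15, termination_timeout_s=10, route_timeout_s=10))
```

A router timeout below the computed floor means the router may give up while the controller is still legitimately waiting on a route or dial.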

Improved Service Polling

There's a new REST endpoint /current-api-session/service-updates, which will return the last time
services were changed. If there have been no service updates since the api session was established,
the api session create date/time will be returned. This endpoint can be polled to see if services
need to be refreshed. This will save network and cpu utilization on the client and controller.
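
A client-side polling loop built on this endpoint might look roughly like the sketch below. The release notes do not describe the response body, so the HTTP call is left to a hypothetical callable, fetch_last_change, which is assumed to return the last-change timestamp as a string. The class name is also an assumption.

```python
from typing import Callable, Optional


class ServiceUpdatePoller:
    """Tracks the last-change timestamp reported by the controller."""

    def __init__(self, fetch_last_change: Callable[[], str]):
        # fetch_last_change is a hypothetical helper that would GET
        # /current-api-session/service-updates and return the timestamp.
        self._fetch = fetch_last_change
        self._last_seen: Optional[str] = None

    def needs_refresh(self) -> bool:
        """Return True when the timestamp differs from the one seen last.

        The first call always returns True because nothing is cached yet.
        """
        latest = self._fetch()
        changed = latest != self._last_seen
        self._last_seen = latest
        return changed


# Usage with a fake fetcher standing in for the REST call.
stamps = iter(["2021-01-21T10:00:00Z", "2021-01-21T10:00:00Z", "2021-01-21T10:05:00Z"])
poller = ServiceUpdatePoller(lambda: next(stamps))
results = [poller.needs_refresh() for _ in range(3)]
```

Comparing timestamps for inequality, rather than parsing them, keeps the client independent of the exact timestamp format.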

Ziti Edge API configurable HTTP Timeouts

The controller configuration file now supports a httpTimeouts section under
edge.api. The section and all of its fields are optional and default to the values of previous
versions.

For production environments these values should be tuned for the network's intended user base. The
quality and latency of the underlay between the network's endpoints/routers and controller should be
taken into account.

edge:
  ...
  api:
    ...
    httpTimeouts:
      # (optional, default 5000) readTimeoutMs is the maximum duration for reading the entire request, including the body.
      readTimeoutMs: 5000
      # (optional, default 0) readHeaderTimeoutMs is the amount of time allowed to read request headers.
      # The connection's read deadline is reset after reading the headers. If readHeaderTimeoutMs is zero, the value of
      # readTimeoutMs is used. If both are zero, there is no timeout.
      readHeaderTimeoutMs: 0
      # (optional, default 10000) writeTimeoutMs is the maximum duration before timing out writes of the response.
      writeTimeoutMs: 100000
      # (optional, default 5000) idleTimeoutMs is the maximum amount of time to wait for the next request when keep-alives are enabled
      idleTimeoutMs: 5000
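
The fallback rule described in the readHeaderTimeoutMs comment can be sketched as a small function. This is an illustration of the documented rule only, not the controller's implementation; the function name is hypothetical.

```python
from typing import Optional


def effective_read_header_timeout_ms(read_timeout_ms: int,
                                     read_header_timeout_ms: int) -> Optional[int]:
    """Apply the documented fallback for the header read timeout.

    - a non-zero readHeaderTimeoutMs is used as-is
    - zero falls back to readTimeoutMs
    - if both are zero there is no timeout (None)
    """
    if read_header_timeout_ms > 0:
        return read_header_timeout_ms
    if read_timeout_ms > 0:
        return read_timeout_ms
    return None


# With the values from the example config, headers share the 5000 ms read timeout.
print(effective_read_header_timeout_ms(5000, 0))
```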

v0.17.8

21 Jan 21:26

Release 0.17.8

  • Ziti Edge API configurable HTTP Timeouts
  • Add ziti log-format, or ziti lf for short, for formatting JSON log output as something more
    human readable
  • fabric#151 Add two timeout settings to the
    controller to configure how long route and dial should wait before timeout
    • terminationTimeoutSeconds - how long the router has to dial the service
    • routeTimeoutSeconds - how long a router has to respond to a route create/update message
  • fabric#158 Add a session creation timeout to the
    router. This controls how long the router will wait for fabric sessions to be created. This
    includes creating the route and dialing the end service, so the timeout should be at least as
    long as the controller terminationTimeoutSeconds and routeTimeoutSeconds added together
    • getSessionTimeout is specified in the router config under listeners: options:

Ziti Edge API configurable HTTP Timeouts

The controller configuration file now supports a httpTimeouts section under
edge.api. The section and all of its fields are optional and default to the values of previous
versions.

For production environments these values should be tuned for the network's intended user base. The
quality and latency of the underlay between the network's endpoints/routers and controller should be
taken into account.

edge:
  ...
  api:
    ...
    httpTimeouts:
      # (optional, default 5000) readTimeoutMs is the maximum duration for reading the entire request, including the body.
      readTimeoutMs: 5000
      # (optional, default 0) readHeaderTimeoutMs is the amount of time allowed to read request headers.
      # The connection's read deadline is reset after reading the headers. If readHeaderTimeoutMs is zero, the value of
      # readTimeoutMs is used. If both are zero, there is no timeout.
      readHeaderTimeoutMs: 0
      # (optional, default 10000) writeTimeoutMs is the maximum duration before timing out writes of the response.
      writeTimeoutMs: 100000
      # (optional, default 5000) idleTimeoutMs is the maximum amount of time to wait for the next request when keep-alives are enabled
      idleTimeoutMs: 5000

v0.17.7

20 Jan 22:23

Release 0.17.7

This release has no functional changes, only build process changes.

v0.17.6

20 Jan 19:40

Release 0.17.6

What's New

  • Ziti executables that use JSON logging now emit timestamps that include fractional seconds.
    Timestamps remain in the RFC3339 format.
  • Authentication mechanisms now allow appId and appVersion in sdkInfo
  • Improved query performance by caching antlr lexers and parsers. Testing showed 2x-10x performance
    improvement
  • Improved service list time by using indexes to get related posture data
  • Improved service polling
  • Improved service policy enforcement - instead of polling this is now event based, which should
    result in lower CPU utilization on the controller
  • Fixed a bug in service policy PATCH which would trigger when the policy type wasn't sent
  • Support agent utilities (ziti ps) in ziti-tunnel
  • Cleanup ack handler goroutines when links shut down
  • The check-data-integrity operation may now only run a single instance at a time
    • To start the check, ziti edge db start-check-integrity
    • To check the status of a run, ziti edge db check-integrity-status
  • The build date in version info spelling has been fixed from builDate to buildDate
  • A new metric has been added for timing service list requests services.list
  • A bug was fixed in the tunneler which may have led to leaked connections
  • Default hosting precedence and cost can now be configured for identities
  • Health checks can now be configured for the go based tunneler (ziti-tunnel) using server configs
  • ziti#177 ziti-tunnel has a new host mode, if you
    are only hosting services
  • edge session events now contain a timestamp
  • Improve log output for invalid API Session Tokens used to connect to Edge Routers
  • Logs default to no color output
  • API Session Certificate Support Added

Improved Service Polling

There's a new REST endpoint /current-api-session/service-updates, which will return the last time
services were changed. If there have been no service updates since the api session was established,
the api session create date/time will be returned. This endpoint can be polled to see if services
need to be refreshed. This will save network and cpu utilization on the client and controller.

Setting precedence and cost for tunneler hosted services

When the tunneler hosts services there was previously no way to specify the precedence and cost
associated with those services.
See Ziti XT documentation
for an overview of how precedence and cost relate to HA and load balancing.

There are now two new fields on identity:

  • defaultHostingPrecedence - valid values are default, required and failed. Defaults
    to default.
  • defaultHostingCost - valid values are between 0 and 65535. Defaults to 0.

When hosting a service via the tunneler, the terminator for the SDK hosted service will be created
with the precedence and cost of the identity used by the tunneler.

NOTE: This means all services hosted by an identity will have the same precedence and cost.
We'll likely add support for service specific overrides in the future if/when use cases arise which
call for it. In the meantime, a work-around is to use multiple identities if you need different
values for different services.

CLI Support

The ziti CLI supports setting the default hosting precedence and cost when creating identities.

SDK API Change

The Go SDK has a new API method GetCurrentIdentity() (*edge.CurrentIdentity, error) which lets SDK
users retrieve the currently logged in identity, including the default hosting precedence and cost.
This could be used by other SDK applications which may want to use the fields for the same reason
when hosting services.

Tunneler Health Checks

The go tunneler now supports health checks. Support for health checks may be added to other
tunnelers (such as ziti-edge-tunnel) in the future, but that is not guaranteed.

Health checks can be configured in the service configuration using the ziti-tunneler-server.v1
config type. Support in the host.v1 config type will be added when support for that config type is
added to the go tunneler.

Check Types

The tunneler supports two types of health check.

Port Checks

Port checks look to see if a host/port can be dialed. This is a simple check which just ensures that
something is listening on a given host/port.

Port checks have the following properties:

  • interval - how often the check is performed
  • timeout - how long to wait before declaring the check failed
  • address - the address to dial. Should be of the form host:port. Example: localhost:5432
  • actions - an array of actions to perform based on health check results. Actions will be discussed
    in more detail below
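
The idea behind a port check can be illustrated with a plain TCP dial. The sketch below is not the tunneler's implementation (the tunneler is written in Go); port_check is a hypothetical helper that takes an address in host:port form and a timeout in seconds.

```python
import socket


def port_check(address: str, timeout_s: float) -> bool:
    """Return True if a TCP connection to address (host:port) can be opened
    within timeout_s seconds, False if the dial fails or times out."""
    host, _, port = address.rpartition(":")
    try:
        # create_connection dials and raises OSError on refusal or timeout.
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, port_check("localhost:5432", 0.5) would report whether something is currently listening on port 5432 of the local host. A malformed port raises ValueError rather than counting as a failed check.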

HTTP Checks

HTTP checks request a specific URL. They support the following properties:

  • interval - how often the check is performed
  • timeout - how long to wait before declaring the check failed
  • url - the URL to connect to
  • method - the HTTP method to use. May be one of GET, POST, PUT or PATCH. Defaults to GET
  • body - the body of the HTTP request. Defaults to an empty string
  • expectStatus - the HTTP status to expect in the response. Defaults to 200
  • expectInBody - an optional string to look for in the response body
  • actions - an array of actions to perform based on health check results. Actions will be discussed
    in more detail below
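
How an HTTP check result might be judged against the expected status and body can be sketched as a pure function. This is an assumption-laden illustration: the function name is hypothetical, the expected-body parameter follows the expectInBody name used in the schema, and the body match is assumed to be a plain substring test.

```python
from typing import Optional


def http_check_passes(status: int,
                      body: str,
                      expect_status: int = 200,
                      expect_in_body: Optional[str] = None) -> bool:
    """Decide whether a response satisfies the check's expectations."""
    if status != expect_status:
        return False
    # Assumption: the expected string only needs to appear somewhere in the body.
    if expect_in_body is not None and expect_in_body not in body:
        return False
    return True


print(http_check_passes(200, '{"status": "ok"}', expect_in_body="ok"))
```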

Health Check Actions

Each health check may specify actions to execute when a health check runs.

Each action may specify:

  • trigger - valid values pass or fail. Specifies if the action should run when the check is
    passing or failing
  • consecutiveEvents - specifies if the action should only run after N consecutive passes or fails
  • duration - specifies if the action should only run after the check has been passing or failing for
    some period of time
  • action - specifies what to do when the action is run. valid values are:
    • mark healthy - the terminator precedence will be set to the default hosting precedence of
      the hosting identity
    • mark unhealthy - the terminator precedence will be set to failed
    • increase cost N - the terminator cost will be increased by N. This will only happen while
      the terminator precedence is not failed. Once the terminator has failed we don't keep
      increasing cost, otherwise it will likely reach max cost and take a long time to recover after
      it goes back to healthy.
    • decrease cost N - the terminator cost will be decreased by N, down to a minimum. The terminator
      cost will not go below the hosting identity's default hosting cost

Examples

The following config defines a TCP service which can be reached at port 8171 on localhost. It has a
port check defined which runs every 5 seconds, with a timeout of 500 milliseconds. The following
actions are defined on the health check:

  1. The terminator will be marked failed after the health check has failed 10 times in a row.
  2. The terminator cost will be increased by 100 each time the health check fails while the
    terminator is not in failed state
  3. The terminator will be returned to a non-failed state if the health check is healthy for 10
    seconds
  4. Every time the health check passes the cost will be reduced by 25, until it hits the baseline
    cost defined by the hosting identity
{
    "protocol" : "tcp",
    "hostname" : "localhost",
    "port" : 8171,
    "portChecks" : [
        {
            "interval" : "5s",
            "timeout" : "500ms",
            "address" : "localhost:8171",
            "actions": [
                {
                    "action": "mark unhealthy",
                    "consecutiveEvents": 10,
                    "trigger": "fail"
                },
                {
                    "action": "increase cost 100",
                    "trigger": "fail"
                },
                {
                    "action": "mark healthy",
                    "duration": "10s",
                    "trigger": "pass"
                },
                {
                    "action": "decrease cost 25",
                    "trigger": "pass"
                }
            ]
        }
    ]
}
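
To get a feel for how the four actions in this example interact, the sketch below simulates them over a sequence of check results. It hardcodes the values from the example config and rests on several assumptions that the release notes do not confirm: actions are evaluated in the order listed, a streak's duration is measured from its first result, and the hosting identity uses the default precedence with a default cost of 0.

```python
from dataclasses import dataclass

BASE_PRECEDENCE = "default"  # assumed defaultHostingPrecedence of the identity
BASE_COST = 0                # assumed defaultHostingCost of the identity
MAX_COST = 65535


@dataclass
class Terminator:
    precedence: str = BASE_PRECEDENCE
    cost: int = BASE_COST


def simulate(results, interval_s: int = 5) -> Terminator:
    """Apply the example's actions to a list of booleans (True = check passed)."""
    term = Terminator()
    streak_kind, streak_len = None, 0
    for passed in results:
        kind = "pass" if passed else "fail"
        streak_len = streak_len + 1 if kind == streak_kind else 1
        streak_kind = kind
        # Assumption: elapsed time of the streak, counted from its first result.
        streak_secs = (streak_len - 1) * interval_s
        if kind == "fail":
            if streak_len >= 10:                 # mark unhealthy, consecutiveEvents 10
                term.precedence = "failed"
            if term.precedence != "failed":      # increase cost 100
                term.cost = min(MAX_COST, term.cost + 100)
        else:
            if streak_secs >= 10:                # mark healthy, duration 10s
                term.precedence = BASE_PRECEDENCE
            term.cost = max(BASE_COST, term.cost - 25)  # decrease cost 25
    return term


after_outage = simulate([False] * 10)
after_recovery = simulate([False] * 10 + [True] * 3)
```

Under these assumptions, ten failures leave the terminator failed with the cost increases stopping once it is marked failed, and a short run of passes restores the precedence while the cost drifts back toward the baseline.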

ziti-tunnel host command

The ziti-tunnel can now be run in a mode where it will only host services and will not intercept any
services.

Ex: ziti-tunnel host -i /path/to/identity.json

Schema Reference

For reference, here is the full, updated ziti-tunneler-server.v1 schema:

{
    "$id": "http://edge.openziti.org/schemas/ziti-tunneler-server.v1.config.json",
    "additionalProperties": false,
    "definitions": {
        "action": {
            "additionalProperties": false,
            "properties": {
                "action": {
                    "pattern": "(mark (un)?healthy|increase cost [0-9]+|decrease cost [0-9]+)",
                    "type": "string"
                },
                "consecutiveEvents": {
                    "maximum": 65535,
                    "minimum": 0,
                    "type": "integer"
                },
                "duration": {
                    "$ref": "#/definitions/duration"
                },
                "trigger": {
                    "enum": [
                        "fail",
                        "pass"
                    ],
                    "type": "string"
                }
            },
            "required": [
                "trigger",
                "action"
            ],
            "type": "object"
        },
        "actionList": {
            "items": {
                "$ref": "#/definitions/action"
            },
            "maxItems": 20,
            "minItems": 1,
            "type": "array"
        },
        "duration": {
            "pattern": "[0-9]+(h|m|s|ms)",
            "type": "str...

v0.18.2

12 Jan 19:27

Release 0.18.2

What's New

  • Default hosting precedence and cost can now be configured for identities
  • Health checks can now be configured for the go based tunneler (ziti-tunnel) using server configs
  • ziti#177 ziti-tunnel has a new host mode, if you
    are only hosting services
  • Changes to terminators (add/updated/delete/router online/router offline) will now generate events
    that can be emitted
  • fabric and edge session events now contain a timestamp

Setting precedence and cost for tunneler hosted services

When the tunneler hosts services there was previously no way to specify the precedence and cost
associated with those services.
See Ziti XT documentation
for an overview of how precedence and cost relate to HA and load balancing.

There are now two new fields on identity:

  • defaultHostingPrecedence - valid values are default, required and failed. Defaults
    to default.
  • defaultHostingCost - valid values are between 0 and 65535. Defaults to 0.

When hosting a service via the tunneler, the terminator for the SDK hosted service will be created
with the precedence and cost of the identity used by the tunneler.

NOTE: This means all services hosted by an identity will have the same precedence and cost.
We'll likely add support for service specific overrides in the future if/when use cases arise which
call for it. In the meantime, a work-around is to use multiple identities if you need different
values for different services.

CLI Support

The ziti CLI supports setting the default hosting precedence and cost when creating identities.

SDK API Change

The Go SDK has a new API method GetCurrentIdentity() (*edge.CurrentIdentity, error) which lets SDK
users retrieve the currently logged in identity, including the default hosting precedence and cost.
This could be used by other SDK applications which may want to use the fields for the same reason
when hosting services.

Tunneler Health Checks

The go tunneler now supports health checks. Support for health checks may be added to other
tunnelers (such as ziti-edge-tunnel) in the future, but that is not guaranteed.

Health checks can be configured in the service configuration using the ziti-tunneler-server.v1
config type. Support in the host.v1 config type will be added when support for that config type is
added to the go tunneler.

Check Types

The tunneler supports two types of health check.

Port Checks

Port checks look to see if a host/port can be dialed. This is a simple check which just ensures that
something is listening on a given host/port.

Port checks have the following properties:

  • interval - how often the check is performed
  • timeout - how long to wait before declaring the check failed
  • address - the address to dial. Should be of the form host:port. Example: localhost:5432
  • actions - an array of actions to perform based on health check results. Actions will be discussed
    in more detail below

HTTP Checks

HTTP checks request a specific URL. They support the following properties:

  • interval - how often the check is performed
  • timeout - how long to wait before declaring the check failed
  • url - the URL to connect to
  • method - the HTTP method to use. May be one of GET, POST, PUT or PATCH. Defaults to GET
  • body - the body of the HTTP request. Defaults to an empty string
  • expectStatus - the HTTP status to expect in the response. Defaults to 200
  • expectInBody - an optional string to look for in the response body
  • actions - an array of actions to perform based on health check results. Actions will be discussed
    in more detail below

Health Check Actions

Each health check may specify actions to execute when a health check runs.

Each action may specify:

  • trigger - valid values pass or fail. Specifies if the action should run when the check is
    passing or failing
  • consecutiveEvents - specifies if the action should only run after N consecutive passes or fails
  • duration - specifies if the action should only run after the check has been passing or failing for
    some period of time
  • action - specifies what to do when the action is run. valid values are:
    • mark healthy - the terminator precedence will be set to the default hosting precedence of
      the hosting identity
    • mark unhealthy - the terminator precedence will be set to failed
    • increase cost N - the terminator cost will be increased by N. This will only happen while
      the terminator precedence is not failed. Once the terminator has failed we don't keep
      increasing cost, otherwise it will likely reach max cost and take a long time to recover after
      it goes back to healthy.
    • decrease cost N - the terminator cost will be decreased by N, down to a minimum. The terminator
      cost will not go below the hosting identity's default hosting cost

Examples

The following config defines a TCP service which can be reached at port 8171 on localhost. It has a
port check defined which runs every 5 seconds, with a timeout of 500 milliseconds. The following
actions are defined on the health check:

  1. The terminator will be marked failed after the health check has failed 10 times in a row.
  2. The terminator cost will be increased by 100 each time the health check fails while the
    terminator is not in failed state
  3. The terminator will be returned to a non-failed state if the health check is healthy for 10
    seconds
  4. Every time the health check passes the cost will be reduced by 25, until it hits the baseline
    cost defined by the hosting identity
{
    "protocol" : "tcp",
    "hostname" : "localhost",
    "port" : 8171,
    "portChecks" : [
        {
            "interval" : "5s",
            "timeout" : "500ms",
            "address" : "localhost:8171",
            "actions": [
                {
                    "action": "mark unhealthy",
                    "consecutiveEvents": 10,
                    "trigger": "fail"
                },
                {
                    "action": "increase cost 100",
                    "trigger": "fail"
                },
                {
                    "action": "mark healthy",
                    "duration": "10s",
                    "trigger": "pass"
                },
                {
                    "action": "decrease cost 25",
                    "trigger": "pass"
                }
            ]
        }
    ]
}

ziti-tunnel host command

The ziti-tunnel can now be run in a mode where it will only host services and will not intercept any
services.

Ex: ziti-tunnel host -i /path/to/identity.json

Schema Reference

For reference, here is the full, updated ziti-tunneler-server.v1 schema:

{
    "$id": "http://edge.openziti.org/schemas/ziti-tunneler-server.v1.config.json",
    "additionalProperties": false,
    "definitions": {
        "action": {
            "additionalProperties": false,
            "properties": {
                "action": {
                    "pattern": "(mark (un)?healthy|increase cost [0-9]+|decrease cost [0-9]+)",
                    "type": "string"
                },
                "consecutiveEvents": {
                    "maximum": 65535,
                    "minimum": 0,
                    "type": "integer"
                },
                "duration": {
                    "$ref": "#/definitions/duration"
                },
                "trigger": {
                    "enum": [
                        "fail",
                        "pass"
                    ],
                    "type": "string"
                }
            },
            "required": [
                "trigger",
                "action"
            ],
            "type": "object"
        },
        "actionList": {
            "items": {
                "$ref": "#/definitions/action"
            },
            "maxItems": 20,
            "minItems": 1,
            "type": "array"
        },
        "duration": {
            "pattern": "[0-9]+(h|m|s|ms)",
            "type": "string"
        },
        "httpCheck": {
            "additionalProperties": false,
            "properties": {
                "actions": {
                    "$ref": "#/definitions/actionList"
                },
                "body": {
                    "type": "string"
                },
                "expectInBody": {
                    "type": "string"
                },
                "expectStatus": {
                    "maximum": 599,
                    "minimum": 100,
                    "type": "integer"
                },
                "interval": {
                    "$ref": "#/definitions/duration"
                },
                "method": {
                    "$ref": "#/definitions/method"
                },
                "timeout": {
                    "$ref": "#/definitions/duration"
                },
                "url": {
                    "type": "string"
                }
            },
            "required": [
                "interval",
                "timeout",
                "url"
            ],
            "type": "object"
        },
        "httpCheckList": {
            "items": {
                "$ref": "#/definitions/httpCheck"
            },
            "type": "array"
        },
        "method": {
            "enum": [
                "GET",
                "POST",
                "PUT",
                "PATCH"
            ],
            "type": "string"
        },
        "portCheck": {
            "additionalProperties": false,
            "properties": {
                "actions": {
                    "$ref": "#/definitions/actionList"
                },
                "address": {
                    "type": "string"
                },
                "interval": {
                    "$ref": "#/definitions/duration"
...

v0.18.1

07 Jan 21:15

Release 0.18.1

  • Improve log output for invalid API Session Tokens used to connect to Edge Routers
  • Logs default to no color output
  • API Session Certificate Support Added

Logs default to no color output

Logs generated by Ziti components written in Go (Controller, Router, SDK) will
no longer output ANSI color control characters by default. Color logs can be
enabled by setting the environment variable PFXLOG_USE_COLOR to a truthy
value: 1, t, T, TRUE, true, True. The other recognized boolean values (0, f, F,
FALSE, false, False) are falsy.
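
The twelve values listed are exactly the strings accepted by Go's strconv.ParseBool. The sketch below mirrors that parsing in Python to show which settings would turn color on. It assumes, without confirmation from the release notes, that the variable is interpreted this way and that an unset or unrecognized value leaves color off; parse_bool and use_color are hypothetical names.

```python
import os
from typing import Mapping

_TRUE = {"1", "t", "T", "TRUE", "true", "True"}
_FALSE = {"0", "f", "F", "FALSE", "false", "False"}


def parse_bool(value: str) -> bool:
    """Mirror of the value set accepted by Go's strconv.ParseBool."""
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a recognized boolean: {value!r}")


def use_color(environ: Mapping[str, str] = os.environ) -> bool:
    """Assumption: color is enabled only when the variable parses as true."""
    try:
        return parse_bool(environ.get("PFXLOG_USE_COLOR", ""))
    except ValueError:
        return False


print(use_color({"PFXLOG_USE_COLOR": "true"}))
```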

API Session Certificate Support Added

All authentication mechanisms can now bootstrap key pairs via an authenticated session
using API Session Certificates. Bootstrapping a key pair involves authenticating, preparing an
X509 Certificate Signing Request (CSR), and then submitting the CSR for processing.
The output is an ephemeral certificate tied to that session that can be used to
connect to Edge Routers for session dial/binds.

New Endpoints:

  • current-api-session/certificates
    • GET - lists current API Session Certificates
    • POST - create a new API Session Certificate (accepts a JSON payload with a csr field)
  • current-api-session/certificates/
    • GET - retrieves a specific API Session Certificate
    • DELETE - removes a specific API Session Certificate

API Session Certificates have a 12hr life span. New certificates can be created
before previous ones expire and be used for reconnection.

v0.18.0

17 Dec 20:00

Release 0.18.0

What's New

  • ziti#253 ziti-tunnel enroll should set non-zero
    exit status if an error occurs
  • Rewrite of Xgress with the following goals
    • Fix deadlocks at high throughput
    • Fix stalls when some endpoints are slower than others
    • Improve windowing/retransmission by pulling forward some concepts from Michael Quigley's
      transwarp work
    • Split xgress links into two separate connections, one for data and one for acks
  • Allow hosting applications to mark incoming connections as failed. Update go tunneler so when a
    dial fails for hosted services, the failure gets propagated back to controller
  • Streamline edge hosting protocol by allowing router to assign connection ids
  • Edge REST query failures should now result in 4xx errors instead of 500 internal server errors
  • Fixed bug where listing terminators via ziti edge would fail when terminators referenced pure
    fabric services

Xgress Rewrite

Overview

This rewrite fixed several deadlocks observed at high throughput. It also tries to ensure that slow
clients attached to a router can't block traffic/processing for faster clients. It does this by
dropping data for a client if the client isn't handling incoming traffic quickly enough. Dropped
payloads will be retransmitted. The new xgress implementation uses similar windowing and
retransmission strategies to the upcoming transwarp work.

Backwards Compatibility

0.18+ routers will probably work with older router versions, but probably not well. 0.18+ xgress
instances expect to get round trip times and receive buffer sizes on ack messages. If they don't get
them, retransmission will likely be either too aggressive or not aggressive enough.

Mixing 0.18+ routers with older router versions is not recommended without doing more testing first.

Xgress Options Changes

Added

  • txQueueSize - Number of payloads that can be queued for processing per client. Default value: 1
  • txPortalStartSize - Initial size of send window. Default value: 16Kb
  • txPortalMinSize - Smallest allowed send window size. Default value: 16Kb
  • txPortalMaxSize - Largest allowed send window size. Default value: 4MB
  • txPortalIncreaseThresh - Number of successful acks after which to increase send portal size.
    Default value: 224
  • txPortalIncreaseScale - Send portal will be increased by amount of data sent since last
    retransmission. This controls how much to scale that amount by. Default value: 1.0
  • txPortalRetxThresh - Number of retransmits after which to scale the send window. Default value: 64
  • txPortalRetxScale - Amount by which to scale the send window after the retransmission threshold is
    hit. Default value: 0.75
  • txPortalDupAckThresh - Number of duplicate acks after which to scale the send window. Default
    value: 64
  • txPortalDupAckScale - Amount by which to scale the send window after the duplicate ack threshold
    is hit. Default value: 0.9
  • rxBufferSize - Receive buffer size. Default value: 4MB
  • retxStartMs - Time after which, if no ack has been received, a payload should be queued for
    retransmission. Default value: 200ms
  • retxScale - Amount by which to scale the retransmission timeout, which is calculated from the round
    trip time. Default value: 2.0
  • retxAddMs - Amount to add to the retransmission timeout after it has been scaled. Default value: 0
  • maxCloseWaitMs - Maximum amount of time to wait for queued payloads to be
    acknowledged/retransmitted after an xgress session has been closed. If queued payloads are all
    acknowledged before this timeout is hit, the xgress session will be closed sooner. Default value:
    30s
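
The way these options combine can be sketched with two small helpers. This is a reading of the option descriptions above, not the xgress implementation: it assumes the retransmission timeout is the round trip time multiplied by retxScale plus retxAddMs, that window scaling is clamped between txPortalMinSize and txPortalMaxSize, and that 16Kb and 4MB mean 16 KiB and 4 MiB. The function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def retx_timeout_ms(rtt_ms: float, retx_scale: float = 2.0, retx_add_ms: float = 0.0) -> float:
    """Scale the round trip time, then add the fixed offset."""
    return rtt_ms * retx_scale + retx_add_ms


def scale_window(current: int, scale: float,
                 min_size: int = 16 * KB, max_size: int = 4 * MB) -> int:
    """Scale the send window and clamp it to the allowed range."""
    return max(min_size, min(max_size, int(current * scale)))


# A 50 ms round trip with the default scale and offset.
print(retx_timeout_ms(50))
# Shrinking a 64 KiB window by the default txPortalRetxScale of 0.75.
print(scale_window(64 * KB, 0.75))
```

The clamp means a window already at the minimum size stays there however many retransmissions or duplicate acks are seen.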

REMOVED: The retransmission option is no longer available. Retransmission can't be toggled off
anymore as that would lead to lossy connections.

Xgress Metrics Changes

New metrics were introduced as part of the rewrite.

NOTE: Some of these metrics were introduced to try and find places where tuning was required.
They may not be interesting or useful in the long term and may be removed in a future release.

The new metrics include:

New Meters

  • xgress.dropped_payloads
    • The count and rates of payloads being dropped
  • xgress.retransmissions
    • The count and rates of payloads being retransmitted
  • xgress.retransmission_failures
    • The count and rates of payloads being retransmitted where the send fails
  • xgress.rx.acks
    • The count and rates of acks being received
  • xgress.tx.acks
    • The count and rates of acks being sent
  • xgress.ack_failures
    • The count and rates of acks being sent where the send fails
  • xgress.ack_duplicates
    • The count and rates of duplicate acks received

New Histograms

  • xgress.rtt
    • Round trip time statistics aggregated across all xgress instances
  • xgress.tx_window_size
    • Local window size statistics aggregated across all xgress instances
  • xgress.tx_buffer_size
    • Local send buffer size statistics aggregated across all xgress instances
  • xgress.local.rx_buffer_bytes_size
    • Receive buffer size statistics in bytes aggregated across all xgress instances
  • xgress.local.rx_buffer_msgs_size
    • Receive buffer size statistics in number of messages aggregated across all xgress instances
  • xgress.remote.rx_buffer_size
    • Receive buffer size from remote systems statistics aggregated across all xgress instances
  • xgress.tx_buffer_size
    • Receive buffer size from remote systems statistics aggregated across all xgress instances

New Timers

  • xgress.tx_write_time
    • Times how long it takes to write xgress payloads from xgress to the endpoint
  • xgress.ack_write_time
    • Times how long it takes to write acks to the link
  • xgress.payload_buffer_time
    • Times how long it takes to process xgress payloads coming off the link (mostly getting them
      into the receive buffer)
  • xgress.payload_relay_time
    • Times how long it takes to get xgress payloads out of the receive buffer and queued to be sent

New Gauges

  • xgress.blocked_by_local_window
    • Count of how many xgress instances are blocked because the local transmit buffer size equals or
      exceeds the window size
  • xgress.blocked_by_remote_window
    • Count of how many xgress instances are blocked because the remote receive buffer size equals
      or exceeds the window size
  • xgress.tx_unacked_payloads
    • Count of payloads in the transmit buffer
  • xgress.tx_unacked_payload_bytes
    • Size in bytes of the transmit buffer

Split Links

The fabric will now create two channels for each link, one for data and the other for acks. When
establishing links the dialing side will attach headers indicating the channel type and a shared
link ID. If the receiving side doesn't support split links then it will treat both channels as
regular links and send both data and acks over both.

If an older router dials a router expecting split links, the incoming channel won't carry the link
type header and will be treated as a regular, non-split link.
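
The header handling described above can be sketched in Go as follows. This is only an illustration: the header keys (`linkId`, `linkType`) and the type values are hypothetical placeholders, since these notes do not give the actual fabric constants.

```go
package main

import "fmt"

// Hypothetical header keys and channel type values. The real fabric
// constants are not listed in these release notes.
const (
	headerLinkID   = "linkId"
	headerLinkType = "linkType"
	typeData       = "data"
	typeAck        = "ack"
)

// dialHeaders returns the headers the dialing side would attach to the
// two channels that make up one split link. Both share the same link ID.
func dialHeaders(linkID string) (data, ack map[string]string) {
	data = map[string]string{headerLinkID: linkID, headerLinkType: typeData}
	ack = map[string]string{headerLinkID: linkID, headerLinkType: typeAck}
	return data, ack
}

// classify reports how a receiving router that supports split links would
// treat an incoming channel. A channel with no type header comes from an
// older router and is handled as a regular, non-split link.
func classify(headers map[string]string) string {
	switch headers[headerLinkType] {
	case typeData:
		return "split:data"
	case typeAck:
		return "split:ack"
	default:
		return "regular"
	}
}

func main() {
	data, ack := dialHeaders("link-1")
	fmt.Println(classify(data), classify(ack), classify(map[string]string{}))
}
```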

Allow SDK Hosting Applications to propagate Dial Failures

The service terminator strategies use dial failures to adjust terminator weights and/or mark
terminators as failed. Previously SDK applications didn't have a way to mark a dial as failed. If
the SDK was hosting an application, this was generally not a problem. If the application could be
reached, it wouldn't want to mark an incoming connection as failed. However, the tunneler is just
proxying connections. It wants to be able to reach out to another application when the service is
dialed and proxy data. If the dial fails, it wants to be able to notify the controller that the
application wasn't reachable. The golang SDK now has this capability.

There is a new API on edge.Listener.

	AcceptEdge() (Conn, error)

The Conn returned here is an edge.Conn (which extends net.Conn). edge.Conn has two new APIs.

	CompleteAcceptSuccess() error
	CompleteAcceptFailed(err error)

If ListenWithOptions is called with ManualStart: true in the provided options, the
connection won't be established until CompleteAcceptSuccess is called. Writing to or reading from
the connection before calling that method will have undefined results.

The ziti-tunnel has been updated to use this API, and so should now work correctly with the various
terminator strategies.
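
A proxying host might use the two completion calls roughly as in the sketch below. To keep it compilable without the SDK, it declares a local `acceptedConn` interface containing only the two methods named above. That interface, `completeAccept`, `fakeConn`, and the two dial helpers are hypothetical names for illustration, not part of sdk-golang.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// acceptedConn lists only the two edge.Conn methods named in these notes.
// It is a local stand-in so the sketch builds without the SDK.
type acceptedConn interface {
	CompleteAcceptSuccess() error
	CompleteAcceptFailed(err error)
}

// completeAccept dials the proxied application and reports the outcome on
// the accepted connection, so a failed dial can be propagated.
func completeAccept(conn acceptedConn, dial func() (net.Conn, error)) (net.Conn, error) {
	upstream, err := dial()
	if err != nil {
		conn.CompleteAcceptFailed(err)
		return nil, err
	}
	if err := conn.CompleteAcceptSuccess(); err != nil {
		_ = upstream.Close()
		return nil, err
	}
	return upstream, nil
}

// fakeConn records which completion method was called.
type fakeConn struct {
	succeeded bool
	failure   error
}

func (f *fakeConn) CompleteAcceptSuccess() error {
	f.succeeded = true
	return nil
}

func (f *fakeConn) CompleteAcceptFailed(err error) {
	f.failure = err
}

// refuse simulates an application that cannot be reached.
func refuse() (net.Conn, error) {
	return nil, errors.New("connection refused")
}

// pipe simulates a reachable application using an in-memory connection.
func pipe() (net.Conn, error) {
	client, _ := net.Pipe()
	return client, nil
}

func main() {
	conn := &fakeConn{}
	_, err := completeAccept(conn, refuse)
	fmt.Println("dial error:", err, "reported failure:", conn.failure)
}
```

In a real host, `conn` would be the value returned by AcceptEdge(), and reads or writes would begin only after CompleteAcceptSuccess returns.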

Edge Hosting Dial Protocol Enhancement

When establishing a new virtual connection to a hosted SDK application the router had to execute the
following steps:

  1. Send a Dial message to the sdk application
  2. Receive the dial response, which included the sdk generated connection id.
  3. Create the router side virtual connection with the new id and register it
  4. Create the xgress instance tied to the new connection
  5. Now that the xgress is created, send a message to the sdk application letting it know that it can
    start sending traffic

If the connection id could be established on the router, we could simplify things as follows

  1. Create the router side virtual connection with the new id and register it
  2. Create the xgress instance tied to the new connection
  3. Send the dial message to the sdk with the connection id
  4. Receive the response and return the result to the controller

We didn't do this previously because the sdk controls ids for outbound connections. To enable this we
have split the 32 bit id range in half. The top half is now reserved for hosted connection ids. This
behavior is controlled by the SDK, which requests it when it binds using a boolean flag. The new
flag is:

    RouterProvidedConnId = 1012

If the bind result from the router has the same flag set to true, then the sdk will expect Dial
messages from the router to have a connection id provided in the header keyed with the same 1012.

This means that this feature should be both backward and forward compatible.
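
The id range split can be illustrated with a short Go sketch. It assumes that "top half" means ids with the high bit set (2^31 and above). The `routerIDs` allocator and `isRouterProvided` helper are hypothetical and only show how the two ranges stay disjoint.

```go
package main

import "fmt"

// RouterProvidedConnId is the flag and header key given in the notes above.
const RouterProvidedConnId = 1012

// hostedIDBase is the first id in the top half of the 32 bit range.
// Assumption: the top half is every id with the high bit set.
const hostedIDBase uint32 = 1 << 31

// isRouterProvided reports whether an id falls in the range reserved for
// router-assigned hosted connection ids.
func isRouterProvided(id uint32) bool {
	return id >= hostedIDBase
}

// routerIDs hands out ids from the top half only, wrapping within that half
// so it never collides with ids the sdk picks for outbound connections.
type routerIDs struct {
	next uint32
}

// Next returns the next id in the reserved range.
func (r *routerIDs) Next() uint32 {
	id := hostedIDBase | (r.next & (hostedIDBase - 1))
	r.next++
	return id
}

func main() {
	ids := &routerIDs{}
	first := ids.Next()
	fmt.Println("header key:", RouterProvidedConnId)
	fmt.Println("first router id:", first, "reserved:", isRouterProvided(first))
	fmt.Println("sdk id 1 reserved:", isRouterProvided(1))
}
```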

v0.17.5

03 Dec 22:00

Release 0.17.5

What's New

  • Builds have been moved from travis.org to GitHub Actions

  • IDs generated for entities in the Edge no longer use underscores and instead use periods to avoid issues when used as a common name in CSRs

  • edge#424 Authenticated, non-admin, clients can query service terminators

  • sdk-golang#112 Process checks for Windows are case-insensitive

  • The CLI agent now runs over unix sockets and is enabled by default. See doc/ops-agent.md in the ziti repository for details.

  • ziti#245 Make timeout used by CLI's internal REST client configurable via cmd line arg

    All ziti edge controller subcommands now support the --timeout=n flag, which controls the internal REST-client timeout used when communicating with the Controller. The timeout resolution is in seconds. If this flag is not specified, the default is 5. Prior to this release, the REST-client timeout was always 2. You can now increase the timeout if necessary (e.g. if large amounts of data are being queried).

    All ziti edge controller subcommands now support the --verbose flag, which causes the internal REST client to emit debugging information about HTTP headers, status, raw JSON response data, and more. This additional detail can be valuable during troubleshooting.
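
For readers curious how a timeout given in seconds could map onto a REST client, here is a minimal Go sketch. It is not the ziti CLI's code. It uses the standard library `flag` package as a stand-in for the CLI's own flag handling, and `newRESTClient` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"time"
)

// newRESTClient builds an HTTP client whose overall request timeout is
// given in whole seconds, matching the resolution described above.
func newRESTClient(timeoutSeconds int) *http.Client {
	return &http.Client{Timeout: time.Duration(timeoutSeconds) * time.Second}
}

func main() {
	// Default of 5 seconds, as stated in the notes above.
	timeout := flag.Int("timeout", 5, "REST client timeout in seconds")
	flag.Parse()

	client := newRESTClient(*timeout)
	fmt.Println("REST client timeout:", client.Timeout)
}
```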