PUT HTTP API returns status 500 without a human-readable message #1345

gfazioli · 2015-09-07T15:48:03Z

Hi there,
I have a PHP loop that sends 10, 20 or 30 cURL requests to the fleet API, which I use as a stress test.

CoreOS stable 723.3.0 
fleetctl 0.10.2

This is the method from my class:

  public function putUnit( $unitId, $options, $desiredState = 'launched' )
  {
    // Path
    $path = '/fleet/v1/units/' . $unitId;

    // Build the request body: the desired state plus the unit options
    $postFields = [
      'desiredState' => $desiredState,
      'options'      => $options
    ];

    $postFields = json_encode( $postFields );

    $ch = curl_init( $this->endpoint . $path );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'PUT' );
    curl_setopt( $ch, CURLOPT_POSTFIELDS, $postFields );
    curl_setopt( $ch, CURLOPT_HTTPHEADER, array(
                      'Content-Type: application/json',
                      'Content-Length: ' . strlen( $postFields )
                    )
    );
    $res        = curl_exec( $ch );
    $httpStatus = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
    curl_close( $ch );

    $result = [
      'result'       => $res,
      'status'       => $httpStatus,
      'unit'         => $unitId,
      'options'      => $options,
      'desiredState' => $desiredState
    ];

    return $result;

  }

But sometimes (seemingly at random), $res = curl_exec( $ch ); returns HTTP status 500 with no human-readable message. Note that $unitId and $options are very simple and I keep them the same for every test.

Any suggestions?

Thanks in advance


jonboulle · 2015-09-07T22:39:12Z

The place to start is probably the fleet engine's logs: look for errors there, in particular around the times you receive 500s on the client.


wuqixuan · 2015-09-08T11:30:51Z

@gfazioli @jonboulle, yes, HTTP 500 is "Internal Server Error", and it does not give the client the exact internal reason. You need to check fleetd's logs.

gfazioli · 2015-09-08T17:23:15Z

OK, the error is:

ERROR units.go:223: Failed creating Unit(web_test-app-40.johnf.service) in Registry: timeout reached

Any ideas?

jonboulle · 2015-09-08T17:38:38Z

That's indicative of etcd taking too long to respond and fleet giving up on the request, so it now seems most likely that etcd performance is your culprit. You could start exploring some of the metrics it exposes to analyse its performance:
https://github.com/coreos/etcd/blob/master/Documentation/metrics.md


gfazioli · 2015-09-09T08:17:42Z

@jonboulle Ok, I will try, thx

wuqixuan · 2015-09-09T09:19:09Z

Yes, it seems etcd is the culprit. It may not be a performance issue, though; etcd might be stopping or hitting some other error.
This is the fleet API code that logs that error:

func (ur *unitsResource) create(rw http.ResponseWriter, name string, u *schema.Unit) {
    if err := ur.cAPI.CreateUnit(u); err != nil {
        log.Errorf("Failed creating Unit(%s) in Registry: %v", u.Name, err)
        sendError(rw, http.StatusInternalServerError, nil)
        return
    }

    rw.WriteHeader(http.StatusCreated)
}

nicolaballotta · 2015-09-09T09:33:37Z

Hey guys, I'm joining the conversation because @gfazioli is my colleague (he deals with the code and I'm dealing with the CoreOS cluster). I suspected etcd2 was the culprit. This is our testing infrastructure:

3 machines acting as etcd2
2 machines acting as workers (configured as etcd2 proxy)

Over the last few days we have been trying to launch 100 units (with their discovery units) for testing across all 5 machines, not only the two workers. That is why I suspected the etcd2 cluster was struggling.

Today we started seeing machines disappear and reappear (we had never seen this before). I restarted the etcd2 service on all machines and that stopped it. The only error I've seen in the etcd2 logs is:

015/09/09 09:09:48 sender: error posting to 172991be8af6b7f8: unexpected http status Internal Server Error while posting to "http://192.168.101.11:2380/raft"

The next test we're going to do is to launch our units only on worker machines.

By the way, any idea how to reset the etcd2 metrics? At the moment I see a lot of getsFail, deleteFail and createFail, and I can't tell whether they refer to past events or new ones.
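If these are cumulative counters, as they appear to be, comparing two snapshots taken a few minutes apart may be easier than resetting them. A minimal Go sketch, assuming each snapshot is the flat JSON object returned by etcd v2's store stats endpoint (assumed here to be `/v2/stats/store`); `parseStats` and `failDeltas` are hypothetical helpers and the numbers in `main` are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// parseStats decodes one JSON snapshot of store stats into
// counter name -> value, skipping any non-numeric fields.
func parseStats(raw []byte) (map[string]float64, error) {
	var fields map[string]interface{}
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	stats := make(map[string]float64, len(fields))
	for k, v := range fields {
		if n, ok := v.(float64); ok {
			stats[k] = n
		}
	}
	return stats, nil
}

// failDeltas reports how much each "...Fail" counter grew between two
// snapshots. If a counter went down, the process was presumably restarted
// in between, so the new value is taken as the growth since the restart.
func failDeltas(before, after map[string]float64) map[string]float64 {
	out := make(map[string]float64)
	for k, v := range after {
		if !strings.HasSuffix(k, "Fail") {
			continue
		}
		if v < before[k] {
			out[k] = v
		} else {
			out[k] = v - before[k]
		}
	}
	return out
}

func main() {
	// Illustrative snapshots, not real numbers from this cluster.
	before, _ := parseStats([]byte(`{"getsFail": 120, "createFail": 7, "deleteFail": 3, "getsSuccess": 900}`))
	after, _ := parseStats([]byte(`{"getsFail": 125, "createFail": 7, "deleteFail": 4, "getsSuccess": 950}`))

	deltas := failDeltas(before, after)
	names := make([]string, 0, len(deltas))
	for k := range deltas {
		names = append(names, k)
	}
	sort.Strings(names) // stable output order
	for _, k := range names {
		fmt.Printf("%s +%.0f\n", k, deltas[k])
	}
}
```

A non-zero delta during a test run then points at failures caused by that run rather than by earlier experiments.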

nicolaballotta · 2015-09-09T10:21:44Z

FYI, these are our etcd2 cluster stats.

xiang90 · 2015-09-09T17:19:16Z

@gfazioli You have to update etcd to 2.1 to get all the metrics. See https://github.com/coreos/etcd/blob/release-2.1/Documentation/metrics.md

xiang90 · 2015-09-09T17:21:33Z

@gfazioli The easiest way to tell whether it is an etcd issue is to monitor the etcd log. You would see leader elections if etcd is unstable or under very heavy load.

waddles · 2016-07-26T05:10:16Z

I'm pretty sure this is the same as #1650, but there is not enough information here to tell how fleet and/or fleetctl were being used.
