This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

PUT HTTP API return status 500 without human-readable message #1345

Open
gfazioli opened this issue Sep 7, 2015 · 11 comments

@gfazioli

gfazioli commented Sep 7, 2015

Hi there,
I have a PHP loop that sends 10, 20 or 30 cURL requests to the fleet API. I use it as a stress test.

CoreOS stable 723.3.0 
fleetctl 0.10.2

This is the method from my class:

  public function putUnit( $unitId, $options, $desiredState = 'launched' )
  {
    // Path
    $path = '/fleet/v1/units/' . $unitId;

    // Build the post fields - default only the desired state
    $postFields = [
      'desiredState' => $desiredState,
      'options'      => $options
    ];

    $postFields = json_encode( $postFields );

    $ch = curl_init( $this->endpoint . $path );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'PUT' );
    curl_setopt( $ch, CURLOPT_POSTFIELDS, $postFields );
    curl_setopt( $ch, CURLOPT_HTTPHEADER, array(
                      'Content-Type: application/json',
                      'Content-Length: ' . strlen( $postFields )
                    )
    );
    $res        = curl_exec( $ch );
    $httpStatus = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
    curl_close( $ch );

    $result = [
      'result'       => $res,
      'status'       => $httpStatus,
      'unit'         => $unitId,
      'options'      => $options,
      'desiredState' => $desiredState
    ];

    return $result;

  }

But sometimes (randomly) the call $res = curl_exec( $ch ); returns HTTP status 500 with an empty human-readable message. Note that $unitId and $options are very simple, and I keep them identical for each test.

Any suggestions?

Thanks in advance

@jonboulle
Contributor

The place to start is probably looking at the fleet engine's logs and
looking for errors there (in particular at the time you are receiving 500s
on the client)

@wuqixuan
Contributor

wuqixuan commented Sep 8, 2015

@gfazioli @jonboulle, yes, HTTP 500 is "Internal Server Error", and it usually does not give the exact internal reason to the client. You need to check fleetd's logs.

@gfazioli
Author

gfazioli commented Sep 8, 2015

Ok, the error is

ERROR units.go:223: Failed creating Unit(web_test-app-40.johnf.service) in Registry: timeout reached

Any ideas?

@jonboulle
Contributor

That's indicative of etcd taking too long to respond, and fleet giving up
on the request - so now it seems most likely that performance of etcd is
your culprit. You could start exploring some of the metrics it exposes to
analyse its performance -
https://github.com/coreos/etcd/blob/master/Documentation/metrics.md
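
While the etcd side is being investigated, a client can also treat these 500s as transient and retry. Below is a minimal Go sketch of that idea, not something fleet provides: putWithRetry, roundTripFunc and flakyDemo are hypothetical names, and the fake transport only simulates a server that fails a few times before answering 201. PUT is idempotent in HTTP terms, but whether fleet handles a repeated create for the same unit cleanly is an assumption to verify.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// putWithRetry sends body as a JSON PUT to url. It retries on any 5xx
// status, up to attempts tries in total, doubling the wait between tries.
// It returns the last status code and response body seen.
func putWithRetry(client *http.Client, url string, body []byte, attempts int, backoff time.Duration) (int, []byte, error) {
	var status int
	var respBody []byte
	for i := 0; i < attempts; i++ {
		req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
		if err != nil {
			return 0, nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			return 0, nil, err
		}
		respBody, err = io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return 0, nil, err
		}
		status = resp.StatusCode
		if status < 500 {
			return status, respBody, nil
		}
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return status, respBody, nil
}

// roundTripFunc lets a plain function act as an http.RoundTripper,
// so the demo needs no real server.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// flakyDemo simulates a server that answers 500 for the first `failures`
// requests and 201 afterwards. It returns the final status and the number
// of requests that were sent.
func flakyDemo(failures, attempts int) (int, int) {
	seen := 0
	client := &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen++
		code := http.StatusCreated
		if seen <= failures {
			code = http.StatusInternalServerError
		}
		return &http.Response{
			StatusCode: code,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("")),
			Request:    r,
		}, nil
	})}
	// The host below is a placeholder; the fake transport never dials it.
	status, _, err := putWithRetry(client, "http://fleet.invalid/fleet/v1/units/example.service",
		[]byte(`{"desiredState":"launched"}`), attempts, time.Millisecond)
	if err != nil {
		return 0, seen
	}
	return status, seen
}

func main() {
	status, requests := flakyDemo(2, 5)
	fmt.Println("final status:", status, "requests sent:", requests)
}
```

Retrying only hides the symptom. If etcd keeps timing out under this load, the retries add more load, so a small attempt limit and a growing backoff seem the safer defaults.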

@gfazioli
Author

gfazioli commented Sep 9, 2015

@jonboulle Ok, I will try, thx

@wuqixuan
Contributor

wuqixuan commented Sep 9, 2015

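Yes, it seems etcd is the culprit.
It may not be a performance issue, though; etcd may be stopping or hitting some other error.
This is the handler that logs the error. It passes nil to sendError, which would explain why the 500 response carries no message: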

func (ur *unitsResource) create(rw http.ResponseWriter, name string, u *schema.Unit) {
    if err := ur.cAPI.CreateUnit(u); err != nil {
        log.Errorf("Failed creating Unit(%s) in Registry: %v", u.Name, err)
        sendError(rw, http.StatusInternalServerError, nil)
        return
    }

    rw.WriteHeader(http.StatusCreated)
}
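
For comparison, here is a minimal sketch of how a handler of this shape could put the error text into the response. It is not fleet's code: sendErrorWithMessage, errorBody and the create callback are hypothetical stand-ins, and the JSON envelope is made up for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errorBody is a hypothetical JSON error envelope, not fleet's schema.
type errorBody struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// sendErrorWithMessage is a stand-in for a sendError-style helper. It
// writes the status code and, when err is non-nil, the error text.
func sendErrorWithMessage(rw http.ResponseWriter, code int, err error) {
	body := errorBody{Code: code}
	if err != nil {
		body.Message = err.Error()
	}
	rw.Header().Set("Content-Type", "application/json")
	rw.WriteHeader(code)
	// An encoding failure is ignored here to keep the sketch short.
	_ = json.NewEncoder(rw).Encode(body)
}

// createUnit mirrors the shape of the handler quoted above. The create
// callback is a hypothetical replacement for the registry call.
func createUnit(rw http.ResponseWriter, create func() error) {
	if err := create(); err != nil {
		sendErrorWithMessage(rw, http.StatusInternalServerError, err)
		return
	}
	rw.WriteHeader(http.StatusCreated)
}

// simulateFailure runs createUnit against an in-memory recorder with a
// create callback that always fails with msg, and returns the status
// code and the message a client would read from the response body.
func simulateFailure(msg string) (int, string) {
	rec := httptest.NewRecorder()
	createUnit(rec, func() error { return errors.New(msg) })
	var body errorBody
	if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil {
		return rec.Code, ""
	}
	return rec.Code, body.Message
}

func main() {
	code, msg := simulateFailure("timeout reached")
	fmt.Println(code, msg)
}
```

Whether internal error text should reach API clients at all is a design decision, since it can leak details about the backend. A short, fixed message such as "registry timeout" might be a reasonable middle ground.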

@nicolaballotta

Hey guys, I'm joining the conversation because @gfazioli is my colleague (he handles the code and I handle the CoreOS cluster). I suspected etcd2 was the culprit. This is our testing infrastructure:

  • 3 machines acting as etcd2
  • 2 machines acting as workers (configured as etcd2 proxy)

Over the last few days we have been trying to launch 100 units (each with its discovery) for testing, across all 5 machines (not only the two workers). For this reason I suspected that the etcd2 cluster was suffering.

Today we started seeing machines disappearing and reappearing (never seen before). I restarted the etcd2 service on all machines and this stopped. The only error I've seen in the etcd2 logs is:

2015/09/09 09:09:48 sender: error posting to 172991be8af6b7f8: unexpected http status Internal Server Error while posting to "http://192.168.101.11:2380/raft"

The next test we're going to do is to launch our units only on worker machines.

Btw, any idea how to reset the etcd2 metrics? At the moment I see a lot of getsFail, deleteFail and createFail counts, and I can't tell whether they refer to the past or are new.
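
One way to separate old failures from new ones without resetting anything is to take two snapshots of the counters and compare them. The Go sketch below assumes each snapshot is a flat JSON object of integer counters, which is what etcd's v2 store statistics look like as far as I know. The counter names are the ones mentioned above; the values are made up, and parseCounters and counterDeltas are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// parseCounters decodes a flat JSON object of integer counters.
func parseCounters(raw []byte) (map[string]int64, error) {
	var counters map[string]int64
	if err := json.Unmarshal(raw, &counters); err != nil {
		return nil, err
	}
	return counters, nil
}

// counterDeltas returns after[k] - before[k] for every counter whose
// value changed between the two snapshots. Counters missing from the
// first snapshot are treated as zero.
func counterDeltas(before, after map[string]int64) map[string]int64 {
	deltas := make(map[string]int64)
	for name, value := range after {
		if diff := value - before[name]; diff != 0 {
			deltas[name] = diff
		}
	}
	return deltas
}

func main() {
	// Made-up sample values, standing in for two snapshots taken some
	// time apart.
	first, err := parseCounters([]byte(`{"getsFail": 120, "createFail": 40, "deleteFail": 7}`))
	if err != nil {
		panic(err)
	}
	second, err := parseCounters([]byte(`{"getsFail": 120, "createFail": 43, "deleteFail": 7}`))
	if err != nil {
		panic(err)
	}

	deltas := counterDeltas(first, second)
	names := make([]string, 0, len(deltas))
	for name := range deltas {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s changed by %d\n", name, deltas[name])
	}
}
```

Counters that do not appear in the result did not move between the two snapshots, so their failures belong to the past.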

@nicolaballotta

FYI these are etcd2 cluster stats

[Screenshot: etcd2 cluster stats, 2015-09-09 12:21:05]

@xiang90
Contributor

xiang90 commented Sep 9, 2015

@gfazioli You have to update etcd to 2.1 to get all the metrics. See https://github.com/coreos/etcd/blob/release-2.1/Documentation/metrics.md

@xiang90
Contributor

xiang90 commented Sep 9, 2015

@gfazioli The easiest way to identify whether it is an etcd issue is to monitor the etcd log. You would see leader elections if etcd is not stable or is under very heavy load.
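
If watching the log by hand is impractical, a rough alternative is to sample which member is the leader at regular intervals and count how often it changes. The Go sketch below assumes each sample is a JSON document with a leaderInfo.leader field, which is how I remember etcd's v2 self statistics; treat that layout as an assumption and check it against your etcd version. leaderOf and countLeaderChanges are hypothetical helpers and the member IDs are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selfStats holds the single field this sketch reads. The JSON layout
// is an assumption, not a verified schema.
type selfStats struct {
	LeaderInfo struct {
		Leader string `json:"leader"`
	} `json:"leaderInfo"`
}

// leaderOf extracts the leader ID from one statistics sample.
func leaderOf(raw []byte) (string, error) {
	var stats selfStats
	if err := json.Unmarshal(raw, &stats); err != nil {
		return "", err
	}
	return stats.LeaderInfo.Leader, nil
}

// countLeaderChanges counts how many times the leader ID differs from
// the previous sample in a time-ordered list of samples.
func countLeaderChanges(leaders []string) int {
	changes := 0
	for i := 1; i < len(leaders); i++ {
		if leaders[i] != leaders[i-1] {
			changes++
		}
	}
	return changes
}

func main() {
	// Made-up samples, standing in for periodic polls of one member.
	samples := []string{
		`{"leaderInfo": {"leader": "member-a"}}`,
		`{"leaderInfo": {"leader": "member-a"}}`,
		`{"leaderInfo": {"leader": "member-b"}}`,
		`{"leaderInfo": {"leader": "member-a"}}`,
	}
	leaders := make([]string, 0, len(samples))
	for _, sample := range samples {
		leader, err := leaderOf([]byte(sample))
		if err != nil {
			panic(err)
		}
		leaders = append(leaders, leader)
	}
	fmt.Println("leader changes:", countLeaderChanges(leaders))
}
```

Sampling can miss an election that starts and ends between two polls with the same leader winning again, so the log remains the more reliable source.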

@waddles

waddles commented Jul 26, 2016

I'm pretty sure this is the same as #1650 but there is not enough information here to tell how fleet and/or fleetctl was being used.
