AxeOS based firmware updates fail #213

Closed
skot opened this issue Jun 10, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@skot
Owner

skot commented Jun 10, 2024

In some cases the AxeOS dashboard firmware updater fails, leaving the Bitaxe in a state where it does not boot.
Usually the failure happens during the www.bin image update.
In all of these cases (afaik) the Bitaxe can be recovered with a USB firmware flash.

@skot skot added the bug Something isn't working label Jun 10, 2024
@skot skot self-assigned this Jun 10, 2024
@tdb3
Contributor

tdb3 commented Jun 12, 2024

Do users encountering the issue see a particular error response (e.g. one of the 500 reason phrases returned)? Knowing that could help show which error is encountered most frequently.

Looking at the code below, it would make sense that the Bitaxe is left in a state where it doesn't boot with the web interface available. The partition is being erased optimistically with the assumption that the subsequent write/update will work. If it doesn't, there wouldn't be a www.bin to load on next boot.

```c
esp_err_t POST_WWW_update(httpd_req_t * req)
{
    char buf[1000];
    int remaining = req->content_len;

    const esp_partition_t * www_partition =
        esp_partition_find_first(ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_SPIFFS, "www");
    if (www_partition == NULL) {
        httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "WWW partition not found");
        return ESP_FAIL;
    }

    // Erase the entire www partition before writing
    ESP_ERROR_CHECK(esp_partition_erase_range(www_partition, 0, www_partition->size));

    while (remaining > 0) {
        int recv_len = httpd_req_recv(req, buf, MIN(remaining, sizeof(buf)));

        if (recv_len == HTTPD_SOCK_ERR_TIMEOUT) {
            continue;
        } else if (recv_len <= 0) {
            httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "Protocol Error");
            return ESP_FAIL;
        }

        if (esp_partition_write(www_partition, www_partition->size - remaining, (const void *) buf, recv_len) != ESP_OK) {
            httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "Write Error");
            return ESP_FAIL;
        }

        remaining -= recv_len;
    }

    httpd_resp_sendstr(req, "WWW update complete\n");
    return ESP_OK;
}
```
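
To make the failure mode concrete, here is a minimal in-memory sketch (not ESP-IDF code). It assumes a tiny simulated partition and models an upload that drops partway through; `sim_partition_t`, `sim_www_update`, and the `received` parameter are all hypothetical names for illustration. It also shows one cheap guard, which is to reject an upload that cannot fit before anything is erased.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the "www" flash partition. */
#define SIM_PARTITION_SIZE 16u

typedef struct {
    uint8_t data[SIM_PARTITION_SIZE];
} sim_partition_t;

typedef enum {
    UPDATE_OK = 0,
    UPDATE_REJECTED, /* refused before the erase: old contents intact */
    UPDATE_FAILED    /* failed after the erase: partition left incomplete */
} update_result_t;

/* Erased NOR flash reads back as 0xFF. */
static void sim_erase(sim_partition_t *p)
{
    memset(p->data, 0xFF, sizeof p->data);
}

/*
 * Checks what can be checked up front, then erases, then writes.
 * `received` models how many bytes arrive before the connection drops.
 */
update_result_t sim_www_update(sim_partition_t *p, const uint8_t *image,
                               size_t declared_len, size_t received)
{
    /* Guard: a bad length is rejected while the old image still exists. */
    if (declared_len == 0 || declared_len > SIM_PARTITION_SIZE) {
        return UPDATE_REJECTED;
    }

    /* Point of no return: the old image is gone from here on. */
    sim_erase(p);

    size_t n = received < declared_len ? received : declared_len;
    memcpy(p->data, image, n);

    return n == declared_len ? UPDATE_OK : UPDATE_FAILED;
}
```

A rejected upload leaves the previous image readable, while an upload that fails after the erase leaves a partial image behind. That second case is the state described above, and a size check alone cannot prevent it.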

Not sure if we have enough space for this, but some devices handle this by having two partitions and flipping between the two. Updates are written to the partition not currently being used. On next boot, the updated partition is tried. If loading is unsuccessful, execution falls back / fails safe to the partition that still functions, and the update can be tried again.
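
A rough sketch of that two-partition flip, again as a plain in-memory model rather than ESP-IDF code. The `slot_t` / `ab_store_t` types, the `valid` flag, and both function names are assumptions made up for illustration; a real implementation would persist the flag and the active index in flash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 8u

typedef struct {
    uint8_t data[SLOT_SIZE];
    bool    valid;  /* set only after a complete write */
} slot_t;

typedef struct {
    slot_t slots[2];
    int    active;  /* index of the slot tried first on boot */
} ab_store_t;

/*
 * Write an update into the slot that is NOT currently active.
 * `received` models how many bytes arrive before the upload stops.
 */
bool ab_update(ab_store_t *s, const uint8_t *image, size_t len, size_t received)
{
    if (len == 0 || len > SLOT_SIZE) {
        return false;               /* nothing touched */
    }

    int target = 1 - s->active;
    slot_t *t = &s->slots[target];

    t->valid = false;               /* target is in flux from here on */
    memset(t->data, 0xFF, SLOT_SIZE);

    size_t n = received < len ? received : len;
    memcpy(t->data, image, n);
    if (n < len) {
        return false;               /* active slot is still intact */
    }

    t->valid = true;
    s->active = target;             /* flip only after a complete write */
    return true;
}

/* Slot to load at boot, falling back to the other one; -1 if neither works. */
int ab_boot_slot(const ab_store_t *s)
{
    if (s->slots[s->active].valid) {
        return s->active;
    }
    int other = 1 - s->active;
    return s->slots[other].valid ? other : -1;
}
```

The point of the design is that the working image is never erased during an update, so an interrupted upload costs only a retry.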

One alternative, if space is a luxury, would be to have three partitions, two that are very minimal in size to support bare-bones recovery, and the last to contain the bulk of the app.
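
One possible reading of that three-partition layout, sketched as a boot-time selection function. The enum, the function name, and the idea of passing in per-image "ok" flags are all hypothetical; how each image would actually be validated is left open here.

```c
#include <stdbool.h>

/* Hypothetical image identifiers for the three-partition layout. */
typedef enum {
    IMAGE_MAIN,       /* large partition holding the bulk of the app */
    IMAGE_RECOVERY_A, /* minimal recovery image, first copy */
    IMAGE_RECOVERY_B, /* minimal recovery image, second copy */
    IMAGE_NONE        /* nothing usable: recover over USB */
} image_t;

/* Prefer the full app, then fall back through the recovery copies. */
image_t pick_boot_image(bool main_ok, bool recovery_a_ok, bool recovery_b_ok)
{
    if (main_ok) {
        return IMAGE_MAIN;
    }
    if (recovery_a_ok) {
        return IMAGE_RECOVERY_A;
    }
    if (recovery_b_ok) {
        return IMAGE_RECOVERY_B;
    }
    return IMAGE_NONE;
}
```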

POST_OTA_update() appears to do something similar with OTA updates (juggling more than one OTA partition).

```c
esp_err_t POST_OTA_update(httpd_req_t * req)
{
    char buf[1000];
    esp_ota_handle_t ota_handle;
    int remaining = req->content_len;

    const esp_partition_t * ota_partition = esp_ota_get_next_update_partition(NULL);
    ESP_ERROR_CHECK(esp_ota_begin(ota_partition, OTA_SIZE_UNKNOWN, &ota_handle));

    while (remaining > 0) {
        int recv_len = httpd_req_recv(req, buf, MIN(remaining, sizeof(buf)));

        // Timeout Error: Just retry
        if (recv_len == HTTPD_SOCK_ERR_TIMEOUT) {
            continue;

            // Serious Error: Abort OTA
        } else if (recv_len <= 0) {
            httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "Protocol Error");
            return ESP_FAIL;
        }

        // Successful Upload: Flash firmware chunk
        if (esp_ota_write(ota_handle, (const void *) buf, recv_len) != ESP_OK) {
            esp_ota_abort(ota_handle);
            httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "Flash Error");
            return ESP_FAIL;
        }

        remaining -= recv_len;
    }

    // Validate and switch to new OTA image and reboot
    if (esp_ota_end(ota_handle) != ESP_OK || esp_ota_set_boot_partition(ota_partition) != ESP_OK) {
        httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "Validation / Activation Error");
        return ESP_FAIL;
    }

    httpd_resp_sendstr(req, "Firmware update complete, rebooting now!\n");
    ESP_LOGI(TAG, "Restarting System because of Firmware update complete");
    vTaskDelay(1000 / portTICK_PERIOD_MS);
    esp_restart();

    return ESP_OK;
}
```

tdb3 added a commit to tdb3/ESP-Miner that referenced this issue Jun 19, 2024
Adds a recovery web interface to enable users
to recover from a failed www.bin update.
Partial fix for Issue skot#213.
skot pushed a commit that referenced this issue Jun 20, 2024
Adds a recovery web interface to enable users
to recover from a failed www.bin update.
Partial fix for Issue #213.
tommywatson pushed a commit to tommywatson/ESP-Miner that referenced this issue Jun 20, 2024
Adds a recovery web interface to enable users
to recover from a failed www.bin update.
Partial fix for Issue skot#213.
skot added a commit that referenced this issue Jun 20, 2024
* Fixed fan speed web update #141

These changes fix fan rpm/percent requested and update both on the web

* fix readme

* refactor self_test to be modular for new hardware

* Supra 402 (#221)

* port TCH Supra 402 branch

* refactor TMP1075 (unused?) driver using i2c_master module

* pulled in @BitMaker-hub stratum_task.c DNS changes from PR #185

* removing serial debug

---------

Co-authored-by: Skot <skot@bitnet.cx>

* adjust share accepted/rejected functions to take higher level GLOBAL_STATE to fix share accounting.

* Code clean resulting from looking into #218 (#220)

* Code clean resulting from looking into #218

* Fixed asic count

Set canary value for invalid device's asic_count

---------

Co-authored-by: tommy <tommy@tommywatson.com>

* fix another pointer error

* Changes efficiency metric display in AxeOS (#231)

Fixes #230

* try to explain nonce space duration from parameters (#228)

* try to explain nonce space duration from parameters

* Fix Nonce Space duration for BM1397 (no version-rolling)

* fixed issue with version mask on 1397. added easy serial debugging on 1397

* cleanup jobID debugs

---------

Co-authored-by: Skot <skot@bitnet.cx>

* Update bm1397.c to increase the max frequency to 650Mhz (#209)

* Update bm1397.c to increase the max frequency to 650Mhz

The original version was setting everything above 500Mhz to 500Mhz, the update increases the limit to 650Mhz.
No changes to the web interface - drop-down still shows up to 575Mhz

* Update edit.component.ts to include higher frequency for BM1397

* Updated BM1397 frequencies to above 500Mhz

* Update bm1397.c

* Update bm1397.c

* UN-Update readme.md

* Update bm1397.c

* Update bm1397.c

* Update bm1397.c

* fix: add recovery page (#232)

Adds a recovery web interface to enable users
to recover from a failed www.bin update.
Partial fix for Issue #213.

* refactor: unify merge_bin scripts (#189)

Combines the functionality of merge_bin_update.sh
and merge_bin_with_config.sh into merge_bin.sh.
Also adds more verbose usage printing.

* fix: check www.bin size before updating (#216)

Adds a basic sanity check for www.bin uploading.
Returns 400 if upload is attempted on a file larger
than the available partition space.

---------

Co-authored-by: tommy <tommy@tommywatson.com>
Co-authored-by: Georges Palauqui <g.palauqui@gptechinno.com>
Co-authored-by: Skot <skot@bitnet.cx>
Co-authored-by: Nathan Day <87125117+dadofsambonzuki@users.noreply.github.com>
Co-authored-by: yanir99 <32940160+yanir99@users.noreply.github.com>
Co-authored-by: tdb3 <106488469+tdb3@users.noreply.github.com>
@WantClue
Collaborator

WantClue commented Dec 1, 2024

With the recovery page in place, this has been closed as fixed.

@WantClue WantClue closed this as completed Dec 1, 2024

3 participants