why does fmt::format_to_n perform so much worse than fmt::format_to #3484

Closed
mentalmap opened this issue Jun 12, 2023 · 4 comments
@mentalmap

fmt Version:

10.0.0

Benchmark:

#include <benchmark/benchmark.h>
#include <fmt/chrono.h>
#include <fmt/compile.h>
#include <fmt/core.h>
#include <fmt/format.h>

static void BM_format_to(benchmark::State &state) {
  char out[1024] = {0};
  auto format = FMT_COMPILE("{} - {} - {} - {} - {} - {} - {} - {} - {} - {}");

  for (auto _ : state) {
    fmt::format_to(out, format, "abcdef", 12345, "abcdef", 12345, "abcdef", 12345, "abcdef", 12345,
                   "abcdef", 12345);
  }
}
BENCHMARK(BM_format_to);

static void BM_format_to_n(benchmark::State &state) {
  char out[1024] = {0};
  auto format = FMT_COMPILE("{} - {} - {} - {} - {} - {} - {} - {} - {} - {}");

  for (auto _ : state) {
    fmt::format_to_n(out, sizeof(out), format, "abcdef", 12345, "abcdef", 12345, "abcdef", 12345,
                     "abcdef", 12345, "abcdef", 12345);
  }
}
BENCHMARK(BM_format_to_n);

BENCHMARK_MAIN();

Result:

Running ./benchmark_format_to.fmt10
Run on (16 X 2595.12 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 4096 KiB (x8)
  L3 Unified 16384 KiB (x2)
Load Average: 1.17, 1.10, 1.59
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         78.3 ns         78.3 ns      8849844
BM_format_to_n        564 ns          564 ns      1242982
@vitaut
Contributor

vitaut commented Jun 12, 2023

format_to_n hasn't been optimized for format string compilation yet. The default (non-compiled) API is actually much faster in this case:

Run on (8 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 49K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 524K (x4)
  L3 Unified 8388K (x1)
Load Average: 3.99, 3.63, 3.01
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         97.2 ns         96.9 ns      6949063
BM_format_to_n        227 ns          227 ns      2990584

An easy way to optimize format_to_n would be by applying the same buffering to the compiled API:

fmt/include/fmt/core.h

Lines 2794 to 2802 in de0757b

template <typename OutputIt, typename... T,
          FMT_ENABLE_IF(detail::is_output_iterator<OutputIt, char>::value)>
auto vformat_to_n(OutputIt out, size_t n, string_view fmt, format_args args)
    -> format_to_n_result<OutputIt> {
  using traits = detail::fixed_buffer_traits;
  auto buf = detail::iterator_buffer<OutputIt, char, traits>(out, n);
  detail::vformat_to(buf, fmt, args, {});
  return {buf.out(), buf.count()};
}

A PR would be welcome.

@vitaut
Contributor

vitaut commented Jul 20, 2023

Applied the optimization in 436c131, which gave a ~2x speedup on the given benchmark (tested on macOS on an M1 with clang). Compared to your original timing, the improvement is even larger, possibly due to other changes.

Before:

Run on (8 X 24.1212 MHz CPU s)
CPU Caches:
  L1 Data 65K (x8)
  L1 Instruction 131K (x8)
  L2 Unified 4194K (x4)
Load Average: 1.70, 2.40, 2.43
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         75.5 ns         75.5 ns      7828927
BM_format_to_n        317 ns          317 ns      2210356

After:

Run on (8 X 24.1211 MHz CPU s)
CPU Caches:
  L1 Data 65K (x8)
  L1 Instruction 131K (x8)
  L2 Unified 4194K (x4)
Load Average: 6.92, 4.27, 3.17
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         75.5 ns         75.5 ns      8163741
BM_format_to_n        165 ns          165 ns      4229658

@vitaut vitaut closed this as completed Jul 20, 2023
@mentalmap
Author

fmt 10.0.0:

2023-07-27T14:55:22+08:00
Running ./benchmark.fmt10
Run on (16 X 2595.12 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 4096 KiB (x8)
  L3 Unified 16384 KiB (x2)
Load Average: 0.57, 0.69, 0.30
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         78.9 ns         78.9 ns      8881746
BM_format_to_n        568 ns          568 ns      1232089

fmt master:

2023-07-27T14:55:28+08:00
Running ./benchmark.fmt_master
Run on (16 X 2595.12 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 4096 KiB (x8)
  L3 Unified 16384 KiB (x2)
Load Average: 0.52, 0.67, 0.30
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_format_to         54.9 ns         54.9 ns     12727944
BM_format_to_n        133 ns          133 ns      5257795

👍

@vitaut
Contributor

vitaut commented Jul 28, 2023

Thanks for testing.
