Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing Report Generation Runtime Using fmt Library's fmt::output_file API #3576

Closed
07anhad opened this issue Aug 8, 2023 · 4 comments
Closed
Labels

Comments

@07anhad
Copy link

07anhad commented Aug 8, 2023

Update

I also ran a small program in c++ for usage of both, system fprintf and fmt::output_file API but still could not see any difference in the runtime using the output_file API

For c++ program using fprintf -

#include <cstdio>
#include <iostream>
#include <chrono>
#include <cstring>

int main() {
    const char* filename = "large_file.txt";
    const char* content = "Test";

    auto start = std::chrono::high_resolution_clock::now();

    FILE* fp = std::fopen(filename, "wb");
    if (!fp) {
        std::cerr << "Error opening file: " << filename << std::endl;
        return 1;
    }   
    
    size_t iterations = 1e7;

    for (size_t i = 0; i < iterations; ++i) {
        std::fprintf(fp, "%s", content);
    }   

    std::fclose(fp);

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

    std::cout << "Elapsed time: " << duration << " ms" << std::endl;

    return 0;
}
Output - 
[anhad fmt_tests]$ ./output_bench1 
Elapsed time: 319 ms
[anhad fmt_tests]$ 

For c++ programme using fmt::output_file API

#include "fmt/core.h"
#include "fmt/os.h"
#include "fmt/ostream.h"
#include "fmt/printf.h"
#include <iostream>
#include <chrono>

int main() {
    const char* content = "Test";

    const size_t bufferSize = 653539;
    auto start = std::chrono::high_resolution_clock::now();

    fmt::ostream out = fmt::output_file("filename.txt", fmt::buffer_size= bufferSize);
  
    size_t iterations = 1e7;

    for (size_t i = 0; i < iterations; ++i) {
        out.print("{}", content);
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

    std::cout << "Elapsed time: " << duration << " ms" << std::endl;
    
    return 0;
}

Output -

[anhad fmt_tests]$ ./output_fmt_bench
Elapsed time: 1745 ms
[anhad fmt_tests]$ 

Both the files which were populated have the same size -

[anhad fmt_tests]$ du -sh large_file.txt filename.txt
39M	large_file.txt
39M	filename.txt
[anhadp fmt_tests]$ 

Greetings,

I want to share a technical exploration I recently undertook in my C++ program aimed at enhancing the efficiency of the programme's runtime.

We were using fprintf to write to a file and sprintf for character conversion.

However, for performance improvement, I integrated the fmt library (version 10) into my C++ project. fmt::output_file API in place of fprintf and fmt::format_to API in place of sprintf My attention was particularly drawn to the fmt::output_file API, which promised a substantial runtime enhancement of 5-9 times compared to the conventional fprintf approach. To my intrigue, though, the anticipated performance gain was not readily evident in my runtime measurements.

  1. File Opening:
    I initiated the file for writing within a separate function using the fmt::ostream class along with the fmt::output_file API. The code snippet appeared as follows:
fmt::ostream out = fmt::output_file(filename, fmt::buffer_size);

The out object created through this approach was subsequently passed by reference to other functions.

  1. Buffer Size Control:
    I introduced a level of configurability to the buffer size by employing an environment variable. The snippet below demonstrates the dynamic adjustment of the buffer size based on the environment variable BUFFER_SIZE:
static int bufferSize = (bufferSizeEnv != nullptr) ? std::stoi(bufferSizeEnv) : 16384;

This enabled me to fine-tune the buffer size according to the specific needs of my application.

Despite setting up the fmt::output_file API and configuring the buffer size with different values ranging from 4k to 65k, I was surprised to find that there were no runtime improvements. The fmt library documentation suggested a significant boost, but I encountered a gap between the expected and observed outcomes.

Below is an example snippet of how this API was applied in my C++ code.

FILE* outputFileApiFunc(string type, string outputfile ,fmt::ostream& out)
{
   FILE *fp = NULL;
   fp = std::fopen(outputfile.c_str(),"w");
 
    if(type == "one" || type == "two" || type == "three" || type == "all")
     out.print("\n type found = {}\n", func1.c_str());
    else
      out.print("\n{} type not found\n\n",func2.c_str());
    out.print("output is           = {}\n", outputFunc.c_str());
   
  return fp;
}

When used the fmt::format_to API and a fmt::memory_buffer of FMT version 10 in place of system sprintf, I could see a gain of 10 mins in my runtime.

Below is an example snippet of how this fmt::format_to API was applied in my C++ code.

{
  fmt::memory_buffer bufferString;
File* fp = NULL;

case name:
            fmt::format_to(std::back_inserter(bufferString),"{:<15} ", "Dave");
            break;
case age:
            fmt::format_to(std::back_inserter(bufferString),"{:<30} ", 17);                                    
             break;

fmt::print(fp,bufferString.data());
}

So there was improvement in fmt::format_to API only and this result was expected from the output_file API as well.

Perhaps there are subtleties or bottlenecks that I might have overlooked. If any of you have experienced a similar phenomenon or possess an in-depth understanding of the inner workings of the fmt library, your expertise would be invaluable.

@vitaut
Copy link
Contributor

vitaut commented Aug 8, 2023

I wasn't able to repro your timings. On macOS with clang the fmt program is ~2x faster:

fmt:

% c++ -O2 -DNDEBUG -I include test-fmt.cc src/format.cc src/os.cc -std=c++17
% time ./a.out
Elapsed time: 142 ms
./a.out  0.14s user 0.01s system 34% cpu 0.431 total

printf:

% rm filename.txt
% c++ -O2 -DNDEBUG -I include test-printf.cc -std=c++17
% time ./a.out
Elapsed time: 298 ms
./a.out  0.25s user 0.05s system 54% cpu 0.558 total

My guess is that you are comparing a debug version of fmt with an optimized version of printf (since it is a part of the system C runtime).

The choice of the buffer size is questionable too. It should normally be a multiple of the page size.

@vitaut vitaut closed this as completed Aug 8, 2023
@vitaut vitaut added the question label Aug 8, 2023
@07anhad
Copy link
Author

07anhad commented Aug 9, 2023

Hi @vi

I wasn't able to repro your timings. On macOS with clang the fmt program is ~2x faster:

fmt:

% c++ -O2 -DNDEBUG -I include test-fmt.cc src/format.cc src/os.cc -std=c++17
% time ./a.out
Elapsed time: 142 ms
./a.out  0.14s user 0.01s system 34% cpu 0.431 total

printf:

% rm filename.txt
% c++ -O2 -DNDEBUG -I include test-printf.cc -std=c++17
% time ./a.out
Elapsed time: 298 ms
./a.out  0.25s user 0.05s system 54% cpu 0.558 total

My guess is that you are comparing a debug version of fmt with an optimized version of printf (since it is a part of the system C runtime).

The choice of the buffer size is questionable too. It should normally be a multiple of the page size.

Hi @vitaut

We are using the optimized version of fmt 10.0.0 which was downloaded from Github's FMT repository.

@vitaut
Copy link
Contributor

vitaut commented Aug 9, 2023

We don't distribute binaries, optimized or not so you need to check your build commands.

@vitaut
Copy link
Contributor

vitaut commented Oct 15, 2023

Likely related to #3674.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants