Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: Change Rc to Arc #29

Closed
adriandelgado opened this issue Apr 16, 2023 · 7 comments · Fixed by #32
Closed

feature request: Change Rc to Arc #29

adriandelgado opened this issue Apr 16, 2023 · 7 comments · Fixed by #32

Comments

@adriandelgado
Copy link
Contributor

Feature Request

In a previous PR (#27), I made the mistake to change the Strings inside CellType to Box<str> and Rc<str>. However, Rc is not Send. This means that if we write worksheets on different theads, we can't send them to a single thread to use Workbook::push_worksheet. This shouldn't change the performance characteristics of the crate.

@adriandelgado
Copy link
Contributor Author

This could be considered more than an inconvenience, in that case, maybe we should add a regression test to avoid this happening again.

@adriandelgado
Copy link
Contributor Author

adriandelgado commented Apr 16, 2023

image
I can't do what I have commented out. The variable names are in spanglish but I think the problem is clear.

@jmcnamara
Copy link
Owner

jmcnamara commented Apr 17, 2023

That seems reasonable. Go ahead and submit a PR for that.

As an aside I also want to look at multithreading the worksheet writing during file saving:

https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/packager.rs#L105

You will see a discussion on this in #1.

I'll look at it initially with some of the standard/built in threading but is it worth considering rayon like you are using and introducing another dependency?

@adriandelgado
Copy link
Contributor Author

adriandelgado commented Apr 17, 2023

The advantage of using rayon is that the code would be easy to understand/maintain. It can be added as an optional dependency. Consider that using rayon for parallelism is very common in the Rust ecosystem so it is likely that if someone is looking to add more parallelism, they already are using rayon.

We can do a lot of things using only the standard library, we don't need rayon. One nice feature of rayon is that it automatically adapts to the number of cores available and It becomes single threaded if there's only one core available. All of this means it is a tradeoff between ease of development vs having less dependencies.

One thing I should add is that if we zip all the parts in parallel we can obtain HUGE improvements. This can be done (I think) using ZipWriter::new_append.

I don't have much experience using the zip crate so I don't know if the new_append can be used to run things in parallel. The other option would be to bypass the zip crate and use the flate2 crate directly. There's a PR that tried to add parallelism using rayon directly into the zip crate but it got closed because of inactivity: zip-rs/zip-old#31.

Take a look at the potential parallelism in the app I'm currently working on:
image

@jmcnamara
Copy link
Owner

Take a look at the potential parallelism in the app I'm currently working on:

Wow.

I'll definitely look into adding parallelism in the back end. I never bothered in the C version because of portability issues but it seems wrong not to use Rust's strengths in this area.

@LeoniePhiline
Copy link

Thanks for #32!

The change to Rc broke my pipelines:

image

I am generating four worksheets worksheets in parallel tasks (tokio multi-threaded) like this:

    let mut workbook = rust_xlsxwriter::Workbook::new();
    let column_header_format = Arc::new(Format::new().set_bold());

    let (
        worksheet_institutions_weekly,
        worksheet_institutions_monthly,
        worksheet_subscriptions_weekly,
        worksheet_subscriptions_monthly,
    ) = tokio::try_join!(
        tokio::spawn(institution::export_institution_activity_archive_weekly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(institution::export_institution_activity_archive_monthly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(subscription::export_subscription_activity_archive_weekly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(subscription::export_subscription_activity_archive_monthly(
            column_header_format.clone(),
            pool.clone()
        )),
    )?;

    workbook.push_worksheet(
        worksheet_institutions_weekly
            .wrap_err("failed to export weekly institution activity to report worksheet")?,
    );
    workbook.push_worksheet(
        worksheet_institutions_monthly
            .wrap_err("failed to export monthly institution activity to report worksheet")?,
    );
    workbook.push_worksheet({
        // Set worksheet "subscription activity per week" as the active tab.
        let mut worksheet = worksheet_subscriptions_weekly
            .wrap_err("failed to export weekly subscription activity to report worksheet")?;
        worksheet.set_active(true);
        Ok::<_, Report>(worksheet)
    }?);
    workbook.push_worksheet(
        worksheet_subscriptions_monthly
            .wrap_err("failed to export monthly subscription activity to report worksheet")?,
    );

    let mut file_path = output_path.clone();
    let file_name = Utc::now()
        .with_timezone(&Amsterdam)
        // TODO: File from arg / env!
        .format("%G-W%V activity report.xlsx")
        .to_string();

    file_path.push(file_name);

    workbook
        .save(&file_path)
        .wrap_err("failed to save report workbook file to disk")?;
    info!("Workbook written to '{}'", file_path.display());

Can't wait for the 0.36 release :)

@jmcnamara
Copy link
Owner

Can't wait for the 0.36 release :)

No problem. The fix is now available on crates.io. However, I messed up the release and had to release a patch release of v0.36.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants