Investigate performance of config loading for big projects #3893

Closed
astrojuanlu opened this issue May 25, 2024 · 22 comments

@astrojuanlu
Member

Description

Earlier this week a user reached out to me in private saying that it was taking 3 minutes for Kedro to load their configuration (KedroContext._get_catalog).

Today another user mentioned that "Looking at the logs, it gets stuck at the kedro.config.module for more than 50% of the pipeline run duration, but we do have a lot of inputs and outputs"

I still don't have specific reproducers, but I'm noticing enough qualitative evidence to open an issue about it.

@datajoely
Contributor

I'd like to see us add a CLI command which users can run to produce a flamegraph. It would massively reduce the guesswork here.

kedro profile {kedro command} -> .html / .bin
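
A hypothetical sketch of what such a command could do, wrapping a normal KedroSession run in pyinstrument and dumping an HTML flamegraph (nothing here exists today; the function name and output path are made up):

    # Hypothetical sketch only: there is no "kedro profile" command today. This wraps
    # a normal session run in pyinstrument and writes an HTML flamegraph.
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from pyinstrument import Profiler


    def profile_run(project_path: str = ".", output: str = "kedro-profile.html") -> None:
        bootstrap_project(Path(project_path))
        profiler = Profiler()
        profiler.start()
        with KedroSession.create(project_path=project_path) as session:
            session.run()  # config loading, catalog creation and node execution all show up
        profiler.stop()
        Path(output).write_text(profiler.output_html())


    if __name__ == "__main__":
        profile_run()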

@yury-fedotov
Contributor

I'd like to see us add a CLI command which users can run to produce a flamegraph. It would massively reduce the guesswork here.

kedro profile {kedro command} -> .html / .bin

@datajoely flamegraph for the entire pipeline run (how much time each node takes) or just the config resolution / pipeline initialization?

@datajoely
Contributor

In my mind, it would run the whole command as normal, but also generate the profiling data.

Perhaps if we were to take this seriously, a full-on memray integration would be incredible.

@astrojuanlu
Member Author

Continuing the discussion on creating custom commands here #3908

@astrojuanlu
Member Author

astrojuanlu commented Jul 2, 2024

Many users have been complaining about the slowness of Kedro with big projects, and that can be attributed to many different causes. However, one of the most prevalent causes is big parameter files that get expanded into hundreds of datasets on their own. That process takes a lot of time, and if the files become too big (a couple of MB), it shows up as a significant slowdown.

Originally posted by @idanov in #3732 (comment)
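
To make that expansion concrete, here is a rough illustration of the idea (not Kedro's actual implementation): every nested key in the parameters file becomes its own params:... entry.

    # Rough illustration only, not Kedro's actual code: each nested key in the
    # parameters file fans out into its own "params:..." entry for the catalog.
    def expand_params(params: dict, prefix: str = "params:") -> dict:
        flat = {}
        for key, value in params.items():
            name = f"{prefix}{key}"
            flat[name] = value
            if isinstance(value, dict):
                flat.update(expand_params(value, prefix=f"{name}."))
        return flat


    parameters = {"model": {"alpha": 0.1, "layers": {"hidden": 64, "dropout": 0.2}}}
    print(list(expand_params(parameters)))
    # ['params:model', 'params:model.alpha', 'params:model.layers',
    #  'params:model.layers.hidden', 'params:model.layers.dropout']

A parameters file of a couple of MB multiplies this out into hundreds of such entries, which is where the slowdown described above comes from.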

The solution works, but it couples the DataCatalog with OmegaConf and is still under review.

From the discussion in the PR:

Shouldn't we redesign the DataCatalog API instead so that parameters are first class citizens, and not fake datasets?

There were a few thumbs up to the idea, and it was brought up again in #3973 (@datajoely please do confirm that this is what you had in mind 😄)

@merelcht pointed out that there's a pending research item on how users use parameters and for what #2240

@ElenaKhaustova agreed that this is relevant in the context of the ongoing DataCatalog API redesign #3934.

Ideally, if there's a way we can tackle this issue without blocking it on #2240, the time to look at it would be now. But I have very little visibility on what the implications are, or whether we would actually solve the performance problem at all. So, leaving the decision to the team.

@merelcht
Member

merelcht commented Jul 2, 2024

The solution works, but couples the DataCatalog with OmegaConf

Would you really call this coupling? The way I read it is that it uses omegaconf to parse the parameters config. We already have a dependency on omegaconf anyway, and I actually quite like that we can leverage it in more places than just the OmegaConfigLoader itself. I would have called it coupling if it used the actual OmegaConfigLoader class, but this just imports the library.

@astrojuanlu
Member Author

Sorry to keep moving the conversation, but I'd rather not discuss the specifics of a particular solution outside the corresponding PR; I addressed your question in context at #3732 (comment).

@astrojuanlu
Member Author

Now that we're working on this, some goals for this ticket:

  • understand under what circumstances omegaconf becomes the dominant bottleneck of loading configuration
  • find where the hotspots are, in terms of functions called (function profiling) but also where in the Kedro code are they being called (line profiling)
  • understand the scaling properties, for example
    • what happens with 1 config file with 10, 100, 1 000 variables?
    • what happens with 1 config file with 10, 100, 1 000 dataset factories?
    • what happens with 10 config files with a given pattern, 100, 1 000?
      • in other words: catalog1.yml, catalog2.yml, ..., catalog1000.yml when the pattern is catalog*.yml
    • are all the scaling laws above linear? are some of them superlinear?
  • in the absence of additional context from the original reporter (reached out to them by email), how many files, variables etc are needed to reach 3 minutes of config loading on modern hardware?
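
To make these experiments concrete, a minimal sketch of how the synthetic configuration could be generated; the file counts, dataset counts and paths are placeholders, not an agreed benchmark setup:

    # Minimal sketch for generating synthetic catalogs for the scaling experiments
    # above; file counts, dataset counts and paths are placeholders.
    from pathlib import Path

    import yaml


    def generate_catalogs(conf_dir: Path, n_files: int, n_datasets: int) -> None:
        """Write catalog1.yml ... catalogN.yml, each containing n_datasets entries."""
        conf_dir.mkdir(parents=True, exist_ok=True)
        for file_idx in range(1, n_files + 1):
            entries = {
                f"dataset_{file_idx}_{i}": {
                    "type": "pandas.CSVDataset",
                    "filepath": f"data/01_raw/file_{file_idx}_{i}.csv",
                }
                for i in range(n_datasets)
            }
            (conf_dir / f"catalog{file_idx}.yml").write_text(yaml.safe_dump(entries))


    for n_files in (10, 100, 1000):
        generate_catalogs(Path(f"scaling_experiments/{n_files}_files/base"), n_files, 10)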

The outcomes should be

  • an analysis of what functions dominate with "big" configuration
  • an analysis of what lines of code in the Kedro codebase dominate with "big" configuration
  • scatter plots that showcase the scaling properties of configuration loading with respect to different properties as outlined above, see Low performance of pipeline sums #3167 (comment) for an example

Hopefully, by the end of the analysis we will have either

  • a clear recommendation of what part of the code we can optimise, or
  • a mandate to look for something faster than omegaconf 😬

@noklam
Contributor

noklam commented Oct 30, 2024

@astrojuanlu My worry is that we will likely find nothing actionable unless there is a project that is actually slow with OmegaConfigLoader. With the benchmark results, it seems that it is reasonably fast.

We can still do the exponential scaling (not necessarily a combination of all of them) to better understand the performance of the config loader (this should probably move into the benchmark once done; see the sketch after this list):

  1. Scaling with the number of files, i.e. file1, file2, ..., fileN: maybe there are issues with globbing or with many small files.
  2. Scaling with the number of entries (the benchmark already covers this); we can increase the number of entries and see how well it scales.
  3. Scaling with the number of dataset factories.

The result should be a table (one axis being the thing under test, the other the number of entries), plus profiling.
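
A rough sketch of how that matrix could be expressed as an asv-style parametrized benchmark; the class name, parameter values and catalog contents are illustrative and not part of the existing suite:

    # Rough sketch of an asv-style parametrized benchmark; names, parameter values
    # and catalog contents are illustrative, not the existing benchmark suite.
    import shutil
    import tempfile
    from pathlib import Path

    import yaml

    from kedro.config import OmegaConfigLoader


    class TimeOmegaConfigLoaderScaling:
        # asv runs the benchmark once per combination of these parameters
        params = ([1, 10, 100], [10, 100, 1000])
        param_names = ["n_files", "n_entries_per_file"]

        def setup(self, n_files, n_entries_per_file):
            self.conf_source = Path(tempfile.mkdtemp())
            (self.conf_source / "local").mkdir()
            base = self.conf_source / "base"
            base.mkdir()
            for f in range(n_files):
                entries = {
                    f"dataset_{f}_{i}": {
                        "type": "pandas.CSVDataset",
                        "filepath": f"data/{f}_{i}.csv",
                    }
                    for i in range(n_entries_per_file)
                }
                (base / f"catalog{f}.yml").write_text(yaml.safe_dump(entries))

        def teardown(self, n_files, n_entries_per_file):
            shutil.rmtree(self.conf_source)

        def time_loading_catalog(self, n_files, n_entries_per_file):
            OmegaConfigLoader(
                conf_source=str(self.conf_source),
                base_env="base",
                default_run_env="local",
            )["catalog"]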

@deepyaman
Member

  • a mandate to look for something faster than omegaconf 😬

I had started working on something for fun a few months ago to solve this (potential) problem. So if you can find cases where omegaconf is slow, I'd be very interested. 😉

@datajoely
Contributor

Did someone hear a 🦀 walking?

@ravi-kumar-pilla
Contributor

ravi-kumar-pilla commented Nov 6, 2024

Hi Team,

As suggested by @astrojuanlu and @noklam, I tried creating stress-test scripts to analyze how OmegaConfigLoader scales. You can find the test scripts here, under kedro/kedro_benchmarks/temp_investigate_ocl. I used line_profiler and kernprof for the analysis, and matplotlib.pyplot for plotting.
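
For reference, a minimal sketch of how the loader can also be line-profiled programmatically instead of via kernprof; the conf_source path and environment names assume a standard Kedro project layout:

    # Minimal sketch of programmatic line profiling of the config loader; the
    # conf_source path and environment names assume a standard Kedro project layout.
    from line_profiler import LineProfiler

    from kedro.config import OmegaConfigLoader


    def load_catalog(conf_source: str) -> dict:
        loader = OmegaConfigLoader(
            conf_source=conf_source, base_env="base", default_run_env="local"
        )
        return loader["catalog"]


    if __name__ == "__main__":
        lp = LineProfiler()
        lp.add_function(OmegaConfigLoader.load_and_merge_dir_config)  # the hotspot shown below
        lp(load_catalog)("conf")
        lp.print_stats(output_unit=1)  # report timings in seconds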

Machine used: [screenshot of machine specifications not captured]

1. Single Catalog file with increasing variable interpolations -

[Plot: OmegaConfigLoader scaling properties with variable interpolations (example: catalog with 10 datasets with variable interpolation)]

Line Profiler - kernprof -lvr --unit 1 ocl_plot_variables.py
Total time: 265.218 s
File: /KedroOrg/kedro/kedro/config/omegaconf_config.py
Function: load_and_merge_dir_config at line 272

Line #   Hits   Time (s)   Per Hit (s)   % Time   Line Contents
   326     16      103.0           6.4     38.8   config = OmegaConf.load(tmp_fo)
   353     32      134.1           4.2     50.6   for k, v in OmegaConf.to_container(
   354     16       28.0           1.8     10.6       OmegaConf.merge(*aggregate_config), resolve=True

2. Single Catalog file without variable interpolations -

[Plots: catalogs with 10 and 1000 datasets without variable interpolation]

Line profiler - kernprof -lvr --unit 1 ocl_plot_datasets.py

Total time: 50.2196 s
File: /KedroOrg/kedro/kedro/config/omegaconf_config.py
Function: load_and_merge_dir_config at line 272

Line #   Hits   Time (s)   Per Hit (s)   % Time   Line Contents
   326     16       37.6           2.3     74.9   config = OmegaConf.load(tmp_fo)
   353     32        2.4           0.1      4.8   for k, v in OmegaConf.to_container(
   354     16       10.1           0.6     20.1       OmegaConf.merge(*aggregate_config), resolve=True

3. Multiple catalog files following catalog* pattern -

[Plot: conf source with 10 catalog files]

Line Profiler - kernprof -lvr --unit 1 ocl_plot_multifile.py

Total time: 106.144 s
File: /KedroOrg/kedro/kedro/config/omegaconf_config.py
Function: load_and_merge_dir_config at line 272

Line #   Hits   Time (s)   Per Hit (s)   % Time   Line Contents
   322   3615       16.8           0.0     15.8   with self._fs.open(str(config_filepath.as_posix())) as open_config:
   326   3615       58.3           0.0     54.9   config = OmegaConf.load(tmp_fo)
   354     10       19.7           2.0     18.5       OmegaConf.merge(*aggregate_config), resolve=True

Summary: below are the methods that take most of the time when resolving the catalog. All of them are called inside the load_and_merge_dir_config function.

  1. OmegaConf.load
  2. OmegaConf.to_container
  3. OmegaConf.merge

All of these come from the OmegaConf library, which we use under the hood. So, based on the above analysis, we could try alternatives to OmegaConf for better performance (I am not sure there are any better alternatives; I found Hydra, which itself uses OmegaConf under the hood).
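
As a sanity check on the relative cost of those three calls, a rough micro-benchmark that can be run in isolation; the generated catalog is synthetic and the sizes are arbitrary:

    # Rough micro-benchmark of the three OmegaConf calls flagged above; the catalog
    # content is synthetic and only meant to compare their relative cost.
    import time

    from omegaconf import OmegaConf

    entries = {
        f"dataset_{i}": {"type": "pandas.CSVDataset", "filepath": f"data/{i}.csv"}
        for i in range(1000)
    }
    OmegaConf.save(OmegaConf.create(entries), "big_catalog.yml")

    t0 = time.perf_counter()
    cfg = OmegaConf.load("big_catalog.yml")              # parse the YAML file
    t1 = time.perf_counter()
    merged = OmegaConf.merge(cfg, OmegaConf.create({}))  # merge with an empty config
    t2 = time.perf_counter()
    OmegaConf.to_container(merged, resolve=True)         # resolve and convert to plain dicts
    t3 = time.perf_counter()

    print(f"load: {t1 - t0:.3f}s  merge: {t2 - t1:.3f}s  to_container: {t3 - t2:.3f}s")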

Thank you !

@datajoely
Contributor

I'm not sure if it's worth the engineering overhead, but it's cool:

https://github.com/SergioBenitez/Figment

@datajoely
Contributor

@astrojuanlu
Member Author

Thanks a lot for the analysis @ravi-kumar-pilla, it looks good. Looks like everything scales linearly.

We're still waiting for feedback from a user that was struggling with high latency.

@noklam
Contributor

noklam commented Nov 7, 2024

Thanks @ravi-kumar-pilla , this aligns with the result here https://kedro-org.github.io/kedro-benchmark-results/#benchmark_ocl.TimeOmegaConfigLoader.time_loading_catalog

This looks reasonably fast (1000 datasets in 1 second). Let's wait for feedback from the user.

@merelcht
Member

merelcht commented Nov 7, 2024

Great analysis, thanks @ravi-kumar-pilla! Are there by any chance any parts of the Kedro code that are also introducing latency, or is it only the omegaconf calls?

And for number 3, multiple catalog files following the catalog* pattern: did these catalogs have variable interpolation or not? This one is clearly the slowest, but any project with close to 100 catalogs sounds a bit crazy 😅 I think it's reasonable to assume that's not a very realistic setup, and the timings for around 10 catalogs are still reasonable IMO.

@merelcht
Member

merelcht commented Nov 7, 2024

I'm also curious to hear how you think we can add this kind of profiling to the QA/benchmarking tests @ravi-kumar-pilla ? And did you find the newly added benchmarking setup useful for this kind of testing at all?

@ravi-kumar-pilla
Contributor

Is there by any chance any parts of the Kedro code that are also introducing latency or is it only the omegaconf stuff?

I did use pyinstrument and speedscope to analyze the overall behavior. Most of the time was spent in the load_and_merge_dir_config function. I will attach those results shortly.

And for number 3 Multiple catalog files following catalog* pattern, did these catalogs have variable interpolation or not?

Number 3 was without variable interpolations (I think it would take more time with variable interpolations based on the individual test).

This one is clearly the slowest, but any project with close to 100 catalogs sounds a bit crazy 😅 I think it's reasonable to assume that's not a very realistic setup and the timings for around 10 catalogs are still reasonable IMO.

Yes, 100 catalogs is a crazy simulation that might not happen in real projects. I think the overall behavior of OmegaConf was reasonable.

Thank you

@ravi-kumar-pilla
Contributor

I'm also curious to hear how you think we can add this kind of profiling to the QA/benchmarking tests @ravi-kumar-pilla ? And did you find the newly added benchmarking setup useful for this kind of testing at all?

The benchmark setup does show a similar performance trajectory and is useful for testing these cases. We can definitely iterate and add more catalog-generation use cases.

@ravi-kumar-pilla
Contributor

Hi Team,

Based on the materials we received, it is evident that the bottleneck is OmegaConf usage (the to_container method). Testing this locally, we observed that the method spends its time resolving variable interpolations: if the catalog contains global or variable references and the referenced file has a complex hierarchical structure, then get_node_value in omegaconf/basecontainer.py, which is called inside to_container, takes a considerably long time. Please find the observations below:

[Profiler screenshots not captured in this export]

These observations are in line with the benchmark plots above. OmegaConf seems to be the bottleneck when resolving complex variable references. Happy to hear any suggestions. Thank you!
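
As an addendum, a small stand-alone reproduction of the effect; it uses plain OmegaConf node interpolation rather than Kedro's globals resolver, and the nesting depth and entry counts are arbitrary:

    # Stand-alone reproduction sketch: many interpolations into a deeply nested
    # "globals"-like structure exercise the node-lookup path inside
    # to_container(resolve=True). Uses plain node interpolation, not Kedro's
    # custom globals resolver.
    import time

    from omegaconf import OmegaConf

    # Build a deeply nested globals-like section
    globals_cfg: dict = {}
    node = globals_cfg
    for depth in range(20):
        node[f"level{depth}"] = {"value": "s3://bucket/path"}
        node = node[f"level{depth}"]

    deep_key = ".".join(f"level{d}" for d in range(20)) + ".value"

    # Every catalog entry interpolates into the deepest node
    catalog = {
        f"dataset_{i}": {"filepath": "${globals." + deep_key + "}"} for i in range(2000)
    }
    cfg = OmegaConf.create({"globals": globals_cfg, **catalog})

    start = time.perf_counter()
    OmegaConf.to_container(cfg, resolve=True)
    print(f"resolving {len(catalog)} interpolations took {time.perf_counter() - start:.2f}s")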

@ravi-kumar-pilla
Contributor

Closing this as the investigation is complete. Opened a follow-up issue to improve the time taken by OmegaConf to resolve global interpolations here.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Kedro Framework Nov 12, 2024