Performance refactor for Schedule:File #8795
} | ||
} else { | ||
state.dataScheduleMgr->Schedule(SchNum).ScheduleTypePtr = CheckIndex; | ||
|
The main concept is that the processing moves from a Schedule:File object perspective to a CSV file perspective since we need to decrease file open and close operations as much as possible.
|
||
void PreProcessIDF(EnergyPlus::EnergyPlusData &state, int &SchNum, int NumCommaFileSchedules); | ||
|
||
struct schedInputIdfPreprocessObject |
Objects of this struct hold the IDF info about a particular Schedule:File object, as read by the getObjectItem function. I implemented it this way hoping to bring a more object-oriented approach to this module.
@@ -355,6 +461,10 @@ struct ScheduleManagerData : BaseGlobalStruct | |||
Array1D<ScheduleManager::WeekScheduleData> WeekSchedule; // Week Schedule Storage | |||
Array1D<ScheduleManager::ScheduleData> Schedule; // Schedule Storage | |||
|
|||
// Schedule:File variables | |||
std::vector<ScheduleManager::schedInputIdfPreprocessObject> allIdfSchedData; |
A vector of schedInputIdfPreprocessObject type. We will add the IDF data about each Schedule:File object to this vector as schedInputIdfPreprocessObject elements.
state.dataScheduleMgr->Schedule(SchNum).ScheduleTypePtr = CheckIndex; | ||
|
||
// Runs getObjectItem for Schedule:File and saves it to vector in state: allIdfSchedData | ||
PreProcessIDF(state, SchNum, NumCommaFileSchedules); |
I wrapped the getObjectItem function and all the variable assignments into this PreProcessIDF function. This will populate the allIdfSchedData vector. We can access every Schedule:File object data from here, since these have been saved as elements of the allIdfSchedData vector. The main module also looks cleaner.
|
||
// Because we are focusing on saving performance by decreasing number of file open-close operations, get set of filenames | ||
std::set<std::string> setOfFilenames; | ||
PopulateSetOfFilenames(state.dataScheduleMgr->allIdfSchedData, setOfFilenames); |
This function runs through the IDF data and saves all the CSV file names to the std::set setOfFilenames. std::set makes sure that the list only has unique members in it.
if (setOfFilenames.find(idfObject.fileName) == setOfFilenames.end()) { | ||
setOfFilenames.insert(idfObject.fileName); |
If we do not find the file name in the set, add it to the set...
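The de-duplication step discussed above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `SchedFileInput` is a hypothetical stand-in for schedInputIdfPreprocessObject, and only its `fileName` member appears in the diff; the `name` member is an assumption. Since std::set::insert leaves the set unchanged when the key is already present, the find() check before the insert is not strictly required.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for schedInputIdfPreprocessObject. Only fileName is
// taken from the PR; the name member is an assumption for illustration.
struct SchedFileInput
{
    std::string name;     // assumed: name of the Schedule:File object
    std::string fileName; // CSV file referenced by the object
};

// Collect the distinct CSV file names. std::set::insert does nothing for a key
// that is already stored, so no find() check is needed beforehand.
void populateSetOfFilenames(const std::vector<SchedFileInput> &allIdfSchedData, std::set<std::string> &setOfFilenames)
{
    for (const auto &idfObject : allIdfSchedData) {
        setOfFilenames.insert(idfObject.fileName);
    }
}
```

With three objects that reference a.csv, b.csv and a.csv again, the set ends up with two entries, so each file would be opened once.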
|
||
void PopulateSetOfFilenames(const std::vector<schedInputIdfPreprocessObject> &allIdfSchedData, std::set<std::string> &setOfFilenames); | ||
|
||
struct PreProcessedColumn |
Objects of this struct will have a vector (vals) that holds the actual schedule values from the CSV file.
All it takes is 💵 💵 💵 . Maybe it has to do with the fact that you're returning a |
@@ -231,7 +231,7 @@ namespace ScheduleManager { | |||
Array1D_bool AllDays(MaxDayTypes); | |||
Array1D_bool TheseDays(MaxDayTypes); | |||
bool ErrorHere; | |||
int SchNum; | |||
int SchNum{}; |
I'm curious why this is necessary if you're not actually initializing to a value?
It would automatically initialize it to 0 if I used the {}. I will change it to {0} for clarity 👍🏽
I see. Thanks for clarifying.
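For reference, a small sketch of the two spellings from this thread. The function names are hypothetical and exist only so the values can be checked; for an int, empty braces perform value-initialization, which yields zero.

```cpp
// Hypothetical helpers that return the value each spelling produces.
int emptyBraces()
{
    int SchNum{}; // value-initialization: an int becomes 0
    return SchNum;
}

int explicitZero()
{
    int SchNum{0}; // same result, with the value spelled out
    return SchNum;
}
```

Both functions return 0, so the change to {0} affects readability only.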
I will check it out. |
Yup, that's it. https://godbolt.org/z/TqfqWWoTz @amirroth has requested that we make a pass through the code to add string_view to places which may involve adding specialized functions for string_view as an alternate to the existing Objexx methods. I believe this is one such case. Perhaps for the sake of this PR, we can avoid going to string_view right now but use this as a first example use to dig in to string_view. Or maybe in this case, string_view is not actually the right thing to use and it really should be a string itself here. I've still got to learn a little bit more to be able to answer that without looking at the documentation at the same time. |
Thanks for figuring it out so quickly. I will take out the string_view stuff for this PR 👍🏽 I did learn so much from it though! |
@jmythms what's the status of this now? I see it is still a draft PR, so I won't dig in much, but at a minimum, CI is complaining about some unused variables. Let me know if you need anything from me here. This'll be a good one though! |
Closing this PR: The problems this PR intended to fix have already been addressed in PR #8996, which is part of Develop.
Running the HybridZoneModel_shadow113.idf file through the Clion Profiler, we see the CPU time spent on ProcessScheduleInput() has reduced from As the identified problems have already been solved, and reading the file is just taking ~10% CPU time of the simulation process for a 'large' CSV file, I propose to close this PR. The small advantage I see from continuing this work is reducing the dependency of E+ on third-party libraries like nlohmann. Open to discussion, if necessary. |
Pull request overview
This pull request refactors the Schedule:File module in ScheduleManager.cc mainly focusing on improving run times.
The current Schedule:File module mainly has three steps:
Out of these three, parsing the CSV is the most expensive step, because it involves reading information from potentially very big external files. This step also has the potential for generating maximum performance gain with a refactor, mainly because of problems with:
We address both of these problems at once by reading all the information from each CSV file in a single pass. This is explained below:
Methodology
Current Method as in Develop:
It parses all Schedule:File objects in the following way:
Fig 1.
This is how it looks in the above example:
Repeat till all Schedule:File objects are processed
Proposed Method:
This is how it looks for the example:
0. Open file
Advantages of this method:
1. For IDFs that have a large number of Schedule:File objects, the number of file open and file close operations decreases significantly. These are very expensive I/O operations.
2. The number of rows read and delimiter checks decreases from N*(N+1)/2 to N, where N is the number of columns in a file.
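A minimal sketch of the single-pass idea, under stated assumptions: the stream has no header rows, no quoted fields, and every field is numeric. `Column` and `readAllColumns` are hypothetical names (the PR's own struct is PreProcessedColumn with a `vals` vector), and this is not the PR's implementation. One reading of the N*(N+1)/2 figure is that fetching column k on its own walks past k fields per row, and summing k from 1 to N gives N*(N+1)/2, whereas one pass touches each of the N fields once.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical counterpart of PreProcessedColumn: the values of one CSV column.
struct Column
{
    std::vector<double> vals;
};

// Read every column of a delimited stream in a single pass.
// Assumes no header rows, no quoted fields, and numeric fields only.
std::vector<Column> readAllColumns(std::istream &in, char delimiter = ',')
{
    std::vector<Column> columns;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue; // skip blank lines
        std::istringstream fields(line);
        std::string field;
        std::size_t col = 0;
        while (std::getline(fields, field, delimiter)) {
            if (col >= columns.size()) columns.resize(col + 1); // grow on first sight of a column
            columns[col].vals.push_back(std::stod(field));
            ++col;
        }
    }
    return columns;
}
```

Each Schedule:File object that refers to the same file could then pick its column out of the returned vector instead of reopening the file.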
std::string_view
I tried to use std::string_view for the CSV parsing, since that is the operation repeated most often and it would be great to cut down the dynamic allocations made by string arrays. However, I had to convert it to a std::string at some point, since replace_if needs a mutable object to write the replacements into.
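The constraint can be illustrated with a short sketch. The function name and the tab/semicolon-to-comma rule are assumptions chosen for illustration, not what the PR replaces; the point is only that std::replace_if writes through its iterators, which a read-only view does not allow, so the view is copied into an owning string first.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// std::string_view is a read-only view, so std::replace_if cannot write through it.
// Copy the view into an owning std::string, then edit the copy.
// The tab/semicolon-to-comma rule is an assumption for illustration.
std::string normalizeDelimiters(std::string_view line)
{
    std::string owned(line); // the allocation the view alone could not avoid
    std::replace_if(
        owned.begin(), owned.end(), [](char c) { return c == '\t' || c == ';'; }, ',');
    return owned;
}
```

The copy happens once per line here; the read-only steps before it can still operate on views without allocating.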
Performance Improvements
The percentage speed-up for the Schedule:File module and for total simulation time, comparing this branch to develop, is shown in Fig. 2. Percentage speed-up is calculated as (t_develop - t_branch)*100/t_branch.
Fig 2.
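The speed-up formula stated for Fig. 2 can be written as a small helper. The function name is hypothetical; the arithmetic is the formula from the text. Note that the difference is normalized by the branch time, so a value of 100 means the branch runs in half the time of develop.

```cpp
// Percentage speed-up as defined in the text: (t_develop - t_branch) * 100 / t_branch.
// Both times must use the same unit, and tBranch must be non-zero.
double percentSpeedUp(double tDevelop, double tBranch)
{
    return (tDevelop - tBranch) * 100.0 / tBranch;
}
```

For example, a step that takes 3 s on develop and 2 s on the branch gives (3 - 2)*100/2 = 50.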
We observe that the percentage speed-up is greater as the size and number of schedules to be parsed become greater.
We also observe that the improvements in run-time don't affect the IDFs with a smaller number of schedules very much. This is because of two reasons:
i. The percentage of time spent in the schedule manager is low compared to the rest of the simulation, as seen in Fig 3. Therefore, even huge improvements in the schedule manager can improve the total simulation run time only by a small amount.
ii. The improvement in the schedule manager itself is also low, since there are very few file open-close operations.
Fig 3.
Time measured using the Chrono library between points a and b.
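A minimal sketch of timing a region between two points with the standard chrono facilities. The helper name is hypothetical, and the use of steady_clock is an assumption, since the text does not say which clock was used.

```cpp
#include <chrono>

// Time a callable between two points and return the elapsed seconds.
// steady_clock is an assumption; the PR text only says "the Chrono library".
template <typename F> double elapsedSeconds(F &&work)
{
    const auto a = std::chrono::steady_clock::now(); // point a
    work();
    const auto b = std::chrono::steady_clock::now(); // point b
    return std::chrono::duration<double>(b - a).count();
}
```

steady_clock is monotonic, which makes it a common choice for interval measurements of this kind.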
The test IDFs (added in this commit) were created by adding Schedule:File objects that use information from SolarShadingTest_Shading_Data.csv. The schedule manager processes every Schedule:File object present in the IDF, irrespective of whether it is used or not. This 'problem' was exploited to create this performance chart.
Going Forward:
Data Used For Graphs - Link
TO DO
NOTE: ENHANCEMENTS MUST FOLLOW A SUBMISSION PROCESS INCLUDING A FEATURE PROPOSAL AND DESIGN DOCUMENT PRIOR TO SUBMITTING CODE
Pull Request Author
Add to this list or remove from it as applicable. This is a simple templated set of guidelines.
Reviewer
This will not be exhaustively relevant to every PR.