Performance refactor for Schedule:File #8795

jmythms · 2021-06-02T05:03:28Z

Pull request overview

This pull request refactors the Schedule:File module in ScheduleManager.cc, focusing mainly on improving run time.

The current Schedule:File module mainly has three steps:

1. Get input from the IDF.
2. Parse the CSV according to the IDF input from the previous step.
3. Save the parsed information to the schedule data structure.

Of these three, parsing the CSV is the most expensive step, because it involves reading potentially very large external files. It also offers the largest potential performance gain from a refactor, mainly because the current code:

Repeats the file open-close operations for each Schedule:File object in the IDF
Repeats unnecessary delimiter checks each time a schedule is processed

We address both problems at once by reading each CSV file in a single pass, as explained below.

Methodology

Current Method as in Develop:

All Schedule:File objects are parsed in the following way:

For each Schedule:File object, it
1. Opens the file
2. Parses each line, scanning the line for the specific column
3. Once it finds the correct column, converts the token in that cell to a Real64 value and saves it to an array
4. Repeats until the end of the file
5. Closes the file
6. Repeats for the next Schedule:File object in the IDF

Fig 1.

This is how it looks in the above example:

Open file
Skip header row
Read the second line, process 1, skip 2,3,4,5
Read the third line, process 6, skip 7,8,9,10
Repeat till end of file
Close file
Open the file again.
Skip header row
Read the second line, skip 1, process 2, skip 3,4,5
Read the third line, skip 6, process 7, skip 8,9,10
Repeat till end of file
Close file...

Repeat till all Schedule:File objects are processed

Proposed Method:

Scan the IDF for all files associated with Schedule:File objects
For each file:
Find the Schedule:File objects associated with it
Open the file
Read a row and convert all tokens of interest in that row to Real64
Repeat for every row until the end of the file
Close the file
Repeat for every file

This is how it looks for the example:

Open file
Skip header row
Read the second line, process 1,2,3,4,5
Read the third line, process 6,7,8,9,10
Repeat for every row
Close file
Repeat for any other file
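The single-pass idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `readColumnsOnce` and its parameters are hypothetical names, `std::stod` stands in for whatever number processing ScheduleManager actually uses, and the sketch assumes a plain comma-delimited file with no quoted fields or empty cells.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): read a comma-separated stream once and
// collect the values of every requested column (1-based) in a single pass.
std::map<int, std::vector<double>> readColumnsOnce(std::istream &in, const std::vector<int> &wantedColumns, int headerRows)
{
    std::map<int, std::vector<double>> columns;
    for (int col : wantedColumns) {
        columns.try_emplace(col); // one (initially empty) value vector per requested column
    }

    std::string line;
    for (int i = 0; i < headerRows && std::getline(in, line); ++i) {
        // skip header rows
    }

    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string token;
        int colNum = 0;
        // Each row is tokenized exactly once; every column of interest is
        // converted during that same scan.
        while (std::getline(row, token, ',')) {
            ++colNum;
            auto it = columns.find(colNum);
            if (it != columns.end()) {
                it->second.push_back(std::stod(token));
            }
        }
    }
    return columns;
}
```

Called with the example file, one header row, and columns 1 through 5, this would return five vectors, each filled during one read of the file, instead of reopening the file once per column.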

Advantages of this method:

1. For IDFs that have a large number of Schedule:File objects, it significantly decreases the number of file open and close operations, which are very expensive I/O operations.
2. It reads each row only once and decreases the number of delimiter checks from N*(N+1)/2 to N, where N is the number of columns in a file.
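One way to arrive at these counts, under two assumptions that are mine rather than stated in the PR: locating column k in a row costs k token or delimiter checks, and each of the N columns is referenced by exactly one Schedule:File object. Per row of the file:

```latex
\[
\underbrace{\sum_{k=1}^{N} k}_{\text{develop: one scan per column}} = \frac{N(N+1)}{2}
\qquad\text{versus}\qquad
\underbrace{N}_{\text{proposed: one scan per row}}
\]
```

For the five-column example this gives 1+2+3+4+5 = 15 checks per row on develop against 5 with the proposed method.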

std::string_view

I tried to use std::string_view for the CSV parsing, since that is the most frequently repeated operation and it would be great to cut down on the dynamic allocations made by string arrays. However, I had to convert it to a std::string at some point, because replace_if needs a mutable object to replace characters in.
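A small sketch of why the copy is needed: std::string_view only gives read-only access to its characters, so std::replace_if has to operate on an owning std::string. The helper name and the choice of characters to replace are assumptions for illustration, not taken from the PR.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical helper: return a copy of `line` in which every tab or semicolon
// is replaced by a comma. The view itself cannot be edited, so we copy first.
std::string normalizeDelimiters(std::string_view line)
{
    std::string owned(line); // explicit copy: std::string_view exposes only const access
    std::replace_if(
        owned.begin(), owned.end(), [](char c) { return c == '\t' || c == ';'; }, ',');
    return owned;
}
```

The copy is the allocation that string_view was meant to avoid, which is the trade-off described above.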

Performance Improvements

The percentage speed-up of the Schedule:File module and of the total simulation time for this branch compared to develop is shown in Fig. 2, where percentage speed-up is calculated as (t_develop - t_branch)*100/t_branch.
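As a worked illustration of that formula, here is a tiny helper (the function name is hypothetical). The inputs in the note below are the 17 s and 7 s run times reported later in this description, used only as example values; the helper assumes t_branch is greater than zero.

```cpp
// Percentage speed-up as defined above: (t_develop - t_branch) * 100 / t_branch.
// Assumes tBranch > 0.
double percentSpeedUp(double tDevelop, double tBranch)
{
    return (tDevelop - tBranch) * 100.0 / tBranch;
}
```

With t_develop = 17 and t_branch = 7 this evaluates to (17 - 7) * 100 / 7, roughly 143%. Because the denominator is t_branch, halving the run time corresponds to a 100% speed-up under this definition.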

Fig 2.

We observe that the percentage speed-up grows with the size and number of schedules to be parsed.
We also observe that the run-time improvements have little effect on IDFs with a smaller number of schedules, for two reasons:
i. The share of time spent in the schedule manager is low compared to the rest of the simulation, as seen in Fig 3. Therefore, even large improvements in the schedule manager can improve the total simulation run time only slightly.
ii. The improvement within the schedule manager itself is also small, since there are very few file open-close operations.

Fig 3.

Time was measured using the std::chrono library between points a and b.
The test IDFs (added in this commit) were created by adding Schedule:File objects that used information from SolarShadingTest_Shading_Data.csv. The schedule manager processes any Schedule:File object that is present in the IDF, irrespective of whether it is used or not. This 'problem' was exploited to create this performance chart.
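For readers who want to reproduce this kind of measurement, timing a region between two points with std::chrono can be sketched as below. The helper name is hypothetical and this is not the temporary timer code that was added to and later removed from this branch.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` and return the wall-clock seconds elapsed
// between the point just before the call (a) and just after it (b).
template <typename Work> double secondsElapsed(Work &&work)
{
    auto const start = std::chrono::steady_clock::now(); // point a
    std::forward<Work>(work)();
    auto const stop = std::chrono::steady_clock::now(); // point b
    return std::chrono::duration<double>(stop - start).count();
}
```

std::chrono::steady_clock is used here because it is monotonic, so the difference cannot go negative if the system clock is adjusted during a run.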

Going Forward:

This refactor seems to show a great improvement for bigger schedules. The Schedule:File:Shading module should benefit greatly from this methodology: HybridZoneModel_shadow113.idf references the whole SolarShadingTest_Shading_Data.csv, and its average run time decreased from 17s to 7s on my local machine.
My local machine ran through the smaller schedule test files quickly. Someone with a slower machine might also see improvements (a few seconds shaved off) in the smaller IDFs. (Referencing the ERR files from CI here).
There were slight variations in the total simulation time between similar runs. The data shown in the graphs are averages of three runs; the complete data file is linked below. An interesting observation was that the total simulation time sometimes increased slightly even when the time spent in the schedule manager decreased. I thought about using callgrind here, but it will not measure performance improvements from reducing file operations.
I don't understand why I am getting the unused variable warning.
There is an issue if multiple schedule objects point to the same CSV column. I set up a fatal error for this scenario, although it works with the code currently in develop.

Data Used For Graphs - Link

TO DO

Implement many-to-one mapping support.


jmythms · 2021-06-02T14:09:30Z

src/EnergyPlus/ScheduleManager.cc

-                    }
-                } else {
-                    state.dataScheduleMgr->Schedule(SchNum).ScheduleTypePtr = CheckIndex;
+


The main concept is that the processing moves from a Schedule:File object perspective to a CSV file perspective since we need to decrease file open and close operations as much as possible.

jmythms · 2021-06-02T14:14:31Z

src/EnergyPlus/ScheduleManager.hh

+
+    void PreProcessIDF(EnergyPlus::EnergyPlusData &state, int &SchNum, int NumCommaFileSchedules);
+
+    struct schedInputIdfPreprocessObject


Objects of this struct hold the IDF info about a particular Schedule:File object, as read by the getObjectItem function. I implemented it this way hoping to bring a more object-oriented approach to this module.

jmythms · 2021-06-02T14:18:11Z

src/EnergyPlus/ScheduleManager.hh

@@ -355,6 +461,10 @@ struct ScheduleManagerData : BaseGlobalStruct
    Array1D<ScheduleManager::WeekScheduleData> WeekSchedule; // Week Schedule Storage
    Array1D<ScheduleManager::ScheduleData> Schedule;         // Schedule Storage

+    // Schedule:File variables
+    std::vector<ScheduleManager::schedInputIdfPreprocessObject> allIdfSchedData;


A vector of schedInputIdfPreprocessObject type. We will add the IDF data about each Schedule:File object to this vector as schedInputIdfPreprocessObject elements.

jmythms · 2021-06-02T14:23:23Z

src/EnergyPlus/ScheduleManager.cc

-                    state.dataScheduleMgr->Schedule(SchNum).ScheduleTypePtr = CheckIndex;
+
+        // Runs getObjectItem for Schedule:File and saves it to vector in state: allIdfSchedData
+        PreProcessIDF(state, SchNum, NumCommaFileSchedules);


I wrapped the getObjectItem function and all the variable assignments into this PreProcessIDF function. This will populate the allIdfSchedData vector. We can access every Schedule:File object data from here, since these have been saved as elements of the allIdfSchedData vector. The main module also looks cleaner.

jmythms · 2021-06-02T14:27:58Z

src/EnergyPlus/ScheduleManager.cc

+
+        // Because we are focusing on saving performance by decreasing number of file open-close operations, get set of filenames
+        std::set<std::string> setOfFilenames;
+        PopulateSetOfFilenames(state.dataScheduleMgr->allIdfSchedData, setOfFilenames);


This function runs through the IDF data and saves all the CSV file names to the std::set setOfFilenames. std::set makes sure that the list only has unique members in it.

jmythms · 2021-06-02T14:30:20Z

src/EnergyPlus/ScheduleManager.cc

+            if (setOfFilenames.find(idfObject.fileName) == setOfFilenames.end()) {
+                setOfFilenames.insert(idfObject.fileName);


If we do not find the file name in the set, add it to the set...
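As a side note on the snippet above, std::set::insert already leaves the set unchanged when the key is present, so the preceding find() could be dropped without changing behavior. A minimal sketch, using a stand-in struct because only the fileName member of schedInputIdfPreprocessObject is visible in this diff:

```cpp
#include <set>
#include <string>
#include <vector>

// Stand-in for the PR's schedInputIdfPreprocessObject; only `fileName` appears
// in the diff shown here, so any other members are omitted.
struct SchedFileInput
{
    std::string fileName;
};

// Collect the distinct CSV file names. std::set::insert is a no-op for keys
// that are already present, so no prior find() is needed.
void populateSetOfFilenames(const std::vector<SchedFileInput> &allIdfSchedData, std::set<std::string> &setOfFilenames)
{
    for (auto const &idfObject : allIdfSchedData) {
        setOfFilenames.insert(idfObject.fileName);
    }
}
```

Three input objects that reference a.csv, b.csv, and a.csv again would leave two entries in the set.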

jmythms · 2021-06-02T14:38:37Z

src/EnergyPlus/ScheduleManager.hh

+
+    void PopulateSetOfFilenames(const std::vector<schedInputIdfPreprocessObject> &allIdfSchedData, std::set<std::string> &setOfFilenames);
+
+    struct PreProcessedColumn


Objects of this struct will have a vector (vals) that holds the actual schedule values from the CSV file.

mitchute · 2021-06-03T16:06:48Z

All it takes is 💵 💵 💵 .

Maybe it has to do with the fact that you're returning a std::string but the new function has a std::string_view return type?

mitchute · 2021-06-03T16:08:35Z

src/EnergyPlus/ScheduleManager.cc

@@ -231,7 +231,7 @@ namespace ScheduleManager {
        Array1D_bool AllDays(MaxDayTypes);
        Array1D_bool TheseDays(MaxDayTypes);
        bool ErrorHere;
-        int SchNum;
+        int SchNum{};


I'm curious why this is necessary if you're not actually initializing to a value?

It would automatically initialize it to 0 if I used the {}. I will change it to {0} for clarity 👍🏽

I see. Thanks for clarifying.
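For reference, the two spellings discussed in this thread behave the same for an int; the function names below are only for illustration.

```cpp
// Both forms yield a zero-initialized int; the second spells the value out.
int valueInitialized()
{
    int SchNum{}; // value-initialization: SchNum == 0
    return SchNum;
}

int explicitZero()
{
    int SchNum{0}; // list-initialization with an explicit 0
    return SchNum;
}
```

By contrast, a plain `int SchNum;` at function scope leaves the value indeterminate until it is assigned.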

jmythms · 2021-06-03T16:14:29Z

All it takes is 💵 💵 💵 .

Maybe it has to do with the fact that you're returning a std::string but the new function has a std::string_view return type?

I will check it out.

Myoldmopar · 2021-06-03T16:23:36Z

Yup, that's it.

https://godbolt.org/z/TqfqWWoTz

@amirroth has requested that we make a pass through the code to add string_view in more places, which may involve adding specialized functions for string_view as an alternative to the existing Objexx methods. I believe this is one such case. Perhaps for the sake of this PR we can avoid going to string_view right now, but use this as a first example to dig in to string_view. Or maybe in this case string_view is not actually the right thing to use and it really should be a string here. I've still got to learn a little bit more to be able to answer that without looking at the documentation at the same time.

jmythms · 2021-06-03T16:50:38Z

Thanks for figuring it out so quickly. I will take out the string_view stuff for this PR 👍🏽 I did learn so much from it though!

Myoldmopar · 2021-06-21T21:00:02Z

@jmythms what's the status of this now? I see it is still a draft PR, so I won't dig in much, but at a minimum, CI is complaining about some unused variables. Let me know if you need anything from me here. This'll be a good one though!

nrel-bot-3 · 2021-07-22T00:45:06Z

@jmythms @lgentile it has been 28 days since this pull request was last updated.

nrel-bot · 2021-09-04T00:48:28Z

@jmythms @lgentile it has been 28 days since this pull request was last updated.

nrel-bot · 2021-10-15T00:17:24Z

@jmythms @lgentile it has been 29 days since this pull request was last updated.

jmythms · 2021-11-01T22:42:36Z

Closing this PR:

The problems this PR intended to fix have already been addressed in PR#8996, which is now part of develop.

The repeated file open-close operations have been eliminated.
Line-by-line parsing of CSV files has also been implemented.

Running HybridZoneModel_shadow113.idf through the CLion profiler, we see that the share of CPU time spent in ProcessScheduleInput() has dropped:
73.98% - 7415d82 (develop at the time of the last update of this PR).
19% - this PR.
10% - b448a82 (current develop, which also includes additional improvements to the ProcessNumber function, etc.).

As the identified problems have already been solved, and reading the file is just taking ~10% CPU time of the simulation process for a 'large' CSV file, I propose to close this PR.

The one small advantage I see in continuing this work is reducing the dependency of E+ on third-party libraries like nlohmann.
Since nlohmann has been used in the input processor since 2019, and the nlohmann license grants permission to modify it if necessary, I think this advantage is too minor to pursue.

Open to discussion, if necessary.

@Myoldmopar

jmythms added 17 commits May 25, 2021 01:35

Move Schedule:Shading code to Schedule:File:Shading

c672e80

Initial commit

7c3eada

clang format

f7345b3

removed unused variables

3b0ad93

rearrange code and better comments

e8759fb

Fixed rowCountFld position

572729f

Merge branch 'develop' into schedManGoZoom

c3d9613

Clang format + remove temporary vars for debug

ddad276

moved File:Shading + Overloads for ProcessNum + Added checks

f68ec73

fixed unit tests

d81d886

clang format

9ed3f00

fix numerrors

88cc194

moved stripped back

43a26f3

Merge branch 'develop' into schedManGoZoom

69518f2

temp - added performance metric

70f3713

added stress test files

ca7692e

moved timer

0654156

jmythms added the Performance and Refactoring labels Jun 2, 2021

jmythms self-assigned this Jun 2, 2021

jmythms added this to the EnergyPlus 9.6 IOFreeze milestone Jun 2, 2021

jmythms added 2 commits June 2, 2021 07:55

removed time mesaurements

f06a6fc

removed temp test files

c1dc928

jmythms commented Jun 2, 2021

mitchute reviewed Jun 3, 2021

jmythms added 3 commits June 7, 2021 18:37

Fixed many to one map problem + changed string_view to string

5a4a046

cleaned up comments + clang format

5a60d62

Merge branch 'develop' into schedManGoZoom

5a8a285

jmythms added 9 commits June 22, 2021 10:14

fix unused var warning

bd12208

revert shading data.csv

f760a95

merge develop + conflict resolution

c2c88f8

conflict resolution

cc095be

removed ProcessNumber overload

b0acb76

removed string_view implementation in string.functions.hh

6d21e92

removed all string_view changes

5e6ef79

revert SolarShadingTest.csv

fe51879

Merge branch 'develop' into schedManGoZoom

5e8ed20

jmythms modified the milestones: EnergyPlus 9.6 IOFreeze, EnergyPlus 9.6 BugFix Freeze Aug 4, 2021

Myoldmopar modified the milestones: EnergyPlus 9.6 BugFix Freeze, EnergyPlus 9.6 Release Aug 6, 2021

jmythms modified the milestones: EnergyPlus 9.6 Release, EnergyPlus 2022.1 Sep 15, 2021

Myoldmopar closed this Nov 2, 2021

Myoldmopar deleted the schedManGoZoom branch November 2, 2021 14:31
