Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Set initialization performance #3302

Merged
merged 15 commits into from
Jul 31, 2024

Conversation

jsiirola
Copy link
Member

@jsiirola jsiirola commented Jun 25, 2024

Fixes # .

Summary/Motivation:

When working on some very large models (100M+ variables), we identified some performance issues in how we were initializing Set objects. This PR resolves those performance bottlenecks:

import time
from pyomo.environ import *
start = time.time()
m = ConcreteModel()
m.I = Set(initialize=range(100000000))
print(time.time() - start)

is reduced from 91.3s to 13.2s, and a real model with a more complex set initializer (for 200M elements) is reduced from 305s to 69s.

One side effect is to remove the warning message for adding duplicate items to a Set (which brings the Pyomo Set behavior closer in line with the Python set behavior).

Changes proposed in this PR:

  • Rework Set initialization to make initializing large data sets more efficient
  • Remove the warning for adding duplicate elements to a Set

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the BSD license.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

Copy link
Contributor

@emma58 emma58 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. A few questions mostly out of curiosity!

pyomo/core/base/set.py Outdated Show resolved Hide resolved
Comment on lines +1732 to +1736
# Note that we reset _ordered_values within the loop because
# of an old example where the initializer rule makes
# reference to values previously inserted into the Set
# (which triggered the creation of the _ordered_values)
self._ordered_values = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm still confused why you do this for each val though? What would be the difference with doing this above the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offending example is:

def U_init(model, z):
    if z == 6:
        return Set.End
    if z == 1:
        return 1
    else:
        return model.U[z - 1] * z

model.U = Set(ordered=True, initialize=U_init)

The issue is that when z == 2 the __getitem__ causes the _ordered_values list to be created with a single entry. If we don't clear _ordered_values, then it will stay a 1-element list, and when z == 3, looking for model.U[2] looks up _ordered_values[1]; running off the end of the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ewwww, this is horrifying! But okay, makes sense.

pyomo/core/base/set.py Show resolved Hide resolved
@emma58
Copy link
Contributor

emma58 commented Jul 9, 2024

(Also I am very excited about not warning for duplicate entries! :))

Copy link

codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 98.13665% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.46%. Comparing base (17f8aed) to head (ba9706d).
Report is 98 commits behind head on main.

Files Patch % Lines
pyomo/core/base/set.py 98.13% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3302   +/-   ##
=======================================
  Coverage   88.46%   88.46%           
=======================================
  Files         867      867           
  Lines       98218    98250   +32     
=======================================
+ Hits        86886    86919   +33     
+ Misses      11332    11331    -1     
Flag Coverage Δ
linux 86.28% <98.13%> (+<0.01%) ⬆️
osx 75.57% <98.13%> (+<0.01%) ⬆️
other 86.47% <98.13%> (+<0.01%) ⬆️
win 83.77% <98.13%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@blnicho blnicho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple minor questions but otherwise this looks great!

pyomo/core/base/set.py Show resolved Hide resolved
Comment on lines 1475 to 1476
# It is important that the iterator is an actual iterator
val_iter = iter(val_iter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Seems like val_iter is guaranteed to be an iterator from line 1399 unless you expect this method to be called from elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Removed this, plus updated some other bugs / inconsistencies / documentation around flattening & initialization.

@blnicho blnicho merged commit 5c85b7c into Pyomo:main Jul 31, 2024
32 checks passed
@jsiirola jsiirola deleted the set-init-performance branch August 2, 2024 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

4 participants