
Issue: Performance/Memory Challenges in Large Domains #3

Open
merddyin opened this issue Dec 19, 2018 · 8 comments
Labels
enhancement New feature or request

Comments

@merddyin

Issue: When running the tools against a domain with over 100k users, or large numbers of OUs and ACLs, the script consumes upwards of 4GB of RAM, ~20% CPU, and generally consumes all disk IOPS, which, when run from a laptop with 8GB of RAM, brought the machine to an unusable state. If the script consumes too much overhead, the system crashes, forcing a complete restart. On older domains (2008 R2), large queries can cause timeouts in the Active Directory Web Services, producing 'invalid enumeration context' errors.

Runtime Environment:
Windows 10 v1803
PowerShell v5.1
Module Versions - Latest of all dependent modules, pulled directly from GitHub

Expectations:
a) When possible, data for each area should be written out to files as it's pulled, and then cleared from memory, to minimize memory footprint.
b) Long running tasks, such as collecting OU ACLs, should have a progress bar to provide an indication of elapsed time and percent complete.
c) By making use of frequent data dumps to the file system, it should be possible to enable a resume in the event that a given run crashes the system, or some other factor causes the process to be interrupted.
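The three expectations above could be sketched together. The following is a minimal, hypothetical illustration (the output path, chunking by letter, and file naming are all assumptions, not the module's actual design): each chunk is written to disk as soon as it is pulled, `Write-Progress` reports percent complete, and re-running skips chunks whose files already exist, which gives crash resume for free.

```powershell
# Hypothetical sketch: stream users to disk in chunks, report progress,
# and skip chunks already exported by a previous (crashed) run.
$OutDir = 'C:\Temp\ADDump'                          # assumed output location
New-Item -Path $OutDir -ItemType Directory -Force | Out-Null

$Letters = 97..122 | ForEach-Object { [char]$_ }    # 'a'..'z' on PS 5.1
$Done = 0
foreach ($Letter in $Letters) {
    $File = Join-Path $OutDir "Users_$Letter.xml"
    if (-not (Test-Path $File)) {                   # resume: skip finished chunks
        Get-ADUser -Filter "Name -like '$Letter*'" |
            Export-Clixml -Path $File               # dump to disk, freeing memory
    }
    $Done++
    Write-Progress -Activity 'Exporting users' -Status "Chunk '$Letter'" `
        -PercentComplete (100 * $Done / $Letters.Count)
}
```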

Suggestion: It might be beneficial to pull a count of objects and perform some pre-analysis on that count before initiating a section run. For larger environments, it should then be possible to break processing into smaller chunks, perhaps using filters (i.e. if 100k objects are found, first find all users whose given name starts with a, b, or c, then d, e, f, and so on until all objects have been processed). This data can be dumped to disk, and the files then processed individually into the final formats at the end of the run. It may also be beneficial to switch from the AD cmdlets to the legacy ADSI interface in larger environments, since it doesn't have the same timeout limitations.
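The ADSI route mentioned above can be sketched with `System.DirectoryServices.DirectorySearcher`. Setting `PageSize` uses LDAP paged results directly against a DC, bypassing the Active Directory Web Services enumeration context that times out on 2008 R2 domains. The filter and attribute list below are illustrative:

```powershell
# Sketch: page through users via legacy ADSI instead of the AD cmdlets,
# avoiding the ADWS 'invalid enumeration context' timeout.
$Searcher = New-Object System.DirectoryServices.DirectorySearcher
$Searcher.Filter   = '(&(objectCategory=person)(objectClass=user)(name=a*))'
$Searcher.PageSize = 1000                      # LDAP paged results, no ADWS
[void]$Searcher.PropertiesToLoad.AddRange(@('name', 'samaccountname'))

foreach ($Result in $Searcher.FindAll()) {
    # Process each entry as it streams in, rather than buffering everything.
    $Result.Properties['samaccountname'][0]
}
$Searcher.Dispose()
```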

@PrzemyslawKlys
Member

I've encountered similar issues on even smaller domains. I added an option to save the AD object to XML, and after 24 hours of data gathering, it still hadn't finished saving the XML object to disk 48 hours later, even on more powerful machines.

So there is indeed a big problem here, and it needs to be addressed. I'm even wondering whether building the large final object will require saving every section to disk as well.

One issue you briefly mentioned is exporting all properties, and that is something that needs to be limited. Since I didn't know what people might want, I didn't want to restrict it just yet. But recently I've been working on PSWinReporting, where you get to define anything you want, in the language you want, and I was thinking of bringing this approach into PSWinDocumentation:

https://github.com/EvotecIT/PSWinReporting/blob/9b351e5c7bf523704d4f5c458c22ef5b0155613c/Private/Parameters/Script.ReportDefinitions.ps1#L404-L570

The way it works is that on the left-hand side, in the Fields section, you define the fields you want each type of event to include, and the right-hand side is what they will be displayed as. In the case of PSWinDocumentation, each section could have a similar Fields section where each Type would have its own definition. The biggest issue with such an approach is that some objects are related to each other. For example, I didn't want to ask AD multiple times for a user's manager, so I ask once and then build the Manager data from Users. My problem is that if people start removing fields, errors will occur.
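One way that kind of per-section Fields definition could look, modeled loosely on the linked PSWinReporting config, is sketched below. All the names here (`RequiredFields`, `OptionalFields`, `DisplayNames`) are hypothetical, not the module's real schema:

```powershell
# Hypothetical Fields definition for one section; names are illustrative only.
$Sections = @{
    DomainUsers = @{
        RequiredFields = @('DistinguishedName', 'SamAccountName', 'Manager')
        OptionalFields = @('Description', 'LastLogonDate', 'City')
        DisplayNames   = @{                  # left: AD field, right: display name
            SamAccountName = 'Login'
            LastLogonDate  = 'Last Logon'
        }
    }
}
# A collector would always pull Required fields (so cross-object lookups such
# as Manager keep working) plus whatever Optional fields the user kept:
$Fields = $Sections.DomainUsers.RequiredFields +
          $Sections.DomainUsers.OptionalFields
```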

I guess I could make two Fields sections: one that is static and required, and another that is optional, whose fields people can remove.

My main problem with any of my tools is design. I sometimes spend days building a script and then rebuild the config file multiple times because I can't get it right. Feel free to make suggestions or even submit pull requests :-) I'm happy to get help, as there are clearly lots of issues and lots of ground to cover. If you can lay out how the config file should, or could, look in your opinion, that would be great.

BTW, I do plan to add export to dynamic HTML as well at some point, but again, the performance issue is the priority, and that may even require a complete rewrite to some extent.

@merddyin
Author

merddyin commented Dec 21, 2018 via email

@PrzemyslawKlys
Member

Thanks for your feedback. I have some ideas on how to do things, but it will take time. Feel free to jump in when you're comfortable enough.

Btw, you do know this module is able to export to SQL (MS SQL), right? Just as it does for Excel, it does the same magic with SQL, so you can build your reporting on top of that. I will add support for HTML (I'm already working on the PSWriteHTML module), which should help people use things however they want. There are a few people with projects similar to mine, like Brad Wyatt, and if we join forces to fix this one correctly, it will be easier :-) Three or four people working on one module, making sure it works from 10 to 200k users, would be great.

@merddyin
Author

merddyin commented Dec 21, 2018 via email

@PrzemyslawKlys
Member

Feel free to provide whatever you can. This module is not just for me but for everyone. I already have more ideas in mind about what data is needed, so if you can provide code snippets as part of an issue, feel free to do so. If I get the time, I will incorporate them; if not, you can do it yourself later on.

Generally, the goal for this tool is to be a fully functional documentation and audit module in one. I want to go to a client and be ready for anything. And I need it to just work.

So when you have the time, drop your ideas, problems, issues, and PRs. I will rebuild part of the module to make it a bit more modular and easier to work with.

@merddyin
Author

merddyin commented Jan 10, 2019 via email

@PrzemyslawKlys
Member

I'll address some of this when I get around to it. I believe the first step is to mark the code that is required to run against a DC first. That way, you pull only the data that is available on the DC, and the rest can be built from the data you gathered once. Adding runspaces to execute code in parallel is possible (and I have the know-how); however, since most of your issues come from memory problems, having multiple concurrent processes would strain the machine even more, right?
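The memory concern with runspaces can be mitigated by bounding concurrency with a runspace pool. A minimal sketch, assuming a low throttle of two concurrent collectors (the section names and script body are placeholders):

```powershell
# Sketch: a throttled runspace pool so parallel collection is bounded
# and does not multiply the memory footprint across many sections at once.
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 2)   # at most 2 concurrent
$Pool.Open()
$Jobs = foreach ($Section in 'Users', 'Groups', 'Computers') {
    $Ps = [PowerShell]::Create()
    $Ps.RunspacePool = $Pool
    [void]$Ps.AddScript({ param($Name) "collecting $Name" }).AddArgument($Section)
    @{ Shell = $Ps; Handle = $Ps.BeginInvoke() }      # queue the work item
}
foreach ($Job in $Jobs) {
    $Job.Shell.EndInvoke($Job.Handle)                 # wait for and drain output
    $Job.Shell.Dispose()                              # release runspace memory
}
$Pool.Close()
```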

I will slowly start rebuilding what I can. I'm not sure I want to use OLEDB or go really deep down that hole.

I was also wondering if there's a way to ask for a user count, computer count, or group count in a way that would not trigger a full data download, something like Get-UserCount, Get-ComputerCount, and so on. This would allow me to decide which route to take when gathering information. I could code it so that there are three approaches to getting data: one standard approach for smaller domains, one extended, and one that would dump everything to a file per letter of the alphabet or so.
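There is no cheap server-side count in LDAP, but a hypothetical Get-UserCount could still enumerate via a paged DirectorySearcher while loading only one tiny attribute per entry, keeping the transfer small compared to pulling full objects. A sketch under that assumption:

```powershell
# Hypothetical Get-UserCount: enumerate users with a paged search that loads
# only the 'cn' attribute, so almost no data is transferred per entry.
# Note: LDAP has no true count-only query; this still walks every entry.
function Get-UserCount {
    $Searcher = New-Object System.DirectoryServices.DirectorySearcher
    $Searcher.Filter   = '(&(objectCategory=person)(objectClass=user))'
    $Searcher.PageSize = 1000                   # paged, avoids ADWS timeouts
    [void]$Searcher.PropertiesToLoad.Add('cn')  # minimize bytes per entry
    $Count = $Searcher.FindAll().Count
    $Searcher.Dispose()
    $Count
}
```

A pre-flight check could then branch on the result, e.g. use the standard path below 20k users and the chunked-to-disk path above it.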

It's going to be a long road to get it to a working place in large domains, though.

@PrzemyslawKlys PrzemyslawKlys transferred this issue from EvotecIT/PSWinDocumentation May 15, 2019
@PrzemyslawKlys
Member

I believe this has been partly addressed with the newest release. I still need to rework group membership, but otherwise it should be fast.

@PrzemyslawKlys PrzemyslawKlys added the enhancement New feature or request label Jun 1, 2019