Issue: Performance/Memory Challenges in Large Domains #3
Comments
I've encountered similar issues on even smaller-scale domains. I even added an option to save AD objects to XML, and after 24 hours of data gathering it still hadn't managed to save the XML object to disk after another 48 hours, even on more powerful machines. So indeed there is a big problem with that one, and it needs to be addressed. I'm even wondering whether building a large final object will require saving every section to disk as well.

One issue that you briefly mentioned is exporting all properties, and that is something that needs to be limited. But since I didn't know what people may want, I didn't want to add it just yet. Recently I've been working on PSWinReporting, where you get to define anything you want, in the language you want, and I was thinking of bringing this approach into PSWinDocumentation: https://github.com/EvotecIT/PSWinReporting/blob/9b351e5c7bf523704d4f5c458c22ef5b0155613c/Private/Parameters/Script.ReportDefinitions.ps1#L404-L570

The way it works is that on the left-hand side, in the Fields section, you define the fields you want each type of event to return, and the right-hand side is how it will be displayed. In the case of PSWinDocumentation, each section could have a similar Fields section where each Type would have its definition. The biggest issue with such an approach is that some objects are related to each other. For example, I didn't want to ask AD multiple times for a user's manager, so I just ask once and then build Manager data from Users. My problem is that if people start removing fields, errors would occur. I guess I could make two Fields sections: one that is static and required, and another that is optional and that people can remove.

My main problem with any of my tools is design. I sometimes spend days building a script and then rebuild the config file multiple times because I can't get it right. Feel free to make suggestions or even submit Pull Requests :-) I'm happy to get help, as clearly there are lots of issues and lots of ground to cover. If you can lay out how the config file should, or could, look in your opinion, that would be great.

BTW, I do plan to export to dynamic HTML as well at some point, but again... the performance issue is a priority, and that may even require a complete rewrite to some extent.
Thanks so much for getting back to me. I was actually intending to go back and update one of my logged items, as I was able to chase down the source of at least part of the issue, though I encountered another as the Word document didn't generate, even though the Excel file did.
In this particular domain I am pulling information for, not only is it fairly large (more than 180k user objects), but it is a 2008 R2 domain and the company has deployed a number of custom attributes to AD. I was able to retrieve data for all of the other domains without error, but the big one kept breaking for groups, computers, and users, though it got all the other data fine. The issue ended up being two-fold;
- The large size of the environment, when paired with retrieving all attributes, results in a query that takes longer than the allowed 30-minute timeout on the AD web service that the AD cmdlets use. You can increase the timeout, but this is a security risk and not a recommended practice.
- The second issue was the manner in which the custom attributes were deployed, which allowed them to be tied to user, computer, and group objects, even though they currently only use the attributes for users. That itself isn't necessarily a problem, but they also didn't set the attributes as indexed, so this slows the request down even further and results in an enumeration error. The error isn't currently trapped by your module, so the process proceeds as if everything is fine.
Currently, even if I turn off everything for, say, group objects, the script still retrieves all of those objects at the beginning of the run, which takes up memory unnecessarily, particularly when retrieving all properties, though I understand the challenge around meeting user expectations. I have several ideas on how you could make some possibly minor enhancements to improve efficiency, without having to resort to using ADSI.
- Adjust the logic to skip retrieval of a given object type entirely if all items and dependencies are turned off in the config for that type. This will save a lot of overhead if someone doesn't want a particular item.
- Instead of retrieving all properties for all objects of each type up front and holding them in memory, which hits issues when targeting large domains or a large number of objects with non-indexed properties, retrieve the objects with the default properties only and keep just those in memory.
- Once you get to the section where you actively process a given object type, you can make several adjustments:
- Adjust the script to allow for a default of all properties, or for the person running it to provide a string collection of specific properties
- To optimize the default of all properties, but still exclude key properties, create a helper function (that fires only for the default) to retrieve a single AD object of that type, run it through Get-Member to get the property names, pass it through a select (like you do now) to filter out the problem attributes, then return a string collection of the remaining properties (since the MS AD cmdlets don't accept anything other than strings)
- Once you have your property name collection (as a string), do a foreach($object in $objects){} loop to retrieve each object with all desired properties one at a time
- As each object is returned, dump it to a file instead of storing it in memory...possibly to CSV
- Use one of the several modules available that allow interacting with a CSV as if it were a table resource to selectively query items for the various reports, like admin users, password never expires, etc
By doing things like this, you get a lot of benefits, like avoiding errors due to lack of indexing, the ability to provide a progress indicator using something like Write-Progress (you can do some nifty stuff with parent and child progress windows), as well as my personal favorite in this instance, which is the ability to resume without having to start from zero if something causes an interruption. The obvious trade-off is a bit more time to run overall, but with feedback to the console, the reduced memory footprint, and the ability to resume, I think most people would be fine with it.
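To make that streaming idea concrete, here is a rough, untested sketch of what the per-object loop could look like for users. It assumes the ActiveDirectory module and PowerShell 3.0+ (for Export-Csv -Append); the excluded attribute names and file path are just placeholders, not anything that exists in the module today.

```powershell
# Illustrative only - the excluded attribute names are assumptions, not module defaults
$ExcludedProperties = 'thumbnailPhoto', 'userCertificate', 'msExchSafeSendersHash'

# Build the property list once from a single sample object, filtering out problem attributes
$Sample     = Get-ADUser -Filter * -ResultSetSize 1 -Properties *
$Properties = ($Sample | Get-Member -MemberType Property |
        Where-Object { $ExcludedProperties -notcontains $_.Name }).Name

# Lightweight first pass: identities only, default properties
$Identities = (Get-ADUser -Filter *).sAMAccountName

# Second pass: one object at a time, streamed straight to CSV instead of held in memory
$Count = 0
foreach ($Identity in $Identities) {
    $Count++
    Write-Progress -Activity 'Exporting users' -Status $Identity -PercentComplete (($Count / $Identities.Count) * 100)
    Get-ADUser -Identity $Identity -Properties $Properties |
        Select-Object -Property $Properties |
        Export-Csv -Path .\Users.csv -Append -NoTypeInformation
}
```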
As far as my thoughts on the config, I'm just thinking of a more hierarchical approach when it comes to the objects, like you did with the Forest and Domain pieces:
```powershell
SectionDomain = [ordered] @{
    SectionDomainOverview = [ordered] @{
        Enable = $true
        DomainSubSections = [ordered] @{
            DomainIntroduction = $true
            DomainControllers  = $true
            DomainFSMO         = $true
            DomainTrusts       = $true
            DomainDNS = [ordered] @{ # include some DNS zone info here perhaps: SOA, which servers are running DNS, self-pointed, etc.
                Enabled      = $true
                DomainDNSSrv = $true
                DomainDNSA   = $true
            }
        }
        # etc.
    }
    SectionDomainGroupPolicies = [ordered] @{
        Enable = $true
        GroupPoliciesSubSections = [ordered] @{
            GroupPoliciesDetails = $true
            GroupPoliciesACL     = $true
        }
    }
    SectionDomainOrganizationalUnits = [ordered] @{
        SectionDomainOrganizationalUnitsBasicACL    = $true
        SectionDomainOrganizationalUnitsExtendedACL = $true
    }
    SectionDomainUsers = [ordered] @{
        Enable = $true
        InScopeProperties = '*' # allow a comma-separated string 'item1,item3,item54' in place of *
        DomainUsersSubSections = [ordered] @{
            SystemAccounts           = $true
            NeverExpiring            = $true
            NeverExpiringIncDisabled = $true
            # etc.
        }
    }
}
```
You could obviously make each of the subsection definitions its own ordered hash table if you wanted to specify additional settings. The settings for things like Excel worksheet names, document headings, or table types and the like, I would pull up into their own main section and variablize everything. I could specify a table look for every table item, and every header style to be applied, and they would relate directly to the hierarchy of the other sections. This reduces the code required and ensures consistency between document sections without a lot of effort on the part of the person running the script. The hierarchy also clearly shows where the dependencies are: you can't get system account information if you didn't enable getting users, for example. You can also add in comments to show things like 'this section outputs to Excel only' or 'selecting this option will add X property to any custom property specification', etc.
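As a rough sketch of what I mean by pulling the formatting settings into their own section and keying them to the same hierarchy (all names and style values here are made up for illustration, not existing module settings):

```powershell
$SectionFormatting = [ordered] @{
    Excel = [ordered] @{
        DomainUsers  = @{ WorksheetName = 'Users';  TableStyle = 'TableStyleMedium9' }
        DomainTrusts = @{ WorksheetName = 'Trusts'; TableStyle = 'TableStyleMedium9' }
    }
    Word  = [ordered] @{
        DomainUsers  = @{ TextHeadingStyle = 'Heading2'; TableDesign = 'ColorfulGridAccent5' }
    }
}

# A section's output code would then look up its own styles by the shared key,
# keeping formatting consistent without per-section hard-coding
$UserExcelStyle = $SectionFormatting.Excel.DomainUsers
```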
As to doing a pull request, I would actually love to, once I have some time to spend on actually doing the work, but right now the best I can do is provide some feedback. I came across your module while trying to quickly pull together content for a customer and wanting to save some time converting data to documents. Now, of course, it's the holidays and I'm going on PTO for the next two weeks.
I've actually been working on a similar project to this one on my own that is currently at the concept phase, but looking at a different data collation approach, possibly using SQLite or some similar flat file database to dump data into. Not only will this provide at least some limited protection of the collected data, but then I can more readily separate collection from document generation for better modularity, as well as tie it in to things like Power BI or Visio for added visualizations to go with the docs. I am making some headway on this piece presently, but only in relation to another pet project that will use a similar data approach, though the groundwork for the one effort will feed right into the discovery/documentation project. Perhaps we can work on a collaboration that combines our two efforts once I get this other project knocked out?
Chris Whitfield
Thanks for your feedback. I have some ideas on how to do things, but it will take time. Feel free to jump in when you're comfortable enough. Btw, you do know this module is able to export to SQL (MS SQL), right? Just like it does for Excel, it does the same magic with SQL, so you can build your reporting based on that. I will add support for HTML (already working on the PSWriteHTML module), which should help people use things as they want. There are a few people with projects similar to mine, like Brad Wyatt, and if we join forces to fix this one correctly it will be easier :-) 3-4 people working on one module, making sure it works from 10 to 200k users, would be great.
Yes, I saw that your module does export to SQL, but from what I saw in the code, it's based on a full SQL Server installation, and it's a dump of data at the end of the process. While my approach does end with data in a SQL-esque location, there are some marked differences.
- It is not full SQL but SQLite, which essentially consists of just a DLL and a flat file, yet offers many of the optimizations and much of the speed you would get from one of the bigger products, making it highly portable while also eliminating challenges around setup and security, since nothing other than the module itself, and possibly the sub-module for interacting with the DB instance, needs to be 'installed' anywhere.
- I am intending to go a step further and eliminate all flat file storage during runtime, to include the advanced config stuff for other output options. For example, you have a great idea with providing a whole framework for turning elements on or off, and setting formatting in your output and the like. What I'm thinking of is a table in the database file that contains all my preferences, and a set of functions to retrieve and update those settings. There is even a SQLite module that will mount the DB via a provider, which should reduce the T-SQL required to write, retrieve, and update data.
- I'm working through an approach that will essentially allow me to more or less stream the data into the database file directly, thereby substantially reducing the memory footprint and giving me a full relational data source to work with more or less offline.
If I use full SQL, even if I use Express, that has to be downloaded and installed somewhere, and then I have to build the tables (though I did see a module that will dynamically build the table based on the object in the pipeline), and I won't be able to persistently keep my preferences within that more flexible format. Yes, I could simply create a setup utility that creates the table and loads in my defaults, but then I have to put all that into code, so it just seemed more efficient to use something in between.
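As a rough illustration of the SQLite approach (this assumes the community PSSQLite module and its Invoke-SqliteQuery command; the table and setting names are made up):

```powershell
Import-Module PSSQLite

$Db = '.\ADInventory.sqlite'

# Create a preferences table once; SQLite creates the file on first use
Invoke-SqliteQuery -DataSource $Db -Query 'CREATE TABLE IF NOT EXISTS Settings (Name TEXT PRIMARY KEY, Value TEXT)'

# Write and read preferences at runtime instead of keeping a separate config file
Invoke-SqliteQuery -DataSource $Db `
    -Query 'INSERT OR REPLACE INTO Settings (Name, Value) VALUES (@Name, @Value)' `
    -SqlParameters @{ Name = 'ExportWord'; Value = 'True' }

$Settings = Invoke-SqliteQuery -DataSource $Db -Query 'SELECT Name, Value FROM Settings'
```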
I fully agree that having a handful of dedicated people all working on a single module is more efficient than multiple separate approaches and then trying to combine. My challenge at the moment is that I need some of these underlying items for a customer facing effort rather immediately, and I'm taking some shortcuts to get to the end result. For example, while trying to figure out the source of the problem I was having with this module, I ended up firing elements of your module code manually, but I replaced the ActiveDirectory module commands with the Quest ActiveRoles module, which doesn't use the AD Web service, and uses pure .NET under the covers. It's getting me what I need immediately, but it's not something you could redistribute, and it doesn't solve the other refactoring issues.
I'm also simultaneously working on a reporting type effort that involves Jira and Power BI, and I'm using PowerShell to retrieve the data and write to a file that I then consume with Power BI for reporting. Problem is that the customer now wants historical and change data as well, and that was going to get ugly to manage with flat files, but is easy in a database, which led to the portability bit, and now I'm here. In a couple months, these immediate needs should ease, and I can work with you guys on your project. In the meantime, just trying to contribute what I can, which is suggestions.
Btw, along those lines, if you are interested, I have put together some pieces from around the web to pull some schema details...namely a schema update history report and some determination of which schema extensions are in place for Microsoft products, updated with the latest versions. I'd be happy to share the snippets if you think it would be useful in your documentation module.
Chris Whitfield
Feel free to provide whatever you can. This module is not just for me but for everyone. I already have some more ideas in mind about what data is needed, so if you can provide code snippets as part of an issue, feel free to do so. If I get time, I will incorporate them. If not... you can do it yourself later on. Generally, the goal for this tool is to have a fully functional documentation & audit module in one. I want to go to the client and be ready for anything. And I need it to just work. So when you have time, drop your ideas, drop problems, drop issues, and drop PRs. I will rebuild part of the module to make sure it's a bit more modular and easier to work with.
So, as mentioned previously, I am currently on a project where I'm working with your module, and I have continued to have challenges due to the following factors:
- The large domain size (more than 100k users)
- This company has added around 20 custom attributes, of which none are indexed
- The domain is running on Windows Server 2008 R2, which has a 30-minute timeout by default on the web service that the native AD cmdlets use for queries
- The memory utilization caused by retrieving so much of so many objects, and then duplicating large portions of those objects within memory
- I did finally get a full user dump manually using the Quest ActiveRoles cmdlets, but adding the groups and computers to the mix as you do in lines 23 and 27 of Get-WinADDomainInformation, and then trying to process them through as you do further on caused PowerShell to randomly dump the whole variable from memory (possibly because my laptop only has 8GB, and it was completely flattened), so I lost everything
- I left the last attempt running for more than 12 hours, and it still hadn't completed just the user query, but I'm using some of what I outline below now, and getting much better results
As you have mentioned, I should just do a pull request and submit my suggested adjustments, but as I also mentioned, I have some tight deadlines at present and just don't have the time yet. That said, I do very much want to provide what help I can in the meantime. As I've worked through the challenges I'm facing here, and based on our email conversation, I think I may have figured out a methodology to render the code more efficient, reduce the overall memory footprint, and possibly solve the issue you mentioned around not knowing which attributes a person might want to retrieve...as well as identifying a possible bug that could also be addressed.
First, the bug: you mentioned that you retrieve all attributes because you don't know what the person using the script might want to retrieve; however, as part of processing, you send the collected objects through sub-functions, such as Get-WinUsers, where you essentially drop all attributes but a specified set, which renders the larger memory footprint unnecessary. In theory the data would be there to retrieve should the person wish to dump it, but in practice this is unlikely to be the case.
Suggested Approach:
- For the initial data grab of users, computers, and groups, get only the sAMAccountName, objectclass, and maybe the DistinguishedName to hold in memory persistently
- This type of query will be SUBSTANTIALLY faster, take a much smaller memory footprint during runtime, and give you much more processing flexibility
- Rather than three separate variable data properties, you have one that contains everything, but with easy ability to filter for runtime activities by object class
- Add an additional config setting for each major object type for property collection type, with possible options for Default, All, Enhanced, Special, and Custom
- Default - Grabs only the default properties you currently process via the existing Get-Win* functions
- Enhanced - Gets the defaults, plus additional properties based on user specified property sets (obviously previously defined within the module)
- Obviously this needs another config element, as well as either documentation or, better, a command that shows the available property sets and their included properties
- Special - A string of user specified attributes only for that object type, as well as specifying one or more property sets (this answers situations like mine where there are custom attributes, but without having to specify every attribute I want by hand)
- Custom - User specified attributes only, though the default property set so that required attributes are accounted for to address the other config settings (for the user who wants nearly complete control)
- All - Get all attributes for a given object, less those that are blobs, certs, password fields, etc, such as those you are excluding now
- To enable the Enhanced option, as well as to define a filter set of properties to exclude (passwords, certs, unusable blob data fields, other system fields not of use to people), you can create a hash table whose keys are the property-set names and whose values are a string collection of attributes (see the sketch after this list)
- Example property sets for user attributes:
- Exchange
- Default (minimum attributes to accommodate config items)
- SIP
- SCCM
- Extensions (all those extensionAttribute##)
- Could also create property sets based on the various config options (which you are kind of already doing with the TypesRequired and TypesNeeded)
- To accommodate the 'All' switch
- Give the user a warning prompt that Excluded property set attributes will still be filtered out, and possibly either list, or provide a command to list or adjust those values for the run
- To get all the attribute names dynamically, and filter out the bad ones, you could do something similar to the following:

```powershell
# $FilterAttribs is assumed to hold the pre-defined list of attribute names to exclude
$AttributesToQuery = @()
$AttribsTemp = (Get-ADUser -Filter * -ResultSetSize 1 -Properties * |
    Get-Member -MemberType Property).Name
foreach ($Attrib in $AttribsTemp) {
    if ($FilterAttribs -notcontains $Attrib) { $AttributesToQuery += $Attrib }
}
```

- Note: There might be a better/easier way of dumping the unwanted attributes...this was just my off-the-top approach, but it eliminates the need to do a Select-Object that slows the process down
- Combine the various property sets as required, filter out duplicates, then convert to get to the single comma segmented string of all properties, as required by the native AD cmdlets for each object type
- When you get to the sections where you are retrieving the actual content for the report options configured by the user, you can adjust as follows
- Use OLEDB to allow interaction with an Excel worksheet as if it were a DB and open a connection to the destination workbook
- Do a foreach($item in $items) type loop to retrieve the AD objects one at a time, specifying the combined attributes variable as the properties to retrieve
- Instead of passing to a function that builds a custom object one parameter at a time as is done now with the Get-Win* functions (and which also means dumping any unaddressed properties) you can simply pass the single retrieved object through a Select-Object, specifying * for the properties, and then just adding new calculated properties for those items requiring manipulation
```powershell
$User | Select-Object *, @{ Name = 'DaysToExpire'; Expression = { Convert-ToDateTime -Timestring $($_.'msDS-UserPasswordExpiryTimeComputed') } }  # ...plus <custom 2>, <custom 3>, etc.
```
- Dump the finalized data for the single object to the appropriate Excel tab using the OLEDB connection, rather than keeping it in memory, so you never have more than one object with a substantial number of properties in memory at a time
- For the additional worksheets, like you do for the user objects to show never expiring, system, etc, you could potentially adjust in one of a couple ways to be more efficient
- Simply add additional columns to the primary worksheet that show true/false for those values (this is better from a data reuse perspective in my opinion, as a user can then generate their own reports using things like Power BI, or other PowerShell queries, and you can use OLEDB as mentioned below for generating the pre-filtered tables within Word)
- Use OLEDB again to query the Excel worksheet, perform additional processing, and use that to generate the additional worksheets
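Here's a rough sketch of the property-set hash table idea referenced in the list above; the set names and attribute lists are only examples of what could ship with the module, not anything that exists today:

```powershell
$UserPropertySets = @{
    Default    = 'sAMAccountName', 'DisplayName', 'Enabled', 'PasswordNeverExpires', 'Manager'
    Exchange   = 'mail', 'proxyAddresses', 'msExchRecipientTypeDetails'
    SIP        = 'msRTCSIP-PrimaryUserAddress', 'msRTCSIP-UserEnabled'
    Extensions = @(1..15 | ForEach-Object { "extensionAttribute$_" })
    Excluded   = 'thumbnailPhoto', 'userCertificate', 'unicodePwd'
}

# Combine whichever sets the user asked for, drop exclusions and duplicates,
# and hand the result to the AD cmdlets as the -Properties value
$Requested  = 'Default', 'Exchange', 'Extensions'
$Properties = $Requested | ForEach-Object { $UserPropertySets[$_] } |
    Where-Object { $UserPropertySets.Excluded -notcontains $_ } |
    Sort-Object -Unique
```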
Obviously the potential drawback here is the additional time required to process each object individually, as opposed to having everything immediately in memory, but my thought is that the substantial reduction in memory use is more than worth it, provided there is feedback to the console during the run to show it's still in progress. You've already got all the SQL-style queries worked out, so using OLEDB to interact with Excel shouldn't change too much. Yes, you could just run the tools from a server with more RAM, rather than a little laptop, but unless it's a completely unused box, you run the risk of impacting other activities that need to take place on that system if memory consumption or disk utilization goes too high (as was happening to me, thus my issues). None of the above items should require a substantial code rewrite, just slight adjustments to what you already have.
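On the OLEDB piece, a rough, untested sketch of writing a single processed object into an existing worksheet. It assumes the Microsoft ACE OLE DB provider is installed, a 'Users' sheet with a matching header row already exists in the workbook, and $User is the already-processed object from the loop above; the column names are illustrative:

```powershell
$Path = 'C:\Temp\ADReport.xlsx'
$ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$Path;Extended Properties='Excel 12.0 Xml;HDR=YES'"

$Connection = New-Object System.Data.OleDb.OleDbConnection $ConnectionString
$Connection.Open()

# [Users$] addresses the worksheet named 'Users'; OLE DB uses positional '?' parameters
$Command = $Connection.CreateCommand()
$Command.CommandText = 'INSERT INTO [Users$] (SamAccountName, Enabled, DaysToExpire) VALUES (?, ?, ?)'
$null = $Command.Parameters.AddWithValue('p1', [string]$User.SamAccountName)
$null = $Command.Parameters.AddWithValue('p2', [string]$User.Enabled)
$null = $Command.Parameters.AddWithValue('p3', [string]$User.DaysToExpire)
$null = $Command.ExecuteNonQuery()

$Connection.Close()
```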
Something that would substantially reduce the time trade-off, but also potentially require a substantial rewrite, would be adding some multi-threading to speed things up, either by splitting the objects into smaller groups, with dumps to temp CSV files that you combine into the final Excel at the end, or by splitting out runspaces/background jobs by object type, though again you'd probably have to use separate temp holding files that are later combined, unless you can open multiple OLEDB connections to a single source that isn't a real DB (though I suppose you could use an SQLite flat-file DB). A side benefit of the temp files, though, is that a sudden system interruption won't cause a complete loss of all data. You might also be able to split the objects up so you have no more than 10,000 or so objects in active memory at a time (maybe in 10 groups of 1,000), combine them in memory, then dump to a temp file before clearing memory and starting the next batch. Either way, I would avoid trying to append to the temp files, as that is a very high-overhead process. Based on your support modules, it looks like you have more runspace experience than I do, though I do have a solid example of dynamically splitting objects into groups for that purpose that I can share if desired.
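And a minimal example of the kind of batch splitting I mean, reusing the $Identities and $Properties variables from the earlier sketch (the batch size and file naming are arbitrary):

```powershell
$BatchSize   = 1000
$BatchNumber = 0
for ($Offset = 0; $Offset -lt $Identities.Count; $Offset += $BatchSize) {
    $BatchNumber++
    $End   = [Math]::Min($Offset + $BatchSize, $Identities.Count) - 1
    $Batch = $Identities[$Offset..$End]

    # Each batch goes to its own temp file, so an interruption only loses the current batch
    $Batch | ForEach-Object { Get-ADUser -Identity $_ -Properties $Properties } |
        Select-Object -Property $Properties |
        Export-Csv -Path (".\Users.Batch{0:D3}.csv" -f $BatchNumber) -NoTypeInformation
}
```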
As mentioned above, I absolutely want to contribute to your code base when I can come up for air, but presently I'm having to do a lot of things manually that I had hoped to leverage your module for, so that's got me bogged down. I am still leveraging pieces of your module though, and I am doing everything within PowerShell, so I'm hopeful I will be able to turn that into real code and do a pull request down the road. In the meantime though, wanted to provide the direction I'm going to address the issue on my own in the short run, with thoughts on how to evolve the code.
What do you think?
Chris Whitfield
I'll address some of the stuff when I get around to it. I believe the first step is to mark the code that is required to run against a DC first. That way you get only the data that is available on the DC; the rest can be built from data you gather once. Adding runspaces to execute code in parallel is possible (and I have the know-how), however, since most of your issues come from memory problems, having multiple processes would be killing the machine even more, right? I will start slowly rebuilding what I can. I am not sure I want to use OLEDB or go really deep down that hole. I was also wondering if there's a way to ask for the user count, computer count, and group count in a way that would not trigger a data download. Something like Get-UserCount, Get-ComputerCount, and so on. That would allow me to decide which route to take when gathering information. I could code it so that there are three approaches to getting data: one standard for smaller domains, one extended, and one that would try to drop everything to a file every letter of the alphabet or so. It's going to be a long road to get it to a working place in large domains, though.
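Something like a plain ADSI searcher might work as a cheap count. A rough sketch only: it still enumerates every object, but only the distinguishedName comes back, so it is far lighter than a full property pull.

```powershell
# Counts objects matching an LDAP filter without downloading full attribute sets
function Get-ADObjectCount {
    param([string]$LdapFilter = '(&(objectCategory=person)(objectClass=user))')
    $Searcher = [adsisearcher]$LdapFilter
    $Searcher.PageSize = 1000
    $null = $Searcher.PropertiesToLoad.Add('distinguishedName')
    ($Searcher.FindAll()).Count
}

Get-ADObjectCount                                           # users
Get-ADObjectCount -LdapFilter '(objectCategory=computer)'   # computers
Get-ADObjectCount -LdapFilter '(objectCategory=group)'      # groups
```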
I believe this has been partly addressed with the newest release. I still need to rework group membership, but otherwise it should be fast.
Issue: When running the tools against a domain with over 100k users, or large numbers of OUs and ACLs, the script consumes upwards of 4GB or more of RAM, ~20% CPU, and generally consumes all disk IOPS, which, when run from a laptop with 8GB of RAM, brought it to an unusable level. If the script consumes too much overhead, the system crashes, resulting in having to start completely over. On older domains (2008 R2), large queries can result in timeouts in the domain web services, causing 'invalid enumeration context' errors.
Runtime Environment:
Windows 10 v1803
PowerShell v5.1
Module Versions - Latest of all dependent modules, pulled directly from GitHub
Expectations:
a) When possible, data for each area should be written out to files as it's pulled, and then cleared from memory, to minimize memory footprint.
b) Long running tasks, such as collecting OU ACLs, should have a progress bar to provide an indication of elapsed time and percent complete.
c) By making use of frequent data dumps to the file system, it should be possible to enable a resume in the event that a given run crashes the system, or some other factor causes the process to be interrupted (a rough sketch of a resume check follows below).
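For illustration, a rough sketch of what a resume check against a per-object CSV dump could look like ($AllIdentities, the file path, and the property name are assumptions for the example):

```powershell
$OutputCsv = '.\Users.csv'

# Anything already written out in a previous run is skipped on the next run
$AlreadyExported = if (Test-Path -Path $OutputCsv) {
    (Import-Csv -Path $OutputCsv).sAMAccountName
} else {
    @()
}
$Remaining = $AllIdentities | Where-Object { $AlreadyExported -notcontains $_ }
```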
Suggestion: It might be beneficial to pull a count of objects and perform some pre-analysis on the number of objects prior to initiating a section run. For larger environments, it should be possible to break processing up into smaller chunks, perhaps using filters (i.e., having found 100k objects, find all users with a given name starting with a, b, or c, then do d, e, f, and so on until all objects have been processed). This information can be dumped to disk, and the files then processed individually into the final formats at the end of the run. It may also be beneficial to switch from the AD cmdlets to the legacy ADSI interface in larger environments, since it doesn't have the same timeout limitations.
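A rough sketch of the chunked-filter idea (the chunk list, output paths, and $Properties are illustrative; a real implementation would also need a bucket for names that don't start with a letter):

```powershell
$Chunks = 'a*', 'b*', 'c*'   # ...continue through the rest of the alphabet
foreach ($Pattern in $Chunks) {
    # Each chunk is retrieved and written out on its own, keeping memory use bounded
    Get-ADUser -Filter "SamAccountName -like '$Pattern'" -Properties $Properties |
        Select-Object -Property $Properties |
        Export-Csv -Path (".\Users.{0}.csv" -f $Pattern.TrimEnd('*')) -NoTypeInformation
}
```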