Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow export on large database due to ClearFilesByExtension #527

Open
rsonnier-cfs opened this issue Aug 7, 2024 · 6 comments
Open

Slow export on large database due to ClearFilesByExtension #527

rsonnier-cfs opened this issue Aug 7, 2024 · 6 comments

Comments

@rsonnier-cfs
Copy link

I am new to this site, but I have been using Version_Control_v3.4.23 for about a year. I tried upgrading to Version_Control_v4.0.34 months ago, but was discouraged by the extreme slowness of the export source function compared to the previous version. Last month, I decided to go ahead and move to the V4 anyway. So my exports have now increased from seconds or a few minutes to 30 minutes or more! BTW...I really like the new version, and it has resolved some of the issues we were experiencing with v3.

In our environment, we have 4 developers, over 40 MS-Access databases. Some of those databases have in excess of 400 - 500 linked tables (I inherited all of this and am in the process of cleaning up unused linked tables). Regardless, many of our databases actually use well over 100-200 linked tables (and views).

When investigating the cause of the extreme delays in the export to source process, it was fairly easy to determine that the delay occurred in the export tables section of the routine. By suspending code and stepping through the export process, I zeroed in on these new calls to Public Sub ClearFilesByExtension (not in v3) that appear to be searching for and removing some legacy files by the file types "LNKD", "bas", and "tdf" from the tbldefs source folder.

The issue is that the ClearFilesByExtension seems to be called at the wrong place in the code. I am still evaluating the logic behind this, but it appears to me that these only need to be called once during the export tables section of the code. Instead the ClearFilesByExtension sub is called during the loop for EVERY table in the database project. SO in a large database with hundreds of linked tables, the delay is excessive. Further, my source does not have any of these file types in the tbldefs folder so I am guessing that the reason for the ClearFilesByExtension sub may related file types created by an earlier version of the add-in?

MODULE: modImportExport
Public Sub ExportSource
(calls) cDbObject.Export

which leads to:
MODULE: clsDbTableDef
Private Sub IDbComponent_Export
.....
ClearFilesByExtension IDbComponent_BaseFolder, "LNKD"
ClearFilesByExtension IDbComponent_BaseFolder, "bas"
ClearFilesByExtension IDbComponent_BaseFolder, "tdf"
......

@joyfullservice
Copy link
Owner

Thanks for the feedback on this! I am interested in see some more details on the performance issues. I actually worked on a file scanning function modFileWinAPI.GetFileList() that uses the Windows API for very performant file scanning. It sounds like I should probably rework the ClearFilesByExtension() function to use this approach.

But before I dig into this, I had a couple questions about your environment. Are these database projects on local SSD storage, or a network location? Several minutes seems excessively long, especially for subsequent exports with few changes.

Would you be able to post the Performance Reports from the end of the export log on one of these larger databases? That could be helpful in seeing the counts and times for some of the internal operations.

@rsonnier-cfs
Copy link
Author

rsonnier-cfs commented Aug 7, 2024 via email

@joyfullservice
Copy link
Owner

I would love to assist…Let me gather the requested info. We would be a good test environment because of the large number and large size of our databases.

Thank you! This is helpful feedback because sometimes performance issues only become obvious in scenarios dealing with large amounts of data.

We are a high end network environment we state of the art hardware. Development is primarily performed on a remote desktop server. All storage locations are networked locations.

The remote desktop server shouldn't be a problem, although you will probably see a difference between SSD vs. mechanical drives due to the higher random access reads that SSDs can do on different parts of the drive at the same time. The key here would be to make sure the developers are working on a local clone of the repository directly on the application server, not trying to export source to a network location on a different server. (It will still function over a network share, but the performance will suffer drastically.)

In the ideal scenario, each developer makes a clone of the relevant repositories, (one for each database or system) and performs the development work locally on his machine, committing changes back to the repository which are then merged to the upstream forks as per your development workflow. Operations like exporting and building from source are very fast when running on SSD and typical development hardware.

Obviously not everyone gets to choose how the development process happens for their environment, but I just share that background for what I consider a typical usage scenario, for which this add-in is designed to (hopefully) function as optimally as possible.

I want to emphasize that I don’t think the ClearFilesByExtension()is the issue but rather when it is called in the loop. As a result it runs 1500 (500x3 for the 3 extensions) times if we have 500 tables. It just needs to run just once before the loop. I am fairly confident I have analyzed this and that’s where the delay is.

I absolutely agree! This should only be called once during an export operation, and probably only during a full export. I am already refactoring some things to move this out of the component type classes.

Also, just as a friendly note, the GitHub project is public on the Internet, so you will want to redact out any sensitive information from your log file that you don't want posted here. I will go ahead and paste in the relevant section below:

--------------------------------------------------
                PERFORMANCE REPORTS
--------------------------------------------------
Category                      Count     Seconds
--------------------------------------------------
Tables                        480       1421.78
Reports                       4         10.52
Forms                         15        2.81
Queries                       20        1.48
DB Connections                1         1.12
Modules                       8         0.32
Doc Properties                10        0.22
DB Properties                 50        0.19
Table Data                    1         0.10
Macros                        4         0.07
VBE References                5         0.05
Nav Pane Groups               1         0.04
Proj Properties               2         0.03
VB Project                    1         0.03
Project                       1         0.03
Hidden Attributes             0         0.03
Table Data Macros             0         0.02
Relations                     0         0.00
Shared Images                 0         0.00
IMEX Specs                    0         0.00
Themes                        0         0.00
VBE Forms                     0         0.00
Saved Specs                   0         0.00
--------------------------------------------------
TOTALS:                       603       1438.84
--------------------------------------------------

--------------------------------------------------
Operations                    Count     Seconds
--------------------------------------------------
Clear Files by Ext            1444      1395.29
Write File                    1063      7.92
Save Table SQL                480       7.24
Convert to JSON               5696      2.47
App.SaveAsText()              43        2.24
Scan DB Objects               1         1.86
Sanitize File                 43        1.67
Get File Property Hash        1076      1.30
Clear Orphaned Files          16        1.16
Increment Progress            2114      1.02
Get Modified Date             2129      1.00
Read File DevMode             19        0.65
Format SQL                    20        0.49
Check for linked table        480       0.45
Read File                     56        0.36
Compute SHA256                1670      0.28
Verify Path                   1110      0.24
Delete File                   99        0.14
Export VBE Module             8         0.12
Read File Bytes               55        0.10
Parse JSON                    1         0.08
App.ExportXML()               4         0.08
Console Updates               8         0.06
Save Query SQL                20        0.05
Quick Count Files             1         0.02
Get VBA Hash                  27        0.02
Sanitize VBA Code             27        0.01
Sanitize XML                  4         0.01
Quick Count Objects           1         0.01
Format XML                    4         0.00
--------------------------------------------------
Other Operations                        12.71
--------------------------------------------------

As you correctly surmised, clearing files by extension is definitely the bottleneck. The good news is that this should be changing with some refactoring I am working on today.

joyfullservice added a commit that referenced this issue Aug 8, 2024
 These calls only need to be run during a full export. Definitely not repeated for each object exported. We may do some further cleanup later as some of these legacy files are far enough removed from the current version that the checks are probably no longer needed. See #527
@joyfullservice
Copy link
Owner

@rsonnier-cfs - If you build the dev branch from source, then install the add-in, you should see a dramatic improvement in performance. I am thinking you should be under a minute, based on the performance log. Let me know how it goes!

@rsonnier-cfs
Copy link
Author

I will get back to this next week and follow up here.

Also, I have another suggestion that I have implemented in my copy that I want to run by you to see if its something that you would be interested in adding. Related to export path Should I start a new thread to discuss?

@joyfullservice
Copy link
Owner

Also, I have another suggestion that I have implemented in my copy that I want to run by you to see if its something that you would be interested in adding. Related to export path Should I start a new thread to discuss?

Sure, a new issue would be great for the export path idea. We try to use separate issues for unrelated bugs and feature requests. Thanks!

josef-poetzl added a commit to josef-poetzl/msaccess-vcs-addin that referenced this issue Aug 27, 2024
These calls only need to be run during a full export. Definitely not repeated for each object exported. We may do some further cleanup later as some of these legacy files are far enough removed from the current version that the checks are probably no longer needed. See joyfullservice#527

Co-authored-by: joyfullservice <joyfullservice@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants