Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential improvement: Importing and exporting recent changes only #81

Closed
mmzmusicnotes opened this issue Oct 26, 2020 · 38 comments
Closed

Comments

@mmzmusicnotes
Copy link

mmzmusicnotes commented Oct 26, 2020

Hi @joyfullservice,

I've done some work recently on the original branch of this project which I think you might find valuable. In short, I implemented the following:

  • Added the ability to import or export only a specific type of object (modules, macros, queries, forms, references, or table data and definitions)
  • Added the ability to export only files which have changed since the date of the current HEAD commit
  • Added the ability to import only files which have changed between the current HEAD commit and a commit tagged with access-vcs-last-imported-commit (this tag is applied and moved the import process, and should not be pushed)

Essentially, the goal is to make the process as fast as possible by only importing/exporting objects which are likely to have changed. Even with fast save, some of our databases take upwards of 10 minutes to export, and nearly a minute subsequently. These changes make it take less than half of that; combined with your faster save, I think it would cut things down to be nearly instant.

Once that's working, it would make sense to have a button or command that can just do a quick update of that kind, rather than going all the way through the export/build UI (which is lovely, but takes many clicks). That button-press would then become a normal part of the git workflow - export, commit, push; fetch, pull, import.

I did all this before I knew about your approach, otherwise, I would have started from your code instead, as it seems to do quite a bit more, and more quickly. We use the plugin through the immediate window; it would be nice to have a way to access this that was integrated more completely into the VBA editor (rather than either typing into the immediate window or having to go to Addins and open the UI, configure it, etc).

If you are interested in integrating this work into your project, I would be happy to help you get this working in your add-in.

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Thanks for reaching out! On the performance side, 10 minutes seems really long, even for a very complex database... Is that with the joyfullservice version of the add-in? I have a very complex Access ADP project that exports about a thousand objects. That takes just under a minute for a full export, and under four seconds for subsequent exports. (And most of that time is dealing with SQL objects, not database components.)

image

I actually used to have a toolbar in the VBA IDE, but I moved away from that because I preferred the approach of a demand-loaded add-in that I could use on any database instead of needing to add code to each project to load the VCS tools.

On the workflow side, I personally use GitHub Desktop alongside Access in my development work. When I am ready to make a commit, I simply run the export from the add-in, then switch over to the GitHub Desktop to review my changes, add the commit message and push the changes.

image

It sounds like your version is much more tightly tied to git itself, whereas mine is looking at file modification dates, which also allows you to use other file-based version control systems like CVS, SVN, TFS, etc...

That being said, if there is something specific that you feel would really improve the process, I would be happy to take that into consideration. 😄

@mmzmusicnotes
Copy link
Author

mmzmusicnotes commented Oct 27, 2020

@joyfullservice Thanks for the quick response!

On the performance side, 10 minutes seems really long, even for a very complex database... Is that with the joyfullservice version of the add-in?

It is. Here's my initial export:

image

and subsequent:

image

Import is similar. Clearly, my DB is much larger than yours (and arguably, larger than anyone's ever should be)!

I actually used to have a toolbar in the VBA IDE, but I moved away from that because I preferred the approach of a demand-loaded add-in that I could use on any database instead of needing to add code to each project to load the VCS tools.

Is that code somewhere in this repo? It would be enormously helpful to us, I think...

Or, does the add-in have a way to add that functionality? I'd be happy to help get it working if so.

On the workflow side, I personally use GitHub Desktop alongside Access in my development work. When I am ready to make a commit, I simply run the export from the add-in, then switch over to the GitHub Desktop to review my changes, add the commit message and push the changes.

Our workflow is somewhat similar, but substitute Sourcetree for GitHub. The big advantage to integrating into git is that you can go both back and forward in time. For instance, suppose that I revert to an earlier commit. My code will determine the delta of changed files between the last commit I imported and that earlier reversion, and import only those files. Unless I misunderstand what you are doing (or your audit trail is much more robust than I realized), I don't think that can be done with modification dates?

If you desired it, this could be available as an option - a method of deciding what to import. That way, for other source control systems you can still use file modification dates.

@joyfullservice
Copy link
Owner

Thank you for the additional background information, and the screenshots. Wow, I have never seen that many queries in a single database. 😄 It makes sense why you would be wanting to improve the export performance.

As to the code for the VBA IDE toolbar, absolutely! You can browse the related code here. Specifically, you are probably wanting to see clsVbeMenu and modMenuHandler. If you would find something like this helpful in your development cycle, I would be happy to integrate a pull request into the core. (I would probably also suggest an option where you could choose whether or not to load the VBE menu.)

The integration you are describing with git sounds intriguing... I would be interested in reviewing a pull request on how you would suggest integrating this into the project. As you have probably discovered, I have completely restructured the source files for this project, moving towards a more object-oriented class-based approach that can more easily be refactored and extended going forward.

I love the idea of being able to combine efforts with other skilled developers on this project. The more people we have behind this project and supporting it going forward, the more all of us will benefit in the long run. 😄

@mmzmusicnotes
Copy link
Author

Wow, I have never seen that many queries in a single database. 😄 It makes sense why you would be wanting to improve the export performance.

And that's nothing - we have one DB with over 25,000 queries! (There are some good and some not-so-good reasons why things are this way, but regardless, we need to capture the current state before we can go forward.)

We are also working on a continuous integration method whereby we use automated tools (e. g. TeamCity) and an autoexec script to rebuild the database from source on a build server for each development increment. That isn't up and running yet, but when it is, it will probably use a version of this tool.

As to the code for the VBA IDE toolbar, absolutely! You can browse the related code here.

Thank you! Yes, I think this would be helpful for us, especially if I integrated menu items for our "export changes", "import changes" and "[export/import] all [object type]" options.

The integration you are describing with git sounds intriguing... I would be interested in reviewing a pull request on how you would suggest integrating this into the project.

I'd like to give that a shot. It may be a couple of weeks before I get a chance, as we've just finished this current pass of work on the system (using the structure of the original version of the plugin), but after that I can hopefully try to dig in and see how we can integrate it.

@joyfullservice
Copy link
Owner

And that's nothing - we have one DB with over 25,000 queries! (There are some good and some not-so-good reasons why things are this way, but regardless, we need to capture the current state before we can go forward.)

I totally understand! It is nice to have a current state before diving into major development.

We are also working on a continuous integration method whereby we use automated tools (e. g. TeamCity) and an autoexec script to rebuild the database from source on a build server for each development increment. That isn't up and running yet, but when it is, it will probably use a version of this tool.

You might be interested in #51 where a couple of us were discussing some build automation using this add-in. I think this is something that would be of interest to several other Access developers as well.

I'd like to give that a shot. It may be a couple of weeks before I get a chance, as we've just finished this current pass of work on the system (using the structure of the original version of the plugin), but after that I can hopefully try to dig in and see how we can integrate it.

Sounds great! I think you will find the code structure fairly self-explanatory, but feel free to reach out if you get stuck on something. I appreciate your interest and collaboration on this project. 😄 Hopefully the many hours of cleanup and restructuring I have spent on it will give you a jump-start on the development side as you work out issues with very large and complex databases.

@joyfullservice
Copy link
Owner

@mmzmusicnotes - As of the latest release (v3.2.1) I have implemented some additional performance tuning and logging. I am guessing that you will see noticeable improvement when exporting your databases. 😄 I am definitely interested in your idea of a "Merge Build" where the changed files are integrated into an existing database based on the date of a head commit rather than a complete rebuild each time.

@mmzmusicnotes
Copy link
Author

mmzmusicnotes commented Nov 9, 2020

Hi @joyfullservice - thank you for your work on this!

Unfortunately, the new version is crashing on this line:

Log.Add "[" & cCategory.Count & "]" & IIf(Options.ShowDebug, " " & cCategory.Category & " processed.", vbNullString)
which is line 115 of modImportExport. It does seem to be going faster, though!

I am continuing to work on our git implementation. Once I have it in a relatively stable place, I will put it up on GitHub as a starting point, and then we can discuss integrating those changes here.

@joyfullservice
Copy link
Owner

Is the error on the export or the import? Sounds like maybe a quick fix...

@mmzmusicnotes
Copy link
Author

Sorry, I should have specified - on the export.

@joyfullservice
Copy link
Owner

Could you post a screenshot of the error message? Also, if you click Debug and use F8 to step through the code when it hits the error, can you see what the issue is?

@mmzmusicnotes
Copy link
Author

mmzmusicnotes commented Nov 9, 2020

Ran it again, and it's now working -- not sure what the issue was. Here's the output:

image

Definitely a savings.

@joyfullservice
Copy link
Owner

Nice! At the bottom of the log file, you should see some performance reports... I would be interested in seeing how those come out for your database... 😄

@mmzmusicnotes
Copy link
Author

Here's what I've got:

-------------------------------------
        PERFORMANCE REPORTS
-------------------------------------
Object Type         Count     Seconds
-------------------------------------
themes              1         0.01
VB project          1         0.02
vbe references      12        0.05
imex specs          20        1.08
db properties       50        0.17
tables              813       55.07
queries             1938      481.20
forms               321       10.24
modules             76        2.17
reports             58        2.19
doc properties      6         0.04
nav pane groups     1         0.02
-------------------------------------
TOTALS:             3297      552.25

-------------------------------------
Operations          Count     Seconds
-------------------------------------
Console Updates     11        0.34
Extract Theme       1         0.00
Clear Orphaned      6         19.48
Increment Progress  799       1.76
Delete File         5244      3.25
Unicode Check       5595      4.57
Write to Disk       5595      6.40
Save Table SQL      813       16.24
App.ExportXML()     296       1.04
App.SaveAsText()    2393      263.13
Read File           2375      0.58
Sanitize File       2317      1.56
Save Query SQL      1938      107.64
Read File DevMode   58        0.03
Other Operations              150.25

If we can improve the performance of query/table export, that would be the biggest bang for the buck from my perspective.

@joyfullservice
Copy link
Owner

Thanks! That is helpful to see. As you may have seen, the two tables are basically two perspectives of the same export. The top table is by object type, and the bottom table is by process. The totals are roughly similar, but will have some expected variation based on the timing precision and other overhead details.

Based on my performance tuning development last week, here is my analysis:

image

Of this, about 70% is internal processes that we really can't speed up any further. (Such as Access actually serializing the object and writing it to a file.) I am encouraged to see that we are getting some really nice performance, even on a large and complex database.

Going forward, the biggest performance gains will obviously be in refining the selective export/import of objects.

P.S. I would also be curious to see the performance reports on a subsequent export using Fast Save. I am thinking that you should see some significant improvements there as well.

@mmzmusicnotes
Copy link
Author

mmzmusicnotes commented Nov 9, 2020

P.S. I would also be curious to see the performance reports on a subsequent export using Fast Save. I am thinking that you should see some significant improvements there as well.

Subsequent fast save, as you suspect, is much faster:

image

For workflows where most work is edited in Access, this is more than sufficient, I think, provided you don't have to export too frequently. We like to edit in VS code, though, where possible - it has many features the Access editor doesn't have, such as changing all references, searching globally for all references, and so on. We import into Access (using my git-based ImportChanges which is quite speedy) and then export any changes made to resolve debugging issues.

One thing I wonder is whether Application.SaveAsText is the only way for us to go for all objects. For instance, there is DoCmd.TransferText: https://docs.microsoft.com/en-us/office/vba/api/access.docmd.transfertext

or even using a query-execution approach: https://stackoverflow.com/questions/17272945/exporting-a-query-result-into-a-text-file-using-vba-ms-access

This of course is referring specifically to tables and queries, which are the long-tail in my databases.

Without knowing what Application.SaveAsText does internally, I am unsure whether any of these would be faster. (At a guess, TransferText could be, it seems to me, but a query-based approach seems unlikely to be.) I'm wondering if you know much more about any of these?

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Thanks for the additional background on your workflow process. I will look forward to learning how you are interacting with Git and determining the changed files and such. Ultimately I believe this is going to be the missing link to fast and efficient merging of changes into a working database.

In terms of raw speed, the undocumented SaveAsText is pretty fast. The performance report you attached above indicates that files were being generated and written at a speed of just over a tenth of a second per object. I don't know if this is to a network drive or mechanical drive, but on my older laptop with an SSD, I am getting 0.04 seconds per object for a local export of 295 objects. Reduce that down with a Git integration to say, 10 objects that actually changed, and I think we could be running exports in under 2 seconds.

In regards to other functions to export database objects, I am familiar with those approaches you mentioned. They are for exporting the data stored in a database, as compared to the object definitions themselves. Based on my experience with this add-on and others like it, the SaveAsText function seems to be the very best approach that I am aware of for the purpose of serializing a database design for version control.

@mmzmusicnotes
Copy link
Author

That makes sense - thank you.

I'm going to speak with my colleagues today about getting some of our git integration back out into a repo for you to take a look at. Not sure what form it'll take, but hopefully will be able to put it up soon. Thanks again for your help!

@A9G-Data-Droid
Copy link
Contributor

A9G-Data-Droid commented Nov 17, 2020

@mmzmusicnotes, I'm just popping in to say that the RubberDuck may make working in the VBE more attractive than VSCode:

https://rubberduckvba.com/

Their ultimate goal is to replace the code window in the VBE with AvalonEdit but that is a long term goal. For now, it has some amazing refactoring and code inspection tools that I use every day.

@mmzmusicnotes
Copy link
Author

@A9G-Data-Droid Hey, thanks for stopping by. Yes, we've tried Rubberduck - it tries to chew through our code and pretty much dies. It's not usable, just much too slow to do anything, and crashes frequently. Doubtless it would be less painful in a DB of smaller size... though if you have any ideas on optimizations that would make it usable, I'd be interested.

@joyfullservice since I am here, an update - I got the go-ahead to get my code uploaded to Github. We are working on our CI pipeline and that's revealed a few bugs in our implementation I'd like to get resolved. Regardless of that, though, I will put up what I have before the end of the month. Just hoping a little extra time will further improve the state of things.

@mmzmusicnotes
Copy link
Author

@joyfullservice to get us started, I went ahead and threw up the current state of things here:

https://github.com/mmzmusicnotes/msaccess-vcsintegration-gitsupport

I'm sorry that this does not have the history from the original repo, and is thus not a true fork. I moved it into our internal source control, and there was not a clean way for me to set that up. I spent a little time trying to rebase it off the original repo, but that turned out to be a pain - and as I said, this is just a proof-of-concept.

The parts to look at are the VCS_GitOperations module and the related changes to VCS_References and VCS_ImportExport - there is commit history to show you how those changed. I would start with the ImportChanges and ExportChanges functions in ImportExport to get an idea of what's going on.

I think if we wanted to integrate this work here, we would want to be much more circumspect about architecture. I threw this together and it is fairly fragile (as the original plugin was). Obviously, the git stuff could all be contained in a class that handles git operations. We'd also want to do a better job of managing tags (right now, users have to manually create the access-vcs-last-imported-commit tag when they set up the VCS plugin).

Finally, there's some challenges with export - when you export your changes, it may include changes that were last imported from git, because export has to go by date of modification, and you must always export all modules, because I have not found a reliable way to ascertain when a module was modified. I also don't yet have a good way to notify git that a module was renamed or delete within Access itself - but as I mentioned, this is not a super high priority for us as we mostly work in VS Code/the filesystem, and then import the changes into Access.

The UI is also worth discussing. If this is presented as a setting or option, then it could change the import/export behavior accordingly. It might be better, though, to have it as an entirely different set of operations - e. g. a "Sync with git" button that exports any changes made in Access, and imports any changes made in git, and updates the tagged commit. You could even provide a list of commits and choose visually which one to import changes starting from (by moving the tag).

I hope this is helpful!

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Thanks for uploading a copy of your project! This was exactly what I needed to be able to see how you were interacting with git. I like your technique of reading the StdOut from the shell commands. I have never used that before, but it looks like a very effective approach.

I agree with your suggestion of using a dedicated class for the git interaction. Perhaps clsGitSync. I am not too worried about some of the items not having modified dates. Most of them are going to be lightweight items that we can export to a temp file and compare with the repository file to identify any changes.

I think the bigger question is going to be how we work out the overall workflow and development cycle. We need to consider files that may have changed in the following areas:

  • Uncommitted changes in git (such as from external editing of source files, or recent exports)
  • Internal changes in Microsoft Access objects that have not yet been exported to source
  • Upstream changes in Git that have not been pulled down to the repository

Obviously the most important thing here is that a user does not inadvertently overwrite their changes and lose work. Any time changes are overwritten from one source to another, there needs to be an intentional decision on the part of the user as to which changes are preserved.

After some consideration here, I would like to propose the following approach for managing the changes, then outline what this looks like from a workflow perspective in some different development scenarios. This would be the behavior with the git integration functionality turned on.

A guiding principle here is that we are using GitHub Desktop, SourceTree, or other git management tool to handle commits, pushes and pulls to the remote repository.

Export Source:

  • After running an export, a list of the modified files would be retrieved by git, and a .json file created that records the modified dates of the changed files at the time of the export. The files may be overwritten on subsequent exports, as long as the date matches, and the head commit has not changed.
  • To keep from overwriting changes in the source files, an export can only be performed when there are no uncommitted changes in the source files folder, other than code exports (see above). Any existing changes to source files must be discarded, stashed or committed first.
  • If uncommitted source changes are found, the developer will need to either discard or stash them, or perform a merge build before proceeding with an export.
  • When exporting, with Fast Save on, only objects that have changed between the last export date, and the last commit date in the repository are exported. This should give us a very fast export.
  • Certain types of changes are ignored when writing files, such as changes in the VCS version in .json header sections, and some properties that may frequently change.

Merge Build:

  • Database objects would be replaced from source files when the git file diff indicates that the file has changed between the head commit of the last build, and the current head commit, including any uncommitted changed files.
  • Database objects would be added when they exist only in source files.
  • Database objects would be removed when they no longer exist in source.

Use Cases:

Primary Development in Microsoft Access
Periodic exports to source gives the developer a chance to review the changes and commit them to git. Changes made since the last merge build would continue to be exported (but probably not flagged as changed in git) until the next merge build is performed.

Collaborative Development in Microsoft Access
Differences in the source files and database objects would need to be resolved at the source file level, then a merge build performed before source could be exported.

Primary Development in Source Files
Periodic merge builds would keep the Access objects up to date with the latest changes made in the source files.

After reviewing this in more detail, I would suggest that we take the following plan of action towards integrating this into the core.

  1. First, let's establish the complete specification of how we want this to work, especially in reference to dealing with change conflicts, to make the process as clear and intuitive as possible for the users. (I went ahead and added a page to the wiki to get us started using what I described in this post.)
  2. With the specification completed to satisfactorily meet the various usage requirements, we can move forward with the code updates to implement the specification into the software.

While this takes a little more planning and thought at the beginning, I believe it will give us the best quality product in the end, help minimize surprises and reduce the potential of developers losing work when something doesn't function as expected.

What do you think? Do you like this approach?

@mmzmusicnotes
Copy link
Author

Apologies for the delay on looking at this - want to make sure I give it complete thoughts.

In general, I like all of the goals you've laid out, but I want to take an iterative approach in terms of getting the functionality into the plugin. More advanced work around resolving conflicts, etc., I think we should set a specific policy on to start, and then add more complex functionality after we've built the basics out if needed. For instance, if there's a change conflict - the exported/imported file/object changed and the target file/object also changed after the last import/export date - I think for now we just alert the user and let them choose which version to keep (or to skip the file).

Likewise, I appreciate the defensive approach as far as avoiding file overwrites, but I want to give the user the power to choose the "dangerous" option as long as they know they're doing it. For instance, in this scenario:

To keep from overwriting changes in the source files, an export can only be performed when there are no uncommitted changes in the source files folder, other than code exports (see above). Any existing changes to source files must be discarded, stashed or committed first.

I would say that an alert and the option to proceed would be preferable.

When exporting, with Fast Save on, only objects that have changed between the last export date, and the last commit date in the repository are exported. This should give us a very fast export.

This sounds like a great idea, similar to what I've done for import.

Certain types of changes are ignored when writing files, such as changes in the VCS version in .json header sections, and some properties that may frequently change.

Agreed; this can be achieved on git's end via .gitignore. In access we'll obviously have to do it explicitly. We should also ignore any local settings the add-in may save, as different users may want different things.

A couple of minor notes as well:

I am not too worried about some of the items not having modified dates. Most of them are going to be lightweight items that we can export to a temp file and compare with the repository file to identify any changes.

So long as we use git diff to do this, I agree.

Also, there's a unique issue I've run into with import/export that we should be aware of. If you import changes, then export immediately, Access will see the modification date as being when the import happened, and will thus flag those changes to be exported. We can get around this by updating the json on both import and export, as I think you intend?

Lastly, with regards to how we turn this functionality on and off - what were you thinking? This is a fairly big feature, so maybe we want some way of making it clear what you lose if you disable it?

@joyfullservice
Copy link
Owner

Thank you for the feedback and thoughtful replies! I feel like we are both on the same page, just working through the finer details as we move towards implementing this enhancement. I have added some replies to your comments below:

In general, I like all of the goals you've laid out, but I want to take an iterative approach in terms of getting the functionality into the plugin. More advanced work around resolving conflicts, etc., I think we should set a specific policy on to start, and then add more complex functionality after we've built the basics out if needed. For instance, if there's a change conflict - the exported/imported file/object changed and the target file/object also changed after the last import/export date - I think for now we just alert the user and let them choose which version to keep (or to skip the file).

Sure, that sounds fine to me. Practically speaking we could probably do a scan first, then present the user with a single question rather than individual message boxes for each file affected. (Just in case there are hundreds or thousands of changed files.) It starts to sounds a bit complicated when we present a list of files and manage what to do with each of them. (It makes me think of a SyncBack confirmation screen.) But like you said, if we start with the basics, we can build from there as needed.

Likewise, I appreciate the defensive approach as far as avoiding file overwrites, but I want to give the user the power to choose the "dangerous" option as long as they know they're doing it. For instance, in this scenario:

To keep from overwriting changes in the source files, an export can only be performed when there are no uncommitted changes in the source files folder, other than code exports (see above). Any existing changes to source files must be discarded, stashed or committed first.

I would say that an alert and the option to proceed would be preferable.

Good idea. We could add this as an option, kind of like you have with SSMS where you can allow more dangerous queries to run if you enable the option. We might have this off or set to prompt by default, but it could be set to overwrite.

Certain types of changes are ignored when writing files, such as changes in the VCS version in .json header sections, and some properties that may frequently change.

Agreed; this can be achieved on git's end via .gitignore. In access we'll obviously have to do it explicitly. We should also ignore any local settings the add-in may save, as different users may want different things.

I am not sure I was quite following how this would be done in .gitignore... I know you can ignore whole files, but not easily ignore specific lines or parts of files.

The way my plugin does it for the json files is that it compares the content of the json file and only rewrites the file if the content section changed. See the source code here.

image

The VCS version can be important in the future if the structure of the file changes between VCS versions, and we need to transform the source content before importing. But we don't really need to update the version numbers in every json file every time we upgrade VCS. (That was the thought process behind this approach.)

Do you have some specific examples of local things that we would not want saved in VCS that are currently being saved? I have functionality for localizing relative paths, and also some handling of the printer options to help minimize this type of issue. The VCS options are saved with the project in the export folder, so other users working on the same project should be utilizing the same options and getting the same output.

Also, there's a unique issue I've run into with import/export that we should be aware of. If you import changes, then export immediately, Access will see the modification date as being when the import happened, and will thus flag those changes to be exported. We can get around this by updating the json on both import and export, as I think you intend?

Yes, this additional state catalog will allow us to more accurately determine when changes happen outside the export/import processes. I found some interesting posts regarding how git determines which files have changed, but git is obviously only comparing the source file to the git head version, and doesn't know anything about the modified date of an internal database component.

Lastly, with regards to how we turn this functionality on and off - what were you thinking? This is a fairly big feature, so maybe we want some way of making it clear what you lose if you disable it?

Yes, initially I was looking at having this option available, but off by default. When you check the box for Advanced Git Integration, it would make a new tab visible that would contain additional options relating to this feature. (Similar to how I did the hidden pages for print settings and encryption.)

Initially, the add-in would continue to function using the file-date based approach for Fast Save, but as the git integration matures, this may eventually become the default option, with the other still available as a fallback for those using other systems for version control.

@joyfullservice
Copy link
Owner

@mmzmusicnotes - I have created a new branch and began some initial implementation with the git integration. You can follow the changes and contribute on the git-integration branch. Since this involves some more significant refactoring, I thought it would be best to do the main development in the feature branch, then once it is working pretty well, we can merge it in with the master branch. I plan to pick this up again on Monday, and hope to make some good progress on it next week. 😄

@Indigo744 - You might be interested to follow this development as well, and potentially contribute if you are interested... When finished, it should significantly improve the performance of exporting and importing changes with git repositories.

@mmzmusicnotes
Copy link
Author

@joyfullservice That sounds great. I will have fuller responses to your comments on Monday as well - just wanted to add one other thing I forgot to mention: It would be nice, for my team's uses, if we could have access to the VCS window or some shortcuts to export and merge changes in the VBA editor itself.

I think that can be a separate issue - in fact, I would encourage us starting Monday to chunk this work up into several issues under a Git integration "epic" of sorts, so we can each work on portions - but would you be willing to have that in the plugin if I provided it/included it as something optional?

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Sure! I would be happy to include an option for a custom VBE menu. I can totally understand how that would be useful in your workflow.

I like your idea of breaking this out into chunks so we can better collaborate on the project. I have never used the GitHub Projects feature, but it looks like it might work nicely for something like this. I just set up a basic project for us to use as we work through the integration.

@Indigo744
Copy link
Contributor

@joyfullservice thank you for pinging me. I was following this thread and was interested in the initial improvements. And by the way I want to thank you and @mmzmusicnotes for the work!

I'm less interested in the advanced git integration, but that said I'll continue to follow the development and contribute if an opportunity arise. Good luck with your endeavour!

@A9G-Data-Droid
Copy link
Contributor

On VCS Version: Instead of storing the version in every file exported, would it make sense to only output the VCS version in one place. I believe that it should be tracked in source control and not ignored but it shouldn't update every file to track this information. I know there could be problems with files that are exported from mixed versions but backwards compatibility will become critical as you embark on this journey. Changes will have to be planned carefully and tested to make sure it works. You could start with a version and say that you support every version from that one forwards.

@joyfullservice
Copy link
Owner

@A9G-Data-Droid - Thanks for chiming in! I appreciate your input.

On VCS Version: Instead of storing the version in every file exported, would it make sense to only output the VCS version in one place....I know there could be problems with files that are exported from mixed versions but backwards compatibility will become critical as you embark on this journey.

Thinking about this more, perhaps a better approach would be to make the distinction between the version of the VCS software, and the version of the file format. The VCS version represents which features have been added and which bugs have been fixed. The file format version represents how the file contents are organized, and would only change when we make a (breaking) change in how the data is stored.

Taking this approach, the file format version would very rarely change, and when it does, we would fill out the Upgrade function to transform the loaded data into the new structure.

image

The name of this property is important because we don't want any confusion about what it means. It is not the VCS add-in version, it is not the version of the database or the object in the database, it is not the version in git, or the version of the source class. Does File Format sound like a clear name, or would there be a better term for this?

Perhaps we don't even need this element at all until we have a breaking change from the present version. This could be passed as an optional parameter to the WriteJsonFile function with a default of 0.0 (technically 0, but with the syntactical hint to use 1.0, 1.1, etc...) If the format is > 0, then the line would be included in the export file. This seems to be the simplest and cleanest implementation while still fully backwards compatible.

The ultimate goal is that you could drag and drop source files from other projects, exported at different times from various versions of VCS, into a new project, build from source, and have everything load in correctly, regardless of which version was used to export the source.

@A9G-Data-Droid
Copy link
Contributor

@joyfullservice That's a great solution. I would call it "Export File Format" because it's specific to the Export/Import process. Import versions can then cite compatibility with certain Export File Format Versions. There could even be a test method that keeps one of each version example file and certifies compatibility with each Export Version.

joyfullservice added a commit that referenced this issue Nov 23, 2020
Rather than saving the VCS version in every json file, we are now only saving an `Export File Format` version number if a format change involves a breaking change. This should reduce change noise when upgrading VCS versions. See discussion in #81
@mmzmusicnotes
Copy link
Author

Hi all. Here are the issues I would recommend we create as thin slices for this project.

The following I'd call "MVP", in the sense that we couldn't call the feature working without them:

  • Add the "Enable Git Integration" option to settings as a toggle for this feature
  • Add support for basic export (recording the files which have been modified and putting them on filesystem for git)
  • Add support for basic import (using git to find the list of changes and importing only those objects)

and then these I'd call nice-to-haves which can be done subsequently:

  • Add the optional integration with VBA's add-ins menu or similar
  • Add the ability to do more complex resolution of conflicts on export or import (e. g. by calling out to a program to run a diff)
  • Add the option to configure level of safety - prompt to overwrite vs just overwrite vs never overwrite

How does this sound as a starting point? @joyfullservice , if it works for you, and you are comfortable giving me the appropriate permissions, I can try to create these and hook them up to the project board. (Or I can create them, and you can do the project setup.)

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Sounds great! I think you should have permissions to edit the project now, so feel free to set up these items.

@mmzmusicnotes
Copy link
Author

@joyfullservice I don't seem to be able to add or edit cards, or to update the projects or labels for issues, unfortunately. I am creating the issues now, though, starting with #100.

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Thanks! It looks like the invite is still pending... Maybe see if the email landed in your spam folder?

I have been making some good progress today in some of the initial integration. I will plan to push out an update soon so you can see how things are coming together.

@mmzmusicnotes
Copy link
Author

Ahh yes, it fell into a different folder. I have what I need now. Will have each of those issues created tonight, and obviously we can refine their requirements as I go. Happy Thanksgiving!

@mmzmusicnotes
Copy link
Author

@joyfullservice I've created all the issues I listed above and fleshed them out as best I can. I also set up mvp and post-mvp labels so we can prioritize individual issues. Finally, I set up an automatic trigger for the project so that issues added to it go to "to-do" to be prioritized, and I added a "review" column for work awaiting approval/merge.

I know this issue chain is getting chatty - if you would prefer to email me privately to coordinate issues and other things that fall outside of the scope of this particular thread, my email is on my profile.

@joyfullservice
Copy link
Owner

@mmzmusicnotes - Sounds great! Thanks for your help in organizing this project. I am thinking that we can probably close this issue since we have completed the original objective of determining that yes, we would like to move forward with a tighter git integration and the related performance improvements. 😄

@joyfullservice
Copy link
Owner

For future reference, I am moving the wiki page of specifications to this issue. (I don't have a way to hide the wiki page, and this isn't a feature that is on the roadmap for further development at this time.) If we decide to pick this up again in the future, we can move this back to a wiki page with the other documentation pages.


NOTE: New Feature Under Development

If you are using this add-in with a .git repository like GitHub or GitLab, you may be interested in enabling a deeper git integration. Out of the box, this add-in uses the file modified date/time to determine when to export an object when the Fast Save option is turned on. The Advanced git Integration allows the add-in to interact with the actual git commits to determine which files have changed. Furthermore, it allows a Merge Build to integrate changes directly into a working database instead of needing to rebuild the entire application from source each time. See #81 for more discussion on the development of this feature.

Because this tighter integration can result in files being overwritten or replaced, it is very important that you understand how to use this functionality works.

SPECIFICATION

This outlines the intended functionality of the software in regards to the Export and Import of source files with the Advanced Git Integration option turned on.

A guiding principle here is that we are using GitHub Desktop, SourceTree, or other git management tool to handle commits, pushes and pulls to the remote repository.

Export Source:

  • After running an export, a list of the modified files would be retrieved by git, and a .json file created that records the modified dates of the changed files at the time of the export. The exported files may be overwritten on subsequent exports, as long as the date matches, and the head commit has not changed.
  • To keep from overwriting changes in the source files, an export can only be performed when there are no uncommitted changes in the source files folder, other than code exports (see above). Any existing changes to source files must be discarded, stashed or committed first.
  • If uncommitted source changes are found, the developer will need to either discard or stash them, or perform a merge build before proceeding with an export.
  • When exporting, with Fast Save on, only objects that have changed between the last export date, and the last commit date in the repository are exported. This should give us a very fast export.
  • Certain types of changes are ignored when writing files, such as changes in the VCS version in .json header sections, and some properties that may frequently change.

Merge Build:

  • Database objects would be replaced from source files when the git file diff indicates that the file has changed between the head commit of the last build, and the current head commit, including any uncommitted changed files.
  • Database objects would be added when they exist only in source files.
  • Database objects would be removed when they no longer exist in source.

Use Cases:

Primary Development in Microsoft Access
Periodic exports to source gives the developer a chance to review the changes and commit them to git. Changes made since the last merge build would continue to be exported (but probably not flagged as changed in git) until the next merge build is performed.

Collaborative Development in Microsoft Access
Differences in the source files and database objects would need to be resolved at the source file level, then a merge build performed before source could be exported.

Primary Development in Source Files
Periodic merge builds would keep the Access objects up to date with the latest changes made in the source files.


This PRELIMINARY specification was last updated November 18, 2020

joyfullservice added a commit that referenced this issue Nov 30, 2023
Taking some time to document the intended behavior of the merge build functionality as we work through some bugs. #471, #81
josef-poetzl added a commit to josef-poetzl/msaccess-vcs-addin that referenced this issue Jan 19, 2024
* Updating clsPerformance, as some objects never restart timing, and when resetting some objects are not cleared. Fixes joyfullservice#331

* Fixing Private/Public declarations.

* This isn't actually used.

* Bump Version

* Update API examples

Removed dependency on an external function, and added an example for building from source.

* Refine some dialect-specific SQL string quotations

Backticks only apply to MySQL, while square brackets are used with MSSQL and Access. joyfullservice#442

* Allow wrapping of long names in performance class

Extending the performance class to allow the wrapping of long names used for categories or operations. (Not really needed within this project, but could potentially be helpful in the future with translations.) joyfullservice#441

* Update based on feedback from @joyfullservice.

* Resolve conflict with upstream file

Putting the comma after the argument seems to be the preferred industry-standard approach, based on ChatGPT and Bard.

* Add option to pass path to build API

You can now specify the source files path when you initiate a build through the API. This allows automated builds to be run even if a copy of the database does not yet exist. (Such as after checking out a project in an automated CI workflow.) joyfullservice#430

* The logic for checking of existence of git files wasn't always working as expected due to searching the current directory rather than using the export folder.

* Add a check when loading XML and verify it was successfully parsed. This avoid generating a bad export where the data are not actually exported due to invalid XML being generated by Application.ExportXML. Unfortunately, if a table contains any characters that aren't valid for XML document, it won't try to escape them and include them as literals. Even if they were escaped, they might not be accepted anyway. XML specifications forbids having any characters in 0x01-0x31 range so if a table data contains such characters, this can cause the XML export to fail. In this case, tab delimited will have to be used instead. However, the previous version was simply silently exporting as if everything is hunky-dory when it's not. Hence, the error.

* The export log was littered with bunch of warnings about unclosed blocks. This seems to be due to not closing it when evaluating the UseTheme. Even if we skipped it, we still need to remove it from m_colBlocks to balance everything out.

* Fix a subscript out of range error where the tokens advance beyond the end of the string but the function GetNextTokenID returns 0, which then fails within FormatSQL function since there is no member at index 0. It's not clear why this only fails every second time a query is exported but it is the case where if it fails, exporting it next time will not yield the error. Do it 3rd time, then it fails.

* Add more types of queries that should not be formatted by SQL formatter because they are a variant of pass-through queries.

* The AutoClose may run after the form has closed (e.g. if the user is quick to close it) which may result in an error about object members not available. Since the form is closed, there's no point in setting the timer interval. To avoid the error when debugging, we add a IsLoaded check and skip it if it's not loaded.

* Fix issue with LogUnhandledErrors and simplify use. (joyfullservice#449)

* Add option to SplitLayoutFromVBA

This option (on by default) will save the VBA code from forms and reports as a related .cls file. (Still under development.) joyfullservice#378

Also removed the "Strip out Publish Option" from the options form. I have never heard of a case where this needs to be changed, and it frees up space for the new option we are adding without cluttering the form.

* Refactor code module export to shared function

This logic will be shared when exporting code modules from forms and reports.

* Support "|" in performance log entry names

Refactored parsing the key from the performance item so that we are not dependent upon a unique delimiter. The timing value is always a number, so we can be confident that the first pipe character is the delimiter. The text after that can be anything, including pipe characters. joyfullservice#450

* Adjust indenting

(minor change)

* Convert Sanitize module to class

In some cases sanitizing a source file actually creates two distinct outputs. A layout file and a code file. Rather than making the sanitize function more complicated with byref outputs and non-obvious side effects, I am taking the approach of a more explicit object-oriented route where the code is easier to understand and maintain. (And also allows for future enhancements such as SQL extraction for query definition files.)

* Refactor sanitizing to use class

Updating the existing code to use the new class.

* Refactor class variables

* Refactor form/report export to split VBA

Export is now splitting the VBA from Form and Report objects to separate files with a .cls extension. Moving on to the code that will stitch these files back together before import.

* Rename Sanitize class to SourceParser

This better reflects the expanded role of the class.

* Refactor for name change

* Verify ribbon active state when the add-in loads

Ensure that the ribbon is active when installing or activating the add-in. See joyfullservice#451

* Don't auto split layout/VBA for existing projects

For existing projects in git repositories, form and report layouts should not be automatically split from the VBA code classes. There is another process that will allow us to split the files while preserving history in both files, but this involves a couple commits and requires a clean branch. For existing projects, this is a manual upgrade (option changes). For new projects, it can happen by default.

* Move print settings processing to clsSourceParser

This keeps the LoadComponentFromText function cleaner and easier to read.

* Move source reading function

This is used in several areas, and allows us to maintain the source file encoding determination in a single location.

* Rework merging source content before import

Cleaning this up to avoid reading and writing the file additional times while merging content from different sources. (Print settings, VBA code)

* Add support to overlay VBA code after import

For some (rare) situations, it is necessary to push the VBA code directly using VBE to preserve certain extended characters that may be corrupted in a regular round-trip export/import cycle.

* Code cleanup and minor tweaks

* Fix bugs in build logic

Uncovered these while testing.

* Check for diff tool before comparing objects

* Implement correction according to rubberduck (joyfullservice#453)

replace VBA commands:
format with format$
trim with trim$

* Add wiki page for Split Files

Describes the process in a little more detail.

* Add change hook for options

Used for special processing when certain options change.

* Automate splitting forms and reports

Adds a link and some code automation to split forms and reports in existing projects to layout and class files.

* Rename function

Git.Installed sounds better than Git.GitInstalled, and will almost always be called in the context of the git class.

* Fixes joyfullservice#354 and Fixes joyfullservice#452 (joyfullservice#454)

From @hecon5:

Bump version minor number because it's not clear that the index will allow round trip from prior types in all cases; it worked on my machine, but that may not always be the case.

The date types for the index are handled natively by modJsonConverter and should import/export correctly regardless of user's date / time zone or date encoding on machines.

* Add performance timing to ISO date parsing

See joyfullservice#354

* Add high-performance wrapper functions

Avoids the use of RegEx when it is not necessary to parse a standard date format. joyfullservice#354

* Fix copy-paste oversight

joyfullservice#354

* Update error handling

Refactored a number of locations to use the new syntax for On Error Resume Next, and added code to clear expected errors.

* Use faster date parsing for date only values

* Add Split Files utility to ribbon (Advanced Tools)

Also added an informational message box when the split is complete.

* Rename as new files

* Restore original files

* Split layout from VBA in testing database

Separates the VBA code from the layout definition in the source files. (Applying to testing database now, will apply to main project soon.)

* Adjust version number

I am using the minor version number to represent releases from the main branch, and the build number to continuously increment during the development cycle.

* Revert the ConvDateUTC and ConvTimeUTC functions to always parse the "Fast" way first and revert otherwise. this allows the optimization to be used everywhere with no code changes. Ensure that millisecond accuracy is kept for otherse using the function. No Speed impact is noted on my end to doing this.

* Pass by ref so we don't need to build more memory use. Optimize Offset string building to only do math when it's required and fix whitespace.

* Cache the format types instead of needing to build them every time.

* Bump Version

* Verify consistent naming and byref passing of strings

* Implement dialect in SQL formatting

This was previously only partially implemented. joyfullservice#457

* Add support for ! boundary character

This character is used in Microsoft Access in a query when referring directly to a control on a form, and should be treated similar to a period as a separator between elements. joyfullservice#457

* Add SQL formatting query to testing database

This query demonstrates that we can properly parse and format expressions that refer to controls. joyfullservice#457

* Solve rare edge case with SQL IN clause

Just in case a user has an embedded unquoted path in a string, the colon will be treated as a non-spaced boundary character during formatting. (For Microsoft Access SQL only) Fixes joyfullservice#447

* Addresses joyfullservice#459 (joyfullservice#460)

Addresses joyfullservice#459

* Allow sort of operationames with leading spaces (joyfullservice#463)

If a operationname has a leading space (like " MyOperation" ) the function "SortItemsByTime" fails.
Now sorting will success.

* Update comment

After removing string padding in the previous commit.

* Adjust detection of system tables

Switching to just using a bit flag check to solve joyfullservice#462

* Log warning for UNC path access errors

Failing to convert a path to UNC may not prevent the operation from completing, but it should be handled and logged. Fixes joyfullservice#461

* Refactor date conversion for DB Properties

Save custom date properties in ISO (UTC) format in source files, without converting other property types like strings that may parse as dates. joyfullservice#459

* Turn off date ISO conversion by default

This is only used in the index and certain database properties. joyfullservice#459

* Turn on date ISO conversion before reading index

These dates need to be converted to local dates for internal processing. joyfullservice#459

* Add saved date property to testing database

Verifies that the round trip conversion of saved date properties is working correctly. (The dates are stored as UTC in source files, but converted to local dates when imported into the database properties.) joyfullservice#459

* Add dates stored as text to testing database

One stored as a regular date string, and the other as an ISO8601 string. (Neither should convert when reading or writing to JSON.) joyfullservice#459

* Add default git files if dbs in repository root

If you use the default options of no special export folder path defined, the project may likely be in the repository root. Add the default .gitignore and .gitattributes files if they are missing. (This would be the default setup for most projects.)

* Add note about Access 2007 support

See joyfullservice#464

* Add test for Public Creatable class instance

This is an undocumented property value that is sometimes used in the wild. Currently when you build from source, PublicCreatable (5) classes are converted to PublicNotCreatable (2). This instancing property can be set in VBA, and we want the imported class to match what was exported. This test currently fails, but will pass when the add-in is updated to support this property.

* Support for PublicCreatable instancing for classes

This (technically undocumented) technique allows class objects to be created by external projects without using factory methods. This approach was used in some of my projects, so it was important for me to see this property correctly set when the application was built from source.

* Check VCS version before export

Checks the VCS version before export to warn the user if we are running an export with an older version of VCS than was last used on this project. joyfullservice#465

* Check VCS version on build

Check the VCS version before building, and warn if the project version is greater than the installed version. joyfullservice#465

* Reset error break mode after loading options

The ability to break and debug VBA errors is dependent on an option value that is saved with each project. The on error directive should be reset after loading the project options to ensure that we can successfully break on errors.

* Include name prefix with VBA code overlay

Testing this on a machine using the Unicode BETA option in Windows.

* Resolve build path during options upgrade

We may not have a database open when upgrading options. Fall back to the path used to load the project options to determine the source file path. joyfullservice#467

* Add some documentation for merge build

Taking some time to document the intended behavior of the merge build functionality as we work through some bugs. joyfullservice#471, joyfullservice#81

* Update Merge-Build.md

Expand table of expected behavior.

* Remove git integration for getting modified source

I don't think this is actually being used in the wild, and it simplifies the process to have only a single code path for detecting changed source files.

* Add missing database objects on merge

If a source file exists, but no matching database object exists, we should merge the source file into the database. joyfullservice#471

* Fix SQL export of non-formatted queries

Pass-through queries are now exported as SQL again.

* Refactor source modified date for multiple files

Some types of components, such as tables, forms, reports, queries, and shared images may use multiple source files to represent a single database component. We may need to check all of the related source files to accurately determine the latest modification date.

* Log performance of clearing files by extension

* Refactor components to provide file extension list

Parent functionality such as determining the most recent file modified, getting the last modified date, and checking for alternate source files is better done by having the class provide the list of file extensions that might be used by the class, and having single external functions perform these tasks. (Avoids some redundant code.)

* Add multi-file support to file properties hash

This will allow us to more accurately detect changes in non-primary source files. (Such as a change in a shared image.)

* Simplify component class

Removing three functions that are not uniquely specific to each component type and are handled by external functions now that we have exposed the source code file extensions.

* Remove Upgrade function on IDbComponent

We don't need to try to support mixes of various versions of export files. Use the same version of VCS to build a fresh copy of the project, then export with the latest VCS to upgrade source file.

* Move logic to clear legacy files

Moving this to the Export function.

* Rework processing of conflicts & orphaned objects

Refactored the detection and processing of source conflicts and orphaned source files & database objects to better handle source file types that involve multiple files. joyfullservice#473 joyfullservice#471 joyfullservice#472

* Compare source contents of related files

When checking for changes in source files, we need to check all the related source files for each component, not just the primary source file.

* Adapt export comparison to support multiple files

Further changes to compare all related source files.

* Update testing database

Updated to latest version of VCS.

* Include class instancing in code module hash

Class modules have an instancing property that needs to be checked for changes along with the VBA code to ensure that the database object matches the last export. A module will now be flagged as changed if the instancing property is changed.

* Trap any XML import errors

* Add alternate XML format function for big files

Large XML files may cause memory errors with XSLT operations. Adding an alternate approach to simply replace the leading tabs with two spaces. This should allow the add-in to export even extremely large table data files as formatted XML. joyfullservice#389, fixes joyfullservice#474

* Remove format version from custom groups

Any recent version of export file should be using the new format, and we don't need to carry this conversion forward into v4.

* Fix issue with orphaned file detection

Need to pass a dictionary, not a collection to the CompareToIndex function.

* Move testing code to testing module

* Require hash on index update

Any update to the index is now required to provide a hash to match the source file. joyfullservice#472

* Save schema filter rules as collection

Saving each filter line as a single element in a collection makes a much more readable section in the options.json file, especially when the rules become more complex. Previously this was saved as a combined string value which makes it harder to read changes to individual rules.

* Support AfterBuild hooks in add-in project

Made a tweak so we can use the RunAfterBuild hook in the add-in project to verify (load) the resources immediately after building from source. This will help prevent accidentally deploying the add-in without the needed resource records, as happened in joyfullservice#477.

* Rename as new files

* Restore original files

* Split forms from VBA code in add-in project

Going forward, this will allow us to edit the VBA code without affecting the layout definition files in forms.

* Add region support type double (joyfullservice#481)

Co-authored-by: Festiis <festim.nuredini@axami.se>

* Add some additional comments to code changes

Clarifies why we are using the Val() function when parsing ISO dates.

* Add initial support for CommandBar popup menus

This is still a work in progress, but has the basic functionality of exporting and importing custom CommandBars.

* Add error handling to linked table refresh

This could be related to a recent Access bug, but it is helpful to trap the error if it occurs. joyfullservice#484

---------

Co-authored-by: Hecon5 <54177882+hecon5@users.noreply.github.com>
Co-authored-by: joyfullservice <joyfullservice@users.noreply.github.com>
Co-authored-by: bclothier <bgclothier@gmail.com>
Co-authored-by: Tanarri <Tanarri@users.noreply.github.com>
Co-authored-by: Festim Nuredini <44016065+Festiis@users.noreply.github.com>
Co-authored-by: Festiis <festim.nuredini@axami.se>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants