Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SQL Formatter #426

Closed
joyfullservice opened this issue Aug 15, 2023 · 6 comments
Closed

Implement SQL Formatter #426

joyfullservice opened this issue Aug 15, 2023 · 6 comments
Assignees
Labels
enhancement pending resolved Possibly resolved, needs testing or confirmation
Milestone

Comments

@joyfullservice
Copy link
Owner

One of the challenges with managing queries in Microsoft Access is that they don't support comments or formatting. The .SQL property returns a large blob of sql code.

SELECT MSysNavPaneGroupCategories.Name AS CategoryName, MSysNavPaneGroupCategories.Position AS CategoryPosition, 
MSysNavPaneGroupCategories.Flags AS CategoryFlags, MSysNavPaneGroups.Name AS GroupName, MSysNavPaneGroups.Flags AS 
GroupFlags, MSysNavPaneGroups.Position AS GroupPosition, MSysObjects.Type AS ObjectType, MSysObjects.Name AS 
ObjectName, MSysNavPaneGroupToObjects.Flags AS ObjectFlags, MSysNavPaneGroupToObjects.Icon AS ObjectIcon, 
MSysNavPaneGroupToObjects.Position AS ObjectPosition, MSysNavPaneGroupToObjects.Name AS NameInGroup, 
MSysNavPaneGroupCategories.Id AS CategoryID, MSysNavPaneGroups.Id AS GroupID, MSysNavPaneGroupToObjects.Id AS LinkID
FROM MSysNavPaneGroupCategories INNER JOIN (MSysNavPaneGroups LEFT JOIN (MSysNavPaneGroupToObjects LEFT JOIN 
MSysObjects ON MSysNavPaneGroupToObjects.ObjectID = MSysObjects.Id) ON MSysNavPaneGroups.Id = 
MSysNavPaneGroupToObjects.GroupID) ON MSysNavPaneGroupCategories.Id = MSysNavPaneGroups.GroupCategoryID
WHERE (((MSysNavPaneGroups.Name) Is Not Null) AND ((MSysNavPaneGroupCategories.Type)=4))
ORDER BY MSysNavPaneGroupCategories.Name, MSysNavPaneGroups.Name, MSysObjects.Type, MSysObjects.Name;

I can compare diffs of changes and get a general idea of what has changed, but this would be a whole lot easier to read if it was formatted something like this:

SELECT 
  MSysNavPaneGroupCategories.Name AS CategoryName,
  MSysNavPaneGroupCategories.Position AS CategoryPosition,
  MSysNavPaneGroupCategories.Flags AS CategoryFlags,
  MSysNavPaneGroups.Name AS GroupName,
  MSysNavPaneGroups.Flags AS GroupFlags,
  MSysNavPaneGroups.Position AS GroupPosition,
  MSysObjects.Type AS ObjectType,
  MSysObjects.Name AS ObjectName,
  MSysNavPaneGroupToObjects.Flags AS ObjectFlags,
  MSysNavPaneGroupToObjects.Icon AS ObjectIcon,
  MSysNavPaneGroupToObjects.Position AS ObjectPosition,
  MSysNavPaneGroupToObjects.Name AS NameInGroup,
  MSysNavPaneGroupCategories.Id AS CategoryID,
  MSysNavPaneGroups.Id AS GroupID,
  MSysNavPaneGroupToObjects.Id AS LinkID
FROM MSysNavPaneGroupCategories
INNER JOIN (
  MSysNavPaneGroups LEFT JOIN (
    MSysNavPaneGroupToObjects LEFT JOIN MSysObjects ON MSysNavPaneGroupToObjects.ObjectID = MSysObjects.Id
    ) ON MSysNavPaneGroups.Id = MSysNavPaneGroupToObjects.GroupID
  ) ON MSysNavPaneGroupCategories.Id = MSysNavPaneGroups.GroupCategoryID
WHERE (
    ((MSysNavPaneGroups.Name) IS NOT NULL)
    AND ((MSysNavPaneGroupCategories.Type) = 4)
    )
ORDER BY 
  MSysNavPaneGroupCategories.Name,
  MSysNavPaneGroups.Name,
  MSysObjects.Type,
  MSysObjects.Name;

I don't feel like we need to create a massive 10K line project to support every SQL dialect and every formatting option out there. But I do feel like I would benefit from a basic implementation of a SQL formatter that would make queries much more readable at the source level. I started digging into this a couple years ago, but didn't get very far.

Last week I started researching this again, and found what seems to be a happy medium of pretty wide dialect support and simple enough to port into a single VBA class. I am using the php Doctrine project's sql-formatter which was forked from another project with very similar goals.

I currently have this about 80% implemented in VBA, and I am about ready to go ahead and move it into this project as I continue to work through the debugging and testing process. (I won't turn it on until I am pretty confident in the output.) I just figured I would create an issue for it where we can discuss any implementation details or other considerations at the outset.

My plan is to add an option to format the SQL output in the *.sql files. I am hoping the performance impact will be pretty minimal, but the readability improvement in the source files will be a definite help for some of my more complex projects.

@mwolfe02
Copy link

I'm in complete agreement. I implemented a very simple version of this in my own version control script years ago. I wrote about it here: Exporting Queries for Version Control. Here's the code (it's actually VBScript):

'Txt: the .SQL property of an Access QueryDef object
Public Function AddLineBreaks(Txt) 'As String
    Dim s
    s = Txt
    s = Replace(s, " INNER JOIN ",  vbCrLf & " INNER JOIN ")
    s = Replace(s, " LEFT JOIN ",  vbCrLf & " LEFT JOIN ")
    s = Replace(s, " ON ", vbCrLf & " ON ")
    s = Replace(s, " AND ", vbCrLf & "  AND ")
    s = Replace(s, " OR ", vbCrLf & " OR ")
    s = Replace(s, ", ", "," & vbCrLf & " ")
    AddLineBreaks = s
End Function

It's very much an 80/20 solution. I think a more polished SQL formatter is called for with this project, but I wanted to validate and second your concerns. By far the most important thing (in my mind) is to get the SQL broken up into separate lines as much as possible, as that's what helps the most when viewing diffs.

joyfullservice added a commit that referenced this issue Aug 15, 2023
This (work in progress) is a VBA port of https://github.com/doctrine/sql-formatter intended to format SQL queries with better indenting and layout to improve readability in version control. #426
@hecon5
Copy link
Contributor

hecon5 commented Aug 16, 2023

one thing to note: for SQL that I've found measurably improve the portability and troubleshooting (and enhance alteration speed), is to put the comma at the start of the new column, or row.

We also indent and start all JOINs on their own row, so it's immediately clear this is a JOIN and what kind of JOIN; in the past, we've had codeblobs which were maaaaany lines long, and JOIN on JOIN which became ... difficult to troubleshoot and worse: add / modify later when it became time to do so. I realize indent tracking adds complexity, so we decided at a minimum having a new line for each JOIN, field, or condition made for much easier to modify code to troubleshoot.

An example of what we do, and it's made code a bit longer, but clarity upped by many times.

Eg:

SELECT 
  MSysNavPaneGroupCategories.Name AS CategoryName
  , MSysNavPaneGroupCategories.Position AS CategoryPosition
  , MSysNavPaneGroupCategories.Flags AS CategoryFlags
  , MSysNavPaneGroups.Name AS GroupName
  , MSysNavPaneGroups.Flags AS GroupFlags
  , MSysNavPaneGroups.Position AS GroupPosition
  , MSysObjects.Type AS ObjectType
  , MSysObjects.Name AS ObjectName
  , MSysNavPaneGroupToObjects.Flags AS ObjectFlags
  , MSysNavPaneGroupToObjects.Icon AS ObjectIcon
  , MSysNavPaneGroupToObjects.Position AS ObjectPosition
  , MSysNavPaneGroupToObjects.Name AS NameInGroup
  , MSysNavPaneGroupCategories.Id AS CategoryID
  , MSysNavPaneGroups.Id AS GroupID
  , MSysNavPaneGroupToObjects.Id AS LinkID
FROM MSysNavPaneGroupCategories
INNER JOIN (
  MSysNavPaneGroups 
    LEFT JOIN (
            MSysNavPaneGroupToObjects 
            LEFT JOIN MSysObjects ON MSysNavPaneGroupToObjects.ObjectID = MSysObjects.Id
            ) ON MSysNavPaneGroups.Id = MSysNavPaneGroupToObjects.GroupID
    ) ON MSysNavPaneGroupCategories.Id = MSysNavPaneGroups.GroupCategoryID
WHERE (
    ((MSysNavPaneGroups.Name) IS NOT NULL)
    AND ((MSysNavPaneGroupCategories.Type) = 4)
    )
ORDER BY 
  MSysNavPaneGroupCategories.Name
  , MSysNavPaneGroups.Name
  , MSysObjects.Type
  , MSysObjects.Name;

What this does, is while we're troubleshooting, say we want to remove one condition for a moment. We only need to comment out that one item, and don't need to comma track, except the very first line. Makes for a much easier to deal with cases. Also yes I know this code probably doesn't work, it's illustrative in nature.

Example comments:

SELECT 
  MSysNavPaneGroupCategories.Name AS CategoryName
  , MSysNavPaneGroupCategories.Position AS CategoryPosition
  , MSysNavPaneGroupCategories.Flags AS CategoryFlags
  , MSysNavPaneGroups.Name AS GroupName
--  , MSysNavPaneGroups.Flags AS GroupFlags -- We don't want this yet...example.
  , MSysNavPaneGroups.Position AS GroupPosition
  , MSysObjects.Type AS ObjectType
  , MSysObjects.Name AS ObjectName
  , MSysNavPaneGroupToObjects.Flags AS ObjectFlags
  , MSysNavPaneGroupToObjects.Icon AS ObjectIcon
  , MSysNavPaneGroupToObjects.Position AS ObjectPosition
  , MSysNavPaneGroupToObjects.Name AS NameInGroup
  , MSysNavPaneGroupCategories.Id AS CategoryID
  , MSysNavPaneGroups.Id AS GroupID
  , MSysNavPaneGroupToObjects.Id AS LinkID
FROM MSysNavPaneGroupCategories
INNER JOIN (
  MSysNavPaneGroups 
    LEFT JOIN (
            MSysNavPaneGroupToObjects 
            -- LEFT JOIN MSysObjects ON MSysNavPaneGroupToObjects.ObjectID = MSysObjects.Id
            ) ON MSysNavPaneGroups.Id = MSysNavPaneGroupToObjects.GroupID
    ) ON MSysNavPaneGroupCategories.Id = MSysNavPaneGroups.GroupCategoryID
WHERE (
    ((MSysNavPaneGroups.Name) IS NOT NULL)
    AND ((MSysNavPaneGroupCategories.Type) = 4)
    )
ORDER BY 
  MSysNavPaneGroupCategories.Name
  , MSysNavPaneGroups.Name
--  , MSysObjects.Type -- we don't know if this is causing issues, so leaving it out for now.
  , MSysObjects.Name;

I know Access's SQL doesn't allow comments, (but it also seems to sometimes, namely pass-through queries), but this is an example of why moving the comma and indent control makes a difference this way (for us).

joyfullservice added a commit that referenced this issue Aug 17, 2023
Worked through several things yesterday on the SQL Formatter class. (Still a work in progress) #426
joyfullservice added a commit that referenced this issue Aug 17, 2023
I am working to replace slower RegEx functions with more performant alternatives. (We will still probably use RegEx for some of the more complex work, but we can improve the performance quite a bit by optimizing some of the simple ones like finding whitespace or boundary characters.) #426
joyfullservice added a commit that referenced this issue Aug 18, 2023
The overhead on performance timing is very low, but adding an option to disable it when an instance of the class is used internally for testing SQL formatting. #426
joyfullservice added a commit that referenced this issue Aug 18, 2023
Worked through the implementation bugs using some test queries, and ready to integrate this into the add-in! #426
joyfullservice added a commit that referenced this issue Aug 18, 2023
Formatting of queries is now available as an option in the add-in. #426
@joyfullservice joyfullservice added the pending resolved Possibly resolved, needs testing or confirmation label Aug 18, 2023
@joyfullservice joyfullservice added this to the Release 4.0.0 milestone Aug 18, 2023
@joyfullservice
Copy link
Owner Author

I finished building out the formatter today, and I am very happy with the performance. After the initial port, I was able to optimize the logic to use faster AscW() comparisons instead of regular expressions to drastically reduce the number of RegEx calls. The overall logic follows the doctrine/sql-formater project pretty closely, but I fine-tuned the search and comparison functions in the tokenizer to utilize faster performing VBA functions. (For example, if we can determine that the first character is not a number, we don't need to make an expensive RegEx call to return several possible number formats.)

I have also added some additional functions to verify the tokenization and output layout so we can confirm that everything is working as intended if we make adjustments to class. At present it produces identical output to the sample query on the original project's home page.

Here is what our internal Access query looks like after formatting, right out of the box:

SELECT
  MSysNavPaneGroupCategories.Name AS CategoryName,
  MSysNavPaneGroupCategories.Position AS CategoryPosition,
  MSysNavPaneGroupCategories.Flags AS CategoryFlags,
  MSysNavPaneGroups.Name AS GroupName,
  MSysNavPaneGroups.Flags AS GroupFlags,
  MSysNavPaneGroups.Position AS GroupPosition,
  MSysObjects.Type AS ObjectType,
  MSysObjects.Name AS ObjectName,
  MSysNavPaneGroupToObjects.Flags AS ObjectFlags,
  MSysNavPaneGroupToObjects.Icon AS ObjectIcon,
  MSysNavPaneGroupToObjects.Position AS ObjectPosition,
  MSysNavPaneGroupToObjects.Name AS NameInGroup,
  MSysNavPaneGroupCategories.Id AS CategoryID,
  MSysNavPaneGroups.Id AS GroupID,
  MSysNavPaneGroupToObjects.Id AS LinkID
FROM
  (
    MSysNavPaneGroupCategories
    INNER JOIN MSysNavPaneGroups ON MSysNavPaneGroupCategories.Id = MSysNavPaneGroups.GroupCategoryID
  )
  LEFT JOIN (
    MSysNavPaneGroupToObjects
    LEFT JOIN MSysObjects ON MSysNavPaneGroupToObjects.ObjectID = MSysObjects.Id
  ) ON MSysNavPaneGroups.Id = MSysNavPaneGroupToObjects.GroupID
WHERE
  (
    (
      (MSysNavPaneGroups.Name) Is Not Null
    )
    AND (
      (
        MSysNavPaneGroupCategories.Type
      )= 4
    )
  )
ORDER BY
  MSysNavPaneGroupCategories.Name,
  MSysNavPaneGroups.Name,
  MSysObjects.Type,
  MSysObjects.Name;

If there was a way we could run some test queries against the original project and compare the output, that would help ensure that we are coming up with the same results. Someone probably has a Web implementation of sql-formatter out there, but I wasn't finding anything right off in my initial searches... 🤔

This should be ready to build from the dev branch and test out in some of your projects, if you like... 😄 I have the SQL formatting turned on by default, but you can always turn it back off if you run into a problem. Let me know what you think!

@AlexHedley
Copy link

AlexHedley commented Sep 10, 2023

Just for ref I’ve used VB - SQL 'Select' statement formatter/checker in case if it’s of any use.

@AlexHedley
Copy link

@joyfullservice
Copy link
Owner Author

This has been implemented, and seems to be working fine. Closing this issue as completed, and we can open new issues if we encounter any bugs going forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement pending resolved Possibly resolved, needs testing or confirmation
Projects
None yet
Development

No branches or pull requests

4 participants