feat: add ruby support #24

Merged
merged 7 commits into from
Feb 13, 2024

Conversation

yenif
Contributor

@yenif yenif commented Jan 15, 2024

Hey! I took a swing at adding Ruby support.

The first pass just grabs method names; I'm thinking it might work better with fully qualified method names.

@fynnfluegge
Owner

Nice one, thanks a lot!

I was testing it with the following Ruby code snippet:

# A class for detecting the programming language based on file extension.
class LanguageDetection
  # Enumeration representing various programming languages.
  module Language
    PYTHON = :python
    JAVASCRIPT = :javascript
    TYPESCRIPT = :typescript
    JAVA = :java
    KOTLIN = :kotlin
    LUA = :lua
    UNKNOWN = :unknown
  end

  # Gets the corresponding programming language based on the given file extension.
  def self.get_programming_language(file_extension)
    language_mapping = {
      '.py' => Language::PYTHON,
      '.js' => Language::JAVASCRIPT,
      '.ts' => Language::TYPESCRIPT,
      '.java' => Language::JAVA,
      '.kt' => Language::KOTLIN,
      '.lua' => Language::LUA
    }

    language_mapping[file_extension] || Language::UNKNOWN
  end

  def self.get_file_extension(file_name)
    File.extname(file_name)
  end
end

but somehow the treesitter output is this 🤔

(program (class_declaration (ERROR (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier)
(identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier) (identifier)) name: (identifier) (ERROR) body:
(class_body (ERROR (character_literal)) (field_declaration type: (type_identifier) (ERROR) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier))
declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier))
declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier)) declarator: (variable_declarator name: (identifier)) (ERROR (character_literal) name: (identifier))
declarator: (variable_declarator name: (identifier)) (MISSING ";")))) (expression_statement (binary_expression left: (array_access array: (identifier) index: (identifier)) right: (method_reference
(identifier) (identifier))) (MISSING ";")) (local_variable_declaration type: (type_identifier) declarator: (variable_declarator name: (identifier)) (MISSING ";")) (expression_statement
(method_invocation object: (method_invocation object: (identifier) name: (identifier) arguments: (argument_list (identifier))) (ERROR (identifier)) name: (identifier) arguments: (argument_list (identifier)))
(MISSING ";")) (local_variable_declaration type: (type_identifier) declarator: (variable_declarator name: (identifier)) (MISSING ";")))

This isn't related to our changes; it comes from the Ruby tree-sitter parser. Have you been able to run it on any Ruby code?

You can log the tree-sitter syntax tree with print(self.tree.root_node.sexp()), btw.

@yenif
Contributor Author

yenif commented Jan 16, 2024

Yeah, I was getting similar results testing against this file:

#! env ruby

require 'ostruct'

# method line comment
def global_method
  puts "global_method"
end

=begin
class block comment
=end
class TopClass < OpenStruct
    # multiline
    # hash comment
    # on InnerModule
    module InnerModule
        # comment on module method
        def module_method
            puts "module_method"
        end
    end

    # comment on inner class
    class InnerClass
        def inner_class_instance_method
            puts "inner_class_instance_method"
        end

        # comment on inner class method
        def self.inner_class_class_method
            puts "inner_class_class_method"
        end

        def self.inner_class_class_method_with_args(arg1, arg2)
            puts "inner_class_class_method_with_args: #{arg1}, #{arg2}"
        end

        class << self
            # comment on inner eigen class method
            def inner_eigen_class_class_method_with_args2(arg1, arg2)
                puts "inner_eigen_class_class_method_with_args2: #{arg1}, #{arg2}"
            end
        end
    end
end

It runs, but it's definitely not optimal. I'll hopefully have some more time to poke at it this week.

@fynnfluegge
Owner

Hey @yenif, any luck? Let me know if you need some help 🙌

@yenif
Contributor Author

yenif commented Jan 24, 2024

I'll finally have some time today :-) but honestly, the issue list over on the Ruby tree-sitter grammar is not giving me confidence that this will be doable.

@yenif
Contributor Author

yenif commented Jan 25, 2024

Looks like the error nodes are just syntax that tree-sitter doesn't support. The rest of the syntax still seems to be parsed correctly, and I think it picks up all of the methods.

I think multiline comments are currently not being handled; we need to iterate over node.prev_named_sibling.
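
A minimal sketch of that sibling walk, assuming a stand-in FakeNode class rather than a real tree-sitter node. Only the attribute names type, text, and prev_named_sibling mirror the tree-sitter Python bindings; FakeNode and collect_leading_comments are hypothetical names for illustration, not code from this PR:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in for a tree-sitter node (hypothetical, for illustration only)."""
    type: str
    text: str
    prev_named_sibling: Optional["FakeNode"] = None


def collect_leading_comments(node: FakeNode) -> str:
    """Join every consecutive comment node directly preceding `node`."""
    lines = []
    sibling = node.prev_named_sibling
    # Walk backwards until something other than a comment is hit.
    while sibling is not None and sibling.type == "comment":
        lines.append(sibling.text)
        sibling = sibling.prev_named_sibling
    # Siblings were visited bottom-up, so restore source order.
    return "\n".join(reversed(lines))


# Hypothetical sibling chain: constant <- comment <- comment <- method
constant = FakeNode("constant", "InnerModule")
first = FakeNode("comment", "# multiline", prev_named_sibling=constant)
second = FakeNode("comment", "# hash comment", prev_named_sibling=first)
method = FakeNode("method", "def module_method", prev_named_sibling=second)

print(collect_leading_comments(method))
```

With real nodes the text attribute holds bytes, so it would likely need decoding before joining.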

Sufficiently verbose method comments would probably work well, as you mention in the docs. This seems like an opportunity to expand any existing comments with further hints, like the full method path and maybe a list of references to the method's dependencies and dependents.
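
One way the full method path could be derived is by walking up the parent chain and collecting class and module names. This is only a sketch under assumptions: FakeNode and qualified_name are hypothetical, the name attribute is a simplification (a real node would need its name read from a child node), and "#" is used for every method even though Ruby docs usually write singleton methods with ".":

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in for a tree-sitter node (hypothetical, for illustration only)."""
    type: str
    name: str = ""
    parent: Optional["FakeNode"] = None


def qualified_name(method: FakeNode) -> str:
    """Build a path such as Outer::Inner#method by walking up the parents."""
    scopes = []
    node = method.parent
    while node is not None:
        # Only class and module nodes contribute to the path;
        # wrapper nodes such as body_statement are skipped.
        if node.type in ("class", "module"):
            scopes.append(node.name)
        node = node.parent
    prefix = "::".join(reversed(scopes))
    return f"{prefix}#{method.name}" if prefix else method.name


# Hypothetical tree mirroring the test file above.
top = FakeNode("class", "TopClass")
top_body = FakeNode("body_statement", parent=top)
inner = FakeNode("class", "InnerClass", parent=top_body)
inner_body = FakeNode("body_statement", parent=inner)
meth = FakeNode("method", "inner_class_instance_method", parent=inner_body)

print(qualified_name(meth))
```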

I'm at the limit for tonight, possibly tomorrow too. You're definitely welcome to make any edits or drive it home if it's on your list :-)

@fynnfluegge
Owner

@yenif Hey, sorry for the late response, let's try to ship this! I think we can iterate over multiline comments similar to Rust here: https://github.com/fynnfluegge/codeqai/blob/main/codeqai/treesitter/treesitter_rs.py

I will test this soon 🙂

@yenif
Contributor Author

yenif commented Feb 7, 2024

No worries, I've been trying to get back to this, but a two-year-old and a day job aren't leaving much extra time :-)

I ran into a further issue with tree-sitter: it treats a comment on the first line of a method body as a sibling of the method body. That would match the convention in Python (and other languages) of putting method comments on the first line of the method or module, but Ruby convention doesn't do this, and a comment is almost always associated with the code that follows it. So a comment on a method at the beginning of a module

    module InnerModule

        # multiline
        # comment on module method
        def module_method
            # first line comment
            puts "module_method"
        end
    end

results in something like

module
  module
  constant InnerModule
  comment # multiline
  comment # comment on module method
  body_statement
    method
      def
      identifier module_method
      comment # first line comment
      body_statement
        ...
      end
  end

Of course we can traverse up, check the node types, and then walk the siblings, but that creates a bunch more edge cases.
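
A rough sketch of that "traverse up, then check siblings" fallback, assuming the tree shape shown above (comments are siblings of the body_statement that holds the method). FakeNode, comments_before, and method_comments are hypothetical names, and the node is a stand-in rather than a real tree-sitter node:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeNode:
    """Stand-in for a tree-sitter node (hypothetical, for illustration only)."""
    type: str
    text: str = ""
    parent: Optional["FakeNode"] = None
    prev_named_sibling: Optional["FakeNode"] = None


def comments_before(node: FakeNode) -> List[str]:
    """Collect consecutive comment siblings directly before `node`, in source order."""
    lines = []
    sibling = node.prev_named_sibling
    while sibling is not None and sibling.type == "comment":
        lines.append(sibling.text)
        sibling = sibling.prev_named_sibling
    return list(reversed(lines))


def method_comments(method: FakeNode) -> List[str]:
    """Look beside the method first, then beside its enclosing body_statement."""
    found = comments_before(method)
    parent = method.parent
    # Fallback only for the first method in a body: its comments hang one level up.
    if (not found and method.prev_named_sibling is None
            and parent is not None and parent.type == "body_statement"):
        found = comments_before(parent)
    return found


# Hypothetical tree mirroring the dump above.
module = FakeNode("module")
constant = FakeNode("constant", "InnerModule", parent=module)
c1 = FakeNode("comment", "# multiline", parent=module, prev_named_sibling=constant)
c2 = FakeNode("comment", "# comment on module method", parent=module, prev_named_sibling=c1)
body = FakeNode("body_statement", parent=module, prev_named_sibling=c2)
method = FakeNode("method", "module_method", parent=body)

print(method_comments(method))
```

One of the edge cases this does not resolve: a comment meant to describe the enclosing module's contents in general would be attributed to its first method.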

@fynnfluegge
Owner

@yenif Is this ready to merge, what would you say? :)

@yenif
Contributor Author

yenif commented Feb 13, 2024

Seems like it matches existing functionality, so yeah good to merge if you are happy with it!

@fynnfluegge
Owner

Alright, thanks for your effort! ❤️

@fynnfluegge fynnfluegge changed the title from "ruby support" to "feat: add ruby support" Feb 13, 2024
@fynnfluegge fynnfluegge merged commit 1db7ca2 into fynnfluegge:main Feb 13, 2024
@yenif yenif deleted the ruby_support branch March 21, 2024 04:07
2 participants