ARROW-7819: [C++][Gandiva] Add DumpIR to Filter/Projector object #6417

@fsaintjacques (Contributor) commented Feb 13, 2020

This patch exposes the generated IR through a method on the Filter and
Projector objects so that it can be inspected. It is a breaking change for
the internal method FinalizeModule, which no longer takes the dump_ir and
optimize flags; it now receives optimize from Configuration.

  • Refactored Engine: removed dead code, consolidated initialization into a
    single function, and simplified LLVMGenerator.
  • Dumping IR no longer writes to stdout; the DumpIR method returns it as a
    string instead.
  • Refactored Types, fixing some incorrect method types.
  • Added the optimize field to the Configuration class.
  • Simplified some unit tests.
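
The shape of the change listed above can be sketched in a few lines: a configuration object carries the optimize flag, finalization takes no flags, and the dump is returned as a string instead of being printed. This is only an illustration under stated assumptions. `SketchConfiguration`, `SketchEngine`, and their members are hypothetical names, not the Gandiva API, and a placeholder text stands in for the LLVM module.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the Configuration class: it owns the optimize
// flag instead of having it passed to FinalizeModule.
struct SketchConfiguration {
  bool optimize = true;
};

// Hypothetical stand-in for the engine. A real engine would hold an LLVM
// module; here two flags are enough to show the control flow.
class SketchEngine {
 public:
  explicit SketchEngine(const SketchConfiguration& conf)
      : optimize_(conf.optimize) {}

  // No dump_ir or optimize parameters: the flag was captured from the
  // configuration at construction time.
  void FinalizeModule() {
    optimized_ = optimize_;
    finalized_ = true;
  }

  // Returns a textual dump to the caller rather than printing it, so the
  // caller decides whether to log, compare, or display it.
  std::string DumpIR() const {
    std::ostringstream out;
    out << "; placeholder dump, not real LLVM IR\n";
    out << "; finalized = " << (finalized_ ? "true" : "false") << "\n";
    out << "; optimized = " << (optimized_ ? "true" : "false") << "\n";
    return out.str();
  }

 private:
  bool optimize_;
  bool optimized_ = false;
  bool finalized_ = false;
};
```

Returning the string keeps the library silent by default and makes the dump easy to assert on in unit tests.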

More importantly, the generated IR can now be inspected dynamically:

>>> filter = gandiva.make_filter(table.schema, condition)                                                                                    
>>> print(filter.ir)
; ModuleID = 'codegen'                                                
source_filename = "codegen"                    
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"           
target triple = "x86_64-unknown-linux-gnu"                 
                                                                      
@llvm.global_ctors = appending global [0 x { i32, void ()*, i8* }] zeroinitializer
@_ZN5arrow7BitUtilL8kBitmaskE = internal unnamed_addr constant [8 x i8] c"\01\02\04\08\10 @\80", align 1
                                                                                                                                             
; Function Attrs: norecurse nounwind                      
define i32 @expr_0_0(i64* nocapture readonly %args, i64* nocapture readonly %arg_addr_offsets, i64* nocapture readnone %local_bitmaps, i16* nocapture readnone %selection_vector, i64 %context_ptr, i64 %nrecords) local_unnamed_addr #0 {
entry:                                                                
  %0 = bitcast i64* %args to i8**                                                                                                            
  %cond_mem56 = load i8*, i8** %0, align 8                            
  %1 = getelementptr i64, i64* %arg_addr_offsets, i64 3            
  %2 = load i64, i64* %1, align 8                                                                                                            
  %a_mem_addr = getelementptr i64, i64* %args, i64 3                  
  %3 = bitcast i64* %a_mem_addr to double**            
  %a_mem7 = load double*, double** %3, align 8                        
  %scevgep = getelementptr double, double* %a_mem7, i64 %2   
  br label %loop                                            
                                                                                                                                             
loop:                                             ; preds = %loop, %entry
  %loop_var = phi i64 [ 0, %entry ], [ %"loop_var+1", %loop ]                                                                                
  %scevgep8 = getelementptr double, double* %scevgep, i64 %loop_var
  %a = load double, double* %scevgep8, align 8      
  %4 = fcmp olt double %a, 1.000000e+03                   
  %5 = sext i1 %4 to i8                             
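
For readers less familiar with LLVM IR, the visible part of the listing above can be read as roughly the following C++. This is a hand-written approximation, not generated code: `LessThanThousand` and its parameters are hypothetical names, and it covers only the load and comparison shown in the listing. What the generated function does with each comparison result is cut off in the excerpt and is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Approximates the visible loop: read nrecords doubles starting at
// a + offset and test each one against 1000.0.
std::vector<bool> LessThanThousand(const double* a, int64_t offset,
                                   int64_t nrecords) {
  const double* base = a + offset;  // corresponds to %scevgep
  std::vector<bool> result;
  result.reserve(static_cast<std::size_t>(nrecords));
  for (int64_t i = 0; i < nrecords; ++i) {  // corresponds to %loop_var
    // fcmp olt is an ordered comparison: it is false when the value is NaN,
    // which matches the behaviour of < on doubles in C++.
    result.push_back(base[i] < 1000.0);
  }
  return result;
}
```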

@fsaintjacques fsaintjacques force-pushed the ARROW-7819-gandiva-dump-ir-tool branch from 6ab97b0 to 2dc7955 Compare February 13, 2020 17:13
@pitrou (Member) left a comment

LGTM in general, a few comments.

@@ -25,6 +25,7 @@
#include <arrow/builder.h>
#include <arrow/pretty_print.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>

Member

Can't we use type_fwd.h instead?

Contributor Author

I tried removing the other includes and minimizing them, but it cascaded into too many changes.

#include "arrow/util/macros.h"

#include "gandiva/arrow.h"

Member

Hmm... can we keep the includes as minimal as possible? Arrow is already slow enough to build.

Contributor Author

clangd has an option that automatically adds headers when auto-completing. I have disabled it.

This patch exposes the generated IR through a method on the Filter and
Projector objects so that it can be inspected. It is a breaking change for
the internal method `FinalizeModule`, which no longer takes the dump_ir and
optimize flags; it now receives `debug` from Configuration.

- Refactored Engine: removed dead code, consolidated initialization into a
  single function, and simplified LLVMGenerator.
- Dumping IR no longer writes to stdout; the `DumpIR` method returns it as a
  string instead.
- Refactored Types, fixing some incorrect method types.
- Added the optimize field to the `Configuration` class.
- Simplified some unit tests.
@fsaintjacques fsaintjacques force-pushed the ARROW-7819-gandiva-dump-ir-tool branch from 2dc7955 to 0bcebc8 Compare February 13, 2020 21:51
@projjal (Contributor) left a comment

Looks good to me.

- llvm::Function* fn = module->getFunction(iter.pc_name());
- EXPECT_NE(fn, nullptr) << "function " << iter.pc_name()
-                        << " missing in precompiled module\n";
+ EXPECT_NE(module->getFunction(iter.pc_name()), nullptr);

Contributor

Having the name in stderr is helpful for debugging.

Contributor Author

I recommend setting this in your .gdbinit:

set environment GTEST_BREAK_ON_FAILURE=1

Then, when you run the failing test under gdb, it will break at the first failure.

Member

He might mean that the problem would be more evident from looking at the test log files :) I think it's OK for now; if it becomes an issue, we can improve it.

Contributor

Yes, Wes, I meant that it is easier to get the error from log files or Travis output. But we'll only hit this when adding new functions, so it's obvious anyway.


5 participants