Switch LLVM codegen of Ptr{T} to an actual pointer type. #53687

Merged Mar 21, 2024 (13 commits)

maleadt (Member) commented Mar 10, 2024

This PR switches our code generation for Ptr{T} from i64 to an actual LLVM pointer type (ptr when using opaque pointers, an untyped i8* otherwise). The main motivation is to simplify llvmcall usage by doing away with the inttoptr/ptrtoint conversions, and to make it possible to use ccall directly to call intrinsics with pointer arguments (where we currently always need llvmcall to convert to a pointer).

Changing codegen like this is a breaking change for llvmcall users, but we don't promise any stability there. Also, with the switch to LLVM 17 where typed pointers have been removed, all llvmcall snippets will have to be updated anyway, so this seems like a good time to make that change.

Before:

julia> @code_llvm pointer([])
define i64 @julia_pointer_1542(ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
; ┌ @ pointer.jl:65 within `cconvert`
   %0 = load ptr, ptr %"x::Array", align 8
; └
; ┌ @ pointer.jl:90 within `unsafe_convert`
; │┌ @ pointer.jl:30 within `convert`
    %bitcast_coercion = ptrtoint ptr %0 to i64
    ret i64 %bitcast_coercion
; └└
}

After:

julia> @code_llvm pointer([])
define ptr @julia_pointer_3880(ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
; ┌ @ pointer.jl:65 within `cconvert`
   %0 = load ptr, ptr %"x::Array", align 8
; └
; ┌ @ pointer.jl:90 within `unsafe_convert`
; │┌ @ pointer.jl:30 within `convert`
    ret ptr %0
; └└
}

This also simplifies "real code", e.g., when ccall converts an Array to a pointer. I don't expect that to affect performance though.
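
To check which form a given build produces, the IR can be captured as a string and inspected. This is only a minimal sketch: it assumes the `code_llvm` method from InteractiveUtils that writes to an `IO`, and the variable names are illustrative.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Capture the optimized IR for pointer(::Vector{Any}) as a plain string.
ir = sprint() do io
    code_llvm(io, pointer, Tuple{Vector{Any}}; debuginfo=:none)
end

# The `define` line carries the LLVM return type of the compiled function.
signature = first(filter(startswith("define"), split(ir, '\n')))

# A return type of `ptr` shows up before the function name; arguments come after `(`.
returns_ptr = occursin(r"^define [^(]*\bptr @", signature)
has_ptrtoint = occursin("ptrtoint", ir)

println(signature)
println("returns ptr: ", returns_ptr, ", contains ptrtoint: ", has_ptrtoint)
```

On a build without this change I would expect the signature to return i64 and the body to contain a ptrtoint, as in the "Before" listing.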

There are a couple of other patterns that could be updated, e.g. the type argument to the GCAllocBytes intrinsic is currently still a T_size.

@maleadt maleadt requested a review from vtjnash March 10, 2024 14:34
Comment on lines +628 to +631
if (isa<Instruction>(vx) && !vx->hasName())
    // CreatePtrToInt may undo an IntToPtr
    setName(ctx.emission_context, vx, "bitcast_coercion");
maleadt (Member, Author) commented

Apparently the LLVM IRBuilder is smart enough to collapse ptrtoint+inttoptr etc into a reference to the original value. If that happens to be an argument, our setName aborts. Given that we can't predict whether that would happen, shouldn't we have setName bail out for non-Instructions/Constants, instead of aborting and having to check at the call site? We now have many of those already:

cgutils.cpp:            if (isa<Instruction>(src) && !src->hasName())
cgutils.cpp:            if (isa<Instruction>(dst) && !dst->hasName())
intrinsics.cpp:            if (isa<Instruction>(vx) && !vx->hasName())
intrinsics.cpp:            if (isa<Instruction>(vx) && !vx->hasName())
intrinsics.cpp:        if (isa<Instruction>(thePtr) && !thePtr->hasName())

@maleadt maleadt added the compiler:codegen label Mar 10, 2024
Seelengrab (Contributor) commented Mar 10, 2024

Very nice! If I'm not mistaken, this should also make things that use Ptr{T} in their API more friendly to unsupported platforms. I remember that I had some codegen/size issues on AVR because of the use of i64 instead of the platform-native i16!

@maleadt maleadt force-pushed the tb/ptr branch 2 times, most recently from 6f27310 to ac4227b March 11, 2024 12:31
}
case sub_ptr: {
    assert(nargs == 2);
    if (!jl_is_cpointer_type(argv[0].typ) || argv[1].typ != (jl_value_t*)jl_ulong_type)
Reviewer (Member) commented

This is adding constraints that weren't previously there, so this is potentially breaking (and since it moved from the generic emit_untyped_intrinsic cases to here, it will also need a custom tfunc).

maleadt (Member, Author) commented Mar 11, 2024

Yeah, this was on purpose, because currently the emit_untyped_intrinsic logic assumes that the types of the inputs are identical. Since I couldn't find any uses with non-UInt inputs in any public code, it seemed easier to move the pointer intrinsics out of the arithmetic ones (especially wrt. implementing a runtime version; the macros in there are a bit hairy).

I'll mark it as technically breaking though and run PkgEval to make sure nothing breaks. Unless you think it's better to keep the previous, untyped behavior?
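
For reference, the public pointer arithmetic that ends up in these intrinsics looks like the sketch below. As far as I can tell, `+` on a `Ptr` converts its integer offset to `UInt` before it reaches `add_ptr`, so code like this should be unaffected by the stricter check. The array and variable names are made up for illustration.

```julia
# Pointer arithmetic through the public API; offsets are counted in bytes.
a = Int32[10, 20, 30, 40]

third, byte_distance = GC.@preserve a begin
    p = pointer(a)               # Ptr{Int32} to the first element
    q = p + 2 * sizeof(Int32)    # Ptr + Integer advances by a byte offset
    (unsafe_load(q), q - p)      # Ptr - Ptr yields the distance in bytes as a UInt
end

println(third, " ", Int(byte_distance))
```

Only code that calls the add_ptr/sub_ptr intrinsics directly with non-UInt offsets would run into the new constraint.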

vtjnash (Member) left a comment

Sounds like a good change to me

@maleadt maleadt added the minor change and needs pkgeval labels Mar 11, 2024
gbaraldi (Member) commented

Could this affect pointer provenance or alias analysis, since ptrtoint/inttoptr round trips sometimes wash those away?

maleadt (Member, Author) commented Mar 12, 2024

I can't reproduce these llvmpasses failures...

❯ cat Make.user
FORCE_ASSERTIONS=1
LLVM_ASSERTIONS=1

❯ make -C test/llvmpasses
Testing Time: 15.89s
  Passed: 44

EDIT: could reproduce them using the CI rootfs images.

maleadt (Member, Author) commented Mar 15, 2024

@nanosoldier runtests()

maleadt (Member, Author) commented Mar 15, 2024

@nanosoldier runbenchmarks(vs=":master")

nanosoldier (Collaborator) commented

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

maleadt (Member, Author) commented Mar 18, 2024

Turns out that even on LLVM 17 from #53070, the IR autoupgrader knows how to handle typed pointers:

julia> Base.libllvm_version
v"17.0.6"


julia> function foo(ptr::Ptr{Int8})
           Base.llvmcall(
               """%2 = inttoptr i64 %0 to i8*
                  %3 = load i8, i8* %2
                  ret i8 %3""", Int8, Tuple{Ptr{Int8}}, ptr)
       end

julia> code_llvm(foo, Tuple{Ptr{Int8}})
define i8 @julia_foo_564(i64 zeroext %"ptr::Ptr") #0 {
top:
  %0 = inttoptr i64 %"ptr::Ptr" to ptr
  %1 = load i8, ptr %0, align 1
  ret i8 %1
}


julia> function bar(ptr::Ptr{Int8})
           Base.llvmcall(
               ("""define i8 @entry(i64 %ptrval) #0 {
                   top:
                       %ptr = inttoptr i64 %ptrval to i8*
                       %val = load i8, i8* %ptr
                       ret i8 %val
                   }
                   attributes #0 = { alwaysinline }
                """, "entry"), Int8, Tuple{Ptr{Int8}}, ptr)
       end

julia> code_llvm(bar, Tuple{Ptr{Int8}})
define i8 @julia_bar_581(i64 zeroext %"ptr::Ptr") #0 {
top:
  %ptr.i = inttoptr i64 %"ptr::Ptr" to ptr
  %val.i = load i8, ptr %ptr.i, align 1
  ret i8 %val.i
}

... so it's probably too breaking to make llvmcall error out when using i64 pointers after switching to ptr. I'll try to make llvmcall smart enough to handle this transparently.
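
For package authors, a snippet that works on both sides of this change could be guarded on the Julia version. This is a minimal sketch, not a recommendation: `load_i8` is a hypothetical helper, the `v"1.12"` cutoff is an assumption about which release first ships this PR, and the fallback branch assumes a 64-bit platform.

```julia
# ASSUMPTION: releases from 1.12 on pass Ptr{T} to llvmcall as an LLVM `ptr`.
@static if VERSION >= v"1.12"
    # Pointer form: the argument %0 is already a pointer, so it can be loaded directly.
    load_i8(p::Ptr{Int8}) = Base.llvmcall(
        """%2 = load i8, ptr %0
           ret i8 %2""", Int8, Tuple{Ptr{Int8}}, p)
else
    # Integer form: the argument %0 is an i64 (64-bit only) and must be converted first.
    load_i8(p::Ptr{Int8}) = Base.llvmcall(
        """%2 = inttoptr i64 %0 to i8*
           %3 = load i8, i8* %2
           ret i8 %3""", Int8, Tuple{Ptr{Int8}}, p)
end

x = Ref{Int8}(42)
val = GC.@preserve x load_i8(Base.unsafe_convert(Ptr{Int8}, x))
println(val)
```

The pointer branch is the same load as in foo above with the inttoptr dropped, which is the simplification this PR is after.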

gbaraldi (Member) commented

Does LLVM have some guarantee that this will continue working in the future?

maleadt (Member, Author) commented Mar 18, 2024

Not that I know.

maleadt (Member, Author) commented Mar 18, 2024

@nanosoldier runtests(["AsmMacro", "FindFirstFunctions", "UnsafeAtomics", "Atomix", "ThreadingUtilities", "Modulo2", "LinearRationalExpectations", "Losers", "NewlineLexers", "RunningQuantiles", "SubSIt", "Blake3Hash", "BlockMatching", "Scrypt", "MbedTLS", "KissMCMC", "TiledIteration", "HMatrices", "AccurateArithmetic", "SIMDMathFunctions", "ImplicitPlots", "DifferentialForms", "AMGCLWrap", "LoopManagers", "BlockBandedMatrices", "Sixel", "ExaTron", "ImageInTerminal", "ImplicitGlobalGrid", "ExtendableSparse", "ChunkedCSV", "BGEN", "ImageSmooth", "BPGates", "SpectralResampling", "FastDMTransform", "MarsagliaDiscreteSamplers", "FastHistograms", "BoxLeastSquares", "ImageFiltering", "VectorizedStatistics", "MutualInformationImageRegistration", "MonteCarloSummary", "GIFImages", "FinanceCore", "ParamPunPam", "MathJaxRenderer", "NLLSsolver", "TropicalGEMM", "StatGeochemBase", "Ipaper", "CategoricalMonteCarlo", "ObjectPools", "Gaius", "WASMCompiler", "LibRaw", "Powerful", "ImageIO", "RollingFunctions", "AlgebraicSolving", "SpheriCart", "SmoQyDEAC", "FastGeoProjections", "VisualGeometryOptimization", "Determinantal", "RandomWalkBVP", "SpatialAccessTrees", "ComputerVisionMetrics", "ImageDistances", "MatrixProfile", "LocalPoly", "RegularizedLeastSquares", "PFFRGSolver", "MicroscopePSFs", "FractionalDiffEq", "ChebParticleMesh", "SignalAlignment", "AbstractCosmologicalEmulators", "LifeContingencies", "DynamicAxisWarping", "ArDCA", "FinEtools", "FinEtoolsAcoustics", "FinEtoolsHeatDiff", "AlgebraicMultigrid", "SimpleDiffEq", "Jadex", "RedClust", "FinEtoolsMeshing", "FinEtoolsFlexStructures", "Falcons", "CalciumScoring", "SimSearchManifoldLearning", "EmpiricalPotentials", "FinEtoolsDeforLinear", "SpeedMapping", "ImageSegmentation", "VisualRegressionTests", "SurveyDataWeighting", "BoundaryValueProblems", "PSSFSS", "PALEOboxes", "SphericalFunctions", "PyBraket", "StartUpDG", "RandomFeatures", "PolaronMobility", "Unfolding", "OptimizationSpeedMapping", "MaximumEntropyMomentClosures", 
"StatGeochem", "GMMParameterEstimation", "DiffEqFinancial", "Korg", "RealPolyhedralHomotopy", "QuasiCopula", "ROMEO", "Jutul", "GeoEnergyIO", "PALEOaqchem", "SensitivityRankCondition", "PredefinedDynamicalSystems", "BattMo", "SchwarzChristoffel", "MultiStateSystems", "DIVAnd_HFRadar", "Eikonal", "GeneralizedSDistributions", "Phonetics", "ManifoldDiffEq", "AllenNeuropixelsBase", "Circuitscape", "ProcessBasedModelling", "FSimZoo", "ImageFeatures", "Omniscape", "GameTheory", "BaseModelica", "Ai4EComponentLib", "NeuronBuilder", "ParameterizedFunctions", "BlockSystems", "BLASBenchmarksCPU", "Petri", "Pesto", "Isoplot", "FinEtoolsVoxelMesher", "FinEtoolsVibInFluids", "MinimallyDisruptiveCurves", "WGPUgfx", "AstrodynamicalModels", "Bactos", "ChargeTransport", "SMLMMetrics", "CitableImage", "Fable", "Test", "PortfolioAnalytics", "StateSpacePartitions", "MendelImpute", "AvailablePotentialEnergyFramework", "StructuredLight", "WaveletsExt", "CDGRNs", "MathepiaModels", "FSimPlots"])

nanosoldier (Collaborator) commented

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

maleadt (Member, Author) commented Mar 19, 2024

A bunch of packages failed to test due to what looks like a Nanosoldier-related issue. I've restarted the machine, so let's try again:

@nanosoldier runtests(["AsmMacro", "FindFirstFunctions", "UnsafeAtomics", "Atomix", "ThreadingUtilities", "Modulo2", "LinearRationalExpectations", "Losers", "NewlineLexers", "RunningQuantiles", "SubSIt", "Blake3Hash", "BlockMatching", "Scrypt", "MbedTLS", "KissMCMC", "TiledIteration", "HMatrices", "AccurateArithmetic", "SIMDMathFunctions", "ImplicitPlots", "DifferentialForms", "AMGCLWrap", "LoopManagers", "BlockBandedMatrices", "Sixel", "ExaTron", "ImageInTerminal", "ImplicitGlobalGrid", "ExtendableSparse", "ChunkedCSV", "BGEN", "ImageSmooth", "BPGates", "SpectralResampling", "FastDMTransform", "MarsagliaDiscreteSamplers", "FastHistograms", "BoxLeastSquares", "ImageFiltering", "VectorizedStatistics", "MutualInformationImageRegistration", "MonteCarloSummary", "GIFImages", "FinanceCore", "ParamPunPam", "MathJaxRenderer", "NLLSsolver", "TropicalGEMM", "StatGeochemBase", "Ipaper", "CategoricalMonteCarlo", "ObjectPools", "Gaius", "WASMCompiler", "LibRaw", "Powerful", "ImageIO", "RollingFunctions", "AlgebraicSolving", "SpheriCart", "SmoQyDEAC", "FastGeoProjections", "VisualGeometryOptimization", "Determinantal", "RandomWalkBVP", "SpatialAccessTrees", "ComputerVisionMetrics", "ImageDistances", "MatrixProfile", "LocalPoly", "RegularizedLeastSquares", "PFFRGSolver", "MicroscopePSFs", "FractionalDiffEq", "ChebParticleMesh", "SignalAlignment", "AbstractCosmologicalEmulators", "LifeContingencies", "DynamicAxisWarping", "ArDCA", "FinEtools", "FinEtoolsAcoustics", "FinEtoolsHeatDiff", "AlgebraicMultigrid", "SimpleDiffEq", "Jadex", "RedClust", "FinEtoolsMeshing", "FinEtoolsFlexStructures", "Falcons", "CalciumScoring", "SimSearchManifoldLearning", "EmpiricalPotentials", "FinEtoolsDeforLinear", "SpeedMapping", "ImageSegmentation", "VisualRegressionTests", "SurveyDataWeighting", "BoundaryValueProblems", "PSSFSS", "PALEOboxes", "SphericalFunctions", "PyBraket", "StartUpDG", "RandomFeatures", "PolaronMobility", "Unfolding", "OptimizationSpeedMapping", "MaximumEntropyMomentClosures", 
"StatGeochem", "GMMParameterEstimation", "DiffEqFinancial", "Korg", "RealPolyhedralHomotopy", "QuasiCopula", "ROMEO", "Jutul", "GeoEnergyIO", "PALEOaqchem", "SensitivityRankCondition", "PredefinedDynamicalSystems", "BattMo", "SchwarzChristoffel", "MultiStateSystems", "DIVAnd_HFRadar", "Eikonal", "GeneralizedSDistributions", "Phonetics", "ManifoldDiffEq", "AllenNeuropixelsBase", "Circuitscape", "ProcessBasedModelling", "FSimZoo", "ImageFeatures", "Omniscape", "GameTheory", "BaseModelica", "Ai4EComponentLib", "NeuronBuilder", "ParameterizedFunctions", "BlockSystems", "BLASBenchmarksCPU", "Petri", "Pesto", "Isoplot", "FinEtoolsVoxelMesher", "FinEtoolsVibInFluids", "MinimallyDisruptiveCurves", "WGPUgfx", "AstrodynamicalModels", "Bactos", "ChargeTransport", "SMLMMetrics", "CitableImage", "Fable", "Test", "PortfolioAnalytics", "StateSpacePartitions", "MendelImpute", "AvailablePotentialEnergyFramework", "StructuredLight", "WaveletsExt", "CDGRNs", "MathepiaModels", "FSimPlots"])

nanosoldier (Collaborator) commented

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

KristofferC (Member) commented Mar 19, 2024

Should the deprecation warnings be limited a bit? Right now, some packages write an endless number of

WARNING: llvmcall with integer pointers is deprecated.
Use actual pointers instead, replacing i32 or i64 with i8* or ptr, likely near /home/pkgeval/.julia/packages/VectorizedStatistics/Bgibo/test/testVreducibles.jl:4

to the logs.

maleadt (Member, Author) commented Mar 19, 2024

I guess, but we don't have much depwarn-like functionality in codegen. I could do something one-off, of course...


Another remaining issue is a failed assertion with the following code (reduced from the compileall test):

foobar(ptr) = ptr + UInt(0)
code_llvm(foobar, Tuple{Ptr{T}} where {T})

This generates simply:

Core.CodeInfo(code=Array{Any, 1}(dims=(2,), mem=Memory{Any}(2, 0x7fffecb10680)[
  Expr(:call, Base.add_ptr, Core.Argument(n=2), 0x0000000000000000),
  Core.ReturnNode(val=SSAValue(1))]

The call to Base.add_ptr is handled here:

julia/src/codegen.cpp

Lines 6180 to 6201 in 9df47f2

else if (head == jl_call_sym) {
    jl_value_t *expr_t;
    bool is_promotable = false;
    if (ssaidx_0based < 0)
        // TODO: this case is needed for the call to emit_expr in emit_llvmcall
        expr_t = (jl_value_t*)jl_any_type;
    else {
        expr_t = jl_is_long(ctx.source->ssavaluetypes) ? (jl_value_t*)jl_any_type : jl_array_ptr_ref(ctx.source->ssavaluetypes, ssaidx_0based);
        is_promotable = ctx.ssavalue_usecount[ssaidx_0based] == 1;
    }
    jl_cgval_t res = emit_call(ctx, ex, expr_t, is_promotable);
    // some intrinsics (e.g. typeassert) can return a wider type
    // than what's actually possible
    if (is_promotable && res.promotion_point && res.promotion_ssa == -1) {
        res.promotion_ssa = ssaidx_0based;
    }
    res = update_julia_type(ctx, res, expr_t);
    if (res.typ == jl_bottom_type || expr_t == jl_bottom_type) {
        CreateTrap(ctx.builder);
    }
    return res;
}

The result of emit_call there is of type Ptr{T}, while expr_t (as looked up in ssavaluetypes) is the UnionAll Ptr{T} where T. This results in update_julia_type aborting when it creates a cgval_t, which expects the value's type to exactly match:

assert(isboxed || v.typ == typ || tindex);

@vtjnash What's the intended design here; should emit_call also have returned a UnionAll, should the expr_t be unwrapped, or is this difference expected and should the cgval_t allow it?

FWIW, it didn't use to be a problem as everything was UInt before it reached add_ptr.

EDIT: calling jl_unwrap_unionall on expr_t gives a type that looks the same but isn't? That seems wrong?

(gdb) p expr_t
$4 = (jl_value_t *) 0x7fffe271dea0 <jl_system_image_data+6951840>
(gdb) call jl_(expr_t)
Ptr{T}

(gdb) p res.typ
$5 = (jl_value_t *) 0x7fffeda9e410
(gdb) call jl_(res.typ)
Ptr{T}

gbaraldi (Member) commented

We do jl_unwrap_unionall in plenty of places so it seems to be fine i.e emitting builtin_memoryref

maleadt (Member, Author) commented Mar 20, 2024

> We do jl_unwrap_unionall in plenty of places so it seems to be fine i.e emitting builtin_memoryref

Those uses do not involve unboxing or creating boxes; doing so with the unwrapped type seems to create corrupt objects (segfaulting and aborting all over the place).

nanosoldier (Collaborator) commented

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Zentrik (Member) commented Mar 20, 2024

FWIW, I suspect most of the improvements on the time benchmarks are noise, since a lot of the benchmarks on master were unusually slow. The regressions similarly seem to be on the order of ns or on historically noisy benchmarks. I think it's a bit easier to see here.

@maleadt maleadt removed the needs pkgeval label Mar 21, 2024
maleadt (Member, Author) commented Mar 21, 2024

And PkgEval looks good too: the ExaTron failure is an AMDGPU.jl bug (filed downstream), Atomix.jl fails due to overly strict doctests stumbling over the depwarn, and the other failures are unrelated.

Rebased, will merge once CI is "green" (with the same failures as currently on master).

@maleadt maleadt added the merge me label Mar 21, 2024
@maleadt maleadt merged commit 09400e4 into master Mar 21, 2024
6 of 8 checks passed
@maleadt maleadt deleted the tb/ptr branch March 21, 2024 12:10
@IanButterworth IanButterworth removed the merge me label Apr 5, 2024
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Apr 9, 2024
maleadt added a commit to JuliaGPU/CUDA.jl that referenced this pull request Apr 9, 2024
Labels: compiler:codegen, minor change