Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded codegens and refactoring #3577

Closed
wants to merge 20 commits into from

Conversation

mmn80
Copy link

@mmn80 mmn80 commented Dec 13, 2016

This addresses #3542.

This PR is minimalistic in the sense that the only changes were those necessary for splitting the CodeGenerator functions (including the built in C and JavaScript ones) in separate packages that can be bundled in the build-depends of the idris fronend package. I tried to avoid any renaming or other related refactoring / cleaning up / bug fixing in order to simplify reviewing, but I do plan to pursue these in the future. The PR is also a conservative extension, so any changed behavior other then supporting the stated new features is a mistake on my part. This means that the old style external codegens should still work. More information about the general vision for this can be found in the comments on #3542.

Design

The old package has been split into 4:

idris-core

This contains all Haskell code except for Main and auto generated modules other then Version_idris, but otherwise is a simple, pure Haskell package, easier to depend on, with less chance of upstream build breakage causing problems. The idea is that platform-independent language development will be contained here, while all the weird packaging/tools integration stuff happens above.

All dynamic "environment" info, including data paths, lib/inc flags, and the codegens themselves are now centralized in the IRTS.System module, inside an MVar. The module only exports the getter functions as before, plus some registration functions that are called from Main.hs of the frontend, where all the codegen dependencies are available. So idris-core is supposed to be "lifted", and changing the implementation of the dynamic system should only involve IRTS.System, Main.hs, and the registration code in the codegen packages.

The build type is Custom only because the Version_idris module is generated here. Beside being in general a sensible choice, I had the problem that, unlike other environment values, the type of version is pure, and changing it so we could read it from the MVar would imply changing the types of many functions throughout idris-core, and I tried to avoid that for now.

idris-codegen-c and idris-codegen-javascript

These contain the codegen functions of type CodeGenerator, and a register function that registers the codegen to IRTS.System, plus if needed, lib and include flags (I used a simple priority system for those to ensure they end up in the same order when finally combined to a string), and info strings for the idris --info display (I used this to show the RTS install dirs).

As a result, it is much easier to maintain such packages as compared to external codegens, as no custom Main.hs and passing flags / state needs to be dealt with.

Some simplifications:

  • GMP flag is only used by idris-codegen-c (less pollution)
  • it turned out that only the codegens themselves ever needed their own RTS data path, so core and frontend are simplified
  • idris-codegen-javascript does not even have a Setup.hs

The front end: idris

This package contains Main.hs, libs, docs, tests, builds Target_idris and Tools_idris, and has flags to conditionally build-depend on the codegen packages. In the future, libs, docs and tests will also be taken out, such that people can easily maintain batteries-included forks build-depending on other codegens (no merging required since idris will almost never change). But the main idris could also bundle (under flags) some of the more stable 3rd party codegens if desired.

In addition to the package dependency, Main.hs contains an initIdrisEnvironment function that, under CPP flags, calls all the appropriate registration functions. This function is the first thing called by main.

Future work

I deferred as much as I could to future PRs in order to make reviewing simpler and reduce the chance of errors. But these are some things I would like to do:

  • split libs+docs and tests into separate packages in order to support the no maintenance idris forking and 3rd party libs I talked about in Interactive codegen and refactoring #3542
  • further study the FreeT idea for runtime extensions mentioned there, and split the REPL and IDE to support 3rd party REPL/IDE commands
  • as I was working on this, I noticed a lot of cleaning opportunities and also incomplete implementations (I added some TODOs) I would like to address
  • I think there are bugs concerning passing of flags and state to external codegens

Implementation issues

  1. The idris-codegen-c package also needs custom.mk, so, for now, I decided to just include the one from the idris parent dir, but if we want to move it to a separate repo the file will need to be duplicated.

  2. The libtest.c file did not belong to the C backend and has been moved to a cfiles dir in idris-core.

  3. I removed a getVersion function from Setup.hs that nobody used.

  4. I noticed that stack sdist gives an error about the Target_idris module, but I haven't changed any code relating to this, so I'll have to look into it (maybe sdist wasn't supposed to be called from stack?).

  5. The Codegen type changed, with ViaExternal, ViaEmbedded and Bytecode constructors, causing the need for increasing the IBC file version.

  6. Additionally, instead of just the --codegen command line flag with its incomplete baked in IBCFormat setting, we now have the --codegen only for embedded codegens, and extra flags --codegen-ibc and --codegen-json for the external codegens. I chose this instead of having it search through embedded and then fall back to external, because that would be a potentially confusing implicit behavior and I'd rather have an explicit API and clean errors. People won't need to change their external codegens, but they will need to change how they use them (idris --codegen-ibc instead of idris --codegen). I updated all docs to reflect this. This is a small detour from my "conservative" claim, but it was a necessary change since the old flag did not deal with IR format.

  7. I used the --constraint="idris-core +FFI" pattern to set flags to dependencies, see section bellow for details.

  8. As you can see, I have not done any merging yet.

  9. I tried to make single-issue, descriptive commits, but occasionally failed. In my defense, it happened late at night after a long day of work. I believe it's best to review this in bulk (the "Files changed" view), rather then commit by commit.

Testing the build

Currently I only tested the build on my Linux machine, with the stack tool. I tried most combinations of flags in the stack.yaml file and everything seems to work. In particular, all tests pass. What else needs to be tested:

  • stack install with various flags (Linux)
  • stack test (Linux)
  • cabal-install based builds
  • OSX
  • Windows
  • NixOS
  • CI
  • docs (there should be nothing different about them, but just in case)

Concerning the cabal-install scripts: the only issue I identified is that some flags have been moved from the frontend to the appropriate package. For example FFI and release are only used in idris-core, GMP is only in idris-codegen-c, and CI is replicated in all packages. The resulting simplification of the cabal and Setup.hs files is one of the benefits of refactoring.

Following this stackoverflow answer, I used the pattern --constraint="idris-core +FFI"
instead of -fffi in the various scripts invoking cabal (look into the "Passing flags in cabal-install based makefiles" commit: 00ad040). I don't know if this works, so it should definitely be tested.

I will work on merging myself, but I hope you will help me with testing.

setupBundledCC moved into Main.hs from Util.System
(see further commits for the other half)
implemented a part of the environment variable abstraction in IRTS.System
setupBundledCC moved from Util.System to Main.hs (see prev. commit)
same LICENSE file from idris
basic setup of the .cabal file based on the one from idris,with stuff deleted
that is no longer needed
only changes: stuff that has been moved in the other .cabal files
Some work done on the cabal file and Setup.hs, but not completed
Moved Version_idris file generation from idris to Setup.hs of idris-core
Dynamic registering libFlags and incFlags in IRTS.System
Moved RTS specific setting of libFlags and incFlags from IRTS.System to Setup.hs of respective codegens
Moved setting of OS specific flags -L/usr/local/lib -I/usr/local/include from IRTS.System to Setup.hs of idris
Registering codegens and infostrings in IRTS.System and Setup.hs of respective codegens
Removed unnecesary type IRFormat and changed Codegen type -> incremented ibcVersion
Moved GMP dependency to just the C RTS package
Also changed path in Package.hs from CRTS to new location
- Implemented the dynamic registretion system in its final form
- Some minor tweaks
It now builds without errors.
- Via "C" => Via "c"
- flags are small caps in Setup.hs
Finally, all tests pass.
There is a bug in stack where the config hook of Setup.hs is not being
fired when a flag is changed just in stack.yaml. So, in order to prevent
others from wasting their time with this, the init code has moved to
Main.hs, where it is slightly more inconvenient to change.

Also made small cosmetic cleanups.
In a previous commit I removed the support for the external codegens.
But later on the refactoring project I decided to make it conservative.
So I am bringing it all back. Note that the Codegen type still changed,
so the IBC version is still updated.
I also added 2 TODOs where I noticed previously incomplete implementations.
I use the pattern --constraint="idris-core +CI" in order to pass flags to
packages in the build-depends section of idris. I found this on this page:
http://stackoverflow.com/questions/23523869/is-there-any-way-to-define-flags-for-cabal-dependencies
With stack it already worked, as you could set these flags in stack.yaml.
It is only used in the `idris-core` package.
@mmn80
Copy link
Author

mmn80 commented Dec 14, 2016

My git foo is weak. What would be the suggested method for merging? Should I try to rebase? Squash? Add a commit where I go manually one by one through upstream commits and merge by hand?

Another thing. Even if in the "Files changed" the relocated files correctly appear as "renamed", here they are added to "Conflicting files".

@melted
Copy link
Contributor

melted commented Dec 14, 2016 via email

@jfdm
Copy link
Contributor

jfdm commented Dec 14, 2016

Hi, I've added a few labels to make the status of this PR explicit. Given the impact this PR will have on the project, I think it might be best if @edwinb makes the final decision. I personally won't merge this.

@mmn80
Copy link
Author

mmn80 commented Dec 14, 2016

Well, that's why I asked for comments on the issue, so that if the general idea of splitting the package is rejected, I won't have to work for a week to implement it.

@edwinb has said that he's ok with it as long as he doesn't need to keep track of 3rd party codegens, and it leads to better organization. The first item is dealt with in 2 ways: firstly, the old style codegens still exist as before, and secondly the new embedded ones (the idris fork) will be maintained by their authors. On the second item: idris-core maintains its integrity (no inconvenient package boundaries), the codegens are in their own package, so the build issues are contained and do not pollute the main package, the paths and other environment stuff is centralized in IRTS.System, etc.

Of course I'm aware of the potential impact, that is why I tried to be as minimalistic as possible.

@mmn80
Copy link
Author

mmn80 commented Dec 14, 2016

So the plan is this: I'll proceed to test the build and fix things where I can: the cabal-install scripts, and also on Windows, and report back. The merging I'll postpone until I get a confirmation that this is in general on the right track or not. I hope you'll see that there is almost no modification in the cabal files or Setup.hs, just that parts have been moved to the appropriate packages.

- Added list of packages to "cabal build" and "cabal install" commands
- Updated paths to linecount
@mmn80
Copy link
Author

mmn80 commented Dec 14, 2016

I'll keep tabs here for the things I've tested:

Linux

stack

  • stack build, stack install
  • stack install with/out flags: FFI, GMP, CI, release, execonly, codegen_C, codegen_JavaScript
  • stack test
  • stack sdist - fails

make (in a cabal sandbox)

  • make, make install, make build, make fast, make w/ custom.mk-alldeps, make linecount
  • make pinstall - fails
  • make test_update
  • make test - fails
  • make relib, make lib_clean, make lib_doc, make lib_doc_clean
  • make doc - fails
  • make user_doc

make (global)

  • ...

Windows

  • ...

Travis

  • ...

Appveyor

  • ...

@mmn80
Copy link
Author

mmn80 commented Dec 14, 2016

Ok, so I had the chance to look into all these scripts. There are all kinds of small issues that need to be dealt with, and I really need to go into all possible combinations and platforms to make sure it's ok. But I believe I brought the project to the point that its main design is reviewable. It already works fine with stack + basic build commands with cabal. Even if I do all possible work with testing & merging, I still need a maintainer actively interested in this, since, after all, this is 100% about maintenance.

As a reminder, the main goal is to turn the build system into a flexible multi-package one, such that people can distribute extensions (not just codegens) in separate packages, and have an easy, maintainable way to integrate them into any front end. But the main advantage, completely independent of this, would be internal, as after the initial setup work, it would be easy to further split the package(s) and reorganize.

As for extensions, there is also the possibility to just add the extension mechanism (eg: for codegens), without the painful package splitting and all its hygiene benefits. In fact, I could redo the PR this way, to get something much lighter then this, something that doesn't touch the build system at all. I could then make a package that depends on idris, has its own custom main where it registers the codegen, and use that as my idris.

Since @jfdm already rejected the PR in its current 2-issues-into-one form, I'll just wait for @edwinb for a reply. I don't need any explaining, as I already learned the answers to most of my questions by just working on this. I understand that people are busy and have other priorities etc., so just a yes/no/⊥/SIGILL would do. Thank you.

@ahmadsalim
Copy link

@mmn80 As far as I can read from @jfdm's comments, I don't think he rejected your PR, rather he is just being careful and letting @edwinb take the final decision on this, because it's impacting a large part of the code base.

We are very grateful for your PR, and would like to thank you for the effort!

@ahmadsalim
Copy link

@mmn80 As for Travis and AppVeyor, it's possible for you to set them up on your own forks for free, so it at least provides you an opportunity for testing things there.

@jfdm
Copy link
Contributor

jfdm commented Dec 15, 2016

@mmn80 I have not rejected the PR, far from it. It does propose something interesting, and potentially something very useful for Idris. However, I am not the right person to merge it, and cannot really judge upon its impact. @ahmadsalim is correct in his interpretation that this PR is something that @edwinb should have final say on.

Keep working on it, and we'll help you where you can.

@mmn80
Copy link
Author

mmn80 commented Dec 16, 2016

This horse may have collapsed due to overloading, but I'll try to salvage what I can by splitting the work into smaller PRs, now that I know how the final product might look like.

  1. codegen extension: this will contain just the embedded codegen registration thing, while keeping the old codegens baked in as they are, and the Codegen type unchanged (a simple fallback mechanism in generate). This small PR will already solve my problem, as it will enable the easy to maintain batteries-included package I talked about (just that idris itself won't be able to bundle such 3rd party codegens).
  2. centralization of paths and other installation info in IRTS.System
  3. moving current built in codegens on the new system, upgrading Codegen type and IBC version
  4. further work on "lifting the core", in particular the multi-libs feature, the FreeT thing or something else
  5. improvements in the build system, conceptually separate the parts before the actual separation
  6. the splitting of packages should now be trivial

Each of these steps will bring some new features or better organization on their own, but the details will be informed by the final goal. They'll also provide a handle for discussion and change of plans, and avoid delays and expensive merging.

The first 3 steps are based on code that already exists in this branch.

The big problem will be step 5, I think. As I stopped using stack and begun testing the make scripts, I immediately remembered why I uninstalled cabal-install a while ago. It is stateful, and has poor support for multi-package projects. I know that cabal experts would disagree, but the point is that changing the build system for multi-package while using cabal-install would inevitably make it more complex then before, so I won't do it. As far as I'm concerned, this step either involves switching everything to stack, or waiting for cabal new-build/nix to become the norm, which I hope will bring it up to par with stack in terms of usability/features, or have somebody else do it. If the project blocks at this stage, at least I'll have step 1., which is what really holds me back right now.

@mmn80
Copy link
Author

mmn80 commented Dec 16, 2016

Opened PR for step 1.

@jfdm
Copy link
Contributor

jfdm commented Dec 19, 2016

@mmn80 Is it possible to keep this PR, and your other PR #3582 around i.e. Please do not delete the branches from your repository. From what I have seen of the code there are some promising features that could be beneficial for the Idris compiler i.e. embedded code generators. In the New Year there should be scope to look at one of the features from your PRs that we can then take the time to develop, and refine for inclusion within Idris.

@mmn80
Copy link
Author

mmn80 commented Dec 19, 2016

I will not delete the branch, as the diff can be used to extract bits of it, which is just what I did for the other PR.

I postponed the refactoring project because of issues with cabal-install, as explained. Waiting for 1.25 and Backpack for a better shot.

As for codegen extensions w/o refactoring I'll explain better on the other PR.

@mmn80
Copy link
Author

mmn80 commented Dec 20, 2016

Here's a more detailed report on the cabal-install issues with multiple packages I mentioned earlier.

With stack you have the config file stack.yaml where you define all packages and all flags and then all standard stack commands are "purely functional" and aware of this environment.

With cabal-install you have to squeeze all this info on the command line, and then some commands are stateful, like OpenGL. Eg: you cannot specify the packages on the test command, but have to rely on implicit state (some config or build you issued previously). And then you have to replicate all these incantations in the Makefile, travis and appveyor scripts. This prospect raised red flags to me, as I realized I'm reinventing stack.

On the other hand, in the upcoming 1.25 new-build version of cabal (GHC>=8.2) you can define multiple named library sections in the same cabal file, with downstream being able to depend on a specific such sub-library (I hope they will also eliminate the per-package circularity restriction and turn the deps analysis fully per sub-library; for instance now if you have a library and an executable in the same cabal file, and another cabal file that depends on just the library, you cannot depend on it in the executable). So you can keep the same scripts as now and do all the multi-package split in the same cabal file. Additionally, Backpack offers interesting prospects about how to implement the extension system in a more robust way, i.e. based on mix-in linking rather then rolling our own late binding scheme.

So this is why I consider this project more as an early experiment then a finished product. We should focus on smaller extensions/cleanups/modularization within the current setup, as preparatory steps, but turn the physical splitting into a thunk to be forced only when we are ready to commit to either new-build or stack.

@mmn80
Copy link
Author

mmn80 commented Dec 20, 2016

Btw, this is the issue on the cabal project where they track the status of the per-component analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants