Allow modules to be nested like in Python #86

Open
certik opened this issue Nov 14, 2019 · 18 comments
Labels
Clause 14 Standard Clause 14: Program units

Comments

@certik
Member

certik commented Nov 14, 2019

Use Cases

Module name clashes

It is very common to have modules named constants.f90, mesh.f90, solver.f90, utils.f90, types.f90, etc. (the modules have the same name as the file name in this example). When writing an application this is not a problem. But when writing a library, one cannot name modules like that, because if the application that uses the library also has constants.f90, then the modules will clash. Pretty much the only solution is to name the modules as libraryname_constants (usually implemented in libraryname_constants.f90). This is suboptimal, because when the application uses any module from the library, it has to type in the long name (e.g., use libraryname_constants, only: pi).
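A toy model makes the clash concrete. It is written in Python only because Python is the language of the analogies in this issue. The `register_module` helper is hypothetical and says nothing about how compilers actually store modules. Passing no parent models today's flat set of global identifiers. Passing a parent models the proposed nesting, using the `%` separator suggested for the syntax in this issue.

```python
from typing import Optional, Set


def register_module(registry: Set[str], name: str, parent: Optional[str] = None) -> str:
    """Add a module name to a toy model of a program's module names.

    parent=None models today's flat set of global identifiers; a parent such as
    'mylib' models the proposed nesting, where the module is known as mylib%name.
    Fortran names are case-insensitive, so keys are lower-cased.
    """
    key = name.lower() if parent is None else f"{parent.lower()}%{name.lower()}"
    if key in registry:
        raise NameError(f"module name clash: {key}")
    registry.add(key)
    return key


# Today: the library and the application both want a module called 'constants'.
flat: Set[str] = set()
register_module(flat, "constants")          # the library's constants
try:
    register_module(flat, "constants")      # the application's constants
except NameError as err:
    print(err)                              # module name clash: constants

# Proposed: the library's module lives under 'mylib', so both can coexist.
nested: Set[str] = set()
register_module(nested, "constants", parent="mylib")  # stored as 'mylib%constants'
register_module(nested, "constants")                  # stored as 'constants'
print(sorted(nested))                       # ['constants', 'mylib%constants']
```

Today the usual escape hatch is to bake the library name into the module name (`libraryname_constants`), which is the same key-prefixing done by hand.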

Distribution

When you distribute a Fortran library, you have to distribute all modules, which can easily be dozens and dozens of .mod files, which all end up littering your $PREFIX/include.

Solution

Allow modules to be nested. In Python, one creates the following directory structure:

mylib
├── a.py
├── b.py
└── __init__.py

and then one can import files as follows:

>>> import mylib
>>> import mylib.a
>>> import mylib.b

This particular import will become possible in Fortran with #1, but in the meantime one could do the equivalent of this:

>>> from mylib import something
>>> from mylib.a import something
>>> from mylib.b import something

This solution fixes both use cases above. There can be a mylib.constants and there will be no name clashes. When the library is distributed, it is installed as:

$PREFIX/include/mylib.mod
$PREFIX/include/mylib/a.mod
$PREFIX/include/mylib/b.mod

Perhaps it could even be:

$PREFIX/include/mylib/__module__.mod
$PREFIX/include/mylib/a.mod
$PREFIX/include/mylib/b.mod

So there is no littering.
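The two install layouts can be written down as a mapping from a nested module name to a file path. This is a sketch under assumptions: `mod_install_path` and its `top_as_dir` switch are hypothetical, it assumes lower-case .mod file names (a common compiler convention; the standard says nothing about .mod files), and it spells nested names with `%`.

```python
from pathlib import PurePosixPath


def mod_install_path(module: str, prefix: str = "$PREFIX/include",
                     top_as_dir: bool = False) -> PurePosixPath:
    """Hypothetical mapping from a nested module name to an installed .mod file.

    'mylib%a' -> <prefix>/mylib/a.mod
    'mylib'   -> <prefix>/mylib.mod, or <prefix>/mylib/__module__.mod if top_as_dir
    """
    parts = module.lower().split("%")   # assumes lower-case .mod names
    base = PurePosixPath(prefix)
    if len(parts) == 1:
        # A top-level module: either a file next to its directory, or inside it
        return base / parts[0] / "__module__.mod" if top_as_dir else base / f"{parts[0]}.mod"
    # A nested module: one directory per enclosing module
    return base.joinpath(*parts[:-1]) / f"{parts[-1]}.mod"


for m in ["mylib", "mylib%a", "mylib%b"]:
    print(mod_install_path(m))
# $PREFIX/include/mylib.mod
# $PREFIX/include/mylib/a.mod
# $PREFIX/include/mylib/b.mod
```

With `top_as_dir=True`, everything the library installs sits under a single `mylib/` directory, which is the "no littering" property.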

Issues to resolve

The Fortran standard does not talk about files or the file system. Unless we want to change that, the module nesting feature must be implemented similarly to submodules. The exact syntax needs to be figured out.

Why not submodules? Unfortunately, submodules cannot implement the mylib use case above, in which nested modules can be imported individually. With submodules, a large Fortran library would have to stop using modules: it would have only one module and lots of small submodules. My understanding is that a submodule can only implement procedures whose interfaces are declared in its parent module; anything else it declares is not accessible through USE. So you cannot, for example, do use mylib%b, only: something. You would have to declare something in mylib.f90 itself. This does not scale to large libraries with hundreds of files.

Syntax: Using % would probably work and would be consistent with #1, so the Python imports above would translate to:

use, namespace :: mylib
use, namespace :: mylib%a
use, namespace :: mylib%b
...
call mylib%a%something()

and

use mylib, only: something
use mylib%a, only: something
use mylib%b, only: something
...
call something()
@klausler

I think that you conflate source file names with module names above. They're actually independent. Can you state the problem you're trying to solve just in terms of module names and Fortran?

@certik
Member Author

certik commented Nov 14, 2019

In the above I assume the module name is the same as the file name. I agree the standard does not care about the file name, just the module name. The two use cases I give are already formulated in terms of module names (edit: I see, I updated the above description to be specific that I mean module names, not file names). If you have some specific question, I'll be happy to clarify.

@klausler

klausler commented Nov 14, 2019

The set of global identifiers (19.2) is a flat namespace. I think you're proposing that, at least for module names and submodule identifiers (q.v.), they could be distinct members of an independent set of identifiers associated with a module rather than members of the set of global identifiers.

If that's right, then I suggest that the way to indicate that module M be nested within another module's namespace is to put that top-level module's name on M's MODULE statement, as in

MODULE(TOP) M

A USE X within M would search for and prefer a module X in TOP, if it exists, before attempting to associate with a global module X.

Does this make sense, or have I missed the point?

(EDIT: syntax changed to look more like SUBMODULE statement)
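The lookup rule described above (prefer a module nested in TOP, then fall back to a global module) can be sketched as a small resolution function. The names `resolve_use`, `nested`, and `global_modules` are hypothetical. Only one level of nesting is modelled, and names are lower-cased because Fortran is case-insensitive.

```python
from typing import Optional, Set, Tuple


def resolve_use(name: str, host: Optional[str],
                nested: Set[Tuple[str, str]], global_modules: Set[str]) -> str:
    """Resolve `USE name` appearing inside a module declared as MODULE(host) M.

    nested holds (parent, child) pairs; global_modules holds today's global names.
    host is None for a module that is not nested anywhere.
    """
    key = name.lower()
    if host is not None and (host.lower(), key) in nested:
        return f"{host.lower()}%{key}"      # prefer the module nested in TOP
    if key in global_modules:
        return key                          # fall back to the global module
    raise LookupError(f"no module {name!r} visible from {host!r}")


nested = {("top", "constants")}
global_modules = {"constants", "utils"}
print(resolve_use("constants", "top", nested, global_modules))  # top%constants
print(resolve_use("utils", "top", nested, global_modules))      # utils
print(resolve_use("constants", None, nested, global_modules))   # constants
```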

@certik
Copy link
Member Author

certik commented Nov 15, 2019

@klausler yes that's exactly what I am proposing. (There might be more details to iron out beyond what you just wrote.)

@klausler

That seems quite useful and would be easy to implement.

@tskeith

tskeith commented Nov 15, 2019

> If that's right, then I suggest that the way to indicate that module M be nested within another module's namespace is to put that top-level module's name on M's MODULE statement, as in
>
> MODULE(TOP) M

Is TOP really another module? If so, does M have visibility to any declarations in TOP besides modules?

Or is TOP more like a C++ namespace? I.e. it comes into being when it's mentioned as above and all it does is organize module names. This seems a lot simpler.

@klausler

> > If that's right, then I suggest that the way to indicate that module M be nested within another module's namespace is to put that top-level module's name on M's MODULE statement, as in
> >
> > MODULE(TOP) M
>
> Is TOP really another module? If so, does M have visibility to any declarations in TOP besides modules?
>
> Or is TOP more like a C++ namespace? I.e. it comes into being when it's mentioned as above and all it does is organize module names. This seems a lot simpler.

I guess that I was thinking the former rather than the latter, since other parts of the program would contain USE TOP to access its external interfaces.

@klausler

After more reflection: how is this concept any different from using SUBMODULE? There's a top-level MODULE with the public API, and then nested submodules in their own module-specific namespace.

@FortranFan
Member

FortranFan commented Nov 15, 2019

@certik wrote:

> Use Cases
>
> Module name clashes
>
> It is very common to have modules named constants.f90, mesh.f90, solver.f90, utils.f90, types.f90, etc. (the modules have the same name as the file name in this example). When writing an application this is not a problem. But when writing a library, one cannot name modules like that, because if the application that uses the library also has constants.f90, then the modules will clash. Pretty much the only solution is to name the modules as libraryname_constants (usually implemented in libraryname_constants.f90). This is suboptimal, because when the application uses any module from the library, it has to type in the long name (e.g., use libraryname_constants, only: pi).
>
> Distribution
>
> When you distribute a Fortran library, you have to distribute all modules, which can easily be dozens and dozens of .mod files, which all end up littering your $PREFIX/include.
> ..

@certik, I have long debated in my own mind whether Fortran should formally introduce the concept of NAMESPACEs or whether MODULEs can themselves be elevated to first-class namespaces. Looking at your proposal here and with #1, I'm presently inclined toward the former.

Whilst I don't quite have a list of the requirements and specifications necessary for an initial proposal, a sketch of the idea is to include a new NAMESPACE scope and/or a NAMESPACE attribute for MODULEs, in addition to PUBLIC/PRIVATE visibility of MODULEs within a NAMESPACE. A given NAMESPACE can then IMPORT other NAMESPACEs. Fortran can formally define a GLOBAL namespace and state that all EXTERNAL MODULEs (a la external procedures) in a program are part of this namespace. The language can then have some rules for name qualification and aliasing. Code then might look like the following with illustrative syntax:

namespace my_lib
   public module constants
      real, parameter :: PI = 3.14
   end module
   private module utils !<-- can be USE'd only by modules in the same namespace
      interface
         module subroutine draw()
         end subroutine
      end interface
   end module
end namespace

namespace my_lib !<-- A namespace can be extended with more modules
   public module solver
      use constants, only : PI !<-- It can USE modules from the same namespace
   contains
      function calcradius( a ) result( r )
         real, intent(in) :: a
         real :: r
         r = sqrt(a/PI)
      end function
   end module
end namespace

namespace global !<-- optional; the 'global' namespace exists by default
   import my_lib, only : constants, solver ! import 'public' modules only from another namespace
   program p !<-- A program unit can only be part of GLOBAL namespace
      use constants, only : PI !<-- does not 'clash' with my_lib::constants
      real :: area
      print *, "PI = ", PI !<-- PI from USE-associated 'constants' module 
      area = 100.0
      associate ( PI => my_lib::constants%PI, calcrad => my_lib::solver%calcradius )
         print *, "circumference = ", 2.0*PI*calcrad(area)
      end associate
   end program
end namespace

Note the above is based on the use cases in #1 and this one; once more use cases are brought to light, it might help refine the idea of a NAMESPACE.
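The visibility rules stated in the comments of this sketch (a private module can be USE'd only inside its own namespace; other namespaces can import only public modules) reduce to a simple check. The `NamespacedModule` type and `can_use` function are hypothetical illustrations of the illustrative syntax above, not a proposed API. Because the namespace is part of each module's identity, the program's own `constants` and `my_lib`'s `constants` compare as different modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespacedModule:
    namespace: str   # e.g. "my_lib" or "global"
    name: str
    public: bool     # PUBLIC or PRIVATE module within its namespace


def can_use(user_namespace: str, target: NamespacedModule) -> bool:
    """May a program unit in user_namespace USE (or IMPORT) target?"""
    if user_namespace == target.namespace:
        return True          # same namespace: private modules such as 'utils' are visible
    return target.public     # other namespaces see only PUBLIC modules


constants = NamespacedModule("my_lib", "constants", public=True)
utils = NamespacedModule("my_lib", "utils", public=False)
print(can_use("my_lib", utils))      # True
print(can_use("global", utils))      # False
print(can_use("global", constants))  # True
print(NamespacedModule("global", "constants", True) == constants)  # False: no clash
```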

@certik
Member Author

certik commented Nov 15, 2019

@FortranFan would you mind opening a separate issue for namespaces? Let's discuss it there. Namespaces are another approach to fixing some of the issues reported here, but I think it is premature to decide that modules are dead and namespaces are the way to go, because namespaces also have downsides. We just need to discuss both approaches and see.

Update: I created #87 for namespaces.

@certik
Member Author

certik commented Nov 15, 2019

> After more reflection: how is this concept any different from using SUBMODULE? There's a top-level MODULE with the public API, and then nested submodules in their own module-specific namespace.

Submodules do not allow you to import a symbol from a submodule. For example, let's say you have a parent module TOP and a submodule A. If A defines sin, you cannot import it from elsewhere. The only way to do that is if TOP itself defines sin (it can be implemented in A).
In other words, with submodules you have to move every declaration that should be importable to the TOP module.

With nested modules as I am proposing, the TOP module can be empty. You can still put things into it if you want to, but you don't have to. Just like in Python.

@aradi
Contributor

aradi commented Nov 15, 2019

Further advantage of nested modules over submodules: Nested modules would not (should not!) share names via host association.

If a big library has several levels of submodules, e.g. modules, submodules and sub-submodules, it can become very cumbersome to find out where a given name comes from. A name in a sub-submodule can be something local, something defined in the parent submodule, or something from the grandparent module, so you would have to look through three files to check.

I think submodules are a perfect way to separate interfaces from implementations in order to avoid recompilation cascades. (For this, you only need modules and one level of submodules.) But I don't think they are the right construct for structuring a big library.

@klausler

klausler commented Nov 15, 2019

Thanks for the clarification vs. submodules, I think that I understand the distinction.

So something like this would work for you?

MODULE(parent) child
  ...
END MODULE
...
MODULE parent
  USE, NESTED :: child
END MODULE

(with additional extensions to SUBMODULE so that they can still be used with nested modules)

EDIT: Make NESTED look like other module-nature keywords in the USE statement.

@certik
Member Author

certik commented Nov 15, 2019

@klausler I think that would work. And I want multiple levels of nesting, just like in Python.

It's just too bad that submodules were designed the way they were. It's a missed opportunity. I don't know if there is a way to adapt them to do what we want, or if the above must be a separate functionality.

@certik certik mentioned this issue Nov 15, 2019
@aradi
Contributor

aradi commented Nov 15, 2019

Even for nested modules it would make sense to divide each of them into an interface part and an implementation part to avoid recompilation cascades. For that, the submodule construct is perfect as it is now. So I think what we need is an additional mechanism enabling module nesting.

@klausler

> @klausler I think that would work. And I want multiple levels of nesting, just like in Python.
>
> It's just too bad that submodules were designed the way they were. It's a missed opportunity. I don't know if there is a way to adapt them to do what we want, or if the above must be a separate functionality.

Submodules seem entirely complementary to your nested modules. Submodules automatically inherit from their ancestors and supply implementations. Nested modules define interfaces (and implementations) to be composed by their parents. The information flows in opposite directions.

@certik
Member Author

certik commented Nov 15, 2019

Yes you are right, indeed the nested modules are exactly the opposite of submodules. So we should have both.

@aradi
Contributor

aradi commented Nov 15, 2019

In Python, the package hierarchy is tightly connected to the directory structure. In Fortran, since the standard lacks that concept, we cannot rely on any file system information for the hierarchy. That means the hierarchy information basically reduces to an arbitrary hierarchy prefix you can add to your modules (a kind of name mangling). The only rule we need, I guess, is that no module can import any other module above it in the hierarchy. Combined with namespaces (#1), we could get quite close to what Python offers, and we could still keep the submodule concept for separating interface and implementation. Based on @klausler's suggestion, I am thinking about something like the following:

! module without prefix
module level1
  use, namespace, relative :: level2
end module level1

! module with prefix
module (level1) level2
contains
  subroutine level2_test()
  end subroutine level2_test
end module level2

! module with prefix
module (level1%level2) level3
  interface
    module subroutine level3_test1()
    end subroutine level3_test1
    module subroutine level3_test2()
    end subroutine level3_test2
  end interface
end module level3

! submodule
submodule (level1%level2%level3) level3impl
contains
  ! Implements the separate module procedure declared in level3
  module procedure level3_test1
  end procedure level3_test1
end submodule level3impl

! sub-sub-module
submodule (level1%level2%level3:level3impl) level3impl2
contains
  ! Implements level3_test2, which level3impl left unimplemented
  module procedure level3_test2
  end procedure level3_test2
end submodule level3impl2


program test
  ! Imports level1 as namespace
  use, namespace :: level1
  use, namespace :: level1%level2%level3

  call level1%level2%level2_test()
  call level1%level2%level3%level3_test1()

end program test

It could also be realized without namespaces, if those turn out to be a problem; it would then correspond to the from level1.level2.level3 import * kind of import in Python.
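One plausible reading of the "no importing from above" rule, sketched as a hypothetical check in Python: a module may use its descendants, but never one of its ancestors. `is_allowed_use` is an illustrative name, and letting siblings use each other is an assumption, since the rule above only forbids modules above the importer.

```python
def is_allowed_use(user: str, used: str) -> bool:
    """Hypothetical check: may module `user` USE module `used`?

    Paths use the % spelling from the example, e.g. 'level1%level2%level3'.
    A use is rejected only when `used` is a proper ancestor of `user`.
    """
    user_parts = user.lower().split("%")
    used_parts = used.lower().split("%")
    used_is_ancestor = (len(used_parts) < len(user_parts)
                        and user_parts[:len(used_parts)] == used_parts)
    return not used_is_ancestor


print(is_allowed_use("level1", "level1%level2"))         # True: a parent may use its child
print(is_allowed_use("level1%level2%level3", "level1"))  # False: level1 is above level3
print(is_allowed_use("level1%level2", "level1%other"))   # True: siblings are not ancestors
```

This matches the example, where level1 uses level2 but no nested module uses level1.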

5 participants