
[fparser2] capture symbol information #201

Open
3 tasks
rupertford opened this issue Jun 1, 2019 · 9 comments


@rupertford
Collaborator

rupertford commented Jun 1, 2019

Some of the things we need to do:

  • Capture primitive type of symbols representing data variables
  • Support symbols representing program/subroutine/function names
  • Support symbol renaming within use statements
@rupertford
Collaborator Author

fparser2 implements the rules specified in the Fortran standard. However, in certain cases more than one rule can match, and it is not possible to determine which rule is valid without access to symbol table information.

The two known cases are:

1: Statement function or array access. When a statement such as f(x) = ... is the last statement in a declaration block or the first executable statement in an execution block, it is not possible to determine whether it is a statement-function statement or an assignment to an array element.

2: Function or array access. It is not possible to determine whether a reference such as f(i) is a call to a (non-intrinsic) function or an array element access.

There may be a third case, which needs checking, where a symbol is referenced before it is declared, e.g. real a(n); integer n. If this is a problem then we might need two passes, at least for declarations?
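To illustrate why two passes help with the forward-reference case, here is a minimal Python sketch. It works on a hypothetical pre-tokenised list of declarations, not on fparser's own data structures, and it only shows the mechanism; it makes no claim about which forward references the Fortran standard permits.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-tokenised declaration: (type, name, names used in its dimensions).
# This is NOT fparser's representation, just enough to show the idea.
Decl = Tuple[str, str, List[str]]


def one_pass(decls: List[Decl]) -> List[str]:
    """Single pass: report dimension names not yet declared at the point of use."""
    seen: Dict[str, str] = {}
    problems = []
    for type_name, name, dim_names in decls:
        for dim in dim_names:
            if dim not in seen:
                problems.append(f"{dim} used in {name} before declaration")
        seen[name] = type_name
    return problems


def two_pass(decls: List[Decl]) -> List[str]:
    """Pass 1 collects every declaration; pass 2 checks the dimension names."""
    seen = {name: type_name for type_name, name, _ in decls}
    return [
        f"{dim} used in {name} is undeclared"
        for _, name, dim_names in decls
        for dim in dim_names
        if dim not in seen
    ]
```

For real a(n); integer n, encoded as [("real", "a", ["n"]), ("integer", "n", [])], the single pass flags n while the two-pass version finds nothing missing.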

fparser2 should maintain a symbol table that it populates as code is parsed. I do not know whether the parser would then need to work in two phases or could simply use the information currently in the symbol table as it goes along.

Note that there will still be cases where we don't know what something is (e.g. because it is declared in another module that has not been parsed), and we need to decide what to do in this circumstance. We could abort, always parse all referenced modules (I don't think this is feasible in general), or add a node which says it is one of n matches, i.e. we capture the ambiguity.
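A minimal Python sketch of the idea: a per-scope symbol table filled in as declarations are parsed, and a lookup that classifies name(...) as an array element or a function reference, falling back to an explicit "ambiguous" result when the name is unknown. SymbolTable, Kind and classify_reference are hypothetical names for illustration, not existing fparser classes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional


class Kind(Enum):
    """What a name refers to, as far as the parser knows (hypothetical)."""
    ARRAY = auto()
    FUNCTION = auto()
    STATEMENT_FUNCTION = auto()


@dataclass
class SymbolTable:
    """Hypothetical per-scope table, filled in as declarations are parsed."""
    symbols: Dict[str, Kind] = field(default_factory=dict)

    def add(self, name: str, kind: Kind) -> None:
        # Fortran names are case-insensitive, so store them in lower case.
        self.symbols[name.lower()] = kind

    def lookup(self, name: str) -> Optional[Kind]:
        return self.symbols.get(name.lower())


def classify_reference(name: str, table: SymbolTable) -> str:
    """Decide what name(...) means, or record that it is ambiguous."""
    kind = table.lookup(name)
    if kind is Kind.ARRAY:
        return "array-element"
    if kind in (Kind.FUNCTION, Kind.STATEMENT_FUNCTION):
        return "function-reference"
    # Unknown name (e.g. declared in a module that was not parsed):
    # keep both interpretations instead of guessing.
    return "ambiguous(array-element | function-reference)"
```

The "ambiguous" string stands in for the additional node suggested above, which would record every rule that matched.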

@rupertford
Collaborator Author

@pelson also found issues requiring symbol and contextual information; see #190 and #182.

Some of the above problems relate to constraints that cannot be checked at the moment but could be checked with a symbol table and related contextual information.

@rupertford
Collaborator Author

rupertford commented Jan 31, 2020

This is also a problem when trying to distinguish between an array section and a character substring.

        program test
        character(len=10) :: a
        a(1:3)='hey'
        end program test

        program test
        real :: a(10)
        a(1:3)=0.0
        end program test

See the failing test in fortran2003/test_designator.py.

This is because two sub-rules of Designator (Fortran2003 R603) both match when constraints are not enforced, and the constraints cannot be enforced because they depend on the data type (see Fortran2003 C619).
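Once the declaration of a is known, choosing between the two sub-rules becomes a simple check on its type and rank. A minimal Python sketch under that assumption; Decl and designator_kind are hypothetical names, and the rule encoded here is a simplification of the constraints, not a transcription of C619.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Decl:
    """Hypothetical record of a declaration: intrinsic type name and rank."""
    type_name: str  # e.g. "character", "real"
    rank: int       # 0 for scalars


def designator_kind(name: str, decls: Dict[str, Decl]) -> str:
    """Classify name(lo:hi) given the (lower-case keyed) declarations."""
    decl: Optional[Decl] = decls.get(name.lower())
    if decl is None:
        return "ambiguous"      # no symbol information: both rules still match
    if decl.rank > 0:
        return "array-section"  # e.g. real :: a(10); a(1:3)=0.0
    if decl.type_name == "character":
        return "substring"      # e.g. character(len=10) :: a; a(1:3)='hey'
    return "invalid"            # a non-character scalar cannot be sectioned
```

Applied to the two test programs above, the first a(1:3) is classified as a substring and the second as an array section.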

@reuterbal
Collaborator

Since I think the following is related (or can be resolved by the same means), I am putting it here:

Variable names, procedure names, etc. can shadow intrinsic functions. Consider the following example:

>>> code = '''
... subroutine shadow()
... integer :: ibits(10)
... integer :: i
... do i=1,10
... ibits(i)=i
... end do
... i=ibits(5)
... end subroutine shadow
... '''
>>> from fparser.common.readfortran import FortranStringReader
>>> from fparser.two.parser import ParserFactory
>>> reader = FortranStringReader(code)
>>> f2008_parser = ParserFactory().create(std='f2008')
>>> ast = f2008_parser(reader)
Traceback (most recent call last):
...
fparser.two.utils.InternalSyntaxError: Intrinsic 'IBITS' expects 3 arg(s) but found 1.

The problem is the next-to-last line, i=ibits(5), where it is not obvious to the parser that this is an array element access rather than a call to the intrinsic function IBITS.

@reuterbal
Collaborator

(Sorry for the double post: I meant to append the following but hit Ctrl+Enter instead of Enter. That happens when you are writing in Slack in another window at the same time.)

Even more evil situations are possible when the offending name is defined in a used module, for example:

>>> code = '''
... subroutine shadow2()
... use some_mod
... type(some_type) :: a, b, c
... real :: z
... z = dot_product(a, b, c)
... end subroutine shadow2
... '''
>>> reader = FortranStringReader(code)
>>> ast = f2008_parser(reader)
Traceback (most recent call last):
...
fparser.two.utils.InternalSyntaxError: Intrinsic 'DOT_PRODUCT' expects 2 arg(s) but found 3.

(I will neither confirm nor deny that such beauty exists in, say, an operational code base.)

Any ideas on how to overcome such situations? I can also put this in a separate issue, since it might not be solvable with a symbol table alone and could require parsing the used module first.

@rupertford
Collaborator Author

The operational code base that you may or may not be referring to would not be the only one. Another, more liquid-oriented, model also has such a lovely example.

As it happens, I am writing some (minimal) documentation that explains the general problem of multiple rules matching, but that does not help solve it.

The current plan is to keep symbol and context information as we go along and then use it in rules to check constraints. This will resolve many of the ambiguities. In your first example it will fix the problem, as we will be able to determine that ibits is actually an array and therefore not match it as an intrinsic.
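A minimal Python sketch of how a symbol lookup could stop a local array from being matched as an intrinsic: local declarations are checked first, and only unshadowed names reach the intrinsic argument-count check. INTRINSIC_ARG_COUNTS and resolve_call are hypothetical; the error text only mimics the message shown in the report above.

```python
from typing import Dict, Set

# Hypothetical subset of an intrinsic table: name -> required argument count.
INTRINSIC_ARG_COUNTS: Dict[str, int] = {"ibits": 3, "dot_product": 2}


def resolve_call(name: str, nargs: int, local_arrays: Set[str]) -> str:
    """Resolve name(args), letting local declarations shadow intrinsics."""
    lname = name.lower()
    if lname in local_arrays:
        # A local array of the same name shadows the intrinsic, so this is
        # an array element access and no argument-count check applies.
        return "array-element"
    if lname in INTRINSIC_ARG_COUNTS:
        expected = INTRINSIC_ARG_COUNTS[lname]
        if nargs != expected:
            raise SyntaxError(
                f"Intrinsic '{name.upper()}' expects {expected} arg(s) "
                f"but found {nargs}."
            )
        return "intrinsic-call"
    return "unknown"
```

With integer :: ibits(10) recorded, resolve_call("ibits", 1, {"ibits"}) returns "array-element"; without that declaration the same call raises the argument-count error.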

Your second example is a general problem which compilers solve using .mod files. If we keep symbol and context information, we should at least know what has not been defined. At this point we are considering the following options:
1. The associated module source files are provided somehow and we parse those too. We already do something like this in PSyclone. The problem is that one can potentially recurse down through an arbitrary number of module files and may even end up with something that does not have source code (e.g. an MPI or netCDF module file).
2. Require the user to supply the missing types in a config file (equivalent to manually writing a .mod file).
3. Raise an exception.
4. Smile and wave.

At the moment we do 4. I don't like 3 unless a user explicitly asks for it. I think we should try to do 1 and allow an option for 2; in fact, 1 could produce a file for 2 for future reference (i.e. our simple version of a .mod file).
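The options above can be combined into a fallback chain. A minimal Python sketch, assuming a hypothetical JSON "mod summary" format that maps module name to symbol name to kind; resolve_used_symbol, dump_mod_summary and UnresolvedSymbolError are illustrative names, not part of fparser.

```python
import json
from typing import Dict, List, Optional

# Hypothetical summary format: module -> {symbol -> kind}.
ModuleSymbols = Dict[str, Dict[str, str]]


class UnresolvedSymbolError(Exception):
    """Raised only when the user explicitly asks for strict resolution."""


def resolve_used_symbol(name: str, used_modules: List[str],
                        parsed: ModuleSymbols, config: ModuleSymbols,
                        strict: bool = False) -> Optional[str]:
    lname = name.lower()
    for module in used_modules:
        # Option 1: symbols gathered by parsing the module's source.
        if lname in parsed.get(module, {}):
            return parsed[module][lname]
        # Option 2: symbols supplied by the user (a hand-written "mod file").
        if lname in config.get(module, {}):
            return config[module][lname]
    if strict:
        # Option 3: only when the user asks for it.
        raise UnresolvedSymbolError(f"cannot resolve '{name}' from {used_modules}")
    # Option 4: carry on without the information.
    return None


def dump_mod_summary(parsed: ModuleSymbols) -> str:
    """Serialise symbols found via option 1 so they can be reused as option 2 input."""
    return json.dumps(parsed, indent=2, sort_keys=True)
```

For the shadow2 example, a config entry {"some_mod": {"dot_product": "function"}} would let DOT_PRODUCT resolve to the module's function rather than the intrinsic.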

@reuterbal
Collaborator

Thanks for the quick reply!

The ibits example was indeed intended as one of the cases that can be fixed fairly easily once symbol information is kept, and might be worth testing against.

I agree with your opinion on the second example. In our downstream tool we also do a very simplistic version of 1 to fill in type information.
Luckily, the number of such cases I have found so far is rather small, so I will probably use a variant of 2 as a short-term fix (maintaining a list of offending files and names and regex-replacing them on the fly in the source string).

@arporter
Member

A certain liquid-orientated model has code that redefines idim which, I was surprised to learn, is an (archaic) Fortran intrinsic. This then trips us up for all the reasons described above.

@rupertford
Collaborator Author

Another example is the false matching of a structure constructor as an array access (designator). In general, we would need to know whether the name is the name of a derived type or the name of an array.
