[fparser2] capture symbol information #201

rupertford · 2019-06-01T23:05:04Z

Some of the things we need to do:

Capture primitive type of symbols representing data variables
Support symbols representing program/subroutine/function names
Support symbol renaming within use statements


rupertford · 2019-10-22T20:50:20Z

fparser2 implements the rules specified in the Fortran spec. However, in certain cases more than one rule can match, and it is not possible to determine which rule is valid without access to symbol table information.

The two known cases are

1: statement function or array access, where it is not possible to determine whether a statement is a statement function or an array access when it is the last declaration in a declaration block or the first executable statement in an execution block.

2: function or array access, where it is not possible to determine whether a statement is a (non-intrinsic) function or an array access.

There may be a third case, where a symbol is referenced before it is declared (e.g. real a(n); integer n), which needs checking. If this is a problem then we might need two passes, at least for declarations.

fparser2 should add a symbol table which it populates as code is parsed. I do not know whether the parser would then need to work in two phases or could simply use whatever information is currently in the symbol table as it goes along.

Note, there will still be cases where we don't know what something is (e.g. because it is declared in another module which has not been parsed), and we need to decide what to do in this circumstance. We could abort, always parse all referenced modules (I don't think this is feasible in general), or have an additional node which says it is one of n matches, i.e. we catch the ambiguity.
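As a rough illustration of populating a table while parsing and keeping undeclared names as an explicit ambiguity, here is a minimal Python sketch. The names Kind, SymbolTable, declare and classify are hypothetical and are not part of fparser2's API.

```python
from enum import Enum


class Kind(Enum):
    ARRAY = "array"
    FUNCTION = "function"
    UNKNOWN = "unknown"  # the "one of n matches" case


class SymbolTable:
    """Hypothetical table filled in as declarations are parsed."""

    def __init__(self):
        self._kinds = {}

    def declare(self, name, kind):
        # Fortran names are case-insensitive, so normalise on the way in.
        self._kinds[name.lower()] = kind

    def classify(self, name):
        """Say what `name(...)` refers to, or UNKNOWN if never declared."""
        return self._kinds.get(name.lower(), Kind.UNKNOWN)


table = SymbolTable()
table.declare("a", Kind.ARRAY)     # e.g. from `real :: a(10)`
table.declare("f", Kind.FUNCTION)  # e.g. from `real, external :: f`

print(table.classify("A"))  # declared as an array
print(table.classify("g"))  # never declared: the ambiguity is kept, not guessed
```

A rule matching `name(args)` could consult classify and only fall back to an "ambiguous" node when the result is UNKNOWN.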

rupertford · 2019-10-24T12:21:57Z

@pelson also found issues requiring symbol and contextual information, see #190 and #182

Some of the above problems are related to constraints that can't be checked at the moment, which could be checked with symbol table and related contextual information.

rupertford · 2020-01-31T15:53:52Z

This is also a problem when trying to distinguish between an array slice and a character string section.

        program test
        character(len=10) :: a
        a(1:3)='hey'
        end program test

        program test
        real :: a(10)
        a(1:3)=0.0
        end program test

See failing test in fortran2003/test_designator.py

This is due to two sub-rules in Designator (Fortran2003 R603) both matching when constraints are not enforced, and the constraints cannot be enforced because they are based on datatype, see Fortran2003 C619.
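To make the dependence on datatype concrete, here is a sketch of how the two programs above could be told apart once the declared type and rank of a are known. classify_section is a hypothetical helper, not fparser2 code, and it ignores derived types and other designator forms.

```python
def classify_section(declared_type, rank):
    """Classify `name(lo:hi)` from the declaration of `name`.

    declared_type: intrinsic type name, e.g. 'character' or 'real'.
    rank: number of array dimensions (0 for a scalar).
    """
    if rank > 0:
        # Any array, including a character array, gives an array section.
        return "array_section"
    if declared_type.lower() == "character":
        # A scalar character variable can only be a substring.
        return "substring"
    # A scalar of any other type cannot take a (lo:hi) suffix.
    return "invalid"


print(classify_section("character", 0))  # character(len=10) :: a
print(classify_section("real", 1))       # real :: a(10)
```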

reuterbal · 2020-02-12T15:33:24Z

Since I think the following is related / can be resolved by the same means, I put it here:

Variable names, procedure names, etc. can shadow intrinsic functions. Consider the following example:

>>> code = '''
... subroutine shadow()
... integer :: ibits(10)
... integer :: i
... do i=1,10
... ibits(i)=i
... end do
... i=ibits(5)
... end subroutine shadow
... '''
>>> from fparser.common.readfortran import FortranStringReader
>>> from fparser.two.parser import ParserFactory
>>> reader = FortranStringReader(code)
>>> f2008_parser = ParserFactory().create(std='f2008')
>>> ast = f2008_parser(reader)
...
fparser.two.utils.InternalSyntaxError: Intrinsic 'IBITS' expects 3 arg(s) but found 1.

The problem is the next to last line, where it is not obvious that this is an array element access and not a function call.

reuterbal · 2020-02-12T15:45:03Z

(Sorry for the double-post: meant to append the following but hit Ctrl+Enter instead of Enter. That happens when you write slack in another window at the same time.)

There are even more evil situations possible when the offending name is defined in a used module, for example:

>>> code = '''
... subroutine shadow2()
... use some_mod
... type(some_type) :: a, b, c
... real :: z
... z = dot_product(a, b, c)
... end subroutine shadow2
... '''
>>> reader = FortranStringReader(code)
>>> ast = f2008_parser(reader)
...
fparser.two.utils.InternalSyntaxError: Intrinsic 'DOT_PRODUCT' expects 2 arg(s) but found 3.

(I will neither confirm nor deny that such beauty exists in a, say, operational code base)

Any ideas how to overcome such situations? I can also put this in a separate issue since it might not even be mitigated by a symbol table alone but could require parsing the used module first.

rupertford · 2020-02-12T16:06:06Z

The operational code base that you may or may not be referring to would not be the only one. Another, more liquid-oriented model also has such a lovely example.

Actually I happen to be writing some (minimal) documentation that explains the general problem of matching multiple rules but that does not help solve the problem.

The current plan is to keep symbol and context information as we go along and then use that in rules to check constraints. This will sort out many of the ambiguities. In your first ibits example this will sort the problem out as we will be able to determine that ibits is actually an array and therefore not match it as an intrinsic.
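A minimal sketch of that check, assuming the names declared in the current scope are available as a set. INTRINSIC_ARGS and is_intrinsic_call are hypothetical names used for illustration, and the table holds only the two intrinsics mentioned in this thread.

```python
# Illustrative subset only: intrinsic name -> (min args, max args).
INTRINSIC_ARGS = {"ibits": (3, 3), "dot_product": (2, 2)}


def is_intrinsic_call(name, nargs, declared_names):
    """Return True only if `name(...)` should be matched as an intrinsic."""
    key = name.lower()
    if key in declared_names:
        # A local declaration shadows the intrinsic, so do not match it.
        return False
    if key not in INTRINSIC_ARGS:
        return False
    lo, hi = INTRINSIC_ARGS[key]
    if not lo <= nargs <= hi:
        # Message assumes lo == hi, which holds for this small table.
        raise ValueError(
            f"Intrinsic '{key.upper()}' expects {lo} arg(s) but found {nargs}."
        )
    return True


# `integer :: ibits(10)` and `integer :: i` were declared in subroutine shadow.
declared = {"ibits", "i"}
print(is_intrinsic_call("ibits", 1, declared))  # shadowed, so not an intrinsic
```

Without the declared-names check, the same call would raise the argument-count error shown earlier in the thread.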

Your second example is a general problem which compilers solve using .mod files. If we keep symbol and context information we should know what has not been defined. At this point we are thinking of having various options:
1. the associated include files are provided somehow and we parse those too. We already do something like this in PSyclone. The problem is that one can potentially recurse down through an arbitrary number of module files and may even end up with something that does not have source code (e.g. an mpi or netcdf module file).
2. require the user to supply the missing types in a config file (equivalent to manually writing a mod file)
3. raise an exception
4. smile and wave

At the moment we do 4. I don't like 3. unless a user explicitly asks for this to happen. I think we should try to do 1. and allow an option for 2. and in fact 1. could actually produce a file for 2. for future reference (i.e. our simple version of a mod file).
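The preference order described here could look roughly like the following sketch. Everything in it is an assumption for illustration: resolve, UnresolvedSymbolError, and the two dictionaries standing in for parsed modules and a user-supplied config are not fparser2 features.

```python
class UnresolvedSymbolError(Exception):
    """Raised only when the caller explicitly asks for strict behaviour."""


def resolve(name, module, parsed_modules, user_types, strict=False):
    """Look up what `name` from `module` is, trying options 1 to 4 in turn."""
    # Option 1: the module source was available and has been parsed.
    if name in parsed_modules.get(module, {}):
        return parsed_modules[module][name]
    # Option 2: the user supplied the information (a hand-written "mod file").
    if (module, name) in user_types:
        return user_types[(module, name)]
    # Option 3: fail, but only on request.
    if strict:
        raise UnresolvedSymbolError(f"{name} from {module} is not known")
    # Option 4: carry on without the information.
    return None


parsed = {"some_mod": {"some_type": "derived_type"}}
config = {("some_mod", "dot_product"): "function"}

print(resolve("some_type", "some_mod", parsed, config))
print(resolve("dot_product", "some_mod", parsed, config))
print(resolve("other", "netcdf", parsed, config))
```

Writing the results of option 1 back into the structure used for option 2 would give the simple mod-file equivalent mentioned above.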

reuterbal · 2020-02-12T16:23:33Z

Thanks for the quick reply!

The ibits example was indeed intended as one of those cases that can be fixed fairly easily once symbol information is kept, and it might be worth testing against.

I agree with your opinion on the second example. In our downstream tool we also do a very simplistic version of 1. to fill in type information.
Luckily, the number of such cases (that I have found so far) is rather small, thus I will probably use a variant of 2. as a short-term fix (maintain a list of offending files and names and regex-replace those on the fly in the source string).

arporter · 2020-07-24T12:25:59Z

A certain liquid-orientated model has code that redefines idim which, I was surprised to learn, is an (archaic) Fortran intrinsic. This then trips us up for all the reasons described above.

rupertford · 2021-01-09T01:20:51Z

Another example is the false matching of a structure constructor as an array access (designator). In general we would need to know whether the name was the name of a structure or the name of an array.


rupertford mentioned this issue Oct 22, 2019

[fparser2] Support statement functions #202

Open

rupertford mentioned this issue Oct 23, 2019

Statement function fix #214

Closed

arporter mentioned this issue Feb 13, 2020

array_element vs array_sections #213

Closed

arporter mentioned this issue Feb 12, 2021

Strict matching against Fortran intrinsics causes false Syntax Errors #291

Closed

arporter added a commit that referenced this issue Feb 12, 2021

#201 add initial sketch of symbol table module

c0f56bf

arporter added a commit that referenced this issue Feb 22, 2021

#201 get basic capturing of use statements [skip ci]

f5368f5

arporter added a commit that referenced this issue Feb 23, 2021

#201 use symbol table to spot simple instances of intrinsic shadowing

b727052

arporter added a commit that referenced this issue Feb 25, 2021

#201 change singleton to single, global tables object and only put to…

825203f

…p-level symbol tables into it.

arporter mentioned this issue Apr 20, 2021

Add basic symbol table functionality (towards #201) #293

Merged

rupertford added a commit that referenced this issue Dec 8, 2021

Merge pull request #293 from stfc/201_symbol_table

e0b14f8

Add basic symbol table functionality (towards #201)

This was referenced Jun 14, 2022

(Towards #201) Symbol table does not respect use association #349

Closed

Allow symbol checks to be disabled #350

Closed
