Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Place ASCII control characters in derived type #49

Closed
wants to merge 1 commit into from
Closed

Place ASCII control characters in derived type #49

wants to merge 1 commit into from

Conversation

ivan-pi
Copy link
Member

@ivan-pi ivan-pi commented Dec 28, 2019

As discussed in #11, I have moved the ascii control characters into a derived type. A single instance of this type (with the parameter attribute) is exposed to the public and can be accesed using

use stdlib_experimental_ascii, only: ascii_control_char 
write(*,*) ascii_control_char%TAB//"Hello"//ascii_control_char%LF

The ascii character validation and conversion procedures are now elemental and work on character arrays. One of the tests has been modified to demonstrate this by allocating character arrays filled with different subsets of characters and generating a true/false table:

          is_control
          | is_printable
          | | is_whitespace
          | | | is_blank
          | | | | is_graphical
          | | | | | is_punctuation
          | | | | | | is_alphanum
          | | | | | | | is_alpha
          | | | | | | | | is_upper
          | | | | | | | | | is_lower
          | | | | | | | | | | is_digit
 decimal  | | | | | | | | | | | is_hex_digit
 -------------------------------------------
     0-8  T F F F F F F F F F F F
       9  T F T T F F F F F F F F
   10-13  T F T F F F F F F F F F
   14-31  T F F F F F F F F F F F
      32  F T T T F F F F F F F F
   33-47  F T F F T T F F F F F F
   48-57  F T F F T F T F F F T T
   58-64  F T F F T T F F F F F F
   65-70  F T F F T F T T T F F T
   71-90  F T F F T F T T T F F F
   91-96  F T F F T T F F F F F F
  97-102  F T F F T F T T F T F T
 103-122  F T F F T F T T F T F F
 123-126  F T F F T T F F F F F F
     127  T F F F F F F F F F F F

Last, I have replaced the whitechar function in stdlib_experimental_io with the is_blank function from the ascii module.

@ivan-pi
Copy link
Member Author

ivan-pi commented Dec 28, 2019

I tried to squash two commits. I am not sure what went wrong with the MacOS tests as the ubuntu tests run successfully.

@certik
Copy link
Member

certik commented Dec 29, 2019 via email

@milancurcic
Copy link
Member

milancurcic commented Dec 29, 2019

Looks good to me. @certik In what scenario would an application not want to use a derived type?

We should fix the MacOS builds before merging though. I don't know what's going on there and GitHub is not letting me expand the details for the step that fails.

@certik
Copy link
Member

certik commented Dec 29, 2019 via email

@gronki
Copy link

gronki commented Dec 29, 2019 via email

@ivan-pi
Copy link
Member Author

ivan-pi commented Dec 29, 2019

Thank you @certik and @gronki for explaining your opinions. It is good we discuss this usage case as it sets a precedence and will also influence future design choices.

If we do not want the control character constants to polute the stdlib_experimental_ascii namespace, aside from the current PR which wraps them in a derived type (only exposing a single instance publically), we could put them in a separate module stdlib_experimental_ascii_constants along with the other constant character sequences (letters, digits).

We could also just give them longer names, e.g. ascii_control_char_tab instead of the current tab, and thereby hopefully prevent name clashes.

@gronki Do you offer any other suggestions?

@jvdp1
Copy link
Member

jvdp1 commented Dec 29, 2019

If we do not want the control character constants to polute the stdlib_experimental_ascii namespace, aside from the current PR which wraps them in a derived type (only exposing a single instance publically), we could put them in a separate module stdlib_experimental_ascii_constants along with the other constant character sequences (letters, digits).

I agree with the comments. And this @ivan-pi 's solution is a solution I often use in my programs. If a user doesn't need constants, he doesn't need to call the module. While not ideal, it would also allow the use to simply write
use stdlib_experimental_ascii
without a need to specify what it just wants.

We could also just give them longer names, e.g. ascii_control_char_tab instead of the current tab, and thereby hopefully prevent name clashes

Maybe the name of the module could be used in front of the variable, e.g., for stdlib_experimental_ascii_constants, tab would become stdlib_ascii_constants_tab (where I remove the experimental since it would be eventually removed).
I am afraid that something like ascii_control_char_xxx is still too generic. Also, using the names of the module in the names of the variables may help to find its origin in complex libraries.

@ivan-pi ivan-pi mentioned this pull request Dec 29, 2019
@certik
Copy link
Member

certik commented Dec 29, 2019 via email

@zbeekman
Copy link
Member

@ivan-pi I suspect the failures are just "cloud" issues. I'm not a member of the fortran-lang org, however, so I cannot re-run the CI test to confirm this. Given that the failures won't expand their logs, this is most likely the cause of the issue.

@milancurcic
Copy link
Member

It doesn't quite apply in this PR, but a good example is a sparse CSR matrix, represented by three arrays Ap, Aj, Ax. We will provide operations on it, such as matmul. The lowest level public API should simply accept the three arrays as input arguments, not use a derived type CSRMatrix.

I agree, you're talking in the context of providing procedural/functional vs. object-oriented APIs. Here it's meant exactly to emulate a namespace.

@milancurcic
Copy link
Member

@gronki Can you elaborate why you think it's a bad design choice? What are the specific caveats or downsides? I don't see them right now.

No matter if we adopt it here or not, I'd bet that it'd come up again as we keep working on stdlib. It'd be good for all of us to understand what are the downsides. I don't care if it was originally intended for a purpose or not. What I care about is what is the problem we're trying to solve and does the proposed method solve it.

Different question is whether this is a problem at all. That's why I asked about this in #11.

@milancurcic
Copy link
Member

Let's put them in a separate module then? In that case their names can stay as is, cannot they?

I think this solves only the scenario of user doing use stdlib_experimental_ascii, to prevent populating the global namespace with 33 short-named constants. The same user would do use stdlib_experimental_ascii_constants to get the control characters, which is convenient but against our recommended use of modules (use ..., only ...:). The downside is small complexity overhead from having a separate module. I don't recommend this approach.

Alternatives that remain are:

  • Prefix control constants with ascii_. This would solve the problem of polluting a namespace in the event of use stdlib_experimental_ascii.
  • Leave as is (reject this PR) and reconsider only in the event of users complaining.

I like both latter options better than adding the constants module.

@certik
Copy link
Member

certik commented Dec 31, 2019

I restarted the checks, but I don't know if it did anything... GitHub actions might not be as production ready as I was hoping.

We can move to Azure pipelines, which can also check all three platforms (Linux, macOS and Windows) in one framework.

Otherwise we can do Travis + AppVeyor.

@zbeekman
Copy link
Member

The conflict may need to be resolved too. I'll try reproducing CI locally on macOS to see if there's anything obvious.

@milancurcic
Copy link
Member

Let's keep this ball rolling. How do we want to proceed?

Options:

  • Emulate namespace with derived type (this PR). Supported by @ivan-pi and myself, cautiously supported by @certik; @gronki against (if you can provide some concrete downsides or caveats it would help!)
  • Prefix control constants with ascii_. This would prevent polluting the global namespace with 33 short constants, but doesn't make it any easier to import;
  • Mark this as "not a problem" (reject this PR)

@jvdp1 @marshallward @jacobwilliams do you mind chiming in?

@milancurcic
Copy link
Member

Also, @zbeekman what do you think?

Copy link
Member

@zbeekman zbeekman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer having these constants packed into a derived type like this. Yes, a UDT is not a namespace, but I think that the advantages of encapsulating this group of related compile time constants makes sense. I think it is reasonable to assume that if a user wants access to one of them, they likely want (or at least don't mind) access to all of them without having to type out ten lines of only: ... in their import statement. Furthermore, if they don't like the DT syntax, they could very easily write their own "mixin" module that imports this ascii module and declares constants to be used the way they would like to use them. I don't want to muddy the waters here, but let me also lay out for you:

  1. Some concerns about portability due compiler support & bugs
  2. Some suggestions to make the interface more uniform
  3. A proposal (perhaps controversial, and perhaps exacerbating point 1 above) for an even more OO design approach

1. Compiler support (portability?)

I have seen some very strange behavior (most of my experience is with Intel and GNU) with both character compile time constants and character components of derived types. Often these bugs are not easily exercised until the DTs/parameters are used in a complex, real example. Just something to keep in mind, and not necessarily worth abandoning this proposed change over.

2. More uniform interface

For consistency I think that:

  • Other constants like fullhex_digits, letters etc. should be packed into a UDT as well
  • If this approach is adopted the UDT compile time constant should have a descriptive but not overly-verbose name like std_ascii_const or something like that to indicate its origin

4. A more OO approach

This may be controversial, and I'm not necessarily saying this is the right approach™️: What if everything was wrapped up in an object including the procedures as TBPs/methods?

The advantage is that you pull everything into client code with the class/UDT and so long as it has a decent name, there is only one entity that might have a name conflict with client code. You also only have one entity to import. The disadvantages are that you may encounter more compiler bugs (1 above) and that client code may need to type out longer procedure calls. Also, client code won't see at first glance the internal details of the class/object, but if this becomes a truly standard library that is well documented, then it is not unreasonable to assume that the documentation will be readily available and searchable, and that people will start becoming familiar with the pieces they frequently use.

At the end of the day I like this PR, but I would want to see the other constant character sequences wrapped up in a UDT as well, perhaps the same one as the control characters but with a different name, should this PR get accepted.

While the other approaches are acceptable to me, I think I'd rather be writing

std_ascii % tab

than

stdlib_experimental_ascii_tab

I'll mark this as approved, but I would like to see all constants included in a UDT, or none.

@certik
Copy link
Member

certik commented Jan 3, 2020

By priority, my first preference is to simply reject the PR. My personal opinion is that we should not use derived type for everything to try to emulate namespaces. Rather, let's work on fixing this issue: j3-fortran/fortran_proposals#1 and then this issue: j3-fortran/fortran_proposals#86. With those fixed, one could access it by std%ascii%tab. I recommend against doing use stdlib_ascii, but rather always use, stdlib_ascii, only: tab. So there is no polluting.

My second preference is to put them in a separate module, that will not be imported by default when you use stdlib_ascii, so it won't pollute the namespace.

My last preference is a derived type. But these particular constants will not be used that often, so if the majority wants a derived type, then I am fine with that.

More pragmatically, this is in the experimental module, so we can always revert this change later.

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2020

My preferences would be:

  1. to move all ASCII control characters AND constant character sequences in another module (with, e.g., stdlib_ascii_ or ascii_, in front of them).
  2. to put all ASCII control characters AND constant character sequences in a public derive type.
  3. to let the module as it was, BUT still adding stdlib_ascii_ or ascii_ (or similar) in front of the control characters and others. There will be always someone who will do use stdlib_ascii without only, and this will most likely lead to problems because of the existence of some ff, lf, bel,...

My feelind is that we should find, at least, something to avoid polluting namespaces by variables as ff, lf, bel,...

@ivan-pi
Copy link
Member Author

ivan-pi commented Jan 3, 2020

This may be controversial, and I'm not necessarily saying this is the right approach™️: What if everything was wrapped up in an object including the procedures as TBPs/methods?

If I understand correctly you mean something along the lines of:

type :: ascii_tools_t
contains
... procedures for character conversion, etc.
end type
type(ascii_tools_t) :: ascii_tools = ascii_tools_t()

The user would then import a single instance of this derived type and just call the TPB/methods, completely avoiding the hassle of listing ten different functions:

use std_ascii, only: at => ascii_tools
print *, at%is_upper('A') ! prints T

It is kind of like a swiss-army knife. You only take out the tool you need. I would like to see what the others thoughts are on this one, however I having the feeling it is somehow too radically different to what most users are accustomed too.


Given the replies from @certik, @jvdp1, and @zbeekman we are now at 50/50 for or against using a derived type to emulate a namespace. While it would be easier to judge with some real world usage examples in the end it might turn out more comfortable to move the constants to a different module, e.g. std_ascii_constants. Then someone who just needs the tab character and carriage return can easily do:

use stdlib_ascii_constants, only: ascii_TAB, ascii_CR

and someone who wants all control characters will just leave the only part out. You can always wrap the constants into a derived type in the client code (I am here reversing the argument of @zbeekman).

Under the solution with a separate module, would the sequences for letters, digits, whitespace, and punctuation stay in the current module or would they move to the new one?

Could we use this issue to further motivate the development of namespaces over at j3-fortran? With the namespace syntax, we could have:

use, namespace :: ctr => ascii_constants
print *, ctr%TAB//"Hello"//ctr%CR//ctr%LF

@certik
Copy link
Member

certik commented Jan 3, 2020

Could we use this issue to further motivate the development of namespaces over at j3-fortran? With the namespace syntax, we could have:

Yes!! Please do. Comment at the issue. Then go to j3-fortran/fortran_proposals#122, and put this high in your priority list. :)

@milancurcic
Copy link
Member

I'm not adding a separate module just for these constants. It only solves half of the problem and introduces complexity cost.

Considering we don't have majority agreement on way forward, I agree we should shelf this away (close PR) as @certik suggested.

@certik
Copy link
Member

certik commented Jan 5, 2020

Ok. Let's close this one for now.

This does not mean that we are saying "no" forever. It's just that we can't reach a solid agreement on this right now, and let's use our energy on things where we can reach an agreement. Once we get more users, we can revisit this.

@marshallward
Copy link

marshallward commented Jan 6, 2020

We have successfully used a class-like type in Australia's main climate model's library, libaccessom2. (All credit to the author, @nichannah, who designed and programmed this class, and would probably be very interested in the efforts going on here.)

The definition is based on procedure statements:

https://github.com/COSIMA/libaccessom2/blob/9fda758e017b5d1c55e7f39797da351317fb9390/libcouple/src/accessom2.F90#L19-L103

The overhead to load it is very low:

https://github.com/mom-ocean/MOM5/blob/50dc61e9d77c181e6ad0047925fafdcead64d87a/src/accessom_coupler/ocean_solo.F90#L111

Here is the accessom2 class in action:

https://github.com/mom-ocean/MOM5/blob/50dc61e9d77c181e6ad0047925fafdcead64d87a/src/accessom_coupler/ocean_solo.F90#L228-L257

I am a big fan of this approach. It worked well on our supercomputer, which was your typical hornet's nest of old and new libraries, so it is hopefully reasonably robust these days.

I could see a design approach using procedure adapted to the ASCII methods here.

And big apologies for taking so long to reply, I'm slowly getting caught up on all the activities going on here.

@epagone
Copy link

epagone commented Jan 9, 2020

As a user, I would prefer the OO implementation along the lines suggested by @zbeekman and @marshallward . As soon as the standard is fixed (as suggested by @certik), then the interface can be simplified or changed. My 2 cent.

@milancurcic
Copy link
Member

We have successfully used a class-like type in Australia's main climate model's library, libaccessom2. (All credit to the author, @nichannah, who designed and programmed this class, and would probably be very interested in the efforts going on here.)

Always nice to stumble upon datetime-fortran used in the wild :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants