Place ASCII control characters in derived type #49
…IO module with is_blank.
I tried to squash two commits. I am not sure what went wrong with the macOS tests, as the Ubuntu tests run successfully. |
I think this is fine. Generally it's better not to use derived types in a library because it forces all applications to also use them, even if they don't want to. But in this case it seems OK, since it's used more like a namespace.
|
Looks good to me. @certik In what scenario would an application not want to use a derived type? We should fix the MacOS builds before merging though. I don't know what's going on there and GitHub is not letting me expand the details for the step that fails. |
It doesn't quite apply in this PR, but a good example is a sparse CSR matrix, represented by three arrays Ap, Aj, Ax. We will provide operations on it, such as matmul. The lowest level public API should simply accept the three arrays as input arguments, not use a derived type CSRMatrix. The reason for that is that as an application I would like to have a choice what data structures I use. If we only accept CSRMatrix as an argument then the application is forced to use it and what if this is some big production code that already uses its own derived type for CSR matrix, perhaps with a few more members such as name, or some other application specific metadata? Then the application is forced to create CSRMatrix derived type and copy the arrays there, or it is forced to redo its data structures. Not optimal. If on the other hand we provide an API that only accepts the three arrays directly, then the application simply passes the arrays in from its internal data structure directly.
Stdlib can still optionally provide a higher level API that uses the CSRMatrix derived type.
So the answer is: as an application I would almost always like to use a derived type for a CSR matrix, but on my own terms. Not forced by a library like stdlib.
CSR matrix is an example where most people including me would agree a derived type simplifies code and generally is appropriate for an end application. Other use cases, such as saveppm and creating an Image_t derived type, are much worse. The best is to leave applications the freedom to decide what derived types to use and when; and in stdlib to provide the actual functionality to operate on a CSR matrix or PPM images.
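A minimal sketch of the array-level API described above, assuming a hypothetical routine `csr_matvec` (the comment mentions a matmul-style operation but does not name it) and 1-based CSR indexing. The application keeps its own derived type, here the made-up `app_matrix_t` with an extra `name` member, and passes its arrays straight in without copying them into a library type.

```fortran
module csr_sketch
  implicit none
  private
  public :: csr_matvec   ! hypothetical name, not an stdlib routine

contains

  ! y = A*x for a matrix stored as three plain CSR arrays (1-based):
  ! row i owns the entries Ap(i) .. Ap(i+1)-1 of Aj (columns) and Ax (values).
  pure function csr_matvec(Ap, Aj, Ax, x) result(y)
    integer, intent(in) :: Ap(:), Aj(:)
    real, intent(in) :: Ax(:), x(:)
    real :: y(size(Ap) - 1)
    integer :: i, k
    do i = 1, size(Ap) - 1
      y(i) = 0.0
      do k = Ap(i), Ap(i + 1) - 1
        y(i) = y(i) + Ax(k) * x(Aj(k))
      end do
    end do
  end function csr_matvec

end module csr_sketch

program csr_demo
  use csr_sketch, only: csr_matvec
  implicit none

  ! The application's own type, with metadata the library knows nothing about.
  type :: app_matrix_t
    character(len=16) :: name
    integer, allocatable :: Ap(:), Aj(:)
    real, allocatable :: Ax(:)
  end type app_matrix_t

  type(app_matrix_t) :: A
  real :: y(2)

  ! The 2x2 matrix [[1, 2], [0, 3]] in CSR form.
  A%name = 'demo'
  A%Ap = [1, 3, 4]
  A%Aj = [1, 2, 2]
  A%Ax = [1.0, 2.0, 3.0]

  ! The arrays are passed directly from the application's data structure.
  y = csr_matvec(A%Ap, A%Aj, A%Ax, [1.0, 1.0])
  print *, y
end program csr_demo
```

A higher-level wrapper taking a library-defined CSR type could be layered on top of the same routine, which is the optional second API the comment allows for.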
|
I agree with @certik that placing character constants in a derived type is a bad design decision. Derived types are not designed to store constants. They are not namespaces, and they should not be used as namespaces. If somebody cares about pollution of the namespace, they can use the `only` clause, which was designed exactly for that. I think we should use things as they are intended and not poorly implement ideas from other languages. Otherwise the result will be a Frankenstein.
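For illustration, a small sketch of the `only` approach mentioned above. The module name `ascii_flat_sketch` and its contents are assumptions made up for this example; only the `use ..., only:` syntax with a local rename is standard Fortran.

```fortran
module ascii_flat_sketch
  implicit none
  ! Plain module-level constants, no derived type involved.
  character(len=1), parameter :: tab = achar(9)
  character(len=1), parameter :: lf = achar(10)
  character(len=1), parameter :: cr = achar(13)
end module ascii_flat_sketch

program only_demo
  ! Import a single constant and rename it locally; lf and cr stay out of scope.
  use ascii_flat_sketch, only: ascii_tab => tab
  implicit none
  integer :: tab   ! the client's own "tab" does not clash with the module's

  tab = 4
  print *, iachar(ascii_tab), tab
end program only_demo
```

The cost is that a client who wants many constants has to list each of them, which is the trade-off debated in the rest of the thread.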
|
Thank you @certik and @gronki for explaining your opinions. It is good that we discuss this use case, as it sets a precedent and will also influence future design choices. If we do not want the control character constants to pollute the `stdlib_experimental_ascii` namespace, aside from the current PR which wraps them in a derived type (only exposing a single instance publicly), we could put them in a separate module `stdlib_experimental_ascii_constants` along with the other constant character sequences (letters, digits). We could also just give them longer names, e.g. `ascii_control_char_tab` instead of the current `tab`, and thereby hopefully prevent name clashes. @gronki Do you offer any other suggestions? |
I agree with the comments. And this @ivan-pi's solution is one I often use in my programs. If a user doesn't need the constants, they don't need to use the module. While not ideal, it would also allow the user to simply write `use stdlib_experimental_ascii` without needing to specify what they want.
Maybe the name of the module could be used in front of the variable, e.g., for `stdlib_experimental_ascii_constants`, `tab` would become `stdlib_ascii_constants_tab` (where I remove the `experimental` since it would eventually be removed). I am afraid that something like `ascii_control_char_xxx` is still too generic. Also, using the name of the module in the names of the variables may help to find their origin in complex libraries. |
Let's put them in a separate module then? In that case their names can stay as they are, can't they?
There is an open issue to make modules work as namespaces here:
j3-fortran/fortran_proposals#1
which would solve a lot of these issues. I do not recommend people to "use some_module" without the "only" part. Just like in Python.
|
@ivan-pi I suspect the failures are just "cloud" issues. I'm not a member of the fortran-lang org, however, so I cannot re-run the CI test to confirm this. Given that the failures won't expand their logs, this is most likely the cause of the issue. |
I agree, you're talking in the context of providing procedural/functional vs. object-oriented APIs. Here it's meant exactly to emulate a namespace. |
@gronki Can you elaborate why you think it's a bad design choice? What are the specific caveats or downsides? I don't see them right now. No matter if we adopt it here or not, I'd bet that it'd come up again as we keep working on stdlib. It'd be good for all of us to understand what are the downsides. I don't care if it was originally intended for a purpose or not. What I care about is what is the problem we're trying to solve and does the proposed method solve it. Different question is whether this is a problem at all. That's why I asked about this in #11. |
I think this solves only the scenario of user doing Alternatives that remain are:
I like both latter options better than adding the constants module. |
I restarted the checks, but I don't know if it did anything... GitHub actions might not be as production ready as I was hoping. We can move to Azure pipelines, which can also check all three platforms (Linux, macOS and Windows) in one framework. Otherwise we can do Travis + AppVeyor. |
The conflict may need to be resolved too. I'll try reproducing CI locally on macOS to see if there's anything obvious. |
Let's keep this ball rolling. How do we want to proceed? Options:
@jvdp1 @marshallward @jacobwilliams do you mind chiming in? |
Also, @zbeekman what do you think? |
I personally prefer having these constants packed into a derived type like this. Yes, a UDT is not a namespace, but I think that encapsulating this group of related compile-time constants makes sense. I think it is reasonable to assume that if a user wants access to one of them, they likely want (or at least don't mind) access to all of them without having to type out ten lines of `only: ...` in their import statement. Furthermore, if they don't like the DT syntax, they could very easily write their own "mixin" module that imports this ascii module and declares constants to be used the way they would like to use them. I don't want to muddy the waters here, but let me also lay out for you:
- Some concerns about portability due to compiler support and bugs
- Some suggestions to make the interface more uniform
- A proposal (perhaps controversial, and perhaps exacerbating point 1 above) for an even more OO design approach
1. Compiler support (portability?)
I have seen some very strange behavior (most of my experience is with Intel and GNU) with both character compile-time constants and character components of derived types. Often these bugs are not easily exercised until the DTs/`parameter`s are used in a complex, real example. Just something to keep in mind, and not necessarily worth abandoning this proposed change over.
2. More uniform interface
For consistency I think that:
- Other constants like `fullhex_digits`, `letters` etc. should be packed into a UDT as well
- If this approach is adopted, the UDT compile-time constant should have a descriptive but not overly verbose name like `std_ascii_const` or something like that to indicate its origin
3. A more OO approach
This may be controversial, and I'm not necessarily saying this is the right approach™️: What if everything was wrapped up in an object, including the procedures as TBPs/methods?
The advantage is that you pull everything into client code with the class/UDT and, so long as it has a decent name, there is only one entity that might have a name conflict with client code. You also only have one entity to import. The disadvantages are that you may encounter more compiler bugs (1 above) and that client code may need to type out longer procedure calls. Also, client code won't see at first glance the internal details of the class/object, but if this becomes a truly standard library that is well documented, then it is not unreasonable to assume that the documentation will be readily available and searchable, and that people will start becoming familiar with the pieces they frequently use.
At the end of the day I like this PR, but I would want to see the other constant character sequences wrapped up in a UDT as well, perhaps the same one as the control characters but with a different name, should this PR get accepted.
While the other approaches are acceptable to me, I think I'd rather be writing `std_ascii % tab` than `stdlib_experimental_ascii_tab`.
I'll mark this as approved, but I would like to see all constants included in a UDT, or none.
By priority, my first preference is to simply reject the PR. My personal opinion is that we should not use derived types for everything to try to emulate namespaces. Rather, let's work on fixing this issue: j3-fortran/fortran_proposals#1 and then this issue: j3-fortran/fortran_proposals#86. With those fixed, one could access it by
My second preference is to put them in a separate module, that will not be imported by default when you
My last preference is a derived type. But these particular constants will not be used that often, so if the majority wants a derived type, then I am fine with that. More pragmatically, this is in the experimental module, so we can always revert this change later. |
My preferences would be:
My feeling is that we should find, at least, something to avoid polluting namespaces by variables as |
If I understand correctly you mean something along the lines of:
    type :: ascii_tools_t
    contains
      ... procedures for character conversion, etc.
    end type
    type(ascii_tools_t) :: ascii_tools = ascii_tools_t()
The user would then import a single instance of this derived type and just call the TBPs/methods, completely avoiding the hassle of listing ten different functions:
    use std_ascii, only: at => ascii_tools
    print *, at%is_upper('A') ! prints T
It is kind of like a Swiss Army knife. You only take out the tool you need. I would like to see what the others' thoughts are on this one; however, I have the feeling it is somehow too radically different from what most users are accustomed to.
Given the replies from @certik, @jvdp1, and @zbeekman we are now at 50/50 for or against using a derived type to emulate a namespace. While it would be easier to judge with some real-world usage examples, in the end it might turn out more comfortable to move the constants to a different module, e.g.
and someone who wants all control characters will just leave the
Under the solution with a separate module, would the sequences for letters, digits, whitespace, and punctuation stay in the current module or would they move to the new one?
Could we use this issue to further motivate the development of namespaces over at j3-fortran? With the namespace syntax, we could have:
|
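The type-bound-procedure idea discussed above can be filled out into a compilable sketch. Everything here is an assumption for illustration: the module name `std_ascii_sketch`, the two procedures chosen, and the use of `nopass` so that the methods need no object argument. It is not the stdlib implementation.

```fortran
module std_ascii_sketch
  implicit none
  private
  public :: ascii_tools_t, ascii_tools

  ! A type with no data, used only to group procedures behind one name.
  type :: ascii_tools_t
  contains
    procedure, nopass :: is_upper
    procedure, nopass :: is_digit
  end type ascii_tools_t

  ! The single instance that client code imports.
  type(ascii_tools_t) :: ascii_tools = ascii_tools_t()

contains

  ! True for the ASCII letters 'A'..'Z' (codes 65..90).
  pure logical function is_upper(c)
    character(len=1), intent(in) :: c
    is_upper = iachar(c) >= 65 .and. iachar(c) <= 90
  end function is_upper

  ! True for the ASCII digits '0'..'9' (codes 48..57).
  pure logical function is_digit(c)
    character(len=1), intent(in) :: c
    is_digit = iachar(c) >= 48 .and. iachar(c) <= 57
  end function is_digit

end module std_ascii_sketch

program tools_demo
  use std_ascii_sketch, only: at => ascii_tools
  implicit none
  print *, at%is_upper('A')
  print *, at%is_digit('x')
end program tools_demo
```

Client code imports one name and reaches every procedure through it, at the price of longer call expressions and of leaning on compiler support for type-bound procedures, which is the portability concern raised earlier in the thread.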
Yes!! Please do. Comment at the issue. Then go to j3-fortran/fortran_proposals#122, and put this high in your priority list. :) |
I'm not adding a separate module just for these constants. It only solves half of the problem and introduces a complexity cost. Considering we don't have majority agreement on the way forward, I agree we should shelve this (close the PR) as @certik suggested. |
Ok. Let's close this one for now. This does not mean that we are saying "no" forever. It's just that we can't reach a solid agreement on this right now, and let's use our energy on things where we can reach an agreement. Once we get more users, we can revisit this. |
We have successfully used a class-like type in Australia's main climate model's library, libaccessom2. (All credit to the author, @nichannah, who designed and programmed this class, and would probably be very interested in the efforts going on here.) The definition is based on The overhead to load it is very low: Here is the I am a big fan of this approach. It worked well on our supercomputer, which was your typical hornet's nest of old and new libraries, so it is hopefully reasonably robust these days. I could see a design approach using And big apologies for taking so long to reply, I'm slowly getting caught up on all the activities going on here. |
As a user, I would prefer the OO implementation along the lines suggested by @zbeekman and @marshallward. As soon as the standard is fixed (as suggested by @certik), the interface can be simplified or changed. My 2 cents. |
Always nice to stumble upon datetime-fortran used in the wild :) |
As discussed in #11, I have moved the ASCII control characters into a derived type. A single instance of this type (with the parameter attribute) is exposed to the public and can be accessed using
The ascii character validation and conversion procedures are now elemental and work on character arrays. One of the tests has been modified to demonstrate this by allocating character arrays filled with different subsets of characters and generating a true/false table:
Last, I have replaced the `whitechar` function in `stdlib_experimental_io` with the `is_blank` function from the ascii module.
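As a rough sketch of the design this PR describes, the following shows a derived type holding a few control characters, one public `parameter` instance of it, and an elemental `is_blank` applied to a character array. The names `ascii_const_sketch`, `ascii_control_char_t` and `ascii_control_char`, the subset of constants, and the definition of `is_blank` as "space or tab" are assumptions for this example; the PR's actual identifiers are not shown in this thread.

```fortran
module ascii_const_sketch
  implicit none
  private
  public :: ascii_control_char_t, ascii_control_char, is_blank

  ! Control characters stored as default-initialized components.
  type :: ascii_control_char_t
    character(len=1) :: nul = achar(0)
    character(len=1) :: tab = achar(9)
    character(len=1) :: lf = achar(10)
    character(len=1) :: cr = achar(13)
    character(len=1) :: del = achar(127)
  end type ascii_control_char_t

  ! The only instance exposed to users; it is a compile-time constant.
  type(ascii_control_char_t), parameter :: ascii_control_char = ascii_control_char_t()

contains

  ! Elemental, so it accepts a scalar character or an array of characters.
  elemental logical function is_blank(c)
    character(len=1), intent(in) :: c
    is_blank = (c == ' ') .or. (c == ascii_control_char%tab)
  end function is_blank

end module ascii_const_sketch

program const_demo
  use ascii_const_sketch, only: cc => ascii_control_char, is_blank
  implicit none
  character(len=1) :: chars(3)

  chars = [' ', cc%tab, 'x']
  ! One call classifies the whole array and returns a logical array.
  print *, is_blank(chars)
end program const_demo
```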