Procedures for detecting endianness #323

Open
ivan-pi opened this issue Feb 16, 2021 · 7 comments
Labels: idea (Proposition of an idea and opening an issue to discuss it); topic: utilities (containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...)

Comments

@ivan-pi
Member

ivan-pi commented Feb 16, 2021

Motivated by the question from @vmagnin at Discourse. For those not familiar with the term endianness (quoting the D language documentation):

Endianness refers to the order in which multibyte types are stored. The two main orders are big endian and little endian. The compiler predefines the version identifier BigEndian or LittleEndian depending on the order of the target system. The x86 systems are all little endian.

The times when endianness matters are:

  1. When reading data from an external source (like a file) written in a different endian format.
  2. When reading or writing individual bytes of a multibyte type like longs or doubles.

Another place where endianness matters is in network stacks and communication protocols. All of the protocol layers in the Transmission Control Protocol and the Internet Protocol (TCP/IP) suite are defined to be big-endian.
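For the first case in the list above (reading data written in a different endian format), a minimal C sketch of how a 32-bit value could be decoded from a byte buffer without knowing the host byte order. The names `read_be32` and `read_le32` are hypothetical and do not come from any library discussed here.

```c
#include <stdint.h>

/* Assemble a 32-bit unsigned value from four bytes stored in big-endian
   (network) order. Only shifts on values are used, so the result does not
   depend on the byte order of the host. */
uint32_t read_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] <<  8) |
            (uint32_t)buf[3];
}

/* Same idea for data written in little-endian order. */
uint32_t read_le32(const unsigned char *buf)
{
    return ((uint32_t)buf[3] << 24) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[1] <<  8) |
            (uint32_t)buf[0];
}
```

Because the bytes are combined arithmetically rather than reinterpreted in place, no endianness detection is needed for this particular task.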


Would there be any interest in providing procedures to detect platform endianness at run-time?

In C this can be done with the following program (taken from the IBM Developer guide Writing endian-independent code in C):

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN    1

int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
Alternative solution in C - 1

C99 solution taken from here: https://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program

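#include <stdbool.h>
#include <stdint.h>

bool is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    /* On a big-endian machine the most significant byte (0x01) is stored first. */
    return bint.c[0] == 1;
}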

A solution using deprecated Fortran features is given here:

      subroutine endian(litend)

c     checks if this is a little endian machine
c     returns litend=.true. if it is, litend=.false. if not

      integer*1 j(2)
      integer*2 i
      equivalence (i,j)
      logical litend

      i = 1
      if (j(1).eq.1) then
         litend = .true.
      else
         litend = .false.
      end if

      end

A modern Fortran equivalent could be something like:

pure logical function little_endian()
  use, intrinsic :: iso_fortran_env, only: int8, int16
  integer(int8) :: j(2)
  integer(int16) :: i
  i = 1
  ! Reinterpret the two bytes of i; the low-order byte comes first on little-endian machines
  j = transfer(source=i,mold=j,size=2)
  little_endian = (j(1) == 1)
end function

Such procedures are very likely already part of some (legacy) Fortran libraries.

They are also available in other languages:

Some related procedures for byte swapping could also be useful. Julia has bswap. Equivalent C versions of this function are given at the bottom of this link.
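As a rough illustration of what such byte-swapping procedures do, here is a minimal C sketch using only shifts and masks. The names `bswap16` and `bswap32` are placeholders chosen for this example. They are not the Julia function, and not necessarily the C versions referred to above.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

Applying either function twice returns the original value, which makes them convenient for converting in both directions.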

Issues:

  • Not clear how to handle the PDP-11 😉
@urbanjost

There are a lot of approaches available in Fortran for handling endian-ness. Some food for thought ...

program testi
! Posted by Perseus in comp.lang.fortran on 4 July 2005.
! and Paul Van Delst and David Flower on 5 July 2005.

LOGICAL, PARAMETER :: bigend = IACHAR(TRANSFER(1,"a")) == 0

if (bigend) then
  print *, "Big Endian"
else
  print *, "Little Endian"
endif

end program testi

program chkend
! based on ideas from: http://ftp.aset.psu.edu/pub/ger/fortran/hdk/endian.f90
! by Code Tuning co-guide, 1998 Lahey Fortran Users' Conference

! Check what endian this program is running on.

integer(4) i, ascii_0, ascii_1, ascii_2, ascii_3
parameter(ascii_0 = 48, ascii_1 = 49, ascii_2 = 50, ascii_3 = 51)
common // i
   i = ascii_0 + ascii_1*256 + ascii_2*(256**2) + ascii_3*(256**3)
   call sub()
end program chkend

subroutine sub()
character(4) i
common // i
   write(*,*) ' Integer structure: ', I
   write(*,*) ' Byte order:        ', '0123'
   write(*,*)
   if(i == '0123') then
      write(*,*) ' Machine is Little-Endian '
   elseif(i == '3210') then
      write(*,*) ' Machine is Big-Endian '
   else
      write(*,*) ' Mixed endianity machine ... '
   endif
end subroutine sub
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!           FILE: SUBR_native_4byte_real.f90
!     SUBPROGRAM: native_4byte_real
!
!         AUTHOR: David Stepaniak, NCAR/CGD/CAS
! DATE INITIATED: 29 April 2003 
!  LAST MODIFIED: 29 April 2003
!
!       SYNOPSIS: Converts a 32 bit, 4 byte, REAL from big Endian to
!                 little Endian, or conversely from little Endian to big
!                 Endian.
!
!    DESCRIPTION: This subprogram allows one to convert a 32 bit, 4 byte,
!                 REAL data element that was generated with, say, a big
!                 Endian processor (e.g. Sun/sparc, SGI/R10000, etc.) to its
!                 equivalent little Endian representation for use on little
!                 Endian processors (e.g. PC/Pentium running Linux). The
!                 converse, little Endian to big Endian, also holds.
!                 This conversion is accomplished by writing the 32 bits of
!                 the REAL data element into a generic 32 bit INTEGER space
!                 with the TRANSFER intrinsic, reordering the 4 bytes with
!                 the MVBITS intrinsic, and writing the reordered bytes into
!                 a new 32 bit REAL data element, again with the TRANSFER
!                 intrinsic. The following schematic illustrates the
!                 reordering process
!
!
!                  --------    --------    --------    --------
!                 |    D   |  |    C   |  |    B   |  |    A   |  4 Bytes
!                  --------    --------    --------    --------
!                                                             |
!                                                              -> 1 bit
!                                       ||
!                                     MVBITS
!                                       ||
!                                       \/
!
!                  --------    --------    --------    --------
!                 |    A   |  |    B   |  |    C   |  |    D   |  4 Bytes
!                  --------    --------    --------    --------
!                         |           |           |           |
!                         24          16          8           0   <- bit
!                                                                 position
!
!          INPUT: realIn,  a single 32 bit, 4 byte REAL data element.
!         OUTPUT: realOut, a single 32 bit, 4 byte REAL data element, with
!                 reverse byte order to that of realIn.
!    RESTRICTION: It is assumed that the default REAL data element is
!                 32 bits / 4 bytes.
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  SUBROUTINE native_4byte_real( realIn, realOut )
  IMPLICIT NONE
  REAL, INTENT(IN)                              :: realIn
                                                   ! a single 32 bit, 4 byte
                                                   ! REAL data element
  REAL, INTENT(OUT)                             :: realOut
                                                   ! a single 32 bit, 4 byte
                                                   ! REAL data element, with
                                                   ! reverse byte order to
                                                   ! that of realIn
! Local variables (generic 32 bit INTEGER spaces):

  INTEGER                                       :: i_element
  INTEGER                                       :: i_element_br
! Transfer 32 bit of realIn to generic 32 bit INTEGER space:
  i_element = TRANSFER( realIn, 0 )
! Reverse order of 4 bytes in 32 bit INTEGER space:
  CALL MVBITS( i_element, 24, 8, i_element_br, 0  )
  CALL MVBITS( i_element, 16, 8, i_element_br, 8  )
  CALL MVBITS( i_element,  8, 8, i_element_br, 16 )
  CALL MVBITS( i_element,  0, 8, i_element_br, 24 )
! Transfer reversed order bytes to 32 bit REAL space (realOut):
  realOut = TRANSFER( i_element_br, 0.0 )
  END SUBROUTINE
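For comparison, the same TRANSFER / reorder / TRANSFER sequence might look as follows in C, where `memcpy` plays the role of TRANSFER. This is only a sketch: the name `swap_float_bytes` is made up for illustration, and, like the Fortran routine, it assumes a 32-bit single-precision type.

```c
#include <stdint.h>
#include <string.h>

/* Same restriction as the Fortran routine: the real type must be 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return x with its four bytes in reverse order. */
float swap_float_bytes(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);        /* like TRANSFER(realIn, 0) */
    bits = (bits >> 24) |                  /* byte 3 -> byte 0 */
           ((bits >> 8) & 0x0000FF00u) |   /* byte 2 -> byte 1 */
           ((bits << 8) & 0x00FF0000u) |   /* byte 1 -> byte 2 */
           (bits << 24);                   /* byte 0 -> byte 3 */
    memcpy(&x, &bits, sizeof x);           /* like TRANSFER(i_element_br, 0.0) */
    return x;
}
```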

! This file:
! http://ftp.aset.psu.edu/pub/ger/fortran/hdk/ReverseEndian.f90
!
! Convert a file with integers/reals to
! the opposite endian by reversing bytes.

program ReverseEndian
implicit none

integer :: j, k
integer, parameter :: NumberOfRecords=5
integer, parameter :: n=3 ! n = WordsPerRecord.
character (Len=4) :: EndianIn(n), EndianOut(n)
integer, parameter :: RecLen=len(EndianIn)*n  ! in bytes

! Assume records are Binary, so use Direct access.
open (unit=50,file='Data.inp',status='old',             &
       access='direct', form='unformatted', recl=RecLen)
open (unit=60,file='Data.out',                          &
       access='direct', form='unformatted', recl=RecLen)

do k=1,NumberOfRecords
    read(50,rec=k) EndianIn
!   Reverse the bytes (convert Endian byte order)
    FORALL(j = 1:n) EndianOut(j)(4:4)=EndianIn(j)(1:1)
    FORALL(j = 1:n) EndianOut(j)(1:1)=EndianIn(j)(4:4)
    FORALL(j = 1:n) EndianOut(j)(3:3)=EndianIn(j)(2:2)
    FORALL(j = 1:n) EndianOut(j)(2:2)=EndianIn(j)(3:3)
  write(60,rec=k) EndianOut
end do

close(50)
close(60)

end program ReverseEndian


@ivan-pi
Member Author

ivan-pi commented Feb 16, 2021

Thanks @urbanjost for these interesting approaches.

The second program, chkend, can be adapted to get compile-time constants:

program chkend

use, intrinsic :: iso_fortran_env, only: int32

! i = 858927408
integer(int32), parameter :: i = shiftl(iachar('0'),  0) + &
                                 shiftl(iachar('1'),  8) + &
                                 shiftl(iachar('2'), 16) + &
                                 shiftl(iachar('3'), 24)

character(len=4), parameter :: c = transfer(i,'0123')


type :: endianness_type
  integer :: little = 1, big = 2, mixed1 = 3, mixed2 = 4
  integer :: native = findloc(c == ['0123','3210','1032','2301'],.true.,1)
end type

type(endianness_type), parameter :: endianess = endianness_type()

character(len=*), parameter :: value = trim( &
    merge("Little",merge("Big   ","Mixed ",c=='3210'),c=='0123'))

select case(endianess%native)
case(endianess%little)
  write(*,*) 'Machine is Little-Endian '
case(endianess%big)
  write(*,*) 'Machine is Big-Endian '
case(endianess%mixed1,endianess%mixed2)
  write(*,*) 'Mixed endianity machine ... '
end select

print *, "Machine is "//value//"-Endian"

end program

Edit: perhaps someone with access to a PowerPC machine could test this? (@milancurcic)

On https://godbolt.org/ I seem to be getting little-endian with all of the compilers.

@ivan-pi
Member Author

ivan-pi commented Feb 16, 2021

Am I right to guess that in PDP-Endian the value c == '2301'?

On the other hand the Honeywell Series 16 would have c == '1032'?
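One way to sanity-check these guesses is to simulate the four layouts. The C sketch below is illustrative only, with made-up names `store32` and `enum byte_order`. It assumes that PDP-endian means "high 16-bit word first, each word little-endian" and that the Honeywell Series 16 layout means "low 16-bit word first, each word big-endian".

```c
#include <stdint.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_PDP, ORDER_HONEYWELL };

/* Write the four bytes of v into out[] as a machine with the given byte
   order would store them in memory, lowest address first. */
void store32(uint32_t v, enum byte_order order, unsigned char out[4])
{
    /* b0 is the least significant byte, b3 the most significant. */
    unsigned char b0 = (unsigned char)(v & 0xFFu);
    unsigned char b1 = (unsigned char)((v >> 8) & 0xFFu);
    unsigned char b2 = (unsigned char)((v >> 16) & 0xFFu);
    unsigned char b3 = (unsigned char)((v >> 24) & 0xFFu);

    switch (order) {
    case ORDER_LITTLE:
        out[0] = b0; out[1] = b1; out[2] = b2; out[3] = b3; break;
    case ORDER_BIG:
        out[0] = b3; out[1] = b2; out[2] = b1; out[3] = b0; break;
    case ORDER_PDP:        /* assumed: high word first, words little-endian */
        out[0] = b2; out[1] = b3; out[2] = b0; out[3] = b1; break;
    case ORDER_HONEYWELL:  /* assumed: low word first, words big-endian */
        out[0] = b1; out[1] = b0; out[2] = b3; out[3] = b2; break;
    }
}
```

With v = 0x33323130 (the value of i in the program above), the four cases give the byte strings '0123', '3210', '2301' and '1032' respectively, which agrees with both guesses under the stated assumptions.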

@wclodius2
Contributor

FWIW, in my hash codes I assume that integers on modern Fortran processors are either big- or little-endian, i.e. no mixed endian, and I use the following compile-time parameter as an endian flag:

! Dealing with different endians
    logical, parameter, public ::                                    &
        little_endian = ( 1 == transfer([1_int8, 0_int8], 0_int16) )
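For readers more used to C, the same test (interpret the byte pair 1, 0 as a 16-bit integer and compare with 1) can be sketched as below. Unlike the Fortran parameter, this is evaluated at run time, and the function name is just an assumption for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the bytes {1, 0}, read as a 16-bit integer, give the value 1,
   i.e. the least significant byte is stored first. */
bool little_endian(void)
{
    const unsigned char bytes[2] = {1, 0};
    uint16_t value;

    memcpy(&value, bytes, sizeof value);   /* counterpart of TRANSFER */
    return value == 1;                     /* 256 on a big-endian host */
}
```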

@ivan-pi
Member Author

ivan-pi commented Feb 17, 2021

Thanks @wclodius2 for this useful approach.

Since the endianness might be useful for other purposes, I am in favor of making it a public constant.

I admit I like the solution of encapsulating this information in a derived type singleton. However, the last time I proposed a similar use of a derived type, the opinions were quite mixed: #49. A related discussion ensued in the issue on mathematical constants: #99 (comment).

In C++20 they use an enum class for this purpose:

enum class endian
{
#ifdef _WIN32
    little = 0,
    big    = 1,
    native = little
#else
    little = __ORDER_LITTLE_ENDIAN__,
    big    = __ORDER_BIG_ENDIAN__,
    native = __BYTE_ORDER__
#endif
};

where the class can then be used as follows:

#include <bit>       // std::endian (C++20)
#include <iostream>

int main() {

    if constexpr (std::endian::native == std::endian::big) {
        std::cout << "big-endian" << '\n';
    }
    else if constexpr (std::endian::native == std::endian::little) {
        std::cout << "little-endian"  << '\n';
    }
    else {
        std::cout << "mixed-endian"  << '\n';
    }

}
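The predefined macros used in the enum above are also visible from plain C when compiling with GCC or Clang, so a compile-time flag could be sketched as follows. `HOST_IS_LITTLE_ENDIAN` and `host_endian_name` are hypothetical names, and the `_WIN32` branch simply copies the assumption made in the enum above that Windows targets are little-endian.

```c
/* Compile-time byte-order flag based on compiler-predefined macros. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define HOST_IS_LITTLE_ENDIAN 1
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define HOST_IS_LITTLE_ENDIAN 0
#  else
#    error "mixed byte order is not handled in this sketch"
#  endif
#elif defined(_WIN32)
#  define HOST_IS_LITTLE_ENDIAN 1   /* assumption, as in the enum above */
#else
#  error "cannot determine byte order at compile time"
#endif

/* Human-readable name of the byte order selected above. */
const char *host_endian_name(void)
{
    return HOST_IS_LITTLE_ENDIAN ? "little" : "big";
}
```

Since the flag is a preprocessor constant, it can be used in `#if` directives to select code paths, much like a Fortran parameter can be used in constant expressions.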

@wclodius2
Contributor

As near as I can tell, they stopped making mixed-endian processors in the 1980s, and they never had the resources to support an F90+ processor, so only big- and little-endian processors are of interest to this discussion. A complication is that several processor architectures, e.g., ARM and PowerPC, are bi-endian, able to run in either little-endian or big-endian mode. The endianness used appears to usually be selected by software at startup or by the motherboard design, which I believe means that it can be determined at compile time, but some architectures may allow run-time mode selection, which complicates the determination of the mode. However, I suspect that such run-time switching would involve either the use of assembler or a processor-specific procedure, so standard-conforming Fortran code should have a consistent endianness during processing, and a compile-time flag would still be valid.

For my purposes, the processing of sub-bytes to generate a hash, I require only the endianness of the processor and not of a communication link or data file, so a single compile-time flag should be sufficient. Still, it would be useful to have testing on the new ARM-based Macs or IBM's PowerPC-based supercomputers.

@titoxd

titoxd commented Oct 27, 2021

Would functions such as htonl() be included here? A host_to_network() function (with or without the hardcoded integer size in the name) would be useful for interfacing with C socket code.
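To make the request concrete, here is a rough C sketch of what a size-specific host-to-network conversion does, written without relying on the sockets headers. The names `host_to_network_u32` and `network_to_host_u32` are hypothetical and are not an existing or proposed interface.

```c
#include <stdint.h>
#include <string.h>

/* Return a value whose in-memory bytes are x in network (big-endian)
   order, mirroring what htonl() does for 32-bit integers. */
uint32_t host_to_network_u32(uint32_t x)
{
    unsigned char bytes[4];
    uint32_t out;

    bytes[0] = (unsigned char)(x >> 24);   /* most significant byte first */
    bytes[1] = (unsigned char)(x >> 16);
    bytes[2] = (unsigned char)(x >> 8);
    bytes[3] = (unsigned char)x;
    memcpy(&out, bytes, sizeof out);
    return out;
}

/* Inverse operation, mirroring ntohl(). */
uint32_t network_to_host_u32(uint32_t x)
{
    unsigned char bytes[4];

    memcpy(bytes, &x, sizeof bytes);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] <<  8) |  (uint32_t)bytes[3];
}
```

On a big-endian host both functions return their argument unchanged. On a little-endian host they reverse the bytes.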
