Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multidimensional Arrays #3

Closed
seanhess opened this issue Mar 20, 2024 · 2 comments
Closed

Multidimensional Arrays #3

seanhess opened this issue Mar 20, 2024 · 2 comments

Comments

@seanhess
Copy link
Contributor

seanhess commented Mar 20, 2024

Hi @krakrjak!

I'm finally at the point where I need to use the library to rewrite some FITS files. Specifically, I'm reading a 4d single-HDU FITS image, and then writing out multiple files, each with multiple HDUs containing 2d images.

I did some research on options for working with multidimensional data, and also attempted to write something super simple. I settled on Data.Massiv being the best option. It's easy to use, performant, actively maintained, and popular.

I wrote an example of how to decode images into arrays of various shapes. See below.

Do we want to incorporate something like this into the library? Have you put any thought into more robust reading and writing of data?

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE FlexibleInstances #-}
module Data.Fits.Massiv where

import Control.Monad (replicateM)
import qualified Data.Fits as Fits
import Data.Int
import Data.Word (Word8)
import Data.Fits (BitPixFormat(..))
import Data.Binary.Get
import qualified Data.ByteString.Lazy as BL
import Data.Massiv.Array as A hiding (isEmpty)


sampleInput :: Word8 -> BL.ByteString
sampleInput start = BL.pack [start..start+100]




test :: IO ()
test = do
  putStrLn "1D Array"
  print =<< decodeArray @Ix1 @Int EightBitInt [Axis 8] (sampleInput 0)

  putStrLn "2D Array"
  print =<< decodeArray @Ix2 @Int EightBitInt [Axis 2, Axis 3] (sampleInput 0)

  putStrLn "3D Array"
  arr3 <- decodeArray @Ix3 @Int EightBitInt [Axis 2, Axis 3, Axis 4] (sampleInput 0)
  print arr3

  putStrLn "4D Array"
  arr4 <- decodeArray @Ix4 @Int EightBitInt [Axis 1, Axis 2, Axis 3, Axis 4] (sampleInput 0)
  print arr4

  putStrLn "Slicing"
  print $ arr3 !> 0 !> 0
  print $ (arr3 <! 1) !> 0
  print $ arr3 <!> (Dim 2, 1) !> 1


newtype Axis = Axis Int
  deriving (Show)
newtype Axes' = Axes' [Axis]


get1dAxis :: DecodePix a => BitPixFormat -> [Axis] -> Get [a]
get1dAxis f [Axis n] = do
  replicateM n (getPix f)
get1dAxis _ as = fail $ "Invalid axes for 1d: " <> show as

get2dAxis :: DecodePix a => BitPixFormat -> [Axis] -> Get [[a]]
get2dAxis f [Axis rows, cols] =
  replicateM rows (get1dAxis f [cols])
get2dAxis f as = fail $ "Invalid axes for 2d: " <> show as

get3dAxis :: DecodePix a => BitPixFormat -> [Axis] -> Get [[[a]]]
get3dAxis f as = getAxis as (get2dAxis f)

get4dAxis :: DecodePix a => BitPixFormat -> [Axis] -> Get [[[[a]]]]
get4dAxis f as = getAxis as (get3dAxis f)

get5dAxis :: DecodePix a => BitPixFormat -> [Axis] -> Get [[[[[a]]]]]
get5dAxis f as = getAxis as (get4dAxis f)

getAxis :: [Axis] -> ([Axis] -> Get a) -> Get [a]
getAxis (Axis depth:as) getNextAxis =
  replicateM depth (getNextAxis as)
getAxis [] _ = fail "Empty Axes"




class DecodeArray ix a where
  decodeArray :: MonadThrow m => BitPixFormat -> [Axis] -> BL.ByteString -> m (Array P ix a)

instance (DecodePix a, Prim a) => DecodeArray Ix1 a where
  decodeArray f as inp = do
    as <- runGetThrow (get1dAxis f as) inp
    pure $ A.fromList Seq as

instance (DecodePix a, Prim a) => DecodeArray Ix2 a where
  decodeArray f as inp = do
    as <- runGetThrow (get2dAxis f as) inp
    A.fromListsM Seq as

instance (DecodePix a, Prim a) => DecodeArray Ix3 a where
  decodeArray f as inp = do
    as <- runGetThrow (get3dAxis f as) inp
    A.fromListsM Seq as

instance (DecodePix a, Prim a) => DecodeArray Ix4 a where
  decodeArray f as inp = do
    as <- runGetThrow (get4dAxis f as) inp
    A.fromListsM Seq as

instance (DecodePix a, Prim a) => DecodeArray Ix5 a where
  decodeArray f as inp = do
    as <- runGetThrow (get5dAxis f as) inp
    A.fromListsM Seq as



runGetThrow :: MonadThrow m => Get a -> BL.ByteString -> m a
runGetThrow get inp =
  case runGetOrFail get inp of
    Left (_, bytes, e) -> throwM $ ParseError bytes e
    Right (_, _, a) -> pure a

data ParseError = ParseError !ByteOffset !String
  deriving (Show, Exception)







-- class DecodePix a where
class DecodePix a where
  getPix :: BitPixFormat -> Get a

instance DecodePix Int8 where
  getPix EightBitInt       = getInt8
  getPix f = fail $ "Expected Int8, but format is " <> show f

instance DecodePix Int16 where
  getPix SixteenBitInt = getInt16be
  getPix f = fail $ "Expected Int16, but format is " <> show f

instance DecodePix Int32 where
  getPix ThirtyTwoBitInt = getInt32be
  getPix f = fail $ "Expected Int32, but format is " <> show f

instance DecodePix Int64 where
  getPix SixtyFourBitInt = getInt64be
  getPix f = fail $ "Expected Int64, but format is " <> show f

instance DecodePix Int where
  getPix EightBitInt = fromIntegral <$> getPix @Int8 EightBitInt
  getPix SixteenBitInt = fromIntegral <$> getPix @Int16 SixteenBitInt
  getPix ThirtyTwoBitInt = fromIntegral <$> getPix @Int32 ThirtyTwoBitInt
  getPix SixtyFourBitInt = fromIntegral <$> getPix @Int64 SixtyFourBitInt
  getPix f = fail $ "Expected Int, but format is " <> show f

instance DecodePix Float where
  getPix ThirtyTwoBitFloat = getFloatbe
  getPix f = fail $ "Expected Float, but format is " <> show f

instance DecodePix Double where
  getPix SixtyFourBitFloat = getDoublebe
  getPix f = fail $ "Expected Double, but format is " <> show f
@krakrjak
Copy link
Owner

I know we are crossing the streams here a little, and I apologize for the lapse in communication.

I do think that fits-parse or a sister library could be opinionated in providing nice convience apis for using good off-the-shelf array processing libraries. Maybe this means having a fits-parse-massiv and fits-parse-hmatrix, etc. Maybe we just keep the core unopinionated... I'm not sure. I continue to stay on the fence here because I'm not doing analysis these days so I'm a little removed from understanding what the end user really needs when they get to the analysis phase.

Is it more useful to just give a big Boxed Vector and let the end-user transform that into the native array processing structure they want to use or would it be better to have interfaces that already provide the Massiv or HMatrix or Accellerate or what-have-you structures ready to use?

This is where I'd really take a lot of input from you (@seanhess ) or other library users. Maybe we can even schedule a short call to talk through some options and see what would make the most sense long-term.

As a first take I like this approach and the idea of providing some generic implementations for 6, 7, 8 dimensions or more is probably just fine. If someone has a need for more later, we can always add them (maybe even get a free patch out of it :) ).

@seanhess
Copy link
Contributor Author

Thanks for getting back to me!

I'm neck deep in FITS for my current project. I need enough autonomy to iterate and refactor to support my work. I think another repo is the easiest way to accomplish that, but I'm open to alternative ideas. I've started a new package/repo called "telescope", with a more opinionated and less focused approach. Take a look at some work-in-progress docs here:

https://hackage.haskell.org/package/telescope-0.1.0/candidate/docs/Telescope-Fits.html

Is it more useful to just give a big Boxed Vector and let the end-user transform that into the native array processing structure they want to use or would it be better to have interfaces that already provide the Massiv or HMatrix or Accellerate or what-have-you structures ready to use?

I had to experiment for a few days to figure out how to delay evaluation on a big (3GB) file I was working on. It takes 10 seconds or so to convert the whole thing into a vector or anything similar. Using a delayed array (Array D Ix2 Float) allows one to slice it up instantly, getting the same performance as numpy, at least until you actually compute something. So I can't use a vector as an intermediate step. I DO think that fits-parse should continue to provide an easier-to-use function that parses the input as a vector though.

Maybe we can even schedule a short call to talk through some options and see what would make the most sense long-term.

Sure, I'd be down for a call!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants