Make Format an enumerated type #547
Comments
+1 to this |
I couldn't find any actual type synonym definition for Format. Is the type synonym referred to a philosophical one? Or has it been fixed already? |
Format is defined in pandoc-types. Note that it already has a "it just works" instance of IsString:

newtype Format = Format String
               deriving (Read, Show, Typeable, Data, Generic)
instance IsString Format where
  fromString f = Format $ map toLower f
instance Eq Format where
  Format x == Format y = map toLower x == map toLower y

I seem to recall that the reason I mean it's arguably a hack, but changing it to an actual sum type (in fact a product type when you consider the set of extensions) will definitely break backward compatibility, so this is most likely going to be a 2.0 thing. |
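To see what the case-insensitive instances buy in practice, here is a small self-contained sketch. It repeats the definitions quoted above with a shorter deriving list, so that only OverloadedStrings is needed. The names `latex` and `sameFormat` are hypothetical and exist only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (toLower)
import Data.String (IsString (..))

-- Same shape as the quoted definition, with a reduced deriving list.
newtype Format = Format String
  deriving (Read, Show)

-- String literals are lower-cased when they are turned into a Format.
instance IsString Format where
  fromString f = Format $ map toLower f

-- Comparison ignores case, even for values built with the constructor.
instance Eq Format where
  Format x == Format y = map toLower x == map toLower y

-- Hypothetical example value: a literal becomes a Format via fromString.
latex :: Format
latex = "LaTeX"

-- Hypothetical helper: does a user-supplied name denote the given format?
sameFormat :: String -> Format -> Bool
sameFormat name fmt = Format name == fmt
```

Under these instances "LaTeX" and "latex" compare equal, but "tex" and "latex" do not, so the spelling problem raised in the issue is only partly addressed.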
Oh, so it's a different package. Thanks for the info. |
+++ Tim T.Y. Lin [Mar 04 15 00:45 ]:
No, the extensions are not part of the string on Format.
Right. In principle, a sum type would be better. However, it's a big |
Right, turns out |
One question is whether it should be possible to pass a custom Format, or whether Format can only contain known formats. I.e., should we use

data Format = Markdown | Docx | ReStructuredText | …

or rather

data KnownFormat = Markdown | Docx | ReStructuredText | …
data Format = Format KnownFormat
            | CustomFormat String

I can see arguments for both variants; most arguments in favor of a finite sum type are given above. On the other hand, we'd limit users in their ability to pass format information to filters, custom writers, and programs built on top of pandoc's library. Personally, I lean towards a finite sum type, as I feel the advantages outweigh the slight loss in flexibility. The only real problem I see is how to handle unknown format specifications during parsing: Should those be turned into a default format, or maybe a code block? |
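The parsing question can be made concrete with a rough sketch of both variants side by side. Everything here is an assumption for illustration: the three-constructor `KnownFormat`, the spelling table `knownFormatName`, and the functions `parseFormat` and `parseKnownFormat` are hypothetical and not part of pandoc.

```haskell
import Data.Char (toLower)

-- Hypothetical, deliberately short list of known formats.
data KnownFormat = Markdown | Docx | ReStructuredText
  deriving (Show, Eq, Ord, Enum, Bounded)

-- The extensible variant from the comment above.
data Format
  = Format KnownFormat
  | CustomFormat String
  deriving (Show, Eq)

-- Hypothetical spelling table; the names are illustrative only.
knownFormatName :: KnownFormat -> String
knownFormatName Markdown = "markdown"
knownFormatName Docx = "docx"
knownFormatName ReStructuredText = "rst"

-- Extensible variant: look the name up case-insensitively and
-- keep anything unrecognised as a CustomFormat.
parseFormat :: String -> Format
parseFormat s =
  case lookup (map toLower s) table of
    Just k -> Format k
    Nothing -> CustomFormat s
  where
    table = [(knownFormatName k, k) | k <- [minBound .. maxBound]]

-- Finite variant: unknown names yield Nothing, and the caller has to
-- decide what to do with them (default format, code block, error).
parseKnownFormat :: String -> Maybe KnownFormat
parseKnownFormat s =
  case parseFormat s of
    Format k -> Just k
    CustomFormat _ -> Nothing
```

With the finite variant the decision about unknown names moves to every call site of `parseKnownFormat`, which is exactly the open question raised above.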
I'm not sure about the finite vs extensible question, but like you I lean towards finite.

If we're thinking about this question, I think we might want to address a bigger issue about raw blocks. This has come up with ipynb. Jupyter notebook code cells will often generate output in multiple formats: for example, a table might be produced in text/html and text/plain. The plain version is a fallback, so if you're converting to HTML, the HTML version will be used; if to LaTeX, the fallback would be to include the plain text version in a verbatim environment.

It's tough to handle this properly in pandoc. Given that the behavior of the reader is supposed to be independent of the writer, we can either (a) include both the HTML version as a raw block and the plain text version as a code block, with the result that you'll see TWO versions of the table when it's converted to HTML or (b) just include the HTML version, with the result that there will be no fallback when it's converted to LaTeX or other formats. A bad choice, which makes it impossible to fully emulate nbconvert.

One thing that would help here would be an AST element that includes content conditionally on the format. Something like this:
With this kind of structure one could remove the Format specifier from the RawBlock itself. But thinking about the fallback part of this, one sees a need for format specifications that encompass multiple formats, like |
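For illustration only, and not necessarily the structure meant above, a format-conditional element with an explicit fallback could be sketched as follows. The `Alternatives` constructor, the `resolve` function, the `tableCell` example, and the cut-down `Block` type are all hypothetical stand-ins, not pandoc-types definitions.

```haskell
import Data.Maybe (fromMaybe)

-- Cut-down stand-ins, NOT the real pandoc-types definitions.
newtype Format = Format String
  deriving (Show, Eq)

data Block
  = Para String
  | CodeBlock String
  | RawBlock String                       -- no Format field: the wrapper carries it
  | Alternatives [(Format, Block)] Block  -- hypothetical: per-format choices, then a fallback
  deriving (Show, Eq)

-- Choose the alternative for the target format, or the fallback if none matches.
resolve :: Format -> Block -> Block
resolve target (Alternatives alts fallback) =
  fromMaybe fallback (lookup target alts)
resolve _ block = block

-- A notebook table with an HTML rendering and a plain-text fallback.
tableCell :: Block
tableCell =
  Alternatives
    [(Format "html", RawBlock "<table><tr><td>1</td></tr></table>")]
    (CodeBlock "1")
```

In this sketch a writer would call `resolve` with its own format, so the reader stays independent of the writer and HTML output contains only one copy of the table.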
Could you give a couple more examples? Is the fallback always plain-text? Or are the fallbacks at least ordered? Like try Just a thought: instead of going with a whole boolean algebra, the ipynb reader could also put in a |
What I ended up doing is putting a little filter |
A few thoughts on a new Having a One outline of a design is to include something like this in

module Text.Pandoc.Format where

import Data.Set (Set)
import qualified Data.Set as Set

-- Absolutely anything that might occur in Format right now is included. Requires a look through
-- the pandoc code base to get everything, I think.
data Format = HTML | HTML4 | HTML5 | EPUB | EPUB2 | EPUB3 | ...
deriving (..., Enum, Bounded)
-- The Formats boolean algebra is just the normal one for Set Format.
newtype Formats = Formats (Set Format)
-- As a format specifier or selector, Formats x means "any of the formats in x".
matchesFormat :: Formats -> Format -> Bool
(Formats s) `matchesFormat` f = f `Set.member` s
anyOf :: [Format] -> Formats
anyOf = Formats . Set.fromList
anyFormat :: Formats
anyFormat = anyOf [minBound..maxBound]
notFormat :: Formats -> Formats
notFormat (Formats s) = Formats $ t `Set.difference` s
where Formats t = anyFormat
-- and various other boolean operations on Formats

The Format type supports a sub-format relation, where x is a sub-format of y if a raw element of format x can always be included in an output format y. This (with helper functions) should make it easier to figure out when

-- List the sub-formats of the given format
includesFormats :: Format -> Formats
includesFormats HTML = anyOf [HTML, HTML4, HTML5, EPUB, EPUB2, EPUB3]
includesFormats HTML5 = anyOf [HTML5, EPUB3]
includesFormats EPUB = anyOf [EPUB, EPUB2, EPUB3]
-- etc.
-- List the super-formats of the given format
includedByFormats :: Format -> Formats
includedByFormats HTML = anyOf [HTML]
includedByFormats HTML5 = anyOf [HTML, HTML5]
includedByFormats EPUB = anyOf [HTML, EPUB]
-- etc.

It would be simpler to have only concrete, fully-specified formats in Having a "big" |
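As a rough sketch of the "various other boolean operations" mentioned in the outline, union and intersection can be lifted from Set in the same way as the complement. The shortened `Format` list and the names `orFormat` and `andFormat` are assumptions made for this example; the other definitions follow the outline above.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical, shortened format list; a real one would be much longer.
data Format = HTML | HTML4 | HTML5 | EPUB | EPUB2 | EPUB3 | LaTeX
  deriving (Show, Eq, Ord, Enum, Bounded)

newtype Formats = Formats (Set Format)
  deriving (Show, Eq)

-- "Formats x" as a selector means "any of the formats in x".
matchesFormat :: Formats -> Format -> Bool
(Formats s) `matchesFormat` f = f `Set.member` s

anyOf :: [Format] -> Formats
anyOf = Formats . Set.fromList

anyFormat :: Formats
anyFormat = anyOf [minBound .. maxBound]

-- Complement relative to the full set of formats.
notFormat :: Formats -> Formats
notFormat (Formats s) = Formats (t `Set.difference` s)
  where
    Formats t = anyFormat

-- Hypothetical name: matches if either selector matches.
orFormat :: Formats -> Formats -> Formats
orFormat (Formats a) (Formats b) = Formats (a `Set.union` b)

-- Hypothetical name: matches only if both selectors match.
andFormat :: Formats -> Formats -> Formats
andFormat (Formats a) (Formats b) = Formats (a `Set.intersection` b)
```

Because the enumeration is Bounded, the complement is always a finite set, which is what makes "everything except LaTeX" expressible as an ordinary `Formats` value.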
Currently,
If

toConcreteFormat :: Format -> Format
toConcreteFormat HTML = HTML5
toConcreteFormat HTML5 = HTML5
toConcreteFormat EPUB = EPUB3
-- etc.

that takes under-specified formats and chooses a default concrete one for them, like the |
Maybe a better way to define a sub-format is to say that x is a sub-format of y if whenever a raw element of format y can be included somewhere, a raw element of format x can be included in the same place and in the same way. |
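Whichever definition is chosen, the relation only needs to be written down in one direction; the other direction can be derived. A minimal sketch, assuming the three table rows given earlier in the thread and assuming that every format left under "etc." includes only itself (the names `includes`, `isSubFormatOf`, and `includedBy` are hypothetical):

```haskell
import qualified Data.Set as Set

data Format = HTML | HTML4 | HTML5 | EPUB | EPUB2 | EPUB3
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Sub-formats of each format, copied from the rows listed earlier;
-- the catch-all case is an assumption, not part of the original table.
includes :: Format -> Set.Set Format
includes HTML = Set.fromList [HTML, HTML4, HTML5, EPUB, EPUB2, EPUB3]
includes HTML5 = Set.fromList [HTML5, EPUB3]
includes EPUB = Set.fromList [EPUB, EPUB2, EPUB3]
includes f = Set.singleton f

-- x is a sub-format of y if y's table row contains x.
isSubFormatOf :: Format -> Format -> Bool
x `isSubFormatOf` y = x `Set.member` includes y

-- Super-formats obtained by inverting the relation instead of
-- maintaining a second hand-written table.
includedBy :: Format -> Set.Set Format
includedBy x =
  Set.fromList [y | y <- [minBound .. maxBound], x `isSubFormatOf` y]
```

Deriving `includedBy` this way reproduces the hand-written super-format rows for HTML, HTML5, and EPUB, and keeps the two directions from drifting apart.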
I have the feeling there are a few different "sub-format" relations..
|
Yes, I think there are a few relevant relations. There are:
I think jgm/pandoc-types#78 deals with the first two. The I think the writers using other writers as intermediates sorts itself out naturally from the perspective of the first two relations, based on the current The formats representing different extensions problem should also hopefully be solved in that pull request, for instance by considering all the |
Format is a synonym for String. Users have to look at the source code to find out the right values for this type. (It can be "html" or "Html" or "latex" or "LaTeX" or "tex".) It's not clear which from the docs alone. Maybe it's better to define a new data type: