Citations #32

Open
jgm opened this issue Jul 31, 2022 · 34 comments

@jgm
Owner

jgm commented Jul 31, 2022

We need a syntax for citations that can be plugged into citeproc-lua or sent to pandoc for processing.

Pandoc's citation syntax seems a good basis. One thing we might change would be the syntax for author-in-text citations, which is currently a bit tricky to parse, because it requires lookahead.

Perhaps instead of

@foo [p. 15]

we should have something like

[+@foo, p. 15]
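To illustrate why a bracketed marker avoids the lookahead problem: a parser only has to inspect the character right after `[` to know whether it is reading an author-in-text citation, instead of scanning past `@foo` to see whether `[p. 15]` follows. This is a minimal sketch under stated assumptions: it handles a single reference with an optional comma-separated suffix, keys are assumed to use only alphanumerics plus `_`, `-`, `:`, and `Cite` and `parse_cite` are hypothetical names, not part of any djot implementation.

```rust
/// Hypothetical representation of a single-reference citation.
#[derive(Debug, PartialEq)]
pub struct Cite {
    pub author_in_text: bool,
    pub key: String,
    pub suffix: Option<String>,
}

/// Parses `[@key, suffix]` or `[+@key, suffix]`.
/// The author-in-text decision is made from the first character after `[`,
/// so the parser never needs to look ahead past the key.
pub fn parse_cite(input: &str) -> Option<Cite> {
    let inner = input.strip_prefix('[')?.strip_suffix(']')?;
    // A leading `+` marks an author-in-text citation.
    let (author_in_text, rest) = match inner.strip_prefix('+') {
        Some(r) => (true, r),
        None => (false, inner),
    };
    let rest = rest.strip_prefix('@')?;
    // Assumed key alphabet: alphanumerics plus `_`, `-`, `:`.
    let key_len = rest
        .find(|c: char| !(c.is_alphanumeric() || "_-:".contains(c)))
        .unwrap_or(rest.len());
    if key_len == 0 {
        return None;
    }
    let (key, tail) = rest.split_at(key_len);
    // Anything after the key must be a comma-introduced suffix.
    let suffix = if tail.is_empty() {
        None
    } else {
        Some(tail.strip_prefix(',')?.trim().to_string())
    };
    Some(Cite { author_in_text, key: key.to_string(), suffix })
}
```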
@uvtc
Contributor

uvtc commented Aug 1, 2022

I like the idea of djot having a simple unambiguous syntax for this that is less tricky to parse. It not only makes djot simpler and faster, but it also makes it easier for any future alternative implementations of djot to parse as well.

@uvtc
Contributor

uvtc commented Aug 5, 2022

@jgm, why do you suggest adding that + sign in there? Why not [@foo, p. 15] instead?

The [+@foo, p. 15] syntax suggests to me that it's one example of a more general syntax, as in

[+@foo ... ]  for citations
[+&foo ... ]  for ... maybe something else
[+*foo ... ]
[+_foo ... ]

@jgm
Owner Author

jgm commented Aug 5, 2022

[@foo, p. 15] is fine for a regular citation which might render as (Foo 2000, p. 15).
I'm talking about syntax for an author-in-text citation, which would render as Foo (2000, p. 15).

@uvtc
Contributor

uvtc commented Aug 5, 2022

Ah. I'm not very familiar with citations. Thanks.

@kmaasrud
Contributor

This syntax seems very natural to me. Same goes for using [+@foo, p. 15] for author-in-text citations. No need to reinvent the wheel, and the citeproc syntax is familiar for many.

Djot will be a perfect fit for academic writing---a natural continuation of Pandoc Markdown, which many (including me) are using in academia today. Thus, having a well-defined citation syntax seems very important to me. What will it take to implement this? I would be happy to help if I can!

@NotAFedoraUser

Org-Mode, another markup language, added citation support in version 9.5.

In that release they added the following syntax to markup a citation:

According to [cite: common prefix;@Key123 page 13; @Key982 chap 1; common suffix] ...

This would render as, for example, (Key123 2000, p. 13; Key982 2009, chap. 1).
They also allow you to specify a style of citation:

[cite/t/c: ...]
      ^ ^
      | |
      | Variant
      Style (here, "t" means in-text, à la Foo (...))
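As a rough illustration of the `[cite/style/variant: ...]` header in the diagram above, this hypothetical helper splits the part before the first colon into an optional style and an optional variant. It is a sketch only: it does not parse the body after the colon (prefixes, keys, suffixes), and `OrgCiteStyle` and `split_org_cite` are illustrative names, not part of Org-Mode.

```rust
/// Style and variant taken from an org-cite header such as `cite/t/c`.
#[derive(Debug, PartialEq)]
pub struct OrgCiteStyle<'a> {
    pub style: Option<&'a str>,
    pub variant: Option<&'a str>,
}

/// Splits `[cite/t/c: body]` into its style header and its trimmed body.
pub fn split_org_cite(input: &str) -> Option<(OrgCiteStyle<'_>, &str)> {
    let inner = input.strip_prefix("[cite")?.strip_suffix(']')?;
    let (header, body) = inner.split_once(':')?;
    // The header must be empty ("[cite:") or start with '/' ("[cite/t/c:").
    if !(header.is_empty() || header.starts_with('/')) {
        return None;
    }
    // "/t/c" splits into "", "t", "c"; skip the leading empty piece.
    let mut parts = header.split('/').skip(1);
    let style = parts.next().filter(|s| !s.is_empty());
    let variant = parts.next().filter(|s| !s.is_empty());
    Some((OrgCiteStyle { style, variant }, body.trim()))
}
```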

A blog post by an Org-Mode contributor lays it all out better than I could in a GitHub issue:
https://blog.tecosaur.com/tmio/2021-07-31-citations.html

Crucially, this kind of syntax lets people set a different style on each citation, which does not seem (easily) achievable with the syntax proposed here.

@kmaasrud
Contributor

According to [cite: common prefix;@Key123 page 13; @Key982 chap 1; common suffix] ...

That looks similar to what @jgm is proposing and to the current syntax used by pandoc-citeproc, just with an English keyword (cite), which we would like to avoid.

I'm still in favour of encapsulating a citation fully in square brackets for easy parsing, and I think the choice of @ for simple cites and +@ for author-in-text should be enough customization.

@jgm
Owner Author

jgm commented Jan 14, 2023

The org-mode syntax (which draws on and extends the pandoc syntax) gets more flexibility (different styles) at the price of verbosity and English-language keywords. So each has its drawbacks and its advantages.

@NotAFedoraUser

With the current proposal, this:

In [+@Smith2014 page 21-23] he talks about...

Turns into this:

In Smith (2014, pp. 21--23) he talks about...

Whereas with Org-Mode-style syntax, it would be:

In [cite/t:@Smith2014 page 21-23] he talks about...

While Org-Mode's syntax is more long-winded, it is more flexible,
allowing for more citation styles, such as [cite/a:], [cite/n:], or [cite/t:].
I suppose one could accomplish the same thing by modifying
the current proposal to include something like the following:

[-@Key] /* nocite (for inclusion in the printed bibliography only) */
[+@Key] /* in-text cite (Smith (pp. 21-23)) */
[/@Key] /* author name citation (Smith) */

Perhaps this makes more sense for djot?
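The single-punctuation scheme above boils down to a one-character lookup before the `@`. A minimal sketch, assuming the three markers listed above (`-`, `+`, `/`) plus the unmarked default; the `CiteMode` enum and `read_mode` function are hypothetical names used only for illustration.

```rust
/// Hypothetical citation modes for the punctuation-prefix proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CiteMode {
    /// `[@Key]`: ordinary parenthetical citation.
    Normal,
    /// `[-@Key]`: nocite, listed only in the bibliography.
    NoCite,
    /// `[+@Key]`: author-in-text citation.
    InText,
    /// `[/@Key]`: author name only.
    AuthorOnly,
}

/// Reads the optional mode marker at the start of a citation body
/// (the text after `[`) and returns the mode plus the rest, starting at `@`.
pub fn read_mode(body: &str) -> Option<(CiteMode, &str)> {
    let mut chars = body.chars();
    let mode = match chars.next()? {
        '@' => return Some((CiteMode::Normal, body)),
        '-' => CiteMode::NoCite,
        '+' => CiteMode::InText,
        '/' => CiteMode::AuthorOnly,
        _ => return None,
    };
    let rest = chars.as_str();
    // A marker must be immediately followed by `@`.
    rest.starts_with('@').then_some((mode, rest))
}
```

Keeping the marker to a single fixed position means new variants can be added later by extending the `match`, which is the kind of room for growth mentioned in the reply below.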

@kmaasrud
Contributor

[-@Key] /* nocite (for inclusion in the printed bibliography only) */

@NotAFedoraUser that is very clever! Along with +@, I think that should be sufficient for most use-cases. However, the [<some-punctuation>@<key>] scheme leaves room for a lot of flexibility down the road---if more citation variants are requested.

@bpj

bpj commented Jan 14, 2023

@jgm :

at the price of verbosity and English-language keywords

I hope both will be avoided!

@bdarcus

bdarcus commented Feb 8, 2023

Just came across djot; cool!

@jgm - I just thought I'd remind you about one wrinkle we stumbled on in org-cite development: whether a local variant is a property of the citation as a whole (which is where we came down for org-cite) or of the individual citation-reference (as it is in pandoc).

E.g., what happens if you have more than one reference in a citation with your proposed syntax (in the first example below the author lists differ; in the second they don't)?

[@foo, p. 15;+@bar]
[@foo1, p. 15;+@foo2]

@jgm
Owner Author

jgm commented Feb 8, 2023

@bdarcus the proposal floated above was to use + for author-in-text citations. The thought was that it would go at the beginning of the citation list, thus

[+@foo, p. 15; @bar]

which would be equivalent to pandoc's

@foo [p. 15; @bar]

I hadn't envisioned allowing it to be put on subsequent items, and I'm not sure what sense that would make.
Maybe I haven't grasped your thought here.
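A sketch of the data model this implies, assuming `+` is only allowed before the first reference and therefore belongs to the citation as a whole rather than to an item. Struct and function names are illustrative, suffixes are kept as raw strings, and splitting on `;` is naive (a `;` inside a suffix would break it).

```rust
#[derive(Debug, PartialEq)]
pub struct CitationItem {
    pub key: String,
    pub suffix: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct Citation {
    /// Author-in-text applies to the citation as a whole, not to one item.
    pub author_in_text: bool,
    pub items: Vec<CitationItem>,
}

/// Parses `[+@foo, p. 15; @bar]`; rejects `+` on any item but the first.
pub fn parse_citation(input: &str) -> Option<Citation> {
    let inner = input.strip_prefix('[')?.strip_suffix(']')?;
    let (author_in_text, inner) = match inner.strip_prefix('+') {
        Some(rest) => (true, rest),
        None => (false, inner),
    };
    let mut items = Vec::new();
    for part in inner.split(';') {
        // Every item must start with `@`; a `+` on a later item makes this
        // fail, so the whole citation is rejected.
        let body = part.trim().strip_prefix('@')?;
        let (key, suffix) = match body.split_once(',') {
            Some((k, s)) => (k.trim(), Some(s.trim().to_string())),
            None => (body.trim(), None),
        };
        if key.is_empty() {
            return None;
        }
        items.push(CitationItem { key: key.to_string(), suffix });
    }
    Some(Citation { author_in_text, items })
}
```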

@bdarcus

bdarcus commented Feb 8, 2023

@jgm - in that case, I think I misunderstood, and it's a property of the citation as a whole, which is I think right.

@bdarcus

bdarcus commented Feb 14, 2023

One other difference between org-cite (and biblatex) and pandoc: org-cite has two levels of affixes, one for the citation as a whole and another for each citation-reference.

It's useful when you have a multi-cite, and a style may sort the references within the citation.

[cite:see ;@doe22;@doe20, ch. 2]

So presumably in djot, it could just be:

[see ;@doe22;@doe20, ch. 2]
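One way to recover the two affix levels from this djot form: split on `;`, treat a leading segment that does not start with `@` as the citation-wide prefix, and a trailing one as the citation-wide suffix. This is a hypothetical sketch: it ignores the `+` marker and per-item prefixes, splits naively on `;`, and `MultiCite` / `parse_multi_cite` are illustrative names.

```rust
#[derive(Debug, PartialEq, Default)]
pub struct MultiCite {
    /// Citation-wide prefix, e.g. "see".
    pub prefix: Option<String>,
    /// (key, per-reference suffix) pairs in source order.
    pub refs: Vec<(String, Option<String>)>,
    /// Citation-wide suffix.
    pub suffix: Option<String>,
}

pub fn parse_multi_cite(input: &str) -> Option<MultiCite> {
    let inner = input.strip_prefix('[')?.strip_suffix(']')?;
    let segs: Vec<&str> = inner.split(';').map(str::trim).collect();
    let last = segs.len() - 1; // split always yields at least one segment
    let mut cite = MultiCite::default();
    for (i, seg) in segs.iter().enumerate() {
        if let Some(body) = seg.strip_prefix('@') {
            let (key, sfx) = match body.split_once(',') {
                Some((k, s)) => (k.trim(), Some(s.trim().to_string())),
                None => (body, None),
            };
            cite.refs.push((key.to_string(), sfx));
        } else if i == 0 {
            cite.prefix = Some(seg.to_string());
        } else if i == last && !cite.refs.is_empty() {
            cite.suffix = Some(seg.to_string());
        } else {
            return None; // stray text between references
        }
    }
    if cite.refs.is_empty() { None } else { Some(cite) }
}
```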

@jgm
Owner Author

jgm commented Feb 14, 2023

Yes, I think that would be a good approach. However, citeproc doesn't currently support two levels of affixes, so I don't know what we'd do with this.

@bdarcus

bdarcus commented Feb 14, 2023

Maybe a simple heuristic to flatten them (e.g., merge the citation-level affix with the affix of the nearest reference), and later add support to citeproc as time and interest allow?

You may already have to do something similar when dealing with org-cite?
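The flattening heuristic could be sketched like this: fold the citation-wide prefix into the first reference's prefix and the citation-wide suffix into the last reference's suffix, so a processor that only knows per-reference affixes can still render the citation. This is a hypothetical sketch; `Item`, `join`, and `flatten` are illustrative names, and it does not solve reordering (if the processor later sorts references, the merged prefix moves with its reference).

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub key: String,
    pub prefix: Option<String>,
    pub suffix: Option<String>,
}

/// Joins two optional affixes with a space, keeping whichever is present.
fn join(a: Option<String>, b: Option<String>) -> Option<String> {
    match (a, b) {
        (Some(a), Some(b)) => Some(format!("{} {}", a, b)),
        (a, b) => a.or(b),
    }
}

/// Flattens citation-level affixes into the nearest reference affixes.
/// With no items, the global affixes are simply dropped.
pub fn flatten(
    global_prefix: Option<String>,
    global_suffix: Option<String>,
    mut items: Vec<Item>,
) -> Vec<Item> {
    if let Some(first) = items.first_mut() {
        first.prefix = join(global_prefix, first.prefix.take());
    }
    if let Some(last) = items.last_mut() {
        last.suffix = join(last.suffix.take(), global_suffix);
    }
    items
}
```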

@bdarcus

bdarcus commented May 19, 2023

Is this issue pretty much resolved, and just needs to be implemented?

And maybe also relies on #35?

I've been working on a project I have been planning from the beginning to integrate with this once it's available.

https://github.com/bdarcus/csl-next

ATM, I have my own AST, which is basically the new style input template model enhanced with rendered data (current example bibliography reference below), but I'm hoping it should be pretty easy to integrate with djot; both for document processing as a whole, and also to allow djot markup within field strings.

  [
    [ { contributors: "author", procValue: "Doe, Jane" } ],
    {
      date: "issued",
      format: "year",
      wrap: "parentheses",
      procValue: "2023b"
    },
    [ { title: "title", procValue: "The Title" } ],
    undefined,
    undefined
  ]

@jgm
Owner Author

jgm commented May 19, 2023

I wouldn't call it resolved! There are still a lot of choice points.

@bdarcus

bdarcus commented May 19, 2023 via email

@jgm
Owner Author

jgm commented May 20, 2023

the former

@bdarcus

bdarcus commented May 22, 2023

the former

So what are those outstanding questions?

I suppose one, which you may or may not have been thinking about, is locators: a plain string plus string parsing (as with the pandoc syntax and most other current examples), versus something more structured.

For the project I'm working on, I just merged this, which actually isn't too bad in YAML:

suffix: [see, page: 23, section: V]

But I guess pandoc's optional brackets are basically the same.

I guess another, that came up with org-cite, is where to allow markup within the citation?

@jgm
Owner Author

jgm commented May 23, 2023

There are lots of questions. Do we want to support a huge range of variants like org? If so, how do we do that without English language keywords? How are prefixes and suffixes handled? How are locators handled? Do we use localized locator labels as in pandoc? How are locators distinguished from other suffix content? I don't have a lot of time right now to work on this, but this should give some idea.

@bdarcus

bdarcus commented Jun 4, 2023

Note: I edited this a bit much later to add something I missed earlier on affixes.

Since I'm thinking about and working on this area ATM, my thoughts:

Do we want to support a huge range of variants like org?

This is indeed the big question, since it's hard to reverse later.

My impulse is to say no, and just have two styles/commands; what in the academic literature on this are called:

  1. integral: AKA citet, textcite, narrative citations.
  2. non-integral: AKA citep, parenthetical citations.

These notions are very general, more so than in the TeX world, and for that reason should go fairly far.

EDIT: the caveat is some of the variants in the LaTeX world are for handling capitalization, which the above would not.

EDIT: I'm implementing the citation model now; here's how I'm dealing with this for now:

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum CitationModeType {
    /// Places the author inline in the text; also known as "narrative" or "in text" citations.
    Integral,
    /// Places the author in the citation and/or bibliography or reference entry.
    #[default]
    NonIntegral,
}

But I could also see:

If so, how do we do that without English language keywords?

Do something like org-cite, but use single characters. But that has its own trade-offs.

How are prefixes and suffixes handled?

I think you're referring to this above?

#32 (comment)

In any case, yes, this is another decision point: affixes only on individual citation references (as in pandoc), or also on the citation as a whole (as in org-cite and biblatex).

Per my comment there, I'd prefer the latter, because the cost is low, and the benefit in terms of flexibility for users high.

How are locators handled? Do we use localized locator labels as in pandoc? How are locators distinguished from other suffix content?

In my in-progress project (where I'm now focusing on a Rust implementation, though I haven't done the citation part yet), here are the TypeScript definitions for locators:

export type Locator = Partial<Record<LocatorTerms, string>> | string;

type LocatorTerms =
  | "book"
  | "chapter"
  | "column"
  | "figure"
  | "folio"
  | "number"
  | "line"
  | "note"
  | "opus"
  | "page"
  | "paragraph"
  | "part"
  | "section"
  | "sub-verbo"
  | "verse"
  | "volume";

In YAML:

suffix: [see, page: 23, section: V]

But that's a format more for machines than for humans; e.g., it's what the djot markup might be converted into.

This is another tricky area; my impulse is just to do what you've done in pandoc.

Do you see any glaring problems with that?
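For comparison, the pandoc approach recognizes a locator at the start of a reference's suffix by matching a known locator label followed by a value. This is a much-simplified, hypothetical sketch of that idea: it maps a few English abbreviations onto terms from the `LocatorTerms` list above, stops the value at the first comma, and uses an illustrative label table that is far from complete. Pandoc's real rules (localized terms, full and abbreviated forms, more complex values) are more involved.

```rust
/// A small, illustrative subset of English locator labels and CSL terms.
const LABELS: &[(&str, &str)] = &[
    ("pp.", "page"),
    ("p.", "page"),
    ("chap.", "chapter"),
    ("ch.", "chapter"),
    ("sec.", "section"),
    ("vol.", "volume"),
];

#[derive(Debug, PartialEq)]
pub struct Locator<'a> {
    pub label: &'static str,
    pub value: &'a str,
}

/// Splits a suffix like "p. 15, emphasis added" into a locator and the rest.
/// If no known label starts the suffix, the whole suffix is returned as-is.
pub fn split_locator(suffix: &str) -> (Option<Locator<'_>>, &str) {
    let s = suffix.trim_start();
    for &(abbr, term) in LABELS {
        if let Some(rest) = s.strip_prefix(abbr) {
            let rest = rest.trim_start();
            // Simplification: the locator value runs up to the next comma.
            let (value, remainder) = match rest.split_once(',') {
                Some((v, r)) => (v.trim_end(), r.trim_start()),
                None => (rest.trim_end(), ""),
            };
            if !value.is_empty() {
                return (Some(Locator { label: term, value }), remainder);
            }
        }
    }
    (None, s)
}
```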

@jgm
Owner Author

jgm commented Jun 5, 2023

The pandoc way has worked pretty well. There are occasional requests for more expressive power, but it seems enough for most users.

@kmaasrud
Contributor

kmaasrud commented Jun 6, 2023

[...] but it seems enough for most users.

Based on my personal experience of academic writing, I concur. The less complexity, the better; that'll keep it simpler for implementors.

@gfarrell

gfarrell commented Apr 5, 2024

For my own purposes, I started adding the citation format specified in this issue into my own djoths fork.

  1. Parsing is fine, rendering to HTML is fine (ish, one question below), but the bit I'm stuck at is: do you think the references have to be contained in the source text? I'm thinking of how, for example, you can have a LaTeX document with separate BibTeX file. That engenders two pathways: djot implementations have to be able to specify an input map of references (or a references file) OR there has to be additional syntax for specifying references as part of the djot specification.

  2. On rendering to HTML (both for the bibliography and the inline citation itself), is it better to have a standardised output as described above (e.g. either "author-in-text" or "author-in-parentheses") or would it be better to allow the user to specify a CSL stylesheet (perhaps with a default stylesheet) which, sadly, would mean another external input to the djot implementations.

I know it's quite possible that something will block this from making it into the djot spec any time soon, but I thought I'd ask given that I am implementing anyway, and maybe that implementation will make it into djoths when the spec gets updated, so I'd rather do this semi-informed than 0% informed.

@jgm
Owner Author

jgm commented Apr 5, 2024

For parsing, we just need to specify the syntax of citations and a corresponding AST element.

For rendering: that's a matter of what we do with the citations. Here djot itself could be neutral, but I think the most powerful thing to do would be what pandoc does: use a citeproc processor to create citations and bibliography using a CSL stylesheet and external references. (Here in a Haskell implementation you could simply use my citeproc library.)

@jgm
Owner Author

jgm commented Apr 5, 2024

Re providing a way to put references (the bibliographic data) inside the document itself: pandoc does allow this, via a references field in the metadata. So this interacts with the metadata issue.

@bdarcus

bdarcus commented Apr 5, 2024

Random quick thoughts:

Here djot itself could be neutral, but I think the most powerful thing to do would be what pandoc does: use a citeproc processor to create citations and bibliography using a CSL stylesheet and external references.

The advantage of that is that, like djot, CSL is agnostic about output format. So it's a good match.

I guess the question is how closely and formally they are tied.

Someone that primarily targets LaTeX might want to bypass CSL and use bibtex/biblatex.

Also, I do have ambitions of finishing my CSLN project and hooking it up to djot, so hopefully there's room for that sort of alternative.

@jgm
Owner Author

jgm commented Apr 5, 2024

I don't think specifying a syntax for citations (and perhaps reference lists) requires tying djot to any particular mode of rendering citations. Pandoc's citations, for example, can be rendered using CSL or natbib or biblatex or org-cite, depending on command line options.

@bdarcus

bdarcus commented Apr 27, 2024

@gfarrell:

I know it's likely premature ATM, but since you've been working on it ...

I started adding the citation format specified in this issue into my own djoths fork.

Was looking at the test cases, and just wondering about one design question we hadn't settled.

From what I can tell, your implementation follows the pandoc way; no global affixes?

The concrete question that issue raises is what happens if you have a citation like this, and the citation processor is using a style that requires reordering the references within the citation by date issued?

[see @doe24; @doe20]

Without global affixes, you either end up with something like this (which is simply wrong; the author intends "see" to apply to both references):

(Doe 2020; see Doe 2024)

... or you require the user to track that order, AND adjust it if the citation style changes.

So in org-mode (which is an iteration of the pandoc model and syntax), for example, you would do:

[cite: see; @doe24; @doe20]

My argument has been it's a niche feature important in some fields (notably in the humanities and social sciences), but that adding it to djot is low-cost for users and developers alike.

Regardless, you probably want to include some prefixes in the test cases?

@jgm
Owner Author

jgm commented Apr 28, 2024

@bdarcus citeproc doesn't have a notion of global affixes, does it? (by the way, the way citeproc-hs handles this is just by blocking re-ordering around an affix; at least that prevents misleading things from appearing.)
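One possible reading of "blocking re-ordering around an affix", sketched for illustration: sort references by year only within runs of un-prefixed references, and keep any reference that carries a prefix fixed in place as a barrier. This is a hypothetical interpretation, not a description of citeproc-hs internals; years stand in for full dates, and `Ref`, `render`, and `sort_blocked` are illustrative names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Ref {
    pub author: String,
    pub year: u32,
    pub prefix: Option<String>,
}

/// Renders an author-date citation such as "(see Doe 2024; Doe 2020)".
pub fn render(refs: &[Ref]) -> String {
    let parts: Vec<String> = refs
        .iter()
        .map(|r| match &r.prefix {
            Some(p) => format!("{} {} {}", p, r.author, r.year),
            None => format!("{} {}", r.author, r.year),
        })
        .collect();
    format!("({})", parts.join("; "))
}

/// Sorts by year, but never moves a reference across one carrying a prefix:
/// prefixed references stay put, and only the runs between them are sorted.
pub fn sort_blocked(refs: &mut [Ref]) {
    let mut start = 0;
    for i in 0..=refs.len() {
        let barrier = i == refs.len() || refs[i].prefix.is_some();
        if barrier {
            // Stable sort of the run [start, i) by year.
            refs[start..i].sort_by_key(|r| r.year);
            start = i + 1;
        }
    }
}
```

With the `[see @doe24; @doe20]` example, a plain sort by year produces the misleading "(Doe 2020; see Doe 2024)", while `sort_blocked` leaves the prefixed reference in place and yields "(see Doe 2024; Doe 2020)".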

@bdarcus

bdarcus commented Apr 28, 2024

@jgm:

citeproc doesn't have a notion of global affixes, does it?

You mean citeproc-js?

It does not.

It's an iteration that made it into org-cite. So it's supported in the org citeproc-el integration.

(by the way, the way citeproc-hs handles this is just by blocking re-ordering around an affix; at least that prevents misleading things from appearing.)

I hadn't thought of that, but that might be a reasonable alternative.
