This issue was moved to a discussion.

Rmarkdown should be a markup language or lines of R in .Rmd files should count towards R programming language #5208

Closed
myoung3 opened this issue Feb 11, 2021 · 4 comments

Comments

myoung3 commented Feb 11, 2021

Yes, I read #4119, but given that Jupyter notebooks are considered a language, Rmarkdown should be as well.

myoung3 added the Add Language and Good First Issue labels Feb 11, 2021

myoung3 commented Feb 11, 2021

Currently GitHub considers Rmd files "prose," but there are situations where this makes for confusing language summary results. This repo is written almost entirely in .Rmd, but the language summary describes it as mostly JavaScript and zero R.

lildude commented Feb 12, 2021

RMarkdown is already a language, in that Linguist already knows about it and has an entry for it, as pointed out in my comment at #4119 (comment). I think what you mean is having it classified as markup instead of prose, so that it is counted by default when calculating the language statistics, as Jupyter Notebook is.

I'm apprehensive about doing this for precisely the reason @Alhadis mentioned:

It's interesting you mention Jupyter Notebook, because we've had quite a few complaints from users unhappy that their projects were classified as "Jupyter Notebook", simply because of one huge .ipynb file (which was mostly generated data, not source code).

I predict if we gave RMarkdown the same treatment, we'd get the same issues reported by users whose R projects became "RMarkdown" overnight.

To add to this apprehension, RMarkdown was added well over 7 years ago (as markup, but then changed to prose the very next day without a PR), and this is the first request for this change. Suddenly changing it now will surprise a lot of long-time users, and not necessarily in a good way.

As such, I think we can keep this issue open for a while to see if it gains any traction with others wanting the same behaviour, but I'm not confident nor sure this is the right thing to do in the long run.

Currently GitHub considers Rmd files "prose," but there are situations where this makes for confusing language summary results. This repo is written almost entirely in .Rmd, but the language summary describes it as mostly JavaScript and zero R.

As mentioned in #4119, you'll need to implement an override if you want your files to show in the stats.
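
For reference, an override of this kind is normally a `.gitattributes` entry at the repository root. The fragment below is a sketch, assuming the R Markdown files use the `.Rmd` extension. `linguist-detectable` and `linguist-language` are Linguist's documented override attributes; the commented-out alternative, which would attribute the files to R rather than RMarkdown, is an assumption about what some repositories may prefer, not something proposed in this thread.

```gitattributes
# Opt .Rmd files into the language statistics (they are "prose" by default).
*.Rmd linguist-detectable

# Alternative (assumption): count .Rmd files as R instead of RMarkdown.
# *.Rmd linguist-language=R
```

Overrides apply per file path, so every line of a matching file, prose included, counts toward the chosen language.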

lildude added the Discussion label and removed the Add Language and Good First Issue labels Feb 12, 2021

myoung3 commented Feb 12, 2021

Regarding the concern that counting Rmd lines might distort language percentages strangely, I think this is a reasonable concern. On the one hand:

Jupyter notebooks have an inflated number of lines of code, since they store a lot of metadata. So it doesn't take many notebooks to "take over" a project.
(from #3316)

The above actually isn't true for Rmd. Jupyter files are stored as JSON, whereas .Rmd files are stored as plain-text markdown, so there won't be inflation by metadata. On the other hand, Rmarkdown documents are often used as instructional pieces, so they do tend to be "inflated" by a lot of actual prose.

Another option to consider might be to actually count lines of code within Rmarkdown documents that are explicitly R code chunks, and have that total contribute to the R language count. This would be more work than just counting .Rmd lines toward their own language (although doable; see https://www.r-bloggers.com/2018/05/create-code-metrics-with-cloc/), but is probably the safest and most reasonable approach to better reflect programming language use in repos given the concerns that you've raised.
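
As a rough illustration of what counting only R chunk lines would involve (this is not something Linguist does), here is a minimal Python sketch. It assumes chunks open with a backtick fence followed by `{r ...}` and close with a bare backtick fence, and it ignores inline R expressions. The function name `count_r_chunk_lines` is hypothetical.

```python
import re

# Opening fence of an R chunk: three or more backticks, then "{r", then a
# space, comma, or closing brace (so "{rust}" or "{python}" do not match).
CHUNK_OPEN = re.compile(r"^\s*`{3,}\s*\{[rR][\s,}]")
# A bare backtick fence closes the current chunk.
CHUNK_CLOSE = re.compile(r"^\s*`{3,}\s*$")


def count_r_chunk_lines(rmd_text: str) -> int:
    """Count non-blank lines that sit inside R code chunks."""
    in_chunk = False
    count = 0
    for line in rmd_text.splitlines():
        if in_chunk:
            if CHUNK_CLOSE.match(line):
                in_chunk = False
            elif line.strip():
                count += 1
        elif CHUNK_OPEN.match(line):
            in_chunk = True
    return count
```

Calling `count_r_chunk_lines(open("report.Rmd").read())` on a file (the filename is only an example) gives a per-file total that could be added to a repository's R line count, leaving the surrounding prose uncounted.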

myoung3 changed the title from "Rmarkdown should be a language" to "Rmarkdown should be a markup language or lines of R in .Rmd files should count towards R programming language" Feb 12, 2021

lildude commented Feb 13, 2021

On the other hand, Rmarkdown documents are often used as instructional pieces so they do tend to be "inflated" by a lot of actual prose.

I think you may have hit the nail on the head there as to why this was switched to prose. If most files are mostly prose, it makes sense to mark them as such, especially as Linguist doesn't have the ability to do partial file analysis thus...

Another option to consider might be to actually count lines of code within Rmarkdown documents that are explicitly R code chunks

... is not possible. This is the same issue peeps have had with Jupyter as some want it shown as Python and not Jupyter.

I think the same explanation and solution I provided at #3316 (comment) applies here.

lildude closed this as completed Mar 9, 2021
github-linguist locked and limited conversation to collaborators Mar 9, 2021
