Add STAR language and .star file extension for Starlark #4840

Merged: 14 commits, May 26, 2022

Contributor

@mcuadros mcuadros commented Apr 11, 2020

Description

This PR adds .star as an extension for the Starlark language. The .star extension is the default extension for Starlark scripts not related to Bazel, and its use is growing every day as many projects embed the language.

This extension is used by all the official interpreters, in both their examples and their testdata.

It is also the official extension recommended by many open-source projects:

This was discarded initially at #4759 (comment)

There are currently 30k .star files; since the PR introducing the language, usage has grown by 50%.

Checklist:

@lildude has also added support for STAR as it also has the .star extension which is very popular on GitHub:

@pchaigno
Contributor

I don't think this pull request addresses the concerns expressed in #4759. The issue was not with the in-the-wild usage, but with the presence of many other .star files. All these files will be recognized as Starlark files with the pull request as is.

@mcuadros
Contributor Author

mcuadros commented Apr 19, 2020 via email

@pchaigno
Contributor

> Sorry, but if I am not mistaken, Linguist has many extension collisions
> that are resolved with the classifier, for example .md.
>
> So all these files are not going to be identified as Starlark, since a sample
> is provided for the classifier. Am I wrong?

That is correct, but it requires us to also have support in Linguist for the conflicting languages. In the case of .md, Linguist knows both GCC Machine Description and Markdown and can use the classifier (and other, more precise strategies) to distinguish between the two. So for .star, we'll need to identify the conflicting languages and add them to Linguist if they meet usage requirements.

@mcuadros
Copy link
Contributor Author

mcuadros commented Apr 19, 2020 via email

Alhadis added a commit to file-icons/atom that referenced this pull request Apr 22, 2020
@stale

stale bot commented May 19, 2020

This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions.

@stale stale bot added the Stale label May 19, 2020
@stale

stale bot commented Jun 2, 2020

This pull request has been automatically closed because it has not had activity in a long time. Please feel free to reopen it or create a new issue.

@stale stale bot closed this Jun 2, 2020
@fkorotkov

@pchaigno I want to bump this issue. What will it take for us to "identify the conflicting languages"?

I ran a search query for recently indexed *.star files and checked the first few pages; it seems the majority are Starlark files. The rest are some data files or files without a particular pattern.

@mahmoudimus

bump! seconding @fkorotkov. /cc @lildude @pchaigno - any way we can get this looked at?

@lildude
Member

lildude commented Jun 8, 2021

> bump! seconding @fkorotkov. /cc @lildude @pchaigno - any way we can get this looked at?

Things are still where they were: you still need to identify the other users of the .star extension. From a very quick look, there's one easily identifiable use of .star files that is pretty active, with nearly 8000 files, and each file appears to occur only once per repo. That makes it a clear example that qualifies for inclusion; otherwise it would be incorrectly identified. I have no idea what language it's written in, but from the few READMEs I've looked at it appears to be related to machine learning, so it will be noticed very quickly if it's misidentified.

You don't have to identify every user of the .star extension, but an attempt needs to be made to at least identify the most popular (probably just the one I've found), and then add them (it) as part of this PR (I've re-opened it).

@lildude lildude reopened this Jun 8, 2021
@stale stale bot removed the Stale label Jun 8, 2021
@lildude lildude requested a review from a team as a code owner June 8, 2021 14:01
@fkorotkov

fkorotkov commented Jun 8, 2021

Just for reference, we at Cirrus CI have started using Starlark alongside YAML to allow CI builds to be configured programmatically. So there will be more .cirrus.star and lib.star files on GitHub in the near future. 😅

We also created a plugin for the IntelliJ platform to support code assistance for .star files.

@rohansingh

@lildude Just so I understand, if we go through and identify all the conflicting languages that meet usage requirements, do they need to be added as new languages in this same PR?

@lildude
Member

lildude commented Jun 25, 2021

@rohansingh I think you may have missed the last sentence in my last response:

> You don't have to identify every user of the .star extension, but an attempt needs to be made to at least identify the most popular (probably just the one I've found), and then add them (it) as part of this PR (I've re-opened it).

@rohansingh

rohansingh commented Jun 25, 2021

Sorry, reading comprehension failure. Anyway, sounds good, I'll try to get to this today.

update: Unfortunately I don't see myself getting around to this after all. Just using .gitattributes as a workaround for now.
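
The comment above does not say which override was used. A minimal sketch, assuming the workaround is Linguist's `linguist-language` attribute applied to `*.star`, could add the line to a repository's `.gitattributes` like this (the helper name `add_linguist_override` is hypothetical):

```python
from pathlib import Path

# Assumed override line: tells Linguist to treat every .star file as Starlark.
OVERRIDE = "*.star linguist-language=Starlark"


def add_linguist_override(repo_root: str, line: str = OVERRIDE) -> bool:
    """Append the override to .gitattributes unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(repo_root) / ".gitattributes"
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False
    # Make sure the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + prefix + line + "\n", encoding="utf-8")
    return True
```

Calling `add_linguist_override(".")` from the repository root writes the entry once; repeated calls leave the file unchanged.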

Member

@lildude lildude left a comment


Marking this as still needing changes

@lildude lildude changed the title Add file extension for Starlark Add STAR language and .star file extension for Starlark May 26, 2022
@lildude
Member

lildude commented May 26, 2022

> You don't have to identify every user of the .star extension, but an attempt needs to be made to at least identify the most popular (probably just the one I've found), and then add them (it) as part of this PR (I've re-opened it).

I've done this. I've identified that the other most commonly used language is Self-defining Text Archive and Retrieval, commonly referred to as STAR.

I've pushed changes to your repo to add support to this PR.
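
To illustrate how two languages sharing the .star extension might be told apart, here is a rough sketch of a content-based check. The patterns are assumptions picked for illustration (STAR's `data_`, `save_` and `loop_` keywords versus Starlark's `def` and `load(`); they are not the rules added in this PR or Linguist's actual heuristics, and `guess_star_language` is a hypothetical name.

```python
import re
from typing import Optional

# Illustrative patterns only; not Linguist's real heuristics.
# STAR data files are structured with data_ blocks, save_ frames and loop_ lists.
STAR_KEYWORDS = re.compile(r"^[ \t]*(?:data_\S+|save_\S*|loop_)[ \t]*$", re.MULTILINE)
# Starlark scripts typically define functions and load other modules.
STARLARK_KEYWORDS = re.compile(r"^[ \t]*(?:def[ \t]+\w+[ \t]*\(|load[ \t]*\()", re.MULTILINE)


def guess_star_language(text: str) -> Optional[str]:
    """Guess whether a .star file is STAR or Starlark; None if undecided."""
    star_hits = len(STAR_KEYWORDS.findall(text))
    starlark_hits = len(STARLARK_KEYWORDS.findall(text))
    if star_hits > starlark_hits:
        return "STAR"
    if starlark_hits > star_hits:
        return "Starlark"
    # No clear signal: a real tool would fall back to another strategy here.
    return None
```

For example, a file containing `data_example` and `loop_` lines would be guessed as STAR, while one containing `load("lib.star", "task")` and `def main(ctx):` would be guessed as Starlark.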

@lildude lildude requested review from lildude and Alhadis May 26, 2022 11:34
Collaborator

@Alhadis Alhadis left a comment


That pro-dna.nmr.star sample is huge (8,613 lines / 145 KBs). Sure we can't trim it down a bit? I noticed that a lot of sections repeat the same data.

@lildude
Member

lildude commented May 26, 2022

> That pro-dna.nmr.star sample is huge (8,613 lines / 145 KBs). Sure we can't trim it down a bit? I noticed that a lot of sections repeat the same data.

🤦 And I point this out myself so often 😆 I'll get the ✂️ out.

@lildude
Member

lildude commented May 26, 2022

@Alhadis replaced it with a smaller sample.

@lildude lildude merged commit 3d6b57d into github-linguist:master May 26, 2022
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024