The Hard Tabs Issue #544
You need to configure your editor to use unix line endings to write Zig code. Additionally, you need to configure your editor to use spaces instead of hard tabs for indentation.
The rationale for only supporting unix line endings and no hard tabs is part of the "only one obvious way to do things" philosophy. From a practical perspective, never having windows line endings makes it easier to write tools that read zig source files. For example, consider a tool that searches for "\n\n// TODO" and replaces it with something else that includes newlines: it's much easier to do this without worrying about newline style. Furthermore, variable newline style and variable indentation style are features that Zig does not support. | Is this documented anywhere, or are users just expected to run into cryptic errors like this? A CR character doesn't even print properly in a terminal. |
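To make the "\n\n// TODO" example concrete, here is a small Python sketch (the function names are hypothetical, not part of any Zig tooling). A plain substring search works on an LF file but silently finds nothing once the same file uses CRLF, so the tool needs a second, newline-aware code path.

```python
import re


def count_todo_blocks(source: str) -> int:
    """Count '// TODO' comments preceded by a blank line, assuming LF-only input."""
    return source.count("\n\n// TODO")


def count_todo_blocks_any_newline(source: str) -> int:
    """Same search, but tolerant of CRLF (or mixed) line endings."""
    return len(re.findall(r"\r?\n\r?\n// TODO", source))


lf_source = "const a = 1;\n\n// TODO: handle errors\n"
crlf_source = lf_source.replace("\n", "\r\n")

print(count_todo_blocks(lf_source))                # 1
print(count_todo_blocks(crlf_source))              # 0: the CRLF file is missed
print(count_todo_blocks_any_newline(crlf_source))  # 1
```

The replacement half of such a tool faces the same question again: which newline style should the inserted text use?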
I'm not sure if Zig does this on purpose or if it was overlooked, but as far as I'm concerned this is a good feature. Different styles of indentation and line endings cause endless headaches when working on a collaborative project, for example when using source control software such as git. I know Notepad is the default text editor in Windows, but nearly all developers use something else to write code, such as Notepad++ or Visual Studio. VS Code is also a good option. It's developed by Microsoft, and it's free. |
Nim does it right:
Multiline strings should insert LF newlines, as in C. If someone wants CR, they can add it via \r. Getting newlines right is not just a problem of stupid Notepad: if one copy-pastes an example from a webpage, one gets CR/LF too. Imagine someone failing with Hello World. |
Pasting into what editor does that? |
@thejoshwolfe: Notepad, Sublime Text 2 and probably anything else. I do not know of any Windows editor that uses Unix line endings by default and converts to this style automatically. |
@PavelVozenilek Are you saying that when an editor is configured to use unix line endings, or is editing a file that already has consistent unix line endings, pasting from a web browser inserts the wrong kind of line ending? Or are you just saying that windows line endings are typically the default line ending style before you configure your editor? |
@thejoshwolfe: the editors I know (VC++, Sublime Text 2, Notepad ...) do not have a configuration option to force Unix line endings everywhere from now on. At best, one switches it manually, file by file. Programmable editors like vim probably can, but I tend to avoid tools smarter than me. I do not understand why this is even a problem. Line ending chaos is real and it won't go away; the pragmatic solution (accept all) is easy, and then the mess disappears from the view of the ordinary user. |
@thejoshwolfe: I think it is the browser that does this, not the editor. HTML is defined as using \r\n not \n. Most browsers let you get away with it on input, but when you copy and paste I think it recreates "correct" HTML from the DOM. Not sure about this, but I have run into the problem consistently. I think @PavelVozenilek has a point. Every useful editor can manage to translate the line endings just fine but few allow you to do it at a project level and change everything automatically. However, the two main platforms, Windows and Mac, do not use the line ending convention that Zig uses. I happen to use Linux, but that is a minority platform. I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt. |
I just did some experimentation on my Windows machine. Here's what I found: Eclipse and Notepad++ normalize line endings when you paste text into a file. Each file is determined to be in a particular style, and anything you put into the file through typing or pasting gets normalized to that style. In Visual Studio, when you press enter, it uses the newline style of the lines around your cursor. When you paste code with CRLF line endings into any file, you get CRLF line endings for the text that you're pasting without affecting the surrounding text. If you save the file, it saves with mixed line endings without warning you. (You can convert line endings while saving through the "Advanced Save Options".) Visual Studio has no option to automatically normalize line endings on paste or on save. If you want normalized line endings, you gotta do it every time you save. When you open a file in Visual Studio that has mixed line endings, you get a dialog that prompts you to normalize all the line endings to one style or another. This is not a bug thread for Visual Studio, but that is a Visual Studio bug. Why would they let you create mixed line endings without warning, but then warn you when you open a file that has mixed line endings? This leads to a "best practice" where you should close and reopen all your files before making a commit to make sure you're not committing files that will produce warnings, which is just silly. This is a bug/missing feature in Visual Studio. I don't know about Sublime Text; it's not free. Meanwhile in Linux, copying text out of a web browser always seems to result in unix line endings, not windows line endings. I don't know where you're getting the idea that HTML uses windows line endings; I don't see it in the spec. Maybe you mean HTTP headers? There are parts of the HTML spec that talk about normalizing to CRLF, but I can't figure out how to observe that as an end user. 
I tried copy-paste and drag-drop text from a Google search page and from the textarea editor I'm typing this in right now, but I always got unix line endings (tested in Chrome). The so-called "Mac"-style line ending style actually refers to pre-2005 "classic" Mac OS-9 line endings. Modern Mac uses Unix line endings, just like Linux. |
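The Visual Studio behaviour described above (mixed endings saved without a warning) is easy to catch before committing. A minimal Python sketch, assuming you only need to classify a file's bytes as LF, CRLF, classic-Mac CR, or mixed; `line_ending_style` and `check_files` are hypothetical helpers, not an existing tool:

```python
def line_ending_style(data: bytes) -> str:
    """Classify the line endings in `data` as 'LF', 'CRLF', 'CR', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF
    cr = data.count(b"\r") - crlf  # bare CR (classic Mac OS)
    kinds = [name for name, n in (("LF", lf), ("CRLF", crlf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def check_files(paths: list[str]) -> list[str]:
    """Return one message per file that is not purely LF (e.g. for a pre-commit hook)."""
    problems = []
    for path in paths:
        with open(path, "rb") as f:
            style = line_ending_style(f.read())
        if style not in ("LF", "none"):
            problems.append(f"{path}: {style} line endings")
    return problems
```

A file that Visual Studio saved after a CRLF paste into an LF file would be reported as `mixed`.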
This kind of reasoning leads to JavaScript's automatic semicolon insertion. This kind of reasoning has proven to be very successful at getting widespread adoption. This kind of reasoning is also contrary to the Zen of Zig. In Zig, the code author is required to do more work so that code readers are required to do less work.
Some kind of |
@thejoshwolfe, thanks for running the experiments on cut-and-paste from browsers. Interesting that you did not get the CRLF combos. It has been a while since I cared to check, and I tend to set all my editor tools to use LF only on save. As you note, Visual Studio is perhaps not the example of what to do :-) Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian. For Python this almost works because indentation matters, and if someone enters code using both tabs and spaces the meaning is ambiguous. I think your example of JavaScript's semicolon insertion is taking this a bit far. The semicolon insertion (IMO) is an abomination because it can be wrong and change the intended meaning of the code. I do not see the same thing with handling CRLF, LF or CR as white space. If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level. Python code is not formatted all the same, but even without a python-fmt tool, the code from different projects has more formatting similarities than code in most other languages. I think van Rossum made a mistake in allowing tabs. |
Yes. My idea is to have both C-like curly braces and Pythonic indentation, and they must agree. Curly braces are arguably easier for tools to understand, and indentation is absolutely easier for humans to understand, so I want both. Curly braces enable things that you can't do with just indentation. And as for the compiler enforcing indentation rules, come on, you should always get indentation right; there's no excuse for wrong indentation; it's not that hard, and it makes a huge difference for readability. A neat advantage of having strict indentation rules and curly brace block scopes is that you can have better compile errors for unbalanced curly braces, which is something that is especially chaotic in C and Java.

```zig
fn SomeClassThing(comptime T: type) -> type {
    struct {
        const Self = this;
        field: T,
        fn method(self: &const Self) {
            {var i = u32(0); while (i < self.field.len) {
                self.field.something(i);
            }
        } // ERROR: missing '}', or wrong indentation
        // At this point, the compiler can trust the indentation
        // rather than the curly braces for parsing the rest of the file.
    }
}
```

In practice, indentation tends to be more correct than curly brace balance. This is especially relevant for IDEs, where the tooling is trying to follow along with you as you type. Unbalanced parentheses, quotes, curly braces, etc. are very common while you're in the midst of typing code. By contrast, wrong indentation is much less common. Usually the indentation is wrong if you paste/move a bunch of code at once, and in that case, you can have an IDE hotkey to trust the curly braces and fix the indentation; then everything's back in agreement. Generally there are two facets to code formatting: readable for tools and readable for humans. C leans toward readable for tools (curly braces, etc.); Python leans toward readable for humans (indentation, etc.); Zig wants to have it both ways, and so has two sets of formatting rules that must be in agreement for your code to compile. (As a reminder, this is an informal plan for a future version of Zig, not status quo.) Related is #114.
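The rule proposed above (braces and indentation must agree, and the compiler trusts the indentation to recover) can be prototyped outside the compiler. This is a rough Python sketch under several assumptions that are mine, not the proposal's: 4-space indentation; a line starting with `}` must line up with the line that opened its `{`; `//` starts a comment (strings containing `//` are not handled); and several `{` on one line count as a single indentation level, as in the `{var i = u32(0); while (...) {` line of the example. `check_braces_against_indentation` is a hypothetical name.

```python
def check_braces_against_indentation(source: str, width: int = 4) -> list[str]:
    """Report places where curly braces and indentation disagree."""
    errors: list[str] = []
    openers: list[int] = []  # indent level of the line that opened each pending '{'
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # naive: drop line comments
        stripped = code.strip()
        if not stripped:
            continue  # blank or comment-only line
        indent = (len(code) - len(code.lstrip(" "))) // width
        if stripped.startswith("}"):
            # Trust the indentation: any block opened deeper than this line
            # must have lost its closing brace.
            while openers and openers[-1] > indent:
                openers.pop()
                errors.append(f"line {lineno}: missing '}}' (or wrong indentation)")
            if not openers or openers[-1] != indent:
                errors.append(f"line {lineno}: '}}' does not match any open '{{' at this indentation")
        elif openers and indent <= openers[-1]:
            errors.append(f"line {lineno}: expected deeper indentation inside a block")
        for ch in stripped:
            if ch == "{":
                openers.append(indent)
            elif ch == "}" and openers:
                openers.pop()
    if openers:
        errors.append("end of file: missing '}'")
    return errors


SAMPLE = "\n".join([
    "fn SomeClassThing(comptime T: type) -> type {",
    "    struct {",
    "        const Self = this;",
    "        field: T,",
    "        fn method(self: &const Self) {",
    "            {var i = u32(0); while (i < self.field.len) {",
    "                self.field.something(i);",
    "            }",
    "        } // ERROR: missing '}', or wrong indentation",
    "    }",
    "}",
])

print(check_braces_against_indentation(SAMPLE))
```

On the example this reports a single error at line 9, and because the checker resynchronises on indentation, the closing braces after it are not reported as a cascade of follow-on errors.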
Absolutely agree. It's horrifying how ugly you can make "correct" indentation in Python by mixing spaces and tabs, even in the same line. What a mess.
I have high hopes for this strategy. We've already seen some people scared away by Zig's decision to not support hard tabs, which is a shame. But on the plus side, all Zig code will be consistent with this kind of design philosophy. |
@thejoshwolfe, doesn't the use of both curly braces and indentation violate the DRY principle? If one of them is wrong, which one is it? I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself. One of the things I like about Python is that it showed you can have both human-friendly and machine-friendly syntax at the same time. Parsing Python is not markedly harder than parsing a brace-heavy language. Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine. If Zig is to become a useful replacement for C, and I think it has many parts that are very positive, putting too many barriers in the way of adoption could be a problem. The balance that the Go creators struck with go-fmt ended up being a pretty good one. Use of go-fmt is not actually required, but your code is going to be heavily criticized and not reused if you don't use it. I think an enforced indentation scheme plus a tool like zig-fmt would go a very long way toward stopping the bike-shedding and would help a lot in making all code heavily reusable. For instance, you could simply mandate that all indentation is three spaces per indent level. Fine, 99% of all editors can handle that right now. Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with. That said, using indentation as a hint that the programmer missed a curly brace? That would be a good thing. I think some editors may do that now. We catch misaligned indentation by eye more easily than missing curly braces. Obviously this is all IMO! |
Yes, and I think this is a good time to violate that principle. DRY taken to the extreme leads to Haskell's complete type inference, which is very hard to read. Information duplication is only a problem because it's more work to do, which Zig is ok with forcing on authors, and because it can create conflicting information:
When you're trying to compile your code, probably the indentation is right (still a compile error though). When you're trying to autoformat your code, probably the curly braces are right.
It doesn't seem like much to ask of a programmer to get their indentation right before trying to compile their code. I'm always careful to keep my indentation correct, even when it's not a compile error, because it makes the code easier to read. An error for incorrect indentation would add 0 cognitive load for me, but if you're not used to being careful to keep your indentation correct, perhaps have your
Maybe I'm just bad at it, but I find writing indentation-scoped parsers to be much harder than start/end token-scoped parsers.
I still want to consider people creating new tools. There are lots of cases where you'll want to make a machine that reads Zig code, e.g. custom linters, syntax highlighters, even a one-off
Vim can already do this. The I don't think curly-braces-to-indentation is an outrageous feature to expect editors to have. And again, I don't think indentation is very difficult to get right manually in the first place. |
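For the "trust the curly braces and fix the indentation" direction, here is a Python sketch under the same simplifying assumptions as before (4-space indentation, `//` comments stripped naively, several `{` on one line count as one level). `reindent` is a hypothetical helper, not Vim's or zig fmt's actual algorithm.

```python
def reindent(source: str, width: int = 4) -> str:
    """Recompute every line's indentation from the curly braces alone."""
    openers: list[int] = []  # output indent level of the line that opened each pending '{'
    out: list[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")
            continue
        code = stripped.split("//", 1)[0]  # naive: ignore braces inside line comments
        if code.startswith("}") and openers:
            level = openers[-1]  # a closing line lines up with its opener
        else:
            level = openers[-1] + 1 if openers else 0
        out.append(" " * (width * level) + stripped)
        for ch in code:
            if ch == "{":
                openers.append(level)
            elif ch == "}" and openers:
                openers.pop()
    return "\n".join(out) + "\n"


messy = "fn f() {\nif (x) {\n      y();\n}\n  }\n"
print(reindent(messy), end="")
```

Running it on already well-indented code leaves the code unchanged, which is what you would want from an editor hotkey.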
As an example of how easy Zig is to comprehend with tools, here's a perl one-liner that deletes the content of all the free-form text you can find in Zig code (
Even if you don't understand that mess, do notice how short it is. You can't get anything near that simple for C/C++/Java/JavaScript/C# (due to multiline comments), Python (due to multiline string literals), JavaScript/Ruby (due to template strings), PHP/Perl (I don't even want to know), etc. This tokenization simplicity is one reason why Zig does not support And by newline, I mean EDIT: Just for fun, here's some code in Chrome's debugger console that tries to understand JavaScript source code using simple regex. JavaScript is way too complex for that to work, and you can observe lots of misbehavior in that area if you poke at it long enough. This serves as just one counterexample to the "Tooling has become intelligent enough" idea, fwiw. |
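In the same spirit as that one-liner, here is a Python sketch that blanks out every piece of free-form text in Zig source. It assumes Zig's lexical rules as discussed in this thread: `//` comments run to the end of the line, there are no block comments, string and character literals cannot span lines, and each line of a multiline string literal starts with `\\`. Because nothing free-form can cross a newline, one regular expression scanned left to right is enough. `strip_free_text` is a hypothetical name.

```python
import re

# Each alternative matches one piece of free-form text; none can span a newline.
FREE_TEXT = re.compile(
    r'"(?:[^"\\\n]|\\.)*"'    # string literal
    r"|'(?:[^'\\\n]|\\.)*'"   # character literal
    r"|//[^\n]*"              # line comment
    r"|\\\\[^\n]*"            # one line of a multiline string literal
)


def strip_free_text(source: str) -> str:
    """Blank out comments and literal contents, keeping only their delimiters."""
    def keep_delimiters(m: re.Match) -> str:
        token = m.group(0)
        if token[0] in "\"'":
            return token[0] * 2   # a string becomes "" and a char literal becomes ''
        return token[:2]          # a comment keeps "//", a multiline string line keeps its backslashes
    return FREE_TEXT.sub(keep_delimiters, source)


print(strip_free_text('const s = "a // b"; // note\n'), end="")
```

Note that the `//` inside the string literal is not mistaken for a comment: the regex finds the string first because it starts earlier on the line.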
What is the use case for tools massaging source code? Qt does it because C++ lacks usable metaprogramming, but it is hated and very clumsy to use within an IDE. If one-true-newline-rules-them-all is really that important a feature, then I suggest switching to CR/LF everywhere. The number of Windows programmers dwarfs the others, and they are not used to accommodating other platforms. |
That's not a feature by Zig's standards though :) |
The biggest reason to enforce an indentation style and line endings is that it eliminates the energy spent on debating what the standard should be, since the standard is enforced by the compiler. It's unfortunate that a set of people will have to configure their editors beyond the defaults, but that is necessary for one standard or the other to be selected. It's not my intention to shut down any discussion, but I would pose this question to everybody involved in this thread: is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle? |
Pros for CRLF: Notepad support. Visual Studio users can be sloppy. You usually don't need to change your native-Windows editor's configuration from the default. Pros for LF: It's easier to write tools that scan for LF than tools that scan for CRLF. Easier to write tools that produce This is just a start, but the pattern is that LF is more friendly to programmers, and CRLF is more friendly to Windows users who don't know any better. In other words, LF is better for advanced users, and CRLF is better for adoption. As an advanced user, I vote for favoring advanced users.
The number of bad programmers dwarfs the others too, and I'm not sure I want to cater to bad programmers. Sure it's better for adoption, but compromising to increase adoption is not in line with Zig's goals. |
The issue with errors like this is that when a new user like me downloads Zig, starts coding in Visual Studio Code (Windows), and gets this error, the result is confusing. I spent the first 10 minutes trying out other examples, only to run into the same issue, and still did not figure out that it was a file issue. My first idea? Must be a bug in Zig... In simple terms, the error message is inadequate and needs to be much clearer. |
Made some improvements (in the above pull request) to these error messages that should handle the most obvious cases and hopefully help a user diagnose exactly where the problem is a little better. Open to any wording changes or extra special cases if they are considered noteworthy. Regardless of the stance on line endings, hopefully this helps. |
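As an illustration of the kind of message that helps a new user, here is a Python sketch that points at the exact line and column of the first CR or hard tab. The wording, the byte-based column counting, and the function name are all hypothetical; this is not Zig's actual diagnostic.

```python
from typing import Optional


def first_whitespace_error(source: bytes) -> Optional[str]:
    """Return a 'line:col: error: ...' message for the first CR or tab byte, or None."""
    line, col = 1, 1
    for byte in source:
        if byte == 0x0A:  # '\n'
            line, col = line + 1, 1
            continue
        if byte == 0x0D:  # '\r'
            return (f"{line}:{col}: error: carriage return (CR) found; "
                    "save the file with LF (Unix) line endings")
        if byte == 0x09:  # '\t'
            return f"{line}:{col}: error: hard tab found; indent with spaces instead"
        col += 1
    return None


print(first_whitespace_error(b"const x = 1;\r\n"))
```

Naming the offending character and the fix in the message itself would likely have saved the ten minutes described in the previous comment.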
That's a reasonable stance. |
@andrewrk Will tabs be allowed at some point? |
Any updates on this? It's extremely annoying to have the compiler enforce a coding style on you, even more so than dealing with Rust's borrow checker. |
I'd have my two cents to add regarding this topic but if zig fmt can now handle it correctly, it should be far less invasive. Thank you for the quick response. |
Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best. |
I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :) |
They do have a decision to not use the language tho x) |
Spaces-only is a real usability issue. Spaces are highly programmer-unfriendly and only work in some way with fancy editor configurations. Let's try to compare in an objective way:
Tabs
advantages
disadvantages
Spaces
advantages
disadvantages
To be honest, I never saw a convincing argument for spaces. It just makes no sense not to use a key that was designed exactly for that purpose and instead mimic tabs with soft tabs and alternatives. "Looks everywhere the same" is exactly what you don't want, and it's what brought us the indentation mess. I'd highly appreciate it if that decision were reconsidered. |
To play devil's advocate (I think tabs should be supported), there are two IMO somewhat convincing arguments against them, and so for spaces:
```c
void foo() {            // this function is silly
    if (1)              // as is this condition
        printf("hi");   // but at least it's friendly
}
```

A middle ground would be to add a warning flag like |
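Judging from the reply that follows, one of these arguments concerns trailing comments aligned across lines at different indentation levels. A small Python illustration, assuming the indentation uses tabs while the gap before each `//` uses spaces: the comments line up at a tab width of 4 and drift apart at a tab width of 8. `comment_columns` is just a helper for this demo.

```python
def comment_columns(lines: list[str], tab_width: int) -> list[int]:
    """Column of each line's '//' once tabs are expanded to `tab_width`."""
    return [line.expandtabs(tab_width).index("//") for line in lines]


snippet = [
    "void foo() {" + " " * 12 + "// this function is silly",
    "\tif (1)" + " " * 14 + "// as is this condition",
    '\t\tprintf("hi");' + " " * 3 + "// but at least it's friendly",
]

print(comment_columns(snippet, 4))  # [24, 24, 24]: aligned
print(comment_columns(snippet, 8))  # [24, 28, 32]: misaligned
```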
2 of course (as other compilers do and IDEs expect as error location). Even for fancy arrows in the error message it's simple: copy the affected line, cut at error and replace all non-white-space code points with space.
This only happens when block comments span different indentation levels. This is a code smell and breaks with every refactoring. (You don't want to check whether comments are aligned in every location when you rename a variable, right?) If you're creating ASCII art or quines - fine, but don't use it in real-world code. And again, this happens with spaces too: try to integrate such sections into documents with different indentation requirements... |
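The recipe in the reply above (copy the line up to the error, replace every non-whitespace character with a space) can be sketched in Python as follows. It assumes columns are 1-based and that a tab counts as one column. Because the tabs are copied verbatim, the caret lines up under the error whatever tab width the terminal uses. `caret_line` is a hypothetical helper.

```python
def caret_line(source_line: str, column: int) -> str:
    """Build the '^' marker line for an error at 1-based `column` (a tab is one column)."""
    prefix = source_line[: column - 1]
    padding = "".join(ch if ch.isspace() else " " for ch in prefix)
    return padding + "^"


line = "\tx = foo(;"
marker = caret_line(line, 10)  # the ';' is the 10th character
for width in (2, 4, 8):
    # Rendered with any tab width, the caret sits right under the ';'.
    assert line.expandtabs(width).index(";") == marker.expandtabs(width).index("^")
print(line)
print(marker)
```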
nope.
nope. it's tabs that are rendered strangely when you prefix every line with a
nope. the tab key and character were originally designed to align the cells of a table, not indent structured code. The original purpose of the tab character was to appear in the middle of a line, which is today considered bad practice (at least before the rise of elastic tabstops). |
Because it's a browser and not an editor. A proper in-browser code editor supports tabs (and monospace) - see GitHub's online editor. There are many plugins, key combinations etc. to solve this if one really wants to write code in a browser?! I'd consider an editor that is not able to even work with the ASCII character set (minus weird control characters) as broken.
I see what you mean, although it's not rendered "strangely" - it stops exactly at the same level. I thought more about small indentations (1-2 spaces) where you can't make out the indentation level. With tabs you can just pipe it through
That's exactly aligning at fixed indentation levels... (tabs on typewriters were also used to indent paragraphs or lists, not just tables). |
I think people are not aware of zig's stance on hard tabs. I updated the wiki page to make it more clear: |
Neither of those is strictly true. First off, grammatically speaking the word less does not fit there, and should be replaced with fewer; secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used. Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.
Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.
3 bytes per indentation level is not nearly large enough to be a serious concern. You might as well complain that comments require two characters, or that Zig has no multiline comments and therefore a ten-line comment requires twenty characters instead of e.g. four. It's just not a real issue regardless.
That's not a disadvantage of spaces; again, that argument could easily be made in either direction. Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.
Firstly, this is only an issue if you care about supporting both spaces and tabs. Secondly, it's not even true. I've literally done it dozens of times in the past.
Which exists; that's literally the point of having it be compiler-enforced. There are also many more advantages and disadvantages missing for both sides of the argument in that post. |
No grammar policing please. Plenty of people around here have English as a second language and teaching English is off topic. Just try to understand intent. |
Thanks, fixed.
This misuse led to mixes of tabs/spaces in many code bases.
That's wrong. You can't go from spaces to tabs without a syntax-aware formatter. Find/replace is simply not capable of distinguishing between indentation and alignment.
I wouldn't consider it "working" from a usability perspective when I have to press space 24 times to reach the 3rd indentation level (or try to hit it with auto-repeat).
I prefer a tab to be 8 spaces wide on most screens...
No. Press a space/tab - get a space/tab. Everything else is just a misconfiguration to cope with usability issues of spaces and leads to mixed up indentations (see above).
It is a usability and more importantly an accessibility issue (think of the limited space on a braille display or the inflexibility to change indentation width).
The compiler doesn't enforce anything. You can indent your code in any way as long as you use spaces. You have to agree on 1/2/3/4/8 spaces for indentation per project. With tabs that's not a problem at all.
Please list the most important ones. |
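The point that find/replace cannot tell indentation from alignment is easy to demonstrate. A Python sketch with a deliberately naive, hypothetical converter (not any particular editor's feature): it turns every 4 leading spaces into a tab, which also converts the alignment of a continuation line, so the result only looks right at a tab width of 4.

```python
import re


def naive_spaces_to_tabs(source: str, width: int = 4) -> str:
    """Replace each group of `width` leading spaces with a tab, without understanding the code."""
    def convert(m: re.Match) -> str:
        n = len(m.group(0))
        return "\t" * (n // width) + " " * (n % width)
    return re.sub(r"(?m)^ +", convert, source)


original = "    foo(a,\n        b);\n"   # 'b' is aligned under 'a' (column 8)
converted = naive_spaces_to_tabs(original)
print(repr(converted))                   # '\tfoo(a,\n\t\tb);\n'

first, second = converted.splitlines()
for tab_width in (4, 8):
    print(tab_width, first.expandtabs(tab_width).index("a"), second.expandtabs(tab_width).index("b"))
# 4 8 8   -> still aligned
# 8 12 16 -> 'b' has drifted
```

A syntax-aware formatter would instead emit one tab of indentation plus four spaces of alignment (`"\t    b);"`), which stays aligned at any tab width.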
@pixelherodev the tabs vs spaces thing isn't about aesthetics; it's very important for accessibility. Tabs are better for accessibility because some users need huge font sizes to be able to see, and in that case they need to adjust their tab width, because at larger font sizes it becomes harder to even see the spaces. Check out this post, which goes into the issue in a bit more detail. |
I wanted to try experimenting with this language, but it imposes no tabs, no Windows newlines, and other coding style rules in general. Fixing these would add 2 more days to a medium-size project, for things that shouldn't need fixing since they don't affect the reliability and correctness of the compiled code at all (I am fine with Rust's borrow checker), so I'm not going for this anymore. (I would make a "fork" of this language without these issues, which I shall call "political" even if they're not related to actual politics.) I will probably work with 4 spaces per soft tab/indent level. That's fine. But I want my editor (usually an IDE) to give me the proper spaces, to automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to delete an indent level with a single backspace instead of 4 (assuming my style). The only context where I use a different style is the Linux kernel, which has its own coding style, imposed rules AND the statement that you may break the rules where it makes sense. And this is the most important factor against imposing the rules at compile time (with an error; warnings may be fine as long as you can locally override them): you will be unable to break the rules where it makes sense to do so. |
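The "delete an indent level with a single backspace" behaviour is what editors usually call soft tab stops (Vim's `softtabstop` option, for example). A Python sketch of the idea, assuming the line contains only spaces (no tabs) and a 4-column indent; `soft_backspace` is a hypothetical function, not any editor's actual code.

```python
def soft_backspace(line: str, cursor: int, width: int = 4) -> tuple[str, int]:
    """Backspace at `cursor`; deletes a run of spaces back to the previous tab stop."""
    if cursor == 0:
        return line, 0
    stop = (cursor - 1) // width * width  # previous multiple of `width`
    start = cursor
    while start > stop and line[start - 1] == " ":
        start -= 1
    if start == cursor:  # no space before the cursor: ordinary one-character backspace
        start = cursor - 1
    return line[:start] + line[cursor:], start


line = " " * 8 + "return;"
print(soft_backspace(line, 8))  # ('    return;', 4)
```

The file itself still contains only spaces, so this stays compatible with a spaces-only rule.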
I find this hard to believe.
I use vscode, and it does this. (bottom right - Spaces: 4, UTF-8, LF)
Zig fmt is optional, and can be turned off for top-level declarations with
and I'm not sure where you'd want to break these rules. Zig fmt is also fairly lenient. |
On the UTF-8 one I fully agree, it's non-controversial. I fully agree with the premise. Skipping the BOM might be a good feature though, which can be done before giving the characters to the tokenizer (and also, the BOM is a valid UTF-8 character with the value 0xFEFF, which can be conditionally skipped if it's the first one). You can even deny overlong forms of characters (ASCII characters should always be 1 byte), that too makes sense. I won't insist on this. On Windows newlines, I mostly agree, though again simply skipping the character before the tokenizer (and a stray \r that isn't followed by \n would therefore not be considered a newline) -- so it isn't even part of strings unless escaped in the \r form -- might be an easy solution. Most tools can skip \r on their own as well and, if not, you could run dos2unix on said file anyway. Again, you could run dos2unix on .zig files before compiling or as an added build step so I won't insist either. On the no tabs one it's a bit more complicated. I'd argue that you should default to no tabs BUT allow support for them in larger projects (not single-file projects) by having some sort of configuration parameter or command line switch to allow tabs (and their width), though only at the beginning of the line (tabs following non-tab characters on the same line can be forbidden just fine). For example build.zig could get by with no tabs at all, and it could have one configuration option that tabs are x spaces wide (which "zig fmt" would also obey). Also preferring 3 spaces per indent level, that's a bit odd (you're the first that I've seen with such a preference, being used to 4 spaces in most projects and 8-wide tabs on the Linux kernel specifically). I'm not sure there are tools that could do this preprocessing either so that we can still fit within our own coding style specifications. |
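The preprocessing described above (reject invalid or overlong UTF-8, drop a leading BOM, treat CRLF as LF but keep a lone CR, and optionally expand tabs only at the start of a line) is small enough to sketch. This is Python and purely illustrative: the `tab_width` option is hypothetical, Zig has no such setting, and `preprocess` is not part of any real tool.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def preprocess(source: bytes, tab_width: Optional[int] = None) -> bytes:
    """Normalize source bytes before they reach a (hypothetical) tokenizer."""
    source.decode("utf-8")  # strict decoding rejects invalid and overlong sequences
    if source.startswith(UTF8_BOM):
        source = source[len(UTF8_BOM):]  # skip the BOM only when it is the first character
    source = source.replace(b"\r\n", b"\n")  # a lone b"\r" stays and is not a newline
    if tab_width is None:
        return source
    lines = []
    for number, line in enumerate(source.split(b"\n"), start=1):
        body = line.lstrip(b"\t")
        if b"\t" in body:
            raise ValueError(f"line {number}: tab after a non-tab character")
        leading_tabs = len(line) - len(body)
        lines.append(b" " * (leading_tabs * tab_width) + body)
    return b"\n".join(lines)


print(preprocess(b"fn f() {\r\n\treturn;\r\n}\r\n", tab_width=4))
```

Doing this before tokenization keeps the tokenizer itself strict: it still only ever sees LF newlines and space indentation.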
@exoticus Thanks for the link. Accessibility sways me instantly. Tabs win IMO. |
The difference is that developers just won't bother to use Zig))) |
This thread has nothing useful left to offer. Here's the FAQ entry pasted: Why does
|
Hi,
(This report is based on the v0.1.1 Win64 binary artifact from the Zig website.)
I noticed that if I create a Zig source file in Windows with a native editor (e.g. Notepad), the compiler complains about line endings:
If I manually kill the newlines (resulting in the code being all on one line) it compiles.
I tried using Vim in a Cygwin shell and the file it wrote also compiled without complaint (presumably Unix-style newlines, as Notepad renders that file on one line while Vim looks correct).