To install stata_linter
, type ssc install stata_linter
and restart Stata.
This will install the most recent published version of stata_linter
.
The main version of the code in this repository (the master
branch) is what
is published on SSC as well.
To install the linter to run directly with Python and not via Stata, clone this repository and then run the following command on your terminal:
pip install -e src-py/
This will also install pandas
and openpyxl
if they are not currently installed.
- Stata version 16 or higher.
- Python 3 or higher
For setting up Stata to use Python, refer to this web page.
stata_linter
also requires the Python package pandas
and openpyxl
.
Refer to this web page to know more about installing Python packages.
The stata_linter
package works through the lint
command.
lint
is an opinionated detector that attempts to improve the readability and organization of Stata do files.
The command is written based on the good coding practices of the Development Impact Evaluation Unit at The World Bank.
For these standards, refer to DIME's Stata Coding practices and Appendix: The DIME Analytics Coding Guide of Development Research in Practice.
The lint
command can be broken into two functionalities:
- detection identifies bad coding practices in one or multiple Stata do-files
- correction corrects a few of the bad coding practices detected in a Stata do-file
Disclaimer: Please note that this command is not guaranteed to correct codes without changing results. It is strongly recommended that after using this command you check if results of the do file do not change.
lint "input_file" using "output_file", options
To detect bad practices in a do-file you can run the following:
lint "test/bad.do"
and on your Stata console you will get a summary of bad coding practices that were found in your code:
-------------------------------------------------------------------------------------
Bad practice Occurrences
-------------------------------------------------------------------------------------
Hard tabs used instead of soft tabs: Yes
One-letter local name in for-loop: 3
Non-standard indentation in { } code block: 7
No indentation on line following ///: 1
Missing whitespaces around operators: 0
Implicit logic in if-condition: 1
Delimiter changed: 1
Working directory changed: 0
Lines too long: 5
Global macro reference without { }: 0
Use of . where missing() is appropriate: 6
Backslash detected in potential file path: 0
Tilde (~) used instead of bang (!) in expression: 5
-------------------------------------------------------------------------------------
If you want to get the lines where those bad coding practices appear you can use the option verbose
. For example:
lint "test/bad.do", verbose
Gives the following information before the regular output of the command.
(line 14): Use 4 white spaces instead of tabs. (This may apply to other lines as well.)
(line 15): Avoid to use "delimit". For line breaks, use "///" instead.
(line 17): This line is too long (82 characters). Use "///" for line breaks so that one line has at m
> ost 80 characters.
(line 25): After declaring for loop statement or if-else statement, add indentation (4 whitespaces).
(line 25): Always explicitly specify the condition in the if statement. (For example, declare "if var
> == 1" instead of "if var".)
...
You can also pass a folder path to detect all the bad practices in all the do-files that are in the same folder.
If you would like to correct bad practices in a do-file you can run the following:
lint "test/bad.do" using "test/bad_corrected.do"
In this case, the lint command will create a do-file called bad_corrected.do
.
Stata will ask you if you would like to perform a set of corrections for each bad practice detected, one by one.
You can add the option automatic
to perform the corrections automatically and skip the manual confirmations.
It is strongly recommended that the output file has a different name from the input file, as the original do-file should be kept as a backup.
As a result of this command, a piece of Stata code as the following:
#delimit ;
foreach something in something something something something something something
something something{ ; // some comment
do something ;
} ;
#delimit cr
becomes:
foreach something in something something something something something something ///
something something { // some comment
do something
}
and
if something ~= 1 & something != . {
do something
if another == 1 {
do that
}
}
becomes
if something ~= 1 & something != . {
do something
if another == 1 {
do that
}
}
You can use the following options with the lint
command:
-
Options related to the detection feature:
verbose
: show all the lines where bad practices appear.nosummary
: suppress the summary of bad practices.excel()
: export detection results to Excel.
-
Options exclusive to the correction feature:
automatic
: correct all bad coding practices without asking if you want each bad coding practice detected to be corrected or not.replace
: replace the existing output file.force
: allow the output file name to be the same as the name of the input file (not recommended).
-
Options for both features:
indent()
: specify the number of whitespaces used for indentation (default is 4).linemax()
: maximum number of characters in a line (default: 80)tab_space()
: number of whitespaces used instead of hard tabs (default is 4).
-
Use soft tabs (i.e., whitespaces), not hard tabs: Use white spaces (usually 2 or 4 whitespaces are used) instead of hard tabs. You can change this option in the do-file editor preferences.
-
Avoid using abstract index names: In for loops, index names should describe what the code is looping over. Hence, for example, avoid coding like this:
foreach i of var cassava maize wheat { }
Instead, looping commands should name the index local descriptively:
foreach crop of var cassava maize wheat { }
-
Use proper indentations: After declaring a for loop statement or if-else statement, add indentation with whitespaces (usually 2 or 4 whitespaces).
-
Use indentations after declaring newline symbols
///
: After a new line statement(///)
, add indentation (usually 2 or 4 whitespaces). -
Use
!missing()
function for conditions of missing values: For clarity, use!missing(var)
instead ofvar < .
orvar != .
-
Add whitespaces around math symbols (
+
,=
,<
,>
): For better readability, add whitespaces around math symbols. For example, writegen a = b + c if d == e
instead ofgen a=b+c if d==e
. -
Specify the condition in the if statement: Always explicitly specify the condition in the if statement. For example, declare
if var == 1
instead ofif var
. -
Do not use
delimit
, instead use///
for line breaks: More information about the use of line breaks here. -
Do not use the
cd
command to change the current folder: Use absolute and dynamic file paths. More about this here. -
Use line breaks for too long lines: For lines that are too long, use
///
for line breaks and divide them into multiple lines. It is recommended to restrict the number of characters in a line under 80. Though sometimes this is difficult since, for example, Stata does not allow line breaks within double quotes, try to follow this rule when possible. -
Use curly brackets for global macros: Always use
${ }
for global macros. For instance, use${global}
instead of$global
. -
Include missing values in condition expressions: Condition expressions like
var != 0
orvar > 0
are evaluated to true for missing values. Make sure to explicitly take missing values into account by usingmissing()
in expressions. -
Check if backslashes are not used in file paths: Check if backslashes
(\)
are not used in file paths. If you are using them, then replace them with forward slashes(/)
. -
Check if tildes
(~)
are not used for negations: If you are using tildes(~)
for negations, replace them with the bang symbol(!)
.
The correction
feature does not correct all the bad practices detected by detect
.
It only corrects the following:
- Replaces the use of
delimit
with three forward slashes (///
) in each line affected bydelimit
- Replaces hard tabs with soft spaces (4 by default). The amount of spaces can be set with the
tab_space()
option - Indents lines inside curly brackets with 4 spaces by default. The amount of spaces can be set with the
indent()
option - Breaks long lines into two lines. Long lines are considered to have more than 80 characters by default, but this setting can be changed with the option
linemax()
- Adds a whitespace before opening curly brackets, except for globals
- Removes redundant blank lines after closing curly brackets
- Removes duplicated blank lines
If the option automatic
is omitted, lint
will prompt the user to confirm that they want to correct each of these bad practices only in case they are detected. If none of these are detected, it will show the message:
Nothing to correct.
The issues lint is able to correct are not present in your dofile.
No output files were generated.
To minimize the risk of crashing a do-file, the correction
feature works based on fewer rules than the detection
feature.
That is, we can can detect more bad coding practices with lint "input_file"
in comparison to lint "input_file" using "output_file"
.
Therefore, after writing a do-file, you can first detect
bad practices to check how many bad coding practices are contained in the do-file and later decide whether you would like to use the correction feature.
If there are not too many bad practices, you can go through the lines flagged by the detection
feature and manually correct them.
This also avoids potential crashes by the correction
feature.
If there are many bad practices detected, you can use the correction
feature first to correct some of the flagged lines, and then you can detect
again and correct
the remaining bad practices manually.
We strongly recommend not overwriting the original input do-file so it can remain as a backup in case correct
introduces unintended changes in the code.
Additionally, we recommend checking that the results of the do-file are not changed by the correction feature.
If you are familiar with GitHub go to the Contributions section below for advanced instructions.
An easy but still very efficient way to provide any feedback on these commands is to create an issue in GitHub. You can read issues submitted by other users or create a new issue in the top menu below worldbank/stata-linter. If you have an idea for a new command, or a new feature on an existing command, creating an issue is a great tool for suggesting that. Please read already existing issues to check whether someone else has made the same suggestion or reported the same error before creating a new issue.
While we have a slight preference for receiving feedback here on GitHub, you are still very welcome to send a regular email with your feedback to dimeanalytics@worldbank.org.
If you are not familiar with GitHub see the Bug reports and feature requests section above for a less technical but still very helpful way to contribute to stata-linter.
We appreciate contributions directly to the code and will give credit to anyone providing contributions that we merge to the master branch. If you have any questions on anything in this section, please do not hesitate to email dimeanalytics@worldbank.org.
The files on the master
branch are the files most recently released on the SSC server.
README, LICENSE and similar files are updated directly to master
in between releases.
All the other files are updated in the develop
branch before being merged into master
.
Check out the develop
branch if you want to see what future updates we are currently working on.
Please make pull requests to the master
branch only if you wish to contribute to README, LICENSE or similar meta data files.
If you wish to make a contribution to any other file, then please do not use the master
branch.
Instead, please fork this repository from develop
and make your pull request to that branch.
The develop
branch includes all minor edits we have made to already published commands since the last release that we will include in the next version released on the SSC server.
stata_linter is developed under MIT license. See http://adampritchard.mit-license.org/ or see the LICENSE
file for details.
Luis Eduardo San Martin (dimeanalytics@worldbank.org)
This command is developed by DIME Analytics at DIME, The World Bank's department for Development Impact Evaluations.
DIME is the World Bank's impact evaluation department. Part of DIME’s mission is to intensify the production of and access to public goods that improve the quantity and quality of global development research, while lowering the costs of doing IE for the entire research community. This Library is developed and maintained by DIME Analytics. DIME Analytics supports quality research processes across the DIME portfolio, offers public trainings, and develops tools for the global community of development researchers.
Other DIME Analytics public goods are:
- Development Research in Practice: the DIME Analytics Data Handbook
- DIME Wiki: a one-stop-shop for impact evaluation resources
- ietoolkit: Stata package for impact evaluations
- iefieldkit: Stata package for primary data collection
- Stata Visual Library
- R Econ Visual Library
- DIME Research Standards: DIME's commitments to best practices