Skip to content

Contributor guidelines

Ian Hellen edited this page Oct 15, 2022 · 17 revisions

Contributor Guidance:

Contributions of improvements, fixes and new features are all welcomed. Whether this is your first time contributing to a project or you are a seasoned Open-Source contributor we welcome you contribution. In this guide you can find a few pointers to help you create a great contribution.

What to contribute:

There are many things that can make a good contribution. It might be a fix for a specific issue you have come across, an improvement to an existing feature that you have thought about such as a new data connector or threat intelligence provider, or a completely new feature.

If you don’t have a specific idea in mind take a look at the Issues page on GitHub: https://github.com/microsoft/msticpy/issues

This page tracks a range of issues, enhancements, and features that members of the community have thought of. The MSTICPy team uses these issues as a way to track work and includes many things we have added ourselves.

The issues are tagged with various descriptions that relate to the type of issue. You may see some with the ‘good first issue’ tag. These are issues that the team think would make a good issue for someone contributing to MSTICPy for the first time, however anyone is welcome to work on any Issue. If you decide to start working on an Issue please make a comment on the Issue so that we can assign it to you and other members of the community know that it is being worked on and don’t duplicate work. Also if you are unclear about what the Issue feel free to comment on the Issue to get clarification from others.

What makes a good contribution

Whilst there is no one thing that makes a contribution good here are some guidelines:

Single Issues

Focus your contribution on a single thing per PR raised, whether it be a feature or a fix. If you have multiple things you want to contribute considers splitting them into multiple PRs. Keeping it to a single item makes it easier for others to see what you are contributing and how it fits with the rest of the project.

Documentation

Make it clear what you are contributing, why its important, and how it works. This provides much needed clarity for others when reviewing contributions and helps to highlight the great value in your contribution.

Test Coverage

Ensure your contribution has the highest possible of test coverage. You should aim for a least 80% coverage and ideally reach 100%. If you can’t reach 80% for what ever reason let us know when you raise a PR and we can work with you on this.

Using Git

To contribute you will need to fork the MSTICPy repo, create a branch for your contribution, and then raise a Pull Request (PR) to merge the changes back into MSTICPy’s main branch. Please do not make changes to main of your fork and submit this as a PR. You should also consider granting permission on your fork so that we can push changes back to your forked branch. Sometimes, it's quicker for us to make a quick change to fix something than to ask you to make the change. If we cannot push any changes back this is impossible to do.

If you are unfamiliar with Git and GitHub you can find some guidance here: https://docs.github.com/en/get-started/quickstart/set-up-git

Where to get help

The MSTICPy team is more than happy to help support your contributions, if you need help you can comment on the Issue you are working on, or email msticpy@microsoft.com

Code Guidelines:

Unit Tests

All new code should have unit tests with at least 80% code coverage. There are some exceptions to this: for example, code that accesses online data and requires authentication. We can work with you on getting this to work in our build. We use pytest but most of the existing tests are also Python unittest compatible.

Type hints

Use type annotations for parameters and return values in public methods, properties and functions.

Python Type Hints documentation

Docstrings

Our documentation is automatically built for Readthedocs using Sphinx. All public modules, functions, classes and methods should be documented using the numpy documenation standard.

numpy docstring guide

Code Formatting

We use black everywhere and enforce this in the build.

Black - The Uncompromising Code Formatter

Linters/Code Checkers

We use the following code checkers:

  • pylint (with --disable=bad-continuation)
  • mypy
  • bandit (with -s B303,B404,B603,B607)
  • flake8 (with --ignore=E501,W503)
  • prospector (see prospector.yml in root of repo for config used). Prospector runs:
    • pycodestyle
    • pydocstyle
    • pep8
    • pyroma
    • pep257

Coding Best Practices

  • If you cannot see all the code of a function at one time, it is too big - break it up into smaller functions. Anything longer than 30 lines is suspect.
  • Make each function carry out one specific job.
    • You can often tell if it is doing this when you try to name it. Things like "get_matched_regex" or "read_settings". If you are having trouble naming it other than using vague terms like "process_data" or having to use multiple verbs ("match_regex_and_read_and_process_file") then you should refactor.
  • Function, class, variable, parameter naming.
    • Naming clarity
      • Don't use generic terms like "process_data", "file", "var", "x"
      • Give descriptive names like "match_result", "col_name", etc.
      • Don't be afraid of long names - remember code is written once but read many times, your extra keystrokes when creating something will be very helpful in a few weeks, months, years when you or someone else is trying to understand what's going on.
      • Using descriptive names helps in debugging/testing and general understanding of the code
    • Use UPPER_CASE names for constants defined at the module level
    • Use _func_name for constants, module functions, class methods and member variables if the item should not be used outside of the module/class.
  • Commenting
    • Code structure and naming should be clear enough that you shouldn't need comments
    • Use comments to explain something non-obvious (like potential return val from a function), not to explain obscure and lengthy code constructs
  • Beware of literal values:
    • Should they really be parameters?
    • If you're using a literal 3 or more times, you should prob define it as a constant or local variable.
  • Avoid using classes unless you need to keep state related to the functionality - use functions and pass data as parameters.
  • Do use classes if you have a lot of complex state.
    • If you find you are passing lots of data parameters between functions, it might be better implemented as a class with the data as private class variables.
    • Use classes if you want to expose operations on data but not the data structure itself.
  • Beware of copy/paste
    • If you find yourself copying/pasting code blocks, it probably means that you haven’t structured the code properly. Think about refactoring
  • Beware of deep nestings of branches/loops
    • Use guard clauses at the start of functions and loops
    for foo in bar:
        if foo == invalid:
            continue
        do_other_loop_stuff()
    • Rather than:
        for foo in bar: 
           if foo == valid:
              do_other_loop_stuff() 
  • Use named tuples if you're returning a bunch of values from a function:
    from collections import namedtuple 
    MyResult = namedtuple("MyResult", "status, value, message, query") 
    def my_func(): 
        return MyResult(200, 1.589895, "Nice result", "select x from foo")
    • You can treat the named tuple like a tuple but also address the members by names print(result.value)

A musical guide

The PEP8 Song

Brilliantly written and performed by @lemonsaurus_rex

Clone this wiki locally