Building a custom Flake8 plugin

Linters are everywhere. Be it in a fancy IDE, a CI pipeline or on the command line, linters help us spot potential issues in our codebases. My favorite linter is flake8 and I use it in my VSCode setup, in my git pre-commit hooks and in my CI pipelines.

But flake8 doesn’t catch everything I want it to catch. For example, I’d like my linter to flag any use of the map and filter functions.

So I was wondering: how can I make my linter complain every time I, or someone else, write such a thing? The answer is: let’s write a flake8 plugin!

To do so, we’re gonna use two tools:

  • ast: the stdlib module to manipulate Python Abstract Syntax Trees;
  • flake8: one of the (many) Python linters;

The anatomy of a flake8 plugin

For a simple plugin, all we need is to create two files in a new folder (which I called flake8-picky/):

  • setup.py: to make it installable and distributable;
  • picky_checker.py: the module for the code checker itself.

Let’s start with the boring stuff (setup.py) so that we are free to have fun hacking our plugin later. Here it is:

import setuptools

setuptools.setup(
   name='flake8-picky',
   license='MIT',
   version='0.0.1',
   description='A plugin to pick on map and filter usage :)',
   author='Your name here',
   author_email='you@yourdomain.com',
   url='http://github.com/yourname/your-repo',
   py_modules=['picky_checker'],
   entry_points={
       'flake8.extension': [
           'PCK0 = picky_checker:PickyChecker',
       ],
   },
   install_requires=['flake8'],
   classifiers=[
       'Topic :: Software Development :: Quality Assurance',
   ],
)

Apart from the usual setup.py stuff, there’s one section we need to pay attention to:

   entry_points={
       'flake8.extension': [
           'PCK0 = picky_checker:PickyChecker',
       ],
   }

We’ve listed a single entry point for our flake8 plugin, which is the picky_checker.PickyChecker class (we’ll get there soon). As you can see, we’ve listed it under the 'flake8.extension' entry point type, because this is what we need for a plugin that will add code verifications to Flake8. You can check for more options in the official docs.

Another thing to notice here is the string we added to the list of entry points: 'PCK0 = picky_checker:PickyChecker'. PCK0 is a code prefix for the kind of issues we are going to report (they must all start with such a substring).

Now let’s focus on the picky_checker.py file which will contain:

  • a class to parse and check the code to be linted;
  • the entrypoint class for our plugin.

We’ll start with the former.

Building our checker with ast

It doesn’t surprise me that an awesome language like Python has a module in the stdlib that allows us to easily parse Python code: the ast module.

The ast module provides the ast.NodeVisitor base class, which basically walks through the Abstract Syntax Tree calling visitor functions for every node it finds.

For example, let’s say we want to find all the function definitions in a Python snippet and print their names. Here’s how we’d do it using ast:

>>> import ast
>>> class FunctionFinder(ast.NodeVisitor):
...     def visit_FunctionDef(self, node):
...         print('Found: {}'.format(node.name))
...
>>> sample = '''
... def myfunc():
...     pass
...
... def anotherfunc(x, y):
...     return x * y
...
... x = myfunc() + 1
... '''
>>> parsed = ast.parse(sample)
>>> finder = FunctionFinder()
>>> finder.visit(parsed)
Found: myfunc
Found: anotherfunc

Easy like that. So, if we want our plugin to focus on a specific kind of AST node, all we have to do is to implement the visit_*() method and add the checks inside. Check out the full list of node types here: https://greentreesnakes.readthedocs.io/en/latest/nodes.html

The parser

Getting back to our flake8 plugin, the issue we want to catch is the map and filter usage. To check for that, all we have to write is a parser like this:

import ast


class ForbiddenFunctionsFinder(ast.NodeVisitor):
    forbidden = ['map', 'filter']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.issues = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id in self.forbidden:
            msg = "PCK01 Please don't use {}()".format(node.func.id)
            self.issues.append((node.lineno, node.col_offset, msg))

        # Keep walking the children so nested calls, such as
        # list(map(...)), get checked as well.
        self.generic_visit(node)

In this case, our checker will visit each and every ast.Call node in the AST and check whether the name being called is one of the forbidden functions.

As you can see, we add some information into the issues list whenever we find a call. One important thing to remember here is that the linter error message should start with an error code that matches the prefix defined in setup.py for our linter (PCK0 in our case).
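To see why the checker looks at node.func before reading its id, here is a small sketch of what node.func looks like for different kinds of calls. The describe_calls() helper and the sample source are made up for this illustration; only ast.parse, ast.walk and the node classes come from the stdlib.

```python
import ast


def describe_calls(source):
    """Return a (kind, name) pair for every call found in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            # Plain call such as map(...): the name lives in func.id.
            found.append(('Name', func.id))
        elif isinstance(func, ast.Attribute):
            # Method-style call such as pool.map(...): there is no func.id,
            # the name after the dot lives in func.attr.
            found.append(('Attribute', func.attr))
        else:
            found.append((type(func).__name__, None))
    return found


# Hypothetical sample source, used only for this illustration.
SAMPLE = "map(str, xs)\npool.map(str, xs)\n"

calls = describe_calls(SAMPLE)
print(calls)  # [('Name', 'map'), ('Attribute', 'map')]
```

Only the first call is a bare map(); the second one is a method on some other object, which is why the checker skips anything whose func is not an ast.Name.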

I think we’ve got enough information about how ast.NodeVisitor works in order to build our plugin, so let’s move to the entry point class.

The entry point

Let’s create the picky_checker.py file and add the entry point code to it:

class PickyChecker(object):
    options = None
    name = 'picky-checker'
    version = '0.1'

    def __init__(self, tree, filename):
        self.tree = tree
        self.filename = filename

    def run(self):
        parser = ForbiddenFunctionsFinder()
        parser.visit(self.tree)

        for lineno, column, msg in parser.issues:
            yield (lineno, column, msg, PickyChecker)

Most of this is boilerplate, but let’s focus on the run() method. This method is the one called when Flake8 runs the verifications. There, we first instantiate our ForbiddenFunctionsFinder class, which will be basically an ast.NodeVisitor doing the verifications. Once we have the object, we call the .visit() method so that our node visitor traverses the AST.

After that’s done, we iterate over the issues found by ForbiddenFunctionsFinder, generating tuples with the issues in the order expected by Flake8: line number, column number, the linter message and the class that found the issues.
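If you want to poke at the checker before installing anything, you can drive it by hand. The sketch below is not how Flake8 itself calls the plugin; it just feeds a parsed tree to the class and collects what run() yields. The sample source and the sample.py filename are made up, and this copy of the visitor calls generic_visit() at the end of visit_Call so that nested calls are reached.

```python
import ast


class ForbiddenFunctionsFinder(ast.NodeVisitor):
    forbidden = ['map', 'filter']

    def __init__(self):
        super().__init__()
        self.issues = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id in self.forbidden:
            msg = "PCK01 Please don't use {}()".format(node.func.id)
            self.issues.append((node.lineno, node.col_offset, msg))
        # Descend into the arguments so nested calls are visited as well.
        self.generic_visit(node)


class PickyChecker(object):
    name = 'picky-checker'
    version = '0.1'

    def __init__(self, tree, filename):
        self.tree = tree
        self.filename = filename

    def run(self):
        parser = ForbiddenFunctionsFinder()
        parser.visit(self.tree)
        for lineno, column, msg in parser.issues:
            yield (lineno, column, msg, PickyChecker)


# Hypothetical sample source, used only for this illustration.
SAMPLE = (
    "numbers = [1, 2, 3]\n"
    "doubled = list(map(str, numbers))\n"
    "evens = filter(None, numbers)\n"
)

checker = PickyChecker(ast.parse(SAMPLE), 'sample.py')
results = [(line, col, msg) for line, col, msg, _ in checker.run()]
for line, col, msg in results:
    print('{}:{}: {}'.format(line, col, msg))
```

The columns here are the raw col_offset values from ast, which count from zero.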

Gluing it all

We’ll end up with the following files in our plugin folder:

├── picky_checker.py
└── setup.py

The picky_checker.py file should contain both the ForbiddenFunctionsFinder and PickyChecker classes.

Installing our linter

Now that we have our linter code in our flake8-picky folder, let’s install our plugin and run flake8 over some sample files. You can install it by running this from inside that folder:

$ pip install .

Then you can check if the plugin got installed by running:

$ flake8 --version
3.5.0 (mccabe: 0.6.1, pycodestyle: 2.3.1, pyflakes: 1.6.0, picky-checker: 0.1)

Finally, create some sample files and run flake8 against them:

$ flake8 samples/01.py
samples/01.py:4:5: PCK01 Please don't use map()
samples/01.py:7:5: PCK01 Please don't use filter()

Wrapping up

That’s it. All we need to build a flake8 plugin is:

  • a setup.py file to make it installable;
  • an entrypoint class that will run your code checker;
  • the code checker itself, which can be a NodeVisitor subclass.

Here you can find a repo with the linter developed here: https://github.com/stummjr/flake8-picky/

If you’re angry at me because you love map and filter, please forgive me as I had to come up with an example. 🙂

The curious case of the else in Python loops

One of the first things to stand out when I was starting with Python was the else clause. I guess everyone knows the normal usage of such clauses in any programming language, which is to define an alternate path for the if condition. Oddly enough, in Python we can add else clauses in loop constructions, such as for and while. For example, this is valid Python:

for number in some_sequence:
    if is_the_magic_number(number):
        print('found the magic number')
        break
else:
    print('magic number not found')

Notice how the else is aligned with the for and not with the if. What this means is that commands inside the else block will be executed if, and only if, the loop was not finished by a break. The same is true for while loops.
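Here is a small sketch that exercises both branches, for a for loop and for a while loop. The two function names are made up for this example.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there isn't one."""
    for number in numbers:
        if number % 2 == 0:
            found = number
            break
    else:
        # Runs only when the loop ended without hitting the break,
        # which includes the case of an empty sequence.
        found = None
    return found


def countdown_interrupted(start, stop_at):
    """Count down from start and tell whether we stopped early at stop_at."""
    current = start
    while current > 0:
        if current == stop_at:
            interrupted = True
            break
        current -= 1
    else:
        # Runs when the while condition became false, i.e. no break happened.
        interrupted = False
    return interrupted


print(find_first_even([1, 3, 4, 5]))   # 4
print(find_first_even([1, 3, 5]))      # None
print(countdown_interrupted(3, 2))     # True
print(countdown_interrupted(3, 5))     # False
```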

I must admit that I’ve always had some trouble remembering the meaning of an else in loops, especially because I don’t see them very often (and I’m grateful for that). But one day I was watching Raymond Hettinger’s Transforming Code into Beautiful, Idiomatic Python talk, where at some point he brilliantly says something like this:

Why don’t you call the else in loops ‘nobreak’?

That’s all I needed to not forget the meaning anymore. 🙂

How to customize your IPython 5+ prompt

IPython is wonderful and I ❤️ it. I can’t see myself using the default Python shell on a daily basis. However, its default prompt kind of annoys me:

Some things I dislike:

  • the banner displayed when we start it;
  • the In[x] and Out[x] displayed for inputs and outputs;
  • the newline in between commands;
  • and last, but far from least, the uber-annoying “do you really want to exit?” message.

As you can see, it doesn’t take much to get on my nerves. 😆

The bright side is that it’s easy to change that and have a more pleasant experience with IPython. This is my ideal shell, more compact and less bureaucratic:

 

If you like it, follow me through the next steps to make your IPython shell look and behave like that.

Customizing the prompt

First you have to create a default profile for your shell with this command:

$ ipython profile create

As a result, a .ipython folder will be created in your home folder, with the following contents:

.ipython
├── extensions
├── nbextensions
└── profile_default
    ├── ipython_config.py
    ├── log
    ├── pid
    ├── security
    └── startup
        └── README

Next, create a .ipython/custom_prompt.py file with the following content:

from IPython.terminal.prompts import Prompts, Token


class CustomPrompt(Prompts):

    def in_prompt_tokens(self, cli=None):
        return [(Token.Prompt, '>>> '), ]

    def out_prompt_tokens(self, cli=None):
        return [(Token.Prompt, ''), ]

    def continuation_prompt_tokens(self, cli=None, width=None):
        return [(Token.Prompt, ''), ]

And last, you have to tell IPython to use this new class as your prompt, and apply a few other custom settings while you’re at it.

You can do so by adding this code to .ipython/profile_default/ipython_config.py:

from custom_prompt import CustomPrompt


c = get_config()

c.TerminalInteractiveShell.prompts_class = CustomPrompt
c.TerminalInteractiveShell.separate_in = ''
c.TerminalInteractiveShell.confirm_exit = False
c.TerminalIPythonApp.display_banner = False

That’s it, now you have a prompt like the one I’ve shown earlier. I hope it improves your experience with IPython as it did for me.

If you want to learn how to do further customizations, check the official documentation.

Ah, did I mention that I love IPython? Huge kudos and thanks to the team behind it!

Python 3 rounding oddities

Rounding a decimal number with Python 3 is as simple as invoking the round() builtin:

>>> round(1.2)
1
>>> round(1.8)
2

We can also pass an extra parameter called ndigits, which defines the precision we want in the result. It defaults to None, in which case round() returns the nearest integer, but we can pass any integer:

>>> round(1.847, ndigits=2)
1.85
>>> round(1.847, ndigits=1)
1.8

And what happens when we want to round a number like 1.5? Will it round it up or down? Let’s check:

>>> round(1.5)
2

It seems that it rounds up. Let’s check some other numbers to confirm:

>>> round(2.5)
2

Uh, now it went down! Let’s check some more:

>>> round(3.5)
4
>>> round(4.5)
4
>>> round(5.5)
6

wut

Calm down, there’s an explanation for this. In Python 3, round() works like this:

Round to the closest number.
If there’s a tie, round to the closest even number.

Now it makes sense. If we check the examples above, we’ll see that the rounding was always made to the closest even number:

>>> round(3.5)
4
>>> round(4.5)
4
>>> round(5.5)
6
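The same tie rule applies when we pass ndigits, with one catch that is worth a quick sketch: the rule works on the float that is actually stored, and many decimal literals are not stored exactly. The variable names below are just for illustration.

```python
from decimal import Decimal

# Ties that are exactly representable as floats go to the even neighbour.
whole_ties = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]
print(whole_ties)                      # [0, 2, 2, 4]

# 0.125 and 0.375 are exact in binary, so they are real ties too.
print(round(0.125, 2))                 # 0.12
print(round(0.375, 2))                 # 0.38

# 2.675 looks like a tie, but the float that gets stored is slightly
# below 2.675, so there is no tie at all and it simply rounds down.
stored_is_lower = Decimal(2.675) < Decimal('2.675')
print(stored_is_lower)                 # True
print(round(2.675, 2))                 # 2.67
```

So if a result looks like it broke the "closest even number" rule, check whether the value was a true tie in the first place.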

What about Python 2?

Python 2 is quite different. When there’s a tie, the rounding is always made upwards in case the numbers are positive:

>>> round(1.5)
2.0
>>> round(2.5)
3.0

And downwards, when the numbers are negative:

>>> round(-1.5)
-2.0
>>> round(-2.5)
-3.0
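If you ever need this tie behavior on Python 3, one possible way is the decimal module, whose ROUND_HALF_UP mode sends ties away from zero. This is only a sketch: the two helper names are made up, and unlike Python 2’s round() they return ints rather than floats.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_away(value):
    """Round to the nearest integer, sending ties away from zero."""
    # str() keeps the decimal digits we typed, e.g. 2.5 -> '2.5'.
    number = Decimal(str(value))
    return int(number.quantize(Decimal('1'), rounding=ROUND_HALF_UP))


def round_half_even(value):
    """Round to the nearest integer, sending ties to the even neighbour."""
    number = Decimal(str(value))
    return int(number.quantize(Decimal('1'), rounding=ROUND_HALF_EVEN))


values = [1.5, 2.5, -1.5, -2.5]
away = [round_half_away(v) for v in values]
even = [round_half_even(v) for v in values]
print(away)  # [2, 3, -2, -3]
print(even)  # [2, 2, -2, -2]
```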

Why the hell did Python 3 change it?

The goal is to take the bias out of the rounding operations.

Imagine a bank where all the roundings are done upwards. By the end of the day, the bank’s earnings report will show a value that is higher than what the bank actually earned. That’s what happens in Python 2:

>>> # Python 2
>>> values = [1.5, 2.5, 3.5, 4.5]
>>> sum(values)
12.0
>>> sum(round(v) for v in values)
14.0

Using Python 3’s round(), the rounding errors tend to cancel out, because half of the ties round upwards and half of them round downwards, given that half the numbers are even and the other half are odd. Check the same code, but now running on Python 3:

>>> # Python 3
>>> values = [1.5, 2.5, 3.5, 4.5]
>>> sum(values)
12.0
>>> sum(round(v) for v in values)
12

This is not a Python 3 innovation. In fact, this kind of rounding is quite old and even has a proper name: Banker’s Rounding.
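The bias argument is easier to see on a bigger batch of ties. The sketch below compares an always-upwards rule with Python 3’s round() on one hundred values ending in .5. The round_half_up() helper is made up for this comparison and, as written, is only meant for non-negative ties.

```python
import math


def round_half_up(value):
    """Round ties upwards (only meant for non-negative ties here)."""
    return math.floor(value + 0.5)


# 0.5, 1.5, ..., 99.5: every one of these is stored exactly as a float.
ties = [n + 0.5 for n in range(100)]

exact_total = sum(ties)
half_up_total = sum(round_half_up(v) for v in ties)
half_even_total = sum(round(v) for v in ties)

print(exact_total)      # 5000.0
print(half_up_total)    # 5050
print(half_even_total)  # 5000
```

Always rounding up adds half a unit per value, while rounding ties to even lands right on the exact total for this particular set.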

Drop duplicates in order

Let’s say you have a list containing all the URLs extracted from a web page and you want to get rid of duplicate URLs.

The most common way of achieving that might be building a set from that list, given that such operation automatically drops the duplicates. Something like:

>>> urls = [
    'http://api.example.com/b',
    'http://api.example.com/a',
    'http://api.example.com/c',
    'http://api.example.com/b'
]
>>> set(urls)
{'http://api.example.com/a',
 'http://api.example.com/b',
 'http://api.example.com/c'}

The problem is that we just lost the original order of the list.

A good way to maintain the original order of the elements after removing the duplicates is by using this trick with collections.OrderedDict:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(urls).keys())
['http://api.example.com/b',
 'http://api.example.com/a',
 'http://api.example.com/c']

Cool, huh? Now let’s dig into details to understand what the code above does.

OrderedDict is like a traditional Python dict with a (not so) slight difference: OrderedDict keeps the elements’ insertion order internally. This way, when we iterate over such an object, it will return its elements in the order in which they’ve been inserted.

Now, let’s break down the operations to understand what’s going on:

>>> odict = OrderedDict.fromkeys(urls)

The fromkeys() method creates a dictionary using the items of its first parameter as the keys and the second parameter as the value for every key (or None if we pass nothing, as we did).

As a result we get:

>>> odict
OrderedDict([('http://api.example.com/b', None),
             ('http://api.example.com/a', None),
             ('http://api.example.com/c', None)])

Now that we have a dictionary with the URLs as the keys, we can call the keys() method to get only a sequence containing the URLs:

>>> list(odict.keys())
['http://api.example.com/b',
 'http://api.example.com/a',
 'http://api.example.com/c']

Easy like that. 🙂
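For comparison, here is a sketch of the same idea written three ways. The function names are made up; the second variant relies on plain dicts keeping insertion order, which is guaranteed from Python 3.7 on, and the third one spells the logic out with a helper set.

```python
from collections import OrderedDict


def dedupe_ordereddict(items):
    """The trick from this post: the keys remember insertion order."""
    return list(OrderedDict.fromkeys(items))


def dedupe_dict(items):
    """Same idea with a plain dict (needs Python 3.7+ for the ordering)."""
    return list(dict.fromkeys(items))


def dedupe_seen(items):
    """Explicit loop: keep an item only the first time we see it."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


urls = [
    'http://api.example.com/b',
    'http://api.example.com/a',
    'http://api.example.com/c',
    'http://api.example.com/b',
]

expected = [
    'http://api.example.com/b',
    'http://api.example.com/a',
    'http://api.example.com/c',
]
print(dedupe_ordereddict(urls) == expected)  # True
print(dedupe_dict(urls) == expected)         # True
print(dedupe_seen(urls) == expected)         # True
```

All three need the items to be hashable, which is the case for strings like our URLs.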

If you enjoyed this tip, subscribe to the blog, because I’ll be posting more content in the upcoming weeks.