You're my type: Python, meet static typing

(originally published: 2020-09-19)

If you’re writing Python code in a production environment, it’s quite likely that you have been using type hints or static type checking. Why do we need these tools? How can you use them when you’re developing your own project? I hope to answer these questions in this blogpost.

The why

Python was designed as a dynamically typed language - the developer should be able to solve problems quickly, iterating in the interactive REPL, while the type system works things out transparently at runtime.

def checkConstraintsList2(solution, data):
    """
    This is an actual excerpt from a BEng thesis, python2 style.
    Trying to infer the data types is painful, so is debugging or
    extending this snippet.
    :param solution:
    :param data:
    """
    for slot in range(0, data.periodsPerDay * data.daysNum):
        for lecture in solution[slot]:
            for constraint in data.getConstraintsForCourse(lecture[0]):
                if slot == solution.mapKeys(constraint):
                    print "Violation lecture", lecture, "slot", slot

Ugly Python2-style snippet from an ML project. Non-trivial data-structures make reading this a horrible experience.

This approach works great for tiny, non-critical codebases with a limited number of developers. Since even Google was at some point a tiny, non-critical codebase, I’d advise you not to follow this path.

The how

PEP 484 added type comments as a standardized way of adding type information to Python code.

def weather(celsius): # type: (Optional[float]) -> Union[str, Dict[str, str]]
    if celsius is None:
        return "I don’t know what to say"
    return (
        "It’s rather warm"
        if celsius > 20 else {"opinion": "Bring back summer :/"}
    )

The separation of the identifier and type hint makes it hard to read.
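PEP 484 also describes a per-argument form of type comments, which keeps each type next to its identifier at the cost of more lines. Here is a minimal sketch; send_greeting and its parameters are made up for illustration. Either way the types live in comments, so nothing is recorded on the function object at runtime:

```python
from typing import Optional  # imported only so a type checker can resolve the name


def send_greeting(
    name,  # type: str
    punctuation=None,  # type: Optional[str]
):
    # type: (...) -> str
    # The "(...)" above is literal PEP 484 syntax meaning "arguments are typed inline".
    """Hypothetical function annotated with per-argument type comments."""
    return "Hello, " + name + (punctuation or ".")


# Type comments are plain comments to the interpreter: no annotations are stored.
print(send_greeting.__annotations__)
```

The printed dictionary is empty, which is why tools that read annotations at runtime cannot make use of type comments.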

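For code that had to run on Python 2, this was the only way to add types. Python 3 supports annotations directly in its grammar: function annotations have been there since 3.0, the typing module arrived in 3.5 together with PEP 484, and 3.6 added variable annotations: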

from typing import Dict, Optional, Union

def weather(celsius: Optional[float]) -> Union[str, Dict[str, str]]:
    if celsius is None:
        return "I don’t know what to say"
    return (
        "It’s rather warm"
        if celsius > 21 else {"opinion": "Bring back summer :/"}
    )

Much better. Please note - this code example introduced an additional import statement. It is not free, since the Python interpreter needs to load the code for that import, resulting in some barely noticeable overhead, depending on your codebase.
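When an import is needed only for annotations, one common way to skip its runtime cost is to guard it with typing.TYPE_CHECKING and quote the annotation. Here is a minimal sketch, assuming decimal stands in for some expensive module and describe is a made-up function. The typing module itself still has to be imported, so this helps mostly with heavier dependencies:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static type checkers only; skipped when the program runs.
    from decimal import Decimal


def describe(amount: "Decimal") -> str:
    """Hypothetical function; the quoted annotation is never evaluated at runtime."""
    return f"Amount due: {amount}"


print(describe(5))
# The annotation is stored as a plain string, not as the Decimal class.
print(describe.__annotations__["amount"])
```

The trade-off is that anything inspecting annotations at runtime sees a string it may not be able to resolve, so this fits best in modules whose hints are consumed only by a type checker.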

If you’re looking to speed up your code, Cython uses a slightly altered Python syntax with type declarations, which it translates to C and compiles to machine code for the sake of speed.

The wow

Type hints might be helpful for the developer, but humans commit errors all the time - you should definitely use a type checker to verify that your assumptions are correct (and catch some bugs before they hit you!).
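To illustrate the kind of mistake a checker is meant to catch, here is a small sketch with made-up helpers (find_user, greeting) and made-up data. The interpreter accepts it and fails only when an unknown id shows up, whereas a checker such as mypy is expected to reject the marked line because the value may be None:

```python
from typing import Dict, Optional

USERS: Dict[int, str] = {1: "alice"}  # hypothetical data


def find_user(user_id: int) -> Optional[str]:
    """Return the user name, or None when the id is unknown."""
    return USERS.get(user_id)


def greeting(user_id: int) -> str:
    name = find_user(user_id)
    # A static checker should flag this line: name is Optional[str],
    # and None has no .title() method.
    return "Hello, " + name.title()


print(greeting(1))  # works for a known id
try:
    greeting(2)  # unknown id: find_user returns None
except AttributeError as error:
    print("Runtime failure:", error)
```

Without a checker, the second call is the first time anyone learns about the bug.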

How do you introduce types in your codebase?

Most type checkers follow the approach of gradual introduction of enforcement of type correctness. The type checkers can infer the types of the variables or return types to some degree, but in this approach it’s the developer’s responsibility to gradually increase the coverage of strict type checking, usually module by module.

Additionally, you can control either particular features of the type system being enforced (e.g. forbidding redefinition of a variable with a different type) or select one of the predefined strictness levels. The main issue is that it involves a manual process - you need to define the order of modules in which you want to annotate your codebase and, well, manually do it.
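As an illustration of one such feature, the sketch below reuses a variable name for values of two different types. It is valid Python, but mypy in its default configuration reports the second assignment as an incompatible type. The function names are made up for this example, and other checkers may treat the pattern differently:

```python
def parse_port_loose(raw: str) -> int:
    port = raw.strip()  # inferred as str
    port = int(port)    # same name, now an int: rejected when redefinition is forbidden
    return port


def parse_port_strict(raw: str) -> int:
    stripped = raw.strip()  # one name per type passes regardless of that setting
    return int(stripped)


print(parse_port_loose(" 8080 "), parse_port_strict(" 8080 "))
```

Both functions behave identically at runtime; the difference shows up only in what the checker is configured to accept.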

Pytype follows a completely different approach - it type checks the entire codebase by default, but takes a very permissive stance on type correctness - if it’s valid Python, it’s OK. This approach definitely makes sense for older or unmaintained projects with minimal or no type hints, since it allows you to catch the basic type errors very quickly, with no changes made to the codebase. This sounds very tempting, but the long-term solution should be to apply stricter type checking. "Valid Python" is just way too permissive a standard.

| | Mypy | pytype | Pyright | [Pyre](https://github.com/facebook/pyre-check) |
| --- | --- | --- | --- | --- |
| Modes | Feature flags for strictness checks | Lenient: if it runs, it’s valid | Multiple levels of strictness enforced per project and directory | Permissive / strict modes |
| Applying to existing code | Gradual | Runs on entire codebase, type hints completely optional | Gradual | Gradual |
| Implementation | Daemon mode, incremental updates | Set of CLI tools | TypeScript, daemon mode, incremental updates | Daemon mode with watchman, incremental updates |
| IDE integration | Incomplete LSP plugin | Maybe someday | LSP, Pylance VSCode extension, vim | VSCode, vim, emacs |
| Extra points | Guido-approved | Merge-pyi automatically merges type stubs into your codebase | Snappy VSCode integration, no Python required - runs on Node.js | Built-in Pysa - static security analysis tool |

Third-party libraries

Not all libraries you use are type annotated - that’s a sad fact. There are two solutions to this problem - either annotate them yourself, or use type stubs. If you’re using a library, you care only about the exported data structures and function signature types. Some type checkers utilize an official collection of type annotations maintained as a part of the Python project - Typeshed (see example type stubs file). If you find a project that doesn’t have type hints, you can contribute your annotations to that repo, for the benefit of everyone!

class TcpWSGIServer(BaseWSGIServer):
    def bind_server_socket(self) -> None: ...
    def getsockname(self) -> Tuple[str, Tuple[str, int]]: ...
    def set_socket_options(self, conn: SocketType) -> None: ...

The ... is an Ellipsis object. Since it's a built-in constant, the code above is valid Python, albeit useless (besides being a type stub).
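A quick way to convince yourself, using a made-up Greeter class written in stub style:

```python
class Greeter:
    """Hypothetical class written the way a .pyi stub would declare it."""

    def greet(self, name: str) -> str: ...
    def reset(self) -> None: ...


# The three dots evaluate to the built-in Ellipsis singleton.
print(... is Ellipsis)

# The bodies do nothing, so every call returns None regardless of the declared type.
print(Greeter().greet("Ada"))

# The signatures, however, are fully recorded, which is all a type checker needs.
print(Greeter.greet.__annotations__)
```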

Some static type checkers create pyi stubs behind the scenes. Pytype allows you to combine the inferred type stubs with the code using a single command. This is really neat - you can seed the initial types in your code in one go - although the quality of the annotations is quite weak: since Pytype is a lenient tool, you will see the Any wildcard type a lot.

Type hints: beyond static type checking

Type correctness is not the only use case for type hints. Here are some neat projects making good use of type annotations:

Pydantic

Pydantic allows you to specify and validate your data model using intuitive type-annotated classes. It’s super fast and is the foundation of the FastAPI web framework (request/response models and validations).

from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    name: str
    tax: Optional[float] = None
    tags: List[str] = []


@app.post("/items/", response_model=Item)
async def create_item(item: Item):
    return item

You get request and response validation for free.

Typer

Typer generates CLIs from just type-annotated functions. It takes care of parsing, validating and generating boilerplate for you.
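To make the idea concrete, here is a rough standard-library sketch of how a command-line parser can be derived from a function's annotations. This is not Typer's API or implementation; run_cli and greet are made-up names, and only simple types such as str and int are handled:

```python
import argparse
import inspect
from typing import Callable, List, Optional, get_type_hints


def run_cli(func: Callable[..., object], argv: Optional[List[str]] = None) -> object:
    """Build an argparse parser from func's signature, then call func with the parsed values."""
    hints = get_type_hints(func)
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        arg_type = hints.get(name, str)  # fall back to str for unannotated parameters
        if param.default is inspect.Parameter.empty:
            # No default value: treat it as a required positional argument.
            parser.add_argument(name, type=arg_type)
        else:
            # Default value present: expose it as an optional --flag.
            parser.add_argument(f"--{name}", type=arg_type, default=param.default)
    args = parser.parse_args(argv)  # argv=None means "read sys.argv"
    return func(**vars(args))


def greet(name: str, count: int = 1) -> str:
    """Greet someone a number of times."""
    return " ".join(f"Hello {name}!" for _ in range(count))


print(run_cli(greet, ["Ada", "--count", "2"]))
```

Because the annotation doubles as the argparse type converter, passing a non-numeric value for --count is rejected before greet is ever called.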

Hypothesis

Hypothesis is a library enabling property-based testing for pytest. Instead of crafting custom unit test examples, you specify functions generating the test cases for you. Or… you can use inferred strategies which will sample the space of all valid values for a given type!

from hypothesis import given, infer

@given(username=infer, article_id=infer, comment=infer)
def test_adding_comment(username: str, article_id: int, comment: str):
    with mock_article(article_id):
        comment_id = add_comment(username, comment)
        assert comment_id is not None

Inferred strategies are a good start, but you should limit the search space of your examples to match your assumptions (would you expect your article_ids to be negative?).

Outro

Static type checking gives developers an additional layer of validation of their code against common type errors. With the wonders of static analysis, you can significantly reduce the number of bugs, make contributing to the code easier and catch some non-trivial security issues.

that's all

Thanks