tags : Programming Languages

Modules/Packages/Imports

  • A module is a Python file, and a package is a directory of modules with an __init__.py (Edit: and/or a __main__.py).
  • Even following the release of 3.3, the details of how sys.path is initialised are still somewhat challenging to figure out.

File loading

  • Just knowing what directory a file is in does not determine what package Python thinks it is in.
  • Since Python 2.6 a module’s “name” is effectively __package__ + '.' + __name__, or just __name__ if __package__ is None

Two ways to load a Python file

  • As top-level script
    • When executed directly, for instance by typing python myfile.py
    • __name__ : __main__
  • As a module
    • When an import statement is encountered inside some other file.
    • __name__ : [the filename, preceded by the names of any packages/subpackages of which it is a part, separated by dots], eg. package.subpackage1.moduleX
    • NOTE: If you run moduleX as the top-level script (eg. python -m package.subpackage1.moduleX), the __name__ will instead be __main__. (Just a variant of top-level script)
    • The most important takeaway is that: the name of the module depends on how you loaded it.
      • It doesn’t matter where the file actually is on disk.
      • If a module’s name has no dots, it is not considered to be part of a package.
  • Special case: REPL
    • A special case is if you run the interpreter interactively, the name of that interactive session is __main__
      • You cannot do relative imports directly from an interactive session.
    • Python adds the current directory to its search path (sys.path) when the interpreter is entered interactively.
      • This is why you’re able to do import moduleX, if it’s in the same directory.
      • It will not know that the directory is part of a package.
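A minimal sketch of the point above: the same file sees a different __name__ depending on how it is loaded. It builds a throwaway package in a temp directory (the layout mirrors the package.subpackage1.moduleX example; SEEN_NAME is a made-up variable) and uses runpy to approximate `python moduleX.py` and `python -m ...` in-process, which is an assumption of convenience rather than exactly what the command line does.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def observed_names():
    """Load the same file three ways and report the __name__ it sees."""
    with tempfile.TemporaryDirectory() as root:
        sub = Path(root) / "package" / "subpackage1"
        sub.mkdir(parents=True)
        (sub.parent / "__init__.py").write_text("")
        (sub / "__init__.py").write_text("")
        (sub / "moduleX.py").write_text("SEEN_NAME = __name__\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            # Approximates `python -m package.subpackage1.moduleX`.
            dash_m = runpy.run_module(
                "package.subpackage1.moduleX", run_name="__main__"
            )
            # Approximates `python moduleX.py`.
            script = runpy.run_path(str(sub / "moduleX.py"), run_name="__main__")
            # What an import statement in some other file does.
            imported = importlib.import_module("package.subpackage1.moduleX")
            return {
                "dash_m": dash_m["SEEN_NAME"],
                "script": script["SEEN_NAME"],
                "import": imported.SEEN_NAME,
            }
        finally:
            # Undo the changes to global import state.
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "package"]:
                del sys.modules[name]


names = observed_names()
print(names)
```

Only the imported copy gets the dotted name; both "run it" variants see __main__.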

Imports

Relative imports

  • Relative imports are only for use within module files.
  • Relative imports use the module’s name to determine where it is in a package.
  • If your module’s name is __main__, it is not considered to be in a package. You cannot use relative imports then.
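A rough sketch of why this bites in practice, assuming a hypothetical package relpkg with two files, helper.py and user.py. The same user.py works when imported as relpkg.user and fails when executed as a top-level script (approximated here with runpy.run_path).

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def relative_import_outcomes():
    """Return (value seen when imported, outcome when run as a script)."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "relpkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        (pkg / "user.py").write_text("from .helper import VALUE\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            # Loaded as relpkg.user: the dotted name tells Python where "." is.
            as_module = importlib.import_module("relpkg.user").VALUE
            # Loaded as __main__: there is no parent package to resolve "." against.
            try:
                runpy.run_path(str(pkg / "user.py"), run_name="__main__")
                as_script = "ok"
            except ImportError as exc:
                as_script = f"ImportError: {exc}"
            return as_module, as_script
        finally:
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "relpkg"]:
                del sys.modules[name]


as_module, as_script = relative_import_outcomes()
print(as_module)
print(as_script)
```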

Absolute imports

Packages

  • Technically, a package is a module that has a __path__ attribute.
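A quick check of that definition using the standard library; is_package is a helper name made up for this note.

```python
import json          # a package: json/__init__.py plus submodules
import json.decoder  # a plain module that lives inside that package
from types import ModuleType


def is_package(module):
    """A package is just a module object that carries a __path__."""
    return isinstance(module, ModuleType) and hasattr(module, "__path__")


print(is_package(json))          # the package itself
print(is_package(json.decoder))  # an ordinary module inside it
```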

Types

  • Namespace packages
    • Python 3.3+: supports implicit namespace packages, which let you create a package without an __init__.py file.
    • Python 2: needed explicit workarounds (pkgutil.extend_path or pkg_resources.declare_namespace in every __init__.py)
    • Creating a namespace package should ONLY be done if there is a need for it, i.e. if you have different libraries that reside in different locations and you want them each to contribute a subpackage to the parent package.
    • Namespace packages can exist in more than one place at a time, while regular packages cannot.
    /usr/lib/python3.9/site-packages/
    └── foobar/
        └── nice.py
 
    /usr/local/lib/python3.9/site-packages/
    └── foobar/
        └── wow.py
  • Regular packages
    • A package with a __init__.py (empty or not empty)
      • Leaving an __init__.py file empty is normal, if the package’s modules and sub-packages do not need to share any code.
    • A directory with a C extension named __init__.so or with a .pyc file named __init__.pyc is also a regular package.
    • The __init__.py file
      • Typically left empty
      • Can contain package-related attributes such as __doc__ and __version__
      • Can be used to decouple the public API of a package from its internal implementation.
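A small sketch of the namespace package layout shown above, using two temp directories in place of the two site-packages locations (an assumption so the example can run anywhere). Neither foobar directory has an __init__.py, yet both contribute modules to the same package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_split_package():
    """Import foobar.nice and foobar.wow from two separate directories."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        for root, filename in ((first, "nice.py"), (second, "wow.py")):
            portion = Path(root) / "foobar"
            portion.mkdir()  # note: no __init__.py is created
            (portion / filename).write_text("ORIGIN = __file__\n")

        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            foobar = importlib.import_module("foobar")
            nice = importlib.import_module("foobar.nice")
            wow = importlib.import_module("foobar.wow")
            portions = len(list(foobar.__path__))
            same_dir = Path(nice.ORIGIN).parent == Path(wow.ORIGIN).parent
            return portions, same_dir
        finally:
            for root in (first, second):
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "foobar"]:
                del sys.modules[name]


portions, same_dir = load_split_package()
print(portions, same_dir)
```

With a regular package (one that has an __init__.py), the first foobar found on sys.path would win and the second directory would be ignored.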

Module

  • Keep module names short, lowercase, and avoid special symbols like _, . and -.
  • Don’t namespace with underscores, use submodules instead.

Glossary

  • Python behind the scenes #11: how the Python import system works
  • Module object: a Python object that acts as a namespace for the module’s names. The names are stored in the module object’s dictionary (available as m.__dict__), so we can access them as attributes.
  • Built-in modules: C modules compiled into the python executable.
  • Frozen modules: Part of the python executable, but they are written in Python. Python code is compiled to a code object and then the marshalled code object is incorporated into the executable.
  • C extensions: A bit like built-in modules and a bit like Python files.
    • Written in C or C++ and interact with Python via the Python/C API.
    • They are not a part of the executable but loaded dynamically during the import.
    • Some standard modules including array, math and select are C extensions.
    • asyncio, heapq and json are written in Python but call C extensions under the hood.
  • Python bytecode files (.pyc):
    • Typically live in a __pycache__ directory
    • They are the result of compiling Python code to bytecode.
    • Their purpose is to reduce a module’s loading time by skipping the compilation stage.

Different kinds of modules in use

  • Built-in modules such as sys and builtins (os, by contrast, is written in Python)
  • Frozen modules.
  • C extensions.
  • Python source code files or pyc files
  • Third-party modules you have installed in your environment
  • Your project’s internal modules (directories)
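One way to see which kind a given module is: look at its import spec. This is a rough sketch; module_kind and its labels are names made up for this note, and the answer for a given stdlib module can differ between Python versions and builds.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify a module by where its import spec says it comes from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.origin == "built-in":
        return "built-in"
    if spec.origin == "frozen":
        return "frozen"
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "C extension"
    if origin.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
        return "bytecode"
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "source"
    return "other"  # e.g. a namespace package, which has no single origin


for name in ("sys", "json", "math", "os"):
    # math and os vary by build and version, so no expected output is given.
    print(name, "->", module_kind(name))
```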

How to import

# bad
from modu import *
x = sqrt(4)  # Is sqrt part of modu? A builtin? Defined above?
 
# better
from modu import sqrt
[...]
x = sqrt(4)  # sqrt may be part of modu, if not redefined in between
 
# best
import modu
x = modu.sqrt(4)  # sqrt is visibly part of modu's namespace

Operation

What import pack.modu does:

  • It looks for an __init__.py file in pack and executes all of its top-level statements.
  • Then it looks for a file named pack/modu.py and executes all of its top-level statements.
  • Each file gets its own module object, which is cached in sys.modules. The statement then binds the name pack (not modu) in the importing namespace; the submodule is reached as pack.modu.
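The steps above can be traced with a throwaway pack/modu.py pair. This is a sketch: the EXECUTED list is a made-up marker that each file appends to, so the order of execution is visible afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def trace_import():
    """Run `import pack.modu` and report what it executed and bound."""
    with tempfile.TemporaryDirectory() as root:
        pack = Path(root) / "pack"
        pack.mkdir()
        (pack / "__init__.py").write_text("EXECUTED = ['pack/__init__.py']\n")
        (pack / "modu.py").write_text(
            "import pack\npack.EXECUTED.append('pack/modu.py')\n"
        )

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        namespace = {}
        try:
            exec("import pack.modu", namespace)
            # Names the import statement bound (ignoring __builtins__).
            bound = sorted(k for k in namespace if not k.startswith("__"))
            order = list(namespace["pack"].EXECUTED)
            cached = namespace["pack"].modu is sys.modules["pack.modu"]
            return bound, order, cached
        finally:
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "pack"]:
                del sys.modules[name]


bound, order, cached = trace_import()
print(bound, order, cached)
```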

Module object

  • The type of a module object is PyModule_Type in the C code but it’s not available in Python as a built-in.
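from types import ModuleType
# types.ModuleType is obtained from an existing module, roughly like this:
#   import sys  (could be any other module)
#   ModuleType = type(sys)
m = ModuleType('m')  # the import system creates an empty module object like this before running the module's code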
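To make the "module object is just a namespace" idea concrete, here is a sketch that builds a module by hand, fills its dictionary with exec, and registers it in sys.modules so that a normal import statement finds it. The module name handmade and its contents are invented for the example.

```python
import sys
from types import ModuleType

# Create an empty module object; no file is involved.
handmade = ModuleType("handmade", "A module built without any file.")

# Executing code with the module's __dict__ as globals populates its namespace.
source = "ANSWER = 42\ndef double(n):\n    return 2 * n\n"
exec(source, handmade.__dict__)

# Once it is in sys.modules, import statements return this same object.
sys.modules["handmade"] = handmade
import handmade as again

print(again is handmade)
print(handmade.double(handmade.ANSWER))
```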

Issues

Double import

“Never add a package directory, or any directory inside a package, directly to the Python path.” The reason: every module in that directory is now potentially accessible under two different names:

  • As a top level module (since the directory is on sys.path)
  • As a submodule of the package (if the higher level directory containing the package itself is also on sys.path).
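A sketch that reproduces the trap with a hypothetical package shop_demo containing inventory_models.py. Both the parent directory and the package directory are put on sys.path, so the same file is loaded twice as two unrelated module objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def double_import():
    """Import one file under two names and compare the results."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "shop_demo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "inventory_models.py").write_text("class Item:\n    pass\n")

        # The mistake: the package directory itself is also on sys.path.
        sys.path[:0] = [root, str(pkg)]
        importlib.invalidate_caches()
        try:
            as_submodule = importlib.import_module("shop_demo.inventory_models")
            as_top_level = importlib.import_module("inventory_models")
            return (
                as_submodule is as_top_level,
                as_submodule.Item is as_top_level.Item,
                isinstance(as_top_level.Item(), as_submodule.Item),
            )
        finally:
            sys.path.remove(root)
            sys.path.remove(str(pkg))
            for name in ("shop_demo.inventory_models", "shop_demo",
                         "inventory_models"):
                sys.modules.pop(name, None)


same_module, same_class, isinstance_ok = double_import()
print(same_module, same_class, isinstance_ok)
```

The two Item classes are distinct, so isinstance checks, module-level state and registries silently diverge between the two copies.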

Namespace and Scope

  • Existence of multiple, distinct namespaces means several different instances of a particular name can exist simultaneously
  • To pinpoint which variable, in which namespace, a name refers to, we use scope

Types of namespaces

  • Built-In: the builtins module (exposed as __builtins__)
  • Global: One per module; for the main program it is the namespace of __main__
  • Enclosing: Function enclosing another function
  • Local: Namespace specific to a function

Scope

  • Scopes are determined statically from the source text, but names are looked up in them at runtime.
  • scope of a name is the region of a program in which that name has meaning.
  • As namespaces, scopes are: Local, Enclosing, Global and Builtin
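  • Accessing shit outside of scope
    • Reading a name from an outer scope just works; these keywords are only needed to rebind it. It’s not advisable to use them, better to avoid
    • global : rebinds the name in the module’s global scope
    • nonlocal : rebinds the name in the nearest enclosing function scope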
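A compact sketch of the four scopes and of what global and nonlocal change. All function and variable names are made up for illustration.

```python
counter = 0  # lives in this module's global namespace


def bump_global():
    global counter  # rebind the module-level name instead of creating a local
    counter += 1
    return counter


def broken_bump():
    # Without `global`, the assignment makes `counter` local to this function,
    # so reading it before it has a value raises UnboundLocalError.
    counter += 1
    return counter


def make_counter():
    count = 0  # enclosing scope for bump()

    def bump():
        nonlocal count  # rebind the name in the nearest enclosing function
        count += 1
        return count

    return bump


def shadow_builtin():
    len = "local value"  # shadows the built-in len only inside this function
    return len


tick = make_counter()
print(bump_global(), tick(), tick(), shadow_builtin(), len([1, 2]))
```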

Threading and Processes in Python

Daemon threads and non-daemon

  • This is different from the unix idea of a daemon. Don’t confuse.
  • non-daemon thread (blocking)
    • starts running in background and you can perform other stuff
    • your program or main thread is blocked by non-daemon threads
    • main thread will not exit until all such non-daemon threads have completed their execution
  • daemon threads (non-blocking)
    • Once the main thread and all other non-daemon threads have finished, the program exits and any remaining daemon threads are stopped abruptly (their resources may not be released properly).
    • Your program or main thread is NOT blocked by daemon threads
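A minimal sketch of the two flavours, with made-up names. Both workers block on an Event; the comments mark where the daemon flag would matter if the main thread simply ended.

```python
import threading

release = threading.Event()
finished = []


def wait_then_record(label):
    release.wait()  # block until the main thread signals
    finished.append(label)


daemon_worker = threading.Thread(
    target=wait_then_record, args=("daemon",), daemon=True
)
regular_worker = threading.Thread(
    target=wait_then_record, args=("regular",), daemon=False
)
daemon_worker.start()
regular_worker.start()

# If the main thread ended here without calling release.set(), the interpreter
# would keep waiting for regular_worker, but would not wait for daemon_worker.
release.set()
regular_worker.join()
daemon_worker.join()
print(sorted(finished))
```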

Threads vs Processes and GIL in Python

  • There’s a GIL per process, so separate processes do not contend for the same GIL
  • Processes are good for CPU-bound tasks as they can use multiple CPUs if available
  • Threads good for I/O bound shit (See O). Limited by the GIL: only one thread runs Python bytecode at a time, but the GIL is released while a thread waits on I/O.

GIL is a lock

  • Allows only one thread at a time to execute Python bytecode
  • Needed in CPython because its memory management is not thread-safe: it relies on reference counting for garbage collection
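A small sketch of why threads still help for I/O-bound work despite the GIL. time.sleep stands in for a blocking I/O call (it releases the GIL while waiting); fake_io and timed_run are names invented for this note.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    time.sleep(delay)  # the GIL is released while this thread waits
    return delay


def timed_run(workers, delays):
    """Run fake_io over delays in a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


results, elapsed = timed_run(4, [0.2] * 4)
# Four 0.2 s waits overlap, so the total is far below the 0.8 s a serial loop needs.
print(results, round(elapsed, 2))
```

For CPU-bound functions the waits would not overlap like this; concurrent.futures.ProcessPoolExecutor offers the same interface with one GIL per worker process.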

OOP

Object lifecycle

  • Read later: All about Pythonic Class: The Life Cycle
  • One of the 3
    • A prototype
    • An object of some other class
    • A metaclass of type. Eg. <class ‘str’> : <class ‘type’>
  • __init__ is not a constructor: it only initialises an instance that already exists
  • __new__ is the constructor: it creates the instance and is called before __init__
  • self parameter
    • Interpreter passes the created object itself as the first argument into __init__, by convention it’s called self but you could call it anything
  • metaclass: the class of a class. It controls how classes themselves are created, so it can add to or change a class at definition time. The default metaclass is type.
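A sketch of the lifecycle pieces mentioned above: the order of __new__ and __init__, and a tiny metaclass. Tracked, Registry and Plugin are invented names.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)  # object.__new__ creates the bare instance

    def __init__(self, value):
        calls.append("__init__")
        self.value = value  # the instance already exists; just fill it in


class Registry(type):
    """A metaclass: runs whenever a class using it is defined."""
    classes = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.classes.append(name)
        return cls


class Plugin(metaclass=Registry):
    pass


obj = Tracked(5)
print(calls, obj.value)
print(type(str), type(Tracked), type(Plugin))
```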

Other terms

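  • super() : Returns a proxy that forwards attribute lookups to the next class in the method resolution order (MRO). With single inheritance that is the parent class.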
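A short sketch with a made-up diamond hierarchy. It shows that super() follows the MRO of the instance's class rather than always jumping to the textual parent.

```python
class Base:
    def greet(self):
        return "Base"


class Left(Base):
    def greet(self):
        return "Left > " + super().greet()


class Right(Base):
    def greet(self):
        return "Right > " + super().greet()


class Child(Left, Right):
    def greet(self):
        return "Child > " + super().greet()


print([cls.__name__ for cls in Child.__mro__])
print(Child().greet())  # Left's super() call reaches Right, not Base
```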

Dataclasses

  • Dataclass also has a self-documentation effect: it says “this thing is kind of like a record“.
  • There are debates around whether regular classes should be replaced with dataclasses in Python. At the end of the day the answer is that it depends: use them where they fit.
  • Pydantic has dataclasses too, but it also does validation and a bunch of other stuff, which makes it slower, and rightly so.
  • Internet comments
    • I think kw_only=True and slots=True solve a lot of my dataclass problems. Wish they defaulted to true

Abstract classes vs NotImplementedError

  • Basically ABCs describe the nature of the program. Enforcement is limited: a subclass that misses an abstract method is only rejected (TypeError) when it is instantiated, not when it is defined. If the problem I am working on fits the nature of ABC I can use it
  • Otherwise, if it’s just a case of me writing a class with a method that someone else should implement down the line, then NotImplementedError is good enough and more appropriate imo
  • python - When to use ‘raise NotImplementedError’? - Stack Overflow
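A side-by-side sketch of the two approaches with invented class names. The difference is when the missing method is noticed: at instantiation for the ABC, at call time for NotImplementedError.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must provide this."""


class IncompleteShape(Shape):
    pass  # forgot to implement area()


class LooseShape:
    def area(self):
        raise NotImplementedError("subclasses should implement area()")


class IncompleteLooseShape(LooseShape):
    pass  # also forgot, but nothing complains yet


try:
    IncompleteShape()  # fails right here
except TypeError as exc:
    print("ABC:", exc)

loose = IncompleteLooseShape()  # instantiation succeeds
try:
    loose.area()  # fails only when the method is actually called
except NotImplementedError as exc:
    print("NotImplementedError:", exc)
```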

Other basic shit

Expression vs Statement

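  • A statement in Python is a unit of code that does something (assignment, import, if, def, ...).
  • An expression is code that can be evaluated to some value. An expression on its own line is also a valid statement (an expression statement), but most statements are not expressions.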
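One rough way to tell the two apart: only expressions compile in "eval" mode. is_expression is a helper name made up for this note, and the walrus example needs Python 3.8+.

```python
def is_expression(source):
    """True if source is a single expression (something eval() accepts)."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


for snippet in ("1 + 2", "len('abc')", "(y := 5)", "x = 1", "import os"):
    print(repr(snippet), is_expression(snippet))
```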

Decorators

Serving Python as a Web Application

  • See Web Server and Web Development
  • WSGI is the older, synchronous interface; ASGI is its successor and adds async support (WSGI is still in use)
  • ASGI is a spec, implementations (server applications) include Daphne, Hypercorn and Uvicorn.
  • These implementations (eg. Uvicorn) handle HTTP and pass the request details as dicts to applications built with some ASGI framework
  • FastAPI is an ASGI framework
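To show what "pass parameters in dicts" looks like, here is a bare ASGI application without any framework, plus a tiny stand-in for the server so it can run on its own. The app follows the ASGI HTTP message types; call_app and the hand-built scope are simplifications made up for this note (a real server fills in many more keys).

```python
import asyncio


async def app(scope, receive, send):
    """A minimal ASGI application: one coroutine taking scope, receive, send."""
    if scope["type"] != "http":
        return  # ignore lifespan/websocket scopes in this sketch
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path):
    """Play the role of the server: build a scope and collect what is sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app("/hi"))
print(messages)
```

With Uvicorn installed, something like `uvicorn mymodule:app` would serve the same callable over real HTTP (mymodule being whatever file it is saved in).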

Regular things to know

I’ll just list things here so that they stay on my radar while they are new. When they become common things for me, I’ll just remove them from this list.

  • Stdlib
    • collections.Counter
    • collections.namedtuple, alternatively use typing.NamedTuple (better imo)
    • collections.OrderedDict (less imp now, a normal dict remembers insertion order since Python 3.7)
    • collections.defaultdict
    • itertools.{product, permutations, combinations, combinations_with_replacement, groupby, count, cycle, repeat}
    • functools.{reduce}
    • queue.Queue : Useful for Concurrency stuff (It’s thread-safe, i.e. the put and get calls themselves are atomic, so you can be sure T2 will not take your queue item)
      • There’s also multiprocessing.Queue
  • Language features
    • type hints
    • pattern matching
    • walrus assignment expression operator
      • it can return the value being assigned, unlike normal assignments
      • usecases
        • debugging: capture intermediate values in names in the middle of a larger expression
        • intent: name a value right where it is computed
          numbers = [2, 8, 0, 1, 1, 9, 7, 7]
          num_length = len(numbers)
          num_sum = sum(numbers)
          description = {
              "length": num_length,
              "sum": num_sum,
              "mean": num_sum / num_length,
          }
          # we can now do this instead
          description = {
              "length": (num_length := len(numbers)),
              "sum": (num_sum := sum(numbers)),
              "mean": num_sum / num_length,
          }
        • The := assignment expression operator isn’t always the most readable solution even when it makes your code more concise.
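A quick tour of a few of the stdlib items listed above, as a sketch with made-up sample data.

```python
from collections import Counter, defaultdict
from itertools import combinations, permutations, product
from typing import NamedTuple

words = ["apple", "avocado", "banana", "blueberry", "cherry", "apple"]

# Counter: tally hashable items.
counts = Counter(words)

# defaultdict: missing keys get a default value, here an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)


# typing.NamedTuple: a namedtuple with type hints and class syntax.
class Point(NamedTuple):
    x: int
    y: int


origin = Point(0, 0)
x, y = Point(3, 4)  # still unpacks like a plain tuple

pairs = list(product("ab", [1, 2]))    # cartesian product
duos = list(combinations("abc", 2))    # order does not matter
orders = list(permutations("abc"))     # order matters

print(counts.most_common(1), dict(by_letter), origin, pairs, duos, len(orders))
```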