diff options
Diffstat (limited to 'Doc/reference/import_machinery.rst')
-rw-r--r-- | Doc/reference/import_machinery.rst | 547 |
1 files changed, 547 insertions, 0 deletions
diff --git a/Doc/reference/import_machinery.rst b/Doc/reference/import_machinery.rst new file mode 100644 index 0000000..93d7ba2 --- /dev/null +++ b/Doc/reference/import_machinery.rst @@ -0,0 +1,547 @@ + +.. _importmachinery: + +**************** +Import machinery +**************** + +.. index:: single: import machinery + +Python code in one :term:`module` gains access to the code in another module +by the process of :term:`importing` it. Most commonly, the :keyword:`import` +statement is used to invoke the import machinery, but it can also be invoked +by calling the built-in :func:`__import__` function. + +The :keyword:`import` statement combines two operations; it searches for the +named module, then it binds the results of that search to a name in the local +scope. The search operation of the :keyword:`import` statement is defined as +a call to the :func:`__import__` function, with the appropriate arguments. +The return value of :func:`__import__` is used to perform the name binding +operation of the :keyword:`import` statement. See the :keyword:`import` +statement for the exact details of that name binding operation. + +A direct call to :func:`__import__` performs only the search for the module. +The function's return value is used like any other function call in Python; +there is no special side-effects (e.g. name binding) associated with +:func:`__import__`. + +When a module is first imported, Python searches for the module and if found, +it creates a module object, initializing it. If the named module cannot be +found, an :exc:`ImportError` is raised. Python implements various strategies +to search for the named module when the import machinery is invoked. These +strategies can be modified and extended by using various hooks described in +the sections below. The entire import machinery itself can be overridden by +replacing built-in :func:`__import__`. + + +Packages +======== + +.. index:: + single: package + +Python has only one type of module object, and all modules are of this type, +regardless of whether the module is implemented in Python, C, or something +else. To help organize modules and provide a naming hierarchy, Python has a +concept of :term:`packages <package>`. It's important to keep in mind that +all packages are modules, but not all modules are packages. Or put another +way, packages are just a special kind of module. Although usually +unnecessary, introspection of various module object attributes can determine +whether a module is a package or not. + +Packages can contain other packages and modules, while modules generally do +not contain other modules or packages. You can think of packages as the +directories on a file system and modules as files within directories, but +don't take this analogy too literally since packages and modules need not +originate from the file system. For the purposes of this documentation, we'll +use this convenient analogy of directories and files. + +All modules have a name. Packages also have names, and subpackages can be +nested arbitrarily deeply. Subpackage names are separated from their parent +package by dots, akin to Python's standard attribute access syntax. Thus you +might have a module called :mod:`sys` and a package called :mod:`email`, which +in turn has a subpackage called :mod:`email.mime` and a module within that +subpackage called :mod:`email.mime.text`. + + +Regular packages +---------------- + +.. index:: + pair: package; regular + +Python defines two types of packages, :term:`regular packages <regular +package>` and :term:`namespace packages <namespace package>`. Regular +packages are traditional packages as they existed in Python 3.2 and earlier. +A regular package is typically implemented as a directory containing an +``__init__.py`` file. When a regular package is imported, this +``__init__.py`` file is implicitly imported, and the objects it defines are +bound to names in the package's namespace. The ``__init__.py`` file can +contain the same Python code that any other module can contain, and Python +will add some additional attributes to the module when it is imported. + + +Namespace packages +------------------ + +.. index:: + pair:: package; namespace + pair:: package; portion + +A namespace package is a composite of various :term:`portions <portion>`, +where each portion contributes a subpackage to the parent package. Portions +may reside in different locations on the file system. Portions may also be +found in zip files, on the network, or anywhere else that Python searches +during import. Namespace packages may or may not correspond directly to +objects on the file system; they may be virtual modules that have no concrete +representation. + +For example, the following file system layout defines a top level ``parent`` +package with three subpackages:: + + parent/ + __init__.py + one/ + __init__.py + two/ + __init__.py + three/ + __init__.py + +Importing ``parent.one`` will implicitly import ``parent/__init__.py`` and +``parent/one/__init__.py``. Subsequent imports of ``parent.two`` or +``parent.three`` will import ``parent/two/__init__.py`` and +``parent/three/__init__.py`` respectively. + +With namespace packages, there is no ``parent/__init__.py`` file. In fact, +there may be multiple ``parent`` directories found during import search, where +each one is provided by a separate vendor installed container, and none of +them contain an ``__init__.py`` file. Thus ``parent/one`` may not be +physically located next to ``parent/two``. In this case, Python will create a +namespace package for the top-level ``parent`` package whenever it or one of +its subpackages is imported. + + +Searching +========= + +To begin the search, Python needs the :term:`fully qualified <qualified name>` +name of the module (or package, but for the purposes of this discussion, the +difference is immaterial) being imported. This name may come from various +arguments to the :keyword:`import` statement, or from the parameters to the +:func:`__import__` function. + +This name will be used in various phases of the import search, and it may be +the dotted path to a submodule, e.g. ``foo.bar.baz``. In this case, Python +first tries to import ``foo``, then ``foo.bar``, and finally ``foo.bar.baz``. +If any of the intermediate imports fail, an :exc:`ImportError` is raised. + + +The module cache +---------------- + +.. index:: + single: sys.modules + +The first place checked during import search is :data:`sys.modules`. This +mapping serves as a cache of all modules that have been previously imported, +including the intermediate paths. So if ``foo.bar.baz`` was previously +imported, :data:`sys.modules` will contain entries for ``foo``, ``foo.bar``, +and ``foo.bar.baz``. Each key will have as its value the corresponding module +object. + +During import, the module name is looked up in :data:`sys.modules` and if +present, the associated value is the module satisfying the import, and the +process completes. However, if the value is ``None``, then an +:exc:`ImportError` is raised. If the module name is missing, Python will +continue searching for the module. + +:data:`sys.modules` is writable. Deleting a key will generally not destroy +the associated module, but it will invalidate the cache entry for the named +module, causing Python to search anew for the named module upon its next +import. Beware though, because if you keep a reference to the module object, +invalidate its cache entry in :data:`sys.modules`, and then re-import the +named module, the two module objects will *not* be the same. The key can also +be assigned to ``None``, forcing the next import of the module to result in an +:exc:`ImportError`. + + +Finders and loaders +------------------- + +.. index:: + single: finder + single: loader + +If the named module is not found in :data:`sys.modules` then Python's import +protocol is invoked to find and load the module. As this implies, the import +protocol consists of two conceptual objects, :term:`finders <finder>` and +:term:`loaders <loader>`. A finder's job is to determine whether it can find +the named module using whatever strategy it knows about. For example, there +is a file system finder which know how to search the file system for the named +module. Other finders may know how to search a zip file, a web page, or a +database to find the named module. The import machinery is extensible, so new +finders can be added to extend the range and scope of module searching. + +Finders do not actually load modules. If they can find the named module, they +return a loader, which the import machinery later invokes to load the module +and create the corresponding module object. + +There are actually two types of finders, and two different but related APIs +for finders, depending on whether it is a :term:`meta path finder` or a +:term:`sys path finder`. Meta path processing occurs at the beginning of +import processing, while sys path processing happens later, by the :term:`path +importer`. + +The following sections describe the protocol for finders and loaders in more +detail, including how you can create and register new ones to extend the +import machinery. + + +Import hooks +------------ + +.. index:: + single: import hooks + single: meta hooks + single: path hooks + pair: hooks; import + pair: hooks; meta + pair: hooks; path + +The import machinery is designed to be extensible; the primary mechanism for +this are the *import hooks*. There are two types of import hooks: *meta +hooks* and *path hooks*. + +Meta hooks are called at the start of import processing, before any other +import processing has occurred. This allows meta hooks to override +:data:`sys.path` processing, frozen modules, or even built-in modules. Meta +hooks are registered by adding new finder objects to :data:`sys.meta_path`, as +described below. + +Path hooks are called as part of :data:`sys.path` (or ``package.__path__``) +processing, at the point where their associated path item is encountered. +Path hooks are registered by adding new callables to :data:`sys.path_hooks` as +described below. + + +The meta path +------------- + +.. index:: + single: sys.meta_path + pair: finder; find_module + pair: finder; find_loader + +When the named module is not found in :data:`sys.modules`, Python next +searches :data:`sys.meta_path`, which contains a list of meta path finder +objects. These finders are queried in order to see if they know how to handle +the named module. Meta path finders must implement a method called +:meth:`find_module()` which takes two arguments, a name and a path. The meta +path finder can use any strategy it wants to determine whether it can handle +the named module or not. + +If the meta path finder knows how to handle the named module, it returns a +loader object. If it cannot handle the named module, it returns ``None``. If +:data:`sys.meta_path` processing reaches the end of its list without returning +a loader, then an :exc:`ImportError` is raised. Any other exceptions raised +are simply propagated up, aborting the import process. + +The :meth:`find_module()` method of meta path finders is called with two +arguments. The first is the fully qualified name of the module being +imported, for example ``foo.bar.baz``. The second argument is the relative +path for the module search. For top-level modules, the second argument is +``None``, but for submodules or subpackages, the second argument is the value +of the parent package's ``__path__`` attribute, which must exist or an +:exc:`ImportError` is raised. + +Python's default :data:`sys.meta_path` has three meta path finders, one that +knows how to import built-in modules, one that knows how to import frozen +modules, and one that knows how to import modules from the file system +(i.e. the :term:`path importer`). + + +Meta path loaders +----------------- + +Once a loader is found via a meta path finder, the loader's +:meth:`load_module()` method is called, with a single argument, the fully +qualified name of the module being imported. This method has several +responsibilities, and should return the module object it has loaded [#fn1]_. +If it cannot load the module, it should raise an :exc:`ImportError`, although +any other exception raised during :meth:`load_module()` will be propagated. + +In many cases, the meta path finder and loader can be the same object, +e.g. :meth:`finder.find_module()` would just return ``self``. + +Loaders must satisfy the following requirements: + + * If there is an existing module object with the given name in + :data:`sys.modules`, the loader must use that existing module. (Otherwise, + the :func:`reload()` builtin will not work correctly.) If the named module + does not exist in :data:`sys.modules`, the loader must create a new module + object and add it to :data:`sys.modules`. + + Note that the module *must* exist in :data:`sys.modules` before the loader + executes the module code. This is crucial because the module code may + (directly or indirectly) import itself; adding it to :data:`sys.modules` + beforehand prevents unbounded recursion in the worst case and multiple + loading in the best. + + If the load fails, the loader needs to remove any modules it may have + inserted into ``sys.modules``. If the module was already in + ``sys.modules`` then the loader should leave it alone. + + * The loader may set the ``__file__`` attribute of the module. If set, this + attribute's value must be a string. The loader may opt to leave + ``__file__`` unset if it has no semantic meaning (e.g. a module loaded from + a database). + + * The loader may set the ``__name__`` attribute of the module. While not + required, setting this attribute is highly recommended so that the + :meth:`repr()` of the module is more informative. + + * If module is a package (either regular or namespace), the loader must set + the module object's ``__path__`` attribute. The value must be a list, but + may be empty if ``__path__`` has no further significance to the importer. + More details on the semantics of ``__path__`` are given below. + + * The ``__loader__`` attribute must be set to the loader object that loaded + the module. This is mostly for introspection and reloading, but can be + used for additional importer-specific functionality, for example getting + data associated with an importer. + + * The module's ``__package__`` attribute should be set. Its value must be a + string, but it can be the same value as its ``__name__``. This is the + recommendation when the module is a package. When the module is not a + package, ``__package__`` should be set to the parent package's name. + + This attribute is used instead of ``__name__`` to calculate explicit + relative imports for main modules, as defined in :pep:`366`. + + * If the module is a Python module (as opposed to a built-in module or a + dynamically loaded extension), it should execute the module's code in the + module's global name space (``module.__dict__``). + + +Module reprs +------------ + +By default, all modules have a usable repr, however depending on the +attributes set above, and hooks in the loader, you can more tightly control +the repr of module objects. + +Loaders may implement a :meth:`module_repr()` method which takes a single +argument, the module object. When ``repr(module)`` is called for a module +with a loader supporting this protocol, whatever is returned from +``loader.module_repr(module)`` is returned as the module's repr without +further processing. This return value must be a string. + +If the module has no ``__loader__`` attribute, or the loader has no +:meth:`module_repr()` method, then the module object implementation itself +will craft a default repr using whatever information is available. It will +try to use the ``module.__name__``, ``module.__file__``, and +``module.__loader__`` as input into the repr, with defaults for whatever +information is missing. + +Here are the exact rules used: + + * If the module has an ``__loader__`` and that loader has a + :meth:`module_repr()` method, call it with a single argument, which is the + module object. The value returned is used as the module's repr. + + * If an exception occurs in :meth:`module_repr()`, the exception is caught + and discarded, and the calculation of the module's repr continues as if + :meth:`module_repr()` did not exist. + + * If the module has an ``__file__`` attribute, this is used as part of the + module's repr. + + * If the module has no ``__file__`` but does have an ``__loader__``, then the + loader's repr is used as part of the module's repr. + + * Otherwise, just use the module's ``__name__`` in the repr. + +This example, from :pep:`420` shows how a loader can craft its own module +repr:: + + class NamespaceLoader: + @classmethod + def module_repr(cls, module): + return "<module '{}' (namespace)>".format(module.__name__) + + +module.__path__ +--------------- + +By definition, if a module has an ``__path__`` attribute, it is a package, +regardless of its value. + +A package's ``__path__`` attribute is used during imports of its subpackages. +Within the import machinery, it functions much the same as :data:`sys.path`, +i.e. providing a list of locations to search for modules during import. +However, ``__path__`` is typically much more constrained than +:data:`sys.path`. + +``__path__`` must be a list, but it may be empty. The same rules used for +:data:`sys.path` also apply to a package's ``__path__``, and +:data:`sys.path_hooks` (described below) are consulted when traversing a +package's ``__path__``. + +A package's ``__init__.py`` file may set or alter the package's ``__path__`` +attribute, and this was typically the way namespace packages were implemented +prior to :pep:`420`. With the adoption of :pep:`420`, namespace packages no +longer need to supply ``__init__.py`` files containing only ``__path__`` +manipulation code; the namespace loader automatically sets ``__path__`` +correctly for the namespace package. + + +The Path Importer +================= + +.. index:: + single: path importer + +As mentioned previously, Python comes with several default meta path finders. +One of these, called the :term:`path importer`, knows how to provide +traditional file system imports. It implements all the semantics for finding +modules on the file system, handling special file types such as Python source +code (``.py`` files), Python byte code (``.pyc`` and ``.pyo`` files) and +shared libraries (e.g. ``.so`` files). + +In addition to being able to find such modules, there is built-in support for +loading these modules. To accomplish these two related tasks, additional +hooks and protocols are provided so that you can extend and customize the path +importer semantics. + +A word of warning: this section and the previous both use the term *finder*, +distinguishing between them by using the terms :term:`meta path finder` and +:term:`sys path finder`. Meta path finders and sys path finders are very +similar, support similar protocols, and function in similar ways during the +import process, but it's important to keep in mind that they are subtly +different. In particular, meta path finders operate at the beginning of the +import process, as keyed off the :data:`sys.meta_path` traversal. + +On the other hand, sys path finders are in a sense an implementation detail of +the path importer, and in fact, if the path importer were to be removed from +:data:`sys.meta_path`, none of the sys path finder semantics would be invoked. + + +sys path finders +---------------- + +.. index:: + single: sys.path + single: sys.path_hooks + single: sys.path_importer_cache + single: PYTHONPATH + +The path importer is responsible for finding and loading Python modules and +packages from the file system. As a meta path finder, it implements the +:meth:`find_module()` protocol previously described, however it exposes +additional hooks that can be used to customize how modules are found and +loaded from the file system. + +Three variables are used during file system import, :data:`sys.path`, +:data:`sys.path_hooks` and :data:`sys.path_importer_cache`. These provide +additional ways that the import machinery can be customized, in this case +specifically during file system path import. + +:data:`sys.path` contains a list of strings providing search locations for +modules and packages. It is initialized from the :data:`PYTHONPATH` +environment variable and various other installation- and +implementation-specific defaults. Entries in :data:`sys.path` can name +directories on the file system, zip files, and potentially other "locations" +that should be searched for modules. + +The path importer is a meta path finder, so the import machinery begins file +system search by calling the path importer's :meth:`find_module()` method as +described previously. When the ``path`` argument to :meth:`find_module()` is +given, it will be a list of string paths to traverse. If not, +:data:`sys.path` is used. + +The path importer iterates over every entry in the search path, and for each +of these, searches for an appropriate sys path finder for the path entry. +Because this can be an expensive operation (e.g. there are `stat()` call +overheads for this search), the path importer maintains a cache mapping path +entries to sys path finders. This cache is maintained in +:data:`sys.path_importer_cache`. In this way, the expensive search for a +particular path location's sys path finder need only be done once. User code +is free to remove cache entries from :data:`sys.path_importer_cache` forcing +the path importer to perform the path search again. + +If the path entry is not present in the cache, the path importer iterates over +every callable in :data:`sys.path_hooks`. Each entry in this list is called +with a single argument, the path entry being searched. This callable may +either return a sys path finder that can handle the path entry, or it may +raise :exc:`ImportError`. An :exc:`ImportError` is used by the path importer +to signal that the hook cannot find a sys path finder for that path entry. +The exception is ignored and :data:`sys.path_hooks` iteration continues. + +If :data:`sys.path_hooks` iteration ends with no sys path finder being +returned then the path importer's :meth:`find_module()` method will return +``None`` and an :exc:`ImportError` will be raised. + +If a sys path finder *is* returned by one of the callables on +:data:`sys.path_hooks`, then the following protocol is used to ask the sys +path finder for a module loader. If a loader results from this step, it is +used to load the module as previously described (i.e. its +:meth:`load_module()` method is called). + + +sys path finder protocol +------------------------ + +sys path finders support the same, traditional :meth:`find_module()` method +that meta path finders support, however sys path finder :meth:`find_module()` +methods are never called with a ``path`` argument. + +The :meth:`find_module()` method on sys path finders is deprecated though, and +instead sys path finders should implement the :meth:`find_loader()` method. +If it exists on the sys path finder, :meth:`find_loader()` will always be +called instead of :meth:`find_module()`. + +:meth:`find_loader()` takes one argument, the fully qualified name of the +module being imported. :meth:`find_loader()` returns a 2-tuple where the +first item is the loader and the second item is a namespace :term:`portion`. +When the first item (i.e. the loader) is ``None``, this means that while the +sys path finder does not have a loader for the named module, it knows that the +path entry contributes to a namespace portion for the named module. This will +almost always be the case where Python is asked to import a namespace package +that has no physical presence on the file system. When a sys path finder +returns ``None`` for the loader, the second item of the 2-tuple return value +must be a sequence, although it can be empty. + +If :meth:`find_loader()` returns a non-``None`` loader value, the portion is +ignored and the loader is returned from the path importer, terminating the +:data:`sys.path` search. + + +Open issues +=========== + +XXX What to say about `imp.NullImporter` when it's found in +:data:`sys.path_importer_cache`? + +XXX It would be really nice to have a diagram. + +.. [#fn1] The importlib implementation appears not to use the return value + directly. Instead, it gets the module object by looking the module name up + in ``sys.modules``.) + + +References +========== + +The import machinery has evolved considerably since Python's early days. The +original `specification for packages +<http://www.python.org/doc/essays/packages.html>`_ is still available to read, +although some details have changed since the writing of that document. + +The original specification for :data:`sys.meta_path` was :pep:`302`, with +subsequent extension in :pep:`420`, which also introduced namespace packages +without ``__init__.py`` files in Python 3.3. :pep:`420` also introduced the +:meth:`find_loader` protocol as an alternative to :meth:`find_module`. + +:pep:`366` describes the addition of the ``__package__`` attribute for +explicit relative imports in main modules. |