Documentation / bpf / map_devmap.rst


Based on kernel version 6.8. Page generated on 2024-03-11 21:26 EST.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

=================================================
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
=================================================

.. note::
   - ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
   - ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4

``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
the index to lookup a reference to a net device. While ``BPF_MAP_TYPE_DEVMAP_HASH``
is backed by a hash table that uses a key to lookup a reference to a net device.
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
pairs to update the maps with new net devices.

.. note::
    - The key to a hash map doesn't have to be an ``ifindex``.
    - While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices
      it comes at the cost of a hash of the key when performing a look up.

The setup and packet enqueue/send code is shared between the two types of
devmap; only the lookup and insertion is different.

Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
references to net devices (for forwarding packets through other ports).

The lower two bits of *flags* are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
below.

With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
from the broadcast.

.. note::
    - The key is ignored if BPF_F_BROADCAST is set.
    - The broadcast feature can also be used to implement multicast forwarding:
      simply create multiple DEVMAPs, each one corresponding to a single multicast group.

This helper will return ``XDP_REDIRECT`` on success, or the value of the two
lower bits of the ``flags`` argument if the map lookup fails.

More information about redirection can be found :doc:`redirect`

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

User space
----------
.. note::
    DEVMAP entries can only be updated/deleted from user space and not
    from an eBPF program. Trying to call these functions from a kernel eBPF
    program will result in the program failing to load and a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);

Net device entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
compatibility.

 .. code-block:: c

    struct bpf_devmap_val {
        __u32 ifindex;   /* device index */
        union {
            int   fd;  /* prog fd on map write */
            __u32 id;  /* prog id on map read */
        } bpf_prog;
    };

The ``flags`` argument can be one of the following:
  - ``BPF_ANY``: Create a new element or update an existing element.
  - ``BPF_NOEXIST``: Create a new element only if it did not exist.
  - ``BPF_EXIST``: Update an existing element.

DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
access to both Rx device and Tx device. The  program associated with the ``fd``
must have type XDP with expected attach type ``xdp_devmap``.
When a program is associated with a device index, the program is run on an
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
of how to attach/use xdp_devmap progs can be found in the kernel selftests:

- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

.. c:function::
   int bpf_map_lookup_elem(int fd, const void *key, void *value);

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

.. c:function::
   int bpf_map_delete_elem(int fd, const void *key);

Net device entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.

Examples
========

Kernel BPF
----------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
called tx_port.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 256);
    } tx_port SEC(".maps");

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
called forward_map.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __type(key, __u32);
        __type(value, struct bpf_devmap_val);
        __uint(max_entries, 32);
    } forward_map SEC(".maps");

.. note::

    The value type in the DEVMAP above is a ``struct bpf_devmap_val``

The following code snippet shows a simple xdp_redirect_map program. This program
would work with a user space program that populates the devmap ``forward_map`` based
on ingress ifindexes. The BPF program (below) is redirecting packets using the
ingress ``ifindex`` as the ``key``.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        int index = ctx->ingress_ifindex;

        return bpf_redirect_map(&forward_map, index, 0);
    }

The following code snippet shows a BPF program that is broadcasting packets to
all the interfaces in the ``tx_port`` devmap.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

User space
----------

The following code snippet shows how to update a devmap called ``tx_port``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap_ value: %s\n",
                strerror(errno));
        }

        return ret;
    }

The following code snippet shows how to update a hash_devmap called ``forward_map``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap_ value: %s\n",
                strerror(errno));
        }
        return ret;
    }

References
===========

- https://lwn.net/Articles/728146/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106