| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
libceph: Fix potential null-ptr-deref in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. When decoding this CRUSH map in crush_decode(), an
array of max_buckets CRUSH buckets is decoded, where some indices may
not refer to actual buckets and are therefore set to NULL. The received
CRUSH map may optionally contain choose_args that get decoded in
decode_choose_args(). When decoding a crush_choose_arg_map, a series of
choose_args for different buckets is decoded, with the bucket_index
being read from the incoming message. It is only checked that the bucket
index does not exceed max_buckets, but not that it doesn't point to an
index with a NULL bucket. If a (potentially corrupted) message contains
a crush_choose_arg_map including such a bucket_index, a null pointer
dereference may occur in the subsequent processing when attempting to
access the bucket with the given index.
This patch fixes the issue by extending the affected check. Now, it is
only attempted to access the bucket if it is not NULL. |
| In the Linux kernel, the following vulnerability has been resolved:
libceph: Fix potential out-of-bounds access in __ceph_x_decrypt()
In __ceph_x_decrypt(), a part of the buffer p is interpreted as a
ceph_x_encrypt_header, and the magic field of this struct is accessed.
This happens without any guarantee that the buffer is large enough to
hold this struct. The function parameter ciphertext_len represents the
length of the ciphertext to decrypt and is guaranteed to be at most the
remaining size of the allocated buffer p. However, this value is not
necessarily greater than sizeof(ceph_x_encrypt_header). E.g., a message
frame of type FRAME_TAG_AUTH_REPLY_MORE, that is just as long to hold
the ciphertext at its end with a ciphertext_len of 8 or less, can
trigger an out-of-bounds memory access when accessing hdr->magic.
This patch fixes the issue by adding a check to ensure that the
decrypted plaintext in the buffer is large enough to represent at least
the ceph_x_encrypt_header. |
| In the Linux kernel, the following vulnerability has been resolved:
libceph: Fix potential out-of-bounds access in crush_decode()
A message of type CEPH_MSG_OSD_MAP containing a crush map with at least
one bucket has two fields holding the bucket algorithm. If the values
in these two fields differ, an out-of-bounds access can occur. This is
the case because the first algorithm field (alg) is used to allocate
the correct amount of memory for a bucket of this type, while the second
algorithm field inside the bucket (b->alg) is used in the subsequent
processing.
This patch fixes the issue by adding a check that compares alg and
b->alg and aborts the processing in case they differ. Furthermore,
b->alg is set to 0 in this case, because the destruction of the crush
map also uses this field to determine the bucket type, which can again
result in an out-of-bounds access when trying to free the memory pointed
to by the fields of the bucket. To correctly free the memory allocated
for the bucket in such a case, the corresponding call to kfree is moved
from the algorithm-specific crush_destroy_bucket functions to the
generic crush_destroy_bucket(). |
| In the Linux kernel, the following vulnerability has been resolved:
libceph: handle rbtree insertion error in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. The received CRUSH map may optionally contain
choose_args that get decoded in decode_choose_args(). In this function,
num_choose_arg_maps is read from the message, and a corresponding number
of crush_choose_arg_maps gets decoded afterwards. Each
crush_choose_arg_map has a choose_args_index, which serves as the key
when inserting it into the choose_args rbtree of the decoded crush_map.
If a (potentially corrupted) message contains two crush_choose_arg_maps
with the same index, the assertion in insert_choose_arg_map() triggers a
kernel BUG when trying to insert the second crush_choose_arg_map.
This patch fixes the issue by switching to the non-asserting rbtree
insertion function and rejecting the message if the insertion fails.
[ idryomov: changelog ] |
| In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Fix oops due to out of scope access
Below oops triggers when kill QEMU process:
Oops: general protection fault, probably for non-canonical address 0x7fffffff844eaaa7: 0000 [#1] SMP NOPTI
Call Trace:
<TASK>
do_raw_spin_lock+0xaa/0xc0
_raw_spin_lock_irqsave+0x21/0x40
domain_remove_dev_pasid+0x52/0x160
intel_nested_set_dev_pasid+0x1b9/0x1e0
__iommu_set_group_pasid+0x56/0x120
pci_dev_reset_iommu_done+0xe3/0x180
pcie_flr+0x65/0x160
__pci_reset_function_locked+0x5b/0x120
vfio_pci_core_close_device+0x63/0xe0 [vfio_pci_core]
vfio_df_close+0x4f/0xa0
vfio_df_unbind_iommufd+0x2d/0x60
vfio_device_fops_release+0x3e/0x40
__fput+0xe5/0x2c0
task_work_run+0x58/0xa0
do_exit+0x2c8/0x600
do_group_exit+0x2f/0xa0
get_signal+0x863/0x8c0
arch_do_signal_or_restart+0x24/0x100
exit_to_user_mode_loop+0x87/0x380
do_syscall_64+0x2ff/0x11e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The global static blocked domain is a dummy domain without corresponding
dmar_domain structure, accessing beyond iommu_domain structure triggers
oops easily. Fix it by return early in domain_remove_dev_pasid() like
identity domain. |
| In the Linux kernel, the following vulnerability has been resolved:
iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
In __iommu_group_set_domain_internal(), concurrent domain attachments are
rejected when any device in the group is recovering. This is necessary to
fence concurrent attachments to a multi-device group where devices might
share the same RID due to PCI DMA alias quirks, but triggers the WARN_ON in
__iommu_group_set_domain_nofail().
Other IOMMU_SET_DOMAIN_MUST_SUCCEED callers in detach/teardown paths, such
as __iommu_group_set_core_domain and __iommu_release_dma_ownership, should
not be rejected, as the domain would be freed anyway in these nofail paths
while group->domain is still pointing to it. So pci_dev_reset_iommu_done()
could trigger a UAF when re-attaching group->domain.
Honor the IOMMU_SET_DOMAIN_MUST_SUCCEED flag, allowing the callers through
the group->recovery_cnt fence, so as to update the group->domain pointer.
Instead add a gdev->blocked check in the device iteration loop, to prevent
any concurrent per-device detachment. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/dma-buf: handle empty bo and UAF races
There look to be some nasty races here when triggering the
invalidate_mappings hook:
1) We do xe_bo_alloc() followed by the attach, before the actual full bo
init step in xe_dma_buf_init_obj(). However the bo is visible on the
attachments list after the attach. This is bad since exporter driver,
say amdgpu, can at any time call back into our invalidate_mappings hook,
with an empty/bogus bo, leading to potential bugs/crashes.
2) Similar to 1) but here we get a UAF, when the invalidate_mappings
hook is triggered. For example, we get as far as xe_bo_init_locked()
but this fails in some way. But here the bo will be freed on error, but
we still have it attached from dma-buf pov, so if the
invalidate_mappings is now triggered then the bo we access is gone and
we trigger UAF and more bugs/crashes.
To fix this, move the attach step until after we actually have a fully
set up buffer object. Note that the bo is not published to userspace
until later, so not sure what the comment "Don't publish the bo
until we have a valid attachment", is referring to.
We have at least two different customers reporting hitting a NULL ptr
deref in evict_flags when importing something from amdgpu, followed by
triggering the evict flow. Hit rate is also pretty low, which would
hint at some kind of race, so something like 1) or 2) might explain
this.
v2:
- Shuffle the order of the ops slightly (no functional change)
- Improve the comment to better explain the ordering (Matt B)
(cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971) |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/dma-buf: fix UAF with retry loop
Retry doesn't work here, since bo will be freed on error, leading to
UAF. However, now that we do the alloc & init before the attach, we can
now combine this as one unit and have the init do the alloc for us. This
should make the retry safe.
Reported by Sashiko.
v2: Fix up the error unwind (CI)
(cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7) |
| In the Linux kernel, the following vulnerability has been resolved:
net: qrtr: fix refcount saturation and potential UAF in qrtr_port_remove
In qrtr_port_remove(), the socket reference count is decremented via
__sock_put() before the port is removed from the qrtr_ports XArray and
before the RCU grace period elapses.
This breaks the fundamental RCU update paradigm. It exposes a race
window where a concurrent RCU reader (such as qrtr_reset_ports() or
qrtr_port_lookup()) can obtain a pointer to the socket from the XArray,
and attempt to call sock_hold() on a socket whose reference count has
already dropped to zero.
This exact race condition was hit during syzkaller fuzzing, leading to
the following refcount saturation warning and a potential Use-After-Free:
refcount_t: saturated; leaking memory.
WARNING: CPU: 3 PID: 1273 at lib/refcount.c:22 refcount_warn_saturate+0xae/0x1d0
Modules linked in: qrtr(+) bochs drm_shmem_helper ...
Call Trace:
<TASK>
qrtr_reset_ports net/qrtr/af_qrtr.c:768 [inline] [qrtr]
__qrtr_bind.isra.0+0x48b/0x570 net/qrtr/af_qrtr.c:805 [qrtr]
qrtr_bind+0x17d/0x210 net/qrtr/af_qrtr.c:901 [qrtr]
kernel_bind+0xe4/0x120 net/socket.c:3592
qrtr_ns_init+0x1a6/0x380 net/qrtr/ns.c:715 [qrtr]
qrtr_proto_init+0x3b/0xff0 net/qrtr/af_qrtr.c:169 [qrtr]
do_one_initcall+0xf5/0x5e0 init/main.c:1283
...
</TASK>
Fix this by deferring the reference count decrement until after the
xa_erase() and the synchronize_rcu() complete.
(Note: The v1 of this patch incorrectly replaced __sock_put() with
sock_put(). As Simon Horman pointed out, the callers of qrtr_port_remove()
still hold a reference to the socket, so freeing the socket memory here
would lead to a subsequent UAF in the caller. Thus, the __sock_put() is
kept, but only repositioned to close the RCU race.) |
| In the Linux kernel, the following vulnerability has been resolved:
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
A SOFTIRQ-safe to SOFTIRQ-unsafe lock order deadlock can occur in
send_sigio() and send_sigurg() when a process group receives a signal.
When FASYNC is configured for a process group (PIDTYPE_PGID), both
functions use read_lock(&tasklist_lock) to traverse the task list.
However, they are frequently called from softirq context:
- send_sigio() via input_inject_event -> kill_fasync
- send_sigurg() via tcp_check_urg -> sk_send_sigurg (NET_RX_SOFTIRQ)
The deadlock is caused by the rwlock writer fairness mechanism:
1. CPU 0 (process context) holds read_lock(&tasklist_lock) in do_wait().
2. CPU 1 (process context) attempts write_lock(&tasklist_lock) in
fork() or exit() and spins, which blocks all new readers.
3. CPU 0 is interrupted by a softirq (e.g., TCP URG packet reception).
4. The softirq calls send_sigurg() and attempts to acquire
read_lock(&tasklist_lock), deadlocking because CPU 1 is waiting.
Since PID hashing and do_each_pid_task() traversals are already
RCU-protected, the read_lock on tasklist_lock is no longer strictly
required for safe traversal. Fix this by replacing tasklist_lock with
rcu_read_lock(), aligning the process group signaling path with the
single-PID path. This also mitigates a potential remote denial of
service vector via TCP URG packets.
Lockdep splat:
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[...]
Chain exists of:
&dev->event_lock --> &f_owner->lock --> tasklist_lock
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(tasklist_lock);
local_irq_disable();
lock(&dev->event_lock);
lock(&f_owner->lock);
<Interrupt>
lock(&dev->event_lock);
*** DEADLOCK *** |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "wireguard: device: enable threaded NAPI"
This reverts commit 933466fc50a8e4eb167acbd0d8ec96a078462e9c which is
commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream.
We have had three independent production user reports in combination
with Cilium utilizing WireGuard as encryption underneath that k8s Pod
E/W traffic to certain peer nodes fully stalled. The situation appears
as follows:
- Occurs very rarely but at random times under heavy networking load.
- Once the issue triggers the decryption side stops working completely
for that WireGuard peer, other peers keep working fine. The stall
happens also for newly initiated connections towards that particular
WireGuard peer.
- Only the decryption side is affected, never the encryption side.
- Once it triggers, it never recovers and remains in this state,
the CPU/mem on that node looks normal, no leak, busy loop or crash.
- bpftrace on the affected system shows that wg_prev_queue_enqueue
fails, thus the MAX_QUEUED_PACKETS (1024 skbs!) for the peer's
rx_queue is reached.
- Also, bpftrace shows that wg_packet_rx_poll for that peer is never
called again after reaching this state for that peer. For other
peers wg_packet_rx_poll does get called normally.
- Commit db9ae3b ("wireguard: device: enable threaded NAPI")
switched WireGuard to threaded NAPI by default. The default has
not been changed for triggering the issue, neither did CPU
hotplugging occur (i.e. 5bd8de2 ("wireguard: queueing: always
return valid online CPU in wg_cpumask_choose_online()")).
- The issue has been observed with stable kernels of v5.15 as well as
v6.1. It was reported to us that v5.10 stable is working fine, and
no report on v6.6 stable either (somewhat related discussion in [0]
though).
- In the WireGuard driver the only material difference between v5.10
stable and v5.15 stable is the switch to threaded NAPI by default.
[0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/
Breakdown of the problem:
1) skbs arriving for decryption are enqueued to the peer->rx_queue in
wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer.
2) The latter only moves the skb into the MPSC peer queue if it does
not surpass MAX_QUEUED_PACKETS (1024) which is kept track in an
atomic counter via wg_prev_queue_enqueue.
3) In case enqueueing was successful, the skb is also queued up
in the device queue, round-robin picks a next online CPU, and
schedules the decryption worker.
4) The wg_packet_decrypt_worker, once scheduled, picks these up
from the queue, decrypts the packets and once done calls into
wg_queue_enqueue_per_peer_rx.
5) The latter updates the state to PACKET_STATE_CRYPTED on success
and calls napi_schedule on the per peer->napi instance.
6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks
on the peer->rx_queue. It will wg_prev_queue_dequeue if the
queue->peeked skb was not cached yet, or just return the latter
otherwise. (wg_prev_queue_drop_peeked later clears the cache.)
7) From an ordering perspective, the peer->rx_queue has skbs in order
while the device queue with the per-CPU worker threads from a
global ordering PoV can finish the decryption and signal the skb
PACKET_STATE_CRYPTED out of order.
8) A situation can be observed that the first packet coming in will
be stuck waiting for the decryption worker to be scheduled for
a longer time when the system is under pressure.
9) While this is the case, the other CPUs in the meantime finish
decryption and call into napi_schedule.
10) Now in wg_packet_rx_poll it picks up the first in-order skb
from the peer->rx_queue and sees that its state is still
PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits e
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net: skbuff: fix missing zerocopy reference in pskb_carve helpers
pskb_carve_inside_header() and pskb_carve_inside_nonlinear() both copy
the old skb_shared_info header into a new buffer via memcpy(), which
includes the destructor_arg pointer (uarg) for MSG_ZEROCOPY skbs.
Neither function calls net_zcopy_get() for the new shinfo, creating an
unaccounted holder: every skb_shared_info with destructor_arg set will
call skb_zcopy_clear() once when freed, but the corresponding
net_zcopy_get() was never called for the new copy. Repeated calls
drive uarg->refcnt to zero prematurely, freeing ubuf_info_msgzc while
TX skbs still hold live destructor_arg pointers.
KASAN reports use-after-free on a freed ubuf_info_msgzc:
BUG: KASAN: slab-use-after-free in skb_release_data+0x77b/0x810
Read of size 8 at addr ffff88801574d3e8 by task poc/220
Call Trace:
skb_release_data+0x77b/0x810
kfree_skb_list_reason+0x13e/0x610
skb_release_data+0x4cd/0x810
sk_skb_reason_drop+0xf3/0x340
skb_queue_purge_reason+0x282/0x440
rds_tcp_inc_free+0x1e/0x30
rds_recvmsg+0x354/0x1780
__sys_recvmsg+0xdf/0x180
Allocated by task 219:
msg_zerocopy_realloc+0x157/0x7b0
tcp_sendmsg_locked+0x2892/0x3ba0
Freed by task 219:
ip_recv_error+0x74a/0xb10
tcp_recvmsg+0x475/0x530
The skb consuming the late access still referenced the same uarg via
shinfo->destructor_arg copied by pskb_carve_inside_nonlinear() without
a refcount bump. This has been verified to be reliably exploitable: a
working proof-of-concept achieves full root privilege escalation from
an unprivileged local user on a default kernel configuration.
The fix follows the pattern of pskb_expand_head() which has the same
memcpy/cloned structure. For pskb_carve_inside_header(), net_zcopy_get()
is placed after skb_orphan_frags() succeeds, so the orphan error path
needs no cleanup. For pskb_carve_inside_nonlinear(), net_zcopy_get() is
placed after all failure points and just before skb_release_data(), so
no error path needs cleanup at all -- matching pskb_expand_head() more
closely and avoiding the need for a balancing net_zcopy_put(). |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_log: validate MAC header was set before dumping it
The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.
This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).
Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.
BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
Read of size 1 at addr ffff88800ea49d3f by task exploit/148
Call Trace:
kasan_report (mm/kasan/report.c:595)
dump_mac_header (net/netfilter/nf_log_syslog.c:831)
nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
nf_log_packet (net/netfilter/nf_log.c:260)
nft_log_eval (net/netfilter/nft_log.c:60)
nft_do_chain (net/netfilter/nf_tables_core.c:285)
nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
nf_hook_slow (net/netfilter/core.c:619)
nf_hook_direct_egress (net/packet/af_packet.c:257)
packet_xmit (net/packet/af_packet.c:280)
packet_sendmsg (net/packet/af_packet.c:3114)
__sys_sendto (net/socket.c:2265) |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: espintcp: do not reuse an in-progress partial send
espintcp keeps a single in-flight transmit in ctx->partial.
Before building a new sk_msg, espintcp_sendmsg() first tries to flush
that state through espintcp_push_msgs().
For blocking callers, espintcp_push_msgs() may return success even when
the previous partial send is still pending. espintcp_sendmsg() would
then reinitialize emsg->skmsg and reuse ctx->partial while the old
transfer still owns that state.
Do not rebuild the send message when ctx->partial is still in progress.
If espintcp_push_msgs() returns with emsg->len still set, fail the new
send instead of overwriting the live partial state.
This is a memory-safety fix: reusing the live partial-send state can
leave a stale offset attached to a new sk_msg and lead to an out-of-
bounds read in the send path.
tcp_sendmsg_locked() already handles waiting for send buffer memory, so
the fix here is just to preserve espintcp's one-message-at-a-time
transmit state. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: tvlv: reject oversized TVLV packets
batadv_tvlv_container_ogm_append() builds a TVLV packet section from
the tvlv.container_list. The total size of this section is computed by
batadv_tvlv_container_list_size(), which sums the sizes of all registered
containers.
The return type and accumulator in batadv_tvlv_container_list_size() were
u16. If the accumulated size exceeds U16_MAX, the value wraps around,
causing the subsequent allocation in batadv_tvlv_container_ogm_append()
to be undersized. The memcpy-style copy that follows would then write
beyond the end of the allocated buffer, corrupting kernel memory.
Fix this by widening the return type of batadv_tvlv_container_list_size()
to size_t. In batadv_tvlv_container_ogm_append(), check the computed length
against U16_MAX before proceeding, and bail out as if the allocation had
failed when the limit is exceeded. |
| In the Linux kernel, the following vulnerability has been resolved:
io_uring/poll: fix signed comparison in io_poll_get_ownership()
io_poll_get_ownership() uses a signed comparison to check whether
poll_refs has reached the threshold for the slowpath:
if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS))
atomic_read() returns int (signed). When IO_POLL_CANCEL_FLAG
(BIT(31)) is set in poll_refs, the value becomes negative in
signed arithmetic, so the >= 128 comparison always evaluates to
false and the slowpath is never taken.
Fix this by casting the atomic_read() result to unsigned int
before the comparison, so that the cancel flag is treated as a
large positive value and correctly triggers the slowpath. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: ipcomp: Free destination pages on acomp errors
Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: tp_meter: avoid use of uninit sender vars
batadv_tp_recv_ack() and batadv_tp_stop() are only valid for tp_vars in the
BATADV_TP_SENDER role. When called with a BATADV_TP_RECEIVER role, it
proceeds to read sender-only members that were never initialized, leading
to undefined behavior.
This can be triggered when a node that is currently acting as a receiver in
an ongoing tp_meter session receives a malicious ACK packet.
Guard against this by checking tp_vars->role immediately after the
lookup and bailing out if it is not BATADV_TP_SENDER, before any of
those members are accessed. |
| In the Linux kernel, the following vulnerability has been resolved:
sctp: stream: fully roll back denied add-stream state
When ADD_OUT_STREAMS is denied, SCTP only shrinks the queued chunks and
then lowers outcnt. That leaves removed stream metadata behind, so a
later re-add can reuse a stale ext and hit a null-pointer dereference in
the scheduler get path.
Fix the rollback by tearing down the removed stream state the same way
other stream resizes do. Unschedule the current scheduler state, drop
the removed stream ext state with sctp_stream_outq_migrate(), and then
reschedule the remaining streams.
This keeps scheduler-private RR/FC/PRIO lists consistent while fully
rolling back denied outgoing stream additions. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: ebtables: fix OOB read in compat_mtw_from_user
Luxiao Xu says:
The function compat_mtw_from_user() converts ebtables extensions from
32-bit user structures to kernel native structures. However, it lacks
proper validation of the user-supplied match_size/target_size.
When certain extensions are processed, the kernel-side translation
logic may perform memory accesses based on the extension's expected
size. If the user provides a size smaller than what the extension
requires, it results in an out-of-bounds read as reported by KASAN.
This fix introduces a check to ensure match_size is at least as large
as the extension's required compatsize. This covers matches, watchers,
and targets, while maintaining compatibility with standard targets.
AFAIU this is relevant for matches that need to go though
match->compat_from_user() call. Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).
The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
Based on an earlier patch from Luxiao Xu. |