author | Raghavendra G <rgowdapp@redhat.com> | 2017-08-29 15:07:53 +0530 |
---|---|---|
committer | jiffin tony Thottan <jthottan@redhat.com> | 2017-09-11 04:59:34 +0000 |
commit | 4867647db935439abdd8fb19d39416ce1d83b081 (patch) | |
tree | 3d762d70b23434f399cb5621bb50407ad3abee4b /xlators/cluster/afr/src/afr-self-heal.h | |
parent | 143714d96eff50501b1a5a3debf794cae9f91005 (diff) |
event/epoll: don't call handler for events received after a pollerr
We register the socket with EPOLLONESHOT, which means it has to be
explicitly added back through epoll_ctl to receive more
events. Normally we do this once the handler completes processing of
the current event. But event_select_on_epoll is one asynchronous codepath
where the socket can be added back for polling while an event on the same
socket is being processed. event_select_on_epoll checks whether
an event is being processed by looking at slot->in_handler. But this
check is not sufficient to prevent parallel events, as
slot->in_handler is not incremented atomically with respect to
reception of the event. This means the following hypothetical sequence of
events can happen:
1. epoll_wait returns with a POLLERR - say POLLERR1 - on a socket
(sock1) associated with slot s1. socket_event_handle_pollerr is yet
to be invoked.
2. An event_select_on called from __socket_ioq_churn, which was called
in the request/reply/msg submission codepath (as opposed to
__socket_ioq_churn called as part of POLLOUT handling - we cannot
receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
3. Since sock1 was added back for polling in step 2 and our polling is
level-triggered, another thread picks up another POLLERR event - say
POLLERR2. socket_event_handler is invoked as part of processing
POLLERR2 and it completes execution, setting priv->sock to -1.
4. event_unregister_epoll, called as part of __socket_reset due to
POLLERR1, receives fd as -1, resulting in an assert failure.
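The sequence above rests on two epoll properties: a one-shot fd stays silent until it is re-armed, and a level-triggered re-arm reports a still-pending condition again immediately. A minimal sketch of that behaviour follows. It is not taken from the glusterfs sources: the helper names (poll_once, rearm, oneshot_demo) are hypothetical, and a readable pipe stands in for the socket's pending condition.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Poll once without blocking; returns the number of ready events (0 or 1). */
static int poll_once(int epfd)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, 0);
}

/* Re-arm a one-shot fd, as a handler or an asynchronous select-on path would. */
static int rearm(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/*
 * Records three poll results in out[]:
 *   out[0] first poll after the condition appears
 *   out[1] second poll, before any re-arm
 *   out[2] poll after re-arming, with the condition still pending
 * Returns 0 on success, -1 on setup failure.
 */
int oneshot_demo(int out[3])
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fds[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0)
        goto close_all;

    /* Make the read end readable and never drain it: a persistent condition. */
    if (write(fds[1], "x", 1) != 1)
        goto close_all;

    out[0] = poll_once(epfd); /* event delivered, fd is now disarmed */
    out[1] = poll_once(epfd); /* nothing: EPOLLONESHOT suppresses it */

    if (rearm(epfd, fds[0]) != 0)
        goto close_all;

    out[2] = poll_once(epfd); /* same condition is reported a second time */
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

On Linux the three results are 1, 0 and 1. The last value corresponds to step 3: once the fd is re-armed while the first event is still unhandled, a second thread calling epoll_wait can receive the same condition.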
Also, since the first pollerr event has already done rpc_transport_unref,
subsequent parallel events (not just pollerr, but other events as well)
could be acting on a freed transport.
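One way to read the subject line ("don't call handler for events received after a pollerr") is as a per-slot flag checked under the slot lock before dispatch. The sketch below illustrates that idea only; it is not the actual patch. The struct layout, the handled_error field, and the slot_* function names are assumptions. Only in_handler is named in the text above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/epoll.h>

/* Hypothetical per-socket bookkeeping; not the real glusterfs slot. */
struct slot {
    pthread_mutex_t lock;
    int in_handler;     /* handlers currently running for this slot */
    bool handled_error; /* assumption: set once an error event is dispatched */
};

void slot_init(struct slot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->in_handler = 0;
    s->handled_error = false;
}

/*
 * Decide under the lock whether an event may reach the handler.
 * Returns true if the caller should run the handler, false if the
 * event arrived after an error was already dispatched and must be dropped.
 */
bool slot_begin_event(struct slot *s, unsigned int events)
{
    bool dispatch;

    pthread_mutex_lock(&s->lock);
    if (s->handled_error) {
        dispatch = false;
    } else {
        if (events & (EPOLLERR | EPOLLHUP))
            s->handled_error = true;
        s->in_handler++;
        dispatch = true;
    }
    pthread_mutex_unlock(&s->lock);

    return dispatch;
}

/* Called after the handler returns, only when slot_begin_event returned true. */
void slot_end_event(struct slot *s)
{
    pthread_mutex_lock(&s->lock);
    s->in_handler--;
    pthread_mutex_unlock(&s->lock);
}
```

Because the flag is tested and set in the same critical section as the in_handler increment, a second POLLERR (or any other event) picked up by another thread after the first error is dispatched would be dropped rather than run against a transport that may already have been unreferenced.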
>Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
>BUG: 1486134
>Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
>Reviewed-on: https://review.gluster.org/18129
>Smoke: Gluster Build System <jenkins@build.gluster.org>
>CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
>Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
(cherry picked from commit b1b49997574eeb7c6a42e6e8257c81ac8d2d7578)
Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
BUG: 1489296
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/18223
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Diffstat (limited to 'xlators/cluster/afr/src/afr-self-heal.h')
0 files changed, 0 insertions, 0 deletions