
EPOLL(4)                   Linux Programmer's Manual                  EPOLL(4)



NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       epoll is a variant of poll(2) that can be used either as an Edge
       Triggered or a Level Triggered interface and scales well to large
       numbers of watched fds. Three system calls are provided to set up and
       control an epoll set: epoll_create(2), epoll_ctl(2), epoll_wait(2).

       An epoll set is connected to a file descriptor created by
       epoll_create(2). Interest in certain file descriptors is then
       registered via epoll_ctl(2). Finally, the actual wait is started by
       epoll_wait(2).
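
       The three calls can be sketched end to end as follows. This is a
       minimal illustration, not part of the original page: wait_readable()
       and basic_demo() are hypothetical names, and a pipe stands in for
       whatever descriptor an application would watch.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the epoll API): waits up to timeout_ms
 * milliseconds for fd to become readable. Returns the number of ready
 * descriptors reported by epoll_wait(2) (0 or 1 here), or -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 }, out;
    int nfds;
    int epfd = epoll_create(1);     /* the size argument is only a hint */

    if (epfd < 0)
        return -1;

    ev.events = EPOLLIN;            /* Level Triggered unless EPOLLET is set */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    nfds = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return nfds;
}

/* Usage sketch: a pipe is not readable before a write and readable after. */
int basic_demo(void)
{
    int p[2], before, after;

    if (pipe(p) < 0)
        return -1;
    before = wait_readable(p[0], 0);
    if (write(p[1], "x", 1) != 1)
        before = -1;
    after = wait_readable(p[0], 0);
    close(p[0]);
    close(p[1]);
    return before == 0 && after == 1;
}
```

       A real program would normally create the epoll descriptor once and
       reuse it for many epoll_wait(2) calls rather than per wait as here.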


NOTES
       The epoll event distribution interface is able to behave both as Edge
       Triggered (ET) and Level Triggered (LT). The difference between the ET
       and LT event distribution mechanisms can be described as follows.
       Suppose that this scenario happens:

       1      The file descriptor that represents the read side of a pipe
              (RFD) is added inside the epoll device.

       2      The pipe writer writes 2 kB of data on the write side of the
              pipe.

       3      A call to epoll_wait(2) is done that will return RFD as a ready
              file descriptor.

       4      The pipe reader reads 1 kB of data from RFD.

       5      A call to epoll_wait(2) is done.


       If the RFD file descriptor has been added to the epoll interface using
       the EPOLLET flag, the call to epoll_wait(2) done in step 5 will
       probably hang despite the data still present in the file input
       buffer, while the remote peer might be expecting a response based on
       the data it already sent. The reason for this is that Edge Triggered
       event distribution delivers events only when events happen on the
       monitored file. So, in step 5 the caller might end up waiting for some
       data that is already present inside the input buffer. In the above
       example, an event on RFD will be generated because of the write done
       in step 2, and the event is consumed in step 3. Since the read
       operation done in step 4 does not consume the whole buffer data, the
       call to epoll_wait(2) done in step 5 might block indefinitely. The
       epoll interface, when used with the EPOLLET flag (Edge Triggered),
       should use non-blocking file descriptors to avoid having a blocking
       read or write starve the task that is handling multiple file
       descriptors. The suggested way to use epoll as an Edge Triggered
       (EPOLLET) interface is below, and possible pitfalls to avoid follow.

              i      with non-blocking file descriptors

              ii     by going to wait for an event only after read(2) or
                     write(2) return EAGAIN
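
       The five-step scenario can be replayed on a real pipe to compare the
       two modes. The sketch below is an illustration only: step5_result()
       is a hypothetical name, and step 5 uses a zero timeout (an assumption
       made here) so that the Edge Triggered case returns 0 instead of
       hanging.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Replays steps 1-5 on a fresh pipe. flags is 0 for Level Triggered or
 * EPOLLET for Edge Triggered. Returns the result of the epoll_wait(2)
 * call of step 5, or -1 if any earlier step fails.
 */
int step5_result(unsigned int flags)
{
    char buf[2048];
    struct epoll_event ev = { 0 }, out;
    int p[2], epfd, n = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | flags;                    /* step 1: add RFD */
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto out_all;

    memset(buf, 'x', sizeof(buf));
    if (write(p[1], buf, 2048) != 2048)             /* step 2: write 2 kB */
        goto out_all;
    if (epoll_wait(epfd, &out, 1, 0) != 1)          /* step 3: RFD is ready */
        goto out_all;
    if (read(p[0], buf, 1024) != 1024)              /* step 4: read 1 kB */
        goto out_all;
    n = epoll_wait(epfd, &out, 1, 0);               /* step 5 */

out_all:
    close(epfd);
out_pipe:
    close(p[0]);
    close(p[1]);
    return n;
}
```

       Calling step5_result(0) reports RFD again because 1 kB is still
       buffered, while step5_result(EPOLLET) reports nothing: no new event
       has occurred since step 3, which is why an Edge Triggered reader has
       to keep reading until EAGAIN before waiting again.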

       In contrast, when used as a Level Triggered interface, epoll is
       simply a faster poll(2), and can be used wherever the latter is used
       since it shares the same semantics. Since even with Edge Triggered
       epoll multiple events can be generated upon receipt of multiple
       chunks of data, the caller has the option to specify the EPOLLONESHOT
       flag, to tell epoll to disable the associated file descriptor after
       the receipt of an event with epoll_wait(2). When the EPOLLONESHOT
       flag is specified, it is the caller's responsibility to rearm the
       file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
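
       The disable-then-rearm cycle of EPOLLONESHOT might look like the
       following sketch. rearm() and oneshot_demo() are hypothetical names
       introduced for illustration, and a pipe holding one unread byte
       plays the role of the monitored descriptor.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Rearms fd for one more EPOLLIN event after a one-shot delivery. */
int rearm(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/*
 * Usage sketch. Returns 1 if the descriptor is reported once, stays
 * silent while disabled, and is reported again after rearm().
 */
int oneshot_demo(void)
{
    struct epoll_event ev = { 0 }, out;
    int p[2], epfd, first = -1, second = -1, third = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    if (epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        first = epoll_wait(epfd, &out, 1, 0);   /* event delivered */
        second = epoll_wait(epfd, &out, 1, 0);  /* disabled, data unread */
        if (rearm(epfd, p[0]) == 0)
            third = epoll_wait(epfd, &out, 1, 0);   /* reported again */
    }
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return first == 1 && second == 0 && third == 1;
}
```

       Note that the byte is never read in this sketch: the second wait is
       silent only because the descriptor is disabled, not because the data
       went away.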


EXAMPLE FOR SUGGESTED USAGE
       While the usage of epoll when employed as a Level Triggered interface
       has the same semantics as poll(2), Edge Triggered usage requires more
       clarification to avoid stalls in the application event loop. In this
       example, listener is a non-blocking socket on which listen(2) has
       been called. The function do_use_fd() uses the new ready file
       descriptor until EAGAIN is returned by either read(2) or write(2).
       An event driven state machine application should, after having
       received EAGAIN, record its current state so that at the next call to
       do_use_fd() it will continue to read(2) or write(2) from where it
       stopped before.

       struct epoll_event ev, *events;

       for (;;) {
           /* Block until at least one registered descriptor is ready. */
           nfds = epoll_wait(kdpfd, events, maxevents, -1);

           for (n = 0; n < nfds; ++n) {
               if (events[n].data.fd == listener) {
                   /* New connection: accept it, watch it Edge Triggered. */
                   addrlen = sizeof(local);
                   client = accept(listener, (struct sockaddr *) &local,
                                   &addrlen);
                   if (client < 0) {
                       perror("accept");
                       continue;
                   }
                   setnonblocking(client);
                   ev.events = EPOLLIN | EPOLLET;
                   ev.data.fd = client;
                   if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                       fprintf(stderr, "epoll set insertion error: fd=%d\n",
                               client);
                       return -1;
                   }
               }
               else
                   do_use_fd(events[n].data.fd);
           }
       }
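
       The example calls setnonblocking() and do_use_fd() without defining
       them. One possible shape for both is sketched below. The bodies, the
       return types and the drain_demo() driver are assumptions made for
       illustration, and this do_use_fd() only reads and discards data,
       whereas a real one would hand it to the application.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed body for setnonblocking(): sets O_NONBLOCK. Returns 0 or -1. */
int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Assumed body for do_use_fd(): reads and discards data until read(2)
 * reports EAGAIN. Returns the number of bytes consumed, or -1 on end of
 * file or any other error.
 */
long do_use_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t r = read(fd, buf, sizeof(buf));

        if (r > 0) {
            total += r;
            continue;
        }
        if (r < 0 && errno == EINTR)
            continue;               /* interrupted: retry the read */
        if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;           /* drained: safe to wait again */
        return -1;                  /* EOF (r == 0) or a real error */
    }
}

/* Usage sketch: 5000 bytes written to a pipe are drained in one call. */
long drain_demo(void)
{
    char data[5000] = { 0 };
    int p[2];
    long n = -1;

    if (pipe(p) < 0)
        return -1;
    if (setnonblocking(p[0]) == 0 &&
        write(p[1], data, sizeof(data)) == (ssize_t) sizeof(data))
        n = do_use_fd(p[0]);
    close(p[0]);
    close(p[1]);
    return n;
}
```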

       When used as an Edge Triggered interface, for performance reasons, it
       is possible to add the file descriptor inside the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows
       you to avoid continuously switching between EPOLLIN and EPOLLOUT by
       calling epoll_ctl(2) with EPOLL_CTL_MOD.
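
       A single registration for both directions could be sketched like
       this. add_both() and both_demo() are hypothetical names, and a
       socketpair(2) is used here only as a convenient bidirectional
       descriptor for the demonstration.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers fd once for input and output, Edge Triggered. */
int add_both(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Usage sketch. Returns 1 if the first event reports writability only
 * and, once the peer has sent a byte, a second event reports
 * readability, with no EPOLL_CTL_MOD call in between.
 */
int both_demo(void)
{
    struct epoll_event out;
    int sv[2], epfd, ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd >= 0 && add_both(epfd, sv[0]) == 0) {
        /* A fresh socket is writable and has nothing to read. */
        if (epoll_wait(epfd, &out, 1, 0) == 1 &&
            (out.events & EPOLLOUT) && !(out.events & EPOLLIN) &&
            write(sv[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 0) == 1 &&
            (out.events & EPOLLIN))
            ok = 1;
    }
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```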


QUESTIONS AND ANSWERS (from linux-kernel)
              Q1     What happens if you add the same fd to an epoll set
                     twice?

              A1     You will probably get EEXIST.  However,  it  is  possible
                     that  two  threads  may  add the same fd twice. This is a
                     harmless condition.

              Q2     Can two epoll sets wait for the same fd? If so, are
                     events reported to both epoll set fds?

              A2     Yes. However, it is not recommended. Events would be
                     reported to both.

              Q3     Is the epoll fd itself poll/epoll/selectable?

              A3     Yes.

              Q4     What happens if the epoll fd is put into its own fd set?

              A4     It  will  fail.  However,  you can add an epoll fd inside
                     another epoll fd set.

              Q5     Can I send the epoll fd over a unix-socket to another
                     process?

              A5     No.

              Q6     Will the close of an fd cause it to be removed from all
                     epoll sets automatically?

              A6     Yes.

              Q7     If more than one event comes in between epoll_wait(2)
                     calls, are they combined or reported separately?

              A7     They will be combined.

              Q8     Does  an  operation on an fd affect the already collected
                     but not yet reported events?

              A8     You can do two operations on an existing fd. Remove would
                     be  meaningless for this case. Modify will re-read avail-
                     able I/O.

              Q9     Do I need to continuously read/write an fd until EAGAIN
                     when using the EPOLLET flag (Edge Triggered behaviour)?

              A9     No, you don't. Receiving an event from epoll_wait(2)
                     should suggest to you that such file descriptor is ready
                     for the requested I/O operation. You simply have to
                     consider it ready until you receive the next EAGAIN.
                     When and how you use such file descriptor is entirely
                     up to you. Also, the condition that the read/write I/O
                     space is exhausted can be detected by checking the
                     amount of data read from or written to the target file
                     descriptor. For example, if you call read(2) asking to
                     read a certain amount of data and read(2) returns a
                     lower number of bytes, you can be sure to have
                     exhausted the read I/O space for such file descriptor.
                     The same is valid when writing using the write(2)
                     function.
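
              Answers A1, A4 and A7 can be checked with a small experiment.
              The sketch below is illustrative only: the function names are
              hypothetical, pipes stand in for real descriptors, and the
              EINVAL result for A4 is the error documented in
              epoll_ctl(2) rather than something stated above.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Adds fd to epfd for EPOLLIN. Returns 0, or the errno of the failure. */
int add_errno(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return errno;
    return 0;
}

/* A1: errno produced by adding the same fd a second time. */
int a1_second_add(void)
{
    int p[2], epfd, err = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd >= 0 && add_errno(epfd, p[0]) == 0)
        err = add_errno(epfd, p[0]);
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return err;
}

/* A4: errno produced by adding an epoll fd to its own set. */
int a4_self_add(void)
{
    int err, epfd = epoll_create(1);

    if (epfd < 0)
        return -1;
    err = add_errno(epfd, epfd);
    close(epfd);
    return err;
}

/* A7: number of events reported after two writes and a single wait. */
int a7_event_count(void)
{
    struct epoll_event out[8];
    int p[2], epfd, n = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd >= 0 && add_errno(epfd, p[0]) == 0 &&
        write(p[1], "a", 1) == 1 && write(p[1], "b", 1) == 1)
        n = epoll_wait(epfd, out, 8, 0);
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}
```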


POSSIBLE PITFALLS AND WAYS TO AVOID THEM
              o Starvation ( Edge Triggered )

              If there is a large amount of I/O space, it is possible that  by
              trying  to drain it the other files will not get processed caus-
              ing starvation. This is not specific to epoll.


              The solution is to maintain a ready list and mark the file
              descriptor as ready in its associated data structure, thereby
              allowing the application to remember which files need to be
              processed but still round robin amongst all the ready files.
              This also supports ignoring subsequent events you receive for
              fds that are already ready.
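
              A ready list with round robin service might be sketched as
              follows. struct conn, the CHUNK budget and the function names
              are assumptions made for illustration, and the sketch covers
              only the read side, with pipes standing in for the monitored
              files.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 4096      /* assumed per-turn read budget */

/* Hypothetical per-descriptor state kept by the application. */
struct conn {
    int  fd;
    int  ready;         /* set when epoll_wait(2) reports the fd */
    long total;         /* bytes consumed so far */
};

/*
 * One round robin pass: every ready descriptor gets a single bounded
 * read. A descriptor leaves the ready list only when read(2) stops
 * returning data (EAGAIN, end of file or error). Returns how many
 * descriptors are still marked ready.
 */
int round_robin_pass(struct conn *conns, int nconns)
{
    char buf[CHUNK];
    int i, still_ready = 0;

    for (i = 0; i < nconns; i++) {
        ssize_t r;

        if (!conns[i].ready)
            continue;
        r = read(conns[i].fd, buf, sizeof(buf));
        if (r > 0)
            conns[i].total += r;
        else if (!(r < 0 && errno == EINTR))
            conns[i].ready = 0;
        still_ready += conns[i].ready;
    }
    return still_ready;
}

/*
 * Usage sketch: a busy pipe (10000 bytes) and a quiet one (10 bytes).
 * Returns the bytes served from the quiet pipe after the first pass,
 * provided the busy pipe was given exactly one chunk, else -1.
 */
long fairness_demo(void)
{
    char data[10000] = { 0 };
    struct conn c[2] = { { -1, 1, 0 }, { -1, 1, 0 } };
    int busy[2], quiet[2];
    long served = -1;

    if (pipe(busy) < 0)
        return -1;
    if (pipe(quiet) < 0) {
        close(busy[0]);
        close(busy[1]);
        return -1;
    }
    c[0].fd = busy[0];
    c[1].fd = quiet[0];
    fcntl(busy[0], F_SETFL, O_NONBLOCK);
    fcntl(quiet[0], F_SETFL, O_NONBLOCK);
    if (write(busy[1], data, 10000) == 10000 &&
        write(quiet[1], data, 10) == 10) {
        round_robin_pass(c, 2);
        if (c[0].total == CHUNK)
            served = c[1].total;
    }
    close(busy[0]);
    close(busy[1]);
    close(quiet[0]);
    close(quiet[1]);
    return served;
}
```

              The quiet descriptor is served in the first pass even though
              the busy one still holds data, which is the point of the
              bounded read.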



              o If using an event cache...

              If you use an event cache or store all the fds returned from
              epoll_wait(2), then make sure to provide a way to mark its
              closure dynamically (i.e., caused by a previous event's
              processing). Suppose you receive 100 events from
              epoll_wait(2), and in event #47 a condition causes event #13
              to be closed. If you remove the structure and close() the fd
              for event #13, then your event cache might still say there are
              events waiting for that fd, causing confusion.


              One solution for this is to call, during the processing of event
              47, epoll_ctl(EPOLL_CTL_DEL) to delete fd 13 and close(), then
              mark its associated data structure as removed and link it to a
              cleanup list. If you find another event for fd 13 in your batch
              processing, you will discover the fd had been previously removed
              and there will be no confusion.
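
              The removed flag and cleanup list could take the following
              form. Everything here is a sketch under stated assumptions:
              struct conn, its peer field, and the function names are
              hypothetical, and "closing the peer" stands in for whatever
              condition makes one event invalidate another in the same
              batch.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor structure stored in ev.data.ptr. */
struct conn {
    int fd;
    int removed;                /* set once the fd has been closed */
    struct conn *peer;          /* demo only: conn this one closes */
    struct conn *next_cleanup;  /* link in the cleanup list */
};

static struct conn *cleanup_list;

/* Deletes c from the epoll set, closes it and defers freeing it. */
static void conn_remove(int epfd, struct conn *c)
{
    struct epoll_event dummy = { 0 };

    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &dummy);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = cleanup_list;
    cleanup_list = c;
}

/*
 * Handles one batch. Events whose structure is marked removed are
 * skipped; structures are freed only after the whole batch is done.
 * Returns the number of skipped (stale) events.
 */
int process_batch(int epfd, struct epoll_event *events, int n)
{
    int i, skipped = 0;

    for (i = 0; i < n; i++) {
        struct conn *c = events[i].data.ptr;

        if (c->removed) {
            skipped++;
            continue;
        }
        if (c->peer)
            conn_remove(epfd, c->peer);
    }
    while (cleanup_list) {
        struct conn *c = cleanup_list;

        cleanup_list = c->next_cleanup;
        free(c);
    }
    return skipped;
}

/* Usage sketch: two readable pipes whose handlers close each other. */
int cache_demo(void)
{
    struct epoll_event ev = { 0 }, events[2];
    struct conn *c[2], *survivor;
    int wr[2], epfd, i, skipped = -1;

    epfd = epoll_create(2);
    if (epfd < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int p[2];

        c[i] = calloc(1, sizeof(*c[i]));
        if (c[i] == NULL || pipe(p) < 0)
            exit(1);            /* demo only: no recovery */
        c[i]->fd = p[0];
        wr[i] = p[1];
        ev.events = EPOLLIN;
        ev.data.ptr = c[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0 ||
            write(p[1], "x", 1) != 1)
            exit(1);
    }
    c[0]->peer = c[1];
    c[1]->peer = c[0];

    if (epoll_wait(epfd, events, 2, 0) == 2) {
        survivor = events[0].data.ptr;  /* handled first, so it survives */
        skipped = process_batch(epfd, events, 2);
        close(survivor->fd);
        free(survivor);
    }
    close(wr[0]);
    close(wr[1]);
    close(epfd);
    return skipped;
}
```

              Whichever event is handled first closes the other descriptor,
              so the second event in the batch is recognised as stale and
              skipped instead of touching a closed fd.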



CONFORMING TO
       epoll(4) is a new API introduced in Linux kernel 2.5.44. Its interface
       should be finalized in Linux kernel 2.5.66.

SEE ALSO
       epoll_create(2), epoll_ctl(2), epoll_wait(2)



Linux                             2002-10-23                          EPOLL(4)
