streamlining.

diff --git a/Documentation/pwqr.adoc b/Documentation/pwqr.adoc

index 0c04746..240c20a 100644
--- a/Documentation/pwqr.adoc
+++ b/Documentation/pwqr.adoc
@@ -19,29 +19,29 @@ running::
         schedulable threads.
  
  waiting::
-       This is the state of threads that are currently in a `PWQR_WAIT` call
-       from userspace (see `pwqr_ctl`) but that would not overcommit if
-       released by a `PWQR_WAKE` call.
+       This is the state of threads that are currently in a `PWQR_CTL_WAIT`
+       call from userspace (see `pwqr_ctl`) but that would not overcommit if
+       released by a `PWQR_CTL_WAKE` call.
  
  quarantined::
-       This is the state of threads that are currently in a `PWQR_WAIT` call
-       from userspace (see `pwqr_ctl`) but that would overcommit if released
-       by a `PWQR_WAKE` call.
+       This is the state of threads that are currently in a `PWQR_CTL_WAIT`
+       call from userspace (see `pwqr_ctl`) but that would overcommit if
+       released by a `PWQR_CTL_WAKE` call.
  +
  This state avoids waking a thread to force userland to "park" the thread, this
-is racy, make the scheduler work for nothing useful.  Though if `PWQR_WAKE` is
-called, quarantined threads are woken but with a `EDQUOT` errno set, and only
-one by one, no matter how wakes have been asked.
+is racy, make the scheduler work for nothing useful.  Though if
+`PWQR_CTL_WAKE` is called, quarantined threads are woken but with a `EDQUOT`
+errno set, and only one by one, no matter how wakes have been asked.
  +
-This state actually has only one impact: when `PWQR_WAKE` is called for more
-than one threads, for example 4, and that userland knows that there is 5
+This state actually has only one impact: when `PWQR_CTL_WAKE` is called for
+more than one threads, for example 4, and that userland knows that there is 5
  threads in WAIT state, but that actually 3 of them are in the quarantine, only
-2 will be woken up, and the `PWQR_WAKE` call will return 2. Any subsequent
-`PWQR_WAKE` call will wake up one quarantined thread to let it be parked, but
-returning 0 each time to hide that from userland.
+2 will be woken up, and the `PWQR_CTL_WAKE` call will return 2. Any subsequent
+`PWQR_CTL_WAKE` call will wake up one quarantined thread to let it be parked,
+but returning 0 each time to hide that from userland.
  
  parked::
-       This is the state of threads currently in a `PWQR_PARK` call from
+       This is the state of threads currently in a `PWQR_CTL_PARK` call from
         userspace (see `pwqr_ctl`).
  
  
@@ -100,22 +100,14 @@ in kernel (poll solution)::
  +
  It sounds very easy, but it has one major drawback: it means the pwqfd must be
  somehow registered into the eventloop, and it's not very suitable for a
-pthread_workqueue implementation.
-
-in kernel (hack-ish solution)::
-       The kernel could voluntarily unpark/unblock a thread with another
-       errno that would signal overcommiting. Unlike the pollable proposal,
-       this doesn't require hooking in the event loop. Though it requires
-       having one such thread, which may not be the case when userland has
-       reached the peak number of threads it would ever want to use.
+pthread_workqueue implementation. In other words, if you can plug into the
+event loop (because it's a custom one, or one that provides thread regulation)
+then it's fine; if you can't (glib, libdispatch, ...) then you need a thread
+that basically just poll()s on this file descriptor, which is really wasteful.
  +
-Is this really a problem? I'm not sure. Especially since when that happens
-userland could pick a victim thread that would call `PWQR_PARK` after each
-processed job, which would allow some kind of poor man's poll.
-+
-The drawback I see in that solution is that we wake up YET ANOTHER thread at a
-moment when we're already overcommiting, which sounds counter productive.
-That's why I didn't implement that.
+NOTE: this has been implemented now, but still it looks "expensive" to hook
+for some users. So if some alternative way to be signalled could exist, it'd
+be really awesome.
  
  in userspace::
         Userspace knows how many "running" threads there are, it's easy to
@@ -123,24 +115,21 @@ in userspace::
         already accounted for. When "waiting" is zero, if "registerd - parked"
         is "High" userspace could choose to randomly try to park one thread.
  +
-I think `PWQR_PARK` could use `val` to have some "probing" mode, that would
-return `0` if it wouldn't block and `-1/EWOULDBLOCK` if it would in the non
-probing mode. Userspace could maintain some global probing_mode flag, that
-would be a tristate: NONE, SLOW, AGGRESSVE.
+Userspace can use a non-blocking read() to probe whether it's overcommitting.
  +
  It's in NONE when userspace belives it's not necessary to probe (e.g. when the
  amount of running + waiting threads isn't that large, say less than 110% of
  the concurrency or any kind of similar rule).
  +
  It's in SLOW mode else. In slow mode each thread does a probe every 32 or 64
-jobs to mitigate the cost of the syscall. If the probe returns EWOULDBLOCK
-then the thread goes to PARK mode, and the probing_mode goes to AGGRESSVE.
+jobs to mitigate the cost of the syscall. If the probe returns '1' then ask
+for down-committing and stay in SLOW mode; if it fails with EAGAIN all is fine;
+if it returns more than '1' ask for down-committing and go to AGGRESSIVE.
  +
  When AGGRESSVE threads check if they must park more often and in a more
  controlled fashion (every 32 or 64 jobs isn't nice because jobs can be very
  long), for example based on some poor man's timer (clock_gettime(MONOTONIC)
-sounds fine). As soon as a probe returns 0 or we're in the NONE conditions,
-then the probing_mode goes back to NONE/SLOW.
+sounds fine). State transition works as for SLOW.
  +
  The issue I have with this is that it sounds to add quite some code in the
  fastpath code, hence I dislike it a lot.
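
Review note on the hunk above: the probing transitions it describes can be sketched in C as a pure function. This is only an illustration under stated assumptions, not code from the pwqr tree. The names `probe_decision` and `probe_next` are hypothetical, a probe result of `0` stands for the `EAGAIN` case, AGGRESSIVE is treated exactly like SLOW (per "State transition works as for SLOW"), and leaving the mode unchanged on `EAGAIN` is an interpretation of "all is fine".

```c
/* Probing modes named in the documentation text. */
enum probing_mode { PROBE_NONE, PROBE_SLOW, PROBE_AGGRESSIVE };

/* Hypothetical result type: the next mode, and how many threads to park. */
struct probe_decision {
    enum probing_mode next_mode;
    int threads_to_park;
};

/*
 * cur:        current probing mode
 * overcommit: value obtained from the probe; 0 is used here to encode
 *             the EAGAIN case (assumption of this sketch)
 */
struct probe_decision probe_next(enum probing_mode cur, int overcommit)
{
    struct probe_decision d = { cur, 0 };

    /* In NONE mode no probe is made; on EAGAIN nothing changes. */
    if (cur == PROBE_NONE || overcommit <= 0)
        return d;

    /* Overcommitting: ask for down-committing by that many threads. */
    d.threads_to_park = overcommit;
    /* Exactly 1 keeps (or returns to) SLOW; more than 1 escalates. */
    d.next_mode = overcommit > 1 ? PROBE_AGGRESSIVE : PROBE_SLOW;
    return d;
}
```

Keeping the decision free of side effects means the fast path only pays for one comparison when the probe reports nothing, which speaks to the fastpath concern raised in the text.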
@@ -172,7 +161,21 @@ with a concurrency corresponding to the number of online CPUs at the time of
  the call, as would be returned by `sysconf(_SC_NPROCESSORS_ONLN)`.
  
  `flags`::
-       a mask of flags, currently only O_CLOEXEC.
+       a mask of flags among `PWQR_FL_CLOEXEC`, and `PWQR_FL_NONBLOCK`.
+
+Available operations on the pwqr file descriptor are:
+
+`poll`, `epoll` and friends::
+       the PWQR file descriptor can be watched for POLLIN events (not POLLOUT
+       ones as it can not be written to).
+
+`read`::
+       The file returned can be read from. The read blocks (or fails with
+       `EAGAIN` if in non-blocking mode) until the regulator believes the
+       pool is overcommitting. The buffer passed to read should be able to
+       hold an integer. When `read(2)` is successful, it stores in the buffer
+       the number of overcommitting threads (that is, the number of threads
+       to park so that the pool isn't overcommitting anymore).
  
  RETURN VALUE
  ~~~~~~~~~~~~
@@ -205,27 +208,26 @@ by the file descriptor `pwqrfd`.
  
  Valid values for the `op` argument are:
  
-`PWQR_GET_CONC`::
+`PWQR_CTL_GET_CONC`::
         Requests the current concurrency level for this regulator.
  
-`PWQR_SET_CONC`::
+`PWQR_CTL_SET_CONC`::
         Modifies the current concurrency level for this regulator. The new
         value is passed as the `val` argument. The requests returns the old
         concurrency level on success.
  +
-       A zero or negative value for `val` means 'automatic' and is recomputed
-       as the current number of online CPUs as
-       `sysconf(_SC_NPROCESSORS_ONLN)` would return.
+A zero or negative value for `val` means 'automatic' and is recomputed as the
+current number of online CPUs as `sysconf(_SC_NPROCESSORS_ONLN)` would return.
  
  
-`PWQR_REGISTER`::
+`PWQR_CTL_REGISTER`::
         Registers the calling thread to be taken into account by the pool
         regulator. If the thread is already registered into another regulator,
         then it's automatically unregistered from it.
  
-`PWQR_UNREGISTER`::
+`PWQR_CTL_UNREGISTER`::
         Deregisters the calling thread from the pool regulator.
  
-`PWQR_WAKE`::
+`PWQR_CTL_WAKE`::
         Tries to wake `val` threads from the pool. This is done according to
         the current concurrency level not to overcommit. On success, a hint of
         the number of woken threads is returned, it can be 0.
@@ -242,28 +244,31 @@ thread to be unblocked, we actually say we woke none, but still unblock one
  counter of waiting threads to decrease, but we know the thread won't be usable
  so we return 0.
  
-`PWQR_WAKE_OC`::
+`PWQR_CTL_WAKE_OC`::
         Tries to wake `val` threads from the pool. This is done bypassing the
         current concurrency level (`OC` stands for `OVERCOMMIT`). On success,
         the number of woken threads is returned, it can be 0, but it's the
         real count that has been (or will soon be) woken up. If it's less than
         required, it's because there aren't enough parked threads.
  
-`PWQR_WAIT`::
-       Puts the thread to wait for a future `PWQR_WAKE` command. If this
+`PWQR_CTL_WAIT`::
+       Puts the thread to wait for a future `PWQR_CTL_WAKE` command. If this
         thread must be parked to maintain concurrency below the target, then
         the call blocks with no further ado.
  +
  If the concurrency level is below the target, then the kernel checks if the
  address `addr` still contains the value `val` (in the fashion of `futex(2)`).
  If it doesn't then the call doesn't block. Else the calling thread is blocked
-until a `PWQR_WAKE` command is received.
+until a `PWQR_CTL_WAKE` command is received.
++
+`addr` must of course be a pointer to an aligned integer which stores the
+reference ticket in userland.
  
  
-`PWQR_PARK`::
+`PWQR_CTL_PARK`::
         Puts the thread in park mode. Those are spare threads to avoid
         cloning/exiting threads when the pool is regulated. Those threads are
         released by the regulator only, and can only be woken from userland
-       with the `PWQR_WAKE_OC` command, and once all waiting threads have
+       with the `PWQR_CTL_WAKE_OC` command, and once all waiting threads have
         been woken.
  +
  The call blocks until an overcommiting wake requires the thread, or the kernel
@@ -292,24 +297,24 @@ with a real syscall.
  [EINVAL]::
         TODO
  
-Errors specific to `PWQR_REGISTER`:
+Errors specific to `PWQR_CTL_REGISTER`:
  
  [ENOMEM]::
         There was insufficient memory to perform the operation.
  
-Errors specific to `PWQR_WAIT`:
+Errors specific to `PWQR_CTL_WAIT`:
  
  [EWOULDBLOCK]::
         When the kernel evaluated if `addr` still contained `val` it didn't.
         This works like `futex(2)`.
  
-Errors specific to `PWQR_WAIT` and `PWQR_PARK`:
+Errors specific to `PWQR_CTL_WAIT` and `PWQR_CTL_PARK`:
  
  [EINTR]::
         The call was interrupted by a signal (note that sometimes the kernel
         masks this fact when it has more important "errors" to report like
         `EDQUOT`).
  [EDQUOT]::
-       The thread has been woken by a `PWQR_WAKE` or `PWQR_WAKE_OC` call, but
-       is overcommiting.
+       The thread has been woken by a `PWQR_CTL_WAKE` or `PWQR_CTL_WAKE_OC`
+       call, but is overcommiting.