X-Git-Url: http://git.madism.org/?p=~madcoder%2Fpwqr.git;a=blobdiff_plain;f=Documentation%2Fpwqr.adoc;h=240c20a10c5a809f3b65a891ccb4f1b28e9dd7d3;hp=067b2aec896d2a75cee235562ea9776508b01306;hb=a5f7e5aaf5bb2168aca37066eb6b49d126689aa6;hpb=b29ccbcc4ab80bc7e7b2131349459cc9e96d49ef;ds=sidebyside

diff --git a/Documentation/pwqr.adoc b/Documentation/pwqr.adoc
index 067b2ae..240c20a 100644
--- a/Documentation/pwqr.adoc
+++ b/Documentation/pwqr.adoc
@@ -19,28 +19,29 @@ running::
 	schedulable threads.
 
 waiting::
-	This is the state of threads that are currently in a `PWQR_WAIT` call
-	from userspace (see `pwqr_ctl`) but that would not overcommit if
-	released by a `PWQR_WAKE` call.
+	This is the state of threads that are currently in a `PWQR_CTL_WAIT`
+	call from userspace (see `pwqr_ctl`) but that would not overcommit if
+	released by a `PWQR_CTL_WAKE` call.
 
 quarantined::
-	This is the state of threads that are currently in a `PWQR_WAIT` call
-	from userspace (see `pwqr_ctl`) but that would overcommit if released
-	by a `PWQR_WAKE` call.
+	This is the state of threads that are currently in a `PWQR_CTL_WAIT`
+	call from userspace (see `pwqr_ctl`) but that would overcommit if
+	released by a `PWQR_CTL_WAKE` call.
 +
 This state avoids waking a thread to force userland to "park" the thread, this
-is racy, make the scheduler work for nothing useful.  Though if `PWQR_WAKE` is
-called, quarantined threads are woken but with a `EDQUOT` errno set.
+is racy, and makes the scheduler work for nothing useful.  However, if
+`PWQR_CTL_WAKE` is called, quarantined threads are woken but with an `EDQUOT`
+errno set, and only one by one, no matter how many wakes were requested.
 +
-This state actually has only one impact: when `PWQR_WAKE` is called for more
-than one threads, for example 4, and that userland knows that there is 5
+This state actually has only one impact: when `PWQR_CTL_WAKE` is called for
+more than one thread, for example 4, and that userland knows that there are 5
 threads in WAIT state, but that actually 3 of them are in the quarantine, only
-2 will be woken up, and the `PWQR_WAKE` call will return 2. Any subsequent
-`PWQR_WAKE` call will wake up one quarantined thread to let it be parked, but
-returning 0 each time to hide that from userland.
+2 will be woken up, and the `PWQR_CTL_WAKE` call will return 2. Any subsequent
+`PWQR_CTL_WAKE` call will wake up one quarantined thread to let it be parked,
+but will return 0 each time to hide that from userland.
 
 parked::
-	This is the state of threads currently in a `PWQR_PARK` call from
+	This is the state of threads currently in a `PWQR_CTL_PARK` call from
 	userspace (see `pwqr_ctl`).
 
 
@@ -99,22 +100,14 @@ in kernel (poll solution)::
 +
 It sounds very easy, but it has one major drawback: it meaks the pwqfd must be
 somehow registered into the eventloop, and it's not very suitable for a
-pthread_workqueue implementation.
-
-in kernel (hack-ish solution)::
-	The kernel could voluntarily unpark/unblock a thread with another
-	errno that would signal overcommiting. Unlike the pollable proposal,
-	this doesn't require hooking in the event loop. Though it requires
-	having one such thread, which may not be the case when userland has
-	reached the peak number of threads it would ever want to use.
+pthread_workqueue implementation. In other words, if you can plug into the
+event loop because it's a custom one or one that provides thread regulation,
+then it's fine; if you can't (glib, libdispatch, ...) then you need a thread
+that will basically just poll() on this file descriptor, which is wasteful.
 +
-Is this really a problem? I'm not sure. Especially since when that happens
-userland could pick a victim thread that would call `PWQR_PARK` after each
-processed job, which would allow some kind of poor man's poll.
-+
-The drawback I see in that solution is that we wake up YET ANOTHER thread at a
-moment when we're already overcommiting, which sounds counter productive.
-That's why I didn't implement that.
+NOTE: this is now implemented, but it still looks "expensive" to hook for
+some users. So if some alternative way to be signalled could exist, it would
+be very welcome.
 
 in userspace::
 	Userspace knows how many "running" threads there are, it's easy to
@@ -122,24 +115,21 @@ in userspace::
 	already accounted for. When "waiting" is zero, if "registerd - parked"
 	is "High" userspace could choose to randomly try to park one thread.
 +
-I think `PWQR_PARK` could use `val` to have some "probing" mode, that would
-return `0` if it wouldn't block and `-1/EWOULDBLOCK` if it would in the non
-probing mode. Userspace could maintain some global probing_mode flag, that
-would be a tristate: NONE, SLOW, AGGRESSVE.
+Userspace can use a non-blocking read() to probe whether it is overcommitting.
 +
 It's in NONE when userspace belives it's not necessary to probe (e.g. when the
 amount of running + waiting threads isn't that large, say less than 110% of
 the concurrency or any kind of similar rule).
 +
 It's in SLOW mode else. In slow mode each thread does a probe every 32 or 64
-jobs to mitigate the cost of the syscall. If the probe returns EWOULDBLOCK
-then the thread goes to PARK mode, and the probing_mode goes to AGGRESSVE.
+jobs to mitigate the cost of the syscall. If the probe returns '1', ask for
+down-committing and stay in SLOW mode; if it fails with `EAGAIN`, all is fine;
+if it returns more than '1', ask for down-committing and go to AGGRESSIVE.
 +
 When AGGRESSVE threads check if they must park more often and in a more
 controlled fashion (every 32 or 64 jobs isn't nice because jobs can be very
 long), for example based on some poor man's timer (clock_gettime(MONOTONIC)
-sounds fine). As soon as a probe returns 0 or we're in the NONE conditions,
-then the probing_mode goes back to NONE/SLOW.
+sounds fine). State transitions work as in SLOW mode.
 +
 The issue I have with this is that it sounds to add quite some code in the
 fastpath code, hence I dislike it a lot.
@@ -171,7 +161,21 @@ with a concurrency corresponding to the number of online CPUs at the time of
 the call, as would be returned by `sysconf(_SC_NPROCESSORS_ONLN)`.
 
 `flags`::
-	a mask of flags, currently only O_CLOEXEC.
+	a mask of flags among `PWQR_FL_CLOEXEC` and `PWQR_FL_NONBLOCK`.
+
+Available operations on the pwqr file descriptor are:
+
+`poll`, `epoll` and friends::
+	The pwqr file descriptor can be watched for POLLIN events (not POLLOUT
+	ones, as it cannot be written to).
+
+`read`::
+	The returned file descriptor can be read from. The read blocks (or
+	fails with `EAGAIN` in non-blocking mode) until the regulator believes
+	the pool is overcommitting. The buffer passed to read should be able to
+	hold an integer. When `read(2)` succeeds, it stores in the buffer the
+	number of overcommitting threads (that is, the number of threads to
+	park so that the pool no longer overcommits).
 
 RETURN VALUE
 ~~~~~~~~~~~~
@@ -204,51 +208,67 @@ by the file descriptor `pwqrfd`.
 
 Valid values for the `op` argument are:
 
-`PWQR_GET_CONC`::
+`PWQR_CTL_GET_CONC`::
 	Requests the current concurrency level for this regulator.
 
-`PWQR_SET_CONC`::
+`PWQR_CTL_SET_CONC`::
 	Modifies the current concurrency level for this regulator. The new
 	value is passed as the `val` argument. The requests returns the old
 	concurrency level on success.
 +
-	A zero or negative value for `val` means 'automatic' and is recomputed
-	as the current number of online CPUs as
-	`sysconf(_SC_NPROCESSORS_ONLN)` would return.
+A zero or negative value for `val` means 'automatic' and is recomputed as the
+current number of online CPUs as `sysconf(_SC_NPROCESSORS_ONLN)` would return.
 
-`PWQR_REGISTER`::
+`PWQR_CTL_REGISTER`::
 	Registers the calling thread to be taken into account by the pool
 	regulator. If the thread is already registered into another regulator,
 	then it's automatically unregistered from it.
 
-`PWQR_UNREGISTER`::
+`PWQR_CTL_UNREGISTER`::
 	Deregisters the calling thread from the pool regulator.
 
-`PWQR_WAKE`::
+`PWQR_CTL_WAKE`::
 	Tries to wake `val` threads from the pool. This is done according to
-	the current concurrency level not to overcommit. On success, the
-	number of woken threads is returned, it can be 0.
-
-`PWQR_WAKE_OC`::
+	the current concurrency level so as not to overcommit. On success, a
+	hint of the number of woken threads is returned; it can be 0.
++
+This is only a hint of the number of threads woken up for two reasons. First,
+the kernel could really have woken up a thread, but when it becomes scheduled,
+it could *then* decide that it would overcommit (because some other thread
+unblocked in between, for example), and block it again.
++
+But it can also lie in the other direction: userland is supposed to account
+for waiting threads. So when we're overcommitting and userland wants a waiting
+thread to be unblocked, we actually say we woke none, but still unblock one
+(the quarantined threads described above). This allows the userland
+counter of waiting threads to decrease, but we know the thread won't be usable
+so we return 0.
+
+`PWQR_CTL_WAKE_OC`::
 	Tries to wake `val` threads from the pool. This is done bypassing the
 	current concurrency level (`OC` stands for `OVERCOMMIT`). On success,
-	the number of woken threads is returned, it can be 0.
+	the number of woken threads is returned; it can be 0, but it is the
+	real count of threads that have been (or will soon be) woken up. If it
+	is less than requested, there aren't enough parked threads.
 
-`PWQR_WAIT`::
-	Puts the thread to wait for a future `PWQR_WAKE` command. If this
+`PWQR_CTL_WAIT`::
+	Puts the thread to wait for a future `PWQR_CTL_WAKE` command. If this
 	thread must be parked to maintain concurrency below the target, then
 	the call blocks with no further ado.
 +
 If the concurrency level is below the target, then the kernel checks if the
 address `addr` still contains the value `val` (in the fashion of `futex(2)`).
 If it doesn't then the call doesn't block. Else the calling thread is blocked
-until a `PWQR_WAKE` command is received.
+until a `PWQR_CTL_WAKE` command is received.
++
+`addr` must of course be a pointer to an aligned integer which stores the
+reference ticket in userland.
 
-`PWQR_PARK`::
+`PWQR_CTL_PARK`::
 	Puts the thread in park mode. Those are spare threads to avoid
 	cloning/exiting threads when the pool is regulated. Those threads are
 	released by the regulator only, and can only be woken from userland
-	with the `PWQR_WAKE_OC` command, and once all waiting threads have
+	with the `PWQR_CTL_WAKE_OC` command, and once all waiting threads have
 	been woken.
 +
 The call blocks until an overcommiting wake requires the thread, or the kernel
@@ -277,24 +297,24 @@ with a real syscall.
 [EINVAL]::
 	TODO
 
-Errors specific to `PWQR_REGISTER`:
+Errors specific to `PWQR_CTL_REGISTER`:
 
 [ENOMEM]::
 	There was insufficient memory to perform the operation.
 
-Errors specific to `PWQR_WAIT`:
+Errors specific to `PWQR_CTL_WAIT`:
 
 [EWOULDBLOCK]::
 	When the kernel evaluated if `addr` still contained `val` it didn't.
 	This works like `futex(2)`.
 
-Errors specific to `PWQR_WAIT` and `PWQR_PARK`:
+Errors specific to `PWQR_CTL_WAIT` and `PWQR_CTL_PARK`:
 
 [EINTR]::
 	The call was interrupted by a syscall (note that sometimes the kernel
 	masks this fact when it has more important "errors" to report like
 	`EDQUOT`).
 [EDQUOT]::
-	The thread has been woken by a `PWQR_WAKE` or `PWQR_WAKE_OC` call, but
-	is overcommiting.
+	The thread has been woken by a `PWQR_CTL_WAKE` or `PWQR_CTL_WAKE_OC`
+	call, but is overcommitting.