From 0fb6dd1141fb6076b2753133c797f0325c55a0df Mon Sep 17 00:00:00 2001
From: Pierre Habouzit
Date: Sat, 14 Jan 2012 19:22:49 +0100
Subject: [PATCH] Write some nice documentation about the design and APIs.

Signed-off-by: Pierre Habouzit
---
 .gitignore              |   2 +
 Documentation/Makefile  |   9 ++
 Documentation/pwqr.adoc | 227 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 238 insertions(+)
 create mode 100644 Documentation/Makefile
 create mode 100644 Documentation/pwqr.adoc

diff --git a/.gitignore b/.gitignore
index 32058e5..4afe4d5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,5 @@
 .tmp_versions
 Module.symvers
 modules.order
+
+/Documentation/pwqr.html
diff --git a/Documentation/Makefile b/Documentation/Makefile
new file mode 100644
index 0000000..a42b796
--- /dev/null
+++ b/Documentation/Makefile
@@ -0,0 +1,9 @@
+all: pwqr.html
+
+pwqr.html: pwqr.adoc
+	asciidoc -bhtml5 -o$@ $<
+
+clean:
+	$(RM) pwqr.html
+
+.PHONY: all clean
diff --git a/Documentation/pwqr.adoc b/Documentation/pwqr.adoc
new file mode 100644
index 0000000..57a85d4
--- /dev/null
+++ b/Documentation/pwqr.adoc
@@ -0,0 +1,227 @@
+Pthread WorkQueue Regulator
+===========================
+
+The Pthread Workqueue Regulator is meant to help userland regulate thread
+pools based on the actual number of threads that are running, the capacity of
+the machines, the number of blocked threads ...
+
+kernel-land design
+------------------
+
+In the kernel, threads registered in the pwq regulator can be in 5 states:
+
+blocked::
+	This is the state of threads that are currently blocked in a syscall.
+
+running::
+	This is the state of threads that are either really running, or have
+	been preempted out by the kernel. In other words it's the number of
+	schedulable threads.
+
+waiting::
+	This is the state of threads that are currently in a `PWQR_WAIT` call
+	from userspace (see `pwqr_ctl`) but that would not overcommit if
+	released by a `PWQR_WAKE` call.
+
+quarantined::
+	This is the state of threads that are currently in a `PWQR_WAIT` call
+	from userspace (see `pwqr_ctl`) but that would overcommit if released
+	by a `PWQR_WAKE` call.
++
+This state avoids waking a thread only to force userland to "park" it, which
+is racy and makes the scheduler work for nothing useful. Though if `PWQR_WAKE`
+is called, quarantined threads are woken, but with an `EDQUOT` errno set.
+
+parked::
+	This is the state of threads currently in a `PWQR_PARK` call from
+	userspace (see `pwqr_ctl`).
+
+
+The regulator tries to maintain the following invariant:
+
+    running + waiting == target_concurrency
+        || (running + waiting < target_concurrency && waiting > 0)
+
+When `running + waiting` overcommits::
+	The kernel puts waiting threads into the quarantine, which doesn't
+	require anything from userland. It's something userland discovers only
+	when it needs a waiting thread, which may never happen.
++
+If there are no waiting threads, then well, the workqueue overcommits, and
+that's one of the TODO items at the moment (see Notes).
+
+When `running + waiting` undercommits::
+	If waiting is non-zero then we don't care: it means that userland
+	actually doesn't need work to be performed.
++
+If waiting is zero, then a parked thread (if there is one) is woken up so
+that userland has a chance to consume jobs.
++
+Unparking threads only when waiting becomes zero avoids flip-flops when the
+job flow is small and some of the running threads sometimes block (IOW
+running sometimes decreases, making `running + waiting` drop below target
+concurrency for a very small amount of time).
+
+The regulation between running and waiting threads is left to userspace, which
+is a far better judge than kernel land, which has absolutely no knowledge
+about the current workload. Also, doing so means that when there are lots of
+jobs to process and the pool has a size that doesn't require more regulation,
+the kernel isn't called for mediation/regulation AT ALL.
+
+NOTE: right now threads are unparked as soon as `running + waiting`
+undercommits, and some delay should be applied to be sure it's not a really
+short blocking syscall that made us undercommit.
+
+NOTE: when we're overcommitting for a "long" time, userspace should be
+notified in some way that it should try to reduce its number of running
+threads. Note that the Apple implementation (before Lion at least) has the
+same issue. Though if you imagine someone that spawns a zillion jobs that call
+very slow `msync()` or blocking `read()` calls over the network, and then all
+of those go back to the running state, the overcommit is huge.
+A way to mitigate this atm is that when userspace believes the number of
+threads is abnormally high it should periodically try to PARK the threads. If
+that blocks the thread, then it means we were overcommitting. Note that it may
+be the best solution rather than a kernel-side implementation. To be thought
+over.
+
+pwqr_create
+-----------
+SYNOPSIS
+~~~~~~~~
+
+	int pwqr_create(int flags);
+
+DESCRIPTION
+~~~~~~~~~~~
+This call returns a new PWQR file-descriptor. The regulator is initialized
+with a concurrency corresponding to the number of online CPUs at the time of
+the call, as would be returned by `sysconf(_SC_NPROCESSORS_ONLN)`.
+
+`flags`::
+	a mask of flags, currently only O_CLOEXEC.
+
+RETURN VALUE
+~~~~~~~~~~~~
+On success, this call returns a nonnegative file descriptor.
+On error, -1 is returned, and errno is set to indicate the error.
+
+ERRORS
+~~~~~~
+[EINVAL]::
+	Invalid value specified in flags.
+[ENFILE]::
+	The system limit on the total number of open files has been reached.
+[ENOMEM]::
+	There was insufficient memory to create the kernel object.
+
+
+pwqr_ctl
+--------
+SYNOPSIS
+~~~~~~~~
+
+	int pwqr_ctl(int pwqrfd, int op, int val, void *addr);
+
+
+DESCRIPTION
+~~~~~~~~~~~
+
+This system call performs control operations on the pwqr instance referred to
+by the file descriptor `pwqrfd`.
+
+Valid values for the `op` argument are:
+
+`PWQR_GET_CONC`::
+	Requests the current concurrency level for this regulator.
+
+`PWQR_SET_CONC`::
+	Modifies the current concurrency level for this regulator. The new
+	value is passed as the `val` argument. The request returns the old
+	concurrency level on success.
++
+A zero or negative value for `val` means 'automatic' and is recomputed
+as the current number of online CPUs as
+`sysconf(_SC_NPROCESSORS_ONLN)` would return.
+
+`PWQR_REGISTER`::
+	Registers the calling thread to be taken into account by the pool
+	regulator. If the thread is already registered into another regulator,
+	then it's automatically unregistered from it.
+
+`PWQR_UNREGISTER`::
+	Deregisters the calling thread from the pool regulator.
+
+`PWQR_WAKE`::
+	Tries to wake `val` threads from the pool. This is done according to
+	the current concurrency level, so as not to overcommit. On success, the
+	number of woken threads is returned; it can be 0.
+
+`PWQR_WAKE_OC`::
+	Tries to wake `val` threads from the pool. This is done bypassing the
+	current concurrency level (`OC` stands for `OVERCOMMIT`). On success,
+	the number of woken threads is returned; it can be 0.
+
+`PWQR_WAIT`::
+	Puts the thread to wait for a future `PWQR_WAKE` command. If this
+	thread must be parked to maintain concurrency below the target, then
+	the call blocks with no further ado.
++
+If the concurrency level is below the target, then the kernel checks if the
+address `addr` still contains the value `val` (in the fashion of `futex(2)`).
+If it doesn't, the call doesn't block. Otherwise the calling thread is blocked
+until a `PWQR_WAKE` command is received.
+
+`PWQR_PARK`::
+	Puts the thread in park mode. Those are spare threads to avoid
+	cloning/exiting threads when the pool is regulated. Those threads are
+	released by the regulator only, and can only be woken from userland
+	with the `PWQR_WAKE_OC` command, and once all waiting threads have
+	been woken.
++
+The call blocks until an overcommitting wake requires the thread, or the
+kernel regulator needs to grow the pool with new running threads.
+
+RETURN VALUE
+~~~~~~~~~~~~
+When successful, `pwqr_ctl` returns a nonnegative value.
+On error, -1 is returned, and errno is set to indicate the error.
+
+ERRORS
+~~~~~~
+[EBADF]::
+	`pwqrfd` is not a valid file descriptor.
+
+[EBADFD]::
+	`pwqrfd` is a valid pwqr file descriptor but is in a broken state: it
+	has been closed while other threads were in a `pwqr_ctl` call.
++
+NOTE: this is due to the current implementation and would probably not be here
+with a real syscall.
+
+[EFAULT]::
+	Error reading the value at `addr` from userspace.
+
+[EINVAL]::
+	TODO
+
+Errors specific to `PWQR_REGISTER`:
+
+[ENOMEM]::
+	There was insufficient memory to perform the operation.
+
+Errors specific to `PWQR_WAIT`:
+
+[EWOULDBLOCK]::
+	When the kernel checked, `addr` no longer contained `val`.
+	This works like `futex(2)`.
+
+Errors specific to `PWQR_WAIT` and `PWQR_PARK`:
+
+[EINTR]::
+	The call was interrupted by a signal (note that sometimes the kernel
+	masks this fact when it has more important "errors" to report like
+	`EDQUOT`).
+[EDQUOT]::
+	The thread has been woken by a `PWQR_WAKE` or `PWQR_WAKE_OC` call, but
+	is overcommitting.
+
-- 
2.20.1
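
As an illustration of the "kernel-land design" section in pwqr.adoc above, the
invariant and the reaction the text describes for each over/undercommit case
can be modelled in a few lines of C. This is only a sketch of the documented
rules, not the kernel code: `struct pwqr_model`, the `model_*` helpers and the
`MODEL_*` names are hypothetical and exist nowhere in pwqr.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one regulator (names are illustrative). */
struct pwqr_model {
    int running;   /* schedulable threads */
    int waiting;   /* threads in PWQR_WAIT that would not overcommit */
    int parked;    /* threads in PWQR_PARK */
    int target;    /* target concurrency */
};

/* What the documentation says happens in each situation. */
enum model_action {
    MODEL_NOTHING,     /* invariant holds, or nothing can be done */
    MODEL_QUARANTINE,  /* overcommit: move a waiting thread to quarantine */
    MODEL_OVERCOMMIT,  /* overcommit with no waiter: listed as a TODO */
    MODEL_UNPARK       /* undercommit, no waiter: wake a parked thread */
};

/* running + waiting == target
 *     || (running + waiting < target && waiting > 0) */
static inline bool model_invariant_holds(const struct pwqr_model *m)
{
    int active = m->running + m->waiting;

    return active == m->target || (active < m->target && m->waiting > 0);
}

static inline enum model_action model_next_action(const struct pwqr_model *m)
{
    int active = m->running + m->waiting;

    assert(m->target > 0);
    assert(m->running >= 0 && m->waiting >= 0 && m->parked >= 0);

    if (active > m->target)
        return m->waiting > 0 ? MODEL_QUARANTINE : MODEL_OVERCOMMIT;
    /* Unpark only once waiting has dropped to zero, to avoid flip-flops. */
    if (active < m->target && m->waiting == 0 && m->parked > 0)
        return MODEL_UNPARK;
    return MODEL_NOTHING;
}
```

With a target of 4, a pool with 2 running, 0 waiting and 1 parked thread
breaks the invariant and yields `MODEL_UNPARK`; the same pool with 1 waiting
thread satisfies it and nothing happens, which is the flip-flop avoidance the
design section describes.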