Context Navigation

Changes between Version 12 and Version 13 of InstallationGuidelines/Cluster

-              v12
+              v13
  - Server becomes suddenly unavailable for about 15 minutes, before returning to normal
-Observations:
  - No signs of CPU/RAM saturation at the back-end (no signs of any back-end activity at all during the blackout)
  - Instant recovery: server is immediately back to normal after the blackout (no secondary delays)
 …
  - None of the logs showing any irregularities - no errors, no strange requests/responses
 Critically:
+Observations:
  - Blackout length is constant (900sec), and doesn't seem to depend on request complexity
 …
  - Typically occurs after a period of lower traffic, at the moment when traffic increases again
  - At least one of the workers had been inactive for a significant period of time (e.g. 4000+ seconds) before the hanging request
+ - Provoking a uWSGI "harakiri" (by setting the timeout well below 900sec), the traceback often (not always) shows the worker hanging in {{{PQgetResult}}}/{{{poll}}}
 === Reason ===
 There is a router between front- and back-end - which isn't the ideal configuration, of course, but normal in many clouds.