Changes between Version 12 and Version 13 of InstallationGuidelines/Cluster
Timestamp: 11/30/17 11:53:17 (7 years ago)
InstallationGuidelines/Cluster
v12 | v13 | Line
28 | 28 | - Server becomes suddenly unavailable for about 15 minutes, before returning to normal
30 | (removed) | Observations:
32 | 29 | - No signs of CPU/RAM saturation at the back-end (no signs of any back-end activity at all during the blackout)
33 | 30 | - Instant recovery: server is immediately back to normal after the blackout (no secondary delays)
… | … |
35 | 32 | - None of the logs showing any irregularities - no errors, no strange requests/responses
37 | (removed) | Critically:
(added) | 34 | Observations:
39 | 36 | - Blackout length is constant (900sec), and doesn't seem to depend on request complexity
… | … |
41 | 38 | - Typically occurs after a period of lower traffic, at the moment when traffic increases again
42 | 39 | - At least one of the workers had been inactive for a significant period of time (e.g. 4000+ seconds) before the hanging request
(added) | 40 | - Provoking a uWSGI "harakiri" (by setting the timeout well below 900sec), the traceback often (not always) shows the worker hanging in {{{PQgetResult}}}/{{{poll}}}
43 | 41 | === Reason ===
44 | 42 | There is a router between front- and back-end - which isn't the ideal configuration, of course, but normal in many clouds.
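For reference, the "harakiri" diagnostic mentioned in the observations can be sketched as a uWSGI configuration fragment. This is a minimal sketch, not the configuration actually used: the 120-second value is an illustrative assumption (any value well below the 900-second blackout would do), and whether the output shows the worker in {{{PQgetResult}}}/{{{poll}}} depends on the platform and build.

{{{
#!ini
[uwsgi]
# Kill any request that runs longer than this many seconds.
# 120 is an assumed example value, chosen only to be well below 900.
harakiri = 120
# Log additional detail about the worker when a harakiri is triggered.
harakiri-verbose = true
}}}

With a timeout this far below 900 seconds, a hanging request is terminated and logged while the blackout is still in progress, instead of disappearing without a trace once the server recovers.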