Earlier today, a poorly formed search query was executed against one of our Elasticsearch clusters containing customer event logs. This query consumed all available search queues across the cluster and after numerous attempts, we were unable to terminate the queries while the cluster was online. As a result, we performed a rolling restart of the cluster, which resulted in a rebalance of the nodes. Once the rebalancing was completed, we re-enabled the events API and logs section of the control panel.
To help mitigate future issues, we've deployed timeouts on queries that will help prevent long-running operations from compromising the performance of our search infrastructure.
Jul 14, 16:03 PDT
The Events API and Logs tab have been bought back up and are now in a functional state. We are monitoring for continued issues.
Jul 14, 15:15 PDT
We have temporarily suspended the Events API and Logs tab while we continue to bring our Logs cluster back to a functional state.
Jul 14, 12:46 PDT
We are currently investigating timeouts and errors with Logs and Event API.
Jul 14, 10:27 PDT