What is a circuit breaker in Elasticsearch

Elasticsearch: date_histogram on date_range with null endpoints triggers a circuit breaker

I encountered a very similar problem.

I have ~180 million documents from the last 4 years or so. Each document has a date range field whose span can be anywhere from a few days to a few years. When I run a date_histogram on the date range field, I always get buckets for every date in the range covered by the matching documents, and there is no way to limit the results to a subset of those buckets.

If I aggregate by month, it doesn't matter much; I just get more data back than I need (every month between the minimum and maximum month in the matching documents). A real problem arises when I aggregate by day, however. A query meant to display, say, 30 days of data actually covers every day in the range, which can be over 1500 buckets: one for each day in the range of the matching documents (not the range of my query). Most of the time this makes Elasticsearch time out, and I never get data back.
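To make the arithmetic concrete, here is a rough sketch of how the bucket count grows. The dates are made up for illustration and are not taken from my data; the point is only that the count follows the documents' range rather than the query window.

```python
from datetime import date


def daily_bucket_count(start: date, end: date) -> int:
    """Number of daily buckets needed to cover [start, end], inclusive."""
    return (end - start).days + 1


# Hypothetical 30-day window that the query actually asks about.
query_window = (date(2020, 1, 1), date(2020, 1, 30))

# Hypothetical span of a matching document's range field: it started
# years earlier and is still "active" at the end of the query window.
document_span = (date(2016, 1, 1), date(2020, 1, 30))

wanted = daily_bucket_count(*query_window)
produced = daily_bucket_count(*document_span)
print(f"buckets wanted: {wanted}, buckets produced: {produced}")
```

A single long-lived document that matches the query is enough to stretch the histogram across its whole span.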

This problem prevents me from using the date_histogram aggregation on date_range fields. If there were an option that could prevent the aggregation from looking at buckets that fall outside the limited range, I think it would solve both my problem and the OP's.


An example query I would run (using the same field name as the OP) would be something like this:
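A minimal sketch of a query of this shape, written as a Python dict so it can be serialized to JSON. The field name `active_range`, the aggregation name `per_day`, and the dates are all placeholders I made up for illustration (they are not the OP's names), and it assumes a version where date_histogram accepts range fields and `calendar_interval`.

```python
import json


def build_daily_histogram_query(field: str, gte: str, lte: str) -> dict:
    """Build a search body: filter to a date window, then bucket by day.

    `field` is assumed to be a date_range field (hypothetical name).
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "range": {
                # Match documents whose range overlaps the query window.
                field: {"gte": gte, "lte": lte, "relation": "intersects"}
            }
        },
        "aggs": {
            "per_day": {
                "date_histogram": {"field": field, "calendar_interval": "day"}
            }
        },
    }


body = build_daily_histogram_query("active_range", "2020-01-01", "2020-01-30")
print(json.dumps(body, indent=2))
```

The range query only decides which documents match; it does nothing to limit which buckets the histogram then creates for those documents.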

Since documents could be "active" today in 2016, the ES query generates every bucket from the first day in the documents' range until today.

It feels like Elasticsearch could be instructed not to aggregate buckets outside of the specified range, but the documentation states that it specifically does not do this. Either the existing behavior could change to do this (a breaking change), or we could introduce another field that strictly enforces the bounds at aggregation time.

A script might be a good solution for this, but a dedicated field along those lines would make a little more sense for this type of aggregation. Alternatively, a way to instruct bucket aggregations to ignore certain buckets at aggregation time (like a bucket selector aggregation that runs before rather than after aggregation) could be useful for more than just this type of range histogram aggregation.
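For what it's worth, newer Elasticsearch releases document a `hard_bounds` option on histogram aggregations (7.10 and later, as far as I know), described as a counterpart to `extended_bounds` that limits the range of buckets. A sketch of what using it might look like, assuming a version that supports it; the field name and dates are again hypothetical, and `with_hard_bounds` is just a helper I made up for this example.

```python
def with_hard_bounds(date_histogram: dict, min_bound: str, max_bound: str) -> dict:
    """Return a copy of a date_histogram body restricted to [min_bound, max_bound].

    Assumes the target Elasticsearch version supports `hard_bounds`.
    """
    bounded = dict(date_histogram)  # shallow copy, leave the input untouched
    bounded["hard_bounds"] = {"min": min_bound, "max": max_bound}
    return bounded


# Hypothetical field name and window, matching a 30-day query.
unbounded = {"field": "active_range", "calendar_interval": "day"}
bounded = with_hard_bounds(unbounded, "2020-01-01", "2020-01-30")

aggs = {"per_day": {"date_histogram": bounded}}
print(aggs)
```

The intent is that buckets outside the bounds are never created, instead of being created and then discarded.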