dcsimg
September 25, 2016
Hot Topics:

Addressing Internal Apache Storm Buffers Overflowing

  • June 23, 2015
  • By Sean Allen
  • Send Email »
  • More Articles »

By Sean T. Allen

The best place to see whether any of Storm's internal buffers are overflowing is the debug log output in Storm's logs. Assuming you're using a shuffle grouping to distribute tuples evenly among bolts and tasks, checking the value for any task of a given bolt should be enough to determine how close you are to capacity. If you're using a grouping that doesn't evenly distribute tuples among bolts and tasks, you may have a harder time quickly spotting the problem.

A little automated log analysis should get you where you need to be, though. The pattern of the log entries is well established, and pulling out each entry and looking for population values that are at or near capacity would be a matter of constructing and using an appropriate tool.

Once you find some internal Storm buffers overflowing, you can address them in one of four primary ways. These aren't all-or-nothing options—you can mix and match as needed in order to address the problem:

  • Adjust the production-to-consumption ratio
  • Increase the size of the buffer for all topologies
  • Increase the size of the buffer for a given topology
  • Set max spout pending

Let's cover them one at a time, starting with adjusting the production-to-consumption ratio.

Adjust the production-to-consumption ratio

Producing tuples slower or consuming them faster is your best option to handle buffer overflows. You can decrease the parallelism of the producer or increase the parallelism of the consumer until the problem goes away (or becomes a different problem!). Another option beyond tweaking parallelism is to examine your user code in the consuming bolt (inside the execute method) and find a way to make it go faster.

For executor buffer-related problems, there are many reasons why tweaking parallelism isn't going to solve the problem. Stream groupings other than shuffle grouping are liable to result in some tasks handling far more data than others, resulting in their buffers seeing more activity than others. If the distribution is especially off, you could end up with memory issues from adding tons of consumers to handle what is in the end a data distribution problem.

When dealing with an overflowing worker transfer queue, "increasing parallelism" means adding more worker processes, thereby (hopefully) lowering the executor-to-worker ratio and relieving pressure on the worker transfer queue. Again, however, data distribution can rear its head. If most of the tuples are bound for tasks on the same worker process after you add another worker process, you haven't gained anything.

Adjusting the production-to-consumption ratio can be difficult when you aren't evenly distributing tuples, and any gains you get could be lost by a change in the shape of the incoming data. Although you might get some mileage out of adjusting the ratio, if you aren't relying heavily on shuffle groupings, one of our other three options is more likely to help.

Increase the size of the buffer for all topologies

We'll be honest with you: this is the cannon-to-kill-a-fly approach. The odds of every topology needing an increased buffer size are low, and you probably don't want to change buffer sizes across your entire cluster. That said, maybe you have a really good reason. You can change the default buffer size for topologies by adjusting the following values in your storm.yaml:

  • The default size of all executors' incoming queue can be changed using the value topology.executor.receive.buffer.size
  • The default size of all executors' outgoing queue can be changed using the value topology.executor.send.buffer.size
  • The default size of a worker process's outgoing transfer queue can be changed using the value topology.transfer.buffer.size

It's important to note that any value you set the size of a disruptor queue buffer to has to be set to a power of 2—for example, 2, 4, 8, 16, 32, and so on. This is a requirement imposed by the LMAX disruptor.

 If changing the buffer size for all topologies isn't the route you want to go, and you need finer-grained control, increasing the buffer sizes for an individual topology may be the option you want.

Increase the size of the buffer for a given topology

Individual topologies can override the default values of the cluster and set their own size for any of the disruptor queues. This is done via the Config class that gets passed into the StormSubmitter when you submit a topology. We place this code in a RemoteTopologyRunner class, which can be seen in the following listing.

Listing 1: RemoteTopologyRunner.java with configuration for increased buffer sizes

public class RemoteTopologyRunner { public
   static void main(String[] args) {

      Config config = new Config(); ...
      config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE,
      new Integer(16384));
      config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE,
      new Integer(16384));
      config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE,
      new Integer(32));
      StormSubmitter.submitTopology("topology-name",
         config, topology);
   }
}

This brings us to our final option (one that should also be familiar): setting max spout pending.

Max spout pending

As you may know, max spout pending caps the number of tuples that any given spout will allow to be live in the topology at one time. How can this help prevent buffer overflows? Let's try some math:

    • A single spout has a max spout pending of 512.
    • The smallest disruptor has a buffer size of 1024.
512 < 1024

Assuming all your bolts don't create more tuples than they ingest, it's impossible to have enough tuples in play within the topology to overflow any given buffer. The math for this can get complicated if you have bolts that ingest a single tuple but emit a variable number of tuples. Here's a more complicated example:

  • A single spout has a max spout pending of 512.
  • The smallest disruptor has a buffer size of 1024.

One of our bolts takes in a single tuple and emits 1 to 4 tuples. That means the 512 tuples that our spout will emit at a given point in time could result in anywhere from 512 to 2048 tuples in play within our topology. Or put another way, we could have a buffer overflow issue. Buffer overflows aside, setting a spout's max spout pending value is a good idea and should always be done.

We've addressed four solutions for handling buffers overflowing. In my book, Storm Applied, I also discuss tweaking the sizes of these buffers in order to get the best performance possible in your Storm topologies.

Cover

Addressing internal Storm buffers overflowing

By Sean T. Allen

This article is excerpted from Storm Applied by Manning Publishing.


Tags: buffer overflow, tuples, Apache Storm, disruptor, spout




Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date
Rocket Fuel