
Performance factors

Server vs. Desktop

The Network Traffic Collector should be run on a dedicated server (either physical or VM) for optimal performance. Ubuntu comes in Server and Desktop flavors. While the underlying kernel is identical in both versions, the Server version boots to a CLI and the Desktop version includes a desktop and graphical window system.

Running the graphical user interface imposes a small performance penalty on the host. This is unlikely to cause problems, but in edge cases it might make a difference. For this reason, it is recommended that the collector be deployed on a Server version.

CPU cores

When processing a lot of NetFlow data, the collector can be an extremely CPU-intensive application. Network traffic is always received by a single thread, and written directly into a circular packet buffer approximately 800,000 packets in size.

Parser threads process packets in this buffer. The more parser threads there are, the more packets can be processed simultaneously. With more CPU cores available, the Operating System can distribute the threads across those cores to make optimal use of the available processing power.
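The single-receiver, multiple-parser arrangement described above can be sketched as follows. This is only an illustration of the structure, not the collector's implementation: a bounded queue.Queue stands in for the circular packet buffer, the function name run_pipeline is hypothetical, and "parsing" is reduced to measuring the packet length. Python threads also do not run CPU-bound work in parallel the way native parser threads would.

```python
import queue
import threading


def run_pipeline(packets, parser_threads, buffer_size=800_000):
    """Feed packets from one receiver into a bounded buffer drained by parsers."""
    if parser_threads < 1:
        raise ValueError("at least one parser thread is required")

    buffer = queue.Queue(maxsize=buffer_size)  # stand-in for the circular buffer
    parsed = []
    parsed_lock = threading.Lock()
    stop = object()  # sentinel that tells a parser thread to exit

    def parser():
        while True:
            packet = buffer.get()
            if packet is stop:
                break
            result = len(packet)  # placeholder for real NetFlow parsing
            with parsed_lock:
                parsed.append(result)

    workers = [threading.Thread(target=parser) for _ in range(parser_threads)]
    for worker in workers:
        worker.start()

    # A single receiver (here, the calling thread) writes every packet.
    for packet in packets:
        buffer.put(packet)
    for _ in workers:
        buffer.put(stop)
    for worker in workers:
        worker.join()
    return parsed
```

Results arrive in whatever order the parsers finish, so callers should not rely on the output order matching the input order.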

As a general rule of thumb, preliminary testing has shown that using two to six parser threads per core makes effective use of the system. That said, there is nothing to prevent experimentation using the parser_threads configuration item to increase the total number of parsers to any number up to 32.

CPU clock speed

The clock speed of the CPU plays a significant role in overall performance, as the time taken for a parser thread to complete processing a packet is inversely proportional to the clock speed. Generally speaking, the higher the clock speed, the more overall work can be performed.

This is only true up to a point, however. Once the memory bus is saturated by the total volume of reads from and writes to RAM, additional clock speed is of no benefit: the CPU is limited by that I/O bottleneck and, at higher clock speeds, spends relatively more time waiting for access to the hardware.

Systems that support higher clock speeds generally have higher speed motherboards and associated hardware, so a higher CPU speed is often indicative of a generally higher performing system in the case of a hardware server.

RAM

The collector requires approximately 3 GB of working RAM (this does not include the overhead associated with the Operating System and other processes, so a minimum of 4 GB should be present). Care should be taken to avoid running other applications that might consume enough RAM to force the Operating System to use virtual memory. The performance of the collector will suffer greatly if any portion of the RAM it uses gets swapped out to disk.

As the Server version of Linux doesn’t use a graphical user interface, it leaves more working memory for use by applications. This is one of the reasons that a Server (as opposed to Desktop) version of Linux is recommended.

Number of parser threads

By default, the collector automatically picks a suitable but slightly conservative number of parsers based on the number of cores it detects at runtime. It is recommended that the default be used for any given deployment unless you have a specific reason to change it. The default values are shown in the following table.

Cores    Parsers
1        0
2        2
3        4
4        6
5        6
6        8
7        8
8        16
9        16
10       16
11       16
12       16
13       16
14       16
15       16
16       16

As detailed in the Configuration section, this default can be overridden through use of the parser_threads configuration option. Generally it is recommended that two to six threads per core be chosen, but eight per core has increased performance in tests.
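As a rough aid when choosing a parser_threads value, the defaults table and the two-to-six-per-core rule of thumb can be expressed in a few lines of Python. The function names here are hypothetical and this is not how the collector itself computes its default. The lookup simply mirrors the table above, and the cap of 32 is the limit on parser_threads mentioned under "CPU cores".

```python
# Default parser counts per detected core count, copied from the table above.
DEFAULT_PARSERS = {1: 0, 2: 2, 3: 4, 4: 6, 5: 6, 6: 8, 7: 8}
DEFAULT_PARSERS.update({cores: 16 for cores in range(8, 17)})

MAX_PARSERS = 32  # documented upper limit for parser_threads


def default_parsers(cores):
    """Return the documented default for a core count listed in the table."""
    if cores not in DEFAULT_PARSERS:
        # The table only covers 1 to 16 cores; anything else is not documented.
        raise ValueError(f"no documented default for {cores} cores")
    return DEFAULT_PARSERS[cores]


def rule_of_thumb_range(cores):
    """Return (low, high) parser counts for two to six threads per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return min(2 * cores, MAX_PARSERS), min(6 * cores, MAX_PARSERS)
```

For example, on a 4-core host default_parsers(4) gives 6, while rule_of_thumb_range(4) suggests experimenting between 8 and 24 parsers.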

Network Interface & drivers

The collector can only process packets that are delivered to it by the underlying Operating System. If deployed on a system with low-performance drivers then packet loss can occur. If problems occur then tests should be performed to establish the underlying performance of the system. Issuing the command ifconfig on Linux will show the Rx counters associated with the network interfaces. The Rx counter indicates the number of packets received by the network driver.

By sending a known number of packets to the system at the same speed that the NetFlow sources are sending them, it can be determined if the packets are being dropped by the OS by comparing the Rx counters before and after the transmission against the number of packets sent. If there is a discrepancy whereby the Rx counter did not increase by at least the number of packets sent, then the Operating System is dropping packets before delivering them to the collector.

To establish a proper baseline, this test should be performed while the collector is not running.
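The before-and-after comparison can be scripted. The sketch below assumes a Linux host that exposes per-interface counters under /sys/class/net/<interface>/statistics/rx_packets, which should correspond to the Rx packet count that ifconfig reports. The function names and the interface name in the usage note are hypothetical.

```python
from pathlib import Path


def read_rx_packets(interface, stats_root="/sys/class/net"):
    """Read the received-packet counter Linux exposes for an interface."""
    path = Path(stats_root) / interface / "statistics" / "rx_packets"
    return int(path.read_text().strip())


def packets_lost_before_delivery(rx_before, rx_after, packets_sent):
    """Return how many sent packets are not reflected in the Rx counter.

    Other traffic on the interface also increments the counter, so a
    result of 0 means "no loss detected" rather than a guarantee.
    """
    received = rx_after - rx_before
    return max(0, packets_sent - received)
```

A typical run would call read_rx_packets("eth0") before the test transmission, send the known number of packets, call it again, and pass both readings to packets_lost_before_delivery. Any non-zero result indicates that the Operating System dropped packets.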

NetFlow templates

In order to decode a NetFlow record, it is necessary to have first received and cached a template for that record type. The Network Traffic Collector supports up to 128 unique templates.

NetFlow sources periodically re-send templates in order to refresh the cached version. The collector currently does not expire cached templates, but it does check all the re-transmitted templates in case there is an update to the cached version, so these re-exported templates cause additional processing overhead.

It is recommended that the time between template retransmissions from the NetFlow source is set to as long a time as possible in order to minimise this overhead.
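To illustrate why re-exported templates cost processing time, here is a minimal sketch of a cache that never expires entries but compares every retransmitted template against the cached copy. The class name, the key format, and the behaviour when the 128-template limit is reached (the new template is rejected) are assumptions made for this example, not documented collector behaviour.

```python
MAX_TEMPLATES = 128  # limit on unique templates stated above


class TemplateCache:
    """Cache of NetFlow templates with no expiry (illustrative only)."""

    def __init__(self):
        self._templates = {}

    def store(self, key, fields):
        """Cache a template; return 'new', 'updated', 'unchanged' or 'rejected'."""
        fields = tuple(fields)
        cached = self._templates.get(key)
        if cached is None:
            if len(self._templates) >= MAX_TEMPLATES:
                return "rejected"  # assumption: a full cache refuses new templates
            self._templates[key] = fields
            return "new"
        # Every retransmission is compared with the cached copy. This
        # comparison is the overhead that a longer refresh interval reduces.
        if cached == fields:
            return "unchanged"
        self._templates[key] = fields
        return "updated"

    def lookup(self, key):
        """Return the cached field list for a key, or None if not cached."""
        return self._templates.get(key)
```

Here key could be, for instance, a (source address, template ID) pair, so that templates from different NetFlow sources do not collide.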

NetFlow records

NetFlow records contain the metrics required to generate CCR files. There are many potential fields in a NetFlow record. Currently, the collector will only extract the IPV4_SRC_ADDR, IPV4_DST_ADDR, and IN_BYTES values from data records. If any of these are not present, the record is rejected.
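The extraction rule can be pictured with a short sketch. It assumes that a decoded data record is available as a dictionary keyed by NetFlow field name, which is a simplification for illustration. The function name extract_metrics is hypothetical.

```python
REQUIRED_FIELDS = ("IPV4_SRC_ADDR", "IPV4_DST_ADDR", "IN_BYTES")


def extract_metrics(record):
    """Return (source, destination, byte count), or None if the record is rejected."""
    if any(name not in record for name in REQUIRED_FIELDS):
        return None  # a required field is missing, so the record is rejected
    # All other fields in the record are ignored.
    return tuple(record[name] for name in REQUIRED_FIELDS)
```

A record that carries extra fields is still accepted, because only the three required fields are read.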

A full list of the possible records can be found in the official Cisco NetFlow documentation at http://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html.

NetFlow generation usually happens within a router or other hardware device, although software NetFlow generators are also available. The nature of NetFlow is such that it does not necessarily provide a completely accurate count of the bytes being transferred between hosts. NetFlow often operates in a mode whereby it periodically ‘samples’ packets from the traffic flowing through the router and sends the results of that sampling out as NetFlow records.

The shorter the interval between samples, the more accurate the NetFlow data will be. It is possible to configure NetFlow sources to report on every single packet. This places the highest load on the NetFlow generator, but will result in the most accurate reporting.

The Network Traffic Collector can only ever be as accurate as the NetFlow data it receives. There is no way to extrapolate missing data from the records received.

CCR file exports

In order to generate a CCR file, the collector scans an internal session table which contains entries of the form:

start_time, end_time, source_address, destination_address, total_bytes

The start and end times are accurate to the second. The start_time is the time at which the first packet sent from the source_address to the destination_address was seen, and the end_time is the last time such a packet was seen. For very short flows the start_time and the end_time can be identical. This is not an error, and Cloud Cruiser will handle such entries properly.
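The way such a table accumulates entries can be sketched as follows. The field names match the entry form shown above, but the Session class, the record_flow function, and the choice to key sessions by (source, destination) pair are assumptions made for this example rather than details of the collector's internals.

```python
from dataclasses import dataclass


@dataclass
class Session:
    start_time: int  # seconds, when the first packet was seen
    end_time: int    # seconds, when the last packet was seen
    source_address: str
    destination_address: str
    total_bytes: int


def record_flow(sessions, timestamp, source_address, destination_address, byte_count):
    """Create or update the session entry for one source/destination pair."""
    key = (source_address, destination_address)
    session = sessions.get(key)
    if session is None:
        # First sighting: start_time and end_time are identical.
        session = Session(timestamp, timestamp, source_address,
                          destination_address, byte_count)
        sessions[key] = session
    else:
        session.start_time = min(session.start_time, timestamp)
        session.end_time = max(session.end_time, timestamp)
        session.total_bytes += byte_count
    return session
```

If every record for a pair arrives within the same second, the entry keeps identical start and end times, which is the short-flow case described above.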

© Copyright 2018 Hewlett Packard Enterprise Development LP