Solaris 10 TCP/IP tuning
In this Solaris 10 TCP/IP tuning article I have collected the most essential TCP/IP tunables
for achieving better throughput, depending, of course, on the situation being examined.
Please note:
"Temporary" means that the setting is lost after a reboot, while "Permanently" means that the
setting remains intact after a reboot.
Fanout the incoming TCP/IP connections
ip_soft_rings_cnt determines the number of squeues used to fan out the incoming TCP/IP connections.
The incoming traffic is placed on one of the rings. If the ring is overloaded, packets are dropped.
Default - 2
Range - 0 - nCPUs, where nCPUs is the maximum number of CPUs in the system
Dynamic? - No. The interface should be plumbed again when changing this parameter.
When to change - Consider setting this parameter to a value greater than 2 on systems that have 10 Gbps NICs and
many CPUs.
Use these commands to make the change temporarily or permanently:
Temporary - ndd -set /dev/ip ip_soft_rings_cnt 16 (then plumb the interface again)
Permanently insert into /etc/system - set ip:ip_soft_rings_cnt=16
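Since the documented range is 0 to nCPUs, it can help to clamp a requested value before writing it to /etc/system. The following is a minimal sketch; the function name clamp_soft_rings is hypothetical, and the CPU count is passed in by hand (on Solaris it could be taken from psrinfo | wc -l).

```sh
# Clamp a requested soft-ring count to the documented range 0..nCPUs.
# Usage: clamp_soft_rings <requested> <ncpus>
# clamp_soft_rings is a hypothetical helper, not a Solaris command.
clamp_soft_rings() {
    requested=$1
    ncpus=$2
    if [ "$requested" -lt 0 ]; then
        echo 0
    elif [ "$requested" -gt "$ncpus" ]; then
        echo "$ncpus"
    else
        echo "$requested"
    fi
}

# Example: a request for 16 rings on an 8-CPU machine is reduced to 8.
echo "set ip:ip_soft_rings_cnt=`clamp_soft_rings 16 8`"
```

The example prints the line to place in /etc/system; it does not modify any file itself.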
ip_squeue_fanout determines the mode of associating TCP/IP connections with squeues. A value of 0 associates a new
TCP/IP connection with the CPU that creates the connection. A value of 1 associates the connection with multiple
squeues that belong to different CPUs. The number of squeues that are used to fan out the connection is based upon
ip_soft_rings_cnt.
Consider setting this parameter to 1 to spread the load across all CPUs in certain situations. For example, when
the number of CPUs exceeds the number of NICs, and one CPU is not capable of handling the network load of a single
NIC, change this parameter to 1.
Temporary - ndd -set /dev/ip ip_squeue_fanout 1
Permanently insert into /etc/system - set ip:ip_squeue_fanout=1
Remarks
In Solaris, the available range of TCP/IP ports is 0 to 65535. However, there are some restrictions
that apply:
Ports in the range 0 to 1023 are reserved for privileged (root) services, such as telnetd, ftpd, and so on.
Ports in the range 1024 to tcp_smallest_anon_port-1 are used for user services such as the NFS server daemon, the
font server, and so on.
With the default settings, this leaves the range 32768 to 65535 available for general TCP/IP connections. To limit
the range of port numbers allocated for general use, the following two ndd(1M) parameters can be used:
tcp_smallest_anon_port:
This determines the smallest TCP port number that may be used for an anonymous connection. Solaris allocates
anonymous ports above 32768. The default value is 32768.
tcp_largest_anon_port:
This is the largest TCP port number that may be used for anonymous connections. The default value of this is
65535.
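As a quick check of how many anonymous ports a given pair of settings leaves available, the range can be computed directly. This is a small sketch; anon_port_count is a hypothetical helper name, and expr is used because the Bourne shell in /sbin/sh has no built-in arithmetic.

```sh
# Number of anonymous ports between tcp_smallest_anon_port and
# tcp_largest_anon_port, both ends included.
# anon_port_count is a hypothetical helper, not a Solaris command.
anon_port_count() {
    smallest=$1
    largest=$2
    expr "$largest" - "$smallest" + 1
}

# Default range 32768..65535
anon_port_count 32768 65535
```

With the defaults this prints 32768. On a live system the two values could be read with ndd -get /dev/tcp tcp_smallest_anon_port and ndd -get /dev/tcp tcp_largest_anon_port.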
kernel sockets
The kernel keeps a list of sockets in the TIME_WAIT state. When the list is full, failures start to
occur. If your server is getting new client connections faster than it can bleed off sockets in the TIME_WAIT
state, the list will ultimately get full. Decreasing the timeout increases the bleed-off rate.
Default - 60000 (milliseconds)
Temporary - ndd -set /dev/tcp tcp_time_wait_interval 3000
Setting this didn't give better performance for a lighty web server.
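To get a feel for why a shorter interval drains the list faster, here is a rough estimate of how many sockets sit in TIME_WAIT at any moment. It assumes a steady rate of closed connections and that each one stays in TIME_WAIT for the whole interval; both are simplifying assumptions, and time_wait_estimate is a hypothetical helper name.

```sh
# Rough steady-state number of sockets in TIME_WAIT:
#   closed connections per second * interval in seconds
# time_wait_estimate is a hypothetical helper, not a Solaris command.
time_wait_estimate() {
    rate=$1          # closed connections per second (assumed constant)
    interval_ms=$2   # tcp_time_wait_interval in milliseconds
    expr "$rate" \* "$interval_ms" / 1000
}

time_wait_estimate 500 60000   # default interval
time_wait_estimate 500 3000    # shortened interval
```

At an assumed 500 closed connections per second, this prints 30000 for the default interval and 1500 for the 3000 ms setting.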
TCP hash table size
Both parameters are set in /etc/system, so a change takes effect only after a reboot.
tcp_conn_hash_size controls the hash table size in the TCP module for all TCP connections (default 512).
Permanently insert into /etc/system - set tcp:tcp_conn_hash_size=8192
ipc_tcp_conn_hash_size controls the hash table size in the IP module for all active (in ESTABLISHED state) TCP
connections (default 512).
Permanently insert into /etc/system - set ip:ipc_tcp_conn_hash_size=8192
congestion window
tcp_slow_start_initial is the maximum initial congestion window (cwnd) size, in MSS, of a TCP connection
(default 4).
When to change? Do not change the value.
If the initial cwnd size causes network congestion under special circumstances, decrease the value.
Temporary - ndd -set /dev/tcp tcp_slow_start_initial 2
Temporary - ndd -set /dev/tcp tcp_slow_start_initial 1
tcp_slow_start_after_idle is the congestion window size, in MSS, of a TCP connection after it has been idle.
When to change? For more information, see tcp_slow_start_initial.
Temporary - ndd -set /dev/tcp tcp_slow_start_after_idle 1
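Because the window is expressed in MSS units, the amount of data a new connection may send before the first acknowledgment depends on the segment size. The sketch below assumes an MSS of 1460 bytes (typical for Ethernet, but an assumption here); initial_cwnd_bytes is a hypothetical helper name.

```sh
# Initial congestion window in bytes = tcp_slow_start_initial * MSS.
# initial_cwnd_bytes is a hypothetical helper, not a Solaris command.
initial_cwnd_bytes() {
    segments=$1   # value of tcp_slow_start_initial
    mss=$2        # maximum segment size in bytes (assumed)
    expr "$segments" \* "$mss"
}

initial_cwnd_bytes 4 1460   # default of 4 segments
initial_cwnd_bytes 1 1460   # most conservative setting
```

Under that assumption the default allows 5840 bytes in the first burst, and a value of 1 allows 1460 bytes.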
TIME_WAIT ports
These ensure that TIME_WAIT ports either get reused or closed fast.
insert into /etc/system
Linux net.ipv4.tcp_fin_timeout = 1 - in Solaris tcp_time_wait_interval
Linux net.ipv4.tcp_tw_recycle = 1
TCP memory
Linux net.core.rmem_max = 16777216
Linux net.core.rmem_default = 16777216
Linux net.core.netdev_max_backlog = 262144 - in Solaris tcp_conn_req_max_q
Linux tcp_slow_start_after_idle = 262144
SYN cookies protection
Linux net.ipv4.tcp_syncookies = 1 - Solaris has SYN flood protection enabled by default. (SYN cookies violate
the TCP spec, so Solaris uses its own mechanism.)
Linux net.ipv4.tcp_max_orphans = 262144
Linux net.ipv4.tcp_max_syn_backlog = 262144 - in Solaris tcp_conn_req_max_q0
Linux net.ipv4.tcp_synack_retries and net.ipv4.tcp_syn_retries = 2 - in Solaris:
tcp_rexmit_interval_min 400
tcp_rexmit_interval_max 60000
tcp_ip_abort_interval 480000
tcp_rexmit_interval_initial 3000
conntrack
You shouldn't be using conntrack on a heavily loaded server anyway, but these values are suitably high for
our uses, ensuring that if conntrack gets turned on, the box doesn't die.
Linux net.ipv4.netfilter.ip_conntrack_max = 1048576
Linux net.nf_conntrack_max = 1048576
In Solaris, use a DTrace script to track connections.
TCP/IP connection control blocks
tcp_time_wait_interval tells TCP/IP how long to keep the control blocks of closed connections.
After the applications complete the TCP/IP connection, the control blocks are kept for the specified time. When
high connection rates occur, a large backlog of TCP/IP connections accumulates and can slow server performance.
The server can stall during certain peak periods. If the server stalls, the netstat command shows that many of the
sockets that are opened to the HTTP server are in the CLOSE_WAIT or FIN_WAIT_2 state. Visible delays can occur for
up to four minutes, during which time the server does not send any responses, but CPU utilization stays high, with
all of the activity in system processes.
Default - 60000
Temporary - ndd -set /dev/tcp tcp_time_wait_interval 3000
FIN_WAIT_2 state timer interval
tcp_fin_wait_2_flush_interval specifies how long a connection is allowed to remain
in the FIN_WAIT_2 state. When high connection
rates occur, a large backlog of TCP/IP connections accumulates and can
slow server performance. The server can stall during peak periods. If
the server stalls, using the netstat command shows that many of the
sockets opened to the HTTP server are in the CLOSE_WAIT or FIN_WAIT_2
state. Visible delays can occur for up to four minutes, during which
time the server does not send any responses, but CPU utilization stays
high, with all of the activity in system processes.
Default - 675000
Temporary - ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
TCP keepalive
TCP keepalive is a feature provided by many TCP implementations, including
Solaris, as a way to clean up idle connections in situations like the ones mentioned above.
Applications must enable this feature with the SO_KEEPALIVE socket option via the
setsockopt(3SOCKET) socket call. Once enabled, a keepalive probe packet is sent to the other end
of the socket provided the connection has remained in the ESTABLISHED state and has been idle for
the specified time frame. This time frame is the value specified by the TCP tunable
tcp_keepalive_interval.
A keepalive probe packet is handled just like any other TCP packet which requires an acknowledgment
(ACK) from the other end of the socket connection. It will be retransmitted per the standard retransmission backoff
algorithm. If no response is received by the time specified for the other TCP tunable, tcp_ip_abort_interval, the
connection is terminated, as would be the case for any other unacknowledged packet. Hence the actual maximum idle
time of a connection that uses TCP keepalive and has no responding peer is:
tcp_keepalive_interval + tcp_ip_abort_interval
The default values are 7200000 and 480000 milliseconds respectively.
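The sum above is easier to judge in minutes. The following sketch adds the two tunables and converts the result; the helper names keepalive_max_idle_ms and ms_to_minutes are hypothetical.

```sh
# Maximum idle time of a keepalive-enabled connection with a dead peer:
#   tcp_keepalive_interval + tcp_ip_abort_interval (both in milliseconds)
# Both helpers are hypothetical, not Solaris commands.
keepalive_max_idle_ms() {
    expr "$1" + "$2"
}

ms_to_minutes() {
    expr "$1" / 60000
}

# Default values: 7200000 ms and 480000 ms
total=`keepalive_max_idle_ms 7200000 480000`
echo "$total ms = `ms_to_minutes $total` minutes"
```

With the defaults this prints "7680000 ms = 128 minutes", that is, a little over two hours before a dead peer is noticed.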
The above parameters are global and will affect the entire system. Keep in mind that TCP keepalive
probes have no effect on inactive connections as long as the remote host is still responding to probes. However
care should be taken to ensure the above parameters remain at a high enough value to avoid unnecessary traffic and
other issues such as prematurely closing active connections in situations where a few packets have gone
missing.
Temporary - ndd -set /dev/tcp tcp_keepalive_interval 300000
Backlog Queue
The backlog queue is a large memory structure used to handle incoming packets with the SYN flag set
until the moment the three-way handshake process is completed.
An operating system allocates part of the system memory for every incoming connection. We know that every TCP port
can handle a defined number of incoming requests. The backlog queue controls how many half-open connections can be
handled by the operating system at the same time. When a maximum number of incoming connections is reached,
subsequent requests are silently dropped by the operating system.
As mentioned before, when we detect a lot of connections in the SYN RECEIVED state, the host is probably under a
SYN flooding attack. Moreover, the source IP addresses of these incoming packets can be spoofed. To limit the
effects of SYN attacks we should enable some built-in protection mechanisms. Additionally, we can sometimes use
techniques such as increasing the backlog queue size and minimizing the total time a pending connection is kept in
allocated memory (in the backlog queue).
Run this command to see how many half-open connections have been dropped from the backlog queue:
netstat -s -P tcp | grep tcpHalfOpenDrop
In Sun Solaris there are two parameters which control the maximum number of
connections.
The first parameter tcp_conn_req_max_q controls the total number of full connections.
The second tcp_conn_req_max_q0 parameter defines how many half-open connections are
allowed without the dropping of incoming requests. In Sun Solaris 8, the default value is set to 1024. Using the
ndd command we can modify this value.
It is pretty simple really: never change these parameters unless connections are refused because the values are too
low.
The only way to determine this empirically is to use 'netstat -s | fgrep -i listendrop'.
If tcpListenDrop is non-zero, increase tcp_conn_req_max_q. If tcpListenDropQ0 is non-zero, increase
tcp_conn_req_max_q0.
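The rule above can be applied mechanically to the counter output. This is a minimal sketch that assumes the counters appear as whitespace-separated "name = value" groups, as in the made-up sample below; suggest_queue_tuning is a hypothetical helper name, and on Solaris nawk may be needed in place of awk.

```sh
# Read netstat -s style counters on stdin and name the limit to raise.
# Assumes whitespace-separated "name = value" groups on each line.
# suggest_queue_tuning is a hypothetical helper, not a Solaris command.
suggest_queue_tuning() {
    awk '
        {
            for (i = 1; i <= NF - 2; i++) {
                if ($i == "tcpListenDrop")   q  = $(i + 2)
                if ($i == "tcpListenDropQ0") q0 = $(i + 2)
            }
        }
        END {
            if (q  + 0 > 0) print "increase tcp_conn_req_max_q"
            if (q0 + 0 > 0) print "increase tcp_conn_req_max_q0"
            if (q + 0 == 0 && q0 + 0 == 0) print "no change needed"
        }'
}

# Illustrative input only; the numbers are invented for the example.
suggest_queue_tuning <<'EOF'
        tcpListenDrop       =    12     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
EOF
```

For the sample input the function prints "increase tcp_conn_req_max_q". On a live system the input would come from netstat -s -P tcp | suggest_queue_tuning.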
Temporary - ndd -set /dev/tcp tcp_conn_req_max_q 128 (or 262144)
Temporary - ndd -set /dev/tcp tcp_conn_req_max_q0 1024 (or 30000)
Outgoing connection establishment wait time
Some systems allow you to configure how long a system waits for an outgoing connection to be
established. When the value is set too high, establishing outgoing connections to destination servers that do not
respond quickly, such as replicas, can cause long delays.
Temporary - ndd -set /dev/tcp tcp_ip_abort_cinterval 10000 (default is 180000)
tcp_ip_abort_interval specifies the default total retransmission timeout value for a TCP connection. For a given
TCP connection, if TCP has been retransmitting for the tcp_ip_abort_interval period of time and has not received
any acknowledgment from the other endpoint during this period, TCP closes the connection.
Temporary - ndd -set /dev/tcp tcp_ip_abort_interval 60000 (default is 480000)
TCP/IP statistics
This set of commands presents some of the TCP/IP statistics you will need in order to follow every change
you make in your TCP/IP stack.
netstat -nP tcp | grep WAIT | wc -l
netstat -nP tcp | wc -l
netstat -s -P tcp | grep tcpL
netstat -I bnx0 10
iostat -xn 10
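The first command above counts every *_WAIT state together. To follow one state at a time, the last column of the connection list can be matched exactly. This is a sketch that assumes the state is the last field on each line, as in the made-up sample below; count_state is a hypothetical helper name, and on Solaris nawk may be needed in place of awk for the -v option.

```sh
# Count connections in one TCP state from netstat -nP tcp style output
# read on stdin. Assumes the state is the last field of each line.
# count_state is a hypothetical helper, not a Solaris command.
count_state() {
    awk -v state="$1" '$NF == state { n++ } END { print n + 0 }'
}

# Illustrative input only; addresses and window sizes are invented.
count_state TIME_WAIT <<'EOF'
192.168.1.10.80   10.0.0.5.51234   49640  0 49640  0 TIME_WAIT
192.168.1.10.80   10.0.0.6.40022   49640  0 49640  0 ESTABLISHED
192.168.1.10.80   10.0.0.7.33110   49640  0 49640  0 TIME_WAIT
EOF
```

For the sample this prints 2. On a live system the input would come from netstat -nP tcp | count_state TIME_WAIT.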
TCP/IP script
This script will assist you in configuring the TCP/IP parameters on your system.
#!/sbin/sh
#######Start of TCP/IP script#############
###Disable forwarding and reversal of source-routed packets, and forwarding of directed broadcasts
ndd -set /dev/ip ip_forward_src_routed 0 #(Default value already set)
ndd -set /dev/tcp tcp_rev_src_routes 0 #(Default value already set)
ndd -set /dev/ip ip_forward_directed_broadcasts 0 #(Default value already set)
###Enlarge the half-open (q0) and completed (q) connection queues
ndd -set /dev/tcp tcp_conn_req_max_q0 4096 #(Default value 1024)
ndd -set /dev/tcp tcp_conn_req_max_q 1024 #(Default value 128)
###Prevent System responding to ICMP timestamp requests
ndd -set /dev/ip ip_respond_to_timestamp 0 #(Default value already set)
###Prevent System responding to ICMP timestamp Broadcast
ndd -set /dev/ip ip_respond_to_timestamp_broadcast 0 #(Default value already set)
ndd -set /dev/ip ip_respond_to_address_mask_broadcast 0 #(Default value already set)
ndd -set /dev/ip ip_respond_to_echo_broadcast 0
ndd -set /dev/arp arp_cleanup_interval 60000
ndd -set /dev/ip ip_ire_arp_interval 60000
ndd -set /dev/ip ip_ignore_redirect 1
ndd -set /dev/ip ip_strict_dst_multihoming 1
ndd -set /dev/ip ip_send_redirects 0
########END of TCP/IP script###############
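When re-running a script like the one above, it can be useful to see which values actually changed. The following is a minimal sketch that reads the current value with ndd -get before setting it; set_tunable is a hypothetical helper name, and the messages it prints are made up for illustration.

```sh
# Set an ndd tunable only when its current value differs, and report
# what happened.
# Usage: set_tunable <device> <parameter> <value>
# set_tunable is a hypothetical helper built on ndd -get and ndd -set.
set_tunable() {
    dev=$1
    param=$2
    want=$3
    have=`ndd -get "$dev" "$param"` || return 1
    if [ "$have" = "$want" ]; then
        echo "$param already $want"
    else
        ndd -set "$dev" "$param" "$want" &&
            echo "$param changed from $have to $want"
    fi
}
```

A call such as set_tunable /dev/tcp tcp_conn_req_max_q 1024 would then replace the corresponding plain ndd -set line. Like ndd itself, it has to be run as root, and the change is still temporary.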
Enjoy
Eilfa Team