********************************************************************************
ATM Forum Document Number: 97-0835
********************************************************************************
Title: Proposed Appendix B of Performance Testing Baseline Text on Scalable
Configurations
********************************************************************************
Abstract: A standardized methodology for implementing scalable configurations
for throughput and latency tests is presented.
********************************************************************************
Source:
Arjan Durresi, Raj Jain, Justin Dolske, Gojko Babic
The Ohio State University
Department of CIS Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org
The presentation of this contribution at the ATM Forum is sponsored by NASA.
********************************************************************************
Date: September 1997
********************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TEST, AF-TM)
********************************************************************************
Notice:
This contribution has been prepared to assist the ATM Forum. It is offered to
the Forum as a basis for discussion and is not a binding proposal on the part
of any of the contributing organizations. The statements are subject to
change in form and content after further study. Specifically, the contributors
reserve the right to add to, amend or modify the statements contained herein.
********************************************************************************
A postscript version of this contribution including all figures and
tables has been uploaded to the ATM forun ftp server in the incoming
directory. It may be moved from there to the atm97
directory. The postscript version is also available on our web page as:
http://www.cse.wustl.edu/~jain/atmf/a97-0835.htm
Appendix B
Methodology for Implementing Connection Configurations
B.1. Introduction
In Sections 3.1.7 and 3.2.7 of the baseline text, a number of configurations
have been presented for throughput and latency measurements. In their basic
form, these configurations require traffic generators and analyzers, whose number increases as the number of ports on a switch increases. Since the test monitors arerather expensive, it is desirable to define scalable configurations that canbe used with a limited number of generators. However, one problem with scalable configurations is that thereare many ways to set up the connections and the results could vary with the setup. In this appendix, a standard method for generating these configurations is defined. Thus, anyone can design a connection configuration for switches with any number of ports. Since the methodology presented here applies to any number of traffic generators, it can be used for non-scalable
(basic) configurations as well.
Performance testing requires two kinds of virtual channel connections (VCCs):
foreground VCCs (traffic that is measured) and background VCCs (traffic that
simply interferes with the foreground traffic). The methodology for
generating configurations of both types of VCCs is covered in this appendix.
The VCCs are formed by setting up connections between ports of the switch. The
connection order of these ports is referred to here as a VCC Chain. For
example, the VCC shown in Figure B.1 consists of one VCC chain passing through
ports P1-P2-P3- P4-1. Another possible configuration for this "N-to-N" single
generator scalable configuration would be P1-P3-P2-P4-P1. For an N-port
switch, there are a total of (N-1)! possible configurations.
Figure B.1 One out of six possible VCC chains that can implement the 4-to-4 straight configuration with a single generator.
If the four-port switch shown in Figure B.1 consists of two modules with two
ports each, the measured performance may depend upon the number of times the
VCC chain passes from one module to the other and may be different for
different configurations.
At the end of this appendix, the pseudocode for a computer program is presented
that allows generating a standardized port order for all connection
configurations. This methodology (pseudocode) generally creates VCC chains
that cross the modules as often as possible while still keeping the whole
process simple.
B. 2. Definitions and Rules
In order to generate a standard configuration, it is first necessary to have a
standard method of numbering the ports of a switch. This method is presented
in this section.
Consider a switch with several modules. Each module may have a varying number
of ports. In order to number these ports, the first step is to generate a
schematic of modules placed one below the other. The schematic is drawn such
that the modules are arranged in a decreasing order of number of ports. Then the switch ports are numbered sequentially, along the columns, starting from the top left corner of the schematic. This port numbering helps in creating VCC chains that cross modules as often as possible. The port numbers obtained this way are represented by Pi in this appendix.
Figure B.2 shows one example of port numbering. The switch consists of three
modules with 8, 7, and 6 ports respectively. The first port on the first
module is numbered 1 or P1. The first port on the second module is numbered
P2, and so on up to P18 as shown in the figure. For simplicity, we also refer
to Pi as port i.
[Figure B.2 Numbering of ports in a switch with different number of ports per module.]
The second thing we need is a standard method of presenting connection
configurations. Each VCC chain is represented by a three-dimensional matrix
C(i, j, k). Matrix index i represents the interconnection order among the
switch ports, where the value 0 indicates the source port and the last value
indicates the destination port. Index k represents the generator number and
index j represents the chain number starting at that generator. One row C(*,
j, k) of the matrix represents a single VCC chain. For example, if the first
VCC chain from generator #2 starts at source port P1, passes through ports P3,
P4, P5, P6, P7, P8, and exits at port P2, the matrix C has the following
entries: C(0,1,2)=P1, C(1, 1, 2)=P3, C(2, 1, 2)=P4, C(3, 1, 2)=P5, C(4, 1,
2)=P6, C(5, 1, 2)=P7, C(6, 1, 2)=P8, C(7, 1, 2)=P2. Figure B.3 illustrates
this VCC chain. The source port and destination ports are also represented by
symbols Cin and Cout. For the VCC of Figure B.3, Cin=P1, Cout=P2.
[Figure B.3 A VCC chain NP(k) denotes the total number of intermediate ports for a VCC chain generated by generator k. Notice that the source and destination ports are not counted. In the case of Figure B.3, NP(2) = 6. Note that C(NP+1, j, k) is always the destination port.
For latency measurements, the foreground traffic involves only two ports, one
for input and the other for output. To design the VCC chain for this traffic,
the operator may simply chose any two ports, referred to as CFin and CFout
respectively. Here, F in the subscript signifies "foreground." In order to
avoid interference with the foreground traffic, the background VCC chains may
or may not use CFin and CFout. If the background traffic does use these ports
then it should only be in the directions opposite to that used by the
foreground traffic. Figure B.4 shows a schematic representation of connection
configuration for latency measurement of a 8-port 2-module switch. The
foreground traffic uses ports P2 and P1 as the source and destination ports,
respectively. So, CFin=P2 and CFout=P1. The background traffic also uses
these ports but in the opposite direction. Therefore, for the background
traffic: Cin = C(0,1,2)=P1 and Cout=C(0,1,2)=P2. The background traffic can
use the six remaining ports in both directions. Incidently, Figure B.3 shown
earlier shows the VCC chain representation of this same configuration. From
now on, we only show VCC chain representions for all configurations. It is
straight forward to generate the schematic representations from it.
[Figure B.4. A 7-to-7 straight configuration with one-generator for the background traffic.]
B.2. Connection Configurations Characteristics.
In this section we analyze several of the configurations for throughput and
latency measurements and show how scalable version of them can be obtained
using the algorithm given in Section B.3. The algorithm consists of three simple rules:
1. The chains generally go from port i to port i+1 unless the port has already
been fully used by other chains.
2. After generating jth chain, j+1 chain can be generated simply by adding 1 to
each port index of the jth chain.
3. If there are multiple generators, each generator uses a contiguous subset of
the switch ports as source ports. Each generator needs as many source orts as the number of VCC chains starting from it.
B.2.1 N-to-N Straight (Single Generator)
This configuration is used for throughput as well as latency measurements. The
scalable versions can be obtained as follows:
a) Throughput measurements: For these tests, we need only a single chain
starting from a single generator, i.e., k=1 and j=1. The chain starts from
one port, goes through all other ports and exits from the starting port.
Therefore, NP(1) is equal to N-1. Cin and Cout coincide and any port Px could
be selected to be the input/output port
Figure B.5 illustrates this case for the 2-module 8-port switch. Figure B.5a
shows how to number the switch ports. Figure B.5b presents the VCC chain
representation of the configuration. using Cin=C(0,1,1) = Cout = C(8,1,1)
=P1.
The application of the algorithm is simple. The ports C(i,1,1) in the VCC
chain are selected in numerically increasing order. The ports are included in
VCC chain if they are not already used up. After reaching Nth port, the index
(i) starts again from the beginning (from i=1).
[Figure B.5a. Port numbering of a switch with 2 modules and 4 ports on each
module. The numbers in brackets indicate the port numbers in the module.]
For Cin=Cout=P1, the VCC chain is:P1-P2-P3-P4-P5-P6-P7-P8-P1
If we chose P3 as the source and destination then the VCC chain will be:
P3-P4-P5-P6- P7-P8-P1-P2-P3
[Figure B.5b. The 8-to-8 straight configuration with one generator. ]
Note that in both cases, the VCC chains cross the modules at every hop.
b) Latency Measurements: First, let us consider the case in which the
background traffic does use the source/destination ports of the foreground
traffic (but in the opposite direction). The background traffic passes
through all other ports. Therefore, NP(1) is equal to N-2 . Cin and Cout for
the background coincide respectively with PFout and PFin. If PFin=P2 and
PFout=1, the foreground chain is P2-P1 and the background chain is P1-P2-
P3-P4-P5-P6-P7-P8-P2. This connection configuration was presented earlier in
Figures B.3 and B.4.
Now, let us consider the case in which the background traffic does not use the
source/destination ports of the foreground. In this case, NP(1) is equal to
N-3. Cin and Cout coincide and could be selected any of the switch ports Px
except PFout and PFin. For example, the foreground could use the chain P2-P1
and background could use P3-P4-P5- P6-P7-P8-P3. Figure B.6 illustrates this
case.
[Figure B.6 Implementation of the 6-to-6 straight configuration with one
generator. ]
B.2.2. N-to-N Straight (r Generators)
This configuration implements the N-to-N straight configuration with r
generators. a) Throughput Measurements: Each generator has one VCC chain. In
all there are r VCC chains. Of the N ports, r ports are used as
source/destination of these chains. The remaining ports are divided among the
generators as evenly as possible.
Let p =mod(N-r, r)
For the first p VCC chains, the number of intermediate ports NP is equal to
the quotient of (N-r)/r plus 1
For the remaining (r-p) VCC chains, NP is equal to the quotient of (N-r)/r,
For all VCC chains, the source/destination ports coincide and may be
selected from any of the switch ports Px not selected by other VCC chains as
source or destination.
As an example, consider the 8-port switch again. With r=3 generators, p equals
mod(8-3, 3) = 2. So, the first two VCC chains have NP=?(8-3)/3? + 1 = 2
intermediate ports, and the last chain has NP is equal to the quotient of
(8-3)/3, so equal to 1.
Figures B.7 illustrates the implementation of the VCC chains for this case.
First we select the source and destination ports:
Port 1 is the source and destination for the first chain, so C(0,1,1)=P1, C(3,
1,1)=P1
Port 2 is the source and destination for the second chain, so C(0,1,2)=P2,
C(3,1,2)=P2 Port 3 is the source and destination for the third chain, so
C(0,1,3)=P3, C(2, 1,3)=P3. These selections have been made to avoid any
overlap.
Then we apply the algorithm. Let us start with the VCC chain having port 1 as
the source. The next port available is P4, so C(1,1,1)=P4, then C(2,1,1)=P5.
This VCC chain has two intermediate ports, for this reason it is now
complete. Now we continue with the VCC chain starting at port 2. The next
available port is port 6 (because 4 and 5 are fully occupied by the previous
VCC chain). So C(1,1,2)=P6, and then C(2,1,2)=P7. Similarly, C(1,1,3)=P8.
This VCC chain has only one intermediate port. The VCC chain implementation is complete.
[Figure B.7 Implementation of the 8-to-8 straight configuration with 3
generators .]
b) Latency Measurements: Consider the case with the background traffic using
the foreground ports in the opposite direction. The remaining N-1 ports are
evenly divided among the r background VCC chains.
Let p =mod(N-r-1, r)
For the first p VCC chains NP is equal to the quotient of (N-r-1)/r plus 1,
For the remaining (r-p) VCC chains, NP is equal to the quotient of
(N-r-1)/r
For one of VCC chains, Cin and Cout coincide with PFout and PFin
respectively. ? For the other VCC chains, Cin and Cout coincide and could be
selected from any of the switch ports Px not selected by other VCCs
Figure B.8 illustrates an example for this case. Ports 1 and 2 are used by the
foreground traffic as destination and source ports, respectively.
[Figure B.8 Implementation of the 7-to-7 straight configuration with 3
generators for background traffic in latency measurement. ]
Ports 1 and 2 will be used as source and destination ports (respectively) by
one of the background VCC chains. The other two generators will use port 3
and 4 as the source and destination ports, respectively. For the first VCC
chain, NP =2 and for the other two VCC chains NP=1. The chains are:
P1-P5-P6-P2, P3-P7-P3, and P4-P8-P4. Note that the first chain goes from P1 to
P5 since P2, P3, P4 since have already been assigned to other chains.
The configuration for the case when the background traffic does not share the
ports with the foreground can be generated by the above procedure by
considering the switch having only N-2 ports
B.2.3. N-to-m Partial Cross (r Generators)
This is a generalization of N-to-m Partial cross with 1 generator presented in
the baseline. The discussion here applies for r=1. Also, by appropriately
setting r, one can obtain non- scalable (basic) configurations.
a) Throughput Measurements: This configuration has m*r VCC chains originating
from r, where each generator originates m VCC chains. Each has a load of
1/mth of the generator. Each intermediate node has exactly m of these streams
flowing through it. Again, the ports are evenly divided among the chains.
However, since each chain uses only a part of the port's capacity, the ports
can be used by other chains even from other generators as well.
Let p =mod(N-r, r)
For the first p VCC chains, the number of intermediate ports NP is equal to
the quotient of (N-r)/r plus 1
For the remaining (r-p) VCC chains, NP is equal to the quotient of (N-r)/r,
For all m VCC chains, source/destination ports coincide and may be selected
from any of the switch ports Px not selected by other VCC chains.
Figure B.9 illustrates the case of 8-to-2 partial cross with 2 generators.
In this case, p =mod(8-2,2) = 0. So, the VCC chains of both generators have
Quotient of (8-2)/2, so 3 intermediate ports.
[Figure B.9 Implementation of 8-to-2 partial cross configuration with 2
generators for foreground traffic ]
Both of the VCC chains of the first generator start and end at port 1, so:
C(0,1,1)=C(0,2,1)= C(4,1,1)=C(4,2,1)=P1.
Similarly for the two VCC chains of the other generator:
C(0,1,2)=C(0,2,2)=C(4,1,2)=C(4,2,2)=P2.
First we divide the remaining ports among the two generators. The first
generator gets P3, P4, and P5. The second generator gets P6, P7, and P8.
The first chain of the first generator is simply P1-P3-P4-P5-P1.
The first chain of the second generator is P2-P6-P7-P8-P2. The second chain
from the first generator is obtained by shifting the intermediate ports of the
first chain. Therefore, the chain is P1-P4-P5-P6-P1. Note that this chain is
sharing port P6 of the other generator since each chain uses only half the
capacity. The second chain of the second generator is again obtained by
shifting: P2-P7-P8-P3-P2. Note that, shifting P8 would have produced P1 but P1
is being fully used. The next port P2 is also being fully used. So P3 is
used.
b)Latency measurements: Again we consider only the case of background traffic
sharing the foreground ports in the opposite direction. Excluding the
foreground port, the remaining N-1 ports are evenly divided among the r
generators.
Let p =mod(N-r-1, r)
For all VCCs of the first p generators NP is equal to the quotient of
(N-r)/r plus 1
For all VCCs of the remaining (r-p) generators, NP is equal to the quotient
of (N-r)/r
For all m VCCs of only one generator, source and destination coincide with
PFout and PFin respectively.
For all m VCCs of all other generators, source and destination coincide and
could be selected from any of the switch ports Px not selected by other generators.
An example of this case is shown in Figure B.10. In this case, N=8, r=2. So
p=mod(8-2-1,2) = 1 , so NP(1)=3 and NP(2)=2.
[Figure B.10 Implementation of 7-to-2 partial cross configuration with 2
generators for background traffic in latency measurements.]
The VCC chains of the first generator will use ports 1 and 2 in opposite
directions of the foreground traffic,. The VCC chains of the second generator
will use port 3 as the source and destination. The chains of the first
generator are: P1-P4-P5-P6-P2, P1-P5-P6-P7-P2. The chains of the second
generator are: P3-P7-P8-P3, P3-P8-P4-P3.
Table B.1 summarizes the values for number of intermediate ports in various
configurations of this section B.2. These values are used in the pseudocode of
Section B.3.
[Table B.1 Parameter values used in the algorithm to creating VCC
chains for different configurations.]
B 3. Algorithm for creating VCC Chains.
The algorithm for creating VCC chains for different connection configuration is
based on the definitions given in section B.1. and the characteristics
specified in section B.2 and summarized in Table B.1.
NP(k) denotes the number of intermediate ports for the VCC chains of the kth
generator. These values are specified in B.2.
P(f) denotes the fth port of the switch
C(i, j, k) denotes the ith intermediate port of the jth VCC chain of the kth
generator
The function mod*(x, N) is equal to mod(x, N) except for the cases where
mod(x, N) is equal to zero, where the function is equal to N
f = 1
For k = 1 to r, step 1
{
if(k>1){ f=1+sum of NP(d) for d = 1 to k-1
}
for j = 1 to m, step 1
{
if (r is equal to 1 and j > 1) f = mod*(f+1, N)
if (r>1 and j>1) f=C(2,j-1,k)
for i =1 to NP(k), step 1
{
while (P(f) is used as source or output by another VCC chain or is full)
{
f = mod*(f+1, N)
}
C(i, j, k) = P(f)
f = mod*(f+1, N)
} end for i
} end for j
} end for k.