Different Implementation of Network Level in Embedded Networking with QoS

Nadezhda Matveeva, Elena Suvorova
Saint-Petersburg State University of Aerospace Instrumentation
Saint-Petersburg, Russian Federation
n.matveeva88@gmail.com, suvorova@aanet.ru

Abstract—Some modern space-industry standards used in embedded networking designs provide quality of service features implemented by means of virtual channels. Implementations of the virtual channel mechanism differ considerably. Each implementation has its own packet-flow latency characteristics, hardware cost and performance. These parameters depend on the number of virtual channels in a port and on the number of switch matrix channels connected to every port (connection points). The number of connection points can vary from one to the number of virtual channels in the port. We consider three structures and implementations of the network layer. In the first, the number of connection points is equal to the number of virtual channels in a port. In the second, there is one connection point. In the third, there is one connection point and transmission of lower priority data can be interrupted. In this article we compare the characteristics of these architectures and the structures of port controllers and the switch matrix. We also analyze and simulate the proposed mechanism. We present formulas for the minimum and maximum data packet transmission latency and compare theoretical and simulation results. The simulation uses 4 virtual channels and packet lengths of 250 and 750 bytes. In addition, the hardware cost of the router's switch matrix is evaluated.

Keywords—Embedded networking, SpaceFibre, Virtual channel.

I. INTRODUCTION

Performance of modern embedded systems depends on network architecture and structure. Existing embedded networks support data transmission with Quality of Service (QoS) [1]. Many different standards are currently used in network design, for example RapidIO [2] and SpaceWire [3]. Such standards support data transmission over virtual channels.

For our research we chose different approaches to implementing virtual channels [4]. The first allows data from different virtual channels of a port to be transferred at the same time. In the second, only one virtual channel of a port can transfer data at a time. In the third, a virtual channel with higher priority can interrupt the transmission of lower priority data. These approaches are not tied to a specific standard and can be used in the construction of different embedded networking technologies [5].

We use SpaceFibre in our case study. SpaceFibre is a modern standard in the space industry. This technology can also be used for the construction of embedded networks.

SpaceFibre provides a coherent quality of service (QoS) mechanism able to support best effort, bandwidth reserved, scheduled and priority based qualities of service [6]. Quality of service parameters [7] that can be provided by routers with SpaceFibre ports depend not only on the SpaceFibre protocol characteristics and the specific port implementation but also on the network layer implementation. In this article we analyze different implementations of the SpaceFibre network layer.

II. STRUCTURES OF NETWORK LEVEL

A. 1st way of router’s network layer structure

In this way, the router's switch matrix includes a separate channel connecting each input virtual channel with the corresponding output virtual channel. The number of connection points to the switch matrix (hereinafter, connection points) for every port of a router is equal to the number of virtual channels in this port, Fig. 1 (only one data transmission direction is shown). This way was recommended by the SpaceWire-RT specification draft [8]. In such a router structure data flows can compete with each other only within one virtual channel in the output port of the router. In this case timing characteristics in the network layer depend only on arbitration rules. In all other cases timing characteristics of data flows are not influenced by the router network layer. However, such a router structure results in a substantial hardware cost.

B. 2nd way of router’s network layer structure

According to this router structure, the number of connection points for every port is less than the number of virtual channels in the port. There is one connection point. For our research we suppose that data flows from every virtual channel can be transmitted via one connection point of the corresponding port, Fig. 2. The hardware cost of this router structure is substantially less than that of the previous one. But in this way, data flows from different virtual channels share switch matrix channels. Therefore, the mutual influence of data flows and the resulting disturbance of their timing characteristics are more significant in this router structure than in the previous one.

**Fig. 1** The first way of router's network layer implementation

**Fig. 2** The second way of router's network layer implementation

C. 3rd way of router’s network layer structure

This router structure is similar to the 2nd way. The difference is that transmission of lower priority data can be interrupted. The interruption condition can vary. For example, packet transmission can be interrupted after N bytes have been transferred.
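The effect of interruption on a high-priority packet can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `blocking_time_ns`, the rule that interruption is allowed only at multiples of `n_interrupt` bytes, and the example numbers are all assumptions made for illustration.

```python
def blocking_time_ns(way, low_packet_bytes, bytes_already_sent,
                     byte_time_ns, n_interrupt=None):
    """Time a newly arrived high-priority packet waits behind a
    low-priority packet that is already using the connection point."""
    remaining = low_packet_bytes - bytes_already_sent
    if way == 2:
        # 2nd way: no interruption, the rest of the low-priority
        # packet is transmitted first.
        return remaining * byte_time_ns
    if way == 3:
        # 3rd way (assumed rule): the low-priority packet may be
        # interrupted at the next multiple of n_interrupt bytes.
        to_boundary = (-bytes_already_sent) % n_interrupt
        return min(remaining, to_boundary) * byte_time_ns
    raise ValueError("way must be 2 or 3")


# Hypothetical case: 750-byte low-priority packet, 100 bytes already sent,
# 8 ns per byte, interruption allowed every 64 bytes.
wait_way2 = blocking_time_ns(2, 750, 100, 8)
wait_way3 = blocking_time_ns(3, 750, 100, 8, n_interrupt=64)
print(wait_way2, wait_way3)
```

Under these assumptions the waiting time in the 3rd way is bounded by N byte times, while in the 2nd way it grows with the length of the low-priority packet.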

III. THEORETICAL PARAMETERS EVALUATION

Maximum and minimum delays are calculated for the proposed 2nd way router structure. The following assumptions were made during calculations: the packet size for a virtual channel is the same for every source; for every virtual channel data transmission is enabled in every time slot; Nchars are written to the TX and RX buffers of each port in the same amount of time; the packet size for every virtual channel is less than the frame size; the frame size is less than the buffer size for every virtual channel; for every port of a device has the same value.

Notation:

- \( k \) - an identifier of a node (a terminal node or a router);
- \( l \) - an identifier of a link;
- \( p \) - an identifier of a port;
- \( h \) - an identifier of a virtual channel with the highest priority;
- \( sizeF \) - the frame size in bytes;
- \( sizeP_{VCi} \) - a packet size for the virtual channel \( i \) in bytes;
- \( sizeB_{VCi} \) - a buffer size for the virtual channel \( i \) in bytes;
- \( countSw_{VCi} \) - the number of routers which should be passed for transmission of data of the virtual channel \( i \);
- \( countLink_{VCi} \) - the number of links which should be passed for transmission of data of the virtual channel \( i \).

\[
\text{countLink}_{VCi} = \text{countSw}_{VCi} + 1
\]

\( v_i \) - a data rate in the link \( i \), Gb/s.

\( Tbyte_i \) - the transmission time of 1 Nchar (1 byte) through the link \( i \).

\[
Tbyte_i = \frac{1}{v_i}
\]

\( f_k \) - an operating frequency of the node \( k \), MHz.

\( \text{minDelay}_{VCi} \) - the minimal packet’s transmission delay for the virtual channel \( i \) for the whole transmission path;

\( \text{maxDelay}_{VCi} \) - maximal packet’s transmission delay for virtual channel \( i \) for the whole transmission path;

\( \{\text{Source}_{VCi}\} \) - a set of source nodes for the virtual channel \( i \);

\( \{\text{Destin}_{VCi}\} \) - a set of destination nodes for the virtual channel \( i \);

\( \text{TwrByte}_{TX,k} \) - time of writing of 1 Nchar into the TX buffer of the node \( k \);

\( \text{TwrByte}_{RX,k} \) - time of writing of 1 Nchar into the RX buffer of the node \( k \) (for this implementation it is equal to the transmission time of 1 byte through the SpaceFibre link, i.e. \( \text{TwrByte}_{RX,k} = Tbyte_i \)).

\( TcalcPrec_k \) - time of the Precedence calculation for all virtual channels in node \( k \). This parameter is defined by the developer of the system.

\( \text{DelaySwMatrix}_{k} \) - the delay of accessing the routing table and selecting connection points in the router with identifier \( k \). This time is necessary to connect the input port with the output port for data transmission.

A. Calculation of the minimum data transmission delay for the virtual channel \( i \)

\( \{\text{minLink}_{VCi}\} \) - a set of physical links, which constitute the shortest data transmission path for the virtual channel \( i \).

\( \{\text{minSw}_{VCi}\} \) - a set of routers, which constitute the shortest data transmission path for the virtual channel \( i \).

\( \text{minDelaySource}_{VCi} \) - the minimal processing delay in a packet's source of the virtual channel \( i \):

\[
\text{minDelaySource}_{VCi} = \min_{k \in \{\text{Source}_{VCi}\}} \left( sizeP_{VCi} \cdot \text{TwrByte}_{TX,k} + TcalcPrec_k \right)
\]

\( \text{minDelayDestin}_{VCi} \) - the minimal processing delay in a receiver of the virtual channel \( i \):

\[
\text{minDelayDestin}_{VCi} = \min_{k \in \{\text{Destin}_{VCi}\}} \left( sizeP_{VCi} \cdot \text{TwrByte}_{RX,k} \right)
\]

\( \text{minDelaySw}_{k,VCi} \) - the minimal delay in a router \( k \) for packets of the virtual channel \( i \). We assume that there is no competition between packets of one virtual channel and that different virtual channels do not compete in the router's output port.

\[
\text{minDelaySw}_{k,VCi} = sizeP_{VCi} \cdot \text{TwrByte}_{RX,k} + \text{DelaySwMatrix}_{k} + TcalcPrec_k + sizeP_{VCi} \cdot \text{TwrByte}_{TX,k}
\]

\[
\text{minDelay}_{VCi} = \text{minDelaySource}_{VCi} + \sum_{k \in \{\text{minSw}_{VCi}\}} \text{minDelaySw}_{k,VCi} + \text{minDelayDestin}_{VCi}
\]
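The minimum-delay formulas can be evaluated with a short Python sketch. The `Node` class and the function names are hypothetical helpers introduced only for illustration. The example values (250-byte packet, 8 ns per byte, 16 ns for both the precedence calculation and the switch matrix delay, one router on the path) are taken from the parameter set used in Section V.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Timing parameters of a terminal node or router (all in ns)."""
    twr_byte_tx_ns: float            # TwrByte_TX,k
    twr_byte_rx_ns: float            # TwrByte_RX,k
    tcalc_prec_ns: float             # TcalcPrec_k
    delay_sw_matrix_ns: float = 0.0  # DelaySwMatrix_k (routers only)


def min_delay_source(size_p, sources):
    # min over source nodes of sizeP * TwrByte_TX + TcalcPrec
    return min(size_p * n.twr_byte_tx_ns + n.tcalc_prec_ns for n in sources)


def min_delay_destin(size_p, destinations):
    # min over destination nodes of sizeP * TwrByte_RX
    return min(size_p * n.twr_byte_rx_ns for n in destinations)


def min_delay_sw(size_p, router):
    # receive the packet, look up the route, calculate precedence, send it on
    return (size_p * router.twr_byte_rx_ns + router.delay_sw_matrix_ns
            + router.tcalc_prec_ns + size_p * router.twr_byte_tx_ns)


def min_delay(size_p, sources, routers_on_shortest_path, destinations):
    return (min_delay_source(size_p, sources)
            + sum(min_delay_sw(size_p, r) for r in routers_on_shortest_path)
            + min_delay_destin(size_p, destinations))


terminal = Node(8, 8, 16)
router = Node(8, 8, 16, delay_sw_matrix_ns=16)
total_ns = min_delay(250, [terminal], [router], [terminal])
print(total_ns)
```

With these values the sketch gives 2016 ns in the source, 4032 ns in the router and 2000 ns in the destination.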

B. Calculation of the maximum data transmission delay for the virtual channel \( i \)

\( \{\text{maxLink}_{VCi}\} \) - a set of links, which constitute the longest data transmission path for the virtual channel \( i \).

\( \{\text{maxSw}_{VCi}\} \) - a set of routers, which constitute the longest data transmission path for the virtual channel \( i \).

\( \{\text{allVC}_{p}\} \) - a set of virtual channels, which are supported in the port with identifier \( p \) of a node.

\( \{\text{allCHighPriority}_{i}\} \) - a set of virtual channels, which are supported in the port with identifier \( p \) of a node and have a higher priority than the priority of the virtual channel \( i \).

\( \{\text{allPort}_{VCi}\} \) - a set of node's ports which support data transmission via the virtual channel \( i \).

\( \text{maxDelaySource}_{VCi} \) - the maximum processing delay in a packet's source of the virtual channel \( i \):

\[
\text{maxDelaySource}_{VCi} = \max_{k \in \{\text{Source}_{VCi}\}} \left( \max_{p \in \{\text{allPort}_{VCi}\}} \left( sizeP_{VCi} \cdot \text{TwrByte}_{TX,k} + TcalcPrec_k + \sum_{j \in \{\text{allVC}_{p}\},\, j \neq i} \left( sizeP_{VCj} \cdot \text{TwrByte}_{TX,k} + TcalcPrec_k \right) \right) \right)
\]

\( \text{maxDelayDestin}_{VCi} \) - the maximum processing delay in a destination node for packets of the virtual channel \( i \):

\[
\text{maxDelayDestin}_{VCi} = \max_{k \in \{\text{Destin}_{VCi}\}} \left( sizeP_{VCi} \cdot \text{TwrByte}_{RX,k} \right)
\]

\( \text{maxDelaySw}_{k,VCi} \) - the maximum delay in a router \( k \) for packets of the virtual channel \( i \) for the case when the packets of one virtual channel and the packets of different virtual channels compete for the switch output port.

\[
\text{maxDelaySw}_{k,VCi} = sizeP_{VCi} \cdot \text{TwrByte}_{RX,k} + \text{DelaySwMatrix}_{k} + TcalcPrec_k + \left( \left| \{\text{allPort}_{VCi}\} \right| - 1 \right) \cdot \left( sizeP_{VCi} \cdot \text{TwrByte}_{TX,k} + TcalcPrec_k \right)
\]

\[
\text{maxDelay}_{VCi} = \text{maxDelaySource}_{VCi} + \sum_{k \in \{\text{maxSw}_{VCi}\}} \text{maxDelaySw}_{k,VCi} + \text{maxDelayDestin}_{VCi}
\]
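The maximum-delay formulas for an arbitrary virtual channel can be sketched in Python in the same spirit. The `Node` class, its `ports` field (one set of virtual channel identifiers per port) and the example topology are assumptions made for illustration: one source with a single port, one router with 4 ports, 4 virtual channels with 250-byte packets, 8 ns per byte and 16 ns for the precedence calculation and the switch matrix delay.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    twr_byte_tx_ns: float            # TwrByte_TX,k
    twr_byte_rx_ns: float            # TwrByte_RX,k
    tcalc_prec_ns: float             # TcalcPrec_k
    delay_sw_matrix_ns: float = 0.0  # DelaySwMatrix_k (routers only)
    ports: list = field(default_factory=list)  # one set of VC ids per port


def ports_with_vc(node, vc):
    """{allPort_VCi}: ports of the node that support virtual channel vc."""
    return [p for p in node.ports if vc in p]


def max_delay_source(vc, size_p, sources):
    worst = 0
    for node in sources:
        for port_vcs in ports_with_vc(node, vc):
            own = size_p[vc] * node.twr_byte_tx_ns + node.tcalc_prec_ns
            # packets of all other virtual channels of the port go first
            others = sum(size_p[j] * node.twr_byte_tx_ns + node.tcalc_prec_ns
                         for j in port_vcs if j != vc)
            worst = max(worst, own + others)
    return worst


def max_delay_destin(vc, size_p, destinations):
    return max(size_p[vc] * n.twr_byte_rx_ns for n in destinations)


def max_delay_sw(vc, size_p, router):
    competitors = len(ports_with_vc(router, vc)) - 1
    return (size_p[vc] * router.twr_byte_rx_ns + router.delay_sw_matrix_ns
            + router.tcalc_prec_ns
            + competitors * (size_p[vc] * router.twr_byte_tx_ns
                             + router.tcalc_prec_ns))


def max_delay(vc, size_p, sources, routers_on_longest_path, destinations):
    return (max_delay_source(vc, size_p, sources)
            + sum(max_delay_sw(vc, size_p, r) for r in routers_on_longest_path)
            + max_delay_destin(vc, size_p, destinations))


all_vcs = {1, 2, 3, 4}
size_p = {vc: 250 for vc in all_vcs}
terminal = Node(8, 8, 16, ports=[all_vcs])
router = Node(8, 8, 16, delay_sw_matrix_ns=16, ports=[all_vcs] * 4)
total_ns = max_delay(4, size_p, [terminal], [router], [terminal])
print(total_ns)
```

In this hypothetical configuration the source term is 8064 ns, the router term 8080 ns and the destination term 2000 ns.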

C. Calculation of the maximum/minimum data transmission delay for the virtual channel with the highest priority.

The following restrictions were made during calculations: for every virtual channel data transmission is enabled in every time slot; all routers contain only one connection point for each port. The connection point is shared by all virtual channels of the corresponding port.

\( \text{minDelay}_{VCh} \) - the minimal packet transmission delay for the virtual channel with the highest priority for the whole transmission path.

\( \text{maxDelay}_{VCh} \) - the maximal packet transmission delay for the virtual channel with the highest priority for the whole transmission path.

The value of the minimal packet transmission delay for the virtual channel with the highest priority is equal to the value of the minimal packet transmission delay for the virtual channel of an arbitrary priority:

\[
\text{minDelay}_{VCh} = \text{minDelay}_{VCi}
\]

The value of the maximal packet transmission delay for the virtual channel with the highest priority is not equal to the maximal packet transmission delay for the virtual channel of an arbitrary priority.

\( \text{maxDelaySource}_{VCh} \) - the maximal packet processing delay for the virtual channel with the highest priority in a source node. We assume that the frame of the lower priority packet is already being transmitted.

\[
\text{maxDelaySource}_{VCh} = \max_{k \in \{\text{Source}_{VCh}\}} \left( \max_{p \in \{\text{allPort}_{VCh}\}} \left( sizeP_{VCh} \cdot \text{TwrByte}_{TX,k} + (sizeF - 1) \cdot \text{TwrByte}_{TX,k} + TcalcPrec_k \right) \right)
\]

\( \text{maxDelayDestin}_{VCh} \) - the maximal packet processing delay in a destination node for the highest priority virtual channel. It is equal to the maximal packet processing delay in a destination node for a virtual channel of an arbitrary priority:

\[
\text{maxDelayDestin}_{VCh} = \text{maxDelayDestin}_{VCi}
\]

\( \text{maxDelaySw}_{k,VCh} \) - the maximal delay in a router \( k \) for packets from the virtual channel with the highest priority. We assume that the frame of the lower priority packet is already being transmitted and that there is competition between the packets of virtual channels of the same priority in the output port of the router.

\[
\text{maxDelaySw}_{k,VCh} = sizeP_{VCh} \cdot \text{TwrByte}_{RX,k} + \text{DelaySwMatrix}_{k} + TcalcPrec_k + (sizeF - 1) \cdot \text{TwrByte}_{TX,k} + \left( \left| \{\text{allPort}_{VCh}\} \right| - 1 \right) \cdot \left( sizeP_{VCh} \cdot \text{TwrByte}_{RX,k} + TcalcPrec_k \right)
\]

\[
\text{maxDelay}_{VCh} = \text{maxDelaySource}_{VCh} + \sum_{k \in \{\text{maxSw}_{VCh}\}} \text{maxDelaySw}_{k,VCh} + \text{maxDelayDestin}_{VCh}
\]

IV. RESULTS OF TIMING CHARACTERISTICS

A. Network model

Timing characteristics were estimated on the basis of the models depicted in Fig. 3.

Network model 1 comprises a router with 4 ports, each of which can work with 4 virtual channels. Terminal nodes generate packets at random moments in time. At these moments the terminal node sends the generated packets to each virtual channel. The destination nodes for each virtual channel are also chosen randomly and can differ between virtual channels. This configuration can lead to competition between data packet flows in the output port.

B. Simulation

The network was simulated on an adapted DCNSimulator model. DCNSimulator is based on Qt and SystemC. It consists of the simulation engine and libraries of network components. The simulation engine is the general part that can be used to simulate any network. Libraries of network components are specific to particular network standards and could represent network components.

In this case we used the router and node models which comprise only the Virtual Channel and the Network Layers (this gave an opportunity to reduce the simulation time and to obtain more detailed results). The link bandwidth in the model is set to 1 Gbit/s.

The results of the simulation can significantly depend on the router model implementation features such as local clock frequency and link capacity within the router.

C. Estimation of achievable characteristics of the Network model 1

Let us consider the case when each virtual channel has its own priority level, which corresponds to the virtual channel number: VC1 has the highest priority and VC4 the lowest. The packet length does not exceed the frame length. Fig. 4 - Fig. 11 show the simulation results for the 1st, 2nd and 3rd ways of router implementation for each virtual channel when the data packet size is 250 bytes.

The 2nd implementation of the network layer is distinguished by a large delay of high priority packets.

Fig. 12 - Fig. 19 show the simulation results for the 1st, 2nd and 3rd ways of router implementation for each virtual channel when the data packet size is 750 bytes. The delay is bigger for the 2nd way of router implementation than for the 1st and 3rd ways.

Fig. 20 - Fig. 23 show the simulation results for the 1st, 2nd and 3rd ways of router implementation for each virtual channel when the data packet size is 750 bytes and the packet generation time follows an exponential distribution. In the 3rd way the transmission of lower priority data is interrupted.

Fig. 4 Comparison of the packet transmission time via VC1 (the packet size = 250 bytes) in case of different implementations of network layer.

Fig. 5 Comparison of the packet transmission time via VC2 (the packet size = 250 bytes) in case of different implementations of network layer.

Fig. 6 Comparison of the packet transmission time via VC3 (the packet size = 250 bytes) in case of different implementations of network layer.

Fig. 7 Comparison of the packet transmission time via VC4 (the packet size = 250 bytes) in case of different implementations of network layer.

Fig. 8 Bar chart of the average packet transmission time via VC1 (the packet size = 250 bytes).

Fig. 9 Bar chart of the average packet transmission time via VC2 (the packet size = 250 bytes).

Fig. 10 Bar chart of the average packet transmission time via VC3 (the packet size = 250 bytes).

Fig. 11 Bar chart of the average packet transmission time via VC4 (the packet size = 250 bytes).

Fig. 12 Comparison of the packet transmission time via VC1 (the packet size = 750 bytes) in case of different implementations of network layer.

Fig. 13 Comparison of the packet transmission time via VC2 (the packet size = 750 bytes) in case of different implementations of network layer.

Fig. 14 Comparison of the packet transmission time via VC3 (the packet size = 750 bytes) in case of different implementations of network layer.

Fig. 15 Comparison of the packet transmission time via VC4 (the packet size = 750 bytes) in case of different implementations of network layer.

Fig. 16 Bar chart of the average packet transmission time via VC1 (the packet size = 750 bytes).

Fig. 17 Bar chart of the average packet transmission time via VC2 (the packet size = 750 bytes).

Fig. 18 Bar chart of the average packet transmission time via VC3 (the packet size = 750 bytes).

Fig. 19 Bar chart of the average packet transmission time via VC4 (the packet size = 750 bytes).

V. COMPARISON OF THEORETICAL AND SIMULATION RESULTS

The minimum and maximum delays for a packet of the virtual channel with the highest priority were calculated using the formulas of Section III. The parameter values are: $sizeP_{VC_1} = 250$ bytes; $TwrByteRX_k = TwrByteTX_k = 8$ ns; $sizeF = 256$ bytes; $TcalcPrec_k = 16$ ns; $DelaySwMatrix_k = 16$ ns.

We calculate the delay parameters:

- $minDelaySource_{VC_1} = 250 \cdot 8 + 16 = 2016$ ns;
- $minDelayDestin_{VC_1} = 250 \cdot 8 = 2000$ ns;
- $minDelaySw_{k,VC_1} = 250 \cdot 8 + 16 + 16 + 250 \cdot 8 = 4032$ ns;
- $minDelay_{VC_1} = 2016 + 4032 + 2000 = 8048$ ns;
- $maxDelaySource_{VC_1} = 250 \cdot 8 + 255 \cdot 8 + 16 = 4056$ ns;
- $maxDelayDestin_{VC_1} = 250 \cdot 8 = 2000$ ns;
- $maxDelaySw_{k,VC_1} = 250 \cdot 8 + 16 + 16 + 255 \cdot 8 + 3 \cdot (250 \cdot 8 + 16) = 10120$ ns;
- $maxDelay_{VC_1} = 4056 + 2000 + 10120 = 16176$ ns.
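The arithmetic in this list can be reproduced with a few lines of Python. This is only a sketch: the variable and function names are hypothetical, a single router on the path is assumed, and the factor 3 in the router term is read as the number of ports supporting the channel minus one for the 4-port router of Network model 1.

```python
SIZE_P = 250        # packet size of VC1, bytes
SIZE_F = 256        # frame size, bytes
TWR_BYTE_RX = 8     # ns per byte written to the RX buffer
TWR_BYTE_TX = 8     # ns per byte written to the TX buffer
TCALC_PREC = 16     # ns, precedence calculation
DELAY_SW = 16       # ns, switch matrix delay
N_PORTS = 4         # assumed number of router ports supporting VC1


def min_delay_vc1():
    source = SIZE_P * TWR_BYTE_TX + TCALC_PREC
    switch = (SIZE_P * TWR_BYTE_RX + DELAY_SW + TCALC_PREC
              + SIZE_P * TWR_BYTE_TX)
    destin = SIZE_P * TWR_BYTE_RX
    return source, switch, destin, source + switch + destin


def max_delay_vc1():
    # a lower-priority frame (up to sizeF - 1 bytes) is already on the wire
    source = SIZE_P * TWR_BYTE_TX + (SIZE_F - 1) * TWR_BYTE_TX + TCALC_PREC
    switch = (SIZE_P * TWR_BYTE_RX + DELAY_SW + TCALC_PREC
              + (SIZE_F - 1) * TWR_BYTE_TX
              + (N_PORTS - 1) * (SIZE_P * TWR_BYTE_RX + TCALC_PREC))
    destin = SIZE_P * TWR_BYTE_RX
    return source, switch, destin, source + switch + destin


print(min_delay_vc1())
print(max_delay_vc1())
```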

Fig. 20 Bar chart of the average packet transmission time via VC1 (the packet size = 750 bytes). Exponential distribution of packet generation time.

Fig. 21 Bar chart of the average packet transmission time via VC2 (the packet size = 750 bytes). Exponential distribution of packet generation time.

Fig. 22 Bar chart of the average packet transmission time via VC3 (the packet size = 750 bytes). Exponential distribution of packet generation time.

Fig. 23 Bar chart of the average packet transmission time via VC4 (the packet size = 750 bytes). Exponential distribution of packet generation time.

Fig. 24 Comparison of the packet transmission time via VC1 (the packet size = 250 bytes) in case of simulation and theoretical results.

The theoretical minimum delay for a packet of the virtual channel with the highest priority is equal to the simulation result. The theoretical maximum delay is greater than the simulated delay, because the system does not operate at maximum load.

VI. HARDWARE COSTS

We used Cadence RTL Compiler and Encounter with the UMC 120 nm technology library to evaluate the hardware cost of the router's switch matrix. We performed logical and physical synthesis of switch matrices with different numbers of channels that correspond to different router implementations (different numbers of ports and connection points).

Results of the logical synthesis are presented in Fig. 25. As the figure shows, if the number of connection points is bigger than 4, the hardware cost grows substantially. Logical synthesis becomes impossible when the number of ports is 16 and the number of virtual channels is 16 or more (256 channels of the switch matrix). Physical synthesis is problematic if the number of ports is bigger than 8 and the number of virtual channels is bigger than 8 (64 channels of the switch matrix). This number of switch matrix channels is the boundary of hardware resources for the 1st way of router structure. The 2nd way can be implemented with a greater number of virtual channels if 2 to 4 connection points are used for every port. Thus the hardware implementation of the 1st way of router structure is substantially constrained.
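The channel counts quoted above follow from multiplying the number of ports by the number of connection points per port. A minimal Python sketch, with a hypothetical helper name, makes the comparison between the router structures explicit:

```python
def switch_matrix_channels(ports, connection_points_per_port):
    """Number of switch matrix channels for a router configuration."""
    return ports * connection_points_per_port


# 1st way: one connection point per virtual channel.
way1_16x16 = switch_matrix_channels(16, 16)
way1_8x8 = switch_matrix_channels(8, 8)
# 2nd way with a single connection point per port, for comparison.
way2_16 = switch_matrix_channels(16, 1)
print(way1_16x16, way1_8x8, way2_16)
```

For the 1st way the number of connection points equals the number of virtual channels, so the channel count grows with the product of ports and virtual channels, whereas for the 2nd way it depends only on the number of ports and the chosen number of connection points.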

Fig. 25 The switch matrix hardware cost: switch matrix area (mm2) vs. connection points quantity

VII. CONCLUSION

According to our investigation, the 1st way of router organization is limited in hardware implementation. The comparison of achievable timing characteristics for the different ways of router implementation showed that if the packet size is smaller than the frame size, the average packet transmission time is almost the same for all three ways.

The delay of low priority traffic grows faster for the 2nd way of router implementation. Therefore, the 2nd way can be used for networks with packet lengths shorter than the frame size. In this case it will provide scheduled, bandwidth reserved and priority qualities of service. With the 2nd way, packet lengths larger than the frame size degrade the timing characteristics in comparison with the 1st and 3rd ways. This degradation grows proportionally to the packet length of the low priority virtual channels. Consequently, the 2nd way can be used in networks where long packets are transmitted only when there are no hard real-time requirements. The 3rd way of router implementation substantially reduces these disadvantages. The average packet transmission time and achievable link utilization in this case are almost the same as for the 1st way.

When the packet length is shorter than the frame size, the delay is 10% bigger for the 2nd way of router implementation than for the 1st way, and 1% bigger for the 3rd way than for the 1st way. When the packet length is longer than the frame size, the delay is 50% bigger for the 2nd way than for the 1st way, and 17% bigger for the 3rd way than for the 1st way. Therefore, the achievable characteristics for the scheduled service and the delay value are worse for the 2nd way of router implementation.

ACKNOWLEDGMENT

The research leading to these results has received funding from the Ministry of Education and Science of the Russian Federation under grant agreement no 13.G36.31.0003.

REFERENCES