What is self-similar traffic?
What is long-range dependance?
Estimating value of H
C++ code to measure Hurst parameter
Q & A
|
|
[ Top] |
|
Consider a cumulative process Y(t) with stationary increments,
and let Xt be its corresponding incremental
process:
|
| Xt = Y(t) - Y(t-1) |
(1)
|
|
For example Y(t) can represent the number of bytes arriving
up to time t, and
Xt can represent the number of bytes arriving
in one unit of time.
|
|
The process Xs(m) is an aggregated
process of Xt if:
|
|
Xs(m) = [Xsm-m+1 + X
sm-m+2
+ ... + Xsm]/m
| (2) |
|
Process Xt is said to be self-similar if Xt is indistinguishable from
Xs(m). This is a very
restrictive definition, especially considering the stochastic
nature of the network traffic. Usually, second-order self-similarity
is considered for the purposes of traffic description: auto-covariance functions of the original and aggregated processes should have
the same values.
|
| Let |
|
γ(k) = E[(Xt - μ)(Xt+k - μ)]
| (3) |
| and |
|
γ(m)(k) = E[(Xs(m) - μ)(X
s+k
(m) - μ)]
| (4) |
|
Then, the process Xt is exactly second-order
self-similar if
|
|
γ(m)(k) = γ(k)
| (5) |
|
and asymptotically second-order self-similar if
|
|
limm→∞ γ
(m)
(k) = γ(k)
| (6) |
|
A convenient measure of a process′s distributional
self-similarity is its Hurst parameter H. A
process is self-similar with parameter H (0 < H < 1) if:
|
|
Y(t) = k-H Y(kt), ∀ k > 0, t ≥ 0
| (7) |
|
i.e., the original and normalized aggregated processes should
have the same distribution. Translating it to the corresponding
stationary increment process, we get
|
|
Xs(m) = [Xsm-m+1 + X
sm-m+2
+ ... + Xsm]/m = [Y(sm) - Y(sm-m)]/m = [Y(m)
- Y(0)]/m
| (8) |
|
From Eq.(7), it follows that
Y(m) = mHY(1)
. Thus,
|
|
X(m) = [mH/m]*[Y(1) - Y(0)] = m
H-1
X
| (9) |
|
The self-similarity can be viewed as an ability of an aggregated
process to "preserve" the burstiness of the original process,
viz., the property of slowly decaying variance:
|
|
var(X(m)) ~ m2H-2
| (10) |
In the context of network traffic, this means that aggregating
traffic over large time intervals reduces the burstiness very
slowly (compared to non-self-similar traffic).
|
|
| [ Top] |
|
The property of long-range dependence refers to a non-summable
auto-correlation function ρ(k):
|
|
ρ(k) = γ(k) / σ2 = E[(X
t
- μ)(Xt+k - μ)] / E[(Xt -
μ)2]
| (11) |
|
For 0 < H < 0.5 and 0.5 < H < 1
|
|
ρ(k) ~ H(2H-1)k2H-2, k → ∞
| (12) |
|
It is clear, that if 0.5 < H < 1, then
|
|
∑∞ρ(k) = ∞
| (13) |
|
i.e., the process is long-range dependant.
|
|
The long-range dependence results from a heavy-tailed
distribution of the corresponding stochastic process.
Heavy-tailness refers to the rate of tail decay of the
complementary distribution function. In a heavy-tailed
distribution, the decay obeys the power law:
|
|
P[X > x] ~ cx-α, x →
∞, 1 < α < 2;
| (14) |
As a result, the probability of an extremely large observation
in the LRD process is non-negligible. In the context of
network traffic, this means that extremely large bursts of
data (packet trains) and extremely long periods of silence
(inter-arrival times) will occur from time to time. This is
one of the reasons why analytic models employing traditional
negative exponential distribution often provide overly
optimistic estimates for the delays and queue sizes –
the probability of an extreme event is negligible.
|
|
| [ Top] |
| From Eq. (9) we get |
|
var(X(m)) = m
2H-2
var(x)
| (15) |
| or |
|
log[var(X
(m)
) / var(x)] = (2H-2) log(m)
| (16) |
Eq. (16) suggests that log-log plot of variance vs.
aggregation level m should have a linear slope
of value 2H-2. The following figure shows a variance-time log-log plot which can be used to estimate the Husrt parameter of a
stochastic process. In a log-log plot, the
x
-axis represents the logarithm of the aggregation
parameter m and the y-axis
represents the logarithm of the normalized variance. The LRD traffic plot shows a linear dependency (except tail region) with
slope s value close to −0.4. From Eq. (16),
we expect the log-log plot to have a slope
s = 2H − 2
. Thus, the Hurst parameter is estimated to be
H = 1 + s/2 = 0.8
.
The second plot called SRD traffic was obtained
on the same generator, but using sub-streams with exponentially-distributed
on/off periods. This distribution possesses no long-range
dependence and its variance-time log-log plot is expected
to have a slope of −1 (
var(X(m)) ~ m-1
).

|
|
|
[ Top]
|
List of files:
-
_rand_MT.h
-
_types.h
-
_util.h
-
avltree.h.h
-
broadcom_pdf.h
-
MersenneTwister.h
-
stats.h
-
PolyFit.h
-
trf_gen_v3.h
-
var_time.h
-
vartime.cpp
|
This C++ program
measures synthetic
traffic produced by
Self-Similar traffic
Generator v.3
. It builds the
var-time log-log plot
and estimates the Hurst
parameter by finding
linear least-squares fit
to the measured graph
and calculating its
slope. The following
table shows the output
of the program.
|
| POINT # | WINDOW | LOG WINDOW
(X-coordinate) | | LRD VAR | LRD NORM VAR | LRD LOG NORM VAR
(Y-coordinate) | LRD LINEAR FIT
(Y-coordinate) | | SRD VAR | SRD NORM VAR | SRD LOG NORM VAR
(Y-coordinate) | SRD LINEAR FIT
(Y-coordinate) |
| 0 | 0.0001 | 0 | | 0.0628126 | 1 | 0 | 0.00746551 | | 0.0346508 | 1 | 0 | 0.0377225 |
| 1 | 0.000162378 | 0.210526 | | 0.0526957 | 0.838936 | -0.076271 | -0.0743866 | | 0.0227969 | 0.657904 | -0.181838 | -0.170089 |
| 2 | 0.000263665 | 0.421053 | | 0.0440054 | 0.700582 | -0.154541 | -0.156239 | | 0.0146055 | 0.421507 | -0.375196 | -0.3779 |
| 3 | 0.000428133 | 0.631579 | | 0.0366207 | 0.583016 | -0.234319 | -0.238091 | | 0.00920152 | 0.26555 | -0.575854 | -0.585712 |
| 4 | 0.000695193 | 0.842105 | | 0.0303168 | 0.482655 | -0.316363 | -0.319943 | | 0.00574394 | 0.165766 | -0.780503 | -0.793523 |
| 5 | 0.00112884 | 1.05263 | | 0.0250827 | 0.399326 | -0.398673 | -0.401795 | | 0.00357364 | 0.103133 | -0.986603 | -1.00133 |
| 6 | 0.00183298 | 1.26316 | | 0.0206973 | 0.329509 | -0.482133 | -0.483647 | | 0.00220538 | 0.063646 | -1.19623 | -1.20915 |
| 7 | 0.00297635 | 1.47368 | | 0.0170862 | 0.272018 | -0.565402 | -0.565499 | | 0.0013642 | 0.0393699 | -1.40484 | -1.41696 |
| 8 | 0.00483293 | 1.68421 | | 0.0141749 | 0.22567 | -0.646526 | -0.647351 | | 0.000839418 | 0.0242251 | -1.61574 | -1.62477 |
| 9 | 0.0078476 | 1.89474 | | 0.0117262 | 0.186685 | -0.72889 | -0.729203 | | 0.000516435 | 0.014904 | -1.8267 | -1.83258 |
| 10 | 0.0127427 | 2.10526 | | 0.00969033 | 0.154274 | -0.811708 | -0.811056 | | 0.000317155 | 0.0091529 | -2.03844 | -2.04039 |
| 11 | 0.0206914 | 2.31579 | | 0.00802201 | 0.127713 | -0.893763 | -0.892908 | | 0.000194784 | 0.00562135 | -2.25016 | -2.2482 |
| 12 | 0.0335982 | 2.52632 | | 0.00656447 | 0.104509 | -0.980847 | -0.97476 | | 0.000121096 | 0.00349476 | -2.45658 | -2.45601 |
| 13 | 0.0545559 | 2.73684 | | 0.00549938 | 0.0875522 | -1.05773 | -1.05661 | | 7.34E-05 | 0.00211776 | -2.67412 | -2.66383 |
| 14 | 0.0885867 | 2.94737 | | 0.00462536 | 0.0736376 | -1.1329 | -1.13846 | | 4.60E-05 | 0.00132829 | -2.87671 | -2.87164 |
| 15 | 0.143845 | 3.15789 | | 0.00376108 | 0.0598778 | -1.22273 | -1.22032 | | 2.79E-05 | 0.000804806 | -3.09431 | -3.07945 |
| 16 | 0.233572 | 3.36842 | | 0.00311508 | 0.0495932 | -1.30458 | -1.30217 | | 1.71E-05 | 0.000493271 | -3.30691 | -3.28726 |
| 17 | 0.379269 | 3.57895 | | 0.00268554 | 0.0427548 | -1.36902 | -1.38402 | | 1.10E-05 | 0.000317988 | -3.49759 | -3.49507 |
| 18 | 0.615848 | 3.78947 | | 0.00215787 | 0.0343541 | -1.46402 | -1.46587 | | 6.81E-06 | 0.000196651 | -3.7063 | -3.70288 |
| 19 | 1 | 4 | | 0.00185091 | 0.0294672 | -1.53066 | -1.54772 | | 4.38E-06 | 0.000126402 | -3.89825 | -3.9107 |
| LRD test: packet
count | 2.46E+08 |
| LRD test: traffic
load | 0.26099 |
| LRD test:
expected Hurst
parameter | 0.8 |
| LRD test:
measured Hurst
parameter | 0.805601 |
| SRD test: packet
count | 2.36E+08 |
| SRD test: traffic
load | 0.249803 |
| SRD test:
expected Hurst
parameter | 0.5 |
| SRD test:
measured Hurst
parameter | 0.506448 |
|
|
[
Top
]
|
| |
|
|