Introduction
SmartUpdate focuses on the optimization of rule updating for the well-known packet classification problem on the Internet.
It’s the project I worked on as a software engineer intern at Juniper Networks in Beijing, China, where I was supervised by Bill Fang.
Details
1. Problems and Motivations
Rules updating is a subproblem in packet classification. Given an original rule sets and pre-computed data structures, rules updating required inserting or deleting certain rules, which means the original data structures should be updated online.
To support cloud computing, we are required to complete updating operations in restricted time with high after-updating classification performance at the same time. However, typical algorithms including Tuple Space Search and HyperSplit are not able to achieve high-speed packet classification and rapid rules updating simultaneously.
2. Aims
We want to develop an algorithm to:
- achieve better updating speed;
- achieve better after-updating classification performance;
- handle the trade-off between the above performance smartly.
3. Methods and Solutions
1. Assessment algorithms
We propose assessment algorithms to predict the updating complexity for various rule sets based on segment distribution, overlapping density, tuple space size, and rule set size.
For the decision tree based algorithms, we have
def Preprocess(ruleset, segment_list):
overlap_density = InitialList(size = DIM_MAX)
for i in range(0, DIM_MAX):
segment_list = GenerateSegment(ruleset, i)
segment_distribution[i] = segment_list.size
for segment in segment_list:
for rule in ruleset:
if IsOverlap(rule, segment):
overlap_density[i]++
overlap_densit[i] /= ruleset.size
return overlap_density, segment_distribution
def Estimator(new_ruleset, original_ruleset):
segment_list = GetInfo(original_ruleset)
overlap_density, segment_distribution = Preprocess(new_ruleset, segment_list)
time_base = SystemTest()
operation_num = 0
for i in range(0, DIM_MAX):
operation_num += overlap_density[i] * segment_distribution[i]
estimator = operation_num * time_base
return estimator
For tuple space based algorithms, we have
def Preprocess(ruleset):
tuple_list = InitialList()
for rule in ruleset:
tuple_found = LinearSearch(tuple_list, rule)
key = CreateKey(rule)
if tuple_found is not None:
tuple_new = CreateTuple(key)
AddListItem(tuple_list, tuple_new)
return tuple_list.size, ruleset.size
def Estimator(new_ruleset, original_tuple_num):
new_tuple_num, new_rule_num = Preprocess(ruleset)
time_base = SystemTest()
operation_num = 0
for i in range(0, tuple_num - 1):
operation_num += original_tuple_num + i
operation_num += (new_tuple_num + original_tuple_num) * (new_rule_num + new_tuple_num)
estimator = operation_num * time_base
return estimator
2. SmartUpdate Algorithms
Leveraging grouping and dynamic decision-making mechanism, we propose the SmartUpdate algorithm.
def SmartUpdateIncrementalUpdate(ruleset, new_ruleset, toleration,
preference):
if Estimator([ruleset; new_ruleset]) < toleration:
return HyperSplitFullVolumeUpdate([ruleset; new_ruleset])
else:
data_structures = InitialList()
switch (preference):
case CLASSIFICATION_SPEED :
return SmartUpdateFullVolumeUpdate([ruleset; new_ruleset], toleration, CLASSIFICATION_SPEED)
case UPDATING_SPEED :
original_data_structure = GetInfo(ruleset)
new_data_structure = TupleSpaceSearchFullVolumeUpdate(ruleset)
AddListItem(data_structures, original_data_structure)
AddListItem(data_structures, new_data_structure)
default:
data_structures = HybridUpdate(ruleset, new_ruleset,
toleration)
return data_structures
def HybridUpdate(new_ruleset, original_tuple_num):
original_data_structure GetInfo(ruleset)
if Estimator(original_data_structure, new_ruleset) <
toleration:
return IncrementalUpdate(original_data_structure, new_ruleset)
else:
data_structures = InitialList()
if Estimator(new_ruleset) < toleration:
new_data_structure = HyperSplitFullVolumeUpdate(ruleset)
else:
new_data_structure = SmartUpdateFullVolumeUpdate(ruleset, toleration, CLASSIFICATION_SPEED)
AddListItem(data_structures, original_data_structure)
AddListItem(data_structures, new_data_structure)
return data_structures
4. Performance
For rules updating speed, SmartUpdate Strategy 2 beats HyperSplit about one hundred times.
For after-updating classification speed, SmartUpdate beats HyperSplit about 5 times on complex rules set (like Fire Wall 5K, 7K, and 10K).
Related Technologies
- DPDK: We use DPDK to generate and forward Internet packets.
- Rules Generator, Traces Generator: We refined these tools to generate rules and traces and do not leak the confidential information of industry rule sets.
- Algorithms: We use hash tables in parts related to tuple space, and build balanced decision trees using recursion in split methods.
Programming Languages
- C: 70%. The main projects are using C (for better performance).
- Python: 25%. I write python scripts for preprocessing of rules, handling trade-off, and testing.
- Bash: 5%. I write bash scripts to do testing.
Link to Codes
The core codes are open sourced on GitHub.