Distributed Objects Allocation/Retrieval System for Heterogeneous P2P Network

Goal


The goal of the project is to develop management algorithms for content distribution and content search in a heterogeneous P2P network.

Introduction


Consider a heterogeneous network consisting of nodes with different capabilities (e.g. mobile phones, UMPCs, desktops, NAS boxes and RF sensors). The task is to develop a service for storing and searching files in this network. Building such a service requires two sets of algorithms: algorithms for working with file descriptors and algorithms for working with the files themselves. A file descriptor consists of two parts: a file description (e.g. the file name and a set of keywords) and the file location. File search is then interpreted as finding a file's location from its description.
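A minimal sketch of this descriptor scheme (the `FileDescriptor` fields and the `search` helper are illustrative names, not part of the proposal):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileDescriptor:
    """A descriptor pairs a file's description with its location."""
    name: str              # file description: file name
    keywords: frozenset    # file description: set of keywords
    location: str          # file location, e.g. a node address

def search(descriptors, query_keywords):
    """Resolve locations from a description: return the location of every
    descriptor whose keyword set covers the query."""
    q = frozenset(query_keywords)
    return [d.location for d in descriptors if q <= d.keywords]

# Example: two descriptors stored somewhere in the network.
index = [
    FileDescriptor("report.pdf", frozenset({"p2p", "routing"}), "node-17"),
    FileDescriptor("photo.jpg", frozenset({"vacation"}), "node-3"),
]
print(search(index, {"p2p"}))  # → ['node-17']
```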

Nodes can join and leave the network at arbitrary moments, so it is impossible to rely on a central server that stores all file descriptors. The network may also contain many nodes with many files, so replicating the full list of file descriptors on every node is impractical.

A possible solution for storing and searching file descriptors is a P2P network, which can be viewed as a distributed database.
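One common way to turn a P2P network into such a distributed database is consistent hashing, as used in DHTs. The sketch below is an assumed possible realization, not the proposal's fixed design:

```python
import hashlib
from bisect import bisect_right

def ring_position(key: str) -> int:
    """Map a key onto a fixed hash ring (0 .. 2**32 - 1)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**32

def responsible_node(nodes, key):
    """Consistent hashing: the descriptor for `key` is stored on the first
    node clockwise from the key's ring position, so nodes can join and
    leave without re-placing every descriptor."""
    ring = sorted((ring_position(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    idx = bisect_right(positions, ring_position(key)) % len(ring)
    return ring[idx][1]

nodes = ["phone-1", "desktop-2", "nas-3"]
owner = responsible_node(nodes, "report.pdf")
# The same key always maps to the same node while that node is alive:
assert owner == responsible_node(nodes, "report.pdf")
```

Removing any node other than the owner does not move the key, which is exactly the property needed when peers churn.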

Approach Description


This proposal presents the concept design and prototype development plan of an incremental routing and distributed object allocation/retrieval system that scales gracefully in highly dynamic heterogeneous environments. The application areas are not limited to distributed infrastructure: the resulting solution should be treated not as a fixed design but as a synergistic concept, presented formally, that can easily be adapted to a concrete case.

Despite ongoing multi-domain convergence, distributed systems management still rests on complex and computationally expensive discovery and extrapolation of cross-domain relationships. As a result, most systems run into scalability, robustness and durability issues.

The main source of these issues is the intelligent engine, which is usually based on complicated cross-correlation discovery and analysis. Even though this complexity, and the corresponding overhead, is compensated by the resulting performance, its applicability is clearly limited on energy- and computationally constrained devices. This raises the question of constructing distributed systems management that is balanced in both energy and computational terms: managing content allocation at minimal cost, with an allocation mechanism that is aware of content features and network properties at any moment in time.

To narrow the scope, the following two items are essential for further consideration:

  1. planning a path to target content (constructing the optimal path to the content with respect to network and data properties, or at least giving a “hint” about where to go next)
  2. content concentration (showing how much the locality of content slices diverges, in the sense of query satisfaction and content slices)
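The “hint” style of path planning from item 1 can be sketched as a greedy next-hop choice; the cost map is an assumed input that folds network and data properties into one number:

```python
def next_hop_hint(current, neighbors, cost_to_content):
    """Instead of a full route, return the neighbor of `current` with the
    lowest estimated cost to the target content. `neighbors` maps a node
    to its adjacent nodes; `cost_to_content` maps a node to an estimated
    cost (assumed to be precomputed from network and data properties)."""
    return min(neighbors[current],
               key=lambda n: cost_to_content.get(n, float("inf")))

neighbors = {"a": ["b", "c"]}
cost = {"b": 5.0, "c": 2.0}
print(next_hop_hint("a", neighbors, cost))  # → c
```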

As highlighted above, a multi-tier architecture emerges: an incremental low-level routing mechanism, a distributed query planner and a distributed directory management mechanism.

Any layer above distributed directory management is defined to serve as a distributed content location and retrieval mechanism. Its main purpose is to guarantee sustained content evolution management, serialization and access control. To perform these actions in the most efficient (intelligent) way, a decision update mechanism is introduced. The decision update mechanism uses information gathered from metadata (related to the actual content and queries) and from the network, which is fused and delivered as a conditional rule.

The incremental low-level routing mechanism provides routing and message-passing facilities between the network mapping facilities and the actual physical connection selection/transfer (the connectivity layer).

Below the incremental low-level routing layer, corresponding connectivity must be provided. Through it, all network-specific information is delivered by the connectivity layer and, optionally, by the system performance control as service information that includes connection-specific details. The granularity of the transferred information is adjustable.

System performance control can be based on external service specifications, namely performance requirements and utilization, e.g. access patterns. Its central role is played by the infrastructure resource provisioning subsystem, which is based on workload-resource mapping and distribution-admission control. These two parts absorb information about the actual network topology and service availability, and about network conditions and traffic patterns. All the elements above are brought together through resource management and actual performance measurements.

There are two main domains to gather information from: the content-specific and network-specific domains. Content-specific information can be delivered, for example, by a distributed object filesystem infrastructure (which implies task arbitration) and can consist of commonly used metadata, object distribution and hierarchy. Network-specific information is delivered by the connectivity layer and can consist of the actual network topology, network conditions, traffic pattern information, etc.

On the network side, the following information, for example, is relevant to network-specific domain analysis:

  • Interface properties
  • Adjacent nodes properties
  • Last action type
  • Timestamp of the last action
  • Node access info
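These items could be gathered into a per-node record; the sketch below is illustrative, with assumed field names and units:

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    """Network-specific facts gathered per node (field names assumed)."""
    interfaces: dict       # interface properties, e.g. {"wlan0": {"bw_mbps": 54}}
    adjacent: dict         # adjacent node id -> its advertised properties
    last_action: str       # last action type, e.g. "store", "lookup"
    last_action_ts: float  # timestamp of the last action (Unix time)
    access: dict           # node access info, e.g. role or credentials scope

info = NodeInfo(
    interfaces={"wlan0": {"bw_mbps": 54}},
    adjacent={"node-3": {"rtt_ms": 12}},
    last_action="lookup",
    last_action_ts=1204300800.0,
    access={"role": "peer"},
)
```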

Since there are several types of distributed storage infrastructure, two content-related analyses are vital here: content locality and content concentration. Content locality is analyzed in terms of temporal and spatial locality.

Content locality shows the actual proximity of data to a potential consumer in terms of cost. Since the cost function is a compound dependency on several parameters, proximity is determined in terms of those parameters and is non-linear by nature.
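A hypothetical compound cost of this kind, with assumed parameters and weights, could look like:

```python
import math

def proximity_cost(latency_ms, hop_count, battery_frac, bw_mbps):
    """A compound, non-linear proximity cost (all weights are assumptions):
    nearby, well-provisioned nodes score low, and the battery term grows
    sharply as energy runs out, matching the constrained-device concern."""
    return (0.5 * math.log1p(latency_ms)      # diminishing latency penalty
            + 1.0 * hop_count                 # linear hop penalty
            + 2.0 / max(battery_frac, 0.05)   # penalize low battery hard
            + 10.0 / max(bw_mbps, 0.1))       # penalize thin links

# A close, charged desktop beats a distant, drained phone:
desktop = proximity_cost(latency_ms=5, hop_count=1, battery_frac=1.0, bw_mbps=100)
phone = proximity_cost(latency_ms=80, hop_count=4, battery_frac=0.1, bw_mbps=2)
assert desktop < phone
```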

Content concentration shows the number of available data pieces per locality (e.g. within a certain proximity). Content concentration serves as an input parameter of the local workload model and is derived from a content dispersion estimate and a content tracker.
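Content concentration can then be sketched as a count of pieces within a proximity budget; the toy cost function and node ids below are assumptions:

```python
def content_concentration(pieces, consumer, max_cost, cost):
    """Count the available data pieces within a given proximity of the
    consumer. `pieces` maps piece id -> holding node, and `cost(a, b)` is
    the compound proximity cost between two nodes."""
    return sum(1 for node in pieces.values() if cost(consumer, node) <= max_cost)

# Toy cost: absolute distance between numeric node ids.
cost = lambda a, b: abs(a - b)
pieces = {"p1": 1, "p2": 2, "p3": 9}
print(content_concentration(pieces, consumer=0, max_cost=3, cost=cost))  # → 2
```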

Path construction therefore converges to an efficient update-rule mechanism for query requests, based on the analysis and fusion of information from the two domains. Since the information used to update the management rule is largely independent (orthogonal), correlation analysis and its derivatives are not applicable. The proposed, considerably different, approach consists of decomposing the domains (content and network) and fusing them based on analysis of their specific features.

Note that the current approach can be extended to content concentration management, as mentioned above. The updated estimate would then be used to track the allocation of certain content distributed in the network and to bind it to the set of aggregate queries targeting that content. Thus a dual-sided optimization is possible, from the query side and from the content-location side concurrently.

Work Plan


This pilot project covers the following topics:

  1. Distributed directory management mechanism, i.e. a content distribution algorithm that is aware of network-specific information.
  2. Incremental low-level routing mechanism and a distributed query planner, i.e. content search algorithms that are aware of network-specific information.

Other topics mentioned above, namely

  1. content evolution management (the decision update mechanism) and
  2. optimization using the notions of content locality and content concentration,

are considered future work for subsequent projects.

The project starts on 18 February, lasts 4 months and finishes on 18 June.

The expected deliverables are:

  1. Technical report, which contains literature survey, solution description and simulation results.
  2. Simulation code

The work plan has the following structure.

Task name                Duration    Start date  Finish date
Literature survey        1 month     18 Feb      18 March
Solution development     1.5 months  19 March    30 April
Analysis and simulation  1 month     1 May       30 May
Technical report         0.5 months  1 June      18 June

 

Graduate: 4th FRUCT seminar

Final presentation

Project team


Evgeny Linsky, Alexandra Afanasieva

Tutor: Sergey Boldyrev

 

Status: Graduate
Final deadline: Wednesday, November 5, 2008