P2P Dynamic Networks, Page 4
The information that should be defined in the discovery protocol depends on various factors. If all your problems could be solved with the ping/identity map, the resulting protocol would be simplistic. However, there are many additional considerations, such as routing impediments, network connectivity, security, presence, and so on. As a result, protocols must be more complex than simple ping requests.
JXTA defines two protocols specific to discovery:
Peer Discovery Protocol: This protocol enables a peer to find advertisements on other peers, and can be used to find any of the peer, peer group, or resource advertisements.
Peer Endpoint Protocol: This protocol enables a peer to query a peer router for available routes to send a message to a destination peer. This is used in a case where NAT or firewall impediments block peers from communicating directly. Peer routers respond to queries with available route information. Any peer can decide to become a peer router by implementing the Peer Endpoint Protocol.
The organization of information into advertisements simplifies the JXTA protocols required to make P2P work. The format of the advertisements themselves dictates the structure and representation of the protocol data.
Routing is a broad topic. An entire book (in fact, many) could be devoted to the subject of Internet routing. P2P introduces application-level routing that overlays network routing. This is the area we will focus on.
A P2P network forms a self-organizing overlay network on the Internet. Any Internet-connected host that runs P2P software and has proper credentials can participate in the overlay network. Each P2P node typically has a unique ID.
The P2P software builds a routing table that attempts to organize the overlay network to increase efficiency and resiliency. The more sophisticated the locality metric, and the greater the location (path) diversity, the better the overlay network performs.
Locality is based on a proximity metric. In other words, efficiency can be gained if you know the "distance" between any pair of nodes. This distance value represents the number of hops required to get from point A to point B. Or, it might represent the latency, or time delay between points. How it is computed or determined is implementation-specific.
The simplest routing implementation ignores locality considerations altogether. Every peer in the routing table is considered equal. Any peer can be used to transfer a discovery or search request. Message propagation ensures that as many peers as possible are queried. In this scenario, you typically limit the number of peers (connections) in the routing table. In addition, you must implement a loop-detection mechanism to drop requests you have already received: a unique message ID is stored for each message so that redundant requests can be recognized. This is the routing technique employed in most flooding broadcast systems.
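The duplicate check described above can be sketched in a few lines. This is a minimal illustration, not the JXTA implementation: the class name SeenMessageCache, the method firstSighting, and the bounded capacity are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remembers recently seen message IDs so that a peer
// can drop requests that come back to it while a query is being flooded.
class SeenMessageCache {
    private final Map<String, Boolean> seen;

    SeenMessageCache(final int capacity) {
        // Insertion-ordered map that evicts its oldest entry once it holds
        // more than `capacity` IDs, so memory use stays bounded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true the first time an ID is offered, false for a repeat.
    synchronized boolean firstSighting(String messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }
}
```

A peer would call firstSighting() on every incoming request and forward the request to its neighbors only when the call returns true. Bounding the cache means a very old ID can eventually be forgotten, which is a trade-off against unbounded memory growth.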
A more sophisticated approach recognizes the distance metric. The function that determines distance is critical to the overlay network's effectiveness. Peers that are determined to be close will receive preferential routing. However, that has to be balanced to ensure diversity in the routing table population.
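As a rough sketch of preferential routing, the following code ranks routing table entries by a distance value and returns the closest ones. The names PeerEntry and ProximityRouting are hypothetical, and measured latency in milliseconds is only one assumed choice of distance function; a hop count would work the same way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical routing table entry; latency stands in for "distance".
class PeerEntry {
    final String peerId;
    final long latencyMillis;

    PeerEntry(String peerId, long latencyMillis) {
        this.peerId = peerId;
        this.latencyMillis = latencyMillis;
    }
}

class ProximityRouting {
    // Returns up to `count` entries, nearest first, without modifying the table.
    static List<PeerEntry> closest(List<PeerEntry> table, int count) {
        List<PeerEntry> sorted = new ArrayList<>(table);
        sorted.sort(Comparator.comparingLong((PeerEntry p) -> p.latencyMillis));
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }
}
```

Used alone, a selection like this fills the routing table with nearby peers only, which is why it has to be balanced against a diversity measure.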
Diversity is based on a distribution metric. The goal of the distribution metric is to ensure that discovery requests are appropriately partitioned. Unlike locality, which optimizes distance, diversity optimizes geographic distribution.
The uniform distribution of peers such as relays and rendezvous points improves the scalability of the virtual space by reducing or eliminating a large number of discovery requests. This requires a function that is capable of computing a distribution metric when given a unique value, such as an IP address, or the preconfiguration of pivot points in the network. A pivot point provides a route from one major network segment to another.
The uniform distribution of peers ensures an even population of the virtual space (network), and reduces the size requirements of the routing table. Each node within the routing table can refer to one of potentially many nodes within a geographic distribution.
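One way to picture a distribution metric is as a function that maps a unique value to a small number of buckets, with the routing table keeping one representative per bucket. The sketch below assumes IPv4 addresses in dotted form and uses the leading octet as a very coarse stand-in for a network segment; the class DiverseRoutingTable and this particular metric are illustrative assumptions, not something the chapter or JXTA prescribes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing table that keeps one representative peer per bucket.
class DiverseRoutingTable {
    private final int buckets;
    private final Map<Integer, String> representative = new TreeMap<>();

    DiverseRoutingTable(int buckets) {
        this.buckets = buckets;
    }

    // Assumed distribution metric: scale the leading IPv4 octet (0..255)
    // to a bucket index in the range 0..buckets-1.
    int bucketOf(String ipv4) {
        int firstOctet = Integer.parseInt(ipv4.substring(0, ipv4.indexOf('.')));
        return firstOctet * buckets / 256;
    }

    // Stores the peer only if its bucket is still empty; returns true if stored.
    boolean offer(String ipv4) {
        return representative.putIfAbsent(bucketOf(ipv4), ipv4) == null;
    }

    int size() {
        return representative.size();
    }
}
```

Because each bucket holds a single entry, the table size is bounded by the number of buckets rather than by the number of peers discovered.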
Finally, redundancy ensures that pivot points in the routing map, such as those that implement diversity, are replicated, so that no single point of failure exists in the overall virtual space.
Flat Network Model
In a flat overlay network, each peer processes and propagates the requests it receives to other peers to ensure the broad distribution of requests, as shown in Figure 6.7. As mentioned previously, network traffic will grow exponentially given a linear increase in peers or queries within the network.
A flat overlay network model will broadcast requests to every reachable peer. Traffic in this model grows exponentially as the network grows linearly, so bandwidth saturation quickly becomes an issue. This model is only applicable for small work groups.
Simple broadcast is only viable in small networks.
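A back-of-the-envelope calculation shows how quickly flooding adds up. The sketch below assumes a simplified model in which every peer forwards a request to the same number of new neighbors (the fan-out) and forwarding stops after a fixed number of hops; FloodEstimate is a hypothetical name, and real networks with overlapping neighbor sets will differ.

```java
// Worst-case message count for one flooded request under the simplified
// model: fanOut + fanOut^2 + ... + fanOut^hops.
class FloodEstimate {
    static long messages(int fanOut, int hops) {
        long total = 0;
        long level = 1;
        for (int hop = 1; hop <= hops; hop++) {
            level *= fanOut;   // messages sent at this hop
            total += level;    // running total across all hops
        }
        return total;
    }
}
```

With a fan-out of 4, three hops cost 84 messages, while seven hops cost 21,844 messages for a single request.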
Hierarchical Network Model
A hierarchical overlay network will reduce network traffic by organizing peers into a group hierarchy. Peers are only capable of discovery within their own group, or peers only communicate with peers that possess certain operational characteristics. For instance, hierarchical organization can result from differentiating high-bandwidth, dedicated peers from slower, less powerful ones. Hierarchies can be built from metadata that defines network intelligence and peer capabilities. Practices that draw on such metadata include the following:
Equate resource consumption to the level of network participation. Discovery and search requests are discarded from noncontributing (sharing) peers.
Avoid expensive protocol operations such as unnecessary broadcast replies with intelligent forwarding to intended destinations.
Implement connection profiles to favor higher-bandwidth connections over slower modem connections.
Allow high-bandwidth broadband users to act as proxies for slower modem users.
Collect peer performance and measurement metrics.
In this final section, we will look at discovery performance implications.
Bandwidth and Scalability
This chapter has emphasized that uncontrolled discovery (broadcast) is only appropriate for small work groups. Balancing geographic reach while limiting excessive bandwidth consumption is a critical requirement of any scalable P2P network. There are a number of solutions that can be applied to optimize the discovery process:
Use diversity to reach a broader range of the virtual space more efficiently.
Minimize routing table size by seeding the table with special-purpose peers.
Throttle discovery requests and minimize heartbeat polling.
Segment the routing table by operational and environmental metadata.
Constrain total bandwidth allocated to discovery processing.
Monitor peer consumption, including bandwidth used by incoming packets, peak bandwidth used, and a quality value associated with the responses received.
The quality value of the response received can be used to determine future query and discovery scenarios. If you record the quality of a peer response, you can begin to maintain a search priority matrix by topic or category. Peers with high values for a given category will be queried first. You'll query lower-grade peers only if the higher-quality peers fail.
One outcome of this technique is that you may organize peers into peer clusters, or groups of peers capable of satisfying discovery requests by type or class of information. Additionally, peer clusters often have the overall effect of limiting the number of packets used, and thus the network bandwidth required to search for a desired response.
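The search priority matrix can be sketched as a map from category to per-peer quality scores. Everything here is an assumption made for illustration: the class name SearchPriorityMatrix, the use of a score between 0 and 1, and the equal-weight running average used to blend a new observation with the old score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical priority matrix: category -> (peer ID -> quality score).
class SearchPriorityMatrix {
    private final Map<String, Map<String, Double>> scores = new HashMap<>();

    // Records the quality of a response; repeated observations are blended
    // with the previous score using an equal-weight running average.
    void record(String category, String peerId, double quality) {
        scores.computeIfAbsent(category, c -> new HashMap<>())
              .merge(peerId, quality, (old, q) -> 0.5 * old + 0.5 * q);
    }

    // Returns the peers known for a category, highest quality first.
    List<String> queryOrder(String category) {
        Map<String, Double> byPeer =
            scores.getOrDefault(category, Collections.<String, Double>emptyMap());
        List<String> peers = new ArrayList<>(byPeer.keySet());
        peers.sort((a, b) -> Double.compare(byPeer.get(b), byPeer.get(a)));
        return peers;
    }
}
```

A discovery routine would walk the list returned by queryOrder() from the front and fall back to lower-ranked peers only when the higher-ranked ones fail to answer.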
As this chapter has highlighted, discovery and searching have huge implications for network bandwidth requirements. Most implementations currently run over either TCP (to improve reliability) or UDP (to improve performance). In addition, the topology of the discovery network is critical to understanding bandwidth and performance requirements.
A decentralized discovery topology relies on peers to propagate discovery requests across the network. Defining special discovery roles is minimized, and the failure of any one node does not necessitate widespread network failure. Often, UDP can be used because of the inherent redundancy built into the discovery (mesh) topology.
Contrasted with a decentralized topology is a centralized discovery platform. Some would argue that centralized discovery, or the use of super nodes, directly contradicts basic P2P network formation. However, in corporate environments, where large enterprise systems are needed for scalability, administration, and security, centralized servers using TCP might be appropriate.
The emerging middle ground (hybrids) uses a combination of centralized and decentralized discovery processing. Special broadcast roles, such as rendezvous and relay peers, effectively bridge the network to minimize the number of concurrent connections that must be supported.
Providing an adequate level of fault tolerance in P2P networks will be critical to their widespread adoption.
P2P technology solves problems inherent in a dynamic network, including dynamically assigned addresses, routing changes, and firewalls. Redundancy is key to reducing failure points and providing extra value for a P2P network.
Techniques such as round-robin processing, workload queries, and node diversity (geographic distribution) are required to scale P2P networks. This should be implemented without requiring any global coordination.
Data caching of routes, peer groups, and peers will radically improve performance. Subsequent lookup requests whose paths intersect can be served the cached copy. Initialization of peers from cached information can quickly transfer a large amount of virtual space knowledge from one peer to another.
Distributed caching offloads the peers that hold the primary replicas of data, and minimizes delays and network traffic by dynamically caching copies near interested clients.
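Route caching can be illustrated with a small wrapper that consults a local map before asking the network. The class CachingRouteResolver and the idea of passing the remote lookup in as a function are assumptions for this sketch; in a real system the lookup would be a query to a peer router, and cached entries would also need an expiry policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical resolver that serves repeated route lookups from a local cache.
class CachingRouteResolver {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> remoteLookup;
    private int remoteCalls = 0;

    CachingRouteResolver(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns the cached route if present; otherwise performs the (assumed
    // expensive) remote lookup and caches any non-null result.
    String routeTo(String destinationPeerId) {
        String cached = cache.get(destinationPeerId);
        if (cached != null) {
            return cached;
        }
        remoteCalls++;
        String route = remoteLookup.apply(destinationPeerId);
        if (route != null) {
            cache.put(destinationPeerId, route);
        }
        return route;
    }

    // Drops a cached route, for example after the destination is reported failed.
    void invalidate(String destinationPeerId) {
        cache.remove(destinationPeerId);
    }

    int remoteCalls() {
        return remoteCalls;
    }
}
```

Only the first lookup for a destination reaches the network; later lookups are answered locally until the entry is invalidated.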
A key design issue is how to efficiently and dynamically maintain the routing table in the presence of peer failures, peer recoveries, and new peer arrivals. Special broadcast peers can periodically exchange keep-alive messages. If a peer is unresponsive for a period, it is presumed failed. All members of the failed peer's group are notified and update their group membership.
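The keep-alive rule can be sketched as a table of last-contact times that is swept periodically. KeepAliveTracker and its method names are hypothetical, and the timeout value is left to the caller; the clock is passed in explicitly so that the logic can be exercised without waiting for real time to pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker: a peer that has been silent for longer than the
// timeout is presumed failed and removed from the table.
class KeepAliveTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    KeepAliveTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a keep-alive message arrives from a peer.
    void heardFrom(String peerId, long nowMillis) {
        lastSeen.put(peerId, nowMillis);
    }

    // Removes and returns every peer whose last keep-alive is older than the timeout.
    List<String> expire(long nowMillis) {
        List<String> failed = new ArrayList<>();
        lastSeen.entrySet().removeIf(entry -> {
            boolean dead = nowMillis - entry.getValue() > timeoutMillis;
            if (dead) {
                failed.add(entry.getKey());
            }
            return dead;
        });
        return failed;
    }
}
```

The list returned by expire() is what a peer would use to notify the remaining group members so that they can update their membership.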
Routing table entries that refer to failed peers can be repaired lazily; in other words, an entry is repaired only when a request explicitly refers to it, rather than as soon as the failure is detected.
Applications can perform efficient multicast on subnets to repair or recover from localized damage or failures.
Discovery is the process of locating peers and resources in a P2P network. Discovery is based on three messaging techniques: broadcast, selective broadcast, and adaptive broadcast.
Broadcast, multicast, and replicated unicast all represent viable discovery transport services when used appropriately. Flooding broadcast techniques are only applicable for small work groups. A combination of constrained multicast and targeted unicast is gaining in popularity. It provides the necessary balance between connectivity and bandwidth consumption.
The TTL (time-to-live) parameter is used to limit the broadcast horizon in P2P networks. TTL is also used at the network layer, where the IP TTL field limits the number of hops a packet may traverse.
Heartbeat polling in large P2P networks should be minimized, if not eliminated. The frequency of broadcast requests should be monitored. Controls and rules placed on bandwidth consumption can help to mitigate network problems, resource allocation, and security breaches.
Discovery protocols are maturing and becoming more complex. Metadata is becoming an important element in extending protocol definitions and enabling a more robust network.
Special-purpose discovery roles, such as relays and rendezvous points, provide a number of attractive alternatives for scaling a P2P network.
Authors of this chapter
Robert Flenner is an independent Java software developer based in Texas. He is a regular contributor to the O'Reilly ONJava Web site, where he is currently publishing a series of articles related to Jini and JavaSpaces. He has been involved in managing, architecting, and developing information systems for 17 years. Previously, Robert wrote Jini and JavaSpaces Application Development (0-672-32258-7, Sams), published in Dec 2001.
Frank Cohen is CEO of PushToTest, a test automation solutions business. He is a contributing author to books on distribution system development, including Java P2P Unleashed and Java Web Services Unleashed, and a popular speaker at software development conferences. For the past 20 years he has led some of the software industry's most successful products, including Norton Utilities for the Macintosh, Stacker, and SoftWindows. He serves as an active member of the Software Developers Forum, the leading computer software industry association in the Silicon Valley of California.
Source of this material
This material is from Chapter 6, P2P Dynamic Networks, from the book Java P2P Unleashed with JXTA, Web Services, XML, Jini, JavaSpaces, and J2EE (ISBN: 0-672-32399-0) written by Robert Flenner, Michael Abbott, Toufic Boubez, Frank Cohen, Navaneeth Krishnan, Alan Moffet, Rajam Ramamurti, Bilal Siddiqui, and Frank Sommers, published by Sams Publishing.