http://www.developer.com/


P2P Dynamic Networks


November 7, 2002


This material is from Chapter 6, P2P Dynamic Networks, from the book Java P2P Unleashed with JXTA, Web Services, XML, Jini, JavaSpaces, and J2EE (ISBN: 0-672-32399-0) written by Robert Flenner, Michael Abbott, Toufic Boubez, Frank Cohen, Navaneeth Krishnan, Alan Moffet, Rajam Ramamurti, Bilal Siddiqui, and Frank Sommers, published by Sams Publishing.



by Robert Flenner and Frank Cohen

In This Chapter

  • Discovery

  • Identity and Presence

  • Virtual Spaces

  • Routing

  • Performance

The techniques peers use to discover and use each other's functions are perhaps the greatest distinction between P2P technology and client/server Web technology. P2P technology expects peers to live at the edge of a network and to need a variety of techniques to interoperate. Client/server Web technology, on the other hand, requires the location of a resource to be known before the request is made. P2P uses a group of methods known collectively as discovery.

Discovery

Discovery answers the big questions about a network:

  • What peers exist on the network?

  • How are the peers organized around their capabilities?

  • What uniquely identifies a peer?

  • How does a peer exchange data with another peer?

Every P2P technology must answer these questions. Unfortunately for the Java developer, not all P2P technologies answer them successfully. Worse yet, some P2P technologies are closed and proprietary, or they hard-code their answers into a single solution that could otherwise have used open technology.

Although many P2P techniques exist to build peers, three types of peers have emerged as popular designs:

  • Simple peers

  • Rendezvous peers

  • Router peers

A simple peer is designed to be an endpoint that offers functions and data to peers making requests. Simple peers have the least responsibility of the three peer types. They usually sit at the edge of the network, possibly behind a firewall or a Network Address Translation (NAT) router. Simple peers are not expected to handle communication on behalf of other peers or to serve information that they don't directly consume themselves.

Rendezvous peers provide a dating service in which peers discover other peers and peer resources such as data and functions. All three types of peers send discovery queries to rendezvous peers, and a rendezvous peer usually also keeps a cache built from previous requests. When a rendezvous peer lives behind a firewall, it must be able to communicate through the firewall with other peers.

Router peers provide a mechanism for peers to communicate through firewalls and NAT routers. A router peer tunnels peer requests across a network. The information needed to use a router peer is enough to locate a peer without relying on the Domain Name System (DNS), and it supports dynamically assigned IP addresses.

Let's look at a simple example of the three peers in action. Imagine using a P2P client that looks for magazine articles on human genomics. The user starts the search from a simple peer. The peer sends a discovery query to all its known simple peers and rendezvous peers. Each rendezvous peer that receives the query checks whether it has the data the simple peer is looking for. If so, the rendezvous peer might return a discovery response message containing advertisements from other peers that are stored in its cache. The rendezvous peer will also likely send the same query along to its list of known peers.
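The rendezvous peer's part in this example can be sketched in a few lines of Java. This is a minimal, hypothetical model rather than JXTA code: the class name RendezvousPeer, the keyword-indexed cache, and the string peer names are assumptions made for illustration. The peer answers from its cache and records which known peers the query would be forwarded to, skipping the peer that sent it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a rendezvous peer: it answers a query from its cache
// and forwards the query to every known peer except the one that sent it.
class RendezvousPeer {
    final String name;
    // keyword -> advertisements cached from earlier discovery traffic
    private final Map<String, List<String>> cache = new HashMap<>();
    private final List<String> knownPeers = new ArrayList<>();
    // peers the most recent query was forwarded to (a real peer would send network messages)
    final List<String> forwardedTo = new ArrayList<>();

    RendezvousPeer(String name) {
        this.name = name;
    }

    void cacheAdvertisement(String keyword, String advertisement) {
        cache.computeIfAbsent(keyword, k -> new ArrayList<>()).add(advertisement);
    }

    void addKnownPeer(String peer) {
        knownPeers.add(peer);
    }

    // Returns the discovery response (matching cached advertisements, possibly empty)
    // and records where the query is propagated.
    List<String> handleQuery(String keyword, String fromPeer) {
        forwardedTo.clear();
        for (String peer : knownPeers) {
            if (!peer.equals(fromPeer)) {
                forwardedTo.add(peer);
            }
        }
        return new ArrayList<>(cache.getOrDefault(keyword, Collections.emptyList()));
    }

    public static void main(String[] args) {
        RendezvousPeer rdv = new RendezvousPeer("rdv-1");
        rdv.cacheAdvertisement("genomics", "adv:article-archive@peer-7");
        rdv.addKnownPeer("simple-peer-1");
        rdv.addKnownPeer("rdv-2");
        System.out.println(rdv.handleQuery("genomics", "simple-peer-1")); // [adv:article-archive@peer-7]
        System.out.println(rdv.forwardedTo);                              // [rdv-2]
    }
}
```

A production rendezvous peer would also apply the TTL and duplicate-suppression checks discussed later in this chapter before forwarding anything.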

Although we have described three different types of peers, in real-world P2P applications each peer might include a combination of the functions described in simple, rendezvous, and router peers. Let's look at how peers discover data, functions, and services using a variety of P2P techniques.

Router Peers and Dynamic Networks

P2P technology expects to find a network filled with firewalls, dynamic addresses, and changing peer locations. P2P provides a loose coupling of peers, so the P2P network remains functional even when parts of the real network break. Three P2P discovery techniques have become popular in this environment:

  • Broadcast—Sends a discovery request to every network node that is reachable

  • Selective broadcast—Sends a discovery request only to network nodes selected by established heuristics

  • Adaptive broadcast—Sends discovery requests based on heuristics and rules that limit the resources consumed

These techniques will be combined, modified, and abandoned over time as new ways to dynamically form a network are identified. The following are some of the areas of study from which P2P technology innovations might spring:

  • Transport—How do transport services such as broadcast, multicast, and unicast messaging relate to discovery?

  • Radius—How is the discovery horizon established and maintained?

  • Frequency of broadcast—How often should discovery messages be broadcast to populate the network?

  • Discovery protocol—What information should be defined in a discovery protocol?

  • Discovery roles—Do all peers participate equally in the discovery process? Do all peers have the same broadcast role?

Broadcasts

Traditionally, broadcast messages have been sent by devices that handle network routing or low-level packet exchange, such as routers. Broadcast messages on IP networks use a special address reserved for broadcasting. In the limited broadcast address, both the network and host parts are set to all ones (255.255.255.255, or FFFFFFFF in hexadecimal). This indicates to the network layer that the packet is addressed to every device on the subnet, as seen in Figure 6.1.


Figure 6.1

Broadcasts try to reach all nodes on the subnet.
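As a small illustration, the following Java sketch computes the limited broadcast address (all 32 bits set to one) and, for comparison, a subnet's directed broadcast address, in which only the host bits are set to one. The class and method names are hypothetical; parsing is done by hand so the example needs nothing beyond the JDK.

```java
// Computes IPv4 broadcast addresses: the limited broadcast (all 32 bits set)
// and a subnet's directed broadcast (network bits kept, host bits set to one).
final class BroadcastAddress {
    private BroadcastAddress() {}

    // Parses a dotted-quad string such as "172.16.1.3" into a 32-bit int.
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    static String toDottedQuad(int value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }

    // Directed broadcast: keep the network bits and set every host bit to one.
    static String directedBroadcast(String address, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLength);
        }
        // Java masks shift distances to 5 bits, so a /0 prefix needs a special case.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return toDottedQuad(toInt(address) | ~mask);
    }

    // Limited broadcast: network and host parts are all ones (0xFFFFFFFF).
    static final String LIMITED_BROADCAST = toDottedQuad(0xFFFFFFFF);

    public static void main(String[] args) {
        System.out.println(LIMITED_BROADCAST);                   // 255.255.255.255
        System.out.println(directedBroadcast("172.16.1.3", 24)); // 172.16.1.255
    }
}
```

Actually sending to either address from Java also requires enabling broadcast on the socket with DatagramSocket.setBroadcast(true), and routers do not forward limited broadcasts beyond the local subnet.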

In a P2P context, broadcasting might sound like IP multicasting, but it isn't. P2P technology operates mostly at the application layer. The actual method for moving a broadcast message across the Internet might use multicasting or a number of other techniques that we will explore next.

Transport—Multicast Versus Unicast Messaging

Multicast messaging is often compared to radio or TV broadcasts, in the sense that only those who have tuned their receivers to a particular frequency receive the information. Only the channels selected are heard. The sender sends the information without knowledge of the number of receivers.

In contrast, when a packet has exactly one sender and one recipient, the transmission is referred to as unicast. A unicast transmission is by definition point-to-point. Unicast can be used to send identical information to many different destinations; however, the sender must then transmit a separate copy to each destination, which is not the most efficient use of the network.

Multicast addresses occupy the Class D range, 224.0.0.0 through 239.255.255.255 (first octet 224–239). Multicast messaging uses this range of addresses to define multicast groups, as shown in Table 6.1.

Table 6.1 IPv4 Address Classifications

IP Address Classification | Address Range
Class A | 0.0.0.0–127.255.255.255
Class B | 128.0.0.0–191.255.255.255
Class C | 192.0.0.0–223.255.255.255
Multicast (Class D) | 224.0.0.0–239.255.255.255
Reserved (Class E) | 240.0.0.0–255.255.255.255



Note - You can find all the reserved multicast addresses at http://www.iana.org/assignments/multicast-addresses.
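The class boundaries in Table 6.1 depend only on the first octet of an address, which makes them easy to check in code. The sketch below (the class name is hypothetical) classifies a dotted-quad string that way and treats every address from 240 upward as reserved; for multicast specifically, the JDK's InetAddress.isMulticastAddress() performs the same test on an address object.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Classifies an IPv4 address by its first octet, using the class boundaries of Table 6.1.
final class AddressClass {
    private AddressClass() {}

    static String classify(String dottedQuad) {
        int firstOctet = Integer.parseInt(dottedQuad.substring(0, dottedQuad.indexOf('.')));
        if (firstOctet < 0 || firstOctet > 255) {
            throw new IllegalArgumentException("bad address: " + dottedQuad);
        }
        if (firstOctet <= 127) return "Class A";
        if (firstOctet <= 191) return "Class B";
        if (firstOctet <= 223) return "Class C";
        if (firstOctet <= 239) return "Multicast (Class D)";
        return "Reserved (Class E)"; // 240 and above
    }

    // The JDK's own multicast test; a literal IP address is parsed without a DNS lookup.
    static boolean isMulticast(String dottedQuad) throws UnknownHostException {
        return InetAddress.getByName(dottedQuad).isMulticastAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(classify("172.16.1.3"));   // Class B
        System.out.println(isMulticast("224.0.0.1")); // true
    }
}
```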


Multicasting has produced mixed results in applications that require a number of machines in a distributed group to receive the same data, such as conferencing, group mail, news distribution, and network management. IP multicast has no built-in control protocol for reliable delivery or flow control, which makes it unsuitable for large, reliable, and sustained transmissions. It nevertheless appears well-suited to P2P discovery: multicasting often fails to deliver 100% of its data to everyone listening, but peers on a P2P network do not require their data to be synchronized, so an occasionally lost discovery message is tolerable. Figure 6.2 shows multicasting being used in P2P networks for discovery.


Figure 6.2

Multicasting goes beyond simple subnet penetration, but it requires that receivers listen on a specific "channel." The underlying network supports the transport services.

Multicast advantages include the following:

  • Decreased network utilization—Reduces the number of messages required by eliminating redundant packets and decreasing the number of point-to-point connections that must be established.

  • Resource discovery—Discovery and multicasting assume a sender is transmitting to an unknown number of peers without knowledge of their location.

  • Dynamic participation—Multicasting provides flexibility in joining and leaving a group. This membership flexibility supports the transient behavior of peers.

  • Multimedia support—Multimedia transmission continues to increase in popularity and consumes a significant amount of bandwidth. This is one area where network optimization is of paramount importance. Multicasting can be used to transmit multimedia data to receiving stations that compress the transmission and then deliver it to destination nodes, rather than using point-to-point connections for all destinations.

Unfortunately, multicasting is not implemented everywhere. Hardware, specifically routers, often blocks multicast traffic from entering corporate networks or crossing ISP networks. Firewalls and NAT devices not only block multicast traffic but often constrain all traffic to a few well-controlled choke points (ports). As a result, scalable P2P networks generally require additional means of discovery.
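The following Java sketch shows what a multicast discovery announcement might look like with the JDK's MulticastSocket. The group address 239.1.2.3, port 4545, and the "DISCOVER <peer-id>" payload are hypothetical values chosen for illustration, not part of any P2P protocol; a 239.x.x.x address is used because that block is the administratively scoped multicast range. Because routers and firewalls often drop multicast traffic, a peer using this technique should still expect to fall back on other discovery mechanisms.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Sketch of multicast discovery: a peer sends an announcement to a group
// address, and only peers that have joined that group receive it.
final class MulticastDiscovery {
    // Hypothetical group and port, chosen only for illustration.
    static final String GROUP = "239.1.2.3";
    static final int PORT = 4545;

    private MulticastDiscovery() {}

    // Parses a literal IP address; no DNS lookup is made for literals.
    static InetAddress parseLiteral(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + literal, e);
        }
    }

    // Builds the announcement datagram and rejects non-multicast destinations.
    static DatagramPacket buildAnnouncement(InetAddress group, int port, String peerId) {
        if (!group.isMulticastAddress()) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        byte[] payload = ("DISCOVER " + peerId).getBytes(StandardCharsets.UTF_8);
        return new DatagramPacket(payload, payload.length, group, port);
    }

    public static void main(String[] args) throws IOException {
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(4); // limit the number of router hops the datagram may cross
            socket.send(buildAnnouncement(parseLiteral(GROUP), PORT, "peer-42"));
        }
        // Receivers open a MulticastSocket on PORT and join GROUP before they can hear it.
    }
}
```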

Radius of Broadcast

Broadcast packets need a mechanism that keeps them from bouncing around the network forever, which can happen when a packet carries invalid addressing or routing information. The time-to-live (TTL) parameter, an 8-bit field in the IP packet header, was defined to address this issue: it ensures that packets cannot traverse the network endlessly. Each packet has a TTL value, a counter that is decremented every time the packet passes through a hop, such as a router between networks.

In the example in Figure 6.3, the TTL parameter is set to 4, and the broadcast request needs to make five hops (pass through five routers) to make it to the nearest peer. Peer-2 will never "hear" the broadcast request, and Peer-1 will never "know" about Peer-2 through this route. The packet will be discarded when the TTL count reaches zero.

When a peer receives a request, it looks at the TTL value. If the value is greater than 1, it decrements the value and transfers the request to the destination address or the next hop. If the value is 1 or less, it discards the message. In this respect, the P2P network is providing a layer of control that "overlays" the network layer.


Figure 6.3

Time-to-live parameters define the extent to which a packet can travel across the network. Routers typically decrement the TTL value of the packet as it passes through the router. When it reaches zero, the packet is discarded.
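The peer-level forwarding rule described above is easy to express in code. In this hypothetical Java sketch, nextTtl applies the rule at a single peer, and reaches simulates a message that must pass through a chain of forwarding peers; with a starting TTL of 4, a destination behind five forwarders is never reached, as in the Figure 6.3 scenario.

```java
// Sketch of the TTL rule: a peer that receives a message with TTL > 1 decrements
// the TTL and forwards the message; with TTL 1 or less it discards the message.
final class TtlForwarding {
    static final int DISCARD = -1;

    private TtlForwarding() {}

    // Returns the TTL to forward with, or DISCARD if the message must be dropped.
    static int nextTtl(int receivedTtl) {
        return receivedTtl > 1 ? receivedTtl - 1 : DISCARD;
    }

    // Simulates a message that must pass through `forwarders` intermediate peers
    // before reaching its destination; returns true if it gets there.
    static boolean reaches(int initialTtl, int forwarders) {
        int ttl = initialTtl;
        for (int i = 0; i < forwarders; i++) {
            ttl = nextTtl(ttl);
            if (ttl == DISCARD) {
                return false; // dropped at intermediate peer i + 1
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(reaches(4, 5)); // false: the Figure 6.3 case
        System.out.println(reaches(4, 3)); // true
    }
}
```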

Frequency of Broadcast

Most systems that use broadcast techniques place some control on the frequency of the broadcast. For instance, when a peer activates, it sends a discovery message on the local subnet and waits a predetermined time before sending another discovery request. If no response is returned within that time interval, a subsequent request will be sent. In effect, the peer has started to poll the network. If responses are returned, the peer builds a map, or view of the peer network. This is important, because the peer view is probably very different from the physical view. The map reflects the peers that responded to the discovery request.

As peers enter and leave the network, they must be able to update their view. One approach is to go into a heartbeat mode of polling: the peer periodically sends a discovery request and updates the map as responses are received. During the polling process, some peers might no longer be available. In the simplest Java implementation, these peers disappear only when a freshly built map replaces the old one and the garbage collector reclaims the object holding the old map. New peers that respond are added to the map. A simple ping map contains the list of peers that have responded to discovery requests. The ping map can be as simple as a list of active IP addresses, as in Table 6.2.

Table 6.2 Ping Map

IP Addresses
172.16.1.3
172.16.1.4
12.239.129.4
...
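One way to keep such a map current without waiting for it to be rebuilt is to record when each address last answered a heartbeat and to drop addresses that stay silent too long. The Java sketch below assumes this timestamp-and-timeout approach (the class name and the 30-second timeout are hypothetical); time is passed in explicitly so the behavior is easy to test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a heartbeat-maintained ping map: addresses that answered recently
// stay in the map, and addresses silent for longer than the timeout are dropped.
final class PingMap {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new LinkedHashMap<>(); // IP address -> time of last response

    PingMap(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called for each response to a discovery (heartbeat) request.
    void recordResponse(String ipAddress, long nowMillis) {
        lastSeen.put(ipAddress, nowMillis);
    }

    // Called once per polling round: removes peers that have stopped answering.
    void expire(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > timeoutMillis);
    }

    List<String> activePeers() {
        return Collections.unmodifiableList(new ArrayList<>(lastSeen.keySet()));
    }

    public static void main(String[] args) {
        PingMap map = new PingMap(30_000);        // hypothetical 30-second timeout
        map.recordResponse("172.16.1.3", 0);
        map.recordResponse("172.16.1.4", 0);
        map.recordResponse("172.16.1.3", 25_000); // 172.16.1.3 answers the next heartbeat
        map.expire(40_000);                       // 172.16.1.4 has been silent for 40 s
        System.out.println(map.activePeers());    // [172.16.1.3]
    }
}
```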


The ping map, which might also be viewed as a peer routing table, is built from scratch each time the peer activates. In this model, the peer does not implement the notion of memory. In other words, each time the peer activates, it invokes the discovery process and collects a new image of the peer network. This approach is unable to deal with many of the problems inherent in P2P networks. For instance:

  • Dynamic IP assignment—History of peer interaction is limited to current IP assignment.

  • Size and scale of network—Every peer maintaining maps of connections cannot scale.

  • Reputation and trust issues—No history of past peer interactions is possible.

  • Equitable resource allocation—No controls are placed on resource utilization.

  • Security in general.

The peer's identity is implicitly mapped to its IP address. If a peer changes its IP address, it is considered a new member of the network, and no history of prior interactions is possible.

The ping map can be extended to include the notion of identity, which resolves some of the problems. Persistence or memory of the peer network becomes more viable and attractive with identity. This approach requires each peer to have a unique ID. Once generated, the ID is fixed for the lifetime of the peer. When a discovery request is received, the responding peer returns its IP address (which might have changed) and its unique ID (which never changes). This assumes that peers have a consistent method to generate unique IDs (see Table 6.3). ID collision occurs if two peers generate the same ID, and inconsistent ID representations (integer, String, UUID, and so on) cause identification problems throughout the network. Clearly, control mechanisms are required even with this simple approach.

Table 6.3 Ping Map with Peer Identity

IP Address:Port | Unique ID
172.16.1.3: | ABCD-3456-2345-DEFA
172.16.1.4: | DECF-5432-5643-EFDA
12.239.129.4: | DCDD-1324-7654-DEAC
... | ...
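Keying the map by a permanent ID instead of by address fixes the problem of a peer being treated as new whenever its IP address changes. This hypothetical Java sketch uses java.util.UUID as the ID format; randomly generated (version 4) UUIDs make collisions extremely unlikely, which is one consistent way to address the ID-generation concerns above. The IDs in Table 6.3 use a different, illustrative format, and the port 9701 below is an arbitrary example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of a ping map keyed by a permanent peer ID instead of an IP address.
// A peer that reappears at a new address updates its entry rather than
// showing up as a new member of the network.
final class IdentityPingMap {
    private final Map<UUID, String> addressById = new HashMap<>(); // peer ID -> last known "ip:port"

    // Records a discovery response; returns true if the peer was not known before.
    boolean recordResponse(UUID peerId, String ipAndPort) {
        return addressById.put(peerId, ipAndPort) == null;
    }

    String addressOf(UUID peerId) {
        return addressById.get(peerId);
    }

    int size() {
        return addressById.size();
    }

    public static void main(String[] args) {
        // Each peer generates its ID once and keeps it for its lifetime.
        UUID peerId = UUID.randomUUID();
        IdentityPingMap map = new IdentityPingMap();
        System.out.println(map.recordResponse(peerId, "172.16.1.3:9701"));   // true: new peer
        System.out.println(map.recordResponse(peerId, "12.239.129.4:9701")); // false: same peer, new address
        System.out.println(map.size());                                      // 1
    }
}
```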


Selective Broadcast

Instead of sending a discovery request to every peer on the network, peers are selected based on heuristics such as quality of service, content availability, or trust relationships.

Trust relationships are commonly used when one or more specific peers act as relays or routers to the peer network. Usually the trusting peer is seeded with the IP address of the trusted peer. This is the technique used by JXTA routing and rendezvous peers. The trusted peer has some knowledge of the network and is publicly available.

Selective broadcast requires that you maintain historical information on peer interactions, peer roles, peer identity, and so on. It begins to extend the ping and identity map concept to include the following:

  • Peer discovery roles—Peers have special roles to enable discovery. Not all peers are equal.

  • Past performance metadata—A historical record of peer performance is maintained. This includes availability metrics, as well as environmental metadata.

  • Environmental metadata—Includes additional information on the peers' capabilities, such as bandwidth, disk space, and processing power (see Table 6.4).

Selective broadcast systems are much more scalable than simple broadcast networks. Instead of sending a request to all peers, it is selectively forwarded to specific peers who have a higher probability of being able to locate other peers or resources.

Each peer must contain or have access to the information used to route or direct the requests it receives. Although this might be appropriate for relatively small networks, in larger networks this overhead can quickly grow to unsupportable levels.

Table 6.4 Ping Map with Peer Identity and Metadata

IP Address:Port | Unique ID | Metadata
172.16.1.3: | ABCD-3456-2345-DEFA | Dial-up, # of concurrent connections
172.16.1.4: | DECF-5432-5643-EFDA | DSL, # of concurrent connections
12.239.129.4: | DCDD-1324-7654-DEAC | T1, # of concurrent connections
... | ... | ...
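Selective broadcast boils down to ranking the peers in the map and forwarding only to the best few. In the following Java sketch, the scoring heuristic (trusted peers first, then availability weighted by bandwidth), the class names, and the fan-out parameter are assumptions chosen for illustration; a real system would tune or replace them.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of selective broadcast: score each known peer with a heuristic and
// forward the discovery request only to the highest-scoring few.
final class SelectiveBroadcast {
    static final class PeerInfo {
        final String id;
        final int bandwidthKbps;   // environmental metadata
        final double availability; // past-performance metadata, 0.0 to 1.0
        final boolean trusted;     // trust relationship, e.g. a seeded rendezvous peer

        PeerInfo(String id, int bandwidthKbps, double availability, boolean trusted) {
            this.id = id;
            this.bandwidthKbps = bandwidthKbps;
            this.availability = availability;
            this.trusted = trusted;
        }
    }

    private SelectiveBroadcast() {}

    // Hypothetical heuristic: trusted peers outrank all others; the rest are
    // ranked by availability weighted by bandwidth.
    static double score(PeerInfo p) {
        return (p.trusted ? 1_000_000.0 : 0.0) + p.availability * p.bandwidthKbps;
    }

    // Returns the IDs of at most fanOut peers, best first.
    static List<String> selectTargets(List<PeerInfo> peers, int fanOut) {
        return peers.stream()
                .sorted((a, b) -> Double.compare(score(b), score(a)))
                .limit(fanOut)
                .map(p -> p.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PeerInfo> peers = List.of(
                new PeerInfo("dialup-peer", 56, 0.90, false),
                new PeerInfo("dsl-peer", 768, 0.50, false),
                new PeerInfo("t1-peer", 1544, 0.99, false),
                new PeerInfo("rendezvous", 256, 0.80, true));
        System.out.println(selectTargets(peers, 2)); // [rendezvous, t1-peer]
    }
}
```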


Adaptive Broadcast

As mentioned in Chapter 1, "What Is P2P?," adaptive broadcast tries to minimize network utilization while maximizing connectivity to the network. You can limit the growth of discovery and searching by predefining a resource tolerance level that, if exceeded, will begin to curtail the process. This will ensure that excessive resources are not being consumed because of a malfunctioning element, a misguided peer, or a malicious attack. Adaptive broadcast requires monitoring resources such as peer identity, queue size, port usage, and message frequency.

Rules can complement metadata to build sophisticated discovery techniques (see Table 6.5).

Table 6.5 Ping Map with Peer Identity, Metadata, and Rules

IP Address:Port | Unique ID | Metadata | Rules
172.16.1.3: | ABCD-3456-2345-DEFA | Dial-up, # of concurrent connections | Congestion -> Throttle; Connections -> Accept
172.16.1.4: | DECF-5432-5643-EFDA | DSL, # of concurrent connections | Congestion -> Throttle; Connections -> Accept
12.239.129.4: | DCDD-1324-7654-DEAC | T1, # of concurrent connections | Congestion -> Throttle; Connections -> Accept
... | ... | ... | ...
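Rules like those in Table 6.5 can be modeled as an ordered list of condition/action pairs evaluated against monitored statistics. In this hypothetical Java sketch, the first matching rule decides whether a peer's traffic is accepted, throttled, or dropped; the thresholds (600 messages per minute, a queue of 100, 8 connections) are illustrative tolerance levels, not recommended values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of rule-driven adaptive broadcast: ordered condition -> action rules are
// evaluated against monitored statistics; the first matching rule decides.
final class AdaptiveRules {
    enum Action { ACCEPT, THROTTLE, DROP }

    // Resources monitored per peer.
    static final class PeerStats {
        final int queueSize;         // messages waiting to be processed
        final int messagesPerMinute; // observed message frequency
        final int openConnections;

        PeerStats(int queueSize, int messagesPerMinute, int openConnections) {
            this.queueSize = queueSize;
            this.messagesPerMinute = messagesPerMinute;
            this.openConnections = openConnections;
        }
    }

    private static final class Rule {
        final Predicate<PeerStats> condition;
        final Action action;

        Rule(Predicate<PeerStats> condition, Action action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Action defaultAction;

    AdaptiveRules(Action defaultAction) {
        this.defaultAction = defaultAction;
    }

    AdaptiveRules add(Predicate<PeerStats> condition, Action action) {
        rules.add(new Rule(condition, action));
        return this;
    }

    Action decide(PeerStats stats) {
        for (Rule rule : rules) {
            if (rule.condition.test(stats)) {
                return rule.action;
            }
        }
        return defaultAction;
    }

    // Illustrative tolerance levels only; a real deployment would tune these.
    static AdaptiveRules defaults() {
        return new AdaptiveRules(Action.THROTTLE)
                .add(s -> s.messagesPerMinute > 600, Action.DROP) // flooding or malicious peer
                .add(s -> s.queueSize > 100, Action.THROTTLE)     // Congestion -> Throttle
                .add(s -> s.openConnections < 8, Action.ACCEPT);  // Connections -> Accept
    }

    public static void main(String[] args) {
        System.out.println(defaults().decide(new PeerStats(150, 100, 3))); // THROTTLE
    }
}
```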


The ALPINE Network implements a form of adaptive broadcast in its adaptive social discovery protocol. It's based on the ALPINE-defined datagram protocol DTCP. See http://www.cubicmetercrystal.com/alpine/overview.html for more information on ALPINE networks and protocols.

Identity and Presence

As discussed in Chapter 3, "P2P Application Types," users of instant messaging (IM) systems must be uniquely identified. How a user is identified is fundamental to the operation of the system.

Identity has also proven fundamental to discovery and P2P systems in general. Our simple ping map example was unable to satisfy the critical requirements of P2P networks. It had no way to resolve the dynamic and transient nature of peer participation.

Peers and resources need to be uniquely identifiable. This identity must not be limited to current session or current IP address identification. It must persist to enable contextual information and historical interactions to be stored and subsequently restored. In effect, it is required to accumulate the knowledge necessary to support sophisticated P2P networks. Presence information tied to identity can be used to ensure that peer maps are consistent and represent the current state of the network. Knowing when a peer is online is required for building efficient, distributed, and user-centric systems.

Virtual Spaces

Broadcast messages require senders and receivers to agree on the semantics of the exchange (protocol) to create groups of collaborating nodes. The formation of a group of nodes creates a virtual space that shares a common context. Even at the base level of discovery, there is a significant amount of cooperation and collaboration involved, and this is before any real work, such as transferring files, messages, or transactions, has even been initiated. A virtual space implies more than simple connectivity.

Another way of looking at virtual spaces involves JXTA. Before JXTA, a Java developer's choices for P2P technology were limited. If you were developing a file-sharing application, the likely choice would have been the Gnutella protocol; for instant messaging, it would have been ICQ. The incompatibilities between these protocols divided the network into groups of applications based on protocols. With JXTA, protocols can be mixed and matched freely.

A natural result of JXTA mixing and matching protocols in P2P applications is found in peer groups. A peer group combines peers that serve a common interest or goal, as defined by the application the peers were built for. Peer groups provide services that are not available to other peers in the P2P network.

The J2SE implementation of JXTA organizes peer groups hierarchically. At the root is the NetPeerGroup, of which all peers are members by default (see Figure 6.4). On a local network, the NetPeerGroup provides peers with global connectivity according to the restrictions imposed by network administrators.


Figure 6.4

JXTA provides a view of a virtual space as a collection of common services shared by a group of peers.

The common services shared by the members of the NetPeerGroup include the following:

  • Discovery service

  • Membership service

  • Resolver service

  • Endpoint service

  • Pipe service

  • Peer Info service

Peers self-organize into peer groups, each identified by a unique peer group ID. So, it is the peer group ID that uniquely identifies the virtual space in the JXTA protocol.

Discovery, identity, and namespaces are the building blocks of a virtual space. Discovery determines the horizon or scope of membership. Identity uniquely defines the membership, and namespaces supply the context for membership.

In the computing disciplines, the term namespace conventionally refers to a set of names; that is, a collection containing no duplicates. In the context of P2P, a virtual namespace augments current addressing technology. It provides the context to support consistent identification and service composition.

You can define context as any information that can be used to characterize the situation of an entity or an action. Formal context definition is critical to enabling richer integration between distributed systems. Our software must be more intelligent and adaptive to the environment. For software to be adaptive, it must be able to "reason" and make assertions based on situational analysis. A virtual space provides the context. Members share a common protocol and metadata definition. P2P will help provide identity, presence, and context within the virtual spaces of cyberspace.

Discovery Implementations

This section discusses some P2P implementations of discovery.

Gnutella Discovery

Gnutella uses a broadcast-messaging protocol for peer discovery. The Gnutella net has no hierarchy: every peer is both a client and a server (a servent). Each Gnutella peer knows about the peers to which it is directly connected. All other peers are invisible unless they announce themselves by answering a broadcast request or a query.

After making the initial connection to a peer, you must perform a handshake. Currently, the handshake is very simple. The connecting peer sends

GNUTELLA CONNECT/0.4\n\n

The accepting peer responds with

GNUTELLA OK\n\n
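A minimal Java sketch of both sides of this exchange follows. It works on plain InputStream and OutputStream objects, so it can run over a Socket's streams or over in-memory streams in a test. The class and method names are hypothetical, and only the 0.4 exchange shown above is modeled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch of the Gnutella 0.4 handshake over plain streams (for example, a
// Socket's input and output streams).
final class GnutellaHandshake {
    static final String CONNECT = "GNUTELLA CONNECT/0.4\n\n";
    static final String OK = "GNUTELLA OK\n\n";

    private GnutellaHandshake() {}

    // Connecting peer: send the request, then expect the exact OK reply.
    static boolean connect(InputStream in, OutputStream out) {
        try {
            out.write(CONNECT.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return OK.equals(readUpTo(in, OK.length()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Accepting peer: read the request and reply only if it matches.
    static boolean accept(InputStream in, OutputStream out) {
        try {
            if (!CONNECT.equals(readUpTo(in, CONNECT.length()))) {
                return false;
            }
            out.write(OK.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads up to n bytes, stopping early if the stream ends.
    private static String readUpTo(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int read = in.read(buf, total, n - total);
            if (read < 0) {
                break;
            }
            total += read;
        }
        return new String(buf, 0, total, StandardCharsets.US_ASCII);
    }
}
```

Over a real connection, the connecting side would pass socket.getInputStream() and socket.getOutputStream() to connect, and the accepting side would do the same with accept.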

A Gnutella network is cyclic, in that loopback messages are possible. All messages have a unique ID (GUID). Gnutella peers check the message ID; if they have received the message before, they discard it. If they have not seen the message, they route it to the peers to which they are directly connected.
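Duplicate suppression needs only a set of recently seen message IDs. This Java sketch uses java.util.UUID as a stand-in for the 16-byte Gnutella GUID and bounds memory by evicting the oldest IDs once a hypothetical capacity is reached; the eviction policy is an assumption, not part of the Gnutella protocol.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of duplicate suppression in a cyclic network: remember recently seen
// message IDs and forward only messages that have not been seen before.
final class SeenMessages {
    private final Map<UUID, Boolean> seen;

    SeenMessages(int capacity) {
        // Insertion-ordered map that evicts the oldest ID once capacity is exceeded.
        this.seen = new LinkedHashMap<UUID, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the message is new and should be routed to directly connected peers;
    // false if it has been seen before and should be discarded.
    boolean shouldForward(UUID messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }
}
```

The capacity trades memory for protection: an ID that has been evicted will be forwarded again if it arrives late, so the capacity should cover the number of messages a peer sees within a typical message lifetime.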

JXTA Discovery

Per the specification, "JXTA does not mandate exactly how discovery is done. It can be completely decentralized, completely centralized, or a hybrid of the two." JXTA enables discovery by providing a discovery service, which provides a mechanism in JXTA for discovering advertisements. The Peer Discovery Protocol (PDP) defines a protocol for requesting advertisements from other peers, and responding to other peers' requests for advertisements.

Technically, advertising means sending an advertisement to everyone on the network. An advertisement is an identifier for any network resource that a using entity might need. A JXTA advertisement is platform-independent, and is typically represented by an XML document.

In JXTA, you can control the scope of discovery by specifying a threshold. The threshold, set by the requesting peer, is an upper limit on the number of advertisements to return, and responding peers cannot exceed it. Each PeerGroup has its own instance of a DiscoveryService, so the scope of discovery is also limited to the group.

JXTA discovery mechanisms include local broadcast, peer invitation, message cascading, and discovery using rendezvous peers.

Rendezvous peers help an isolated peer by quickly seeding it with network information. Rendezvous peers provide peers with two possible ways of locating peers and other advertisements:

  • Propagation—A rendezvous peer will pass the discovery request to other peers on the network it knows about, including other rendezvous peers that will also propagate the request to other peers, a process illustrated in Figure 6.5.

  • Cached advertisements—A rendezvous peer can use cached advertisements to reduce network traffic, and can use cached advertisements to respond to a peer's discovery queries.


Figure 6.5

Rendezvous peers provide "fan-out" capabilities by propagating discovery requests from peers initiating discovery.

Relay peers also have a special role in JXTA discovery. These are peers that are capable of forwarding requests to rendezvous peers and other relay peers. They are used to provide connectivity, or a bridge, from behind a firewall or NAT device to the peer network. Any peer can query a peer relay for route information, and any peer in a peer group may become a relay. Peer relays typically cache route information. Route information includes the peer ID of the source, the peer ID of the destination, a TTL for the route, and an ordered sequence of gateway peer IDs (see Figure 6.6).


Figure 6.6

Relay peers can be used to circumvent firewalls and NAT devices. Typically, these IP addresses (relay and rendezvous) will be configured in the PlatformConfig file of the JXTA platform.
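The route information listed above maps naturally onto a small value class. In this Java sketch the class name, the field names, the use of strings for peer IDs, and the expression of the route TTL as an absolute expiry time are assumptions for illustration; they do not reproduce JXTA's actual route advertisement format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of cached route information: source and destination peer IDs, a route
// lifetime, and the ordered gateway peers a message must traverse.
final class RouteInfo {
    final String sourcePeerId;
    final String destinationPeerId;
    final long expiresAtMillis;        // route TTL expressed as an absolute expiry time
    final List<String> gatewayPeerIds; // ordered: the first gateway is the next hop

    RouteInfo(String sourcePeerId, String destinationPeerId,
              long createdAtMillis, long ttlMillis, List<String> gatewayPeerIds) {
        this.sourcePeerId = sourcePeerId;
        this.destinationPeerId = destinationPeerId;
        this.expiresAtMillis = createdAtMillis + ttlMillis;
        this.gatewayPeerIds = Collections.unmodifiableList(new ArrayList<>(gatewayPeerIds));
    }

    // A relay should stop using (and re-query) a route once it has expired.
    boolean isValid(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }

    // The peer a message is handed to next; the destination itself if no gateway is needed.
    String nextHop() {
        return gatewayPeerIds.isEmpty() ? destinationPeerId : gatewayPeerIds.get(0);
    }
}
```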

When a peer sends its advertisement to another peer, it can expect the other peer to reply by sending its advertisement back. This way, both peers will have the other party's advertisement.

Advertisements are stored in a persistent local cache (the cm directory). When a peer activates, the same cache is referenced. A JXTA peer can use the getLocalAdvertisements method to retrieve advertisements that are in its local cache. If it wants to discover other advertisements, it uses getRemoteAdvertisements to send a DiscoveryQuery message to other peers. DiscoveryQuery messages can be sent to a specific peer, or propagated to the JXTA network.

In the J2SE platform binding, DiscoveryQuery messages not intended for a specific peer are propagated on the local subnet using IP multicast, and they're also propagated to the configured rendezvous peers. A peer includes its own advertisement in the DiscoveryQuery message, performing an announcement or automatic discovery mechanism. Only peers in the same peer group will respond to a DiscoveryQuery message.

Discovery Protocol

The information that should be defined in the discovery protocol depends on various factors. If all your problems could be solved with the ping/identity map, the resulting protocol would be simplistic. However, there are many additional considerations, such as routing impediments, network connectivity, security, presence, and so on. As a result, protocols must be more complex than simple ping requests.

JXTA defines two protocols specific to discovery:

  • Peer Discovery Protocol—This protocol enables a peer to find advertisements on other peers, and can be used to find any of the peer, peer group, or resource advertisements.

  • Peer Endpoint Protocol—This protocol enables a peer to query a peer router for available routes to send a message to a destination peer. This is used in a case where NAT or firewall impediments block peers from communicating directly. Peer routers respond to queries with available route information. Any peer can decide to become a peer router by implementing the Peer Endpoint Protocol.

The organization of information into advertisements simplifies the JXTA protocols required to make P2P work. The format of the advertisements themselves dictates the structure and representation of the protocol data.

Routing

Routing is a broad topic. An entire book (in fact, many) could be devoted to the subject of Internet routing. P2P introduces application-level routing that overlays network routing. This is the area we will focus on.

Overlay Networks

A P2P network forms a self-organizing overlay network on the Internet. Any Internet-connected host that runs P2P software and has proper credentials can participate in the overlay network. Each P2P node typically has a unique ID.

The P2P software builds a routing table that attempts to organize the overlay network to increase efficiency and resiliency. The performance of the overlay network improves with the sophistication of its locality metrics and with the diversity of locations, or paths, that its routing table covers.

Locality

Locality is based on a proximity metric. In other words, efficiency can be gained if you know the "distance" between any pair of nodes. This distance value might represent the number of hops required to get from point A to point B, or it might represent the latency, or time delay, between the points. How it is computed or determined is implementation-specific.

The most simplistic routing implementation ignores locality considerations altogether. Every peer in the routing table is considered equal. Any peer can be used to transfer a discovery or search request. Message propagation ensures that as many peers as possible are queried. In this scenario, you typically limit the number of peers (connections) in the routing table. In addition, you must implement a loopback mechanism to drop message requests you have already received. A unique message ID is stored for each message to determine redundant requests. This is the routing technique employed in most flooding broadcast systems.

A more sophisticated approach recognizes the distance metric. The function that determines distance is critical to the overlay network's effectiveness. Peers that are determined to be close will receive preferential routing. However, that has to be balanced to ensure diversity in the routing table population.

Diversity

Diversity is based on a distribution metric. The goal of the distribution metric is to ensure that discovery requests are appropriately partitioned. Unlike locality, which optimizes distance, diversity optimizes geographic distribution.

The uniform distribution of peers such as relays and rendezvous points improves the scalability of the virtual space by reducing or eliminating a large number of discovery requests. This requires a function that is capable of computing a distribution metric when given a unique value, such as an IP address, or the preconfiguration of pivot points in the network. A pivot point provides a route from one major network segment to another.

The uniform distribution of peers ensures an even population of the virtual space (network), and reduces the size requirements of the routing table. Each node within the routing table can refer to one of potentially many nodes within a geographic distribution.
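Locality and diversity can be combined when filling a routing table. In the hypothetical Java sketch below, the distribution metric is simply the first two octets of each address (a crude stand-in for geographic distribution), the proximity metric is a measured latency, and the table keeps the closest peer from each bucket, nearest buckets first. The class names and the /16 bucketing are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch combining diversity and locality when filling a routing table:
// candidates are bucketed by a distribution metric, the closest candidate in
// each bucket is kept, and the table lists buckets nearest first.
final class RoutingTableBuilder {
    static final class Candidate {
        final String ip;
        final int latencyMillis; // proximity metric

        Candidate(String ip, int latencyMillis) {
            this.ip = ip;
            this.latencyMillis = latencyMillis;
        }
    }

    private RoutingTableBuilder() {}

    // Hypothetical distribution metric: the /16 prefix (first two octets).
    static String bucketOf(String ip) {
        String[] octets = ip.split("\\.");
        return octets[0] + "." + octets[1];
    }

    static List<String> build(List<Candidate> candidates, int maxEntries) {
        // Diversity: at most one entry per bucket, keeping the lowest-latency candidate.
        Map<String, Candidate> closestPerBucket = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            closestPerBucket.merge(bucketOf(c.ip), c,
                    (current, challenger) -> challenger.latencyMillis < current.latencyMillis ? challenger : current);
        }
        // Locality: prefer the nearest buckets when the table is full.
        List<Candidate> picked = new ArrayList<>(closestPerBucket.values());
        picked.sort(Comparator.comparingInt(c -> c.latencyMillis));
        List<String> table = new ArrayList<>();
        for (int i = 0; i < Math.min(maxEntries, picked.size()); i++) {
            table.add(picked.get(i).ip);
        }
        return table;
    }
}
```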

Node Redundancy

Finally, redundancy ensures that pivot points in the routing map, such as those that implement diversity, are replicated and redundant. This is to ensure that no single point of failure exists in the overall virtual space.

Flat Network Model

In a flat overlay network, each peer processes the requests it receives and propagates them to other peers to ensure the broad distribution of requests, as shown in Figure 6.7. As mentioned previously, network traffic will grow exponentially given a linear increase in peers or queries within the network.


Figure 6.7

A flat overlay network model will broadcast requests to every reachable peer. This model grows exponentially as the network grows linearly. Bandwidth saturation quickly becomes an issue. This model is only applicable for small work groups.

Simple broadcast is only viable in small networks.
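To see how quickly flooding grows, assume every peer has d connections and forwards a query on every connection except the one it arrived on. Hop k then carries d(d-1)^(k-1) messages, so the message count grows exponentially with the number of hops a query is allowed to travel. The following Java sketch (class name hypothetical) computes that count; it treats every recipient as new, so for a network that suppresses duplicate messages it is an upper bound rather than a measurement.

```java
// Estimates the messages one flooded query generates in a flat overlay, assuming
// each peer has `degree` connections and forwards the query on every connection
// except the one it arrived on. Every recipient is treated as new, so with
// duplicate suppression the real count can only be lower.
final class FloodCost {
    private FloodCost() {}

    // Messages sent on hop k: degree * (degree - 1)^(k - 1).
    static long messagesAtHop(int degree, int hop) {
        long count = degree;
        for (int k = 2; k <= hop; k++) {
            count *= degree - 1;
        }
        return count;
    }

    // Total messages for a query whose TTL allows `ttl` hops.
    static long totalMessages(int degree, int ttl) {
        long total = 0;
        for (int hop = 1; hop <= ttl; hop++) {
            total += messagesAtHop(degree, hop);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int ttl = 1; ttl <= 7; ttl++) {
            System.out.println("TTL " + ttl + ": " + totalMessages(4, ttl) + " messages");
        }
    }
}
```

With d = 4, a TTL of 3 already produces 52 messages for a single query, and a TTL of 7 produces 4,372.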

Hierarchical Network Model

A hierarchical overlay network reduces network traffic by organizing peers into a group hierarchy. Peers are capable of discovery only within their own group, or peers communicate only with peers that possess certain operational characteristics. For instance, a hierarchy can result from differentiating high-bandwidth, dedicated peers from slower, less powerful ones. Hierarchies can be built from metadata that defines network intelligence and peer capabilities. Such metadata can support policies like the following:

  • Equate resource consumption to the level of network participation. Discovery and search requests are discarded from noncontributing (sharing) peers.

  • Avoid expensive protocol operations such as unnecessary broadcast replies with intelligent forwarding to intended destinations.

  • Implement connection profiles to favor higher-bandwidth connections over slower modem connections.

  • Allow high-bandwidth broadband users to act as proxies for slower modem users.

  • Collect peer performance and measurement metrics.
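Two of these rules, discarding requests from noncontributing peers and favoring higher-bandwidth connections, can be expressed as a small forwarding policy. The metadata fields (`sharedResources`, `bandwidthKbps`), the bandwidth threshold, and the class names are hypothetical; a real peer would populate them from advertisements or its own measurements.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical peer metadata used to organize a hierarchical overlay. */
final class PeerMetadata {
    final String id;
    final int sharedResources; // items this peer contributes to the network
    final int bandwidthKbps;   // measured or advertised link speed

    PeerMetadata(String id, int sharedResources, int bandwidthKbps) {
        this.id = id;
        this.sharedResources = sharedResources;
        this.bandwidthKbps = bandwidthKbps;
    }
}

/** Applies two metadata rules: ignore nonsharing peers, forward only over fast links. */
final class HierarchyPolicy {
    private final int minForwardKbps;

    HierarchyPolicy(int minForwardKbps) {
        this.minForwardKbps = minForwardKbps;
    }

    /** Discard discovery and search requests from peers that share nothing. */
    boolean acceptRequestFrom(PeerMetadata sender) {
        return sender.sharedResources > 0;
    }

    /** Forwarding targets: peers at or above the threshold, fastest first. */
    List<PeerMetadata> forwardTargets(List<PeerMetadata> neighbours) {
        return neighbours.stream()
                .filter(p -> p.bandwidthKbps >= minForwardKbps)
                .sorted(Comparator.comparingInt((PeerMetadata p) -> p.bandwidthKbps).reversed())
                .collect(Collectors.toList());
    }
}
```

Slow peers still receive answers routed through the fast ones; they are simply not asked to relay traffic, which is one way of letting broadband users act as proxies for modem users.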

Performance

In this final section, we will look at discovery performance implications.

Bandwidth and Scalability

This chapter has emphasized that uncontrolled discovery (broadcast) is only appropriate for small work groups. Balancing geographic reach against excessive bandwidth consumption is a critical requirement of any scalable P2P network. A number of techniques can be applied to optimize the discovery process:

  • Use diversity to reach a broader range of the virtual space more efficiently.

  • Minimize routing table size by seeding the table with special-purpose peers.

  • Throttle discovery requests and minimize heartbeat polling.

  • Segment the routing table by operational and environmental metadata.

  • Constrain total bandwidth allocated to discovery processing.

  • Monitor peer consumption, including bandwidth used by incoming packets, peak bandwidth used, and a quality value associated with the responses received.
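Throttling discovery requests is the easiest of these to show in code. A token bucket, which is one common way to throttle but not the only one, lets a peer send short bursts of discovery requests while capping the sustained rate. The class name and parameters are hypothetical, and time is passed in explicitly so the policy can be tested without a real clock.

```java
/**
 * Token-bucket throttle for outgoing discovery requests. The caller supplies
 * the current time in nanoseconds, for example from System.nanoTime().
 */
final class DiscoveryThrottle {
    private final double capacity;        // largest burst of requests allowed
    private final double tokensPerSecond; // sustained request rate
    private double tokens;
    private long lastNanos;

    DiscoveryThrottle(double capacity, double tokensPerSecond, long startNanos) {
        if (capacity < 1 || tokensPerSecond <= 0) {
            throw new IllegalArgumentException("capacity must be >= 1 and rate > 0");
        }
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity; // start with a full bucket
        this.lastNanos = startNanos;
    }

    /** Returns true if a discovery request may be sent now, consuming one token. */
    boolean tryAcquire(long nowNanos) {
        long elapsed = nowNanos - lastNanos;
        if (elapsed > 0) {
            // Refill in proportion to elapsed time, never beyond the burst capacity.
            tokens = Math.min(capacity, tokens + elapsed / 1e9 * tokensPerSecond);
            lastNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a capacity of 2 and a rate of one request per second, a peer can fire two requests at once, is then refused, and regains one request after a second has passed. The same structure could cap the bytes per second allotted to discovery instead of the request count.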

Searching

The quality value of the response received can be used to determine future query and discovery scenarios. If you record the quality of a peer response, you can begin to maintain a search priority matrix by topic or category. Peers with high values for a given category will be queried first. You'll query lower-grade peers only if the higher-quality peers fail.

One outcome of this technique is that you may organize peers into peer clusters, or groups of peers capable of satisfying discovery requests by type or class of information. Additionally, peer clusters often have the overall effect of limiting the number of packets used, and thus the network bandwidth required to search for a desired response.
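The search priority matrix can be sketched as a per-category table of peer scores. Here each new quality observation is blended into a running score with an exponential moving average; that smoothing rule, the 0.0 to 1.0 quality scale, and the class name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-category record of how well each peer has answered queries. */
final class SearchPriorityMatrix {
    private final double smoothing; // weight of the newest observation, 0 < smoothing <= 1
    private final Map<String, Map<String, Double>> scores = new HashMap<>();

    SearchPriorityMatrix(double smoothing) {
        if (smoothing <= 0 || smoothing > 1) {
            throw new IllegalArgumentException("smoothing must be in (0, 1]");
        }
        this.smoothing = smoothing;
    }

    /** Folds a new quality observation into the peer's running score for a category. */
    void record(String category, String peer, double quality) {
        scores.computeIfAbsent(category, c -> new HashMap<>())
              .merge(peer, quality, (old, latest) -> smoothing * latest + (1 - smoothing) * old);
    }

    /** Peers for a category, best first: query in this order, falling back down the list. */
    List<String> queryOrder(String category) {
        Map<String, Double> row = scores.getOrDefault(category, Map.of());
        List<String> peers = new ArrayList<>(row.keySet());
        peers.sort((a, b) -> Double.compare(row.get(b), row.get(a)));
        return peers;
    }
}
```

Peers that consistently rank high for a category form a natural cluster for that class of information, which is how the clustering described above can emerge without explicit configuration.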

Transport

As this chapter has highlighted, discovery and searching have huge implications for network bandwidth requirements. Most implementations currently run on either TCP (to improve reliability) or UDP (to improve performance). In addition, the topology of the discovery network is critical to understanding bandwidth and performance requirements.

A decentralized discovery topology relies on peers to propagate discovery requests across the network. The need for special discovery roles is minimized, and the failure of any one node does not cause widespread network failure. UDP can often be used because of the redundancy inherent in the discovery (mesh) topology.

Contrasted with a decentralized topology is a centralized discovery platform. Some would argue that centralized discovery, or the use of super nodes, directly contradicts basic P2P network formation. However, in corporate environments, where large enterprise systems are needed for scalability, administration, and security, centralized servers using TCP might be appropriate.

The emerging middle ground (hybrids) uses a combination of centralized and decentralized discovery processing. Special broadcast roles, such as rendezvous and relay peers, effectively bridge the network to minimize the number of concurrent connections that must be supported.
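The bridging effect of rendezvous peers can be estimated with a deliberately simple model, which is an assumption rather than a description of any particular framework. In a full mesh, every pair of peers is connected. In the hybrid case, each edge peer keeps one connection to a rendezvous peer, and the rendezvous peers are fully meshed among themselves.

```java
/** Back-of-the-envelope connection counts for two discovery topologies. */
final class ConnectionMath {
    private ConnectionMath() {}

    /** Every peer holds a connection to every other peer: n(n-1)/2. */
    static long fullMesh(long peers) {
        return peers * (peers - 1) / 2;
    }

    /**
     * Edge peers each hold one connection to a rendezvous peer, and the
     * rendezvous peers form a full mesh among themselves.
     */
    static long withRendezvous(long peers, long rendezvous) {
        if (rendezvous < 1 || rendezvous > peers) {
            throw new IllegalArgumentException("need 1 <= rendezvous <= peers");
        }
        return (peers - rendezvous) + fullMesh(rendezvous);
    }
}
```

For 1,000 peers this model gives 499,500 connections for a full mesh and 1,035 with 10 rendezvous peers (990 edge connections plus 45 among the rendezvous peers).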

Fault Tolerance

Providing an adequate level of fault tolerance in P2P networks will be critical to their widespread adoption.

Redundancy

P2P technology solves problems inherent in a dynamic network, including dynamically assigned addresses, routing changes, and firewalls. Redundancy is key to reducing failure points and providing extra value for a P2P network.

Load Balancing

Techniques such as round-robin processing, workload queries, and node diversity (geographic distribution) are required to scale P2P networks. This should be implemented without requiring any global coordination.
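A round-robin selector that each peer maintains on its own is one way to balance load without global coordination. The sketch assumes a peer already has a local list of candidate peers; the class name is hypothetical. Because every peer keeps its own counter, the balance across the whole network is only approximate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin choice over locally known peers; no global coordination is needed. */
final class RoundRobinSelector {
    private final List<String> peers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<String> peers) {
        if (peers.isEmpty()) {
            throw new IllegalArgumentException("no peers to balance across");
        }
        this.peers = new ArrayList<>(peers);
    }

    /** Thread-safe: each call advances a local counter and wraps around the list. */
    String nextPeer() {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), peers.size());
        return peers.get(i);
    }
}
```

Workload queries or geographic diversity could be layered on top, for example by skipping a peer that reports itself as overloaded.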

Storage

Data caching of routes, peer groups, and peers will radically improve performance. Subsequent lookup requests whose paths intersect a cached route can be served from the cached copy. Initializing a peer from cached information can quickly transfer a large amount of virtual space knowledge from one peer to another.

Distributed caching offloads the peers that hold the primary replicas of data, and minimizes delays and network traffic by dynamically caching copies near interested clients.
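A bounded least-recently-used (LRU) cache is a simple way to hold cached routes and peer lookups on a single peer. The sketch relies on the standard `java.util.LinkedHashMap` access-order constructor and its `removeEldestEntry` hook; the class name and what you store as keys and values are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded cache of discovery results (for example destination -> route), evicting the least recently used. */
final class RouteCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    RouteCache(int maxEntries) {
        super(16, 0.75f, true); // access order: get() moves an entry to the most-recent end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // drop the stalest entry once the bound is exceeded
    }
}
```

To bootstrap a newly joined peer from cached knowledge, a neighbour could send it a copy of its cache contents, for example `new LinkedHashMap<>(cache)`.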

Communication

A key design issue is how to efficiently and dynamically maintain the routing table in the presence of peer failures, peer recoveries, and new peer arrivals. Special broadcast peers can periodically exchange keep-alive messages. If a peer is unresponsive for a period, it is presumed failed. All members of the failed peer's group are notified and update their group membership.

Routing table entries that refer to failed peers can be repaired lazily, that is, only when they are explicitly requested or addressed.

Applications can perform efficient multicast on subnets to repair or recover from localized damage or failures.
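The keep-alive and lazy-repair ideas can be combined in a small routing table. In this sketch, which uses hypothetical class and method names and a caller-supplied clock, a peer is presumed failed once it has been silent for longer than a timeout, and a route through that peer is removed only when someone looks it up. Notifying the rest of the failed peer's group is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical routing table that tracks keep-alives and repairs stale entries lazily. */
final class LazyRoutingTable {
    private final long timeoutNanos;
    private final Map<String, String> nextHop = new HashMap<>(); // destination -> next-hop peer
    private final Map<String, Long> lastHeard = new HashMap<>(); // peer -> time of last keep-alive

    LazyRoutingTable(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    void addRoute(String destination, String hop) {
        nextHop.put(destination, hop);
    }

    /** Records a keep-alive message from a peer. */
    void heartbeat(String peer, long nowNanos) {
        lastHeard.put(peer, nowNanos);
    }

    /** A peer is presumed failed if it has never been heard from or has been silent too long. */
    boolean presumedFailed(String peer, long nowNanos) {
        Long t = lastHeard.get(peer);
        return t == null || nowNanos - t > timeoutNanos;
    }

    /** Looks up a route; an entry through a failed peer is removed only now (lazy repair). */
    Optional<String> lookup(String destination, long nowNanos) {
        String hop = nextHop.get(destination);
        if (hop == null) {
            return Optional.empty();
        }
        if (presumedFailed(hop, nowNanos)) {
            nextHop.remove(destination);
            return Optional.empty();
        }
        return Optional.of(hop);
    }

    int size() {
        return nextHop.size();
    }
}
```

Stale entries cost nothing until they are used, which is the point of lazy repair: the table does no work on behalf of routes that nobody asks for.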

Summary

Discovery is the process of locating peers and resources in a P2P network. Discovery is based on three messaging techniques: broadcast, selective broadcast, and adaptive broadcast.

Broadcast, multicast, and replicated unicast all represent viable discovery transport services when used appropriately. Flooding broadcast techniques are only applicable for small work groups. A combination of constrained multicast and targeted unicast is gaining in popularity. It provides the necessary balance between connectivity and bandwidth consumption.

The TTL parameter is a technique used to limit the broadcast horizon in P2P networks. TTL is also used in the network layer to control packet routing in general.

Heartbeat polling in large P2P networks should be minimized, if not eliminated. The frequency of broadcast requests should be monitored. Controls and rules placed on bandwidth consumption can help to mitigate network problems, resource-allocation problems, and security breaches.

Discovery protocols are maturing and becoming more complex. Metadata is becoming an important element in extending protocol definitions and enabling a more robust network.

Special-purpose discovery roles, such as relays and rendezvous points, provide a number of attractive alternatives for scaling a P2P network.

Authors of this chapter

Robert Flenner is an independent Java software developer based in Texas. He is a regular contributor to the O'Reilly ONJava Web site, where he is currently publishing a series of articles related to Jini and JavaSpaces. He has been involved in managing, architecting, and developing information systems for 17 years. Previously, Robert wrote Jini and JavaSpaces Application Development (0-672-32258-7, Sams), published in Dec 2001.

Frank Cohen is CEO of PushToTest, a test automation solutions business. He is a contributing author to books on distributed system development, including Java P2P Unleashed and Java Web Services Unleashed, and a popular speaker at software development conferences. For the past 20 years he has led some of the software industry's most successful products, including Norton Utilities for the Macintosh, Stacker, and SoftWindows. He serves as an active member of the Software Developers Forum, the leading computer software industry association in California's Silicon Valley.

