Overview of JXTA, Page 2
Messaging in JXTA is done in two different ways. First is the standard way that would be expected with XML. The messages are packets that contain a payload of data formatted to follow XML standards.
The second type of message is a very economical binary message. Despite the desire to use XML for all JXTA messages, the reality is that there are many messages sent and received. The bulk of XML messages send in large volumes is very inefficient. Also, because messages are usually sent from application to application, it is simple to standardize on the contents of the message. The remainders of the protocols are still XML.
The use of binary messages in an XML protocol may seem counterintuitive. The truth is that there are more advantages than just compactness. The first is that data can be compressed using standard techniques. Compression of data, such as text, can create a huge savings in time to transmit.
Another reason for binary data is that many messages are already binary. For example, a document-sharing program will most likely share binary documents. If you were transferring messages via XML, the data would need to be converted to XML.
Another reason for binary messages is encryption. Because the data needs to be converted to binary for encryption, moving straight to and from binary instead of XML makes sense.
JXTA has a wide selection of different identifiers. Identifiers range from large, unique identifiers to names and URLs. Identifiers are used like pointers of references. In the reference platform, identifiers are used for indexing, filenames, and searching.
A rendezvous is a peer that processes queries from other peers. The rendezvous can also delegate queries to other peers, which must also be a rendezvous. A key purpose of rendezvous is to facilitate searching of advertisements beyond a peer's local network. Rendezvous usually have more resources than other peers and can store large amounts of information about the peers around it. In a peer network, information is scattered among peers and not stored entirely on any single machine, such as a server. Instead, there are rendezvous that distribute the storage of the advertisements.
Rendezvous peers can also act as relays of searches. The rendezvous peer can forward discovery requests to other rendezvous peers that receive their information from peers with whom they have exchanged advertisements. Each rendezvous will forward on a request if it does not have the information requested.
A typical search is illustrated in Figure 2.1. The remote search starts from Peer 1 which firsts queries local Peers 2 and 3 via IP Multicast. These Peers (2 and 3) are most likely on the local LAN and are quickly accessed. Next, if these Peers do not have the specified resource, a rendezvous peer is searched. If the rendezvous peer does not have the advertisement, successive rendezvous peers are searched. Note that besides the peers local to the query peer, only rendezvous peers are searched.
Figure 2.1: Rendezvous query routing.
IP Multicast is a one-to-many messaging protocol. IP Multicast is used to send one copy of data to a group address, reaching all recipients who are configured to receive it.
IP Multicast has two benefits over P2P applications. First, because multicast uses a group address instead of IP addresses, a peer sending a message can do so without knowledge of the listening peer's address. The result is that all peers within the multicast network can now respond to the caller with information to the query and even their IP address for direct communication.
IP Multicast's second benefit is the reduction of bandwidth. Because all peers can see a single message, there is no need to send a copy of the message to each peer. This is very important when sending large amounts of data to a group of peers.
A drawback to using multicasting is that some firewalls and routers block multicast messages. There is some support for sending multicast messages via Internet backbones between Internet providers, but it is often a service for which you must pay extra. There are other barriers to IP-Multicast. These can include personal firewalls and subnet routers. This is why JXTA supports more than just IP-Multicast.
In general, the multicast support available behind a firewall is sufficient for most P2P needs. You can take advantage of localized multicast support by sending duplicate messages for each network to a specific peer within the network for rebroadcast via multicasting to the local peers.
Only a rendezvous allows searching beyond a local network. A peer has the option of being a rendezvous, but it is not required. There is a side benefit of being a rendezvousthe peer will retain a cached copy of the results from other rendezvous of the result of cached answers to requests.
On the negative side of being a rendezvous, the peer will use more memory and higher bandwidth. Because of the possibly high number of requests and the resources consumed by a large database of advertisements, the case can be made for a dedicated rendezvous peer. In corporate installations, the rendezvous could also perform the duties of gateway and router for the corporate intranet. The effect would be similar to the use of a traditional router. Additional scaling could use a rendezvous in each of the corporation's subnets.
The need for dedicated rendezvous peers depends on security and the scale of P2P applications used. The P2P network topology should be examined on a case-by-case basis and monitored regularly.
It is important to mention here that a P2P network becomes more efficient as services are duplicated among a large number of peers. However, there may be a point where additional rendezvous do not add to efficiency.
Rendezvous are used when a peer is searching for an advertisement or when other services use the rendezvous mechanism to route messages, so the need by a peer is not constant. A rendezvous connected to the Internet will be exposed to possibly thousands of peers. Within the bounds of a firewall-isolated network, having all peers configured as a rendezvous will probably not have a large affect caused by rendezvous tasks. Note that this observation may prove incorrect if there are many requests that have large search scopes.
Rendezvous are also used for application specific queries. In Chapter 9, "Synchronizing Data Between Peers," peers use peers acting as rendezvous to propagate information about new appointments in a calendar as well as synchronizing an address book.
When considering any network topology or deciding if your peers should be a gateway, router, or rendezvous, be sure to check the current implementation for its efficiency and capabilities. Remember also that some applications may specifically use these services in unique ways. The JXTA platform will continue to evolve and should eventually cover most topologies or be configurable for many situations. But each environment is unique, so experimentation and monitoring of actual use may be the best way to configure your peersespecially in a large corporate environment.
A router in JXTA is any peer that supports the peer endpoint protocol. Not all peers need to implement the protocol because, like traditional network routers, you only need a few to support a large network. JXTA routers are very similar to a traditional router. The primary difference is that a P2P network is less stable and includes many addresses that are not static.
Figure 2.2 is an example of how a route is created. The request for a route starts at Peer 1 with the request for a route passed to available routers until the complete path to Peer 8 is built. Note that because a peer can be a gateway as well as a router, Peer 2 includes itself as a node in the route. Please understand that this is just conceptually how routers work, not how they are actually implemented. Routers can support caching or complex algorithms. For example, instead of the first router forwarding a request for the rest of the route, this router may already have a cached version of the route available.
Figure 2.2: Conceptual example of peer endpoint routers creating a route between Peer 1 and Peer 8.
A gateway is a peer that acts as a communications relay. Don't confuse gateways with rendezvous. A gateway is used to relay messages between peers, not requests.
Gateways are like radio repeaters or a middleman between peers used to relay messages. Gateways are critical to connectivity because of firewalls, NAT devices, and network proxies. Gateways can store messages and wait for their intended recipient to collect the messages.
Gateways exist because the Internet is very messy. The mess is caused by the fact that we have all sorts of security and barriers that prevent a common way to communicate between peers. Another bit of this mess is the difference between protocols supported by peers. Some peers may use TCP, others may use HTTP. With wireless, we would need Wireless Application Protocol (WAP) as well. The gateway supports as many of these protocols as is possible so that it can act as a middleman between different types of protocols. JXTA started with support for TCP and HTTP, but other gateways are in development.
Gateways are key to getting around most of the security on the Internet. Firewalls, proxy servers, and NAT devices are the common security barriers. Figure 2.3 shows how the gateway Peer 2 is used to interface between Peer 1 and Peer 3. The gateway translates HTTP messages from peer 1 to TCP for delivery to Peer 3. When messages are sent from Peer 3, they are sent via TCP to Peer 2, which holds the message until Peer 1 makes an HTTP request to retrieve the data.
Figure 2.3: Example of gateway participation in a single pipe.
JXTA has started using the term relay to merge the terms rendezvous, router and gateway. Relay will also be used to include concepts like proxy, transcoding, and related "JXTA network helpers," rather than proliferating a bunch of overlapping terms. It is also expected that there will be specialized peers or even commercial appliances that have just these functions, so a single name is more marketable. We will use the terms interchangeably for now, depending on the focus of discussion.
Why We Need Relays (Routers and Gateways)?
Although we have touched on the subjects, we need to specifically cover why relays are required for a P2P network. The following sections discuss each of the barriers to a P2P network. Each of these creates a need for us to abstract the network and create a virtual network where the P2P system provides routing and messaging via HTTP tunneling or to switch transport protocols.
Firewalls, which filter almost everything except HTTP, are most often found at larger companies, but they are also now found in homes that use special firewall routers. There are also personal firewalls, so called because they run on the user's personal computer. Firewalls are often configured to filter almost everything except HTTP. HTTP only allows communications that are initiated by the client.
For example, when you are requesting a Web page, connecting to a Web server, sending the request, receiving the requested page, and then disconnecting. At no time does the Web server initiate a connection to the Web browser.
Because there is only one direction of communication that can be initiated, the gateway acts as a virtual agent that accepts messages for later delivery. Therefore, if a peer attempts to talk to a peer that can only initiate HTTP communications, the gateway holds the message until the HTTP peer contacts the gateway and asks for messages addressed to it.
NAT (Network Address Translator)
A Network Address Translator (NAT) device is just as disruptive as a firewall. Most wideband routers for cable and DSL use NAT.
NAT lets you use a single IP address for a whole network of computers. Because Internet providers charge by the unique IP address, many use a NAT to save money. The NAT sits between the public Internet and the a local area network (LAN) where it rewrites IP addresses and port numbers in IP headers on-the-fly so that the packets all appear to be with the public IP address of the NAT device instead of the actual source or destination. This causes multiple problems with applications that pass addresses and ports back and forth across the NAT. The NAT simply cannot detect and correct the message to reflect the mapped address. Essentially, this means that if you are behind a NAT, you will have trouble telling anyone what your address really is.
Many NATs, for security reasons, only allow incoming traffic from an outside address only if an outgoing packet has already been sent to that outside address. This is like a poor man's firewall, because it prevents anyone from directly connecting to your computer without you first initiating communications. This is like a phone that cannot receive calls but will still let you call anyone.
A socket connection may be assigned to the same external address/port on subsequent connections. This means that you cannot be sure of a return path for messages. This makes it very difficult to create a two-way conversation.
The gateway gets around the NAT the same way it gets around a firewall. By using the HTTP protocol, the gateway on the other side of the NAT ensures that peers can communicate with the peer behind the NAT.
A proxy server is a device that sits between the Internet and a LAN. The proxy servers provide services like filtering, caching, and monitoring of traffic.
The result of having a network proxy is similar to a NAT. The proxy device can limit addresses as well as map them to others, such as NAT. Proxy devices can be as sophisticated as a firewall and limit certain types of communication. For example, a proxy service can prevent you from accessing a forbidden Web site. Some proxy servers can even detect viruses embedded in incoming e-mail before they ever reach your e-mail's in-box.
The gateway can usually get around a proxy server by bridging the gap with HTTP. Some very sophisticated systems can be programmed to detect and prevent such traffic. Some companies only allow pure HTML to pass and discard all other types of data. In some cases, you may not be able to use JXTA applications without specific configuration and permission of the network administrator.
Many companies, and especially Internet service providers utilize Dynamic Host Configuration Protocol (DHCP). DHCP allocates IP addresses dynamically. The effect is that each time the DHCP server re-boots or a user's computer re-boots, the IP address is changed. The address can also change if the IP address lease expires. The effect is that even a known address is a temporary address. Because of DHCP, even when you are behind a firewall, peers may still not have addresses on which you can depend. This makes it difficult when communicating across the firewall to peers on the other side.
The possibility of changing addresses is greatly improved by router peers. Router peers are able to create new routes between peers when addresses change.