http://www.developer.com/

Back to article

Implementing Search Result Pagination in a Web Application


August 24, 2007

Web pagination is something every web user takes for granted, but for developers a lot of consideration goes into implementing it. The web pagination mechanism will automatically improve responsiveness of the system, user experience, and may reduce clutter on the page. In this article, I will discuss different approaches and best practices to the pagination algorithms, and show what logic needs to be done for the actual link generation on the front end. For that, I will present a generic algorithm to implement page links on the result page.

Current Solutions and Technologies

Unless the returning result set is guaranteed to be very small, any web application with search capabilities must have pagination. For instance, if the result set is less then 30 rows, pagination may be optional. However, if it's bigger then 100 rows, pagination is highly recommended, and if it's bigger then 500 rows, pagination is practically required. There are a lot of different ways to implement the pagination algorithm. Depending on the method used, both performance and the user experience will be affected.

Pagination algorithms can be categorized generally into two types: database driven and application server or middleware driven. A third approach also exists, but I find it less favorable in comparison to the others. I will mention it later and explain my reasoning as to why I wouldn't recommend it. For developers, there are also two choices, either using a third-party solution or implementing their own algorithm. The actual execution of the pagination algorithm, however, depends on the technology used. If a third-party solution is employed, it may or may not hide the implementation logic from the developers. But, under the hood any algorithm would still fall in one of the first two aforementioned categories.

The end result of the pagination algorithm is almost always the same with some minor front-end differences, such as CSS, inclusion of the last/first page link if there are more page links then visible pages window (pages window is the number of visible links—for example, 10), or logic behind of the "next/prev" link.

Here are some screen shots of page links from popular pages:

Google pagination links

Ebay pagination links

DealOgre.com pagination links

There are a number of third-party solutions, both open source and commercial, that will provide APIs or tag libraries to implement paging and/or caching. One of the more popular is the Hibernate ORM solution for Java, which comes with support for most database flavors, and has internal caching. Another one is OSCache, which provides different generic high-performance J2EE caching solutions. Many modern web frameworks also come with the pagination algorithm hooks, or ready-to-use modules.

Database-Driven Pagination Algorithm

The database-driven method of implementing pagination requires structuring SQL selects in such a way as to traverse the result set and return only a portion of it to the application server (or the middle tier). This type of pagination algorithm is the most commonly used, more efficient, and produces less data redundancy. All the heavy lifting is done on the database tier and the requester of the result set only gets a portion of it. Because the execution of this approach depends on the database server used, custom solutions utilizing database-driven pagination can not be generic because different vendors implement SQL language standards differently. For example, you can use a "limit" clause with a MySql database, but there is no such thing in Oracle, or you can also use row numbers with Sybase to modify result, but it's much harder to do so efficiently with Oracle.

Here is the overview of database-driven pagination:

Database-driven pagination, without result set caching, will always take time equal to the time it takes to query for the entire set of data. It is irrelevant that only a portion of the data is returned to the consumer. For instance, if selection is complicated and involves sorting the data, returning rows 1 through 50 to the user will take the same time as returning rows 550 through 600.

The considerations for this approach are performance of the database server and whether any caching mechanism is involved. The performance of the database server is related to the size of the data tables searched, complexity of the search query, whether proper indexing is in place, and what query mechanism is used. For instance, an enterprise application may have a server farm with a stored procedure query mechanism and some in-memory caching on the database side.

Using in-memory caching is always a good idea because memory retrieval is always faster then disk I/O retrieval associated with the databases. But please, do not confuse in-memory caching of the same query searches with opening a cursor to the database from the application server, and then traversing the result set at the user clicks on different page links. This approach will result in displaying the first page in time equal to the time it takes to query for the entire set of data, but much faster subsequent page navigation. However, this logic involves keeping a connection open to the database server for every initial search session, and has no easy way of detecting search session abandonment. Because every search session result set would be stored in-memory, cursor-based pagination also would create additional stain on the database server memory resources.

Middle-Tier-Driven Pagination Algorithm

Modern frameworks and database drivers allow efficient in-memory traversal of data on the application server level or middle-tier level. I would not recommend this approach for large data sets. However, because this approach could be combined with the caching solution on the application server, it should be considered for small to medium data sets. Depending on the nature of the business requirements, this solution could be used for "rigid" searches, where users have no control of the query parameters and are forced to use pre-defined criteria, such as a shopping category.

If many users can request an identical search query (and the result is not in the millions of rows or changes very often), it makes sense to select the whole result set from the database and pre-cache it in the application server's memory. Subsequent queries would not result in expansive database hits, and will be served very quickly from the application server's memory.

Here is the overview diagram of the second approach to the algorithm for pagination:



Click here for a larger image.

Front-End Pagination Algorithm

This approach is the third way to implement pagination, but it is very inefficient and should not be used unless some specific business rule demands it. Front-end pagination is done entirely on the front-end side, in the client's browser. Typically, it uses JavaScript or some other client-side technology. The entire result set is brought to the client via a response object, and manipulated visually to create clickable pages, by hiding and showing rows (or HTML elements representing them). Depending on the size of the result set, this approach's initial request will always take time equal to the time it takes to query for the entire set of data. But, subsequent page clicks will be fast, and depend on the client's hardware. Note that if the result is large, it can take a while to download the first time around; in the worst case, it can hang the browser.

Here is the overview diagram of the third approach:

Generating Page Links

After choosing the best approach to retrieve the result set and traverse it, the application needs to create the actual page links for users to click. Below is a generic function pseudo-code, which should work with any server-side technology. This logic should reside in your application, and will work with both database-driven and middle-tier pagination algorithms.

The function takes three (3) parameters and returns HTML representing page links.

The parameters are query text, staring location row number, and total number of the result set rows. The algorithm is clever enough to generate appropriate links based on where the user is in the navigation path.

Note: I set the default of 50 rows per page, and a page window to 10. This means that only 10 (or fewer) page links will be visible to the user.
private String getSearchLinks(String query, int start, int total) {
// assuming that initial page = 1, total = 0, and start is 0

   String result = "";
   int start = 0;

   if (total == 0) { return "";    // no links }

   Int page_size = 50;    // number of rows per page
   //page window - number of visible pagers per page
   Int window = 10;


   int pages = ceil(total / page_size );
   result = "Pages:";

   int current_page = (start / page_size ) + 1;
   //numeric value of current page ex. if start is 51 : 51/50 =
   // 1 + 1 = 2

   Int left_link_count = ((current_page - 1) > (window / 2))
      ? (window / 2 - 1) : (current_page - 1);
   Int pageNo = current_page - left_link_count;

   if (pageNo > 1) {    // show first page and if there are more
                           // links on the left
      result += "1 .. ";
      result += "«";
   }

   for (int i = 0; i < window-1; i++) {
      if (pageNo > pages) {
         break;
      }
      else if (pageNo == current_page) {
         result += "" + pageNo + "";
      }
      else {
         result += "pageNo";
      }
      pageNo++;
   }    // end for

   if ((pageNo - 1) < pages) {
      result += "»";
   }

   result += "
Showing"+((start > total)?total+1:start+1)+ " - "+(((start + 50)>total)?total:(start + 50))+" of Total:"+total; return result; }

This logic does not care how the viewable portion of the result set is generated. Whether it is on the database side or on the application server's side, all this algorithm needs is a "total" number of results (that can be pre-cached after the first retrieval), and the indicator ("start") containing which row number was the first on the last page user was looking at. This algorithm also shows the first page link if the user is on a page beyond the initial page window (for example, page 20), and correctly accounts for the result sets with a number of rows not enough for 10 pages. (for example, only 5 pages)

The main "for loop" generates the links and correctly computes the "start" parameter for the next page. The Query string and Total are always the same values.

Conclusion

In this article, I have discussed different methodologies in implementing pagination links in a web application. I have covered the database-driven approach, the middle-tier approach, and some third-party solutions. I have also detailed the actual algorithm for the page links generation. Ultimately, which solution you chose will depend on the size of the dataset, technology at hand, size of the site, and the budget for the project.

References

About the Author

Vlad Kofman is working on enterprise-scale projects for the major Wall Street firms. He has also worked on defense contracts for the U.S. government. His main interests are object-oriented programming methodologies, UI, and design patterns.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date