Implementing Search Result Pagination in a Web Application
Web pagination is something every web user takes for granted, but developers have to put a lot of consideration into implementing it. A well-designed pagination mechanism improves system responsiveness and the user experience, and can reduce clutter on the page. In this article, I will discuss different approaches to pagination algorithms and best practices for them, and show the logic needed to generate the actual page links on the front end. For that, I will present a generic algorithm for implementing page links on the results page.
Current Solutions and Technologies
Unless the returned result set is guaranteed to be very small, any web application with search capabilities must have pagination. For instance, if the result set is fewer than 30 rows, pagination may be optional. However, if it's larger than 100 rows, pagination is highly recommended, and if it's larger than 500 rows, pagination is practically required. There are many different ways to implement a pagination algorithm, and the method chosen affects both performance and the user experience.
Pagination algorithms can generally be divided into two types: database driven, and application server (or middleware) driven. A third approach also exists, but I find it less favorable than the other two; I will mention it later and explain why I wouldn't recommend it. Developers also have two choices: use a third-party solution or implement their own algorithm. How the pagination algorithm actually executes, however, depends on the technology used. A third-party solution may or may not hide the implementation logic from developers, but under the hood any algorithm still falls into one of the first two categories.
The end result of a pagination algorithm is almost always the same, with some minor front-end differences: the CSS styling; whether first/last page links are shown when there are more pages than fit in the pages window (the pages window is the number of page links visible at once, for example 10); and the logic behind the "next/prev" links.
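The following is a minimal Python sketch of one way to compute which links to render for a given pages window. It is an illustration under stated assumptions, not the generic algorithm presented in this article: the helper name `page_links`, its default page size of 20, and the choice to center the window on the current page are all hypothetical.

```python
import math


def page_links(current_page, total_rows, page_size=20, window=10):
    """Return (label, page_number) pairs for the links to render.

    Hypothetical helper: shows at most `window` numbered links around the
    current page, plus First/Last links when page 1 or the last page falls
    outside that window, and Prev/Next links when those pages exist.
    """
    total_pages = max(1, math.ceil(total_rows / page_size))
    current_page = min(max(1, current_page), total_pages)  # clamp to a valid page

    # Center the window on the current page, then slide it back inside
    # [1, total_pages] so it stays full near either end.
    start = max(1, current_page - window // 2)
    end = min(total_pages, start + window - 1)
    start = max(1, end - window + 1)

    links = []
    if start > 1:
        links.append(("First", 1))
    if current_page > 1:
        links.append(("Prev", current_page - 1))
    links.extend((str(page), page) for page in range(start, end + 1))
    if current_page < total_pages:
        links.append(("Next", current_page + 1))
    if end < total_pages:
        links.append(("Last", total_pages))
    return links


print([label for label, _ in page_links(25, 1000)])
```

For example, `page_links(25, 1000)` (50 pages of 20 rows) yields First, Prev, pages 20 through 29, Next, and Last. On page 1 the First and Prev links are omitted, because page 1 is already inside the window and there is no previous page.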
Here are some screenshots of page links from popular sites:
Google pagination links
Ebay pagination links
DealOgre.com pagination links
There are a number of third-party solutions, both open source and commercial, that provide APIs or tag libraries for implementing paging and/or caching. One of the more popular is Hibernate, an ORM framework for Java that supports most database flavors and has built-in caching. Another is OSCache, which provides generic, high-performance caching for J2EE applications. Many modern web frameworks also come with pagination hooks or ready-to-use pagination modules.
Database-Driven Pagination Algorithm
The database-driven method of implementing pagination requires structuring SQL SELECT statements so that the database traverses the result set and returns only a portion of it to the application server (or middle tier). This is the most commonly used type of pagination algorithm; it is also more efficient and produces less data redundancy. All the heavy lifting is done on the database tier, and the requester of the result set receives only a portion of it. Because this approach depends on the database server used, custom database-driven pagination solutions cannot be generic: different vendors implement the SQL standard differently. For example, you can use a LIMIT clause with a MySQL database, but Oracle has no such clause; similarly, you can use row numbers to restrict the result in Sybase, but doing so efficiently is much harder in Oracle.
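As a rough illustration, the sketch below uses Python's built-in `sqlite3` module, since SQLite accepts the same `LIMIT ... OFFSET` syntax as MySQL. The `products` table, the `make_demo_db` and `fetch_page` helpers, and the page size are hypothetical. The Oracle query is included only as a string for comparison and is not executed; it shows the common ROWNUM-based pattern used in place of a LIMIT clause.

```python
import sqlite3

# MySQL-style paging; SQLite accepts the same LIMIT ... OFFSET syntax.
PAGE_SQL = "SELECT id, name FROM products ORDER BY name, id LIMIT ? OFFSET ?"

# Oracle has no LIMIT clause. A common ROWNUM-based equivalent is shown
# here for comparison only; it is not executed in this sketch.
ORACLE_PAGE_SQL = """
SELECT * FROM (
    SELECT a.*, ROWNUM rnum FROM (
        SELECT id, name FROM products ORDER BY name, id
    ) a WHERE ROWNUM <= :max_row
) WHERE rnum > :min_row
"""


def make_demo_db(rows=600):
    """Build a hypothetical in-memory products table with the given row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (name) VALUES (?)",
        [(f"item-{i:04d}",) for i in range(1, rows + 1)],
    )
    return conn


def fetch_page(conn, page, page_size):
    """Return only the rows for a 1-based page number."""
    offset = (page - 1) * page_size
    return conn.execute(PAGE_SQL, (page_size, offset)).fetchall()


conn = make_demo_db()
print(fetch_page(conn, 12, 50)[0])  # -> (551, 'item-0551')
```

Page 12 with 50 rows per page maps to offset 550, so only rows 551 through 600 are sent back to the application; the paging arithmetic stays in the application, while the row selection happens in the database.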
Here is the overview of database-driven pagination:
Without result set caching, database-driven pagination typically takes about as long per page as querying for the entire data set, because the database generally has to evaluate the whole query, including any sorting, before it can pick out the requested rows. The fact that only a portion of the data is returned to the consumer does not change this. For instance, if the query is complex and involves sorting the data, returning rows 1 through 50 to the user will take about the same time as returning rows 550 through 600.
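One way to observe this, sketched here with Python's `sqlite3` module, is to sort by a Python function that counts its own calls. Because SQLite cannot use an index for an opaque user-defined function, it has to compute the sort key for every row before it can skip to the requested offset. The `items` table and the helper names are hypothetical, and SQLite stands in for whatever database the application actually uses.

```python
import sqlite3


def make_items(n):
    """Create an in-memory table with n rows (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)",
        [(f"item-{i:04d}",) for i in range(n)],
    )
    return conn


def sort_key_calls(conn, offset, limit=50):
    """Fetch one page sorted by a tracked function; return how often it ran."""
    calls = 0

    def tracked_key(value):
        nonlocal calls
        calls += 1
        return value

    # SQLite cannot use an index for a user-defined function, so it must
    # evaluate tracked_key(name) for every row before applying LIMIT/OFFSET.
    conn.create_function("tracked_key", 1, tracked_key)
    conn.execute(
        "SELECT id FROM items ORDER BY tracked_key(name) LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return calls


conn = make_items(600)
first_page = sort_key_calls(conn, 0)    # rows 1-50
late_page = sort_key_calls(conn, 550)   # rows 551-600
print(first_page, late_page)
```

On this 600-row table, both counts come out at 600 or more: the first page and the last page pay the same sorting cost, even though each returns only 50 rows.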
The main considerations for this approach are the performance of the database server and whether any caching mechanism is involved. Database performance depends on the size of the tables searched, the complexity of the search query, whether proper indexing is in place, and which query mechanism is used. For instance, an enterprise application may have a server farm with a stored-procedure query mechanism and some in-memory caching on the database side.
Using in-memory caching is generally a good idea, because retrieving data from memory is faster than the disk I/O associated with database retrieval. But please do not confuse in-memory caching of repeated query results with opening a cursor to the database from the application server and then traversing the result set as the user clicks on different page links. With the cursor approach, the first page takes as long to display as querying for the entire data set, but subsequent page navigation is much faster. However, this logic requires keeping a database connection open for every initial search session, and there is no easy way to detect when a search session has been abandoned. Because every search session's result set is held in memory, cursor-based pagination also puts additional strain on the database server's memory resources.
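For contrast with the cursor approach, here is a minimal sketch of an in-memory cache of query results, assuming the results are small enough to hold in application memory. `QueryResultCache`, its `run_query` callback, and the five-minute default TTL are hypothetical. The database is queried once per distinct search, no connection is held open between page clicks, and entries older than the TTL are re-queried on the next access or removed by `evict_expired()`, which bounds the cost of abandoned searches.

```python
import time


class QueryResultCache:
    """Hypothetical in-memory cache of complete search results.

    Each distinct (sql, params) search hits the database once; later page
    requests slice the cached rows, so no connection stays open between
    clicks. Entries older than ttl_seconds are re-queried on next access
    and can be purged with evict_expired().
    """

    def __init__(self, run_query, ttl_seconds=300, clock=time.monotonic):
        self._run_query = run_query  # callable (sql, params) -> iterable of rows
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # (sql, params) -> (expires_at, rows)

    def __len__(self):
        return len(self._entries)

    def _rows(self, sql, params):
        key = (sql, tuple(params))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:
            # Cache miss or expired entry: run the full query once and store it.
            entry = (now + self._ttl, list(self._run_query(sql, params)))
            self._entries[key] = entry
        return entry[1]

    def page(self, sql, params, page, page_size):
        """Return the rows for a 1-based page, served from memory when cached."""
        rows = self._rows(sql, params)
        start = (page - 1) * page_size
        return rows[start:start + page_size]

    def evict_expired(self):
        """Drop every entry whose TTL has passed (e.g. abandoned searches)."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for key in expired:
            del self._entries[key]
```

The sketch is not thread-safe and places no limit on total memory use; a production cache would need locking and a size-based eviction policy, which dedicated caching libraries typically provide.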