
Use Spring Batch's 'Chunk' Processing for Large Data Sets

  • January 15, 2010
  • By Cesar Otero

The Application Context

Listing 1 is the full application context needed to run this batch job. In the code, we first declare the XML namespaces; every Spring bean definition lives under the 'beans' namespace. The first bean we configure is the data source, an instance of org.apache.commons.dbcp.BasicDataSource, whose properties hold the JDBC driver class, connection URL, user name, and password. Next, we create a transaction manager bean that handles transactions for the dataSource bean.
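
If you don't have Listing 1 open, these two beans look roughly like the following sketch (the driver class, URL, and credentials are placeholders for your own settings):

<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"/> <!-- placeholder driver -->
    <property name="url" value="jdbc:mysql://localhost/test"/>       <!-- placeholder URL -->
    <property name="username" value="user"/>
    <property name="password" value="password"/>
</bean>

<bean id="transactionManager"
      class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource"/>
</bean>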

We use dependency injection to wire up the employeeDao bean, an instance of com.theCompany.jdbcDao.JdbcEmployeeDao, which implements the EmployeeDao interface shown in the listing below. The EmployeeDao interface requires only one method, generateTestData(). Of course, if you want any other data access methods, feel free to add them here.
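
The wiring itself is a single bean definition in the application context; a minimal sketch (Listing 1 contains the real definition) looks like this:

<bean id="employeeDao" class="com.theCompany.jdbcDao.JdbcEmployeeDao">
    <property name="dataSource" ref="dataSource"/>
</bean>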

The EmployeeDao Interface

package com.theCompany.dao;

public interface EmployeeDao {
    // populates the EMPLOYEE table with sample rows for the batch job
    void generateTestData();
}


The JdbcEmployeeDao definition (shown in the listing below) extends JdbcDaoSupport, which gives us a setDataSource() method for free, and inserts the test rows one at a time through the JdbcTemplate. Later, when you run this sample, notice how the number of records printed per second drops as execution continues. When you run the example with batch processing, you'll notice there's no such slowdown.

The JdbcEmployeeDao Definition

package com.theCompany.jdbcDao;

import com.theCompany.dao.EmployeeDao;
import org.springframework.jdbc.core.support.JdbcDaoSupport;

// by inheriting from JdbcDaoSupport we get a free setDataSource() method
public class JdbcEmployeeDao extends JdbcDaoSupport implements EmployeeDao {
    public void generateTestData() {
        // insert the rows one at a time; a parameterized statement avoids
        // concatenating the SQL string on every iteration
        for (long i = 1001; i <= 100000; i++) {
            getJdbcTemplate().update("INSERT INTO EMPLOYEE VALUES(?, ?, ?)",
                    new Object[] {new Long(i), new Integer(0), "blah name"});
            System.out.println("record " + i + " inserted");
        }
    }
}


Now, create a main method like the one below and run it to generate the test data.

package com.theCompany.utils;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import com.theCompany.dao.EmployeeDao;

public class GenerateData {
    public static void main(String[] args) throws Exception {
        ApplicationContext context = new ClassPathXmlApplicationContext("resources/application-context.xml");
        // look the DAO up through its interface rather than the concrete class
        EmployeeDao dao = (EmployeeDao) context.getBean("employeeDao");

        dao.generateTestData();
    }
}


The Batch Job

Now that we have our data access infrastructure set up and some test data to play with, we can proceed to configuring the batch job itself. We'll create a simple job that uses the default job repository and contains only a single step.

From Listing 1, you can see that the tasklet does nothing more than run a chunk using the item reader (an instance of org.springframework.batch.item.database.JdbcCursorItemReader) and the item writer. The item reader requires a row mapper, shown in the listing below, and the item writer requires an itemPreparedStatementSetter. The item writer updates the data using the query UPDATE EMPLOYEE SET DEPARTMENT_ID = SUBSTRING(ID,1,2); in other words, it grabs the first two characters of ID and writes them into the DEPARTMENT_ID column.
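
If you don't have Listing 1 in front of you, the job described above looks roughly like the sketch below. This assumes the Spring Batch 2.x batch namespace (which matches the tasklet/chunk vocabulary); the bean ids, the commit interval of 100, and the reader's SELECT statement are assumptions:

<batch:job id="updateEmployeeJob">
    <batch:step id="updateDepartmentStep">
        <batch:tasklet>
            <batch:chunk reader="itemReader" writer="itemWriter" commit-interval="100"/>
        </batch:tasklet>
    </batch:step>
</batch:job>

<bean id="itemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="dataSource"/>
    <property name="sql" value="SELECT ID, NAME FROM EMPLOYEE"/> <!-- assumed query -->
    <property name="rowMapper">
        <bean class="com.theCompany.jdbcDao.EmployeeRowMapper"/>
    </property>
</bean>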

The Row Mapper

package com.theCompany.jdbcDao;

import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.jdbc.core.RowMapper;
import com.theCompany.beans.Employee;


public class EmployeeRowMapper implements RowMapper {

    public static final String ID_COLUMN = "id";
    public static final String NAME_COLUMN = "name";

    // map the current cursor row onto an Employee bean
    public Object mapRow(ResultSet resultSet, int rowNum) throws SQLException {
        Employee employee = new Employee();
        
        employee.setId(resultSet.getInt(ID_COLUMN));
        employee.setName(resultSet.getString(NAME_COLUMN));

        return employee;
    }
}


Add the class in Listing 2 to the package com.theCompany; it will be the main class for our program. Unlike before, when we generated our test data, if you run this main with the writer.write(emp) call commented out, you'll notice the data is written to standard output without slowing down. This is one of the big advantages of using batch processing.

Now, uncomment writer.write(emp) and run again. The execution speed slows down significantly. So, what's the advantage? Under an enormous load, the database itself would slow down, whereas here the execution speed stays constant, independent of the load.
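
Listing 2 itself is in the code download, but the shape of its driver loop is roughly the following sketch. Note that this is a reconstruction, not the article's exact code: the bean ids and the Employee getter are assumptions, and because the Spring Batch 2.x ItemWriter takes a list of items, the single-item writer.write(emp) call mentioned above becomes a one-element list here:

package com.theCompany;

import java.util.Collections;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.theCompany.beans.Employee;

public class EmployeeBatchRunner {
    public static void main(String[] args) throws Exception {
        ApplicationContext context =
                new ClassPathXmlApplicationContext("resources/application-context.xml");
        ItemReader reader = (ItemReader) context.getBean("itemReader"); // assumed bean id
        ItemWriter writer = (ItemWriter) context.getBean("itemWriter"); // assumed bean id

        // a cursor-based reader must be opened before the first read()
        ((ItemStream) reader).open(new ExecutionContext());

        Employee emp;
        while ((emp = (Employee) reader.read()) != null) {
            System.out.println("processing employee " + emp.getId());
            writer.write(Collections.singletonList(emp)); // comment out to compare speeds
        }

        ((ItemStream) reader).close();
    }
}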

Chunk Processing Is More Efficient

From the example presented here, you can see how it's more efficient to process data in chunks as opposed to trying to hold everything in memory. If the data set is too large, it's impossible to take that route anyway. With a little extra configuration, you can save a lot of processing time. Even so, this article has only scratched the surface of what can be done with Spring Batch.

Code Download

  • SpringBatchEmployeeExample.zip

  • Tags: Java, Spring, data


