March 1, 2021
Hot Topics:

Data Structures in Java: Part 1, Getting Started

  • By Richard G. Baldwin
  • Send Email »
  • More Articles »

Java Programming, Lecture Notes #1350


This is the first lesson in a new miniseries on Java data structures and the Java Collections Framework.

The purpose of this miniseries is to help you learn the essential features of Object-Oriented data structures in Java using the Collections Framework.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings while you are reading about them.

Recommended supplementary reading

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes my lessons are difficult to locate there.  You will find a consolidated index at Baldwin's Java Programming Tutorials.

The index on my site provides links to the lessons at Gamelan.com.


This lesson provides a brief introduction to the use of the Java Collections Framework.  The framework is designed to encourage you to reuse rather than to reinvent collections and maps.

A collection represents a group of objects, known as its elements.  Some collections allow duplicate elements while others do not. Some collections are ordered and others are not.  (Maps will be discussed in subsequent lessons.)

The Collections Framework is defined by a set of interfaces and associated contracts, and provides concrete implementations of the interfaces for the most common data structures.  In addition, the framework also provides several abstract implementations, which are designed to make it easier for you to create new and different implementations while still maintaining the structural polymorphic integrity of the framework.


A little quiz

Let's begin with a little quiz to establish your baseline knowledge of the Collections Framework.  Take a look at the program in Listing 6 near the end of this lesson.  Which of the following is the output produced by that program?

  • A.  Compiler Error
  • B.  Runtime Error
  • C.  44321
  • D. 12344
  • E. 1234
  • F.  None of the above.
If your answer was 1234 then you already know quite a lot about the use of the Collections Framework.  If not, keep reading to begin learning about the framework.

Elements of the Framework are easy to use

This simple introductory program is not intended to do anything particularly useful.  Rather, it was designed to illustrate several important features of the framework, including the ease with which elements of the framework can be reused in your programs.

Don't reinvent the wheel

As many of you already know, I am a college professor.  I specialize in teaching OOP using Java.  In the past, many college courses in Data Structures (often referred to as CS2 courses) have emphasized the concept of reinventing the wheel.  Students were required to learn how to reinvent a variety of complex data structures in order to successfully complete the course.

Hopefully a change is in the works

Hopefully, with the conversion of these CS2 courses to Java OOP, the emphasis will change to reuse instead of reinvent.

Unfortunately, most of the CS2 Java textbooks that I have seen so far look like warmed-over textbooks from the days of Pascal, C, and C++.  Many look like they were written using cut-and-paste technology to update an old Pascal, C, or C++ textbook to incorporate a little Java OOP.  Most of the textbooks that I have seen include only enough Java OOP to avoid violating truth in advertising requirements.  They still emphasize reinvent instead of reuse.

Collections Framework encourages reuse

The Java Collections Framework is designed to encourage programmers to reuse existing interfaces and classes instead of inventing new ones.  In the event that it is necessary to invent a new class or interface, the programmer is encouraged to integrate it into the Framework in a polymorphic manner.

Sample Program

I am going to provide a brief discussion of the sample program (shown in Listing 6) in this lesson.  Later, I will provide more detailed discussions of many of the features used in that program.

Interesting Code Fragments

I will break this program down and discuss it in fragments.

An object of the TreeSet class

The code fragment in Listing 1 instantiates an object of the TreeSet class and stores the object's reference in a reference variable of type Collection.

class Worker{
  public void doIt(){
    Collection ref = new TreeSet();

Listing 1

Collection is an interface

The TreeSet class implements the SortedSet interface, which extends the Set interface, which in turn extends the Collection interface.  Thus, a TreeSet object is a Collection.  Therefore, a reference to a TreeSet object can be stored in a reference variable of type Collection, and can be treated as the generic type Collection.

What is a TreeSet object?

Among other things, in CS2 courses, we worry about the time and memory cost of a collection.  According to Sun, the TreeSet class guarantees that the sorted set will be in ascending element order, and provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

What does ascending element order mean?

Again, according to Sun, the elements will be sorted according to the natural order of the elements (see the Comparable interface) or by a comparator (see the Comparator interface) provided at the time the set is created.  This depends on which overloaded constructor is used.  I will have more to say about these alternatives in a subsequent lesson.

What does log(n) time cost mean?

I'm not going to try to explain the details of log(n) time cost here.  Suffice it to say that the add, remove, and contains methods execute very fast.  (I will have more to say about this is a subsequent lesson.)

A TreeSet object is a Set

An object of the TreeSet class also is a Set.  One of the characteristics of a Java Set (an object that implements the Set interface) is that it can contain no duplicate elements.  Therefore, a TreeSet object can contain no duplicate elements.  If the add() method of a TreeSet object is invoked in an attempt to add a duplicate element, the element will not be added.

A TreeSet object is a SortedSet

The TreeSet class also implements the SortedSet interface.  This guarantees that the contents of a TreeSet object will be in ascending element order, regardless of the order in which the elements are added.  (In a subsequent lesson, I will discuss how comparisons are made to enforce the ordering of the elements.)

A TreeSet object is a Collection

Because an object of the TreeSet class is a Collection, a reference to such an object can be passed to any method that requires an incoming parameter of type Collection.  The receiving method can invoke any method on that reference that is declared in the Collection interface. (I will discuss such methods in detail in subsequent lessons.)

Populate the Collection

The statement in Listing 2 below passes the TreeSet object's reference to a method named fillIt(), which is a class method of the Populator class.  (The Populator class is a class of my own design whose only purpose is to illustrate the polymorphic behavior achieved using the Collections Framework.)  The behavior of this method is to add elements to the incoming Collection object without regard for the actual type of the object (the class from which the object was instantiated).


Listing 2

At this point, I am going to discuss the fillIt() method of the Populator class invoked in the above fragment.  The entire class definition of the Populator class is shown in Listing 3 below.

class Populator{
  public static void fillIt(
                       Collection ref){
    ref.add(new Integer(4));
    ref.add(new Integer(4));
    ref.add(new Integer(3));
    ref.add(new Integer(2));
    ref.add(new Integer(1));
  }//end fillIt()
}//end class populator

Listing 3

Don't know, don't care

As you can see in the above fragment, the fillIt() method receives the reference to the TreeSet object as type Collection.  This method doesn't know, and doesn't care, what the actual type of the object is.  All it cares about is that the object is a Collection object.  (Otherwise, the object's reference couldn't be passed in as a parameter.  A type mismatch would occur.)

Because the incoming parameter is a reference to a Collection object, the fillIt() method can invoke the add() method on the object, with confidence that the behavior of the add() method will be appropriate for the specific type of object involved.  (For example, the behavior of the add() method for an object of the TreeSet class will probably be different from the behavior of the add() method for an object of some other class that implements the Collection interface.)

Polymorphism is great!

The great thing about polymorphic behavior is that the author of the fillIt() method doesn't need to be concerned about the implementation details of the add() method.

Add five elements with some duplicates

The code in the fillIt() method adds five elements to the object.  Each element is a reference to a new object of type Integer.  Two of the objects encapsulate the int value 4, and thus are duplicates.

The int values encapsulated in the Integer objects are not in ascending order.  Rather, they are added to the object in descending order.  (They could be added in any order and the end result would be the same.)

Filter out the duplicates

Unknown to the author of the fillIt() method, the add() method filters out the duplicate element in order to satisfy the contract of the Collection interface.  In this case, the author didn't care what happens in the case of duplicate elements.

Notification of duplicates

If the author of the fillIt() method does care what happens in the case of duplicates, she can find out when an object is a duplicate.

According to the contract of the Collection interface, the add() method must return true if the invocation of the method modifies the contents of the object and must return false if the collection does not permit duplicates and the collection already contains the specified element.

Sort the elements

Also unknown to the author of the fillIt() method, even though the elements are added in descending order (or could be added in any other order for that matter), they are stored and maintained in the TreeSet object in such a way that they can later be accessed in ascending order.

The TreeSet object is now populated

When the fillIt() method returns, the TreeSet object contains four (not five) elements with no duplicates.  Each element is a reference to an object of type Integer.  Those references are maintained in such a way as to make them accessible in ascending order, based on the int values encapsulated in each of the Integer objects.

Get an Iterator object

Returning now to the flow in the doIt() method, the following statement invokes the iterator() method on the TreeSet object's reference that is stored in the reference variable of type Collection.  This is shown in Listing 4 below.

    Iterator iter = ref.iterator();

Listing 4

Invocation of the iterator() method on any Collection object returns an instance of a class that implements the Iterator interface.  The Iterator object can be used to traverse the collection, gaining access to each element in order.  (The concept of in order means different things for different kinds of collections.  For a collection instantiated from the TreeSet class, in order means in ascending order.)

Again, don't know, don't care

Again, the author of the method that uses the Collection object doesn't need to know or care about the internal implementation of the collection, or the implementation of the methods of the Iterator object.  They simply do what they do, and can be used for their intended purpose.

An Iterator object acts as a doorkeeper

The Iterator interface declares three methods:

  • hasNext()
  • next()
  • remove()
You might say that an Iterator object acts as a doorkeeper for the collection object that it represents, providing access to the contents of the collection in a very specific manner.

The code fragment in Listing 5 below shows how the first two of the above methods can be used to

  • Traverse the collection, accessing each of the object's elements in succession
  • Display the value encapsulated in the object referred to by each element
As mentioned earlier, when the collection is an object instantiated from the TreeSet class, access to the elements is provided in ascending order.
    }//end while loop

Listing 5

Four elements with no duplicates

At this point, the TreeSet object contains four elements, with no duplicates.  Each of the elements is a reference to an object of type Integer.  The code in the above loop causes each of those elements to be accessed and displayed in ascending order.  This causes the following text to appear on the screen:


An editorial opinion

In my opinion, this is the kind of knowledge that a computer science student in a modern data structures course should be learning.  This is a far departure from courses of the past where CS2 students were required to memorize the intricate details of how to implement various data structures.

What kind of knowledge is needed?

Does an architect need to understand the detailed inner workings of an air conditioning compressor in order to design a cooling system into a building?  Of course not!

However, the architect does need to know the tradeoffs among the available cooling systems in terms of initial cost, operating cost, size, efficiency, etc.

Does an audio technician need to understand the detailed inner workings of an electronic audio equalizer in order to construct an integrated audio system?  Absolutely not!  If that were a requirement, there would likely be very few audio systems in existence.

However, the audio technician does need to understand the tradeoffs among the various available audio equalizers.

The same concept applies to software design

Does an OOP software designer need to know the detailed inner workings of the various kinds of collection objects in order to use them effectively?  No!

However, the software designer does need to know the tradeoffs among the various types of collection objects in terms of their operational behavior.

Modern CS2 students should be learning about the performance and operational differences among the different types of collections, and how to use available frameworks to create and use those collections.  They should not be wasting their time learning how to reinvent them.  They have more important ways to spend their valuable time, and they have more important things to learn.

An analogy

Frankly, I don't care how the programmers at Sun implemented the TreeSet class, so long as the behavior of objects instantiated from that class meets the published specifications.

As an analogy, I also don't care how they implemented the Random class, so long as objects instantiated from the Random class provide the pseudo random values that I need in my programs.

I see no conceptual differences between the TreeSet class and the Random class from a software reuse viewpoint.

  • I can instantiate an object of the Random class to produce pseudo random values, without caring how those values are actually generated.  However, if I am working in cryptography, I might need to know how many such values can be generated before the sequence repeats.
  • I can use any of the thirty or so methods of the Math class to produce a variety of complex mathematical values without caring about how those values are actually produced.  However, since many of those values are approximations, I might need to know something about the quality of the approximation.
  • I can instantiate an object of the TreeSet class to create a collection object, which guarantees that the sorted set will be in ascending element order, and provide log(n) time cost for the basic operations of add, remove, and contains.  As long as I know that, I have very little need to know exactly how the collection object is implemented.

Its time to reinvent the CS2 curriculum

I'm confident that the future employers of most students share my opinion on this.  I don't know of any employer who wants their programmers to expend time and dollars reinventing the classical data structures.  What those employers are looking for is a staff of programmers who understand the tradeoffs among the data structures, and when it is appropriate to use each of the different structures.

It is time to reinvent the curriculum in CS2 courses by

  • Encouraging the understanding of techniques for software reuse
  • Teaching when, why, and how each of the different structures should be used
  • Discouraging the reinvention of those structures

  • Page 1 of 2

    This article was originally published on May 10, 2001

Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

Thanks for your registration, follow us on our social networks to keep up-to-date