CS680 Object Identity and equals/hashCode


The idea of object identity is crucial to object-oriented programming and design. Each object has a unique identity that stays with it throughout its lifetime, even as its attributes change.

For in-memory objects, the identifier can be the reference value. Conceptually this is the object's address in the JVM address space (a JVM may move objects during garbage collection, but a reference always continues to identify the same object), a very lightweight unique id that can live as long as the enclosing program execution. There can be multiple program variables with the same reference value (aliasing). If x and y both refer to the same object, then x == y.
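A minimal sketch of aliasing follows. The Account class and the sameObject helper are hypothetical names made up for this illustration; they are not part of the JDK or of these notes.

```java
class AliasingDemo {
    // Hypothetical class used only for illustration.
    static class Account {
        int balance;
        Account(int balance) { this.balance = balance; }
    }

    // Returns true when both variables hold the same reference value.
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        Account x = new Account(100);
        Account y = x;                 // y is an alias of x: same reference value
        Account z = new Account(100);  // a different object with the same attribute value

        y.balance = 250;               // a change made through y is visible through x

        System.out.println(sameObject(x, y)); // true
        System.out.println(sameObject(x, z)); // false
        System.out.println(x.balance);        // 250
    }
}
```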

Later, when we study persistent objects, we will switch to unique ids that are held in fields (key fields), but it's the same basic idea.

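The methods equals and hashCode have default implementations in Object based on reference values. For o and x of type Object, o.equals(x) is true if and only if o == x. o.hashCode() returns a hash value derived from the object's identity (typically, though not necessarily, from its address), so it does not depend on the object's attributes.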

For example, if a class Customer does not implement equals and hashCode, its objects will be compared by reference value, and Object's identity-based hash function will supply its hashCode. We can implement Set<Customer> with a HashSet, and the resulting set will hold all-different reference values, that is, all different Customer objects. If we change an attribute of one of these Customers, the set is still the same: it just contains the modified Customer object.
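As a rough illustration of this behavior, the sketch below assumes a hypothetical Customer class with a single name field and no equals or hashCode of its own.

```java
import java.util.HashSet;
import java.util.Set;

class CustomerSetDemo {
    // Hypothetical Customer that does NOT override equals or hashCode,
    // so it inherits the reference-based versions from Object.
    static class Customer {
        String name;
        Customer(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Set<Customer> customers = new HashSet<>();
        Customer c1 = new Customer("Ann");
        Customer c2 = new Customer("Ann");   // same attributes, different object

        customers.add(c1);
        customers.add(c2);
        customers.add(c1);                   // same reference again, so it is ignored

        System.out.println(customers.size());        // 2
        c1.name = "Anne";                            // modify an attribute
        System.out.println(customers.contains(c1));  // true: identity is unchanged
        System.out.println(c1.equals(c2));           // false: different references
    }
}
```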

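This simple setup is used for most app objects. Note that TreeSet<X> is not usually used for such objects because they have no natural ordering: a TreeSet needs its elements to implement Comparable, or needs a Comparator, and the only generally available candidate, ordering by hashCode, is arbitrary.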
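A small sketch of the TreeSet issue, assuming a hypothetical Customer class with no ordering of its own: a TreeSet needs either Comparable elements or a Comparator. The helper method names are made up for this example. Note that once a Comparator is supplied, the TreeSet uses it, rather than equals, to decide which elements are duplicates.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDemo {
    // Hypothetical class: no equals, hashCode, or Comparable.
    static class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    // Returns true if adding a Customer to a TreeSet with no Comparator fails.
    static boolean addWithoutOrderingFails() {
        Set<Customer> set = new TreeSet<>();
        try {
            set.add(new Customer("Ann"));
            return false;
        } catch (ClassCastException e) {
            return true;   // Customer cannot be cast to Comparable
        }
    }

    // Number of elements kept when the set is ordered by name.
    static int sizeWhenOrderedByName() {
        Comparator<Customer> byName = Comparator.comparing(c -> c.name);
        Set<Customer> set = new TreeSet<>(byName);
        set.add(new Customer("Ann"));
        set.add(new Customer("Ann"));  // compares as equal to the first, so it is dropped
        set.add(new Customer("Bob"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(addWithoutOrderingFails()); // true
        System.out.println(sizeWhenOrderedByName());   // 2
    }
}
```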

Value Objects
There are also value objects in common use. These have equals based on equality of all attributes. For example, p and q of type Point2D.Double are equal (p.equals(q) is true) if their x and y coordinates are equal. Their hashCode is computed from both x and y. A set of points, HashSet<Point2D>, holds points that all have different positions in 2-space.
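The following sketch shows this with the JDK's Point2D.Double. The distinctPositions helper is a made-up name for this example.

```java
import java.awt.geom.Point2D;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PointValueDemo {
    // Counts how many different positions appear among the given points.
    static int distinctPositions(Point2D... pts) {
        Set<Point2D> set = new HashSet<>(Arrays.asList(pts));
        return set.size();
    }

    public static void main(String[] args) {
        Point2D p = new Point2D.Double(1.0, 2.0);
        Point2D q = new Point2D.Double(1.0, 2.0);

        System.out.println(p == q);                        // false: two objects
        System.out.println(p.equals(q));                   // true: same coordinates
        System.out.println(p.hashCode() == q.hashCode());  // true: consistent with equals

        // p and q collapse to one element; (3, 4) is a second position.
        System.out.println(distinctPositions(p, q, new Point2D.Double(3.0, 4.0))); // 2
    }
}
```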

The JDK has lots of classes with value-based equality: Integer, Double, etc., String, Date, Set (even though it's an interface), Point2D, Rectangle2D.Double, and so on. It also has lots of classes with ref-based equality: FileWriter, etc., Swing Components, Swing Events, etc. Set equality by value means that for sets a and b, a.equals(b) is true iff a and b have the same number of elements and each element of a is contained in b and vice versa, regardless of which Set implementation each one uses (see the Javadoc for the official definition).
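As a brief sketch of Set equality by value, the example below compares a HashSet and a TreeSet holding the same strings. The two factory helpers are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetEqualityDemo {
    static Set<String> hashSetOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    static Set<String> treeSetOf(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<String> a = hashSetOf("x", "y", "z");
        Set<String> b = treeSetOf("z", "y", "x");

        System.out.println(a == b);                        // false: two Set objects
        System.out.println(a.equals(b));                   // true: same elements
        System.out.println(a.hashCode() == b.hashCode());  // true: consistent with equals

        b.add("w");                                        // Sets are mutable
        System.out.println(a.equals(b));                   // false: sizes now differ
    }
}
```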

App classes can be given value-based equality, but you need to be careful:
1. You need to be sure equals and hashCode are consistent, that is, if x.equals(y), then x.hashCode() == y.hashCode().
2. If the classes are in an inheritance hierarchy, implementing equals properly is tricky. See Core Java v 7 pg. 171
3. If elements of a Set are value-based, and you change an element that is in a set, you can break the set. The Set is not notified of changes to an element, and has placed the element in its own data structure using the original attribute values.
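One way to satisfy point 1 is sketched below, assuming a hypothetical Money class (not from the JDK or these notes). Both equals and hashCode are computed from the same two fields, so equal objects always get equal hash codes. The class is declared final, which sidesteps the inheritance difficulty of point 2, and its fields are final, which avoids the mutation problem of point 3.

```java
import java.util.Objects;

// Hypothetical value class used only for illustration.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;               // same object
        if (!(other instanceof Money)) return false;  // also handles null
        Money m = (Money) other;
        return cents == m.cents && currency.equals(m.currency);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares.
        return Objects.hash(cents, currency);
    }
}
```

With this definition, new Money(500, "USD") and a second new Money(500, "USD") are equal and occupy a single slot in a HashSet.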

Note that point 3 is not relevant to immutable objects. Integer, Double, and String are immutable, so Sets of these are completely sturdy. Point2D.Double, Date, and Set are mutable, so you have to be a little careful with Sets of these.
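A sketch of how a mutable value object can break a HashSet, using Point2D.Double: after the point is moved, its hashCode no longer matches the one under which the set stored it. The helper method names are made up for this example, and the results in the comments are for these particular coordinates.

```java
import java.awt.geom.Point2D;
import java.util.HashSet;
import java.util.Set;

class MutableElementDemo {
    // Adds a point, moves it, then asks the set whether it still contains it.
    static boolean stillFoundAfterMove() {
        Set<Point2D> points = new HashSet<>();
        Point2D p = new Point2D.Double(1.0, 2.0);
        points.add(p);
        p.setLocation(5.0, 6.0);     // changes equals/hashCode while p is in the set
        return points.contains(p);   // looked up under the new hash code
    }

    // Shows that the broken set can end up holding two equal elements.
    static int sizeAfterAddingEqualPoint() {
        Set<Point2D> points = new HashSet<>();
        Point2D p = new Point2D.Double(1.0, 2.0);
        points.add(p);
        p.setLocation(5.0, 6.0);
        points.add(new Point2D.Double(5.0, 6.0)); // equal to p, but not detected
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterMove());        // false
        System.out.println(sizeAfterAddingEqualPoint());  // 2
    }
}
```

If an element must change, a safer pattern is to remove it from the set, modify it, and add it back.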

The upshot of all this is that we usually use ref-based identity and equality for app objects, unless they are like points or numbers or enums or other JDK value-based classes. (Later, when using database persistence, we'll use key-based identity.)

Two ref-based objects (different ref values) can have the same attributes, and we say they are different objects. This happens quite often because we often model real world objects without enough information to discriminate between them. For example, we could have a Person object with firstName and lastName fields, and end up with two different people in the system with two different Person objects but the same attributes.

Also, although we usually keep only one object instance around, there are cases when we duplicate an object for some particular purpose so as to keep the original one untouched. Suppose we are doing what-if analysis. We can duplicate the original object and do the what-if processing on the copy, then throw it away.
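The what-if idea can be sketched as follows, assuming a hypothetical Loan class with a copy constructor. The class, its fields, and the interestIfRateWere helper are all invented for this illustration.

```java
class WhatIfDemo {
    // Hypothetical ref-based app object.
    static class Loan {
        double principal;
        double annualRate;

        Loan(double principal, double annualRate) {
            this.principal = principal;
            this.annualRate = annualRate;
        }

        // Copy constructor: a new object with the same attribute values.
        Loan(Loan other) {
            this(other.principal, other.annualRate);
        }

        double yearlyInterest() {
            return principal * annualRate;
        }
    }

    // Runs the what-if on a copy, leaving the original untouched.
    static double interestIfRateWere(Loan original, double rate) {
        Loan copy = new Loan(original);
        copy.annualRate = rate;
        return copy.yearlyInterest();   // the copy is discarded after this
    }

    public static void main(String[] args) {
        Loan loan = new Loan(1000.0, 0.05);
        System.out.println(interestIfRateWere(loan, 0.10));
        System.out.println(loan.annualRate);  // 0.05: the original is unchanged
    }
}
```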