Serialization, cyclic references (via hashmaps) and overriding hashCode()

I'll try to simplify it.
There's an object model. In it there are cyclic references (one object references a second one, the second one - a third one, the third one - the first one).
Some of the cyclic references are through aggregations - one object has a map of other objects.
Some of the objects have a meaningful hashCode() and equals() overridden. These two depend on some properties in the object itself.

Some of the objects get serialized/deserialized (travel through a stream).

Now here comes the problem: when deserialization encounters the cyclic reference, it creates instances of all the objects, initializes all the primitive fields but not yet the other fields, and then links the objects.

Linking two objects (one of which has a map of the other) requires hashCode(). hashCode() in turn needs some specific properties of that object that are not initialized yet, which causes a NullPointerException (or, in my case, an AssertionError).
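
To make the failure concrete, here is a minimal sketch of such a cycle. The names (Owner, Item, CycleDemo, roundTrip) are made up for illustration and are not taken from my real object model. It assumes default serialization and a plain java.util.HashMap; the point is that the outcome depends on which object is the root of the stream.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes: an Owner aggregates Items in a map, each Item points back.
class Owner implements Serializable {
    private static final long serialVersionUID = 1L;
    Map<Item, String> items = new HashMap<>();
}

class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;   // hashCode() and equals() depend on this property
    Owner owner;   // back-reference that closes the cycle

    Item(String name, Owner owner) {
        this.name = name;
        this.owner = owner;
    }

    @Override public int hashCode() { return name.hashCode(); }

    @Override public boolean equals(Object o) {
        return o instanceof Item && name.equals(((Item) o).name);
    }
}

class CycleDemo {
    // Serializes root into a byte array and reads it back.
    static Object roundTrip(Object root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Owner owner = new Owner();
        Item item = new Item("alice", owner);
        owner.items.put(item, "value");

        roundTrip(owner);      // fine: the Item is complete before the map hashes it
        try {
            roundTrip(item);   // the map is rebuilt while the Item is still half-read
        } catch (NullPointerException e) {
            System.out.println("hashCode() was called on a half-initialized key");
        }
    }
}
```

When the Owner is the root, the Item is read completely before HashMap hashes it. When the Item is the root, the map is rebuilt while the Item's name is still null.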

If hashCode() returns a default value when the properties are not there, another serious problem is caused: objects end up in the wrong buckets of the map. They entered the map with the default hash, but once they are completely initialized they have a different hash. I think that is really bad - the map has to be rehashed.
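
The wrong-bucket effect can be shown without any serialization at all. The sketch below uses a hypothetical LenientKey class whose hashCode() falls back to 0 while its name is unset; setting the name after the put stands in for deserialization completing later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key: hashCode() returns a default while name is not set yet.
class LenientKey {
    String name;

    @Override public int hashCode() { return name == null ? 0 : name.hashCode(); }

    @Override public boolean equals(Object o) {
        return o instanceof LenientKey && Objects.equals(name, ((LenientKey) o).name);
    }
}

class StaleBucketDemo {
    public static void main(String[] args) {
        Map<LenientKey, String> map = new HashMap<>();
        LenientKey key = new LenientKey();
        map.put(key, "value");   // stored under the default hash 0
        key.name = "alice";      // the key is now "completely initialized"

        System.out.println(map.get(key));                  // null: lookup uses the new hash
        System.out.println(new HashMap<>(map).get(key));   // value: copying rehashes every key
    }
}
```

The entry is still in the map, but it can only be found again after a rehash, here done by copying into a fresh HashMap.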

Here's a bug detail:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4957674

Here's what some of the guys say on the subject:

The problem is that HashMap's readObject() implementation, in order to re-hash the map, invokes the hashCode() method of some of its keys, regardless of whether those keys have been fully deserialized.

And:

The fix for this is actually quite easy: Modify the readObject() and writeObject() of HashMap so that it also saves the original hash code. (I am currently using this fix in production code for a large web site.) That way, when the map is reconstructed, you don't have to recompute the hashcode - the problem is caused by recomputing the hashcode at a moment when it is not computable.

What you *give up* with this fix is that HashMaps containing Objects that don't override hashCode() and equals() will not be deserialized properly.

So basically, you have a choice: either it will be robust for classes that implement hashCode(), or it will work for bare Objects(). One or the other. I prefer the former, because people are supposed to implement hashCode().

But not all of my objects override equals() (of course I can check with reflection which ones do and which ones don't, but...). This would also mean that I'm using a customized collection.

There's another proposition - to cache the hashcode in the object itself.

The hashcode is a primitive type, so it would get initialized first and the problem would be solved. That would mean having a hashCode() and an equals() that check which one is available - the cached hash or the properties. Isn't that UGLY?

I'll investigate more.

One thought on “Serialization, cyclic references (via hashmaps) and overriding hashCode()”

  1. Hi Mihail.

    I found your nice blog article when looking for solutions to a Java deserialization problem:

    http://mihail.stoynov.com/blog/2008/07/17/SerializationCyclicReferencesViaHashmapsAndOverridingHashCode.aspx

    I have a good solution to this problem, for anyone who is able to wrap the original HashMap in a wrapper/delegating Map. I tried to post my solution on your blog for the general edification of humanity, but no dice: the blog software crapped out on me. If you're feeling especially friendly towards the world, perhaps you will personally add my information to the blog and save someone else a bit of time coming up with the same solution.

    Here's what I tried to post:

    I have a different fix, but it will only work in a situation where you can change all code of the following form:

    new HashMap();

    ...to instead use a wrapper:

    new SimplySerializedMap( new HashMap());

    The SimplySerializedMap class stores the underlying map as "private transient Map realMap" and delegates all Map methods to that object. SimplySerializedMap defines its readObject and writeObject methods as follows: the serialized form of SimplySerializedMap is an array of Pair, where Pair is exactly what you think it should be.

    During writeObject (serialization) this array is computed trivially from realMap.entrySet(), one Pair per entry. During readObject (deserialization) the Pair array is read out and stored into a transient field called tempPairs. The other thing that readObject does is register a callback with the ObjectInputStream.registerValidation method, which is only fired when Java has deserialized the entire object graph, including all of the map's keys, so it's finally safe to call hashCode on all of those keys. Inside the registerValidation callback, that's when you initialize the realMap field by loading it from the tempPairs field (and then nulling out the latter field).

    One minor issue is the fact that you'll need an empty clone of the original Map; I solve this by creating such an empty clone in the SimplySerializedMap constructor. If the wrapped map is a known Map class such as HashMap, I cast to that class and call clone(). Otherwise I clone via either a reflective call to the public clone() method of the wrapped map, or in a pinch I clone the wrapped map via serialization/deserialization.

    This solution has worked quite well for me.

    Thanks.
    -Dave

    (comment posted by admin)
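
Dave did not post his code, so what follows is only my rough sketch of the idea he describes, not his implementation. It makes some simplifying assumptions: the class is called SimplySerializedMapSketch, it stores the pairs as a flat Object[] (key, value, key, value, ...) instead of an array of Pair, and it always rebuilds into a fresh HashMap instead of cloning the wrapped map. Team, Member and ValidationDemo are made-up names used only to exercise it.

```java
import java.io.*;
import java.util.*;

// Sketch of a map that postpones hashing until the whole object graph is read.
class SimplySerializedMapSketch<K, V> extends AbstractMap<K, V>
        implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;

    private transient Map<K, V> realMap = new HashMap<>();
    private transient Object[] tempPairs;   // key, value, key, value, ...

    @Override public V put(K key, V value) { return realMap.put(key, value); }
    @Override public V get(Object key) { return realMap.get(key); }
    @Override public Set<Map.Entry<K, V>> entrySet() { return realMap.entrySet(); }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        Object[] flat = new Object[realMap.size() * 2];
        int i = 0;
        for (Map.Entry<K, V> e : realMap.entrySet()) {
            flat[i++] = e.getKey();
            flat[i++] = e.getValue();
        }
        out.writeObject(flat);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        tempPairs = (Object[]) in.readObject();   // reading an array never calls hashCode()
        in.registerValidation(this, 0);           // validateObject() runs once the graph is complete
    }

    @SuppressWarnings("unchecked")
    @Override public void validateObject() {
        Map<K, V> rebuilt = new HashMap<>();
        for (int i = 0; i < tempPairs.length; i += 2) {
            rebuilt.put((K) tempPairs[i], (V) tempPairs[i + 1]);
        }
        realMap = rebuilt;
        tempPairs = null;
    }
}

// Hypothetical cyclic model: a Team maps its Members, each Member points back.
class Team implements Serializable {
    private static final long serialVersionUID = 1L;
    Map<Member, String> members = new SimplySerializedMapSketch<>();
}

class Member implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    Team team;

    Member(String name, Team team) { this.name = name; this.team = team; }

    @Override public int hashCode() { return name.hashCode(); }
    @Override public boolean equals(Object o) {
        return o instanceof Member && name.equals(((Member) o).name);
    }
}

class ValidationDemo {
    // Serializes root into a byte array and reads it back.
    static Object roundTrip(Object root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Team team = new Team();
        Member member = new Member("alice", team);
        team.members.put(member, "value");

        Member copy = (Member) roundTrip(member);           // the Member is the stream root
        System.out.println(copy.team.members.get(copy));    // value
    }
}
```

One caveat of this sketch: realMap stays null between readObject and the validation callback, so nothing else may touch the map while the stream is still being read.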
