Project Valhalla – Value Types

In July last year, Brian Goetz announced Project Valhalla, an experimental project to explore major new language features for a future version of Java. Features to be investigated are:

  • Value types
  • Generic specialization
  • Enhanced volatiles
  • Possibly other related topics, such as reified generics

The features in Project Valhalla are not yet planned for any specific future version of Java, but they will certainly not be included in Java 9 (which is planned for September 2016). A Java version with these features is therefore at least two to three years away, if they are added at all; even that hasn’t been decided at this point.

In this post, I’ll explain value types: what they are, why it would be beneficial to have them in Java, and what is involved in adding them to the language. Finally, we’ll have a quick look at value types in Scala.

What are value types?

The simplest way to explain what value types are is: user-defined primitive types.

Java has two kinds of types: primitive types and reference types. The major difference between these is that a variable of a primitive type directly contains the value, for example a variable of type int contains a 32-bit value of type int, while a variable of a reference type has a level of indirection; it doesn’t contain the content of an object itself, but the address of an object that’s stored somewhere in memory[1].

A consequence of this is that objects implicitly have an identity, while primitive values do not. For example, you can have different String objects, and even when they contain the same value, you can still distinguish them as different objects using the == operator. Primitive values are just values: the value 25 is just a number, and there aren’t multiple numbers 25 that can be distinguished from each other.
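To make the difference between identity and value concrete, here is a minimal sketch in current Java. The class name IdentityDemo and the helper sameObject are made up for this illustration.

```java
class IdentityDemo {
    // True only when both references point to the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) always creates a fresh object,
        // so these are two distinct objects with the same content.
        String first = new String("hello");
        String second = new String("hello");

        System.out.println(first.equals(second));      // true: same content
        System.out.println(sameObject(first, second)); // false: different identity

        // Primitives have no identity; == simply compares the values.
        int x = 25;
        int y = 25;
        System.out.println(x == y);                    // true
    }
}
```

The two strings are equal as values but remain distinguishable as objects, which is exactly the property that primitive values lack.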

Why have custom value types?

So, Java already has value types – the primitive types (byte, short, int, long, float, double, char and boolean). Why would it be good if you could define your own value types?

Because some types naturally behave like values, and you would like to avoid the overhead that is necessary for objects. For example a java.util.Date is just a timestamp value (a long containing a number of milliseconds since 01-01-1970, 00:00:00 GMT). You’re normally only interested in the value that a Date object contains, and not in the identity of the object.

The overhead associated with objects consists of the following:

  • To access a value in an object, the JVM always has to go through a level of indirection (it has to look up the value in memory through the reference to the object).
  • Each object has some extra data, for example to support synchronization, which is a large overhead for objects that contain only a small value. For example, an Integer object might take up 16 bytes of memory[2], while the actual value contained in the object is just 4 bytes.
  • Objects are allocated on the heap[3]. Heap allocation and garbage collection cost CPU cycles.
  • When you have an array of object references, the objects themselves may be scattered across memory. If you iterate over the array, accessing the objects one by one, this can cause cache misses, making the code run a lot slower than it would if the objects were laid out in memory one after the other.
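The difference between the two memory layouts can be seen with the types Java already has. The following is only a sketch: the class and method names are invented for this post, and it deliberately does not try to measure anything, because the actual cost depends on the JVM and the hardware.

```java
class BoxedVersusPrimitive {
    // Sums a primitive array: the array holds the int values themselves.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Sums a boxed array: the array holds references, and every read
    // has to follow a reference to an Integer object and unbox it.
    static long sumBoxed(Integer[] values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit unboxing
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] primitives = new int[n];
        Integer[] boxed = new Integer[n];
        for (int i = 0; i < n; i++) {
            primitives[i] = i;
            boxed[i] = i; // autoboxing: stores a reference to an Integer object
        }
        // Both loops compute the same result; only the memory layout differs.
        System.out.println(sumPrimitive(primitives) == sumBoxed(boxed));
    }
}
```

A user-defined value type would let you get the int[]-style layout for your own types, not only for the built-in primitives.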

Value types do not necessarily contain just a single value. For example a type that represents complex numbers would contain two values, for the real and imaginary part of the number. You’d want to store this as two double values that are treated as a unit.
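As an illustration, this is roughly how you would write such a type in Java today: as a final, immutable class. It is a sketch with names of my own choosing (Complex, plus, times), not proposed Valhalla syntax. It behaves like a value, but every instance is still a separate object with its own identity and overhead.

```java
final class Complex {
    private final double re;
    private final double im;

    public Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    public double re() { return re; }
    public double im() { return im; }

    // Operations return a new instance instead of modifying this one.
    public Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    public Complex times(Complex other) {
        return new Complex(re * other.re - im * other.im,
                           re * other.im + im * other.re);
    }

    // Equality is based on the two contained values, not on identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Complex)) {
            return false;
        }
        Complex c = (Complex) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        Complex a = new Complex(1.0, 2.0);
        Complex b = new Complex(3.0, -1.0);
        System.out.println(a.plus(b));
        System.out.println(a.times(b));
    }
}
```

Note that plus and times each allocate a new object for the result, which is the kind of cost a real value type is meant to avoid.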

In State of the Values – Infant Edition, a number of use cases for value types are listed:

  • Numeric types, for example complex numbers, extended-precision or unsigned integers and decimal types
  • Native types for which there is no equivalent Java primitive type
  • Algebraic data types, for example Optional<T> shouldn’t need to be an object itself
  • Tuples
  • Cursors (for example iterators)
  • Flattening (avoid unnecessary pointer indirections)

What this would look like in Java

Not all the consequences of having value types in Java are clear yet (the point of Project Valhalla is to experiment with them and discover what exactly it would mean). As mentioned in State of the Values, for value types you should be able to say:

Codes like a class, works like an int!

Here are some points to explain what that would mean:

  • You can compare them with ==, just like the existing primitive types.
  • Since they are not references, you can’t set a variable of a value type to null.
  • All reference types implicitly inherit from class java.lang.Object. This should probably not be true for value types, because the facilities that class Object provides don’t make sense for value types (for example, because values have no object identity it makes no sense to lock on a value, and clone() and finalize() would also not be useful).
  • There will need to be some way to box value types using a wrapper type that is a reference type, just like we have wrapper classes to box the built-in primitive types.
  • There will be limitations to inheritance, because without any room to store runtime type information in the value it’s hard to have polymorphism. Maybe it won’t be possible at all to extend value types. If you have a variable of value type A and you assign it a value that is of value type B extends A, then there’s no way for the JVM to know at runtime that the variable actually refers to a value of type B.
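The first few points can already be observed with the built-in primitives and their wrapper classes, which is probably the closest analogy available today. The sketch below uses int and Integer; the class name BoxingDemo and the helper method are made up for this example, and how user-defined value types would actually be boxed is still an open question.

```java
class BoxingDemo {
    // A wrapper is a reference type, so it can be null.
    // Unboxing a null wrapper fails at runtime; returns true if that happened.
    static boolean unboxingNullFails() {
        Integer boxed = null;
        try {
            int value = boxed; // implicit unboxing calls boxed.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int a = 1000;
        int b = 1000;
        System.out.println(a == b); // true: == compares the values

        // int c = null;            // does not compile: a primitive cannot be null

        Integer boxedA = a;         // boxing into the wrapper type
        Integer boxedB = b;
        // On wrappers, == would compare references, so use equals() for the values.
        System.out.println(boxedA.equals(boxedB)); // true
        System.out.println(unboxingNullFails());   // true
    }
}
```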

That’s just the beginning. State of the Values goes into much more detail and also lists some open questions, and some ideas about how it would be implemented in a JVM.

Once all of this has been thought through, there’s the question of backward compatibility. There are a number of classes in the standard library which would be a natural fit for value types, such as class java.util.Date that I already mentioned. However, it will be impossible to change these existing classes into value types without breaking backward compatibility. Should new value types be added alongside those classes, for example a type DateValue? Then everybody would have to learn to use the new value types and ignore the old reference types, which won’t be removed for backward compatibility reasons. That wouldn’t make the language easier to use.

Value types in Scala

Scala has value classes, but they have limitations. Probably the biggest limitation is that a Scala value class can have only one value member. This means that you cannot, for example, create a class Complex containing two double values as a value class.

Also, when runtime type information is necessary, Scala will automatically allocate a wrapper object for the value, so that you lose the advantage of not having to allocate memory for an object. This happens, for example, if the value class extends a trait so that you can call methods polymorphically, or when you use a value class in pattern matching.

Perhaps when support for value types is added to the JVM, some of the limitations of value classes in Scala can be removed too.


Footnotes

[1] This is only conceptual: how a reference is actually represented depends on the implementation of the JVM, and it doesn’t necessarily have to be a direct pointer to the content of an object.

[2] 16 bytes is just an example (although not unrealistic); the actual memory overhead for an object depends on the implementation of the JVM.

[3] Through escape analysis objects may sometimes be allocated on the stack instead of on the heap, which lowers the cost of allocation and makes deallocation essentially free.
