Project Valhalla – Value Types
In July last year, Brian Goetz announced Project Valhalla, an experimental project to explore major new language features for a future version of Java. Features to be investigated are:
- Value types
- Generic specialization
- Enhanced volatiles
- Possibly other related topics, such as reified generics
The features in Project Valhalla are not yet planned for any specific future version of Java, but they will certainly not be included in Java 9 (which is planned for September 2016), so a Java version with these features is at least 2 to 3 years away (if they will ever be added at all – this isn’t even decided at this point).
In this post, I’ll explain value types: what they are, why it would be beneficial to have them in Java and an introduction to what is involved in adding them to Java. Finally, we’ll have a quick look at value types in Scala.
What are value types?
The simplest way to explain what value types are is: user-defined primitive types.
Java has two kinds of types: primitive types and reference types. The major difference between these is that a variable of a primitive type directly contains the value, for example a variable of type int contains a 32-bit value of type int, while a variable of a reference type has a level of indirection; it doesn’t contain the content of an object itself, but the address of an object that’s stored somewhere in memory[1].
A consequence of this is that objects implicitly have an identity, while primitive types do not. You can for example have different String objects, and even when they contain the same value, you can still distinguish them as different objects, using the == operator. Primitive values are just values – for example the value 25 is just a number, and there are no multiple numbers 25 that can be distinguished from each other.
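The difference between identity and value can be seen in plain Java today. The following is a minimal sketch; the class name IdentityDemo and its helper methods are just names I picked for illustration.

```java
public class IdentityDemo {
    // Identity comparison: true only if both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Value comparison: true if both strings contain the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Two distinct String objects that happen to contain the same value.
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(sameObject(a, b));  // false: two different objects
        System.out.println(sameContent(a, b)); // true: same content

        // Primitive values have no identity; == compares the values themselves.
        int x = 25;
        int y = 25;
        System.out.println(x == y);            // true
    }
}
```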
Why have custom value types?
So, Java already has value types – the primitive types (byte, short, int, long, float, double, char and boolean). Why would it be good if you could define your own value types?
Because some types naturally behave like values, and you would like to avoid the overhead that is necessary for objects. For example a java.util.Date is just a timestamp value (a long containing a number of milliseconds since 01-01-1970, 00:00:00 GMT). You’re normally only interested in the value that a Date object contains, and not in the identity of the object.
The overhead associated with objects consists of the following:
- To access a value in an object, the JVM always has to go through a level of indirection (it has to look up the value in memory through the reference to the object).
- Each object carries some extra data, for example to support synchronization, which is a large overhead for objects that contain only a small value. For example an Integer object might take up 16 bytes of memory[2], while the actual value contained in the object is just 4 bytes.
- Objects are allocated on the heap[3]. Heap allocation and garbage collection cost CPU cycles.
- When you have an array of object references, the objects themselves may be scattered across memory. If you iterate over the array, accessing the objects one by one, this can cause cache misses, making the code run a lot slower than if the objects were laid out in memory one after the other.
Value types do not necessarily contain just a single value. For example a type that represents complex numbers would contain two values, for the real and imaginary part of the number. You’d want to store this as two double values that are treated as a unit.
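To make this concrete, here is roughly how you would write such a type in Java today: as an immutable final class. This is only a sketch (the class Complex and its method names are my own choice, not something from the Valhalla proposals). Every instance is still a heap object with its own identity, which is exactly the overhead that a value type would avoid.

```java
public final class Complex {
    private final double re;
    private final double im;

    public Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    public double re() { return re; }
    public double im() { return im; }

    // Operations return a new instance; the two doubles are always treated as a unit.
    public Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    public Complex times(Complex other) {
        return new Complex(re * other.re - im * other.im,
                           re * other.im + im * other.re);
    }

    // Equality is based on the contained values, not on object identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Complex)) return false;
        Complex c = (Complex) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }
}
```

Note that two Complex objects with the same content are equal according to equals(), but == still tells them apart, because it compares references.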
In State of the Values – Infant Edition, a number of use cases for value types are listed:
- Numeric types, for example complex numbers, extended-precision or unsigned integers and decimal types
- Native types for which there is no equivalent Java primitive type
- Algebraic data types, for example Optional<T> shouldn’t need to be an object itself
- Tuples
- Cursors (for example iterators)
- Flattening (avoid unnecessary pointer indirections)
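Without value types, flattening has to be done by hand. One workaround that is sometimes used today is to store the fields of many small objects directly in a primitive array. Below is a sketch of that workaround for complex numbers; the class ComplexArray and its layout (real and imaginary parts interleaved in one double array) are assumptions I made for the example.

```java
public final class ComplexArray {
    // Flattened storage: [re0, im0, re1, im1, ...], no per-element objects.
    private final double[] data;

    public ComplexArray(int length) {
        data = new double[2 * length];
    }

    public int length() {
        return data.length / 2;
    }

    public void set(int index, double re, double im) {
        data[2 * index] = re;
        data[2 * index + 1] = im;
    }

    public double re(int index) { return data[2 * index]; }
    public double im(int index) { return data[2 * index + 1]; }

    // Iterates over contiguous memory instead of following one reference per element.
    public double sumOfMagnitudes() {
        double sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            sum += Math.hypot(data[i], data[i + 1]);
        }
        return sum;
    }
}
```

This avoids the indirection, but the code no longer works with a proper Complex type. With value types, an array of complex values could get this layout while the code still looks like it is using a normal class.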
What this would look like in Java
Not all the consequences of having value types in Java are clear yet (the point of Project Valhalla is to experiment with them and discover what exactly it would mean). As mentioned in State of the Values, for value types you should be able to say:
Codes like a class, works like an int!
Here are some points to explain what that would mean:
- You can compare them with ==, just like the existing primitive types.
- Since they are not references, you can’t set a variable of a value type to null.
- All reference types implicitly inherit from class java.lang.Object. This should probably not be true for value types, because the facilities that class Object provides don’t make sense for value types (for example, because values have no object identity it makes no sense to lock on a value, and clone() and finalize() would also not be useful).
- There will need to be some way to box value types using a wrapper type that is a reference type, just like we have wrapper classes to box the built-in primitive types.
- There will be limitations to inheritance, because without any room to store runtime type information in the value it’s hard to have polymorphism. Maybe it won’t be possible at all to extend value types. If you have a variable of value type A and you assign it a value that is of value type B extends A, then there’s no way for the JVM to know at runtime that the variable actually refers to a value of type B.
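The existing primitive types already behave this way for several of these points, so they give an idea of what to expect. The sketch below shows boxing and null with int and Integer as they work today; the class BoxingDemo and its helper methods are only illustrative names, and how boxing would work for user-defined value types is still an open question.

```java
public class BoxingDemo {
    // Boxing: wrap a primitive int in an Integer object (a reference type).
    static Integer box(int value) {
        return Integer.valueOf(value);
    }

    // Unboxing: throws a NullPointerException if the reference is null.
    static int unbox(Integer boxed) {
        return boxed.intValue();
    }

    public static void main(String[] args) {
        int primitive = 42;
        Integer boxed = box(primitive);
        System.out.println(unbox(boxed) == primitive); // true

        Integer missing = null;   // a reference can be null
        // int impossible = null; // does not compile: an int always holds a value

        try {
            unbox(missing);
        } catch (NullPointerException e) {
            System.out.println("cannot unbox null");
        }
    }
}
```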
That’s just the beginning. State of the Values goes into much more detail and also lists some open questions, and some ideas about how it would be implemented in a JVM.
Once this has all been thought through, there’s the question of backward compatibility. There are a number of classes in the standard library which would be a natural fit for value types, such as class java.util.Date that I already mentioned. However, it will be impossible to change these existing classes into value types without breaking backward compatibility. Should new value types be added for those classes, for example a type DateValue – and then everybody would have to learn to use the new value types, and ignore the old reference types that won’t be removed because of backward compatibility reasons? That wouldn’t make the language easier to use.
Value types in Scala
Scala has value classes, but they have limitations. Probably the biggest limitation is that a Scala value class can have only one value member. This means that you cannot, for example, create a class Complex containing two double values as a value class.
Also, when runtime type information is necessary, for example if the value class extends a trait, so that you can polymorphically call methods, or when you use a value class for pattern matching, Scala will automatically allocate a wrapper object for the value object, so that you lose the advantage of not having to allocate memory for an object.
Perhaps when support for value types is added to the JVM, some of the limitations of value classes in Scala can be removed too.
Further reading
- OpenJDK – Project Valhalla
- State of the Values – Infant Edition
- JVM Language Summit 2014 – Evolving the JVM (video)
Footnotes
[1] This is only conceptual – how a reference is actually represented depends on the implementation of the JVM; it doesn’t necessarily have to be a direct pointer to the content of an object.
[2] 16 bytes is just an example (although not unrealistic) – what the actual memory overhead is for an object depends on the implementation of the JVM.
[3] Through escape analysis objects may sometimes be allocated on the stack instead of on the heap, which lowers the cost of allocation and makes deallocation essentially free.
Very interesting. Personally I haven’t really missed this feature because I have grown so accustomed to the fact that Java always “stores” objects by reference and cannot pass them by value but passes their reference by value. I think it is already unfortunate that as a Java programmer you have to handle primitive types differently from their wrappers; you cannot call e.g. ‘1.toString()’. Value objects would mean more of this kludge. Wouldn’t it be more natural that ANY field inside a class can be defined to either store an object directly or by reference? This way the dichotomy between value types and objects on the heap would disappear, and it would depend on how you decide to use instances of a class. I do understand that this is an even more drastic language change than adding value types and I understand the Java designers just don’t have the luxury to make such huge changes.
I do think value types can be useful for real performance bottlenecks. Still I do wonder if the JIT compiler doesn’t already optimize access to frequently used objects. Perhaps there are cases where this is impossible. I’d think of heavy reading and writing to a large array of objects. And I suppose value objects can help by reducing the memory overhead of references.
This does leave me with the feeling that introducing value objects will increase the complexity of the language without a very real benefit. Or am I missing some point?
> Value objects would mean more of this kludge.
On the contrary – it would mean that the primitive types such as ‘int’ could be defined as a value class, with methods that you would then be able to call on them – so you could indeed call ‘toString’ on an ‘int’. The primitive types would look more like any other kind of object, making the language more uniform.
Ok that actually sounds pretty good. Is this a new form of improved autoboxing then?
It’s not really autoboxing, but it would enable you to use primitive types in a way that looks as if you’re using objects. There’s no boxing involved (the value would not have to be wrapped in an object).
In general, boxing values into objects is something that you often want to avoid, because it brings a lot of overhead with it. For example an ‘ArrayList<Integer>’ is a really wasteful way to store a list of integers – each ‘Integer’ object takes up something like 16 bytes (while an ‘int’ is only 4 bytes), so a list with N integers takes up way more than N * 4 bytes of memory. Also accessing the values is inefficient because you’d always have to go through an extra level of indirection.
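To show where the boxing happens, here is a small sketch that computes the same sum with a boxed list and with a primitive array. The class BoxedListDemo and its method names are made up for this example, and it doesn’t measure anything; it only points out where the objects are created.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedListDemo {
    // Every element is stored as a reference to an Integer object.
    static long sumBoxed(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);          // autoboxing: int -> Integer
        }
        long sum = 0;
        for (Integer value : list) {
            sum += value;         // unboxing: Integer -> int, through a reference
        }
        return sum;
    }

    // The values are stored directly in the array, 4 bytes per element.
    static long sumPrimitive(int n) {
        int[] array = new int[n];
        for (int i = 0; i < n; i++) {
            array[i] = i;
        }
        long sum = 0;
        for (int value : array) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000) == sumPrimitive(1000)); // true
    }
}
```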
Avoiding boxing is actually what one of the other parts of Project Valhalla is about: specialization of generics. I was planning to write a blog post about that soon.
Thank you for the explanation. Right after I posted that comment I was thinking to myself: perhaps this is not autoboxing at all; it shouldn’t be necessary to box just to invoke a method on a value object.
I guess I have to revise my initial opinion a bit: if I understand you correctly, this change will also allow us to write generic classes that can also work with value types, including the existing primitive types. That would be quite useful. There exist some libraries (e.g. colt http://java-performance.info/primitive-types-collections-trove-library/ ) that duplicate the List interface and some implementations for each of several primitive types. This duplication could then be removed. I actually remember that I once experimented with them and found they provide much better performance than a List that uses auto(un)boxing. Performance was comparable with using primitive arrays, but lists provide many more capabilities.
Yes, specialization of generics means that you could use primitive types as type arguments, for example ‘ArrayList<int>’, and that a specialized version of ArrayList would be used which doesn’t need to box the values into Integer objects, so it would work just like ‘TIntArrayList’ from Trove. I’ll write about that soon!
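To give an idea of what such a hand-specialized list looks like, here is a stripped-down sketch. The class IntList is hypothetical (it is not Trove’s actual code, and not what the JVM would generate); it only shows the principle of storing the values in an ‘int[]’ instead of an array of references.

```java
import java.util.Arrays;

public final class IntList {
    // Values are stored directly in a primitive array: no Integer objects, no boxing.
    private int[] elements = new int[8];
    private int size;

    public void add(int value) {
        if (size == elements.length) {
            // Grow the backing array when it is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    public int size() {
        return size;
    }
}
```

Libraries for primitive collections have to repeat code like this for every primitive type, which is the duplication that specialization of generics could make unnecessary.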
Very cool feature, but wow, they’ve been talking about this for years, and this realistically won’t be in production Java until 2019 at the earliest… That is slow…
Oracle (and formerly Sun) are always very careful with adding new features, and they have to be, because once something is added it’s impossible to change it – they have to get it right the first time. And what also makes it really hard to add new features to Java is that it all has to be backward compatible – all existing Java source code should still work on the new version. So they have to be really, really careful when adding new features, and that takes time.