Julia: a new language for scientific computing

New programming languages are probably invented every day, and even those that get developed and published are too numerous to mention. New programming languages developed specifically for science and engineering are very rare, however, and that’s why such a rare event deserves some publicity. A while ago, I saw an announcement for Julia, which announces itself as “a fresh approach to technical computing”. I couldn’t resist the temptation to download, install, and test-drive it. Here are my first impressions.

The languages used today for scientific computing can be grouped into four categories:

  • Traditional compiled languages optimized for number crunching. The big player in this category is of course Fortran, but some recent languages such as X10, Chapel, or Fortress are trying to challenge it.
  • Rapid-development domain-specific languages, usually interpreted. Well-known examples are Matlab an R.
  • General-purpose statically compiled languages with libraries for scientific computing. C and C++ come to mind immediately.
  • General-purpose dynamic languages with libraries for scientific computing. The number one here is Python with its vast library ecosystem.

What sets Julia apart is that it sits somewhere between the first two categories. It’s compiled, but fully interactive, there is no separate compilation phase. It is statically typed, allowing for efficient compilation, but also has the default type “Any” that makes it work just like dynamically typed languages in the absence of type declarations. Type infererence makes the mix even better. If that sounds like the best of both worlds, it actually is. It has been made possible by modern code transformation techniques that don’t really fit into the traditional categories of “compilers” and “interpreters”. Like many other recent languages and language implementations, Julia uses LLVM as its infrastructure for these code transformations.

Julia has a well-designed type system with a clear orientation towards maths and number crunching: there is support for complex numbers, and first-class array support. What may seem surprising is that Julia is not object-oriented. This is neither an oversight nor a nostalgic return to the days of Fortran 77, but a clear design decision. Julia has type hierarchies and function polymorphism with dispatch on the types of all arguments. For scientific applications (and arguably for some others), this is more useful than OO style method dispatch on a single value.

Another unusual feature of Julia is a metaprogramming system that is very similar to Lisp macros, although it is slightly more complicated by the fact that Julia has a traditional syntax layer, whereas Lisp represents code by data structures.

So far for a summary of the language. The real question is: does it live up to its promises? Before I try to answer that question, I would like to point out that Julia is a young language that is still in flux and for now has almost no development tool support. For many real-life problems, there is no really good solution at the moment but it is clear that a good solution can be provided, it just needs to be done. What I am trying to evaluate is not if Julia is ready for real-life use (it is not), but whether there are any fundamental design problems.

The first question I asked myself is how well Julia can handle non-scientific applications. I just happened to see a blog post by John D. Cook explaining why it’s preferable to write math in a general-purpose language than to write non-math in a math language. My experience is exactly the same, and that’s why I have adopted Python for most of my scientific programming. The point is that any non-trivial program sooner or later requires solving non-math problems (I/O, Web publishing, GUIs, …). If you use a general-purpose language, you can usually just pick a suitable library and go ahead. With math-only languages such as Matlab, your options are limited, with interfacing to C code sometimes being the only way out.

So is it feasible to write Web servers or GUI libraries in Julia? I would say yes. All the features of general-purpose languages are there or under consideration (I am thinking in particular of namespaces there). With the exception of systems programming (device drivers and the like), pretty much every programming problem can be solved in Julia with no more effort than in most other languages. The real question is if it will happen. Julia is clearly aimed at scientists and engineers. It is probably good enough for doing Web development, but it has nothing to offer for Web developers compared to well-established languages. Will scientists and engineers develop their own Web servers in Julia? Will Web developers adopt Julia? I don’t know.

A somewhat related question is that of interfacing to other languages. That’s a quick way to make lots of existing code available. Julia has a C interface (which clearly needs better tool support, but I am confident that it will come), which can be used for other sufficiently C-like languages. It is not clear what effort will be required to interface Julia with languages like Python or Ruby. I don’t see why it couldn’t be done, but I can’t say yet whether the result will be pleasant to work with.

The second question I explored is how well Julia is suited to my application domain, which is molecular simulations and the analysis of experimental data. Doing molecular simulation in Julia looks perfectly feasible, although I didn’t really implement any non-trivial algorithm yet. What I concentrated on first is data analysis, because that’s where I could profit most from Julia’s advantages. The kinds of data I mainly deal with are (1) time series and frequency spectra and (2) volumetric data. For time series, Julia works just fine. My biggest stumbling block so far has been volumetric data.

Volumetric data is usually stored in a 3-dimensional array where each axis corresponds to one spatial dimension. Typical operations on such data are interpolation, selection of a plane (2-d subarray) or line (1-d subarray), element-wise multiplication of volume, plane, or line arrays, and sums over selected regions of the data. Using the general-purpose array systems I am familiar with (languages such as APL, libraries such as NumPy for Python), all of this is easy to handle.

Julia’s arrays are different, however. Apparently the developers’ priority was to make the transition to Julia easy for people coming from Matlab. Matlab is based on the principle that “everything is a matrix”, i.e. a two-dimensional array-like data structure. Matlab vectors come on two flavors, row and column vectors, which are actually matrices with a single row or column, respectively. Matlab scalars are considered 1×1 matrices. Julia is different because it has arrays of arbitrary dimension. However, array literals are made to resemble Matlab literals, and array operations are designed to behave as similar as possible to Matlab operations, in particular for linear algebra functions. In Julia, as in Matlab, matrix multiplication is considered more fundamental than elementwise multiplication of two arrays.

For someone used to arrays that are nothing more than data structures, the result looks a bit messy. Here are some examples:

julia> a = [1; 2]
[1, 2]

julia> size(a)
(2,)

julia> size(transpose(a))
(1,2)

julia> size(transpose(transpose(a)))
(2,1)

I’d expect that the transpose of the transpose is equal to the original array, but that’s not the case. But what does transpose do to a 3d array? Let’s see:

julia> a = [x+y+z | x=1:4, y=1:2, z = 1:3]
4x2x3 Int64 Array:
...

ulia> transpose(a)
no method transpose(Array{Int64,3},)
 in method_missing at base.jl:60

OK, so it seems this was not considered important enough, but of course that can be fixed.

Next comes indexing:

julia> a = [1 2; 3 4]
2x2 Int64 Array:
 1  2
 3  4

julia> size(a)
(2,2)

julia> size(a[1, :])
(1,2)

julia> size(a[:, 1])
(2,1)

julia> size(a[1, 1])
()

Indexing a 2-d array with a single number (all other indices being the all-inclusive range :) yields a 2-d array. Indexing with two number indices yields a scalar. So how do I extract a 1-d array? This generalizes to higher dimensions: if the number of number indices is equal to the rank of the array, the result is a scalar, otherwise it’s an array of the same rank as the original.

Array literals aren’t that frequent in practice, but they are used a lot in development, for quickly testing functions. Here are some experiments:

julia> size([1 2])
(1,2)

julia> size([1; 2])
(2,)

julia> size([[1;2] ; [3;4]])
(4,)

julia> size([[1;2] [3;4]])
(2,2)

julia> size([[1 2] [3 4]])
(1,4)

julia> size([[[1 2] [3 4]] [[5 6] [7 8]]])
(1,8)

Can you guess the rules? Once you have them (or looked them up in the Julia manual), can you figure out how to write a 3-d array literal? I suspect it’s not possible.

Next, summing up array elements:

julia> sum([1; 2])
3

julia> sum([1 2; 3 4])
10

Apparently sum doesn’t care about the shape of my array, it always sums the individual elements. Then how do I do a sum over all the rows?

I have tried to convert some of my basic data manipulation code from Python/NumPy to Julia, but found that I always spent most of the time fighting against the built-in array operations, which are clearly not made for my kind of application. In some cases a change of attitude may be sufficient. It seems natural to me that a plane extracted from volumetric data should be a 2-d array, but maybe if I decide that should be a 3-d array of “thickness” 1, everything will be easy.

I haven’t tried yet, because I know there are cases that cannot be dealt with in that way. Suppose I have a time series of volumetric data that I store in a 4-d array. Obviously I want to be able to apply functions written for static volumetric data (i.e. 3-d arrays) to an element of such a time series. Which means I do need a way to extract a 3-d array out of a 4-d array.

I hope that what I need is there and I just didn’t find it yet. Any suggestions are welcome. For now, I must conclude that test-driving Julia is a frustrating experience: the language holds so many promises, but fails for my needs due to superficial but practically very important problems.

Explore posts in the same categories: Programming

8 Comments on “Julia: a new language for scientific computing”

  1. TimH Says:

    1. transpose for higher-dimensional arrays: use “permute” and “ipermute”
    2. “Indexing a 2-d array with a single index yields a 2-d array. Indexing with two indices yields a scalar.” Indexing any array with a single index yields a scalar, not an array.
    3. “How do I extract a 1d array?” c = Int32[a[1,2]]
    4. “How do I write a 3d array literal?” There may be other solutions, but (a) the “comprehension” syntax you already demonstrated covers many use cases, and (b) there’s always:
    A = Array(Int,2,2,2)
    A[:,:,1] = [1 2; 3 4]
    A[:,:,2] = [5 6; 7 8]
    Not sure you’d count this as sufficiently convenient; how does it work in other languages?
    One important point is that with Julia, the distinction between “built-in” and “not built in” is blurry. For example, you can define (with very little effort) your own types that can be accorded the same status as Int32. This is quite different from many other languages. So, you could presumably define whatever syntax for this operation you want.
    5. “Then how do I do a sum over all the rows?” sum(A,2)
    6. “It seems natural to me that a plane extracted from volumetric data should be a 2-d array, but maybe if I decide that should be a 3-d array of “thickness” 1, everything will be easy.” Try “squeeze” and “reshape”. There are applications where it’s really nice to slice but still keep track of the original array dimensions, so I personally am quite happy with this design decision in Julia.
    7. Given your interests in volumetric data sets over time, perhaps you might want to participate in the work going on in the Image library:
    git@github.com:timholy/julia.git

    • khinsen Says:

      Thanks for your comments! Some replies:

      1) permute is indeed one solution:

      transpose(A::AbstractArray) = permute(A::AbstractArray, ndims(A):-1:1)
      

      2) I should have said “with a single index and : everywhere else” (as my examples illustrate).

      3) That’s not quite what I want. What I want is the same as reshape(a[1, :], size(a)[2:]) but without having to do the shape computation myself.

      4) I am looking for something compact that I could type as a function argument on the command line. In NumPy, it’s the nesting level of the brackets that sets the number of dimensions, so it’s trivial to write arrays of any rank.

      5) sum(A, 2) looks good. It led me to discover Region and Dims, now I just have to figure out how they work exactly.

      6) I did use resize() but found my code littered with it to the point of becoming unreadable. I suspect it’s no good for performance either but I didn’t measure. squeeze() is of no use for me. I don’t want to flatten all dimensions of length 1, but just a specific one.

      I agree it is sometimes useful to keep the original array dimension, but sometimes it isn’t. NumPy offers both with a simple notation, and Julia could follow the same idea: a[1, :] would have one dimension less than a and a[1:1, :] would have the same rank as a.

      7) I don’t think of my volumetric data as images as I use it mostly as a function of space sampled at discrete grid points. But I’ll have a look at your image library, thanks for the pointer!

  2. TimH Says:

    Ah, by “single index” I assumed you meant a[3].

    I like the idea of nested brackets and a[1,:] vs. a[1:1,:].

  3. Johann Hibschman Says:

    I just stumbled across this, not quite a week late. Thanks for writing this; it articulates well the problems that I have with Julia. Maybe I’ve spent too much time with NumPy and APL-variants, but looking at Julia gives that same off-putting “everything’s a matrix” feeling that Matlab gives.

    For my use (statistics, time series, finance), I use the elementwise operators much more frequently than I use matrix operators. I often have 3D time series of matrices that I want to slice either longitudinally or laterally. Elementwise can also apply in a natural way to higher-dimension arrays.

    I want to like Julia, since I really want a modern-feeling type-inferenced language to do numerics in, but the array slicing feels like a step backwards from NumPy.

  4. hkutschbach Says:

    Nice post! I agree to a large extend. Allow one small justification: “everything is a matrix” should actually mean: “everything is at least a matrix”. In Matlab scalars and vectors have always at least two dimensions. Singleton dimensions can only get removed for arrays with more than 2 dimensions. As far as I understand it, Julia is simliar.

    “It seems natural to me that a plane extracted from volumetric data should be a 2-d array, but maybe if I decide that should be a 3-d array of “thickness” 1, everything will be easy.” Both concepts do really describe the same thing, right? Isn’t it rather an insufficient function design, if it consists of a 2d arrays and refuses a “3d slice of thickness 1”?

  5. Vladimir Says:

    Hi,
    Julia seems promised, especially its MATLAB like syntax.
    I didn’t find answers on the following 2 questions:
    1. How Julia can be embedded in C++ application (not via TCP/IP connection) ?
    2. Are there plans to add image processing library (similar to MATLAB) to Julia ?
    Thanks,
    Vladimir

    • khinsen Says:

      I suggest you subscribe to the Julia mailing list to get answers to your questions. I have seen discussions there about an image library, for example.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: