Another look at Julia

Three years ago, I took a first look at Julia, which was then a very new language. Back then, I concluded that it had many interesting features, but I also regretted that its array handling showed too much bad Matlab influence.

A hands-on Julia tutorial in my neighborhood was a good occasion to take another look at this language, which has evolved quite a bit since 2012 and continues to evolve rapidly. The tutorial, taught by David Sanders, was an excellent introduction, and his notebooks should be good for self-study as well. If you already have some experience in computational science and are interested in trying Julia out on small practical applications, have a look at them.

The good news is that Julia has improved a lot over the years, not only by being more complete (in particular in terms of libraries), but also through changes in the language itself. More changes are coming with version 0.4, which is currently under development. The changes being discussed include the array behavior that I criticized three years ago. It’s good to see references to APL in this discussion. I still believe that when it comes to arrays, APL and its successors are an excellent reference. It’s also good to see that the Julia developers take the time to improve their language rather than rushing towards a 1.0 release.

Thanks to David’s tutorial, my contact with Julia was much more practical this time, working on realistic problems. This was a good occasion to appreciate many nice features of the language. Julia has taken many good features from both Lisp and APL and combined them seamlessly into a language that, in spite of some warts, is overall a pleasure to use. A major aspect of Julia’s Lisp heritage is its built-in metaprogramming support. Metaprogramming has always been difficult to grasp, which was clear as well during the tutorial. It isn’t obvious at all what kind of problem it helps to solve. But nobody who has used a language with good metaprogramming support wants to go back.

A distinctive feature of Julia is that it occupies a corner of the programming language universe that was almost empty until now. In scientific computing, we have traditionally had two major categories of languages. “Low-level” languages such as Fortran, C, and C++ are close to the machine level: data types reflect those directly handled by today’s processors, and memory management is explicit and thus left to the programmer. “High-level” languages such as Python or Mathematica present a more abstract view of computing, in which resources are managed automatically and the data types and their operations are as close as possible to the mathematical concepts of arithmetic. High-level languages are typically interpreted or JIT-compiled, whereas low-level languages require an explicit compilation step, but this difference reflects the languages’ age and implementations more than their design.

Julia is resolutely modern in its choice of code transformation techniques, in particular under-the-hood JIT compilation, which makes it both fully compiled and fully interactive. As for the more fundamental differences between “low-level” and “high-level”, Julia takes an unconventional approach: automatic memory management, but data types at the machine level.

As an illustration, consider integer handling. Julia’s default integers are the same as C’s: optimal machine-size signed integers with no overflow checks on arithmetic. On a 64-bit machine, for example, the result of 10^50 is -5376172055173529600. This is the best choice for performance, but it should be clear that it can easily create bugs. Traditional high-level languages use unlimited integers by default, possibly offering machine-size integers as an optimization option for experienced programmers. Julia does have a BigInt type, but using it requires carefully inserting big(...) in many places. It’s there if you absolutely need it, but you are expected to use machine-size integers most of the time.
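
Python’s own integers are unbounded, so the wraparound is easy to reproduce by reducing every intermediate product modulo 2^64. The sketch below is an emulation written for illustration, not Julia code; the helper names `wrap` and `wrapping_pow` are hypothetical, and it assumes that Julia’s `^` on `Int64` behaves like repeated multiplication that wraps silently, which is consistent with the value quoted above.

```python
# Sketch: emulate fixed-width two's-complement integers (like Julia's Int64
# on a 64-bit machine) with Python's unbounded ints.

def wrap(x: int, bits: int = 64) -> int:
    """Reduce x modulo 2**bits and reinterpret the result as a signed integer."""
    x &= (1 << bits) - 1
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    return x

def wrapping_pow(base: int, exp: int, bits: int = 64) -> int:
    """Power by repeated squaring, wrapping after every multiplication."""
    result = 1
    base = wrap(base, bits)
    while exp > 0:
        if exp & 1:
            result = wrap(result * base, bits)
        base = wrap(base * base, bits)
        exp >>= 1
    return result

print(10 ** 50)              # exact: Python integers never overflow
print(wrapping_pow(10, 50))  # -5376172055173529600, the value quoted above
```

With `bits=32` the same emulation returns 0, because 2^32 divides 10^50 exactly; no error is raised in either case.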

As a consequence, Julia is a power tool for experienced scientific programmers who are aware of the traps and the techniques to avoid falling into them. Julia is not a language suitable for beginners or occasional users of scientific programming, because such inexperienced scientists need more of a safety net than Julia provides. Neither is Julia a prototyping language for trying out new ideas, because when concentrating on the science you also need a safety net that protects you from the traps of machine-level abstractions. In Julia, you have to design your own safety net, and you also have to verify that it is strong enough for your needs.

Perhaps the biggest problem with Julia is that this is not obvious at first glance. Julia comes with all the nice interactive tools for rapid development and interactive data analysis, in particular the IJulia notebook, which is basically the same as the now-famous IPython/Jupyter notebook. On the surface, Julia looks like a traditional high-level language. A strong point of David’s Julia tutorial is that it points out right from the start that Julia is different. Whenever a choice must be made between run-time efficiency and simplicity, clarity, or correctness, Julia always chooses efficiency. The least serious consequence is surprising error messages that make sense only with a basic understanding of how the compiler works. The worst consequence is that inexperienced users are easily induced to write unsafe code. There are good testing tools (FactCheck in particular looks very nice), but scientists are notoriously unaware of the need for testing.

The worst design decision I see in Julia is the explicit platform dependence of the language: the default integer size is either 32 or 64 bits, depending on the underlying platform. This default size is used in particular for integer constants. As a consequence, a Julia program does not in general have a single well-defined result, but two distinct ones. This means that programs must be tested on two different architectures, which is hard to do even for experienced programmers. Given the ongoing and very visible debate about the (non-)reproducibility of computational research, I cannot understand how anyone can make such a decision today. Of course I do understand the performance advantage that results from this choice, but this clearly goes too far for my taste. If I ever use Julia for my research, I’ll start each source code file with @assert WORD_SIZE==64, just to make sure that everyone knows what kind of machine I tested my code on.
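
To make the “two distinct results” problem concrete, here is a Python emulation (not Julia code) of a single integer expression, 100000 * 100000, evaluated with 32-bit and with 64-bit default integers. The helper names are hypothetical; the sketch assumes only what is stated above, namely that integer constants take the platform’s default width and that arithmetic on them wraps silently.

```python
# Sketch: the same expression evaluated with two different default integer widths.

def wrap(x: int, bits: int) -> int:
    """Reinterpret x modulo 2**bits as a signed two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >= 1 << (bits - 1) else x

def product_as_int(a: int, b: int, word_size: int) -> int:
    """Hypothetical helper: a * b computed with word_size-bit wrapping integers."""
    return wrap(wrap(a, word_size) * wrap(b, word_size), word_size)

for word_size in (32, 64):
    print(word_size, product_as_int(100_000, 100_000, word_size))
# 32 1410065408
# 64 10000000000
```

Neither run reports an error, so only a test executed on both kinds of machine would reveal that the program has two different answers.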

Then there are surprising but not dangerous features that can probably only be explained by convenience for the compiler. First of all, it is impossible to redefine a data type without clearing the workspace first, which means losing your whole session. That’s a bit of a pain for interactive development, in particular in IJulia notebooks. Another oddity is the const declaration, which creates a variable to which you can assign new values as often as you like, as long as the type remains the same. It’s more a typed variable declaration than the constant suggested by the name.

Finally, there is another point where I think the design for speed has gone too far. The choice of machine-size integers becomes, in my opinion, completely useless when it comes to rational arithmetic. Julia lets you create fractions by writing 3//2 etc., but the result is a fraction whose numerator and denominator are machine-size integers. Rational arithmetic has the well-known performance and memory problem of denominators growing with each additional operation. With machine-size integers, rational arithmetic quickly overflows, and then crashes or returns wrong results. Given that the primary application of rationals is exact arithmetic with unlimited precision, I don’t see a practical use for anything but Rational{BigInt}.
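
How quickly do denominators outgrow machine integers? A simple experiment with Python’s standard `fractions` module, which is exact and unbounded, sums 1 + 1/2 + ... + 1/n and reports the first n at which the reduced denominator no longer fits in a signed 64-bit integer. The harmonic series is just an example chosen for illustration. A real `Rational{Int}` implementation may fail even earlier than this sketch suggests, since intermediate products formed during an addition can overflow before the reduced result does.

```python
from fractions import Fraction

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

def first_overflowing_harmonic(limit: int = INT64_MAX):
    """Return (n, H_n) for the first harmonic number H_n = 1 + 1/2 + ... + 1/n
    whose denominator, in lowest terms, exceeds `limit`."""
    h = Fraction(0)
    n = 0
    while True:
        n += 1
        h += Fraction(1, n)  # Fraction keeps the result in lowest terms
        if h.denominator > limit:
            return n, h

n, h = first_overflowing_harmonic()
print(n, h.denominator.bit_length())  # first n, and how many bits the denominator needs
```

A loop of a few dozen additions of simple fractions is already enough to leave the range of 64-bit integers, which illustrates why Rational{BigInt} is the practical choice for exact work.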

In the end, Julia leaves me with a feeling of a lost opportunity. My ideal software development environment for computational science would support the whole life cycle of computational methods, starting from prototyping and ending with platform-specific optimizations. As code is progressively optimized based on profiling information, each version would be used as a reference to test the next optimization level. In terms of fundamental language design, Julia seems to have everything required for such an approach. However, the default choice of fast-and-unsafe operations almost forces programmers into premature optimization. Like in the traditional high-/low-level language world, computational science will require two distinct languages, a safe and a fast one.
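
The workflow described in this paragraph, where each optimized version is checked against the previous, safer one, can be sketched in a few lines. In this hypothetical Python example, an exact reference implementation is compared with an “optimized” one that emulates silently wrapping 64-bit accumulation; the function names and the test data are invented for illustration and do not correspond to any existing tool.

```python
import random

def wrap64(x: int) -> int:
    """Reinterpret x modulo 2**64 as a signed 64-bit integer."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= 1 << 63 else x

def sum_of_squares_reference(values):
    """Reference version: exact, since Python ints never overflow."""
    return sum(v * v for v in values)

def sum_of_squares_fast(values):
    """'Optimized' version: emulates Int64 arithmetic that wraps silently."""
    total = 0
    for v in values:
        total = wrap64(total + wrap64(v * v))
    return total

def mismatches(fast, reference, test_inputs):
    """Return the inputs on which the optimized version disagrees with the reference."""
    return [xs for xs in test_inputs if fast(xs) != reference(xs)]

rng = random.Random(42)
small = [[rng.randrange(-1000, 1000) for _ in range(100)] for _ in range(50)]
large = [[rng.randrange(-2**40, 2**40) for _ in range(100)] for _ in range(50)]

print(len(mismatches(sum_of_squares_fast, sum_of_squares_reference, small)))
print(len(mismatches(sum_of_squares_fast, sum_of_squares_reference, large)))
```

With the small inputs the two versions agree, so a test suite built only from such cases would not catch the problem; the large inputs expose the wraparound. The reference version is only useful as a safety net if the test inputs probe the limits of the faster one.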


11 Comments on “Another look at Julia”


  2. @Konrad

    > This means that programs must be tested on two different architectures, which is hard to do even for experienced programmers.

    No, it’s not. Continuous integration in Julia is a breeze; please refer to the manual:

    * http://julia.readthedocs.org/en/latest/manual/packages/#generating-the-package

    And also Travis CI documentation:

    * http://docs.travis-ci.com/user/languages/julia

    Here is an example of `Pkg.generate`:

    * http://git.io/vqg7s

    Tests can be run automatically after every `push`, here is `Gadfly` package Travis account as an example:

    * https://travis-ci.org/dcjones/Gadfly.jl

    > Julia is not a language suitable for beginners or occasional users of scientific programming, because such inexperienced scientists need more of a safety net than Julia provides.

    I kinda disagree. If introductory courses have been taught with Java or C/C++ (Harvard CS50, for example) and also Python, then Julia is perfectly suitable as an introductory language. When Julia reaches 1.0, I’ll fully disagree.

    > Julia is a power tool for experienced scientific programmers who are aware of the traps and the techniques to avoid falling into them.

    And also for inexperienced programmers who are able or willing to read the documentation.

    > Metaprogramming has always been difficult to grasp, which was clear as well during the tutorial. It isn’t obvious at all what kind of problem it helps to solve.

    It allows for the development of domain-specific languages. Julia’s metaprogramming capabilities give you the flexibility to pick the notation that suits a specific problem, effectively letting you redesign the language, which is the highest level of abstraction, while still delivering high performance.

    It can also remove a lot of redundancy when a code pattern repeats, for example by wrapping the definitions of several similar functions in a `for … @eval … end` loop:

    * http://git.io/vqgAY

    > However, the default choice of fast-and-unsafe operations almost forces programmers into premature optimization. Like in the traditional high-/low-level language world, computational science will require two distinct languages, a safe and a fast one.

    Computers work the way they do, not the way we would like them to. Fast integer arithmetic requires machine integers. Arbitrary-precision integers align better with how we reason about integers, but the pitfall is that they are very inefficient, as all the dynamic languages that implement this “feature” by default demonstrate.

    > Whenever a choice must be made between run-time efficiency and simplicity, clarity, or correctness, Julia always chooses efficiency.

    I’m glad of that!


    • Do you really expect PhD students hacking on their research code to use continuous integration and related heavy-weight tools for software professionals? 99% don’t even understand the problem that these tools address.

      Yes, Java and C++ have been used in introductory programming courses. That doesn’t make them good languages for beginners, it merely shows that there are many bad teachers out there.

      We will probably see beginners picking up Julia, and then we can both judge the results. I am certainly not looking forward to having to deal with the resulting code.


      • > Do you really expect PhD students hacking on their research code to use continuous integration and related heavy-weight tools for software professionals? 99% don’t even understand the problem that these tools address.

        I don’t expect anything from anyone, but I learned it and I haven’t even finished university… ZOMG, according to your ideas I must be some kind of professional evil-genius software developer! LOL. Gee, are 99% of PhD students really incapable of reading?! If that’s true, they can expect buggy code in any language; that’s not Julia’s fault.

        Anyway, I don’t care if they like to hit themselves with a hammer; just don’t keep saying testing is difficult in Julia, because `Pkg.generate` and the linked documentation actually spoon-feed CI to you. (And of course it can always be improved further.)

        > Yes, Java and C++ have been used in introductory programming courses. That doesn’t make them good languages for beginners, it merely shows that there are many bad teachers out there.

        Are you implying that, say, David Malan is a terrible teacher? Are you a good teacher? Which languages do you use to teach? Good learners also matter!

        > We will probably see beginners picking up Julia, and then we can both judge the results. I am certainly not looking forward to having to deal with the resulting code.

        Sure, let them! Are you worried that they’ll be unable to ask when they have doubts? More and more learning resources keep appearing, and I like to share what I know of Julia with beginners and experts alike, and also to learn from them:

        * https://gitter.im/JuliaLangEs/julialang-es (Spanish)


      • Let me be precise: anyone who chooses Java or C++ as a language to teach to scientists with no prior programming experience is in my opinion a bad teacher. There’s of course nothing wrong with teaching Java or C++ in an absolute sense: it all depends on who your students are. The language I choose for my own classes (audience: mainly physicists, chemists, and biologists) is Python.

        I am obviously not claiming that learning software engineering techniques and tools requires some form of superior intelligence that scientists do not have. But learning anything non-trivial requires a serious investment in time and therefore a serious motivation. Scientists (and science students) have chosen to focus on science rather than on software development. Some of them do look into software engineering as well, which is fine, but it’s a minority. You can’t ignore this fact when teaching computing to scientists.

        What I am worried about is not people asking questions, but scientists writing incorrect code, yielding incorrect results, without noticing or even suspecting a problem.


  3. > But learning anything non-trivial requires a serious investment in time and therefore a serious motivation.

    I agree, but isn’t “…scientists writing incorrect code, yielding incorrect results, without noticing or even suspecting a problem” serious enough? Or are you claiming this never happens in Python?

    > Scientists (and science students) have chosen to focus on science rather than on software development.

    Yet they *are* developing software (potentially bad software, according to your concerns, or inefficient software, or both).

    > 99% don’t even understand the problem that these tools address.

    If you are their teacher shouldn’t you be the one explaining that to them? Do you just let them write potentially buggy/spaghetti/untested/inefficient code?

    > You can’t ignore this fact when teaching computing to scientists.

    You can’t ignore the fact that if they want efficiency and they learned with Python first, in the end they’ll also have to learn things like Cython, C, C++, or Fortran.

    > Of course I do understand the performance advantage that results from this choice, but this clearly goes too far for my taste.

    But having to learn several languages/tools to have efficient code does not?

    In conclusion, you want a “safety nets included” language that is also efficient. You complain about Julia’s decision to use machine integers, claiming it’s “unsafe” (but efficient), while arbitrary-precision arithmetic is “safe” (but inefficient; that’s the price you pay for abstracting too much from low-level details).

    You mean you want fast arbitrary precision arithmetic? How do you suggest to achieve that with current hardware?

    > Traditional high-level languages use unlimited integers by default, possibly offering machine-size integers as an optimization option for experienced programmers.

    Which is why they are traditionally slow. Even more so for inexperienced programmers (your 99% of PhD students).

    > In the end, Julia leaves me with a feeling of a lost opportunity. My ideal software development environment for computational science would support the whole life cycle of computational methods, starting from prototyping and ending with platform-specific optimizations.

    How many languages do you use in your current software development environment for computational science in order to achieve all of that? Just Python? I don’t think so. Is this just an idealized development environment or an actual one?


  4. I think we should at this point agree to disagree. You will find the answers to all the questions you ask in my post, if you read it carefully.

    As for teaching scientists with no prior programming experience, I suggest you have a look at the Software Carpentry lessons (https://github.com/swcarpentry) which are the result of many years of experience by many people to teach computing to scientists. Imagine adding a two- or three-hour introduction to Julia (or C, or Java, …) at the same level. You wouldn’t get very far.


  5. Julia is a more complicated language than Python. You have to understand type-stability to use Julia properly. However, the key is that Julia allows you to write “bad code” and it will usually work, and usually be quite fast (if the compiler can find out the right types, which it can usually do).

    Yes, this means that there is a bit more to teach when teaching Julia than Python. But it can easily be used as a starter language. Choosing Python as a starting language glosses over so many details that I think it makes it harder for a student to understand what the computer is actually doing and make performant code. While understanding how to teach beginners Julia will take some time, Julia is still great as a second language which has all of the OO/Functional/Compiler details a CS student wants in a high performing language, so it’s a great language for experts who want to work faster than in C. That alone makes it a great language for package developers, and it’s the packages that really make the language.

    When you take Python and add a JIT like in Numba, one of the big changes that breaks “regular” Python code is the fact that they require type stability. There’s no getting around it for fast code. However, these JITs for Python only tend to work on a subset of Python code (focusing on speeding up Python’s array-based and math code) because the rest of the language is not built from the ground up for type-stability. This means that you cannot use Numba to make your own “Fast Stack” or “Fast Distributed Arrays” since they don’t accelerate the objects, while by default Julia’s system has all of its types as first-class and compiler optimized. So if you really want to build out a large ecosystem of high performance types/objects, you need to go to Julia (or write them in C++ and bind them to Python…).

    As a side note, you can always use Rational{BigInt} if you want. But Rational{Int} has a lot of uses. For example, it’s good for storing the tableaus for Runge-Kutta methods for ODEs. These are defined by rational numbers which easily fit within Rational{Int}. Since Rationals are exact, if one wants to solve the ODE with regular floats, you just do float(tableau) and it’s in that precision, while at the start of the computation you can also use big(tableau) to get that precision instead. If you had instead saved the tableau only as floating-point values, it would already carry the truncation error before being converted to big, and so would not be as exact. Parametric types let you optimize by using as tight a type as possible with the same amount of code, and I think this is a really good case for showing that.

    As for metaprogramming, check out things like Devectorize.jl and ParallelAccelerator.jl. The idea is that you can use it to write the code you’re supposed to write for performance, but be lazy and let Julia rewrite the code for you. This makes code “optimal” but also easy to read. While I personally find it hard to think of good reasons to use macros on my own, others have written some very useful macros (examples include @parallel, @progress (in Atom), @simd, @threads, @fastmath, @recipe, etc.).


    • Thanks for your detailed feedback!

      In view of your comments, let me reformulate the problem I have with Julia: its focus and intended audience is not clear. It’s being marketed as good for everything in computational science, which I don’t believe to be possible. There are fundamental conflicts between different goals (clarity, correctness, performance, …) that require a compromise, which cannot be the same for everyone.

