Exploring Racket

Over the last few months I have been exploring the Racket language for its potential as a language for computational science, and it’s time to summarize my first impressions.

Why Racket?

There are essentially two reasons for learning a programming language: (1) getting acquainted with a new tool that promises to get some job done better than with other tools, and (2) learning about other approaches to computing and programming. My interest in Racket was driven by a combination of these two aspects. My background is in computational science (physics, chemistry, and structural biology), so I use computation extensively in my work. Like most computational scientists of my generation, I started working in Fortran, but quickly found this unsatisfactory. Looking for a better way to do computational science, I discovered Python in 1994 and joined the Matrix-SIG that developed what is now known as NumPy. Since then, Python has become my main programming language, and the ecosystem for scientific computing in Python has flourished to a degree unimaginable twenty years ago. For doing computational science, Python is one of the top choices today.

However, we shouldn’t forget that we are still living in the stone age of computational science. Fortran was the Paleolithic, Python is the Neolithic, but we have to move on. I am convinced that computing will become as much an integral part of doing science as mathematics, but we are not there yet. One important aspect has not evolved since the beginnings of scientific computing in the 1950s: the work of a computational scientist is dominated by the technicalities of computing, rather than by the scientific concerns. We write, debug, optimize, and extend software, port it to new machines and operating systems, install messy software stacks, convert file formats, etc. These technical aspects, which are mostly unrelated to doing science, take so much of our time and attention that we think less and less about why we do a specific computation, how it fits into more general theoretical frameworks, how we can verify its soundness, and how we can improve the scientific models that underlie our computations. Compare this to how theoreticians in a field like physics or chemistry use mathematics: they have acquired most of their knowledge and expertise in mathematics during their studies, and spend much more time applying mathematics to do science than worrying about the intrinsic problems of mathematics. Computing should one day have the same role. For a more detailed description of what I am aiming at, see my recent article.

This lengthy foreword was necessary to explain what I am looking for in Racket: not so much another language for doing today’s computational science (Python is a better choice for that, if only for its well-developed ecosystem) as an environment for developing tomorrow’s computational science. The Racket Web site opens with the title “A programmable programming language”, and that is exactly the aspect of Racket that I am most interested in.

There are two more features of Racket that I found particularly attractive. First, it is one of the few languages that have good support for immutable data structures without being extremist about it. Mutable state is the most important cause of bugs in my experience (see my article on “Managing State” for details), and I fully agree with Clojure’s Rich Hickey who says that “immutability is the right default”. Racket has all the basic data structures in a mutable and an immutable variant, which provides a nice environment to try “going immutable” in practice. Second, there is a statically typed dialect called Typed Racket which promises a straightforward transition from fast prototyping in plain Racket to type-safe and more efficient production code in Typed Racket. I haven’t looked at this yet, so I won’t say any more about it.

Racket characteristics

For readers unfamiliar with Racket, I’ll give a quick overview of the language. It’s part of the Lisp family, more precisely a derivative of Scheme. In fact, Racket was formerly known as “PLT Scheme”, but its authors decided that it had diverged sufficiently from Scheme to give it a different name. People familiar with Scheme will still recognize much of the language, but some changes are quite profound, such as the fact that lists are immutable. There are also many extensions not found in standard Scheme implementations.

The hallmark of the Lisp family is that programs are defined in terms of data structures rather than in terms of a text-based syntax. The most visible consequence is a rather peculiar visual aspect, which is dominated by parentheses. The more profound implication, and in fact the motivation for this uncommon choice, is the equivalence of code and data. Program execution in Lisp is nothing but interpretation of a data structure. It is possible, and common practice, to construct data structures programmatically and then evaluate them. The most frequent use of this characteristic is writing macros (which can be seen as code preprocessors) to effectively extend the language with new features. In that sense, all members of the Lisp family are “programmable programming languages”.
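To make the code-as-data idea concrete, here is a minimal sketch in Racket (the `swap!` macro is an illustrative example, not from this post):

```racket
#lang racket

;; A quoted expression is an ordinary list that can be built
;; programmatically and then evaluated as code.
(define expr (list '+ 1 2 3))      ; the list (+ 1 2 3)
(eval expr (make-base-namespace))  ; → 6

;; A macro extends the language itself: `swap!` expands into the
;; let/set! code below at compile time.
(define-syntax-rule (swap! a b)
  (let ([tmp a])
    (set! a b)
    (set! b tmp)))

(define x 1)
(define y 2)
(swap! x y)
(list x y)  ; → '(2 1)
```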

However, Racket takes this approach to another level. Whereas traditional Lisp macros are small code preprocessors, Racket’s macro system feels more like a programming API for the compiler. In fact, much of Racket is implemented in terms of Racket macros. Racket also provides a way to define a complete new language in terms of existing bits and pieces (see the paper “Languages as libraries” for an in-depth discussion of this philosophy). Racket can be seen as a construction kit for languages that are by design interoperable, making it feasible to define highly specific languages for some application domain and yet use it in combination with a general-purpose language.

Another particularity of Racket is its origin: it is developed by a network of academic research groups, who use it as a tool for their own research (much of which is related to programming languages), and as a medium for teaching. However, contrary to most programming languages developed in the academic world, Racket is developed for use in the “real world” as well. There are documentation, learning aids, and development tools, and the members of the core development team are always ready to answer questions on the Racket users mailing list. This mixed academic-application strategy is of interest for both sides: researchers get feedback on the utility of their ideas and developments, and application programmers get quick access to new technology. I am aware of only three other languages developed in a similar context: OCaml, Haskell, and Scala.

Learning and using Racket

A first look at the Racket Guide (an extended tutorial) and the Racket Reference shows that Racket is not a small language: there is a bewildering variety of data types, control structures, abstraction techniques, program structuring methods, and so on. Racket is a very comprehensive language that allows both fine-tuning and large-scale composition. It definitely doesn’t fit into the popular “low-level” vs. “high-level” dichotomy. For the experienced programmer, this is good news: whatever technique you know to be good for the task at hand is probably supported by Racket. For students of software development, it’s probably easy to get lost. Racket comes with several subsets developed for pedagogical purposes, which are used in courses and textbooks, but I didn’t look at those. What I describe here is the “standard” Racket language.

Racket comes with its own development environment called “DrRacket”. It looks quite powerful, but I won’t say more about it because I haven’t used it much. I use too many languages to be interested in any language-specific environment. Instead, I use Emacs for everything, with Geiser for Racket development.

The documentation is complete, precise, and well presented, including a pleasant visual layout. But it is not always an easy read. Be prepared to read through some background material before understanding all the details in the reference documentation of some function you are interested in. It can be frustrating sometimes, but I have never been disappointed: you do find everything you need to know if you just keep on following links.

My personal project for learning Racket is an implementation of the MOSAIC data model for molecular simulations. While my implementation is not yet complete (it supports only two kinds of data items, universes and configurations), it has data structure definitions, I/O to and from XML, data validation code, and contains a test suite for everything. It uses some advanced Racket features such as generators and interfaces, not so much out of necessity but because I wanted to play with them.

Overall I had few surprises during my first Racket project. As I already said, finding what you need in the documentation takes a lot of time initially, mostly because there is so much to look at. But once you find the construct you are looking for, it does what you expect and often more. I remember only one ongoing source of frustration: the multitude of specialized data structures, which force you to make choices you often don’t really care about, and to insert conversion functions when function A returns a data structure that isn’t exactly the one that function B expects to get. As an illustration, consider the Racket equivalent of Python dictionaries, hash tables. They come in a mutable and an immutable variant, each of which can use one of three different equality tests. It’s certainly nice to have that flexibility when you need it, but when you don’t, you don’t want to have to read about all those details either.
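As a sketch of the choices described above (the variable names are illustrative):

```racket
#lang racket

;; Immutable hash tables come in three equality flavors.
(define h-equal (hash "a" 1 "b" 2))  ; keys compared with equal?
(define h-eqv   (hasheqv 1.5 'x))    ; keys compared with eqv?
(define h-eq    (hasheq 'a 1))       ; keys compared with eq?

;; "Updating" an immutable hash returns a new table.
(define h2 (hash-set h-equal "c" 3))
(hash-ref h2 "c")          ; → 3
(hash-ref h-equal "c" #f)  ; → #f, the original is unchanged

;; The mutable variant is a separate type with in-place update.
(define mh (make-hash))
(hash-set! mh "a" 1)
(hash-ref mh "a")          ; → 1
```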

As for Racket’s warts, I ran into two of them. First, the worst supported data structure in Racket must be the immutable vector, which is so frustrating to work with (every operation on an immutable vector returns a mutable vector, which has to be manually converted back to an immutable vector) that I ended up switching to lists instead, which are immutable by default. Second, the distinction (and obligatory conversion) between lists, streams, generators and a somewhat unclear sequence abstraction makes you long for the simplicity of a single sequence interface as found in Python or Clojure. In Racket, you can decompose a list into head and tail using first and rest. The same operations on a stream are stream-first and stream-rest. The sequence abstraction, which covers both lists and streams and more, has sequence-tail for the tail, but to the best of my knowledge nothing for getting the first element, other than the somewhat heavy (for/first ([element sequence]) element).
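The mismatch is easy to demonstrate with a minimal sketch:

```racket
#lang racket
(require racket/stream)

;; Lists and streams need different accessors for the same idea.
(define l (list 1 2 3))
(first l)         ; → 1
(rest l)          ; → '(2 3)

(define s (stream 1 2 3))
(stream-first s)  ; → 1
;; (first s) would be a contract violation: a stream is not a list.

;; The generic sequence layer has sequence-tail, but getting the
;; first element requires the workaround mentioned above:
(for/first ([element (in-naturals 10)]) element)  ; → 10
```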

The macro requirements of my first project were modest, not exceeding what any competent Lisp programmer would easily do using defmacro (which, BTW, exists in Racket for compatibility even though its use is discouraged). Nevertheless, in the spirit of my exploration, I tried all three levels of Racket’s hygienic macro definitions: syntax-rules, syntax-case, and syntax-parse, in order of increasing power and complexity. The first, syntax-rules, is straightforward but limited. The last one, syntax-parse, is the one you want for implementing industrial-strength compiler extensions. I don’t quite see the need for the middle one, syntax-case, so I suppose it’s there for historical reasons, being older than syntax-parse. Macros are the one aspect of Racket for which I recommend starting with something other than the Racket documentation: Greg Hendershott’s Fear of Macros is a much more accessible introduction.
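For comparison, here is the same toy macro (a hypothetical `my-unless`) at the lowest and highest of those three levels:

```racket
#lang racket
(require (for-syntax syntax/parse))

;; syntax-rules level: a pure pattern-based rewrite.
(define-syntax-rule (my-unless test body ...)
  (if test (void) (begin body ...)))

;; syntax-parse level: the same expansion, but with declarative
;; checking that `test` and each `body` are expressions, which
;; yields much better error messages on misuse.
(define-syntax (my-unless* stx)
  (syntax-parse stx
    [(_ test:expr body:expr ...)
     #'(if test (void) (begin body ...))]))

(my-unless #f 'ran)   ; → 'ran
(my-unless* #f 'ran)  ; → 'ran
```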

Scientific computing

As I said in the beginning of this post, my goal in exploring Racket was not to use it for my day-to-day work in computational science, but nevertheless I had a look at the support for scientific computing that Racket offers. In summary, there isn’t much, but what there is looks very good.

The basic Racket language has good support for numerical computation, much of which is inherited from Scheme. There are integers of arbitrary size, rational numbers, and floating-point numbers (single and double precision), all with the usual operations. There are also complex numbers whose real/imaginary parts can be exact (integer or rational) or inexact (floats). Unlimited-precision floats are provided by an interface to MPFR in the Racket math library.
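A few interactions illustrate this numeric tower (standard Racket, nothing assumed):

```racket
#lang racket

;; Exact integers of arbitrary size and exact rationals.
(expt 2 100)          ; → 1267650600228229401496703205376
(+ 1/3 1/6)           ; → 1/2 (exact arithmetic, no rounding)

;; Conversion to inexact (floating-point) representation.
(exact->inexact 1/3)  ; → 0.3333333333333333

;; Complex numbers with exact or inexact parts.
(sqrt -4)                   ; → 0+2i (exact parts)
(make-rectangular 1.0 2.0)  ; → 1.0+2.0i
```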

The math library (which is part of every standard Racket installation) offers many more goodies: multidimensional arrays, linear algebra, Fourier transforms, special functions, probability distributions, statistics, etc. The plot library, also in the standard Racket installation, adds one of the nicest collections of plotting and visualization routines that I have seen in any language. If you use DrRacket, you can even rotate 3D scenes interactively, a feature that I found quite useful when I used (abused?) plots for molecular visualization.
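A small sketch of both libraries (the file name `sine.png` is arbitrary; `plot/no-gui` renders without a display):

```racket
#lang racket
(require math/array plot/no-gui)

;; math library: a 2x2 array literal and a reduction over it.
(define a (array #[#[1 2] #[3 4]]))
(array-ref a #(1 0))  ; → 3
(array-all-sum a)     ; → 10

;; plot library: write a sine curve to a PNG file.
(plot-file (function sin (- pi) pi #:label "sin x")
           "sine.png")
```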

Outside of the Racket distribution, the only library I could find for scientific applications is Doug Williams’ “science collection”, which predates the Racket math library. It looks quite good as well, but I haven’t yet found an occasion to use it.

Could I do my current day-to-day computations with Racket? A better way to put it is, how much support code would I have to write that is readily available for more mature scientific languages such as Python? What I miss most is access to my data in HDF5 and netCDF formats. And the domain-specific code for molecular simulation, i.e. the equivalent of my own Molecular Modeling Toolkit. Porting the latter to Racket would be doable (I wrote it myself, so I am familiar with all the algorithms and its pitfalls), and would in fact be an opportunity to improve many details. But interfacing HDF5 or netCDF sounds like a lot of work with no intrinsic interest, at least to me.

The community

Racket has an apparently small but active, competent, and friendly community. I say “apparently” because all I have to base my judgement on is the Racket users mailing list. Given Racket’s academic and teaching background, it is quite possible that there are lots of students using Racket who find sufficient support locally that they never manifest themselves on the mailing list. Asking a question on the mailing list almost certainly leads to a competent answer, sometimes from one of the core developers, many of whom are very present. There are clearly many Racket beginners (and also programming newbies) on the list, but compared to other programming language users’ lists, there are very few naive questions and comments. It seems like people who get into Racket are serious about programming and are aware that problems they encounter are most probably due to their lack of experience rather than caused by bugs or bad design in Racket.

I also noticed that the Racket community is mostly localized in North America, judging from the peak posting times on the mailing list. This looks strange in today’s Internet-dominated world, but perhaps real-life ties still matter more than we think.

Even though the Racket community looks small compared to other languages I have used, it is big and healthy enough to ensure its existence for many years to come. Racket is not the kind of experimental language that is likely to disappear when its inventor moves on to the next project.

Conclusion

Overall I am quite happy with Racket as a development language, though I have to add that I haven’t used it for anything mission-critical yet. I plan to continue improving and completing my Racket implementation of Mosaic, and move it to Typed Racket as much as possible. But I am not ready to abandon Python as my workhorse for computational science; there are simply too many good libraries in the scientific Python ecosystem that are important for working efficiently.


21 Comments on “Exploring Racket”

  1. khinsen Says:

There’s an interesting follow-up discussion on the Racket users mailing list: http://lists.racket-lang.org/users/archive/2014-May/062521.html

  2. feeley Says:

    If you are interested in interfacing Scheme and Python, one approach is to use the “universal backend” of the Gambit Scheme compiler. The universal backend can generate Python code (as well as JavaScript, Ruby and even PHP) from Scheme source code. It then becomes easy to call Python libraries or existing code from Scheme code.

    If performance of numerical code is critical, then you could also try Gambit’s C backend which translates Scheme code to efficient C code (particularly for floating point computations because the compiler can keep the numbers in an unboxed state). The C backend has the nice feature that it generates portable C code, so after the C code is generated, it can be distributed and compiled on different platforms (the code is independent of the operating system, C compiler, machine word width, etc).

    • khinsen Says:

      I hadn’t heard about the universal backend before. That looks quite interesting. However, what made me look at Racket are the features that go beyond standard Scheme, so I don’t think I’ll find them in Gambit or elsewhere.

  3. M. Samir Says:

    What about clojure? is it in your future plans for testing?

    • khinsen Says:

I have explored and used Clojure in the past (I wrote the tools.macro and algo.monads libraries in Clojure Contrib), and I still use it for specific problems. It’s a nice language overall, whose major advantage and major disadvantage is its close link to the JVM. Compared to Racket, it wins in simplicity but loses in low-level support (Racket goes closer to the metal) and macro programming (Clojure macros are roughly the same as Common Lisp macros). For my specific interest in DSL development, I think Racket is a better bet.


      • Hi Konrad, curious about what you need that is “closer to the metal” in Clojure? People have been doing GPU stuff quite happily, and we have BLAS matrix linear algebra via Clatrix etc. In addition core.matrix / related ecosystem keeps getting better :-)

      • khinsen Says:

With Clojure, getting closer to the metal always involves the JNI, and that usually means that data gets copied at the interface. It also means that much of the JVM’s portability advantage is lost, but that’s true for any approach that involves C or C++ code, in any dynamic language.

  4. dsblank Says:

    You might be interested in our project “Calico Scheme”… it is a Scheme written in Scheme, and then converted to Python as the implementation language. One nice aspect of this language is that you can use Python libraries, and they appear as Scheme natives. It supports real Scheme semantics (call/cc, no stack limitations, proper tail call handling, etc). We haven’t developed it fully, but could be useful.

    See:

    http://calicoproject.org/Calico_Scheme and
    http://calicoproject.org/Calico_Scheme#Use_in_Python

    • khinsen Says:

      Thanks for the pointer, that looks like an interesting project. It’s amazing how many Lisp-like languages compiling to Python have been developed. There are at least three others: Hy (http://docs.hylang.org/en/latest/) is best described as a Lisp syntax for Python, Clojure-Py (https://github.com/halgari/clojure-py) is a port of Clojure to the Python platform (apparently abandoned), and Shen (http://www.shenlanguage.org/) is a multi-platform language for which Python is one of several supported platforms.

      What looks particularly interesting about Calico Scheme is the implementation process. It could be re-used for developing other languages.

      • dsblank Says:

You pretty much nailed the pros and cons! And you are correct that our process could generate any language. In this manner it is similar to Racket (and their Pyret) and also PyPy (Python in Python)… although we don’t have a JIT.

      • khinsen Says:

        I had a closer look at Calico Scheme, or more precisely its Python implementation. It’s surprising how small it is – just a 330 KB Python source code file. On the other hand, it seems to be an interpreter for an abstract register machine, rather than a compiler to Python bytecode (the approach taken by both Hy and Clojure-Py), so I don’t expect much in terms of performance. One interesting option would be to have another implementation that creates and executes Python extension modules. Developers could then use the interpreter for development and compile to a fully compatible but much faster module in the end. Yes, I know that this is a lot of work.

        What’s not quite clear to me is the role (and thus the long-term prospects) of the Python implementation of Calico Scheme in the Calico framework, which seems to be centered around a CLR-based multi-language development environment. Why have a Python implementation of just one of the languages supported by the Calico environment?

      • dsblank Says:

Yes, you are correct… we use Python as the register machine. The speed is what you might expect… some parts are as fast as Python, but function calls have a bit of overhead to deal with the continuations. It is just a bit slower than the C# Calico Scheme version (which is also not that fast compared to other Schemes).

        Calico is an experiment. We only wrote our own Scheme because we wanted one in the C# world because we were focusing on the sharing of libraries and values between languages.

The Python implementation of Calico Scheme was just an interesting by-product of the process. Although, we do use the process to demonstrate programming language design (à la Essentials of Programming Languages, by Friedman).

        We are also really interested in seeing Scheme remain a viable language in CS education. It may well be that Calico Scheme in Python gets a life of its own, especially if we make an IPython kernel for it. That might put it into the numpy/matplotlib/scipy stack as a useful variation.

        In any event, we will continue to explore, and play, to see what is fun and useful in CS. If you have anything that you would like to suggest or add, we would be glad to have your input!

        Thanks for taking the time to look at the system, and to make comments!


  5. As you have also tried Julia (https://khinsen.wordpress.com/2012/04/04/julia-a-new-language-for-scientific-computing/) I would like to know your opinion on how these two languages compare. Not in terms of syntax, but in more subtle issues such as their communities, their development pace, their scientific packages, etc. You know, the kind of issues you cannot see just after following a tutorial, but that you start glimpsing after several hours of real use of the language.


    • Racket is developed by a community of computer science researchers working mainly in programming language theory. As a result, Racket supports almost every approach ever invented to program computers, and it facilitates designing new languages. In addition, Racket has a lot of support for teaching programming.

      Scientific computing is just one out of many application domains that Racket is used for, and certainly not the major one. But Racket inherits the tradition of Scheme, which, coming out of MIT, has always had good support for numerical work. On top of that traditional support, Racket has a small but excellent maths library. Application-specific scientific libraries are extremely rare. If you look at Racket as a potential tool for doing scientific computing, you will conclude that there is probably a lot of code left to write, no matter what your application is, but that there is a good infrastructure to build on.

      There is also very interesting support for presenting and publishing (the libraries are called “slideshow” and “scribble”), with the possibility to integrate any computation you want. Integrating a simulation plus visualization into a slideshow is pretty straightforward, and I know of no other platform with such features. I actually use Racket a lot for preparing diagrams for publications. It offers a programming approach to doing diagrams much like what TeX does for text documents.

      The Racket community is very helpful and has an overall focus on program correctness that is a welcome change from what I am used to in scientific computing. But their expertise is more in compiler writing than in solving differential equations.

      Julia is almost at the opposite end of the spectrum in being designed for scientific computing, with almost everything else as an afterthought. With Julia, you can be productive rapidly for simple scientific projects, and if you find application-specific libraries, also for less simple ones. The limits become apparent when you want to do something outside of the tradition of scientific computing. The community is very similar to the one around NumPy and SciPy, only smaller.

      In fact, I see many Pythoneers who take a closer look at Julia looking for an overall similar environment but with better performance. Unfortunately this comes at the price of some loss of expressiveness due to the Matlab heritage, which was the subject of my blog post.

      Personally, I consider that the main problems in today’s scientific computing are reliability and stability, not performance. We rarely have good reasons to trust the results of our programs, and we often cannot reproduce them a few years later because the languages and libraries change all the time (in an incompatible way). I don’t see any sign of the Julia community addressing these issues, which is why my interest in Julia remains limited to “curious observer”.


Thanks for your insight into the subject. I also think that one of the dangers of Julia is being too focused on scientific computation. When people advertise Python as a language for science, they (we) remark that it can be used for many other tasks, compared to Matlab or R. That creates a big community of developers, lots of resources for learning, etc. Can Julia compete with that? Or will it become the Fortran of the 2000s? (A long flame war could follow this comment, but my opinion is that Fortran has been losing momentum since the ’80s; the point is that at that time it had a huge momentum…) I have similar concerns for languages such as Chapel. Will it follow the path of Fortress?

With that in mind, Racket can be a better alternative.


      • Speculating on the future is always fun ;-)

        I see a split happening between “scientific computing” and “high-performance computing” as an important subdomain. Fortran survives in the HPC branch, competing with C/C++ and being challenged by newcomers such as Chapel. HPC receives enough money to make continued development of Fortran tools interesting, so I don’t expect Fortran to disappear. HPC is also a rather conservative field, due to industry implications and large investments in both hardware and software, so newcomers have a hard time. I wouldn’t be surprised to see Chapel follow the path of Fortress, no matter what its technical merits are.

        Julia, like Python/NumPy/SciPy, is aimed at the non-HPC sector, where interactivity and fast turn-around times matter at least as much as performance. But that sector also appreciates integration with non-scientific code. It makes a lot of sense to integrate visualization with a Web server, for example. I’d expect Julia to be more popular with people coming from Matlab, who have never had much access to non-scientific libraries. Python can more easily defend its position as a general-purpose language with good support for science.

        The real lesson that the scientific computing community should learn from all this is that we have to find a way to live with change. Computing is still a young field, and all of today’s technologies are likely to be replaced by better ones continuously for quite some time to come. How can we preserve existing software, both for reproducibility and for future re-use, while at the same time allowing languages and libraries to improve? This is not an easy problem to solve, but I don’t even see many people recognizing it as a problem.

        One ingredient to a solution is a multi-level approach to the design of a scientific computing infrastructure. If we could ensure interoperability between languages, we could let them evolve more freely without breaking everything. The JVM and CLI/.NET universes have demonstrated that this approach works. We’d need a similar platform for scientific computing, with built-in efficient array support and more programmer control over memory management.

        Another ingredient, which I am working on, is to separate science and technology in scientific software. We want scientific models to evolve on the slower time scale of scientific progress while allowing software to evolve on the faster time scale of computing technology. This can work only if the scientific models have an electronic existence of their own, as data items that are processed by software tools.


  6. When using Racket, instead of Python/numpy, for scientific computing, do you miss the infix notation?


    • No. But this is clearly a matter of habit. I have been using languages of the Lisp family for quite a while. The syntax is strange at first, but (1) you get used to it and (2) after a while the advantages become apparent. In particular the possibility to introduce one’s own syntax elements through macro definitions.

      Note also that Racket is ultimately a platform for designing languages. Nothing stops you from implementing a language with infix notation. There is even a very complete Python implementation for the Racket platform (https://github.com/pedropramos/PyonR).


Thanks! I was asking this question because there are a lot of debates in the imperative-languages community about the pros and cons of operator overloading. Some languages like C/Java/JavaScript/Go forbid it. Other languages like C++/Python/Rust/Scala embrace it. Considering you’re versed in scientific computing, and that the main application of operator overloading is related to using an infix notation for numeric expressions (like vectors and matrices), I think this is an interesting data point that you don’t find it absolutely necessary.

        The implementation of Python in Racket looks impressive!!!


Operator overloading has to be judged for each language separately. In C++, it’s useful because it’s the only way to use the same syntax for plain data types (numbers, …) and objects. In Python, it’s inevitable because operators are just syntactic sugar for method calls.

        Don’t get me wrong: I think infix notation is important in scientific computing because it is so familiar. One can get used to something else (I did), but it’s an entry barrier that may well scare away many potential users.


      • Interesting. Thanks Konrad!

