Recently, Sam Lindley, Garrett Morris, Sara Decova and I had our paper, Exceptional Asynchronous Session Types: Session Types without Tiers accepted to POPL 2019. For the first time, I submitted an artifact to the Artifact Evaluation process. I was really impressed with the whole process, which I found to be a positive and valuable experience. If any of the POPL’19 AEC members are reading this – thank you!

While I’ve put various papers through the review process and have a reasonable grip on what’s going on, I’d never been an artifact author before, so had no idea what to expect. Thankfully, I had colleagues to talk to, including Jack Williams who has had artifacts accepted twice at ECOOP’17 and OOPSLA’18, Rudi Horn who had his artifact accepted for ICFP’18, Wen Kokke who served on the ICFP’17 and ICFP’18 AECs, and Michel Steuwer who acted as CGO and will act as LCTES AEC chair.

Even with such expertise available, I still didn’t really know what to expect from the process. Although artifact evaluation is becoming more and more standard for PL conferences, I expect there are a few people who are in the same boat. So in this post, I’ll detail my experience from the perspective of a first-time artifact author, discussing the artifact, the process, the reviews, and what I’d do differently next time.

Before we begin though, a word of caution: this post is probably quite specific to POPL/ICFP/OOPSLA/PLDI/ECOOP, so check with your community rather than relying on this post!

Artifact Evaluation?

The primary output of academic research is peer-reviewed publications. However, while a paper describes the contributions of the research and places it in the proper context, it is often supported by other artifacts: for example, mechanised proofs, software, or data sets.

While these artifacts are important, they have traditionally not been scrutinised in the same way as the paper submission itself. Thus, artifact evaluation processes have been introduced to assess artifacts. Upon (conditional) acceptance of a paper, authors are given the option to submit an artifact for review. Should an artifact be successfully evaluated, the authors are entitled to display a badge on the paper.

Much more information is available on the official artifact evaluation website, and more recently the ACM has introduced their own guidelines.

Our Artifact

The paper

To properly put our artifact into context, it’s worth giving a quick bit of background and a synopsis of the paper.

Session types are a typing discipline which allow lightweight verification of communication-centric programs. In the setting we were interested in, session types are the types of a channel endpoint. Channels are a communication mechanism between shared between two processes, allowing the processes to send or receive messages.

Types for channels have been used to good effect in languages such as Concurrent ML or Go. As an example, in Go, a channel could have type chan int, meaning a channel where it’s possible to send or receive integers.

Session types add more structure to channel types, making it possible to encode communication protocols. As an example, it’s possible to write a session type for the client implementing a stripped-down version of the SMTP protocol as follows:

SMTPClient = ⊕ {
  EHLO: !Domain.!FromAddress.!ToAddress.!Message.SMTPClient
  QUIT: end
}

The SMTPClient type describes a channel endpoint where the programmer can either choose () the EHLO or QUIT branch. If EHLO is selected, the programmer sends (!) the domain, sender address, receiver address, and message to the server, before repeating. If QUIT is selected, then the protocol is finished. You can find out more about session types in my previous post.

To implement the SMTP client we write the following:

sig smtpClient : (SMTPClient, FromAddress, ToAddress, Message) ~> ()
fun smtpClient(c, from, to, msg) {
  var c1 = select EHLO c;
  var c2 = send(getDomain(from), c1);
  var c3 = send(from, c2);
  var c4 = send(to, c3);
  var c5 = send(message, c4)
  var c6 = select QUIT c5;
  close(c6)
}

The smtpClient function takes a channel with session type SMTPClient, along with the various parameters as its arguments. Sending a value results in a new channel endpoint being bound upon each send: for example, c2 has session type !Domain.!FromAddress.!ToAddress.!Message.SMTPClient and c3 has session type !FromAddress.!ToAddress.!Message.SMTPClient. A receive would return a pair of the received value and an endpoint.

The key technical issue when implementing session typing systems is to ensure that we can’t use, say, c2 to send the domain more than once, as this would break the guarantees session types give us. Additionally, we want to ensure that the whole protocol is followed: to do this, each endpoint must be used linearly – that is, exactly once.

Linearity is too restrictive for realistic applications, especially distributed ones. Consider the following implementation of an SMTP client:

sig smtpClient : (SMTPClient, FromAddress, ToAddress, Message) ~> ()
fun smtpClient(c, from, to, msg) {
  var c1 = select EHLO c;
  var c2 = send(getDomain(from), c1);
  var c3 = send(from, c2);
  var c4 = send(to, c3);
  var c5 = send(message, c4)
  var c6 = select QUIT c5;
  close(c6)
}

If getDomain fails (for example, if the FromAddress is incorrect), we have nothing sensible to send along the channel. Yet, we still need to use the channel up! The contribution of the paper is to integrate session types with exception handling, and as a result we provide the first application of session types to web programming. We can therefore write:

sig smtpClient : (SMTPClient, FromAddress, ToAddress, Message) ~> ()
fun smtpClient(c, from, to, msg) {
  var c1 = select EHLO c;
  try {
    getDomain(from)
  } as (domain) in {
    var c2 = send(domain, c1);
    var c3 = send(from, c2);
    var c4 = send(to, c3);
    var c5 = send(message, c4)
    var c6 = select QUIT c5;
    close(c6)
  } otherwise {
    cancel(c1);
  }
}

which completes the protocol if getDomain succeeds, but explicitly throws away the endpoint if it fails. Instead of blocking forever, an exception is raised on the server whenever it tries to read from the cancelled channel.

Similarly, the approach can be used to handle the case where a client disconnects mid-session.

This approach is inspired by that of Mostrous & Vasconcelos; we adapt their work to the functional setting, relax restrictions inherent in the work, and show that it can be implemented in practice.

The paper describes the formalisation, metatheory, and implementation of this approach. The paper also showcases a larger case study of a session-typed chat application.

What is our artifact?

While the paper contains a substantial chunk of metatheory, it’s currently pen-and-paper: to my shame I’m not really proficient in a proof assistant yet (planning on having a go at PLFA over the holidays!), and mechanising linear type systems still appears to be an open problem. Thus, we don’t claim a mechanisation of the metatheory.

That said though, a substantial contribution of the paper is showing that our theoretical technique can be implemented in practice, which we did by extending the Links programming language, along with examples.

My own expectations

The AEC reads the paper, and then judges the artifact on the expectations set by the paper. In packaging the artifact, I therefore needed to put myself in the shoes of the reviewers and think about my own expectations.

For our paper, these were my expectations:

  • That Links could be installed and run easily, and was packaged in such a way that would make evaluation quick and easy
  • That the examples from the paper written in the core calculus (EGV), namely the two-factor authentication example and the three minimal examples in Figure 2, worked as intended
  • That the exception handling functionality worked correctly on the interpreter
  • That the exception handling functionality worked correctly on web pages served to the client
  • That raising exceptions worked when crossing the client-server boundary: namely, cancelling an endpoint on the client where the server was blocked receiving resulted on an exception being thrown on the server, and vice-versa
  • That the larger application (that of a web chat application) ran successfully
  • That the typing rules for the three new constructs: cancel, try-as-in-otherwise, and raise were implemented successfully
  • That the source code was available for inspection to show how the features were implemented

The paper builds on previous work (namely Lightweight Functional Session Types by Sam Lindley and Garrett Morris), which brings some expectations in transitively:

  • The session typing system accepts well-typed protocols
  • The session typing system rejects protocols that are ill-typed due to:
    • A communication mismatch (sending a value of an incorrect type along a channel)
    • An incomplete protocol implementation
    • Re-use of a linear channel endpoint

Artifact structure

I chose to use Docker to make the artifact easy to run. The alternative would have been to have prepared a VM with everything pre-installed, but I didn’t want to do this because of my own frustration of having to download 4-5GB VDI files just to look at the source code of a library. That said, as we will see later on, maybe this wasn’t the best idea.

The README file, as submitted, can be found here. The whole artifact, as submitted, can be found here.

The structure of the artifact can be found below:

 .
 ├── cleanup.sh
 ├── custom-examples
 │   └── helloworld.links
 ├── examples -> links-docker/examples
 ├── links -> links-docker/links
 ├── links-docker
 │   ├── config
 │   ├── Dockerfile
 │   ├── examples
 │   │   ├── caught-errors
 │   │   │   ├── communication-mismatch.links
 │   │   │   ├── erroneous-discard.links
 │   │   │   └── linearity-violation.links
 │   │   ├── chatserver
 │   │   │   ├── chatClient.links
 │   │   │   ├── chatServer.links
 │   │   │   ├── chatSessions.links
 │   │   │   ├── css
 │   │   │   └── linearList.links
 │   │   ├── distributed-exceptions
 │   │   │   ├── clientOnServer.links
 │   │   │   └── serverOnClient.links
 │   │   ├── paper-examples
 │   │   │   ├── cancellation.links
 │   │   │   ├── closures.links
 │   │   │   └── delegation.links
 │   │   └── two-factor
 │   │       ├── css
 │   │       └── twoFactor.links
 │   ├── links
 │   ├── opam-repository
 │   ├── run-chatserver.sh
 │   ├── run-example.py
 │   └── run-two-factor.sh
 ├── prepare.sh
 ├── README.md
 ├── run-chatserver.sh
 ├── run-custom.sh
 ├── run-example.sh
 ├── run-interactive.sh
 ├── run-shell.sh
 ├── run-two-factor.sh
 ├── run-unit-tests.sh
 └── unit-tests -> links/tests/session-exceptions

The idea was for the reviewers to set everything up by running prepare.sh, and then use the run-* scripts to interact with the image.

The above directory tree omits the contents links-docker/links directory, which contains the Links source code.

The caught-errors directory contains the basic examples of session types catching errors. The chatserver example contains the chat server example described in the paper. The distributed-exceptions directory showcases two small examples showcasing exceptions in the distributed setting. The paper-examples directory contains the 3 small examples from the introduction of the paper, and finally the two-factor directory contains a web-based implementation of the two-factor authentication example used throughout the paper.

The run-custom.sh script allows reviewers to run their own examples, contained in the custom-examples directory.

The process

Initial submission

Submission was done via a fresh HotCRP instance (http://popl19aec.hotcrp.com). There was no pre-registration of accepted papers, and our submission consisted of our accepted draft (I just plucked the submitted PDF from the main HotCRP site) and a Google Drive link to the artifact archive.

I believe the organisers put a 1 hour grace period in place to allow for last-minute problems with submissions (but I wouldn’t count on this!).

The first review

The artifact reviewing period, according to the publicly-available information for committee members, was between the 18th October and the 1st November. In principle, this means authors need to be available to answer questions between these dates. Unlike the “classic” model of paper submission—uploading the paper, submitting a response, and finding out the results on three dates—the AEC process is designed to be more interactive to sort out any snags along the way. A (slightly strange) consequence of this is that reviews are available to authors as they are submitted.

We received our first review on the 30th October:

Review #26A
===========================================================================
* Updated: 30 Oct 2018 6:43:20pm EDT
   
Overall merit
-------------
4. Accept
   
Reviewer expertise
------------------
3. Knowledgeable
   
Artifact summary
----------------
This artifact is an OCaml implementation of session types supporting asynchronous communication with exceptions and distribution, on top of the Links functional programming language. The language extension is accompanied by two non-trivial applications, a chat server and a two-factor authentication, as well as several smaller examples demonstrating type checking and language capabilities.
   
General assesment
-----------------
The introduction of exception handling to asynchronous session types is a welcome one, and the implementation arguably brings session types closer to practical application (in, say, web application development) than earlier work. The evaluation workflow suggested by the authors worked well and served as good guide to the highlights of the artifact. The application and example implementations appear well aligned with those presented in the paper.
   
While the small examples and two key applications from the paper function as intended, I am concerned about the gap between the EGV formal calculus and its implementation on top of Links. What are the possible sources of discrepancies that could cause violations of the metatheory properties? As far as I can see, this is not discussed in the paper or the artifact documentation, and the number of unit tests provided with the artifact are quite few. For example, one source could be the semantics or implementation of WebSockets.
   
Suggestions for improvement
---------------------------
The authors could better explain the meaning and location of the `raiseExn` variable for the two-factor authentication example in their artifact documentation. I had to search through and read the source code carefully to figure out what this variable and the `checkDetails` function was about.

And there was my first glimpse of an artifact review! Needless to say, I was delighted that the artifact worked as intended and that the reviewer thought that the artifact matched the expectations set by the paper.

The main sticking points was the gap between the calculus in the paper, and its implementation. The implementation is rather more involved than the fairly small calculus described in the paper, so this is a valid concern, and the suggestion to more concretely tie the two together was a good one.

Disaster!

However, we then received a comment on the submission site:

I cannot build the artifact using Docker 18.06.1-ce-mac73 (26764) running on macOS 10.13.6.
   
I get the following warning:
   
[WARNING] Failed checks on links package definition from source at file:///home/opam/links:
           warning 41: Some packages are mentioned in package scripts or features, but there is no dependency or depopt toward them: "links"
             error 57: Synopsis and description must not be both empty
And then, after gathering sources, the following error:
   
[ERROR] The sources of the following couldn't be obtained, aborting:
          - ssl.0.5.6: curl error code 404
   
[NOTE] Pinning command successful, but your installed packages may be out of sync.
I tried building the artifact directly too. There I get the following error:
   
### stderr ###
# Too many opam files for package "links-postgresql":
# - links-docker/links/links-postgresql.opam
# - links/links-postgresql.opam
Any advice on how to build on macOS?

I was pretty perplexed about this, but granted, I hadn’t checked on MacOS. That said, MacOS turned out to be a red herring. What had actually happened was that Links depends on the ssl OCaml package. The way the opam package manager works is by indexing URLs and hashes for different versions of packages. While my image contained a snapshot of the opam index, the maintainer of the ssl package had since changed the location of the released bundle, which was causing the 404.

The fix was simple: I just updated the snapshot to use the latest version of the package. At this point, HotCRP’s commenting functionality came in handy, and I posted the following response:

Comment @A2 by Simon Fowler <simon.fowler@ed.ac.uk>
---------------------------------------------------------------------------
@A1:
Thanks for trying the artifact, and I'm sorry that you couldn't get it working first time. After investigating, I found that the issue isn't Mac-specific: it arises because the maintainer of the `ssl` package has re-released version 0.5.6 of the library under a different URL. Changing the URL of a released package is against the assumptions of the `opam` OCaml package manager, and package maintainers shouldn't do it (see https://opam.ocaml.org/doc/Manual.html, which states that a package URL in the official repository "should always point to a stable archive over HTTP or FTP.").

I have fixed this by updating the `opam-repository` snapshot in the Docker image, so it pulls in the latest version of the `ssl` library.
You can find the revised artifact here: https://drive.google.com/open?id=1SXOnw6ptxlkYQ_iuS2X5SFxXsL8w0ynU

I have additionally opened an issue on the `ssl` library, informing the maintainers about this problem: https://github.com/savonet/ocaml-ssl/issues/46

Nevertheless, I appreciate that this hiccup highlights a possible reusability hazard, and while it's likely too late to properly address this for the present artifact submission, I will liaise with the OCaml `opam` community on best practice to avoid this type of issue in the future. I imagine that the upcoming `duniverse` tool (https://github.com/avsm/duniverse) will help in this regard.

In spite of the fact that the problem isn't Mac-related, I tried the revised artifact on a Mac to make sure that there weren't any Mac-specific errors you may run into. I can verify that I managed to run the revised artifact on MacOS 10.13.6.

@Reviewer A:

Thank you for your careful review -- it's much appreciated!

We appreciate your comments. The main potential problems with WebSockets violating the semantics would be if WebSockets did not guarantee in-order delivery of messages, or if they did not reliably raise an event upon session close.  Thankfully, both of these properties are mandated in the WebSocket RFC (see https://tools.ietf.org/html/rfc6455), in particular since WebSockets are layered over TCP.

We will explicitly mention that we assume the OCaml WebSocket implementation correctly implements the WebSocket RFC, and clarify `raiseExn`, in the README of the final artifact.

And it seemed to do the trick:

Comment @A3 by Reviewer B
---------------------------------------------------------------------------
Thanks for the updated image - the build now works on my machine.

The second review

Shortly thereafter, we received our second review:

Review #26B
===========================================================================
* Updated: 1 Nov 2018 12:19:19pm EDT

Overall merit
-------------
4. Accept

Reviewer expertise
------------------
3. Knowledgeable

Artifact summary
----------------
The integration of session types and exceptions is useful because communicating programs commonly rely on I/O which may err.
The challenge is to guarantee that a program still adheres to its session type contract even if exceptions are raised.

The paper presents a progression of calculi that provide such guarantees, by integrating asynchronous session typed communication with exceptions and exception handlers with support for explicit (and implicit) cancellation of open communication channels.

The artifact contains an implementation of the Links programming language which has been extended to support primitives corresponding to the calculi in the paper.
Links is a functional language for the web.
It is implemented in OCaml.
The implementation consists of three aspects:

1. Compilation of client code to JavaScript.
2. An interpreter implemented in OCaml for server code.
3. Compilation of database queries to SQL.

As I understand, the main purpose of the artifact is to illustrate that the supported primitives are useful for practical applications.
To this end, the artifact contains various example applications, their source code, and the source code of Links.

General assesment
-----------------
My expectations of the artifact are mainly based on section 5 and section 6 of the paper.
Section 5 gives a high-level overview of the Links language, focusing on
process representation and concurrency on client and the server,
session type communication via WebSockets, and the implementation of
exception handling on the client and server.
Section 6 describes the web chat application which is one of the main example
applications in the artifact.

I also expected the primitives implemented in Links to correspond to the primitives defined in the calculi in the paper.

By and large, the artifact meets these expectations.

#### Strengths

+ The subset of Links that implements the primitives described in the paper are well documented at a high level of abstraction, even if understanding the details requires a deeper dive into the source code of Links which is less well documented.

+ The implementation of the primitives and the example applications match the expectations given in section 5 and 6 of the paper. There is no `close` primitive, since Links supports automatically closing channels when a session typed protocol ends. Seems convenient and an improvement over the expectations I had based on the paper.

+ The benchmark applications are interesting and seem polished.

+ The error messages given by the type checker seem useful.

#### Weaknesses

- It's not clear how the type system of Links relates to the type systems in the paper. For example, the type systems described in the paper clearly distinguish linear and unrestricted types. It seems that Links makes this distinction for session typed channels. Whether it makes the distinction for other types is unclear.

- Related to the point above: there's an interesting remark in the `links/core/desugarSessionExceptions.ml` file about the T-Try rule from fig. 4 in the paper, essentially stating that this rule is unsound as-is, but that it's possible to recover the expressiveness of the rule by a desugaring. Does the paper proof of safety rely on this desugaring, or is the necessity of this desugaring specific to Links?

- The encoding of processes and distributed communication in JavaScript sounds intricate, and the unit test suite seems rather small. I'd be surprised if the unit test suite and handful of example programs are close to exhaustive when it comes to validating that the encoding works as intended. However, I haven't found any bugs (but I also haven't tried very hard to see if I could expose any).

- The provided Docker image requires fetching sources from online URLs that (evidently) may cease to exist. It would be more future-proof to provide a packaged VM, or some other way that does not rely on external URLs.

Suggestions for improvement
---------------------------
I think it would have been helpful to spend a few sentences in the README (or in the paper) on how the type system of Links corresponds to the type systems covered in the paper.

It would also be helpful to point to the `links/doc/quick-help.pod` file which I found useful for reading Links code (e.g., why function arrows are written `~>`).

Consider packaging your artifact as a VM or some other format that does not rely on external URLs.

More documentation in the rest of the Links code base wouldn't hurt, and perhaps some more thorough documentation of the Links language itself, although I realize that this is perhaps a little beyond the scope of this paper.

Future work suggestion: the compilation to JavaScript sounds complicated -- have you considered WebAssembly?

Another future work suggestion (I think?): the current model of distributed communication relies on passing all messages through a centralized server which seems undesirable for some applications.
Does Links/WebSockets also support client-to-client communication?

Again, it seemed to go down well: the reviewer understood and managed to run the artifact, and provided useful suggestions and comments.

Two comments stood out as particularly useful clarifications to make:

It's not clear how the type system of Links relates to the type systems in the paper. For example, the type systems described in the paper clearly distinguish linear and unrestricted types. It seems that Links makes this distinction for session typed channels. Whether it makes the distinction for other types is unclear.

This is a really good point, and deserving of clarification in the README. The EGV calculus is a small formal model designed to facilitate mathematical reasoning, but in fact the implementation is based on the larger and more full-featured FST calculus described by Lindley and Morris.

Related to the point above: there's an interesting remark in the `links/core/desugarSessionExceptions.ml` file about the T-Try rule from fig. 4 in the paper, essentially stating that this rule is unsound as-is, but that it's possible to recover the expressiveness of the rule by a desugaring. Does the paper proof of safety rely on this desugaring, or is the necessity of this desugaring specific to Links?

We implement exception handling by translating to effect handlers. However, the translation into linear effect handlers need a more restrictive typing rule for exception handlers in order for the translation to be sound. This is only an implementation detail with no impact on the metatheory described in the paper, but it’s worth spelling out in more detail (and we have done since, in the extended version of the paper and the updated README) since it caused confusion.

The reviews were really helpful in making clarifications to the README file. To contrast with the submitted version, you can find the final README here.

Outcome

And with that, we waited a few more days, until I received the following e-mail:

Dear Simon,

Congratulations! We are pleased to inform you that the artifact of your
paper Session Types without Tiers was accepted by the POPL artifact
evaluation committee, who deemed that it met or exceeded the expectations
set by your paper. The reviewers decided to award your artifact the
“Artifact Evaluated - Reusable” badge

Of the 77 conditionally accepted papers, 43 submitted artifacts, and 42 of
those were found to meet or exceed expectations, with 29 receiving the
Reusable badge and 13 receiving the Functional badge.

The artifacts were reviewed by an outstanding committee:
https://popl19.sigplan.org/committee/popl-2019-artifact-evaluation-artifact-evaluation-committee
They put significant effort into the reviews and discussions, which often
resulted in revisiting the artifact and further discussions. Their work
demonstrated a deep interest in doing the process well. Each artifact
received two reviews.

The reviews are below. We strongly encourage you use the suggestions of the
reviewers to improve the artifact and (in some cases) the paper. Reviewers
are still finishing up summaries of our discussions; these will appear as
an author-visible comment on HotCRP shortly. The publishers will add their
badge(s) to papers after you submit the final version.

We encourage authors to make their artifacts publicly available in some
form. Authors who indicated they would make their artifact available in a
way that enables reuse are expected to release it under an open source
license, for instance on a platform like GitHub, GitLab, BitBucket,
SourceForge, etc. Moreover, authors who make a snapshot of their artifacts
available eternally on ACM DL will also get the “Artifacts Available”
badge. It is, however, up to you whether and how you want to make your
artifact available and storing a snapshot on ACM DL does not prevent you in
any way from also putting your code on GitHub or wherever you like.

Thanks again for submitting your artifact!

Best Regards,
Catalin and Ben
POPL'19 AEC Chairs
popl19aec.chairs@gmail.com

and soon thereafter a nice comment on the submission site:

Comment @A4 by Reviewer A
---------------------------------------------------------------------------
We have decided that this artifact is Reusable, thanks to a practical language implementation and several enlightening example applications. Researchers experimenting with session types or web application programming should have no trouble modifying and extending the language or developing more applications.

All’s well that ends well :)

Thoughts

I’m undoubtedly glad I submitted an artifact. I was pretty humbled by the effort put into the reviews, and found the suggestions from the reviewers useful.

I’m glad I gave Docker a try (and I did really like it!), but I do wonder whether it’s the best tool for submitting artifacts for evaluation. The nature of Docker images in running a set of install steps does seem to open the doors to unintentionally including external URLs and thus does seem to make such artifacts less future-proof than is necessary for the process. Of course, this might just be my relative inexperience with Docker. Despite my reservations of making people download a 4GB file to see what might be ~10MB of actual data, it seems to be the easiest and most reliable way to ensure that reviewers can easily run the application without relying on any external data. Maybe the best thing to do is to submit the VM image for evaluation and archiving, but also link to a GitHub snapshot of the files for easy browsing (indeed, I did end up making a GitHub snapshot available, and I plan to have a VM image available before the conference).

The AEC awarded two types of badges this year: functional, and reusable. I was happy we got awarded the reusable badge, but am still a bit confused as to the standards expected by each. I expect this will become clearer with time and experience.

The interactive HotCRP instance was invaluable – I do wonder what would have happened had we not had the chance to update the image to correct the problem.

Finally, thanks to the AEC Chairs and the AEC. The process seemed smooth and professional throughout. I imagine it was a phenomenal amount of work, but I greatly appreciated it.