EBNF tooling

2023-09-07 - 8 min read

You may find the editor at gassmann.dev/ebnf.

This semester I am going to be teaching the course "Einführung in die Programmierung" (Introduction to Programming) at ETH Zürich. One idea that has been on my mind for quite some time is to create an EBNF verifier, along with an editor that provides basic EBNF language support. Given that I've already been working with it a few years ago, I decided to use Monaco editor, Microsoft's web-based editor, which also powers Visual Studio Code.

First of all, I'd like to point out that the vast majority of the grammar simplification/normalization, as well as the CYK algorithm, has been adapted from github.com/JM4ier/parsley.

The EBNF parser mostly follows the EBNF syntax as discussed in the "Einführung in die Programmierung" course taught at ETH Zürich. Some things to be aware of include:

  • The input is evaluated against the last rule
  • Each EBNF rule occupies exactly one line
  • Comments can be created using #
  • Empty lines are skipped
  • Sequences bind stronger than selections, i.e. <r> <= [+|-] 0|2|4|6|8 is interpreted as <r> <= ([+|-] 0)|2|4|6|8 and not as <r> <= [+|-] (0|2|4|6|8). This also coincides with the rules of the lecture, see slide 10 of this document. Always set parentheses to avoid confusion.
  • You won't be able to use ε to match an empty sequence, use "" instead. The following rule matches the empty string:
    <empty> <= ""
    

Currently there is an EBNF verifier and an EBNF producer. The EBNF verifier checks whether a given input matches the EBNF rules, while the EBNF producer incrementally generates words that conform to the EBNF rules, with increasing word length.

Verifier

The verifier provides a basic editor where input to check against the rules can be entered. Some things to keep in mind:

  • Each line is evaluated against the EBNF ruleset separately
    • If a line matches the EBNF ruleset, the line is highlighted green
    • If a line does not match the EBNF ruleset, the line is highlighted red

You can try the EBNF verifier down below with the EBNF rules from the HS22 exam:

EBNF Rules:

Loading...
Loading...

Please make sure you have some valid rules first.

TODO: You should be able to compare two different EBNF descriptions for equality here. I did not have time to implement this yet :(

TODO: You should be able to enter a word and see a tree detailing why this word is matched here. I did not have time to implement this yet :(

There are also some deviations to EBNF as it is being taught at ETH Zürich. For instance, the EBNF ruleset above matches expressions like *(a + b) but not *(a+b). To faciliate things, you might consider adding a rule akin to <whitespace> <= {" "}. In general:

  • This parser is sensitve to whitespace
  • For string literals containing whitespace or that themselves contain EBNF tokens, use quotation marks

With an additional whitespace rule as suggested above, we'd have a ruleset similar to the one below (which now also matches *(a+b)):

EBNF Rules:

Loading...
Loading...

Please make sure you have some valid rules first.

TODO: You should be able to compare two different EBNF descriptions for equality here. I did not have time to implement this yet :(

TODO: You should be able to enter a word and see a tree detailing why this word is matched here. I did not have time to implement this yet :(

Producer

Now, regarding the EBNF producer:

  • Words are sorted by length first, then lexicographically
  • Currently, words are generated on the UI thread, so the performance may vary depending on your ruleset
    • I will probably use web workers for this if I find the time

Again, you can try the EBNF producer down below, just repeatedly press Produce words of length [...] until you see more words that conform to these EBNF rules:

EBNF Rules:

Loading...
Loading...

Please make sure you have some valid rules first.

TODO: You should be able to compare two different EBNF descriptions for equality here. I did not have time to implement this yet :(

TODO: You should be able to enter a word and see a tree detailing why this word is matched here. I did not have time to implement this yet :(

Keep in mind that depending on your EBNF ruleset, the number of possible words may grow exponentially as a function of word length.

Integrating basic EBNF language support into Monaco Editor has proven remarkably straightforward, especially if you already have an EBNF parser in place. Things like commenting using Ctrl + K, Ctrl + C or uncommenting using Ctrl + K, Ctrl + U also require very little configuration.

Language support extends beyond syntax errors and also encompasses some semantic errors, such as referencing a non-existing rule, as illustrated in the example below:

EBNF Rules:

Loading...
Loading...

Please make sure you have some valid rules first.

TODO: You should be able to compare two different EBNF descriptions for equality here. I did not have time to implement this yet :(

TODO: You should be able to enter a word and see a tree detailing why this word is matched here. I did not have time to implement this yet :(

All errors have squiggly lines with a description upon hovering over the offending token.

The EBNF verifier/producer view below the rule editor automatically update once the ruleset changes, i.e. the verifier reevaluates whether the input matches the EBNF rule and the producer resets itself.

The producer also detects whether a given ruleset can only produce the empty word. There are also some heuristics in place to detect whether no more words can be produced from the given EBNF ruleset.

EBNF Rules:

Loading...
Loading...

Please make sure you have some valid rules first.

TODO: You should be able to compare two different EBNF descriptions for equality here. I did not have time to implement this yet :(

TODO: You should be able to enter a word and see a tree detailing why this word is matched here. I did not have time to implement this yet :(

TODOs

You can also find the EBNF parser here outside the blog-post-context. There are still many things the editor could be extended with. Some TODOs include:

  • Dark mode (main problem here is finding the colors for syntax highlighting)
    • Dark mode on this website is still very basic in general, only the Polyring widget supports it, I'm bad at choosing colors
  • Debounce input for rules, especially for larger rules
  • Don't load monaco from an external CDN and use the webpack plugin
  • Produce words in a web worker instead of doing it on the UI thread
    • Maybe don't produce words incrementally by word length but rather by some other useful incremental unit?
    • Also show produced words with virtual scroll
  • Autocomplete using monaco's auto-complete API
  • Add an escape character to e.g. allow matching quotation marks (")
  • serve as a PWA

I do not guarantee the correctness of any results. In case, you spot any errors, feel free to send me an email.

Updates

Update 2023-09-16

You can now also use the above EBNF component as a web component:

<ebnf-editor height="100px"></ebnf-editor>
<script type="module" src="https://thomasgassmann.com/widgets/ebnf.js"></script>

The height property is the only property that is currently exposed. It is required. The editor is going to take up the full width of its parent element.

Update 2023-09-20
  • Fixed some issues with the parser mistakenly accepting incomplete choice selections.
  • Disallow creating empty rules (matched the empty string before)
Update 2023-09-22
  • Added fullscreen mode
  • Moved share button to toolbar
  • Show credits for parsely and docs in toolbar in all cases (not just when referenced via widget)
  • Added fullscreen page
Update 2023-09-25
  • Added a neat easter egg for late-night EBNF enjoyers
Update 2023-09-27
  • Added full page mode support to EBNF widget. Use the fill property:
    <ebnf-editor height="100px" fill="true"></ebnf-editor>
    <script type="module" src="https://thomasgassmann.com/widgets/ebnf.js"></script>
    
  • Fixed an issue with CRLF / LF line breaks
  • Added Debug button to print output of parser to console
  • Clarify binding preference in documentation above (sequences bind stronger than selections)
Update 2023-09-30
  • Properly documented how to match empty sequences and emphasized that ε is not a keyword to do so within this editor
Update 2023-10-01
  • Do not show Debug button by default. Allow users to optionally enable it on the debug page.