Big Changes to the Oil Language

blog | oilshell.org

2020-10-31

I recently released Oil 0.8.3, and it’s the biggest release in
recent memory! What’s new?

  • Many changes to the Oil expression language, including
    Python compatibility and legacy syntax removal.
  • Changes to the word syntax, e.g. the @ sigil.
  • Keyword changes, like the removal of pass.
  • New functions, like _match() to access eggex matches.
  • Enhancements to shell builtins, including QSN
    support in read and write.
  • The comprehensive errexit overhaul, mentioned in the last
    post.
  • Many new docs

This is the first of two posts that describe the language changes. Separately,
I plan to write “the ultimate guide” to error handling in shell.

If you’re not familiar with Oil, see the new Language Influences and
Oil Language Idioms docs, as well as posts tagged #oil-language.

Help Wanted

If you’re interested in Oil, now is a great time to get involved. Recall that
the last post said that OSH would have four significant
fixes, but the rest of the project was too much work. The work described
here is what I need help with!

Toward that end, I recently updated these pages:

Asking questions and leaving feedback about the language on
Zulip is also appreciated! Several people have influenced the
language design this way.

Operators (Expression Mode)

The expression language lets you talk about typed data with operators and
literals. Let’s review those changes first.

Return to Python Compatibility

Last year, Oil had some “cleanups” of the Python expression language, but I
decided that the unfamiliarity isn’t worth it. I reverted them, so:

  • Integer division div is back to //
  • Modulus mod is back to %
  • xor is back to ^
  • Exponentiation ^ is back to **

(The appendix has some rationale for this.)
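As a quick reminder of what the four restored spellings mean, here they are in plain Python. This is Python rather than Oil, shown on the assumption that Oil's expression mode follows the same meanings; the variable names are only for illustration.

```python
# Plain Python, shown only to recall what the restored operator spellings mean.
# That Oil matches Python on every edge case (such as negative operands) is an
# assumption, not something this post states.
floor_div = 7 // 2   # integer division
remainder = 7 % 2    # modulus
bit_xor = 6 ^ 3      # bitwise xor: 0b110 ^ 0b011 == 0b101
power = 2 ** 10      # exponentiation

print(floor_div, remainder, bit_xor, power)
```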

++ to concatenate, ~~ and !~~ to match globs

The ++ operator is for string and list concatenation. That is, a + b
always does math, and a ++ b always does concatenation.

This is to support Awk-like auto-type conversion. Similarly, comparison
operators like < and <= will only work on numbers, and we'll use a
different syntax for strings. (Yes, I realize the danger with such type
conversion!)
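To make the split concrete, here is a rough Python model of the two operators. It is a sketch of the idea only, not Oil's implementation: the helper names `to_number`, `add`, and `concat` are hypothetical, and the conversion rules are a guess at what "Awk-like" could mean.

```python
def to_number(value):
    """Convert an int, float, or numeric string to a number (Awk-like guess)."""
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        text = value.strip()
        try:
            return int(text)
        except ValueError:
            return float(text)  # raises ValueError for non-numeric text
    raise TypeError(f"cannot convert {value!r} to a number")


def add(a, b):
    """Model of `a + b`: always arithmetic, converting strings first."""
    return to_number(a) + to_number(b)


def concat(a, b):
    """Model of `a ++ b`: always concatenation, never arithmetic."""
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    raise TypeError("++ needs two strings or two lists")
```

Because the operator decides the operation, `add("1", "2")` and `concat("1", "2")` can't be confused, which is the point of giving concatenation its own spelling.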

The ~~ and !~~ operators are for glob matching. They replace the bash idiom [[ x == *.py ]].
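For readers who think in Python, the two glob operators behave roughly like the standard library's `fnmatch`. This is an analogy under assumptions: `glob_match` and `glob_not_match` are hypothetical names, and Python's pattern rules may differ from Oil's in the details.

```python
from fnmatch import fnmatchcase


def glob_match(text, pattern):
    """Rough model of `text ~~ pattern` (case-sensitive glob match)."""
    return fnmatchcase(text, pattern)


def glob_not_match(text, pattern):
    """Rough model of `text !~~ pattern`."""
    return not fnmatchcase(text, pattern)
```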

Literals (Expression Mode)

Dicts are {}, not %{}

This is another return to Python compatibility.

We used sigils like %{foo: 42} in dict literals because Oil uses { } for
C-like statement blocks, and it lacks semicolons.

Making the tokens distinct is one way to avoid a subtle parsing issue. This
Hacker News comment about the Dart language describes some of the
difficulties with using {} in both expressions and statements.

However, Oil's problem is not as hard as Dart's, and I solved it by simply
including newlines in the grammar. A key-value pair can be on a line:

var mydict = {
  server: "www.example.com"  
  port: 80
}

But you can't split it across lines


var mydict = {
  server:
    "www.example.com"
}

without either () or a backslash (\):

var mydict = {
  
  server: (
    "www.example.com"
  )
}

It was bugging me that lists are just [1, 2, 3], while dicts were %{key: 'value'}. This is now fixed!

(Good Zulip Feedback on Line Breaking. I'm still looking for more feedback.)


I also removed the %[] syntax, which was an overly ambitious idea for typed
array literals. We already have %(one two) for shell-like arrays, and
['one', 'two'] for Python/JS-like lists.

(Aside: Perl and Ruby have qw(one two) or qw[one two] which is like our
%(one two).)

Blocks are &(echo $PWD)

Oil's Ruby-like blocks are "first class". Normally they're passed to procs as
the last argument:

cd /tmp {
  echo $PWD
}

But we also need them in expression mode. I decided on the syntax
&(echo $PWD).

This may seem inconsistent at first, but it's consistent with command subs:

var b1 = $(echo $PWD)  
var b2 = &(echo $PWD)  

Chars are \u{012345}

Character literals stand alone in the expression language, like

var x = \u{3bf}

That is, you don't need quotes. They're for both "code point literals"
("runes" in Go) and eggex char classes.

This syntax is now consistent within C-escaped strings like $'' and c'',
and QSN, which leads us into the next section.

Tightened Up String Literals

Shell has a rich string literal syntax. Oil inherits all of its power, but (as
of this release) removes unnecessary flexibility.

C-Style

Here are some C-style strings:

echo $'C-style'
echo $'\n \i'               # note 1
echo $'\123 \x01 \x1'       # notes 2 and 3
echo $' \u1234 \U00012345'

Notes:

  1. \n is a valid char escape, but \i is an invalid one. Bash accepts
    it and prints \i literally.
  2. Octal escapes and hex escapes can express exactly the same bytes.
  3. Hex escapes can be abbreviated \x1 instead of \x01.
I made the following changes to simplify this syntax:

  1. Disallow invalid char escapes.
  2. Disallow all octal escapes.
  3. Disallow single char hex escapes. Must be \xHH.
  4. Disallow the two unicode escapes in favor of the QSN/Rust style \u{12345},
    which I added support for.
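The four rules above can be sketched as a small checker in Python. This is an illustration, not Oil's lexer: the function name `find_bad_escapes`, the set of one-character escapes treated as valid, the allowance for a lone \0, and the 1-6 hex digit limit inside braces are all assumptions.

```python
import re

# One-character escapes treated as valid in this sketch. The exact set is an
# assumption for illustration, not taken from the Oil source.
VALID_CHAR_ESCAPES = set("nrtabefv\\'\"")

HEX_BYTE = re.compile(r"\\x[0-9a-fA-F]{2}")                # \xHH
BRACED_CODE_POINT = re.compile(r"\\u\{[0-9a-fA-F]{1,6}\}")  # \u{12345}
OCTAL = re.compile(r"\\[0-7]{1,3}")                         # \123


def find_bad_escapes(body):
    """Return (offset, reason) pairs for disallowed escapes in a $'...' body."""
    errors = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            i += 1
            continue
        allowed = HEX_BYTE.match(body, i) or BRACED_CODE_POINT.match(body, i)
        if allowed:
            i = allowed.end()
            continue
        octal = OCTAL.match(body, i)
        if octal:
            if octal.group() != "\\0":  # a lone \0 (NUL) is allowed here
                errors.append((i, "octal escape"))
            i = octal.end()
            continue
        nxt = body[i + 1 : i + 2]  # empty string for a trailing backslash
        if nxt == "x":
            errors.append((i, "hex escape must be \\xHH"))
        elif nxt in ("u", "U"):
            errors.append((i, "use \\u{...} for code points"))
        elif nxt not in VALID_CHAR_ESCAPES:
            errors.append((i, "invalid char escape"))
        i += 2
    return errors
```

Run against the bodies of the example strings above, it flags \i, \123, \x1, \u1234, and \U00012345, and accepts \n, \x01, and \u{3bf}.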

As usual, we do a dance to avoid breaking existing code, while preventing
legacy from creeping into the Oil language:

  • In command mode, shopt --unset parse_backslash enables all these syntax
    errors. This is the default in bin/oil (option group oil:all).
  • In expression mode, they're always disallowed, even when running bin/osh.
    Legacy shell scripts don't have expressions, so this is OK!

A Superset of QSN

Now that we have \u{12345}, we have an interesting property: any QSN
string is now an Oil string! Though you have to add a $ sigil:

echo $'QSN and Oil \\ \n'

var mystr = $'\x01 \u{3bf}'
var mystr = c'\x01 \u{3bf}'

Double Quoted

Here are some double-quoted strings:

echo "double quoted"
echo "$ i"         
echo "\  ."        
echo "$ $ ."        
echo "old: `hostname`, new: $(hostname)"  

Oil makes the following changes:

  • parse_backslash makes invalid escapes like \i a syntax error. Double the
    backslash (\\i) to fix it.
  • parse_dollar makes $ a syntax error. Ditto.
  • parse_backticks makes the old command sub style a syntax
    error. Use the new style.

These options are unset in the option group
oil:all.

Aside: our lexing style is awesome for making these changes!

Word Syntax (Command Mode)

I made similar changes to unquoted words.

parse_at_all Reserves Words Beginning With @

In the oil:basic option group, we allow this syntax, which breaks only the
bare minimum of existing code:

echo @myarray

But the oil:all option group reserves any word beginning with @, like:

@{} @[] @// @'' @""

This will be useful for future language extensions. That is, creating more
syntax errors lets the language evolve.
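The difference between the two option groups can be modeled with a tiny classifier in Python. This is a sketch under assumptions: `classify_word` is a hypothetical name, the identifier pattern is a guess, and it assumes splicing with @myarray is already enabled, as in oil:basic.

```python
import re

# Assumed shape of a splice word: @ followed by an identifier.
SPLICE = re.compile(r"@[A-Za-z_][A-Za-z0-9_]*\Z")


def classify_word(word, parse_at_all):
    """Classify a shell word as 'ordinary', 'splice', or 'reserved'.

    parse_at_all=False models oil:basic; True models oil:all.
    """
    if not word.startswith("@"):
        return "ordinary"
    if SPLICE.match(word):
        return "splice"
    # Any other word starting with @ is reserved only under parse_at_all.
    return "reserved" if parse_at_all else "ordinary"
```

In this model `@{}` stays an ordinary word under oil:basic and becomes reserved under oil:all, while `@myarray` is a splice in both.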

I also expect shopt --unset parse_dollar to have this benefit. It allows us
to parse inline eggexes like $/ digit+ /.

parse_dollar Again, For Strictness

To recap:

No:

echo $
echo "$"

Yes:

echo \$
echo "\$"

TODO: We also need to support strict_backslash in unquoted words.

Next

This post got long, so I split it into two parts. The next part will review
changes in Oil keywords, stdlib functions, shell builtins, and documentation.

Let me know what you think of these changes!

Appendix: The Tea Language

One reason to be more Python compatible is that I have a quixotic plan to
self-host Oil and expose the metalanguage to users. That is, our DSLs:

should be combined into one language, which I'm calling "Tea".

Against my better judgement, I brought this up on Reddit and on lobste.rs.
Briefly, Tea can be described as statically-typed Python with sum types,
which someone actually asked for!

And it should have metaprogramming features to express the equivalent
of Oil's use of textual code generation.

I wrote a working grammar to design Tea's syntax [1], but that's the only
implementation so far. It would be a large project, but it's also a concrete
one, because we have 30K-60K+ lines of working code as a use case.

If you want to work on a statically typed language, let me know! I don't know
how to write a type checker, and could use help.

Even if Tea doesn't get done, Oil will be useful either way. We can continue
using these DSLs for a long time.


[1] The entire language is expressed in the grammar as a big expression, using
a single lexer mode. It's nowhere near as complicated as shell!
