Proposal for Standardization of C++ Statement Expressions

ISO/IEC JTC1 SC22 WG21 TBD = TBD - 2017-02-06

Lorand Szollosi, szollosi.lorand@gmail.com or lorro@lorro.hu


Introduction

Statement expressions were introduced in GCC 3 and were soon implemented, to varying extents, by Clang and by the IBM, Intel, Sun and Open64 compilers, among others. A statement expression (SE) is a sequence of zero or more statements followed by an expression, placed between ({ and }). It yields the value and type of the last expression and may appear anywhere an expression is allowed. Furthermore, parametric statement expressions are proposed to replace the usual macro-based usage pattern.
Part of the proposal is to allow SEs to use the control flow / continuation management constructs of the evaluating context. This means allowing return from an SE inside a function, break inside loops and switch statements, continue inside loops and, once accepted, the proposed coroutine keywords. An optional part is to allow named and parametric statement expressions (NSE / PSE), which can be thought of as function-like and lambda-like SEs, respectively.
A simplified syntax is proposed that makes it convenient to pass a locally described PSE to another NSE / PSE along with other parameters. This would move proposals like for...else, for...if(break), for...if(!break), for...if constexpr (first), if (auto&& x : opt) et al. from language scope to library scope.
While not strictly part of the proposal, the idea of inline exception handling (throw inline, catch inline) is also described. This is to contrast inline exception handling with capturing control / continuation switches in statement expressions.

Motivation and Scope

Originally, statement expressions were implemented to provide a way of defining variables within macros. Had this been the only use case, template functions would be the recommended replacement. However, several other use cases were discovered, mainly because flow control statements are available inside statement expressions. This allows safe custom control structures in a library (rather than in the language). For example (note that a is not necessarily bool; we only assume it is contextually convertible to bool):

#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return std::move(tmp); std::move(tmp); })
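A minimal runnable sketch of the macro above, assuming GCC or Clang, which accept the ({ ... }) extension in C++ mode. parse_digit and add_digits are hypothetical helpers for illustration. The return inside the statement expression leaves add_digits itself, not just the macro:

```cpp
#include <cassert>
#include <optional>
#include <utility>

// The macro from the text: GNU ({ ... }) syntax, whose value is the last expression.
#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return std::move(tmp); std::move(tmp); })

// Hypothetical helper: converts '0'..'9' to its value, anything else to an empty optional.
std::optional<int> parse_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    return std::nullopt;
}

// Hypothetical caller: if either parse fails, the empty optional is returned
// from add_digits itself by the return inside the statement expression.
std::optional<int> add_digits(char c1, char c2)
{
    int x = *return_if_false(parse_digit(c1));
    int y = *return_if_false(parse_digit(c2));
    return x + y;
}
```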

Another use case is complex variable initialization. This is now usually done with lambdas:

std::vector<Handler> handlers = [&]{
    std::map<HandlerEnum, Handler::CPtr> e2handler{ { EBlueHandler,  new BlueHandler (...) },
                                                    { EGreenHandler, new GreenHandler(...) },
                                                    { ERedHandler,   new RedHandler  (...) },
                                                    { EPinkHandler,  new PinkHandler (...) } };
    assert(e2handler.size() == NHandlers);
    return e2handler | map_values;
}();
The problem with this is two-fold: Consider instead:
std::vector<Handler> handlers = ({
    std::map<HandlerEnum, Handler::CPtr> e2handler{ { EBlueHandler,  new BlueHandler (...) },
                                                    { EGreenHandler, new GreenHandler(...) },
                                                    { ERedHandler,   new RedHandler  (...) },
                                                    { EPinkHandler,  new PinkHandler (...) } };
    assert(e2handler.size() == NHandlers);
    produce e2handler | map_values;
});
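A runnable sketch of the same pattern in today's GCC/Clang spelling, where the last expression statement is the value (the proposal writes produce instead). HandlerEnum, the string "handlers" and make_handlers are hypothetical stand-ins for the types above. The initialization is wrapped in a function because GCC and Clang reject statement expressions at namespace scope:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the handler enumeration and handler objects.
enum HandlerEnum { EBlueHandler, EGreenHandler, ERedHandler, NHandlers };

std::vector<std::string> make_handlers()
{
    std::vector<std::string> handlers = ({
        std::map<HandlerEnum, std::string> e2handler{ { EBlueHandler,  "blue"  },
                                                      { EGreenHandler, "green" },
                                                      { ERedHandler,   "red"   } };
        assert(e2handler.size() == static_cast<std::size_t>(NHandlers));
        std::vector<std::string> values;   // plays the role of e2handler | map_values
        for (const auto& entry : e2handler)
            values.push_back(entry.second);   // mapped values, in key order
        values;                            // the proposal would write: produce values;
    });
    return handlers;
}
```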

Statement expressions are easy to learn at first. Nested statement expressions, however, require careful analysis of destruction order and flow control.

Since statement expressions are already widespread, this document primarily serves as a classification of the supported feature sets, together with a set of proposed macros to detect them. In the second part, parametric statement expressions are proposed to overcome the limitations of macros.

Impact On the Standard

From the standard's point of view, statement expression support is a new language feature. Accepting it means extending the grammar with one more expression variant:

expression:
    ({ statement-seq(opt) expression ; })
A note on destruction order should be added for variables created within statement expressions. Compilers that currently accept variables with destructors in SEs agree that these variables (but not the value of the last expression) are destroyed at the end of the statement expression's evaluation, in the same manner as automatic storage duration variables of a function are destroyed on return. The entire SE has the type of the last expression, but special care must be taken, as control might be transferred out of the SE before the last expression is reached.

The statement expression proposal should be viewed in relation to other related proposals, including, but not limited to:

These are not competing proposals. Rather, they can be viewed as aspects and levels of an integrated feature set.

Design Decisions

Several implementations of various subsets of the proposed functionality already exist. Therefore, their common subset was taken as a basis, while compiler-specific and proposed (i.e., not yet existing) features are listed as options. This allows staged acceptance. A possible outcome is to accept a subset of the proposed features and a subset of the feature detection macros for optional features, which allows compiler implementers to converge on the latter. The features are discussed in the next chapter.

To better understand what SEs / PSEs / NSEs do, we first need to discuss the currently available and proposed control / continuation switching constructs. The table below summarizes them.

Control / continuation switching constructs

Construct | Classification | Where | Argument
return | statement | in function / lambda | one, defining the return type if auto, matching it otherwise
break | statement | inside for, while, do, switch; in SE / PSE inside the above | none
continue | statement | inside for, while, do; in SE / PSE inside the above | none
throw | expression | anywhere | none or one, defining the type of exception to be thrown
produce | statement | proposed: in SE / PSE / NSE | one, defining the evaluation type of the SE
co_yield | expression | proposed: in coroutine; in SE / PSE inside a coroutine | one, defining the return type if auto, matching it otherwise
co_return | statement | proposed: in coroutine; in SE / PSE inside a coroutine | one, defining the return type if auto, matching it otherwise
throw inline | expression | proposed: where the corresponding catch exists and is trivially deducible at compile time (i.e., every call on the path from the try block to the throw inline is inline and resides in the same compilation unit) | none or one, defining the type of exception to be thrown

Furthermore, some compilers support exotic features that are not proposed for inclusion in the standard. These include goto and switch into, out of and across multiple different statement expressions.

A note on produce: current implementations have no keyword for this. Instead, an SE evaluates to its last expression. This allows only a single evaluation point, which is inconvenient. A keyword is proposed because it allows multiple evaluation points. When code uses produce only in front of the last expression, older compilers can be made compatible with an empty #define produce. Optionally, the "last expression is the result" rule might be kept for compatibility with existing code. This is only necessary when an SE contains no produce.
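The compatibility shim can be sketched as follows, assuming GCC or Clang. product is a hypothetical function. The shim only works because produce appears solely in front of the last expression:

```cpp
#include <cassert>
#include <vector>

// Compatibility shim: with an empty macro, `produce j;` becomes `j;`, i.e. the
// last expression statement, which GCC and Clang use as the value of ({ ... }).
#define produce

// Hypothetical example function (the product example from this paper).
int product(const std::vector<int>& v)
{
    int result = ({
        int j = 1;
        for (auto i : v)
            j *= i;
        produce j;
    });
    return result;
}
```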

Technical Specifications

Changes to grammar

expression should allow ({ statement-seq(opt) expression ; }). An implementation might place further requirements on statement-seq, as discussed below.
template-parameter should allow break, continue and return (and, from the coroutines proposal, co_return and co_yield).

Macro definitions

Based on the decision of the committee, a subset of the features in the above table might be accepted into the standard and required to be supported. Furthermore, recommendations are made for implementations wishing to support further features. A compiler should define the corresponding macro if and only if it fully supports the feature. If defined, the macro should expand to the supported standard version (similar to __cplusplus). This allows feature detection and workarounds if a given feature is not supported by a given compiler.

Basic statement expression support

In common subset, no macro proposed

Support for statement expressions that don't require any of the additional features listed in the table above. The statement sequence is evaluated before the last expression; then the value of the statement expression is that of the last expression. Example: (for illustration purposes only, not the recommended approach to product calculation)

int product = ({
    int j = 1;
    for (auto i : v)
        j *= i;
    produce j;
});

Support for return, break and continue in SE

Proposed macro: __cplusplus_se_ret

Support for basic control transfer that leaves the statement expression. Example:

#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return tmp; produce tmp; })

#define accumulate_range_se(range, init, op) \
({                                 \
    auto result = (init);          \
    for (auto elem : (range))      \
        result = op(result, elem); \
    produce result;                \
})

#define product_op(lhs, rhs)     \
({                               \
    if ((rhs) == 0) return 0;    \
    produce (lhs) * (rhs);       \
})

int product_of_vector(const std::vector<int>& v)
{
    return accumulate_range_se(v, 1, product_op);
}
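A runnable sketch of the example above, assuming GCC or Clang and replacing produce with the last-expression rule these compilers implement. skip_if_negative and sum_non_negative are hypothetical additions that show continue leaving the statement expression in the same way:

```cpp
#include <cassert>
#include <vector>

// The macros from the text, with produce replaced by the last-expression rule.
#define accumulate_range_se(range, init, op) \
({                                           \
    auto result = (init);                    \
    for (auto elem : (range))                \
        result = op(result, elem);           \
    result;                                  \
})

// `return 0` leaves the function that expands the macro, not just the ({ ... }).
#define product_op(lhs, rhs)     \
({                               \
    if ((rhs) == 0) return 0;    \
    (lhs) * (rhs);               \
})

int product_of_vector(const std::vector<int>& v)
{
    return accumulate_range_se(v, 1, product_op);
}

// Hypothetical addition: `continue` inside a statement expression continues the
// enclosing loop, so negative elements never reach the `sum +=` assignment.
#define skip_if_negative(x) ({ if ((x) < 0) continue; (x); })

int sum_non_negative(const std::vector<int>& v)
{
    int sum = 0;
    for (int x : v)
        sum += skip_if_negative(x);
    return sum;
}
```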

Support for parametric SE

Proposed macro: __cplusplus_se_param

Support for parametric statement expressions. At the time of writing, no compiler supports this feature. Parametric statement expressions try to solve the problems that arise from the heavy use of macros when implementing patterns via SE. A parametric statement expression is proposed to be similar to a lambda in its capture and parameter parts, but to have a statement expression as its body. Such an expression can use the flow control statements of the creating context. It is probably desirable to fix the capture list of a parametric SE as [&].

When calling a parametric statement expression, the calling code can offer its control / continuation switching constructs in the template argument list (e.g. int life = return_if_false<return>(42);). Such an offered construct is not counted as a (numbered) template argument. It is a compile-time error if the PSE needs a construct from the caller that is not offered. Offering constructs that the PSE does not need is allowed.

Note that, for a given set of argument types, both the return type and the produce type of a PSE are fixed at definition, not at the call. This simplifies return type deduction for the caller of the PSE. It is a compile-time error to call a PSE that needs return from the caller if the return types of the PSE and the caller are incompatible. PSEs must be defined in the compilation unit where they are called and are inline candidates. Example:

template<typename R>
int product_of_range(const R& r)
{
    return std::accumulate(begin(r), end(r), 1, [&](int lhs, int rhs) ({
        if (rhs == 0) return 0;
        produce lhs * rhs;
    }));
}
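For contrast, here is a sketch of the closest equivalent available today: an ordinary lambda. product_of_range_lambda and the calls counter are hypothetical, for illustration. Here return 0 only leaves the lambda, so std::accumulate keeps calling it after a zero. With the proposed PSE, the same return would leave product_of_range immediately:

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical: the PSE example rewritten as a plain lambda. `return 0` here only
// ends the current call of the lambda; calls records how often it was invoked.
template<typename R>
int product_of_range_lambda(const R& r, int& calls)
{
    return std::accumulate(std::begin(r), std::end(r), 1, [&](int lhs, int rhs) {
        ++calls;
        if (rhs == 0) return 0;
        return lhs * rhs;
    });
}
```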

Support for named SE

Proposed macro: __cplusplus_se_named

Support for named statement expressions. At the time of writing, no compiler supports this feature. A named SE is similar to a function in terms of (optional) template parameters, return type and parameters, but it has a statement expression as its body. A forward declaration of a named SE, if allowed, would be identical to that of the corresponding function / function template / member function / member function template, but it must include the control / continuation switching construct(s) needed from the caller. The latter is not necessary in the definition (or when the definition is the declaration). The named SE must be defined in the same compilation unit where it is called. The caller must offer the control / continuation switching statements that the NSE can use. When evaluated, the named SE may use the flow control statements of the caller. In particular, it returns a value to the caller via produce, not via a return statement. A return statement returns from the caller to the caller's caller. Example:

template<return, typename T>
T return_if_zero(T t);

template<typename T>
T return_if_zero(T t)
({
    auto tmp = std::move(t);
    if (!tmp) return 0;
    produce tmp;
})

template<typename R>
int product_of_range(const R& range)
{
    int result = 1;
    for (int elem : range)
        result *= return_if_zero<return>(elem);
    return result;
}

Support for inline exceptions

No macro proposed. This feature is described here for completeness. It is proposed separately.

It is proposed to allow explicitly stating that a throw should be an inline candidate. Similarly, a catch might be specified as inline, thus catching only inline-candidate exceptions. Even template catch clauses can be allowed for inline exceptions. Inline candidate here is similar to inline candidate functions: the complete call path from the try block to the throw must be in the same compilation unit, with no virtual calls and no calls through function pointers. If these conditions are satisfied, an optimizing compiler should be able to generate code that, from a performance perspective, is similar to a goto (plus manual destructor calls) on the exception path.

The most common use case is to throw inline within the same function (as a replacement for catch(break)). A function that might be left via an inline exception is said to leak the exception. It is suggested to disallow taking the address of such a function (otherwise a shadow type system would be introduced). An SE / PSE / NSE might leak exceptions.

Example:

struct Break { elem_t elem_; };
try {
    for (auto&& elem : range) {
        if (fn(elem)) throw inline Break{ elem };
        process(elem);
    }
} catch inline (const Break& brk) {
    return brk.elem_;
}
postprocess();
return std::nullopt;
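The same control flow can be sketched with ordinary throw and catch, which compile today. throw inline / catch inline are meant to keep these semantics while allowing the goto-like lowering described above. first_negative, the fn lambda and the processed counter are hypothetical stand-ins for fn, process and postprocess:

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct Break { int elem_; };

// Hypothetical: returns the first negative element; processed counts the elements
// handled before it (standing in for process(elem)).
std::optional<int> first_negative(const std::vector<int>& range, int& processed)
{
    auto fn = [](int elem) { return elem < 0; };
    try {
        for (auto&& elem : range) {
            if (fn(elem)) throw Break{ elem };   // plain throw instead of throw inline
            ++processed;
        }
    } catch (const Break& brk) {                 // plain catch instead of catch inline
        return brk.elem_;
    }
    // postprocess() would run here
    return std::nullopt;
}
```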

One possible way to implement NSE / PSE support is via regular SEs and inline exceptions. Consider a function call that offers all the control flow statements discussed. It can be rewritten as an SE, as shown in the table below.

NSE call vs. inline exceptions and SE

NSE version:
// might use return, break, continue from caller; produce to pass result
template<return, break, continue, typename T>
T process_elem(T&& lhs, T&& rhs);

template<typename R>
int process_range(const R& range)
{
    int result = 1;
    for (int elem : range)
        result = process_elem<return, break, continue>(result, elem);
    return result;
}

Inline exceptions and SE version:
struct Break    {};
struct Continue {};

// might throw int, Break, Continue; return to pass result
template<typename T>
T process_elem(T&& lhs, T&& rhs); // leaks int, Break, Continue

template<typename R>
int process_range(const R& range)
{
    int result = 1;
    for (int elem : range)
        result = ({
            try {
                produce process_elem(result, elem);
            } catch (int finalResult) {
                return finalResult;
            } catch (Break) {
                break;
            } catch (Continue) {
                continue;
            }
        });
    return result;
}

One can observe that, while rewriting with inline exceptions is possible (and might be the underlying implementation), it is considerably more verbose. Named SEs provide a convenient syntax.

Syntax sugar: in-place PSE declaration in an NSE call

No macro proposed. This feature is described here for completeness. It is proposed separately.

Many proposals, including for...else, for...break and if (auto&& x : opt), rely on a syntax similar to range-based for loops. These proposals could be moved from the language to a library if we extended the grammar to support in-place PSE declarations. Consider a for (auto&& x : r) { ... } loop. It can be viewed as an in-place void PSE that takes x as its parameter (of the type deduced from *begin_expr(r)), evaluated repeatedly over the range by the for statement. Allowing this syntax for a user-defined NSE (of arbitrary return type) in place of for is sufficient to provide a close approximation of the features above.

Thus the proposed feature is to make nse(auto&& x : v) { /* ... */ } equivalent to nse<return, break..., continue...>([&](auto&& x) ({ /* ... */ }), v) (see table).

In-place PSE in NSE call
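Columns: current workaround of missing feature | with PSE / NSE, without syntax sugar | with syntax sugar. Library declarations come first (the current workaround needs none), followed by the user code.

Library declarations, with PSE / NSE, without syntax sugar:
// provides return, break and continue for body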
template<return, template typename B, typename R,
                    template typename Br,
                    template typename E,
                    template typename D>
for_state<R> for_if(B body, R&& r,
                          Br on_break = []({ produce do_exit_loop; }),
                          E on_empty = []({}), D on_done = []({}));

Library declarations, with syntax sugar:
enum { elem_broken };
enum { iterator_broken };
enum on_break_enum { do_continue, do_reenter, do_exit_loop };

// provides return, break and continue for body
template< typename R>
struct for_if_holder { ... };

template<return, typename R>
for_if_holder::for_if_holder(R&&) { ... };

// dtor, .on_break(), .on_empty(), .on_done(), etc.

template<return, typename R>
for_if_holder for_if(R&&);

User code, current workaround:
std::optional<int>
range_product(const std::vector<int> v,
              const int ignoreLimit)
{
    // return would need further handling
    enum for_action { EBreak, EContinue };
    int i = 1;
    bool empty = true;
    for (auto&& x : v) {
        empty = false;
        auto on_break = [&] {
            if (log(x)) return EContinue;
            else return EBreak;
        };
        if (check1(x)) {
            for_action act = on_break();
            if (act == EBreak) break;
            else if (act == EContinue) continue;
        }
        if (!x) return 0;
        if (check2(x)) {
            for_action act = on_break();
            if (act == EBreak) break;
            else if (act == EContinue) continue;
        }
        if (x > ignoreLimit) continue;
        i *= x;
    }
    if (!empty) { return i; }
    else        { return std::nullopt; }
}

User code, with PSE / NSE, without syntax sugar:
std::optional<int>
range_product(const std::vector<int> v,
              const int ignoreLimit)
{
    int i = 1;
    bool result = false;
    auto loop_body_SE = [&](auto&& x) ({
        auto on_break = [&] ({
            if (log(x)) continue;
            else        break;
        });
        if (check1(x)) on_break();
        if (!x) return 0;
        if (check2(x)) on_break();
        if (x > ignoreLimit) continue;
        i *= x;
    });
    auto on_break_SE = [&](auto&& x) ({
        if (log(x)) return do_continue;
        return do_exit_loop;
    });
    auto on_empty_SE = [&] ({ return std::nullopt; });
    auto on_done_SE  = [&] ({ return i; });
    for_if(loop_body_SE, v, on_break_SE, on_empty_SE, on_done_SE);
}

User code, with syntax sugar:
std::optional<int>
range_product(const std::vector<int> v,
              const int ignoreLimit)
{
    int i = 1;
    for_if(auto&& x : v) { // this creates a temporary
        if (check1(x)) break;
        if (!x) return 0;
        if (check2(x)) break;
        if (x > ignoreLimit) continue;
        i *= x;
    }
    .on_break(auto x : elem_broken) {
        if (log(x)) return do_continue;
        return do_exit_loop;
    }
    .on_empty( [&]({ return std::nullopt; }) )
    .on_done ( [&]({ return i; }) ); // dtor of temporary initiates loop
}

POD class definitions in SE

In common subset, no macro proposed.

Support for POD class definitions inside statement expressions. Example (given that the key and mapped types are PODs; key and value name the members of the result):

#define try_find(haystack, needle, key, value)             \
({                                                          \
    auto it = (haystack).find(needle);                      \
    using key_type   = std::decay_t<decltype(it->first )>;  \
    using value_type = std::decay_t<decltype(it->second)>;  \
    struct result_t {                                       \
        key_type   key;                                     \
        value_type value;                                   \
    };                                                      \
    std::optional<result_t> result;                         \
    if (it != (haystack).end()) {                           \
        result = result_t{ it->first, it->second };         \
    }                                                       \
    produce result;                                         \
})
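A runnable variant of the macro, assuming GCC or Clang and their last-expression rule instead of produce. It takes the key and mapped types from the iterator through std::decay_t, so the const key type of std::map does not make result_t non-assignable. lookup_or is a hypothetical caller:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <type_traits>

// Variant of try_find using the last-expression rule instead of produce.
// key and value become the member names of the local result struct.
#define try_find(haystack, needle, key, value)             \
({                                                          \
    auto it = (haystack).find(needle);                      \
    using key_type   = std::decay_t<decltype(it->first)>;   \
    using value_type = std::decay_t<decltype(it->second)>;  \
    struct result_t {                                       \
        key_type   key;                                     \
        value_type value;                                   \
    };                                                      \
    std::optional<result_t> result;                         \
    if (it != (haystack).end())                             \
        result = result_t{ it->first, it->second };         \
    result;                                                 \
})

// Hypothetical caller: names the members k and v.
int lookup_or(const std::map<int, int>& m, int needle, int fallback)
{
    auto found = try_find(m, needle, k, v);
    return found ? found->v : fallback;
}
```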

Copy elision support

Proposed macro: __cplusplus_se_move

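The value of the last expression (or the operand of produce) is not copied when the statement expression is evaluated, analogously to copy elision in return statements.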

Returning large objects from SE
Without copy elision:
({ std::vector<MyObject> largeVector = { ... };
   produce std::move(largeVector); })

With copy elision:
({ std::vector<MyObject> largeVector = { ... };
   produce largeVector; })

Support for variables with non-trivial destructors in SE

Proposed macro: __cplusplus_se_destructor

Support for automatic storage duration variables with non-trivial destructors inside statement expressions. It is suggested to define the destructor execution order exactly as if the statement expression's body were a function, with the last expression being a return statement. This feature also implies proper move support. Example:

#define get_or_create(T, ...)                \
({                                           \
    std::unique_ptr<T> p(try_get<T>());      \
    if (!p)                                  \
        p.reset(new T(__VA_ARGS__));         \
    produce p;                               \
})
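To observe the destruction order that current compilers implement, here is a sketch assuming GCC or Clang. Tracer and traced_value are hypothetical. The last expression is evaluated first, then the locals are destroyed in reverse order of construction, as on a function return:

```cpp
#include <cassert>
#include <string>

// Hypothetical tracer: appends its tag to trace when destroyed.
struct Tracer {
    std::string& trace;
    char tag;
    ~Tracer() { trace += tag; }
};

// Hypothetical caller: the value of the statement expression is the size of trace
// at the point where the last expression is evaluated.
int traced_value(std::string& trace)
{
    int value = ({
        Tracer a{ trace, 'a' };
        Tracer b{ trace, 'b' };
        trace += 'v';
        static_cast<int>(trace.size());   // evaluated before b and a are destroyed
    });
    trace += 'e';                         // runs after the statement expression
    return value;
}
```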

Dynamically initialized local static variables in SE

Proposed macro: __cplusplus_se_static

Support for dynamic initializers of local static variables in statement expressions. Example:

#define global_countdown() \
({                                                          \
    static int countdown = get_countdown_max_from_config(); \
    produce --countdown;                                    \
})
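A runnable sketch assuming GCC or Clang (last expression instead of produce). The body of get_countdown_max_from_config is a hypothetical stub, and countdown_three_times is a hypothetical caller. Note that each macro expansion is a separate block and therefore defines its own static variable:

```cpp
#include <cassert>

// Hypothetical stub standing in for reading the start value from a configuration.
int get_countdown_max_from_config() { return 10; }

#define global_countdown()                                  \
({                                                          \
    static int countdown = get_countdown_max_from_config(); \
    --countdown;                                            \
})

// Hypothetical caller: a single expansion, hence a single static countdown that is
// initialized on first use and keeps its value across calls.
int countdown_three_times()
{
    int last = 0;
    for (int i = 0; i < 3; ++i)
        last = global_countdown();
    return last;
}
```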

Non-POD class definitions in SE

Proposed macro: __cplusplus_se_class

Support for non-POD class definitions in statement expressions. Example: same as POD class definition in SE, but with non-PODs.

Support for throw and catch inside SE

Proposed macro: __cplusplus_se_except

Support for exception handling inside statement expressions, including exceptions that leave the SE and exceptions caught inside the SE, whether thrown directly inside it or by a called function. Example:

#define call_or_default(expr, def)           \
({                                           \
    std::decay_t<decltype(expr)> result;     \
    try        { result = (expr); }          \
    catch(...) { result = (def);  }          \
    produce result;                          \
})
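A runnable sketch of the macro, assuming GCC or Clang (last expression instead of produce). parse_or is a hypothetical caller. std::decay_t keeps result a plain value type even when expr is an lvalue expression such as v.at(i), whose decltype would be a reference:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// call_or_default with the last-expression rule instead of produce.
#define call_or_default(expr, def)       \
({                                       \
    std::decay_t<decltype(expr)> result; \
    try        { result = (expr); }      \
    catch(...) { result = (def);  }      \
    result;                              \
})

// Hypothetical caller: std::stoi throws std::invalid_argument for "abc".
int parse_or(const std::string& text, int fallback)
{
    return call_or_default(std::stoi(text), fallback);
}
```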

Acknowledgements

Thanks for the support from ISO C++ Standard - Future Proposals Group:

Thanks for the support from co-workers on earlier, related proposal versions:

TBD

References

TBD
Older version(s) of this document: