Proposal for Standardization of C++ Statement Expressions

ISO/IEC JTC1 SC22 WG21 TBD = TBD - 2017-02-06

Lorand Szollosi, szollosi.lorand@gmail.com or lorro@lorro.hu

Introduction

Statement expressions were introduced in GCC 3 and quickly implemented, to varying extents, by Clang, IBM, Intel, Sun, Open64 and other compilers. A statement expression is a sequence of zero or more statements followed by an expression, placed between ({ and }). It yields the type of the last expression and may appear anywhere an expression is allowed. Furthermore, parametric statement expressions are proposed to replace the usual macro-based usage pattern.
An optional part of the proposal is to allow a statement expression (SE) to use the control flow and continuation facilities of the evaluating context. This means allowing return from an SE that is inside a function, break inside loops and switch statements, continue inside loops, and the proposed coroutine keywords once they are accepted.

Motivation and Scope

Originally, statement expressions were implemented to provide a way of defining variables within macros. Had this been the only use case, one would recommend using template functions instead. However, several other use cases were discovered, mainly because flow control statements are available inside statement expressions. This allows for safe custom control structures in a library (rather than in the language), e.g. (note that a is not necessarily bool; we only assume that it is convertible to bool):

#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return std::move(tmp); std::move(tmp); })

Statement expressions are easy to learn at first; nested statement expressions, however, require careful analysis of destruction order and flow control.

Since statement expressions are already widespread, this document serves primarily as a classification of supported feature sets, together with a set of proposed macros to detect them. In the second part, parametric statement expressions are proposed to overcome the limitations of macros.

Impact On the Standard

Statement expression support is a new language feature from the language's point of view. Accepting it means changing the parser to accept one more expression variant:

expression:: ({statement-seq_opt expression; })

A note on destructor order should be added for variables created within statement expressions. Compilers that currently accept variables with destructors in SE agree that all such variables, except for the value of the last expression, should be destroyed when the expression is evaluated, in the same manner as automatic storage duration variables within a function are destroyed on return. The entire SE has the type of the last expression, but special care must be taken because control might be transferred out of the SE before the last expression is reached.
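The destruction order described above can be observed with the extension as it exists today. The following is a minimal sketch, assuming a compiler that accepts statement expressions containing locals with non-trivial destructors (for example GCC or Clang); Tracer and destruction_order_demo are hypothetical names introduced only for this illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records a message in a shared log when destroyed.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n)) {}
    ~Tracer() { log.push_back("destroy " + name); }
};

// Assumes the GNU statement-expression extension.
inline std::vector<std::string> destruction_order_demo() {
    std::vector<std::string> log;
    int value = ({
        Tracer first(log, "first");
        Tracer second(log, "second");
        log.push_back("last statement");
        42;  // last expression: the value of the whole SE
    });
    // By this point both Tracer objects have already been destroyed.
    log.push_back("after SE, value = " + std::to_string(value));
    return log;
}
```

Under these assumptions the log records the last statement first, then the destruction of second and first in reverse order of construction, and only then the statement that follows the SE.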

The statement expression proposal should be viewed in relation to other, related proposals, including, but not limited to:

Abbreviated lambda syntax w.r.t. parametric statement expressions
Continuations proposal
For..else and for..break proposals, which can partially be implemented by SE
TBD proposal on capture of caller's pre-defined continuations and inline exceptions

These are not competing proposals; rather, they can be viewed as aspects and levels of an integrated feature set.

Design Decisions

Several implementations of various subsets of the proposed functionality already exist. Therefore, the common subset was taken as a basis, while compiler-specific and proposed (i.e., currently nonexistent) features are listed as options. This allows for staged acceptance: a possible outcome is to accept the common subset together with the feature discovery macros, while allowing compiler implementors to converge on the optional features. The table below lists the features proposed.

**Statement expression features**
Proposed discovery macro	Feature
(core)	Basic statement expression support
(core)	POD class definitions in SE
__cplusplus_se_move	Copy elision support
__cplusplus_se_destructor	Support for variables with non-trivial destructors in SE
__cplusplus_se_ret	Support for `return`, `break` and `continue` in SE
__cplusplus_se_static	Dynamically initialized local static variables in SE
__cplusplus_se_class	Non-POD class definitions in SE
__cplusplus_se_except	Support for `throw` and `catch` inside SE
__cplusplus_se_param	Support for parametric SE
__cplusplus_se_named	Support for named parametric SE

Furthermore, some compilers support exotic features that are not proposed for inclusion in the standard. These include goto and switch into, out of and across multiple different statement expressions.

Technical Specifications

Changes to grammar

expression should allow for ({ statement-seq_opt expression; }). An implementation might place further requirements on statement-seq, which are discussed below.
Under discussion: The above is the current industry-wide status quo. However, the alternatives produce and => were proposed. The latter is intuitive once simplified lambdas are accepted; the former can be a macro that is defined as empty in pre-standard compilers and as => in standard-compliant compilers. It was also proposed to allow multiple => points. The current version of this proposal is based on the last expression; it will be rewritten once this part is decided.

Macro definitions

Based on the decision of the committee, a subset of the features in the above table might be accepted into the standard and required to be supported. Furthermore, recommendations are made for implementations wishing to support further features. A compiler should define the corresponding macro if and only if it fully supports the feature. If defined, the macro should expand to the version of the standard that is supported (i.e., similar to __cplusplus). This allows feature detection, and workarounds where a given feature is not supported by a given compiler.
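As an illustration of how a discovery macro might be used, the sketch below selects between two spellings of the last expression depending on the proposed __cplusplus_se_move macro. This macro is only proposed here and is not expected to be defined by current compilers, so the fallback branch is the one compiled today. SE_RESULT and make_squares are hypothetical names, and the GNU statement-expression extension is assumed.

```cpp
#include <utility>
#include <vector>

// __cplusplus_se_move is the discovery macro proposed in this paper.
// If it is absent, fall back to an explicit move of the last expression.
#if defined(__cplusplus_se_move)
#  define SE_RESULT(x) x
#else
#  define SE_RESULT(x) std::move(x)
#endif

// Builds {0, 1, 4, ...} inside a statement expression.
inline std::vector<int> make_squares(int n) {
    return ({
        std::vector<int> squares;
        for (int i = 0; i < n; ++i)
            squares.push_back(i * i);
        SE_RESULT(squares);  // last expression of the SE
    });
}
```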

Basic statement expression support

Support for statement expressions that don't require any of the additional features listed in the table above. The statement sequence is evaluated before the last expression; then the value of the statement expression is that of the last expression. Example: (for illustration purposes only, not the recommended approach to product calculation)

int product = ({
    int j = 1;
    for (auto i : v)
        j *= i;
    j;
});

POD class definitions in SE

Support for POD class definitions inside statement expressions. Example: (given that the key and mapped types of haystack are PODs; the key and value arguments name the members of the result)

#define try_find(haystack, needle, key, value) \
({                                                                                  \
    /* remove_const_t: the key of a map element is const, which would */            \
    /* make result_t non-assignable                                   */            \
    using key_type   = std::remove_const_t<decltype(haystack.find(needle)->first)>; \
    using value_type = decltype(haystack.find(needle)->second);                     \
    auto it = haystack.find(needle);                                                \
    struct result_t {                                                               \
        key_type   key;                                                             \
        value_type value;                                                           \
    };                                                                              \
    std::optional<result_t> result;                                                 \
    if (it != haystack.end()) {                                                     \
        result = result_t{ it->first, it->second };                                 \
    }                                                                               \
    std::move(result);                                                              \
})

Copy elision support

The last expression is not copied when the expression is evaluated.

**Returning large objects from SE**
Without copy elision	With copy elision
`({ std::vector<MyObject> largeVector = { ... }; std::move(largeVector); })`	`({ std::vector<MyObject> largeVector = { ... }; largeVector; })`

Support for variables with non-trivial destructors in SE

Support for automatic storage duration variables with non-trivial destructor inside statement expressions. It is suggested to define destructor execution order exactly as if the statement expression's body were a function, with the last expression being a return statement. This feature also means that proper move support is provided. Example:

#define get_or_create(T, ...) \
({                                           \
    std::unique_ptr<T> p(try_get<T>());      \
    if (!p)                                  \
        p.reset(new T(__VA_ARGS__));         \
    std::move(p);                            \
})

Support for `return`, `break` and `continue` in SE

Support for basic control transfer that leaves the statement expression. Example:

#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return std::move(tmp); std::move(tmp); })

#define accumulate_range_se(range, init, op) \
({                                 \
    auto result = (init);          \
    for (auto elem : (range))      \
        result = op(result, elem); \
    result;                        \
})

#define product_op(lhs, rhs) \
({                             \
    if ((rhs) == 0) return 0;  \
    (lhs) * (rhs);             \
})

int product_of_vector(const std::vector<int>& v)
{
    return accumulate_range_se(v, 1, product_op);
}
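A possible use of return_if_false inside a function is sketched below, assuming the GNU extension and C++17 std::optional. parse_digit and sum_two_digits are hypothetical helpers, not part of the proposal.

```cpp
#include <optional>
#include <utility>

#define return_if_false(a) ({ auto tmp = (a); if (!tmp) return std::move(tmp); std::move(tmp); })

// Hypothetical helper: yields the digit value, or an empty optional.
inline std::optional<int> parse_digit(char c) {
    if (c < '0' || c > '9') return std::nullopt;
    return c - '0';
}

// Each return_if_false either yields an engaged optional or returns the
// empty optional directly from sum_two_digits.
inline std::optional<int> sum_two_digits(char a, char b) {
    std::optional<int> x = return_if_false(parse_digit(a));
    std::optional<int> y = return_if_false(parse_digit(b));
    return *x + *y;
}
```

The function body reads as straight-line code, while the early exits are supplied by the statement expression.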

Dynamically initialized local static variables in SE

Support for dynamic initializers of local static variables in statement expressions. Example:

#define global_countdown() \
({                                                          \
    static int countdown = get_countdown_max_from_config(); \
    --countdown;                                            \
})

Non-POD class definitions in SE

Support for non-POD class definitions in statement expressions. Example: same as POD class definition in SE, but with non-PODs.

Support for `throw` and `catch` inside SE

Support for exception handling inside statement expressions, including exceptions that leave the SE and exceptions caught inside the SE, whether they are thrown directly inside it or propagate from a called function. Example:

#define call_or_default(expr, def) \
({                                \
    decltype(expr) result;        \
    try        { result = expr; } \
    catch(...) { result = def;  } \
    result;                       \
})
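The call_or_default macro can be exercised as follows. This is a sketch that assumes a compiler accepting try blocks inside statement expressions (the feature that __cplusplus_se_except is meant to advertise); parse_or_zero is a hypothetical wrapper around std::stoi, and the macro arguments are parenthesized here for safety.

```cpp
#include <string>

#define call_or_default(expr, def) \
({                                    \
    decltype(expr) result;            \
    try        { result = (expr); }   \
    catch(...) { result = (def);  }   \
    result;                           \
})

// std::stoi throws std::invalid_argument or std::out_of_range on failure;
// either exception is caught inside the SE and replaced by the default.
inline int parse_or_zero(const std::string& text) {
    return call_or_default(std::stoi(text), 0);
}
```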

Support for parametric SE

Support for parametric statement expressions. At the time of writing, no compiler supports this feature. Parametric statement expressions try to solve the problems that arise from the heavy use of macros when implementing patterns via SE. A parametric statement expression is proposed to be similar to a lambda function in its capture and argument parts, but its body is a statement expression. Such an expression can use the flow control statements of the creating context. It is probably desirable to fix the capture list of the parametric SE as [&]. The parametric statement expression may be called from the defining block, and more generally for as long as control stays within a block in which all flow control statements (return, break and continue) have the same meaning as in the block defining the parametric SE. Calling it from outside such a block might be defined as UB or, if desirable, as a compile-time error for which a diagnostic is required. Example:

template<typename R>
int product_of_range(const R& r)
{
    return std::accumulate(begin(r), end(r), 1, [&](int lhs, int rhs) ({
        if (rhs == 0) return 0;
        lhs * rhs;
    }));
}
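Because no compiler implements parametric SE, the example above cannot be compiled today. Since the return statement belongs to the creating context, it leaves product_of_range itself rather than the callable passed to std::accumulate. A rough sketch of that intended behaviour in standard C++ is the hand-written loop below; the name product_of_range_today is hypothetical.

```cpp
#include <iterator>

// Equivalent intent without parametric SE: the early return leaves the
// whole function as soon as a zero element is encountered.
template <typename R>
int product_of_range_today(const R& r) {
    int result = 1;
    for (auto it = std::begin(r); it != std::end(r); ++it) {
        if (*it == 0) return 0;
        result *= *it;
    }
    return result;
}
```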

Support for named parametric SE

Support for named parametric statement expressions. At the time of writing, no compiler supports this feature. A named parametric SE is similar to a function in terms of (optional) template arguments, return type and arguments, but its body is a statement expression. A forward declaration of a named parametric SE, if allowed, would be identical to that of the corresponding function, function template, member function or member function template. The definition of a named parametric SE must be visible to the caller. A compile-time diagnostic is required if the named parametric SE uses any flow control statement (return, break or continue) that is invalid in the calling block. When evaluated, the named parametric SE might use the flow control statements of the caller; in particular, it yields its value to the caller via the last expression, whereas a return statement returns to the caller's caller. Example:

template<typename T>
T break_if_zero(T t)
({
    auto tmp = std::move(t);
    if (!tmp) break;
    std::move(tmp);
})

template<typename R>
int product_of_range(const R& range)
{
    int result = 1;
    for (int elem : range)
        result *= break_if_zero(elem);
    return result;
}
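Until named parametric SE is available, the closest approximation is the macro-based pattern. The sketch below assumes the GNU extension; BREAK_IF_ZERO and product_until_zero are hypothetical names. Note that break ends the loop, so the result is the product of the elements preceding the first zero.

```cpp
#include <utility>
#include <vector>

// Macro stand-in for the named parametric SE break_if_zero.
#define BREAK_IF_ZERO(x) ({ auto tmp = (x); if (!tmp) break; std::move(tmp); })

inline int product_until_zero(const std::vector<int>& range) {
    int result = 1;
    for (int elem : range) {
        // The break inside the SE applies to this enclosing loop.
        int factor = BREAK_IF_ZERO(elem);
        result *= factor;
    }
    return result;
}
```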

Acknowledgements

Thanks for the support from ISO C++ Standard - Future Proposals Group:

Thanks for the support from co-workers on earlier, related proposal versions:

TBD

References

TBD