As I mentioned in my previous article many features introduced in the latest C++ standards allow functional patterns to thrive in your codebase. Two ideas from that programming paradigm that I really like are currying and partial application.
In this article we’re going to:
Introduce and briefly explain the two aforementioned concepts.
Write a generic
constexprzero-overheadcurryfunction in C++17.Analyze the generated assembly of the implementation to prove the lack of overhead.
Currying
Let’s begin by explaining currying.
In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument.
The following add3 function is an example of a non-curried function, as its arity is \(3\).
We can curry it by returning nested lambda expressions.
auto curried_add3(int a)
{
return [a](int b)
{
return [a, b](int c)
{
return a + b + c;
};
};
}
curried_add3(1)(2)(3); // Returns `6`.As you can see, the arity of every lambda is \(1\). This pattern is useful because it allows developers to intuitively bind arguments incrementally until the last one. If you’re finding yourself constantly using the same arguments except one in a series of function calls, currying avoids repetition and increases readability.
auto add2_one = curried_add3(1);
auto add1_three = add2_one(2);
add1_three(3); // Returns `6`.
add1_three(4); // Returns `7`.
add1_three(5); // Returns `8`.Basically, add1_three(5) is equivalent to add2_one(2)(5), which is equivalent to curried_add3(1)(2)(5).
A slightly more realistic example could involve std::find:
std::vector<std::string> names{/* ... */}
auto find_in_names =
curried_find(std::begin(names))(std::end(names));
auto jack = find_in_names("Jack");
auto rose = find_in_names("Rose");In the above code snippet some repetition between std::find invocations is cleanly avoided thanks to currying.
(This short article by Ivan Čukić has some additional interesting examples of currying in C++.)
Partial application
In computer science, partial application (or partial function application) refers to the process of fixing a number of arguments to a function, producing another function of smaller arity.
Despite them being two separate concepts, partial application is very similar to currying. Even though I couldn’t find a formal confirmation anywhere, I believe that thinking about partial application as a “generalized form of currying” can be helpful: instead of binding one argument and getting \((arity - 1)\) unary functions back, we can bind \(n\) arguments at once and get another partially-applicable function with \((arity - n)\) arity.
Imagine we had add a partial_add3 function which allowed partial application to sum three numbers:
partial_add3(1, 2, 3); // Returns `6`.
partial_add3(1)(2, 3); // Returns `6`.
partial_add3(1, 2)(3); // Returns `6`.
partial_add3(1)(2)(3); // Returns `6`. (Currying!)As you can see, we can decide how many arguments to bind (including zero). We could easily implement this in C++17 using recursion, generic lambdas, if constexpr(...), and variadic templates. (We’ll also use a fold expression to compute the sum.)
template <typename... Ts>
auto partial_add3(Ts... xs)
{
static_assert(sizeof...(xs) <= 3);
if constexpr (sizeof...(xs) == 3)
{
// Base case: evaluate and return the sum.
return (0 + ... + xs);
}
else
{
// Recursive case: bind `xs...` and return another
return [xs...](auto... ys)
{
return partial_add3(xs..., ys...);
};
}
}Writing code that enables currying and partial application for every function is cumbersome. Let’s write a generic curry function that, given a function object f, returns a curried/partially-applicable version of f!
C++17 curry
As mentioned in the beginning of the article, these are the goals for our curry function:
Given a generic function object
f, invokingcurry(f)returns a curried/partially-applicable version off.If
fisconstexpr-friendly, the returned one will be as well.curryshould not introduce any overhead compared to hand-written currying/partial application.
Credit where it’s due
Please note that the design and implementation of curry that I am going to cover is a heavily-modified version of this snippet that was tweeted by Julian Becker - in fact, it was that tweet that inspired me to write this article. Thanks!
(Julian also wrote an excellent answer on the StackOverflow question “How can currying be done in C++?” - make sure to check it out.)
Example usage
Before we analyze the declaration and definition of curry, let’s take a look at some usage examples.
Nullary functions:
auto greet = []{ std::puts("hi!\n"); }; greet(); // Prints "hi!". curry(greet); // Prints "hi!". // Compile-time error: /* curry(greet)(); */As you can see, in the case of a nullary function object
f, invokingcurry(f)calls the original object immediately.Unary functions:
auto plus_one = [](auto x){ return x + 1; }; plus_one(0); // Returns `1`. curry(plus_one)(0); // Returns `1`. // Returns a wrapper around `plus_one` that enables // currying/partial application. // `plus_one` is "perfectly-captured" in the wrapper. auto curried_plus_one = curry(plus_one); curried_plus_one(1); // Returns `2`.What does perfectly-captured mean?
It means that if the captured object is an lvalue, it will be captured by reference. If the captured object is an rvalue, it will be captured by move*. I’ve written a comprehensive article on this topic: “capturing perfectly-forwarded objects in lambdas”.
Binary functions:
auto add2 = [](auto a, auto b){ return a + b; }; // All of the invocations below return `3`. add2(1, 2); curry(add2)(1, 2); // Partial application. curry(add2)(1)(2); // Currying. // Example of "binding" an argument: auto add_one = curry(add2)(1); add_one(2); // Returns `3`. add_one(3); // Returns `4`.You should be starting to see the pattern now…
\(N\)-ary functions:
auto add3 = [](auto a, auto b, auto c) { return a + b + c; }; // All of the invocations below return `6`. add3(1, 2, 3); curry(add3)(1, 2, 3); curry(add3)(1, 2)(3); curry(add3)(1)(2, 3); curry(add3)(1)(2)(3);The example above shows that currying and partial application can be freely combined. Let’s see another example of that with a
constexprlambda of arity \(5\).auto add5 = [](auto a, auto b, auto c, auto d, auto e) constexpr { return a + b + c + d + e; }; constexpr auto csum5 = curry(sum5); constexpr auto a = csum5(0, 1, 2, 3, 4, 5); constexpr auto b = csum5(0)(1, 2, 3, 4, 5); constexpr auto c = csum5(0, 1)(2, 3, 4, 5); constexpr auto d = csum5(0, 1, 2)(3, 4, 5); constexpr auto e = csum5(0, 1, 2, 3)(4, 5); constexpr auto f = csum5(0, 1, 2, 3, 4)(5);Note that the usages of
curry(sum5)above are in no way exhaustive - more combinations such ascurry(sum5)(0, 1)(2, 3)(4, 5)can be written, and every intermediate step can be given a name.
Now that you have an idea on how curry can be used, let’s dive into its declaration and definition.
Declaration
Given the constraints listed earlier, we can easily write down the declaration of curry.
Why
decltype(auto)instead ofauto?
Because the final step of curry needs to return exactly what the original function object does. Example:
auto f = [](auto, auto) -> auto&
{
return some_global_variable;
};
// OK - can return an additional "curry wrapper" by value.
auto step0 = curry(f);
// Same as above.
auto step1 = step0('a');
// Now `step1` has to return a reference!
auto& that_same_global = step1('b');Additionally, the f parameter is taken by forwarding-reference. I will assume you’re familiar with move semantics, std::forward, and “forward captures” for the rest of the article.
Definition
I’ll show the complete definition of curry first, and then analyze all the parts one-by-one more closely.
template <typename TF>
constexpr decltype(auto) curry(TF&& f)
{
if constexpr (std::is_callable<TF&&()>{})
{
return FWD(f)();
}
else
{
return [xf = FWD_CAPTURE(f)](auto&&... partials) mutable constexpr
{
return curry
(
[
partial_pack = FWD_CAPTURE_PACK_AS_TUPLE(partials),
yf = std::move(xf)
]
(auto&&... xs) constexpr
-> decltype(forward_like<TF>(xf.get())(FWD(partials)...,
FWD(xs)...))
{
return apply_fwd_capture([&yf](auto&&... ys) constexpr
-> decltype(forward_like<TF>(yf.get())(FWD(ys)...))
{
return forward_like<TF>(yf.get())(FWD(ys)...);
}, partial_pack, FWD_CAPTURE_PACK_AS_TUPLE(xs));
}
);
};
}
}The first thing to notice is the recursive structure of curry.
template <typename TF>
constexpr decltype(auto) curry(TF&& f)
{
if constexpr (std::is_callable<TF&&()>{})
{
return FWD(f)();
}
else
{
// ...
}
}The base case branch is taken when std::is_callable<TF&&()>{} evaluates to true. std::is_callable is a new C++17 type trait that checks whether or not a particular object types can be called with a specific set of argument types.
If
std::is_callable<TF&&()>{}evaluates tofalse, then it means thatTFneeds some arguments in order to be called - those arguments can be curried/partially-applied.If it evaluates to
true, it means that there are no more arguments to curry/partially-apply inf. Therefore,fcan be invoked to get the final result:FWDis a macro that expands tostd::forward<decltype(f)>(f). It’s being used asTFmay have a ref-qualifiedoperator()that behaves differently depending onf’s value category.
We will now focus on the recursive case of curry. The first step is allowing partial application of arguments - since we don’t know how many arguments will be bound in advance, a generic variadic lambda will be returned:
The returned lambda will:
Capture
fby forward capture intoxf.Accept any amount of forwarding references in the
partials...pack. These arguments will be bound for subsequent calls.Be marked as
mutable: this is important asxfwill be moved in the inner lambda’s capture list.Be marked as
constexpr: this allowscurryto be used as a constant expression where possible.- Note that, since C++17, lambdas are implicity
constexprunless they fail to satisfy anyconstexprfunction requirement.
- Note that, since C++17, lambdas are implicity
Recursively call
curryin its body, returning a new curried/partially-applicable function.
Let’s now focus on the return curry(/*...*/) statement. We want to return a curried version of a new intermediate function object where:
The
partials...pack values are bound for its invocation - these values will be captured by forward capture aspartial_pack.The forward-captured
xffrom the “parent” lambda is captured by move intoyf.xfdoesn’t need to be forwarded asFWD_CAPTURE(f)returns a movable wrapper that either stores an lvalue reference or a value.
return curry
(
[
partial_pack = FWD_CAPTURE_PACK_AS_TUPLE(partials),
yf = std::move(xf)
]
(auto&&... xs) constexpr
-> decltype(forward_like<TF>(xf.get())(FWD(partials)...,
FWD(xs)...))
{
// ...
}
);The lambda passed to curry will accept any number of forwarding references in the xs... pack that will be used alongside the captured partials... to call f. The expected function call can be easily understood by the lambda’s trailing return type:
// Unwrap `f` from the `xf` `FWD_CAPTURE` wrapper and propagate
// the original function object's value category.
// vvvvvvvvvvvvvvvvvvvvvvvvv
-> decltype(forward_like<TF>(xf.get())(FWD(partials)..., FWD(xs)...))
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Unpack both the `partials` and `xs` argument packs in a
// single function call to `f`.forward_like is an utility function in my vrm_core library that forwards the passed argument with the same value category of the potentially-unrelated specified type. It basically copies the “lvalue/rvalue-ness” of the user-provided template parameter and applies it to its argument.
The expression inside the above return type essentially means: “invoke the original function object by unpacking partials... and xs... one after another”.
Lastly, let’s analyze the body of the lambda.
return apply_fwd_capture([&yf](auto&&... ys) constexpr
-> decltype(forward_like<TF>(yf.get())(FWD(ys)...))
{
return forward_like<TF>(yf.get())(FWD(ys)...);
}, partial_pack, FWD_CAPTURE_PACK_AS_TUPLE(xs));Remember: we’re trying to call f by unpacking both partials... and xs... at the same time. The partials... pack is stored in a special wrapper returned by FWD_CAPTURE_PACK_AS_TUPLE. The xs... pack contains the arguments passed to the lambda.
The apply_fwd_capture takes any number of wrapped forward-capture pack wrappers and uses them to invoke an user-provided function object. The wrappers are unpacked at the same time, preserving the original value category. Since xs... is not wrapped, we’re going to explicitly do so by using the FWD_CAPTURE_PACK_AS_TUPLE macro.
In short, apply_fwd_capture will invoke the constexpr variadic lambda by expanding partials... and xs... correctly - those values will be then forwarded to the wrapped callable object yf.
That’s it! Eventually the recursion will end as one of the steps will produce an intermediate function objects that satisfies std::is_callable<TF&&()>{}, giving back a “concrete” result to the caller.
Generated assembly benchmarks
As I did in my previous “passing functions to functions” article, I will compare the number of generated assembly lines for different code snippets where curry is used. The point of these “benchmarks” is giving the readers an idea on how easy it is for the compiler to optimize curry out - they are in no way exhaustive or representative of a real-world situation. (The benchmarks were generated with this Python script, which also prints out the assembly.)
The compiler used for these measurements is g++ 7.0.0 20170113, compiled from the SVN repository.
constexpr variables
When curry is used in a constexpr context it is trivial to prove that it gets completely optimized out by the compiler. Regardless, here’s the snippet that’s going to be measured:
int main()
{
const auto sum = [](auto a, auto b, auto c, auto d, auto e, auto f, auto g,
auto h) constexpr
{
return a + b + c + d + e + f + g + h;
};
constexpr auto expected = sum(0, 1, 2, 3, 4, 5, 6, 7);
#if defined(VR_BASELINE)
constexpr auto s0 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s1 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s2 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s3 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s4 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s5 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s6 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s7 = sum(0, 1, 2, 3, 4, 5, 6, 7);
#elif defined(VR_CURRY)
constexpr auto s0 = curry(sum)(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s1 = curry(sum)(0)(1, 2, 3, 4, 5, 6, 7);
constexpr auto s2 = curry(sum)(0, 1)(2, 3, 4, 5, 6, 7);
constexpr auto s3 = curry(sum)(0, 1, 2)(3, 4, 5, 6, 7);
constexpr auto s4 = curry(sum)(0, 1, 2, 3)(4, 5, 6, 7);
constexpr auto s5 = curry(sum)(0, 1, 2, 3, 4)(5, 6, 7);
constexpr auto s6 = curry(sum)(0, 1, 2, 3, 4, 5)(6, 7);
constexpr auto s7 = curry(sum)(0, 1, 2, 3, 4, 5, 6)(7);
#endif
static_assert(s0 == expected);
static_assert(s1 == expected);
static_assert(s2 == expected);
static_assert(s3 == expected);
static_assert(s4 == expected);
static_assert(s5 == expected);
static_assert(s6 == expected);
static_assert(s7 == expected);
return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
}sumis aconstexprgeneric lambda with an arity of \(8\).When measuring the baseline, the
s0…s7constexprvariables are initialized by simply callingsum.When using
curry,s0…s7are initialized by using various invocations ofcurry(sum).In the end, the expected sum result is statically asserted and returned from
main.
Baseline
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 14 | 2 | 2 | 2 | 2 |
Curry
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 14 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) |
As shown by the tables above, using curry introduces no additional overhead when used in the initialization of constexpr variables.
You can find the complete snippet on GitHub.
volatile variables
Let’s now measure the eventual overhead of curry when initializing volatile variables. The snippet is almost identical to the previous one, except for a few differences:
The
s0…s7variables are now marked asvolatileinstead ofconstexpr.The
static_assert(x)checks have been replaced withif(!x){ return -1; }.
Baseline
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 68 | 56 | 42 | 42 | 42 |
Curry
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 68 (+0.0%) | 56 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) |
Even with volatile, there isn’t any additional overhead introduced by curry!
You can find the complete snippet on GitHub.
Intermediate curry steps
The above benchmarks never stored any intermediate curry return value - the entire expression was part of the s0…s7 initializer expression. Let’s see what happens when those intermediate steps are stored as follows:
auto i0 = curry(sum);
auto i1 = curry(sum)(0);
auto i2 = curry(sum)(0, 1);
auto i3 = curry(sum)(0, 1, 2);
auto i4 = curry(sum)(0, 1, 2, 3);
auto i5 = curry(sum)(0, 1, 2, 3, 4);
auto i6 = curry(sum)(0, 1, 2, 3, 4, 5);
auto i7 = curry(sum)(0, 1, 2, 3, 4, 5, 6);
volatile auto s0 = i0(0, 1, 2, 3, 4, 5, 6, 7);
volatile auto s1 = i1(1, 2, 3, 4, 5, 6, 7);
volatile auto s2 = i2(2, 3, 4, 5, 6, 7);
volatile auto s3 = i3(3, 4, 5, 6, 7);
volatile auto s4 = i4(4, 5, 6, 7);
volatile auto s5 = i5(5, 6, 7);
volatile auto s6 = i6(6, 7);
volatile auto s7 = i7(7);Baseline
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 68 | 56 | 42 | 42 | 42 |
Curry
| O0 | O1 | O2 | O3 | Ofast |
|---|---|---|---|---|
| 19141 (+2804%) | 56 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) |
From optimization level -O1 onwards everything is great: zero overhead! When using -O0, though, there is a quite noticeable overhead of \(+2804\%\) extra generated assembly compared to the baseline.
You can find the complete snippet on GitHub.
(Some additional benchmarks with volatile lambda parameters and values are available on the GitHub repository.)
Compiler bugs
currylooks great! Zero run-time overhead, partial application and currying all in one… what’s the catch?
Well, the hardest part is… getting curry to compile. As seen from these tweets between me and Julian Becker, it seems that both g++ and clang++ fail with internal compiler errors for different reasons.
This snippet on gcc.godbolt.org produces a g++ internal compiler error. Commenting out the trailing return type on line 158 fixes the ICE. I reported a minimal version of this issue as bug #78006.
clang++’s frontend crashes in versions 3.9 and 4.0 with this wandbox snippet. I’ve reported this as bug #31435.
I managed to compile curry and the snippets used for this article by cloning the latest version of gcc from SVN and compiling it on my machine - I assume that some of the crashes were fixed in on trunk and gcc.godbolt.org is still a little bit behind.
Acknowledgments
Many thanks to Julian Becker and Jackie Kay for proofreading the article and providing very valuable feedback.