Program Preview: Concurrency, Modules and Finance

Speaking for the first time in the US, Anthony Williams, one of the original authors of Boost.Thread and the author of C++ Concurrency in Action, will be joining us this year at CppCon! His talk, The Continuing Future of Concurrency in C++, will provide an overview of the additions to the standard C++ concurrency libraries in the Technical Specifications for Concurrency and Parallelism and the C++14 and C++17 standards. These additions include: continuations, latches, barriers, atomic smart pointers, shared ownership mutexes, executors, concurrent queues, distributed counters, coroutines, parallel algorithms and more.

Hans Boehm, the chair of the C++ standards committee’s concurrency and parallelism study group (SG1), will also be speaking at CppCon this year. Hans may be best known for his work on the Boehm garbage collector, but he’s also one of the chief architects of the C++ memory model. Hans will be talking about Using Weakly Ordered Atomics Correctly.

We have plenty more content on concurrency in this year’s program, including:

The Speed of Concurrency: Is Lock-Free Faster?, Fedor Pikus
No Sane Compiler Would Optimize Atomics, JF Bastien

Richard Smith, the project editor for the C++ standards committee and the code owner for the Clang project, will be at CppCon 2016. In his talk, There and Back Again: An Incremental C++ Modules Design, Richard will share the Clang community’s experience with modules and discuss the direction of modules standardization efforts.

We have a few other talks on modules:

Deploying C++ Modules to 100s of Millions of Lines of Code, Manuel Klimek
C++ Modules: State of the Union, Gabriel Dos Reis

Finally, some talks of interest to the financial industry:

Introspection of Performance Sensitive Financial Market Data, Eduardo Madrid
Implementing Lightweight Object Persistence with Modern C++, Bob Steagall

It’s not too late to register for CppCon 2016! Come join us in September!

— Bryce Adelstein Lelbach

Using Weakly Ordered Atomics Correctly

Most programmers should usually avoid C++ atomics altogether and use mutexes instead. If that’s not possible, perhaps because the code must be usable in interrupt handlers, I recommend that you consider limiting yourself to sequentially consistent atomics, which provide a more subtle, but still reasonably unsurprising programming model. This talk will target those who choose to ignore both of those pieces of advice, for either good or bad reasons.
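To make the first two pieces of advice concrete, here is a minimal sketch (not taken from the talk) of the same shared counter written once with a mutex and once with a sequentially consistent atomic. The function names `count_with_mutex` and `count_with_seq_cst_atomic` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Option 1: protect a plain counter with a mutex.
long count_with_mutex(int threads, int iters) {
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&m, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> lock(m);  // one thread at a time
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}

// Option 2: a sequentially consistent atomic. fetch_add with no explicit
// memory order argument defaults to std::memory_order_seq_cst.
long count_with_seq_cst_atomic(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i)
                counter.fetch_add(1);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Both functions return `threads * iters`. Neither version requires the reader to reason about weaker memory orderings.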

I will start by trying to distinguish the good and bad reasons for using weakly ordered atomics, and then follow with guidelines for using them correctly.

I will discuss why it is often incorrect to think of atomics in terms of fence-based implementations, and about some common errors I’ve seen, including some really convincing looking, but still incorrect, code. I will also try to go through some of the common idioms for which weakly ordered atomics are actually safe. In my experience, the latter are also reasonably common, but not always easy to distinguish from the subtly erroneous examples.
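As an illustration of the kind of idiom that is generally considered safe, here is a sketch of release/acquire publication: one thread writes ordinary data and then sets a flag with a release store, and another thread reads the data only after observing the flag with an acquire load. This is an assumed example, not necessarily one of the idioms covered in the talk; `Mailbox` and `publish_and_consume` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Mailbox {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
};

// Publishes `value` from the calling thread and returns what a consumer
// thread observed after seeing the flag.
int publish_and_consume(int value) {
    Mailbox box;
    int seen = -1;
    std::thread consumer([&box, &seen] {
        // The acquire load pairs with the release store below: once it
        // reads true, the earlier write to payload is visible here.
        while (!box.ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = box.payload;
    });
    box.payload = value;                               // write the data first
    box.ready.store(true, std::memory_order_release);  // then publish it
    consumer.join();
    return seen;
}
```

If both operations on `ready` were weakened to `std::memory_order_relaxed`, the read of `payload` would become a data race, which is the sort of subtle difference that makes these examples hard to tell apart.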

Hans Boehm is an engineer at Google and the chair of the C++ standards committee’s concurrency and parallelism study group (SG1). He is well known for developing the Boehm garbage collector. Hans was one of the driving forces behind the C++11 memory model.

The Speed of Concurrency: Is Lock-Free Faster?

This talk takes the “ultimately practical” approach to concurrent programming, with a focus on lock-free programs: after all, in reality such programs are almost always written in the hopes of getting better performance. We’re going to measure performance of the individual concurrent primitives and their effect on the overall performance of the whole program.

The goal of the talk is two-fold. On one hand, I will show a set of tools and practices that can be used to get quantitative measurements of the performance of different implementations under various load conditions. Mastering these techniques will allow the attendees to choose their concurrent algorithms and implementations based on solid data instead of guesswork or “common knowledge” (which is often wrong or outdated). On the other hand, even with the focus on real-life applications we can learn a few things about the fundamental nature of concurrent programs. This understanding is especially useful when dealing with “common knowledge” and “simple logic”. For example, it’s “common knowledge” that lock-free programs are faster than lock-based ones (not always). It’s also “simple logic” that the hardware must protect shared memory access in a multi-core system, so ultimately locking is always present (sometimes true, sometimes true but misleading, and sometimes false). It is both “common knowledge” and “simple logic” that a wait-free program does not wait (but if your definition of wait is “will I have to wait for the results after I finish my coffee?” then it definitely does).
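A measurement of this kind can be sketched in a few lines. The example below is an assumption about what such a harness might look like, not the tooling used in the talk: it times a contended atomic increment at a given thread count and reports the median of several runs, so the same primitive can be compared under different load conditions. `Sample` and `measure_atomic_increments` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

struct Sample {
    long long median_us;  // median wall-clock time of one run, in microseconds
    long total;           // final counter value of the last run
};

// Runs `repeats` (>= 1) trials in which `threads` threads each perform
// `iters` atomic increments. The timed region includes thread start-up
// and join, which a more careful benchmark would exclude.
Sample measure_atomic_increments(int threads, int iters, int repeats) {
    std::vector<long long> times;
    long total = 0;
    for (int r = 0; r < repeats; ++r) {
        std::atomic<long> counter{0};
        std::vector<std::thread> pool;
        auto start = std::chrono::steady_clock::now();
        for (int t = 0; t < threads; ++t)
            pool.emplace_back([&counter, iters] {
                for (int i = 0; i < iters; ++i)
                    counter.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto& th : pool) th.join();
        auto stop = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
                .count());
        total = counter.load();
    }
    std::sort(times.begin(), times.end());
    return {times[times.size() / 2], total};  // median is robust to outliers
}
```

Calling it with 1, 2, 4, and 8 threads and comparing `median_us` gives a first, rough picture of how the primitive behaves as contention grows. The numbers depend entirely on the machine and compiler settings.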

We will explore practical examples of (mostly) lock-free data structures, with actual implementations and performance measurements. Even if the specific limitations and simplifying assumptions used in this talk do not apply to your problem, the key point to take away is how to find such assumptions and take advantage of them in your specific application: after all, in reality it’s almost always about performance.

Fedor Pikus is a Chief Engineering Scientist in the Design to Silicon division of Mentor Graphics Corp. His earlier positions included Senior Software Engineer at Google and Chief Software Architect for Calibre PERC, LVS, DFM at Mentor Graphics. He joined Mentor Graphics in 1998 when he made a switch from academic research in computational physics to the software industry. His responsibilities as a Chief Scientist include planning long-term technical direction of Calibre products, directing and training the engineers who work on these products, design and architecture of the software, and research in new design and software technologies. Fedor has over 25 patents and over 90 papers and conference presentations on physics, EDA, software design, and the C++ language.

No Sane Compiler Would Optimize Atomics

False.

Compilers do optimize atomics, memory accesses around atomics, and utilize architecture-specific knowledge. My hobby is to encourage compilers to do more of this, programmers to rely on it, and hardware vendors to give us new atomic toys to optimize with. Oh, and standardize yet more close-to-the-metal concurrency and parallelism tools.

But, you say, surely volatile always means volatile, there’s nothing wrong with my benign races, nothing could even go wrong with non-temporal accesses, and who needs 6 memory orderings anyways? I’m glad you asked, let me tell you about my hobby…

JF Bastien is a Jest-in-Time compiler on Google’s Chrome web browser, currently focusing on performance and security to bring portable, fast and secure code to the Web. JF is a member of the C++ standards committee, where his mechanical engineering degree serves little purpose. He’s worked on startup incubators, business jets, flight simulators, CPUs, dynamic binary translation, systems, and compilers.

There and Back Again: An Incremental C++ Modules Design

The Clang project has been working on modules in one form or another for many years. It started off with C and Objective-C many years ago. Today, we have a C++ compiler that can transparently use modules with existing C++ code, and we have deployed that at scale. However, this is very separate from the question of how to integrate a modular compilation model into the language itself. That is an issue that several groups working on C++ have been trying to tackle over the last few years.

Based on our experience deploying the core technology behind modules, we have learned a tremendous amount about how they interact with existing code. This has informed the particular design we would like to see for C++ modules, and it centers around incremental adoption. In essence, how do we take the C++ code we have today, and migrate it to directly leverage C++ modules in its very syntax, while still interacting cleanly with C++ code that will always and forever be stuck in a legacy mode without modules?

In this talk we will present our ideas on how C++ modules should be designed in order to interoperate seamlessly with existing patterns, libraries, and codebases. However, these are still early days for C++ modules. We are all still experimenting and learning about what the best design is likely to be. Here, we simply want to present a possible and still very early design direction for this feature.

Richard Smith leads the Clang open source project, and has been driving the implementation of C++ modules in that compiler. He is also an active member of the Core Working Group (CWG) of the C++ standards committee and he is the project editor for the C++ standard.

Deploying C++ Modules to 100s of Millions of Lines of Code

Compile times are a pain point for C++ programmers the world over, and Google is no exception. We have a single unified codebase with hundreds of millions of lines of C++ code, all of it built from source. As the size of the codebase and the depth of interrelated interfaces exposed through textually included headers grew, the scaling of compiles became a critical issue.

Years ago we started working to build technology in the Clang compiler that could help scale builds more effectively than textual inclusion. This is the core of C++ modules: moving away from the model of textual inclusion. We also started preparing our codebase to migrate to this technology en masse, and through a highly automated process. It’s been a long time and a tremendous effort, but we’d like to share where we are as well as what comes next.

In this talk, we will outline the core C++ modules technology in Clang. This is just raw technology at this stage, not an integrated part of the C++ programming language. That part is being worked on by a large group of people in the C++ standards committee. But we want to share how Google is using this raw technology internally to make today’s C++ compiles faster, what it took to get there, and how you too can take advantage of these features. We will cover everything from the details of migrating a codebase of this size to use a novel compilation model to the ramifications for both local and distributed build systems. We hope to give insight into the kinds of benefits that technology like C++ modules can bring to a large scale C++ development environment.

Manuel Klimek has been a software engineer at Google since 2008 and a professional code monkey since 2003. After developing embedded Linux terminals for the payment industry and distributed storage technology at Google in C++, he decided that C++ productivity lags significantly behind other programming languages and set out to change this. He led the effort to grow Clang into a world class tooling platform for AST-based C++ tools and spearheaded large scale distributed semantic C++ code transformations both at Google and in the broader industry. Besides being sad that Germany lost against France in the Euro 2016, he is currently modularizing Google’s internal C++ codebase and leading the development of the next generation of Clang-based C++ tools that range from editor based code completion to deep API refactorings.

C++ Modules: State of the Union

I will give a report on the progress we have made since last year on specification, standardization, implementation experience, deployment, and user experience with C++ modules. Looking forward, I will give a glimpse into the world of semantics-aware developer tools, including runtime reflection, made possible by C++ module implementation.

Gabriel Dos Reis is a Principal Software Development Engineer at Microsoft. He is also a researcher and a longtime member of the C++ community. His research interests include programming tools for dependable software. Prior to joining Microsoft, he was Assistant Professor at Texas A&M University. Dr. Dos Reis was a recipient of the 2012 National Science Foundation CAREER award for his research in compilers for dependable computational mathematics and educational activities.

Introspection of Performance Sensitive Financial Market Data

C++ does not yet have complete introspection (reflection), but in many cases it may be easy to complete. We will present an example of what we think is a general method whenever data specifications may be converted to C++ through a code generator. We have done this for processing financial market data very sensitive to latencies and obtained huge advantages of:

Economy of effort
Performance
Reliability
Extensibility

over any other option we considered. We will show:

How we converted the specification of market data from an important exchange, CME MDP3, into C++ types and a minimal set of variadic templates that give us full introspection capabilities
The code and the techniques to implement generic introspecting software components, including these concrete examples:
- Converting any value that belongs to the specification into string
- Testing for whether the value is null in any of the three ways the specification allows encoding of null values
- Applying design patterns such as flyweights to traverse the data with zero or minimal performance cost
- Subscription mechanisms
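The general idea of generated types plus a small set of variadic templates can be sketched as follows. This is an illustrative assumption, not the speaker's code and not the CME MDP3 schema: `Field`, `Message`, `describe`, and `sample_message` are hypothetical names, and a single sentinel value stands in for the null encodings mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// A field as a code generator might emit it: a name, a value, and a
// sentinel that encodes "null" (one assumed encoding among several).
template <typename T, T NullValue>
struct Field {
    const char* name;
    T value;
    bool is_null() const { return value == NullValue; }
};

// A message is just the list of its field types.
template <typename... Fields>
struct Message {
    std::tuple<Fields...> fields;
};

template <typename F>
void print_field(std::ostream& os, const F& f) {
    os << f.name << '=';
    if (f.is_null()) os << "null"; else os << f.value;
}

// Visits every field in declaration order (C++14 pack expansion).
template <typename Tuple, std::size_t... I>
void print_all(std::ostream& os, const Tuple& t, std::index_sequence<I...>) {
    int expand[] = {0, ((os << (I ? " " : "")), print_field(os, std::get<I>(t)), 0)...};
    (void)expand;
}

// Generic "convert any message to a string" component.
template <typename... Fields>
std::string describe(const Message<Fields...>& m) {
    std::ostringstream os;
    print_all(os, m.fields, std::index_sequence_for<Fields...>{});
    return os.str();
}

using Price = Field<std::int64_t, std::numeric_limits<std::int64_t>::max()>;
using Qty = Field<std::int32_t, std::numeric_limits<std::int32_t>::max()>;

// Hypothetical sample: a price that is set and a quantity that is null.
Message<Price, Qty> sample_message() {
    return {std::make_tuple(
        Price{"price", 1050},
        Qty{"qty", std::numeric_limits<std::int32_t>::max()})};
}
```

Because the field list is a template parameter pack, `describe` works for any generated message type without per-message code, and the traversal is resolved at compile time.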

Eduardo Madrid is a senior developer of automated trading systems at Crabel Capital Management with 18 years of C++ experience.

Implementing Lightweight Object Persistence with Modern C++

Modern C++ brings many exciting and powerful advances to both the core language and the standard C++ library. Among these are changes to the standard allocator requirements that now permit allocators to allocate and deallocate blocks of memory that are addressable by generalized (i.e., non-native) pointers, as well as requirements for allocator-aware containers to employ such pointers.

This talk will describe a slightly different way of thinking about memory allocation, decomposing the idea into four distinct concepts – addressing model, pointer interface, storage model, and allocation strategy. To illustrate this new mental picture, we’ll examine the design of a standard-conformant allocator that uses shared memory as its storage model, and show how it can be used to construct complex data structures based on standard C++ containers directly in shared memory. We’ll then explore how this particular allocator’s address-independent storage model supports a form of lightweight object persistence. Along the way we’ll compare and contrast the old C++03 allocator requirements with those of C++14, and we’ll also see at least one way to implement a generalized pointer. Finally, we’ll touch on other storage models and possible applications.
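One way to picture a generalized pointer with an address-independent storage model is a self-relative pointer, which stores the distance from its own address to its target rather than an absolute address. The sketch below is an assumption made for illustration and is not the design presented in the talk; `offset_ptr`, `Node`, and `sum_after_relocation` are hypothetical names, and the class omits most of what a real allocator pointer type needs (including a copy constructor that rebases the offset).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Self-relative pointer: stores target address minus its own address.
// An offset of 0 represents null in this sketch.
template <typename T>
class offset_ptr {
    std::intptr_t offset_ = 0;
public:
    offset_ptr() = default;
    void set(T* p) {
        offset_ = p ? static_cast<std::intptr_t>(
                          reinterpret_cast<std::uintptr_t>(p) -
                          reinterpret_cast<std::uintptr_t>(this))
                    : 0;
    }
    T* get() const {
        return offset_ ? reinterpret_cast<T*>(
                             reinterpret_cast<std::uintptr_t>(this) +
                             static_cast<std::uintptr_t>(offset_))
                       : nullptr;
    }
};

struct Node {
    int value;
    offset_ptr<Node> next;
};

// Builds a two-node list in one buffer, copies the raw bytes to another
// buffer (standing in for mapping the storage at a different address),
// and walks the list in the copy.
int sum_after_relocation() {
    alignas(Node) unsigned char original[2 * sizeof(Node)];
    alignas(Node) unsigned char relocated[2 * sizeof(Node)];
    Node* a = new (original) Node{1, {}};
    Node* b = new (original + sizeof(Node)) Node{2, {}};
    a->next.set(b);
    std::memcpy(relocated, original, sizeof original);
    std::memset(original, 0, sizeof original);  // the old region is gone
    int sum = 0;
    for (const Node* n = reinterpret_cast<const Node*>(relocated); n;
         n = n->next.get())
        sum += n->value;
    return sum;
}
```

The links survive the move because every offset is relative to the pointer's own location. That property is what lets a region of shared memory, or a file holding its bytes, be mapped at a different address and still be traversed.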

Bob Steagall has been working in C++ for the last 24 years. The majority of his career has been spent in medical imaging, where he has led teams building applications for functional MRI and CT-based cardiac visualization. Along the way, in the late 90’s he wrote and sold a few copies of an alternative implementation of the standard library, called XTL.