Performance and Efficiency in C++ for Experts, Future Experts, and Everyone Else [2022 class archive]

Performance and Efficiency in C++ for Experts, Future Experts, and Everyone Else is a two-day onsite training course with programming exercises, taught by Fedor Pikus. It is offered at the Gaylord Rockies from 09:00 to 17:00 Aurora time (MDT) on Saturday and Sunday, September 17th and 18th, 2022 (immediately following the conference). Lunch is included.

Course Description

This class is about performance and efficiency, spanning the entire range from the fundamentals of the hardware to the peculiarities of compiler optimizations. Using practical examples extracted from real-life programs, you will learn how to measure, analyze, and improve the performance of your programs. Most importantly, you will learn to understand why your programs, compilers, and hardware behave the way they do. Some of the material will be basic and fundamental, some cutting-edge and esoteric, and the rest somewhere in between. All explanations will be reinforced with hands-on exercises, which you will do in the classroom and can explore in more detail later if you want to learn more.

In the early days of computing, programming was hard. The processors were slow, the memory was limited, the compilers were primitive, and nothing could be achieved without a significant effort. The programmer had to know the architecture of the CPU, the layout of the memory, and when the compiler did not cut it, the critical code had to be written in assembler.

Then things got better. Processors got faster every year, a capacity that once belonged to a huge hard drive became the size of the main memory in an average PC, and compiler writers learned a few tricks to make programs faster. Programmers could spend more time actually solving problems. This was reflected in programming languages and design styles: with higher-level languages and evolving design and programming practices, the programmers’ focus shifted from what they wanted to say in code to how they wanted to say it.

Then the “simple” progress of ever-increasing clock frequencies and decreasing latencies came to a halt, and any further performance gains had to be achieved through increased complexity, from parallelism to much more complex hardware architectures. These elaborate designs are really workarounds for the performance limits imposed by basic physics, then workarounds for the problems caused by the previous workarounds, and then … you see where this is going. The key point is that while a faster clock simply makes code run faster on its own, many of these complex hardware features either don’t “program themselves” at all or open major pitfalls and danger zones that the programmer can unwittingly walk into, with dire consequences for performance.

In this class, you will learn what hardware designers had to do to keep your programs running faster every year, how to take advantage of that power and complexity, and how to avoid the problems created by that same complexity. You will see code from the point of view of your hardware and of your compiler, and you will learn why subtle differences in how the programmer, the compiler, and the processor interpret the same code can cause major performance headaches.

At the same time, the progress we have made in writing code that clearly expresses what needs to be done, rather than how it’s done, is not to be rolled back. We still want to write readable and maintainable code, and (“and,” not “but”) we want it to be efficient as well. Every now and then, we will point out concrete examples of how good design and coding practices do not have to come at the expense of good performance and can instead help us write, debug, and improve efficient programs.

Prerequisites

Above all, a desire to learn. Intermediate-level competence in C++ and some familiarity with concurrent programming are needed.

Course Topics

  1. Why performance matters
    1. Why performance requires the programmer’s attention
    2. General types of factors affecting performance
    3. Different types of performance
    4. How to evaluate the performance
  2. Performance measurements
    1. Why performance must be measured
    2. Performance metrics
    3. Performance measurement tools (benchmarks, timers)
    4. Profiling and profiler tools
  3. CPU architecture, resources, and performance implications
    1. The architecture of modern CPUs
    2. Using internal concurrency of the CPUs for optimum performance
    3. CPU pipelines and conditional execution
    4. Branch optimization and branchless computing
    5. Speculative execution
  4. Memory architecture and performance impact
    1. Overview of the memory subsystem
    2. Performance of memory accesses
    3. Access patterns and their impact on algorithm and data structure design
    4. Memory bandwidth and latency
  5. Threads and memory
    1. Overview of threads
    2. Threads in C++
    3. Multi-threaded and multi-core memory access
    4. Avoiding data races and the cost of doing so
    5. Synchronization of memory accesses
    6. Memory models
  6. Efficient concurrency
    1. Locks and the cost of lock-based synchronization
    2. Thread-safe data structures
    3. Introduction to lock-free programming
  7. High-performance C++
    1. Efficiency and overhead of the C++ language
    2. Avoiding inefficient C++ code
  8. Compiler optimizations in C++
    1. How compilers see your code
    2. How to get the best optimizations from the compiler
  9. Undefined behavior and performance
    1. What undefined behavior (UB) is
    2. How UB is related to performance
    3. How to use UB for performance gain


Course Instructor

Fedor Pikus

Fedor Pikus is a Chief Engineering Scientist in the Design to Silicon division of Mentor Graphics Corp (a Siemens business). His earlier positions included Senior Software Engineer at Google and Chief Software Architect for Calibre PERC, LVS, and DFM at Mentor Graphics. He joined Mentor Graphics in 1998, when he switched from academic research in computational physics to the software industry.

Fedor is a recognized expert on high-performance computing and C++. He has presented his work at CppCon, SD West, and DesignCon and in Software Development Journal, and he is also an O’Reilly author. His responsibilities as a Chief Scientist include planning the long-term technical direction of Calibre products, directing and training the engineers who work on these products, designing the software and its architecture, and researching new design and software technologies. Fedor has over 25 patents and over 100 papers and conference presentations on physics, EDA, software design, and the C++ language.