Simplifying Concurrent Code with ActorSrcGen: A C# Source Generator for High-Performance Pipelines

Introduction

Writing concurrent code in C# can be challenging. The .NET ecosystem offers powerful tools such as TPL Dataflow for managing parallel processing, but using them often means writing complex, error-prone boilerplate code. To address these challenges, I am excited to introduce ActorSrcGen, a new C# source generator designed to simplify the creation of high-performance pipeline systems. In combination with the DataflowEx library, ActorSrcGen provides a clean, stateful, and object-oriented encapsulation of the TPL Dataflow wiring code, streamlining the development of robust concurrent applications.

Challenges of Concurrent Code

Concurrent programming is essential for building efficient, responsive, and scalable software systems. However, it comes with its own set of complexities and challenges. When using TPL Dataflow, developers often find themselves writing a significant amount of boilerplate code for creating and linking dataflow blocks, managing state, and handling exceptions. This boilerplate can quickly become unwieldy, making applications difficult to maintain, extend, and debug.
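To make that boilerplate concrete, here is a hand-wired version of the three-step pipeline used as the example in this article, written directly against System.Threading.Tasks.Dataflow with no ActorSrcGen or DataflowEx involved. It is a sketch: the names HandWiredPipeline and RunAsync are hypothetical, and the BoundedCapacity and MaxDegreeOfParallelism values simply mirror the ones that appear in the generated code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class HandWiredPipeline
{
    // Builds, links, feeds, and drains the three steps by hand; returns every output.
    public static async Task<List<int>> RunAsync(IEnumerable<int> inputs)
    {
        var options = new ExecutionDataflowBlockOptions { BoundedCapacity = 5, MaxDegreeOfParallelism = 8 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // One block per step; the generic arguments must match each step's input and output types.
        var step1 = new TransformBlock<int, string>(x => x.ToString(), options);
        var step2 = new TransformBlock<string, string>(x => $"100{x}", options);
        var step3 = new TransformBlock<string, int>(s => int.Parse(s), options);

        // Something has to consume the last block's output, so add a sink that collects it.
        var results = new List<int>();
        var sink = new ActionBlock<int>(r => results.Add(r)); // default MaxDegreeOfParallelism is 1

        step1.LinkTo(step2, link);
        step2.LinkTo(step3, link);
        step3.LinkTo(sink, link);

        foreach (var x in inputs)
            await step1.SendAsync(x); // waits whenever step1's bounded buffer is full

        step1.Complete();      // no more input; completion flows down the links
        await sink.Completion; // wait until the last result has been collected
        return results;        // TransformBlock preserves input order by default
    }
}
```

Even this small example needs explicit block types, link options, a sink, and completion handling, and a plain TransformBlock faults the whole chain the first time a step throws.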

Introducing ActorSrcGen

ActorSrcGen aims to simplify concurrent programming by automating the generation of this boilerplate code. It works seamlessly with the DataflowEx library, a powerful extension of TPL Dataflow, to provide a clean and object-oriented approach to building dataflow pipelines.

Here’s how ActorSrcGen can help you:

  1. Automated Boilerplate Code Generation: With ActorSrcGen, you no longer need to write extensive boilerplate code to set up and manage TPL Dataflow networks. The source generator does the heavy lifting for you, allowing you to focus on your pipeline’s logic.
  2. Clean and Object-Oriented Design: Your code becomes more maintainable and readable because it focuses entirely on the mechanics of each step rather than on controlling data flow and concurrency.
  3. Pipeline Encapsulation: The pipeline wiring is hidden from view, so your actor is easy to reason about and can be reused within other pipeline models.

Let’s take a look at a simple class that has been adapted to work as a data processing pipeline.

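using System;
using System.Threading.Tasks;
// plus a using directive for the namespace that contains ActorSrcGen's attributes

[Actor]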
public partial class MyWorkflow
{
    [InitialStep(next: "DoTask2")]
    public Task<string> DoTask1(int x)
    {
        Console.WriteLine("DoTask1");
        return Task.FromResult(x.ToString());
    }

    [Step(next: "DoTask3")]
    public Task<string> DoTask2(string x)
    {
        Console.WriteLine("DoTask2");
        return Task.FromResult($"100{x}");
    }

    [LastStep]
    public Task<int> DoTask3(string input)
    {
        Console.WriteLine("DoTask3");
        return Task.FromResult(int.Parse(input));
    }
}

There are a few things to notice about this class. First, there is no TPL Dataflow (TDF) code in it at all. Everything needed to create dataflow blocks and wire them together is latent in the type information of the methods, and the wiring pattern is defined by the next parameter of the InitialStep and Step attributes. Because ActorSrcGen is a Roslyn-powered source generator, it can inspect this code at compile time and gather the type information needed to create the right kinds of blocks.
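ActorSrcGen gathers this information at compile time through Roslyn. As a rough runtime analogy only, the following sketch uses reflection to show how much can be derived from method signatures alone: the parameter type and the unwrapped Task<T> result type give a block's generic arguments, and the next argument gives its link. StepSketchAttribute, SketchWorkflow, and PipelinePlanner are hypothetical stand-ins, not part of ActorSrcGen's API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

// Hypothetical stand-in for ActorSrcGen's step attributes; not the real API.
[AttributeUsage(AttributeTargets.Method)]
public sealed class StepSketchAttribute : Attribute
{
    public StepSketchAttribute(string? next = null) => Next = next;
    public string? Next { get; }
}

// Same shape as MyWorkflow, annotated with the stand-in attribute.
public class SketchWorkflow
{
    [StepSketch(next: "DoTask2")]
    public Task<string> DoTask1(int x) => Task.FromResult(x.ToString());

    [StepSketch(next: "DoTask3")]
    public Task<string> DoTask2(string x) => Task.FromResult($"100{x}");

    [StepSketch]
    public Task<int> DoTask3(string input) => Task.FromResult(int.Parse(input));

    public int Helper() => 42; // not annotated, so no block is planned for it
}

public static class PipelinePlanner
{
    // Describes the block each annotated method would need and where its output goes.
    public static List<string> Describe(Type actorType)
    {
        var plan = new List<string>();
        var flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in actorType.GetMethods(flags))
        {
            var step = method.GetCustomAttribute<StepSketchAttribute>();
            if (step is null) continue; // unannotated methods are ignored

            Type input = method.GetParameters()[0].ParameterType;
            Type result = method.ReturnType;
            // Unwrap Task<TOut> to TOut, which becomes the block's output type.
            Type output = result.IsGenericType && result.GetGenericTypeDefinition() == typeof(Task<>)
                ? result.GetGenericArguments()[0]
                : result;

            string target = step.Next ?? "(pipeline output)";
            plan.Add($"{method.Name}: TransformManyBlock<{input.Name}, {output.Name}> -> {target}");
        }
        return plan;
    }
}
```

For SketchWorkflow the plan contains entries such as "DoTask1: TransformManyBlock<Int32, String> -> DoTask2", which matches the block types in the generated code.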

Second, the flow is not controlled by method invocations: no step calls the next one. Each step method is a simple request/response function, and TPL Dataflow takes care of moving data between them. ActorSrcGen also wraps each step in a top-level exception handler so that failures in your code cannot fault the pipeline. It does this by turning a step's return type T into IEnumerable<T>, the return type of a TransformManyBlock handler: a successful call yields one item, while a call that throws yields none. The failing input is simply dropped and the exception is caught and discarded rather than rethrown, so one bad message does not fail the whole pipeline.
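The idiom can be tried in isolation with plain TPL Dataflow. In this sketch (ErrorTrappingSketch, Trap, and the other names are hypothetical), a TransformManyBlock handler returns a one-element array on success and an empty array when the step throws, so the bad input disappears and the block keeps running. The second method shows, for contrast, that an ordinary TransformBlock faults on the same input.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ErrorTrappingSketch
{
    // Turns a T-returning step into an IEnumerable<T>-returning handler:
    // success yields one item, an exception yields none.
    static Func<TIn, IEnumerable<TOut>> Trap<TIn, TOut>(Func<TIn, TOut> step) => input =>
    {
        try { return new[] { step(input) }; }
        catch (Exception) { return Array.Empty<TOut>(); } // swallowed, like the generated catch {}
    };

    // Parses each input; unparsable inputs are dropped and the block stays healthy.
    public static async Task<List<int>> ParseAllAsync(params string[] inputs)
    {
        var parse = new TransformManyBlock<string, int>(Trap<string, int>(s => int.Parse(s)));
        var results = new List<int>();
        var sink = new ActionBlock<int>(n => results.Add(n)); // processes one item at a time
        parse.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var s in inputs)
            await parse.SendAsync(s);
        parse.Complete();
        await sink.Completion;
        return results;
    }

    // For contrast: a plain TransformBlock faults on the first exception.
    public static async Task<bool> PlainBlockFaultsAsync()
    {
        var parse = new TransformBlock<string, int>(s => int.Parse(s));
        parse.Post("oops");
        try
        {
            // Complete() is never called, so Completion finishes only because the block faulted.
            await parse.Completion;
            return false;
        }
        catch (Exception)
        {
            return parse.Completion.IsFaulted;
        }
    }
}
```

With inputs "1", "oops", "3", ParseAllAsync returns 1 and 3 only. The trade-off is visibility: the exception is gone, so a real pipeline would probably want to log inside the catch.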

Lastly, attributes carry the metadata about the data flow. Only classes annotated with the ActorAttribute are processed by the generator, and only methods annotated with one of the step attributes get dataflow blocks. That doesn't mean you can't have other methods in your class; it just means that the high-level flow of data and the error trapping are defined by the attributed methods.

What kind of code gets generated?

For the above example code, ActorSrcGen will automatically generate the following code:

namespace ActorSrcGen.Abstractions.Playground;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Gridsum.DataflowEx;

public partial class MyWorkflow : Dataflow<Int32, Int32>
{
    public MyWorkflow() : base(DataflowOptions.Default)
    {
        _DoTask1 = new TransformManyBlock<Int32, String>(async (Int32 x) => {
            var result = new List<String>();
            try
            {
                result.Add(await DoTask1(x));
            }catch{}
            return result;
        },
            new ExecutionDataflowBlockOptions() {
                BoundedCapacity = 5,
                MaxDegreeOfParallelism = 8
        });
        RegisterChild(_DoTask1);
        _DoTask2 = new TransformManyBlock<String, String>(async (String x) => {
            var result = new List<String>();
            try
            {
                result.Add(await DoTask2(x));
            }catch{}
            return result;
        },
            new ExecutionDataflowBlockOptions() {
                BoundedCapacity = 5,
                MaxDegreeOfParallelism = 8
        });
        RegisterChild(_DoTask2);
        _DoTask3 = new TransformManyBlock<String, Int32>(async (String x) => {
            var result = new List<Int32>();
            try
            {
                result.Add(await DoTask3(x));
            }catch{}
            return result;
        },
            new ExecutionDataflowBlockOptions() {
                BoundedCapacity = 5,
                MaxDegreeOfParallelism = 8
        });
        RegisterChild(_DoTask3);
        _DoTask1.LinkTo(_DoTask2, new DataflowLinkOptions { PropagateCompletion = true });
        _DoTask2.LinkTo(_DoTask3, new DataflowLinkOptions { PropagateCompletion = true });
    }
    TransformManyBlock<Int32, String> _DoTask1;
    TransformManyBlock<String, String> _DoTask2;
    TransformManyBlock<String, Int32> _DoTask3;

    public override ITargetBlock<Int32> InputBlock { get => _DoTask1; }
    public override ISourceBlock<Int32> OutputBlock { get => _DoTask3; }

    public async Task<bool> Post(Int32 input)
    => await InputBlock.SendAsync(input);
} 

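The generated class derives from DataflowEx's Dataflow<Int32, Int32>. It creates one TransformManyBlock per step, choosing the generic arguments from the input and output types of the annotated methods, registers each block with RegisterChild, and links the blocks with PropagateCompletion = true so that completing the first block completes the whole chain. The first block is exposed as InputBlock and the last as OutputBlock. Each block's handler wraps the call to your method in a try/catch and returns a List, which is how unhandled exceptions are trapped.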
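Every generated block uses BoundedCapacity = 5, and the generated Post helper forwards to SendAsync rather than to the target block's own Post. The sketch below, using plain TPL Dataflow and hypothetical names, illustrates why that matters: while the block is busy, Post declines anything beyond the bound, whereas SendAsync returns a task that completes only once the block has room. The "gate" task is just a device to keep the block full for the demonstration.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BackpressureSketch
{
    public static async Task<(int Accepted, int Declined, bool SendCompletedImmediately, bool SendSucceeded)> RunAsync()
    {
        // Each item waits on the gate, so it stays "in flight" and keeps counting toward the bound.
        var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        var block = new ActionBlock<int>(async _ => await gate.Task,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 5 });

        int accepted = 0, declined = 0;
        for (int i = 0; i < 10; i++)
        {
            // Post never waits: it returns false once buffered plus in-flight items reach the bound.
            if (block.Post(i)) accepted++; else declined++;
        }

        // SendAsync postpones instead of declining: the task stays pending until there is room.
        Task<bool> pending = block.SendAsync(99);
        bool completedImmediately = pending.IsCompleted;

        gate.SetResult(true);          // let the queued items finish, freeing capacity
        bool sent = await pending;     // the postponed message is now accepted

        block.Complete();
        await block.Completion;
        return (accepted, declined, completedImmediately, sent);
    }
}
```

Here five items are accepted and five are declined by Post, while the SendAsync call simply waits. Items declined by Post are not retried, so a caller using it must handle a false result; SendAsync gives the pipeline natural backpressure instead.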

Installing ActorSrcGen

Getting started with ActorSrcGen is easy. You can install the ActorSrcGen NuGet package in your C# project. Once the package is installed, you can annotate your classes with the following attributes to take advantage of its capabilities:

  • [Actor]: Marks a class as an actor, that is, a reusable dataflow block. The class must be declared partial so that the generated code can extend it.
  • [InitialStep]: Denotes the first step in the dataflow pipeline; its next argument names the step that receives its output.
  • [Step]: Marks an intermediate step in your dataflow network; it also takes a next argument.
  • [LastStep]: Marks the final step in your dataflow network, whose output becomes the actor's output.

Known Limitations

Currently, the generated code cannot send a step's output to more than one recipient (branching), and it does not support multiple initial steps. Both features would be beneficial and are on the list for development in the near future.
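Until branching is supported, a fan-out has to be wired by hand. In plain TPL Dataflow, one common approach is a BroadcastBlock that offers every message to each linked target, optionally filtered by a link predicate. The sketch below uses hypothetical names and is not something ActorSrcGen generates; it sends every value to one consumer and only even values to another.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FanOutSketch
{
    // Sends every input to one consumer and only even inputs to a second consumer.
    public static async Task<(List<int> All, List<int> Evens)> RunAsync(IEnumerable<int> inputs)
    {
        var all = new List<int>();
        var evens = new List<int>();
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Clone function: ints are value types, so returning the value itself is enough.
        var broadcast = new BroadcastBlock<int>(x => x);
        var allSink = new ActionBlock<int>(x => all.Add(x));     // unbounded, one item at a time
        var evenSink = new ActionBlock<int>(x => evens.Add(x));

        broadcast.LinkTo(allSink, link);
        broadcast.LinkTo(evenSink, link, x => x % 2 == 0);      // predicate filters this branch only

        foreach (var x in inputs)
            broadcast.Post(x); // an unbounded BroadcastBlock always accepts

        broadcast.Complete();
        await Task.WhenAll(allSink.Completion, evenSink.Completion);
        return (all, evens);
    }
}
```

BroadcastBlock does not wait for slow consumers: a message that a bounded target declines can be lost, which is why the consumers here are deliberately unbounded.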

Conclusion

ActorSrcGen is a powerful tool that takes much of the complexity out of concurrent programming in C#. By automating the generation of boilerplate code and promoting a clean, object-oriented design, it allows developers to build high-performance pipeline systems with ease. When used in conjunction with the DataflowEx library, ActorSrcGen empowers developers to create robust, maintainable, and efficient concurrent applications.

To get started, simply install the NuGet package, annotate your classes with the appropriate attributes, and enjoy a more streamlined and productive coding experience. With ActorSrcGen, you can unlock the full potential of TPL Dataflow and take your concurrent C# applications to the next level.