What's new in rotor v0.09



rotor is a non-intrusive, event-loop-friendly C++ actor micro-framework, similar to its elder brothers like caf and sobjectizer. The new release came out under the flag of pluginization, which affects the entire lifetime of an actor.


Actor Linking


The actor system is all about interactions between actors, i.e. sending messages to each other (and producing side effects for the outer world or listening to messages it produces). However, to let a message be delivered to its final actor, that actor should be alive (1); in other words, if actor A is going to send message M to actor B, A should somehow be sure that actor B is online and will not go offline while M is being routed.


Before rotor v0.09, that kind of guarantee was only available through child-parent relations, i.e. between a supervisor and its child-actor. In this case, an actor was guaranteed that a message would be delivered to its supervisor, because the supervisor owned the actor and the supervisor's lifetime covered the respective actor's lifetime. Now, with the release of v0.09, it is possible to link actor A with actor B even when they are not in a parent-child relation, and to make sure that all messages will be delivered after successful linking.


So, linking actors is performed somewhat along these lines:


namespace r = rotor;

void some_actor_t::on_start() noexcept {
    request<payload::link_request_t>(b_address).send(timeout);
}

void some_actor_t::on_link_response(r::message::link_response_t &response) noexcept {
    auto& ec = response.payload.ec;
    if (!ec) {
        // successful linking
    }
}

However, code like this should not be used directly as is, because it is inconvenient. It becomes more obvious if you try linking actor A with two or more actors (B1, B2, etc.), since some_actor_t would have to keep an internal count of how many successful link responses it is still waiting for. And here the pluginization system featured in the v0.09 release comes to the rescue:


namespace r = rotor;

void some_actor_t::configure(r::plugin::plugin_base_t &plugin) noexcept {
    plugin.with_casted<r::plugin::link_client_plugin_t>(
        [&](auto &p) {
            p.link(B1_address);
            p.link(B2_address);
        }
    );
}

Now, this is much more convenient, since link_client_plugin_t is included out of the box with the rotor::actor_base_t. Nevertheless, it's still not enough, because it does not answer a few important questions:

1. When is actor linking performed (and a "by-question": when is actor unlinking performed)?
2. What happens if the target actor (aka "server") does not exist or rejects linking?
3. What happens if the target actor decides to self-shutdown when there are "clients" still linked to it?


To provide answers to these questions, the concept of actor lifetime should be revisited.


Async Actor Initialization And Shutdown


Here, represented in a simplified manner, is how an actor's state usually changes: new (constructor) -> initializing -> initialized -> operational -> shutting down -> shut down


The main job is performed in the operational state, and it is up to the user to define what an actor is to do in its up-and-running mode.


In the I-phase (i.e. initializing -> initialized), the actor should prepare itself for further functioning: locate and link with other actors, establish a connection to the database, acquire whatever resources it needs to be operational. The key point of rotor is that the I-phase is asynchronous, so an actor should notify its supervisor when it is ready (2).


The S-phase (i.e. shutting down -> shut down) is complementary to the I-phase, i.e. the actor is being asked to shut down, and, when it is done, it should notify its supervisor.
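

To make the two phases tangible, here is a minimal sketch using only the hooks that appear later in this article (on_start and shutdown_start); the default actor_base_t behavior completes each phase when nothing blocks it:


namespace r = rotor;

struct trivial_actor_t : r::actor_base_t {
    using r::actor_base_t::actor_base_t;

    // invoked when the I-phase has completed and the actor is operational
    void on_start() noexcept override {
        r::actor_base_t::on_start();
    }

    // invoked when the S-phase begins; async resource release would start here
    void shutdown_start() noexcept override {
        r::actor_base_t::shutdown_start();
    }
};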


While it sounds easy, the tricky bit lies in the composability of actors, when they form Erlang-like hierarchies of responsibilities (see my article on trees of Supervisors). In other words, any actor can fail during its I-phase or S-phase, and that can lead to asynchronous collapse of the entire hierarchy, regardless of the failed actor's location within it. Essentially, the entire hierarchy of actors becomes operational, or, if something happens, the entire hierarchy becomes shut down.


rotor seems unique in its init/shutdown approach. There is nothing similar in caf;
in sobjectizer, there is a shutdown helper, which
serves a function similar to the S-phase above; however, it is limited to one actor only and offers no I-phase, because sobjectizer has no concept of hierarchies (see the update below).


While using rotor, it was discovered that the progress of the I-phase (S-phase) may potentially require many resources to be acquired (or released) asynchronously, which means that no single component, or actor, is able, by its own will, to answer the question of whether it has or has not completed the current phase. Instead, the answer comes as a result of collaborative efforts, handled in the right order. And this is where plugins come into play; they are like pieces, with each one responsible for a particular job of initialization/shutdown.


So, here are the promised answers related to link_client_plugin_t:


  • Q: When is the actor linking or unlinking performed? A: When the actor state is initializing or shutting down respectively.
  • Q: What happens if the target actor (aka "server") does not exist or rejects linking? A: Since this happens when the actor state is initializing, the plugin will detect the failure condition and trigger a client-actor shutdown. That may cause a cascade effect, i.e. its supervisor will be triggered to shut down, too.
  • Q: What happens if the target actor decides to self-shutdown when there are "clients" still linked to it? A: The "server-actor" will ask its clients to unlink, and once all "clients" have confirmed unlinking, the "server-actor" will continue the shutdown procedure (3).

A Simplified Example


Let's assume that there is a database driver with an async interface for one of the event loops available to rotor, and that TCP clients will be connecting to our service. The database will be served by db_actor_t, and the client-facing service will be named acceptor_actor_t. The database actor is going to look like this:


namespace r = rotor;

struct db_actor_t: r::actor_base_t {

    struct resource {
        static const constexpr r::plugin::resource_id_t db_connection = 0;
    };

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([this](auto &p) {
            p.register_name("service::database", this->get_address());
        });
        plugin.with_casted<r::plugin::resources_plugin_t>([this](auto &) {
            resources->acquire(resource::db_connection);
            // initiate async connection to database
        });
    }

    void on_db_connection_success() {
        resources->release(resource::db_connection);
        ...
    }

    void on_db_disconnected() {
        resources->release(resource::db_connection);
    }

    void shutdown_start() noexcept override {
        r::actor_base_t::shutdown_start();
        resources->acquire(resource::db_connection);
        // initiate async disconnection from database, e.g. flush data
    }
};

The inner namespace resource is used to identify the database connection as a resource. It is good practice, better than hard-coding magic numbers like 0. During the actor configuration stage (which is part of initialization), when registry_plugin_t is ready, it will asynchronously register the actor address under the symbolic name service::database in the registry (shown further below). Then, with resources_plugin_t, it acquires the database connection resource, blocking any further initialization, and launches the connection to the database. When the connection is established, the resource is released, and the db_actor_t becomes operational. The S-phase is symmetrical, i.e. it blocks shutdown until all data is flushed to the DB and the connection is closed; once this step is complete, the actor continues its shutdown (4).


The client acceptor code should look like this:


namespace r = rotor;
struct acceptor_actor_t: r::actor_base_t {
    r::address_ptr_t db_addr;

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([&](auto &p) {
            p.discover_name("service::database", db_addr, true).link();
        });
    }

    void on_start() noexcept override {
        r::actor_base_t::on_start();
        // start accepting clients, e.g.
        // asio::ip::tcp::acceptor.async_accept(...);
    }

    void on_new_client(client_t& client) {
        // send<message::log_client_t>(db_addr, client)
    }
};

The key point here is the configure method. When registry_plugin_t is ready, it is configured to discover the name service::database and, when found, store it in the db_addr field; it then links the actor to the db_actor_t. If service::database is not found, the acceptor shuts down (i.e. on_start is not invoked); if the linking is not confirmed, the acceptor shuts down, too. When everything is fine, the acceptor starts accepting new clients.


The operational part itself is omitted for the sake of brevity, because it hasn't changed in the new rotor version: one needs to define payloads and messages (including request and response types), define the methods which will accept the messages, and finally subscribe to them.
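

For a taste of what that looks like, here is a minimal sketch of a one-way message for the acceptor-to-database interaction above; payload::log_client_t, its field and on_log_client are hypothetical names, while the subscription itself goes through the starter plugin:


namespace r = rotor;

namespace payload {
struct log_client_t {
    // hypothetical payload: whatever identifies the connected client
    std::string peer_endpoint;
};
}

namespace message {
using log_client_t = r::message_t<payload::log_client_t>;
}

// inside db_actor_t's configure(), in addition to the plugins shown above:
//   plugin.with_casted<r::plugin::starter_plugin_t>([](auto &p) {
//       p.subscribe_actor(&db_actor_t::on_log_client);
//   });

void db_actor_t::on_log_client(message::log_client_t &msg) noexcept {
    // record msg.payload.peer_endpoint in the database
}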


Let's bundle everything together in a main.cpp. Let's assume that the boost::asio event loop is used.


namespace asio = boost::asio;
namespace r = rotor;

...
asio::io_context io_context;
auto system_context = r::asio::system_context_asio_t::ptr_t{new r::asio::system_context_asio_t(io_context)};
auto strand = std::make_shared<asio::io_context::strand>(io_context);
auto timeout = r::pt::milliseconds(100);
auto sup = system_context->create_supervisor<r::asio::supervisor_asio_t>()
               .timeout(timeout)
               .strand(strand)
               .create_registry()
               .finish();

sup->create_actor<db_actor_t>().timeout(timeout).finish();
sup->create_actor<acceptor_actor_t>().timeout(timeout).finish();

sup->start();
io_context.run();

The builder pattern is actively used in rotor v0.09. Here, the root supervisor sup was created with 3 actors instantiated on it: the user-defined db_actor_t and acceptor_actor_t, and an implicitly created registry actor. As is typical for an actor system, all actors are decoupled from one another, sharing only message types (skipped here).


All actors are simply created here; the supervisor does not know the relations between them, because actors are loosely coupled and have become more autonomous since v0.09.


Runtime configuration can be completely different: actors can be created on different threads, on different supervisors, and even using different event loops, while the actor implementation remains the same (5). In that case, there will be more than one root supervisor; however, to let them find each other, the registry actor's address should be shared between them. This is also supported via the get_registry_address() method of supervisor_t.
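

A sketch of how that sharing might look; system_context2 and strand2 are assumed to belong to the second thread/loop, and the registry_address(...) builder option is assumed to mirror the create_registry() one above:


// on the first supervisor, which owns the registry:
auto registry_addr = sup->get_registry_address();

// on the second root supervisor (another thread or event loop):
auto sup2 = system_context2->create_supervisor<r::asio::supervisor_asio_t>()
                .timeout(timeout)
                .strand(strand2)
                .registry_address(registry_addr)
                .finish();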


Summary


The most important feature of rotor v0.09 is the pluginization of its core. Among the plugins, the most important are: link_client_plugin_t, which maintains a kind of "virtual connection" between actors; registry_plugin_t, which allows registering and discovering actor addresses by their symbolic names; and resources_plugin_t, which suspends actor init/shutdown until external asynchronous events occur.


There are a few less prominent changes in the release, such as the new non-public properties access and builder pattern for actor construction.


Any feedback on rotor is welcome!


PS. I'd like to say thanks to Crazy Panda for supporting me in my actor model research.


Notes


(1) Currently, it leads to a segfault upon an attempt to deliver a message to an actor whose supervisor has already been destroyed.


(2) If it does not, an init-request timeout will occur, and the actor will be asked by its supervisor to shut down, i.e. to bypass the operational state.


(3) You might ask: what happens if a client-actor does not confirm unlinking in time? Well, this is somewhat of a contract violation, and the system_context_t::on_error(const std::error_code&) method will be invoked, which, by default, will print the error to std::cerr and invoke std::terminate(). To avoid contract violation, shutdown timeouts should be tuned to allow client-actors to unlink in time.


(4) During shutdown, the registry_plugin_t will unregister all registered names in the registry.


(5) The exception is when different event loops are used and actors use the event loop API directly; they will, obviously, have to change following the event loop change, but that's beyond rotor's scope.


Update


During the discussions with the sobjectizer author below, it was clarified that the sobjectizer shutdowner and stop guards offer "long-lasting" shutdown actions; however, their main purpose is to give some actors additional time to shut down, even if stop was invoked on the Environment. Asynchronous shutdown (and initialization) similar to rotor's I-phase and S-phase can be modeled via actor states, if needed. This is, however, the framework user's responsibility, contrary to rotor, where it is the framework's responsibility.


Comments


    Thanks for the article. It's interesting to know that actor frameworks for C++ are continuously evolving.


    However, there are two points I miss in your article:


    1. I think there should be at least two pictures/schemes that can make the article more clear. The first picture could show the relationship(s) between actors. The second picture could show stages of the lifetime of an actor as a diagram. Such pictures can be a good addition to textual explanations and can make reading your description easier.
    2. It's a pity that you show the usage of out-of-box plugins but say nothing about the plugin subsystem itself. What is a plugin? Which tasks can be solved by using plugins? Can a user write its own plugin?

    And I want to ask you: is it a good idea to write some clarification about the mentioned SObjectizer-related parts in a comment for the article?


      The Russian version of the article is coming, so I'll add the missing pictures to it, thanks for the advice.


      Let me answer the 2nd question here. The abstract answer to the question "What is a plugin?" is: a plugin is a behavioral aspect of an actor, i.e. how the actor reacts to a particular message (or group of related messages), possibly exposing some convenient API for sending messages.


      It's better to give a few examples of built-in plugins: the address_maker plugin is a shortcut for asking the actor's supervisor to create a new address; the init_shutdown plugin does proper actor housekeeping on init and shutdown requests (messages); the child_manager is a supervisor-specific plugin, which allows spawning child-actors and reacts to their init- and shutdown-responses.


      Can a user write their own plugin? Yes, but from my experience there is no need for that, as the resources_plugin covers an actor's interactions with external event loops, and all the other plugins are quite specific to rotor internals. There is a nuance when you'd like to inspect the message flow and the built-in message dumper does not suit you (e.g. you'd like to enable message traffic only for a specific supervisor, or to add your own filtering logic or decorations for your custom messages); in that case, you'd need to write a plugin (and your own supervisor, and insert it there). Again, these are the depths of rotor, and currently I tend to view the plugin system as not being a public API; hence, it is not documented.


      Yes, please, clarify the sobjectizer-related parts… I'll update the current article as well as the planned Russian one. I think it would be valuable for the users of both sobjectizer and rotor.

        Again, these are the depths of rotor, and currently I tend to view the plugin system as not being a public API; hence, it is not documented.

        In that case, "pluginization" of the rotor is just an implementation detail that could not be seen as a valuable addition for a user. You, as a maintainer of the framework, can have significant benefits from "pluginization", but end user will see your standard plugins just as subparts of the tool.


          While it is not documented how a user can write their own plugins, it is assumed that a user will use the shipped plugins, e.g. as in the example below:


          resources->acquire(resource::db_connection);

          You are right in the sense that it is part of the actor API, and not of its underlying implementation (the plugin API).


            My notice is more related to the promotion of the new version of rotor. I think that materials used for the promotion of a new version should emphasize the things valuable to end users. If your plugins can mostly be seen as part of the public API, then for an end user there is no difference in how that API is actually implemented: as a plugin or as a hardcoded, non-changeable part.

          Yes, please, clarify the sobjectizer-related parts…

          Agents in SObjectizer also have I- and S-phases.


          I-phase is implemented by two virtual methods of agent_t class. The first is so_define_agent method. This method is called during the preparation of resources needed for an agent. Usually so_define_agent is used for the creation of subscriptions for the agent and for switching the agent to the initial state. It's important to note that during the call to so_define_agent the agent is not a part of the SObjectizer Environment yet. And if some error occurs during this stage then the agent will automatically be removed and all associated resources will automatically be freed.


          The second method is so_evt_start. It is the first method called for an agent after the successful registration of the agent in the SObjectizer Environment. This method is called on the context of the dispatcher the agent is bound to. And this method is usually used for performing initial actions like sending messages or registering new agents.


          When so_evt_start is called the agent is already a part of the SObjectizer Environment. It means that a failure inside so_evt_start (e.g. an exception is thrown out from that method) is treated like any other agent's failure.


          The S-phase is implemented via the so_evt_finish method. This is the last method called for an agent on the context of the agent's dispatcher. It's rarely used, but if so it is usually used for sending some messages or performing some cleanup action that can only be performed on the context of agent's dispatcher.


          So the agents in SObjectizer have their lifetimes in the form "not_registered" -> "in_registration_phase" -> "registered" -> "deregistering" -> "destroyed". Where so_define_agent is called during the switch from "not_registered" to "in_registration_phase", so_evt_start is called during the switch from "in_registration_phase" to "registered", so_evt_finish is called during the switch "registered" -> "deregistering".
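

          For illustration, a minimal sketch of an agent using these callbacks (hello_t and on_hello are illustrative names, not from SObjectizer itself):


          struct hello_t {}; // illustrative message type

          class my_agent final : public so_5::agent_t {
          public:
              using so_5::agent_t::agent_t;

              // called before registration: create subscriptions here
              void so_define_agent() override {
                  so_subscribe_self().event(&my_agent::on_hello);
              }

              // first method called after successful registration
              void so_evt_start() override {
                  so_5::send<hello_t>(*this); // e.g. initiate some activity
              }

              // last method called, on the agent's dispatcher context
              void so_evt_finish() override { /* cleanup, if needed */ }

          private:
              void on_hello(so_5::mhood_t<hello_t>) { /* handle the message */ }
          };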


          The shutdowner from so5extra (as well as stop_guards) is intended for a different task: preventing the fast shutdown of the SObjectizer Environment after the invocation of the environment_t::stop method, if some agents need to perform long-lasting shutdown-related actions.

            Agents in SObjectizer also have I- and S-phases.

            I think there is some misunderstanding… so_evt_start in sobjectizer (and on_start in rotor) is not designed to be part of initialization; instead, the method's purpose is to let the agent (actor) trigger some activity, e.g. request something or do some I/O. Otherwise, without these methods, as actors are passive/reactive by nature, there would be nobody to initiate any activity.


            So, so_define_agent is the only proper location for initialization and resource acquisition. However, both so_define_agent and so_evt_start are passive (from the agent's perspective, as they are called from outside); they are just lifetime callbacks, not phases.


            Let me give an abstract example of what I mean:


            struct actor_t {
                void on_init() { ... }
                void on_start() { ... }
            };

            Here is the on_init method. After it, the actor is either initialized or not; it has no chance to do "long" (aka asynchronous) initialization. This is because it is a simple callback, not a phase/process. Any resources, if needed, can be acquired synchronously only. Contrast that with the following actor:


            struct actor_async_t {
                void on_start() { ... }
                void on_init(init_token_t t) {
                    ...; // subscribe to messages or trigger I/O on event loop
                    token = t;
                }
            
                void on_message_or_event_loop_event(...) {
                    token.init_complete();
                }
            
                void on_other_message_or_fail_event_loop_event(...) {
                    token.init_fail(error_code);
                }
            
                init_token_t token;
            };

            After on_init, the async actor is still initializing and might postpone the initialization decision until it gets the whole picture (i.e. until it receives enough messages/events to make the init decision).


            Here is a more concrete example: I'm writing a torrent-like client. There is a protocol: the client should announce itself to a remote server before searching for other peers. The announce message contains a public endpoint (i.e. a reachable IP:port pair, which must previously be opened on the router via the UPNP protocol).


            Using the init-phase, it could be done as follows. The communicator_actor (which announces itself and searches for other peers) discovers and links, during its initialization phase, to the upnp_actor (alternatively: it can spawn & link to the upnp_actor), and then makes a request for the public endpoint to the upnp_actor. The upnp_actor actually makes an HTTP request, receives the HTTP response, parses it and replies back to the communicator_actor. If everything is OK, the communicator_actor announces itself, and only then completes its initialization (and only then can it be asked to search for peers). The entire process is async and non-blocking.
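

            A rough sketch of the communicator side in rotor terms, following the resources pattern from the article (the service name, the resource id and the commented-out request payload are assumptions):


            namespace r = rotor;

            struct communicator_actor_t : r::actor_base_t {
                struct resource {
                    static const constexpr r::plugin::resource_id_t announce = 0;
                };

                r::address_ptr_t upnp_addr;

                void configure(r::plugin::plugin_base_t &plugin) noexcept override {
                    plugin.with_casted<r::plugin::registry_plugin_t>([&](auto &p) {
                        p.discover_name("service::upnp", upnp_addr, true).link();
                    });
                    plugin.with_casted<r::plugin::resources_plugin_t>([&](auto &) {
                        // block the I-phase until the endpoint is known & announced
                        resources->acquire(resource::announce);
                        // request<payload::endpoint_request_t>(upnp_addr).send(timeout);
                    });
                }

                void on_announced(/* response message */) {
                    // the whole chain succeeded; initialization may now complete
                    resources->release(resource::announce);
                }
            };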


            In reality, there might be a few other layers of indirection, like adding a resolver_actor and an http_actor, which also participate in the I-phase of the upnp_actor. So the I-phase is scalable in that sense; and, as you can see, it is a process, not a single callback invocation.


            I really don't know how to achieve that with simple actor lifetime callbacks. The only way I see is to drop actor layering/communication in so_define_agent() (because the agent is not yet part of the Environment), do the HTTP request synchronously (including resolving and connecting), and synchronously wait for the HTTP response.


            That's why the article says that the shutdowner in sobjectizer mimics the S-phase in rotor: because it is "long lasting" and not a simple callback.


              It seems the root of the misunderstanding is the point of view on lifetime. There are at least two different viewpoints:


              • viewpoint of actor framework;
              • viewpoint of end-user who writes actual actors.

              From the actor framework's viewpoint, an actor is initialized when all related resources are allocated and bound to it, and it is able to receive and handle messages sent to it. I am convinced that from that viewpoint the initialization phase can't be asynchronous and can't last long.


              From the user's viewpoint, the initialization can sometimes require a lot of interaction with other actors and can take a long time. But the actor framework itself sees the actor as fully initialized and working. So such initialization is a part of the business logic, not a part of the lifetime from the actor framework's point of view.


              In SObjectizer, if an actor requires a long-lasting initialization phase, it is implemented via the agent's states. This is also true for a long-lasting deinitialization phase.
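

              For example, a long-lasting I-phase could be sketched via states like this (worker, init_done_t and request_t are illustrative names):


              struct init_done_t {}; // illustrative messages
              struct request_t {};

              class worker final : public so_5::agent_t {
                  // business-logic states modelling initialization and normal work
                  state_t st_initializing{ this, "initializing" };
                  state_t st_working{ this, "working" };

              public:
                  using so_5::agent_t::agent_t;

                  void so_define_agent() override {
                      this >>= st_initializing; // start in the initializing state
                      st_initializing.event(&worker::on_init_done);
                      st_working.event(&worker::on_request);
                  }

              private:
                  void on_init_done(so_5::mhood_t<init_done_t>) {
                      this >>= st_working; // initialization finished
                  }
                  void on_request(so_5::mhood_t<request_t>) { /* normal work */ }
              };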


              Shutdowner and stop-guards are necessary in SObjectizer just because a call to environment_t::stop can finish all message processing almost immediately, without any chance to send or receive new messages.


              It also seems that your choice is to integrate the concept of initialization and deinitialization phases (from the business logic's point of view) into the framework.


                Thanks a lot for the explanation. I have published an update to the article with the sobjectizer-related corrections, based on your comments. The Russian version of the article will be published with the corrected information.


                I completely agree with your statements.


                The underlying reason is that rotor's initialization and shutdown phases were asynchronous since the beginning, because subscription to addresses is done via messaging too, which is non-atomic, especially if an address belongs to a different thread/event loop. I consider this one of rotor's features; however, if you don't need it, it brings unneeded complexity, and then it's better to use sobjectizer :)


          I want to clarify one thing related to rotor. Let's assume we have a service actor that receives do_something messages. That actor processes those messages and sends replies to the address specified inside the do_something message. Something like:


          struct do_something {
             ... // Some parameters.
             rotor::address_ptr_t reply_to_; // Address for the result.
          };
          
          class service : public rotor::actor_base_t {
             ...
             void on_do_something(rotor::message_t<do_something> &cmd) {
                ... // Actual processing.
                send<some_result>(cmd.payload.reply_to_);
             }
          };

          I suspect that the usage of that pattern isn't safe in rotor, because at the time of calling send in service, the destination of some_result may already be gone. And the safe way of using such a pattern is to establish a link between the service and destination actors before sending some_result.


          PS. There could be more complex scenarios, like actor A sending do_something to the service actor with the address of actor B in do_something::reply_to.


            Yes, you've correctly caught the issue, and the solution is the virtual linking. However, the usage pattern is slightly wrong, as there is no need to ship reply_to directly in the messages (it is included implicitly if the request/response pattern is used). For regular messages it would look like:


            namespace payload {
            struct do_something_t { ... };
            }
            
            namespace message {
            using do_something_t = rotor::message_t<payload::do_something_t>;
            }
            
            struct client_actor_t {
               void configure(...) {
                    // discover and link to server;  
               } 
            
              void on_start() {
                 // safe to do as it is already linked to server
                 send<payload::do_something_t>(server, ...);
              }
            
              rotor::address_ptr_t server;
            };
            
            struct server_actor_t {
              void configure(...) {
                  // register self in a registry and subscribe
              }
            
              void on_do_something(message::do_something_t) {
                  ...;
              }
            };
            

            In the case of request/response it will be changed a little bit:


            namespace payload {
            struct request_t { ... };
            struct response_t { ... };
            }
            
            namespace message {
            using request_t = rotor::request_traits_t<payload::request_t>::request::message_t;
            using response_t = rotor::request_traits_t<payload::response_t>::response::message_t;
            }
            
            struct client_actor_t {
               void configure(...) {
                    // discover and link to server;  subscribe
               } 
            
              void on_start() {
                 // safe to do as it is already linked to server
                 request<payload::request_t>(server, ...).send(timeout);
              }
            
              void on_response(message::response_t) {
                ...
              }
            
              rotor::address_ptr_t server;
            };
            
            struct server_actor_t {
              void configure(...) {
                  // register self in a registry and subscribe
              }
            
              void on_request(message::request_t& req) {
                ...;
                // usually it is safe to reply
                reply(req, ...);
              }
            };
            
