Retrospective on the S3 Bridge for iRODS

I am rapidly approaching the end of my time working on the S3 bridge, and I sit here with a pile of problems and a number of successes. On one hand, I have a reasonably performant project with relatively low overhead that works fairly well for the limited set of operations it set out to support; on the other, I have done very little to prepare for publishing it or for users actually using the thing. So, let's see what worked and what didn't.

What Worked

  1. Boost ASIO and Boost Beast were excellent to work with; their support for C++ coroutines really bridged the pain points that I had run into previously.
  2. Boost.URL is interesting and useful (and absolutely necessary for working with Boost Beast, imho).
  3. nlohmann::json is an excellent library.
  4. fmt is a pleasure to use.
  5. CMake is
  6. Separating the S3 actions into their own files and functions worked well.

Let's look at what we got from the more recent ASIO support for coroutines. The core of the connection accepting machinery is contained in this short function.

asio::awaitable<void> listener()
{
    auto executor = co_await asio::this_coro::executor;
    asio::ip::tcp::acceptor acceptor(executor, {asio::ip::tcp::v4(), port});
    for (;;) {
        asio::ip::tcp::socket socket = co_await acceptor.async_accept(asio::use_awaitable);
        asio::co_spawn(executor, handle_request(std::move(socket)), asio::detached);
    }
}

Were it not for the compiler magic of co_await and its friend co_spawn, this would be a nest of callback-oriented code, and while that's perfectly fine, I guess, it's not a very pleasant experience to write (I suppose I didn't have this reaction to Node.js back before it got async support, so it must just be the overhead of writing a type to manage the context for this). With C++ coroutines, all of that is handled automatically, and honestly it's wonderful. With a bit more machinery in the standard library, C++ coroutines have the opportunity to bring a lot of benefit to the language in the future.

Boost.URL is new enough that it was officially added to Boost during the period I was working on this project, and while I'm sure it was in the works for quite a while, it is definitely not coming a moment too soon.

What Didn't

I had a "my machine first" mentality that has reared its ugly head, and if I had been building Docker images and toolchains from the start I would have a much more realistic view of what the future of this piece of software looks like. Bolting this stuff on at the end is frustrating and nearly derailed the entire project (oof, ouch, my poor local build environment).

vcpkg was the source of a number of mysterious problems at the end, and I'm still not exactly sure what caused it to suddenly start shorting out where it did. This may just be a result of the idiosyncratic way that iRODS handles its dependencies and externals, but it might also point to vcpkg still not quite being good enough for the purpose it seeks to fulfill (I'm sure the development team has plenty of ideas to bring the ring closer to being closed, even if C and C++ compilation and dependency management are likely to remain a source of trouble for quite a while).

I think the best way to prevent this is to make sure it works that way to begin with. Local compilation is nice, but if I can't build it in a Docker container, what hope do other people have of building it? People who aren't the author of a program tend to be far less tolerant of difficulty building it than the author is, just as they are less tolerant of debugging its problems.

Daily Report(7/12)

I have spent today cleaning up comments from previous commits. I have come to the conclusion that the hooks exist for the sake of the plugins, which actually maintain their own state, so the configuration_hook stuff will have to go into the iRODS core library rather than the executable code.

That way I can put all the authentication initialization in the place where it makes sense, the authentication plugins themselves.

What that might require is adding a hook to load new plugins, but that strikes me as being a potentially thorny situation.

Daily Report(7/1)

Today was successful. I was able to get the control plane to stop crashing and get the configuration tests to run to completion properly. The benchmark is written, and reloading the configuration is so much faster than killing the iRODS server and starting it again.

Of course, that means it has limitations. Currently I believe the authentication modules do not work properly when the settings are changed. Honestly, this isn't a very simple program; Apache is way simpler than iRODS, especially in terms of dependencies, and I'm skeptical that this can be made to work perfectly in all situations.

The problem I had yesterday was the control plane catching the SIGHUP signal and terminating the server. With that fixed, it's working like a charm; all I need to do now is see if we can replace some more of the calls to restart in the tests. Might even shave a percent off the test times 🙂

Daily Journal(6/29)

Today has been frustrating, beset by constant crashes and test failures. The previous bug where the delay server would die constantly because runtime values were erased appears to have returned with a vengeance.

So far I have been isolating the cause of the fault in the reload function (the culprit turned out to be the .end() sentinel).

After all that, I have it working again, or at least working enough to fail tests.

Daily Report(6/27)

Today has been spent addressing comments made on the pull request.

  • Address the field visibility of the acquire_*_lock functions, which implied the class was meant to be derived from
  • Replace rodsLog calls in get_server_property with the modern logging system
  • Restore some whitespace

In addition, I updated the rodsLog calls in rodsServer and irods_server_properties.

I still need to rewrite the hook manager to avoid introducing yet another singleton.

Daily Report(6/22)

I am starting today by running more of the test suites to see if the changes have broken anything unrelated. So far so good, except for finding that the build+test script was running the specific test "0" (an uninitialized variable).

Resource testing erratically fails in the same spots as before. I'm going to assume that that's fixed in main.

I should really pull from main sooner rather than later, rebasing could get tricky if I'm not careful.

Oddly enough, moving the delay server tests toward using .reload_configuration() made things slower. This seems to be because, until now, the delay server did not refresh the amount of time it waits. I suspect there will be many more instances of things like that.

Proving that true, the delay server will need to be kicked over whenever the number of executors or the size of the queue changes, as boost::asio::thread_pool does not appear to support changing the number of threads mid-lifetime. I should be able to swap it out after calling .stop() and .join() on the thread pool (I will not use placement new for assignment) :p

Right now, it appears not to be killing the delay server properly. Or grandpa (the top-level iRODS process) is falling down unexpectedly.

I need to run over some of this stuff with K and T tomorrow.

The delay server should use the hook manager to change the executor count, and it should probably use .reload instead of .capture.

Daily Report(6/21)

Today I started by deciding to work on what I anticipate to be an especially complicated bit: the authentication system. Later in the day, I admitted I should work up to that, so instead I'm looking for low-hanging fruit.

One of the first things I found is that changes in the server_property object were not being recorded adequately. I have fixed this, somehow.

I have added '/' as a subobject path character in msi_get_server_property.

The first place I investigated for replacing the calls to .restart in the tests is test_delay_queue. So far it appears that it may be using the restarts for things beyond changing settings in many places.

Tests where .restart was replaced with .reload_configuration:

  • test_sigpipe_in_delay_server
  • test_delay_queue_with_long_job (no)
  • test_exception_in_delay_server
  • test_static_peps_can_delay_rules_without_session_variables
  • test_session_variables_can_be_used_in_delay_rules_indirectly
  • test_delay_block_with_output_param__3906
  • test_delay_queue_connection_refresh

Daily Report(6/20)

Currently the issue I am facing is the rule language. Given that this was a sticking point last year, I can't say I'm surprised. Currently I can't see what msi_get_server_property is actually returning, and given that the rules run in the iRODS agent server, it may need additional logic there.

I am having trouble getting the rule to write the property out to stdout, which is not making this task any simpler. I suspect that weird issues like this will continue to pop up.

The output appears normal to me.

Weekly Journal(6/13-6/18)

This week was spent writing the configuration_hook_manager's code, and I have started writing tests.

The primary issues I have run into were moving between machines and a performance problem with my desktop's hard drive substantially slowing my work.

msiModAVUMetadata is a complicated little microservice that I'm not sure about. It seems weirdly designed as a function to be called, but it does feel like you're driving some little bit of imeta.

I have added a flag into my build+test script in order to allow specifying the tests you want to run.

Right now I need to add the configuration reloading code back to the cron thread; it'll just use SIGUSR1 as the trigger to enable it. Then again, that might not be necessary, as the configuration is indeed being reloaded.

Currently I am having trouble getting the test to finish, and there is a conspicuous lack of logs related to the specific failure. Python's lack of ahead-of-time syntax checking is annoying.