Retrospective on the S3 Bridge for iRODS

I am rapidly approaching the end of my time working on the S3 bridge, and I sit here with a pile of problems and a number of successes. On one hand, I have a reasonably performant project with relatively low overhead that works fairly well for the limited set of operations it set out to support; on the other, I have done very little to prepare for publishing it or for users actually running the thing. So, let's see what worked and what didn't.

What Worked

  1. Boost ASIO and Boost Beast were excellent to work with; their support for C++ coroutines really bridged the pain points I had run into previously.
  2. Boost URL is interesting and useful (and absolutely necessary for working with Boost Beast, imho).
  3. Nlohmann JSON is an excellent library.
  4. fmt is a pleasure to use.
  5. CMake is
  6. Separating the S3 actions into their own files and functions worked well.

Let's look at what we got from the more recent ASIO support for coroutines. The core of the connection-accepting machinery is contained in this short function:

asio::awaitable<void> listener()
{
    auto executor = co_await this_coro::executor;
    asio::ip::tcp::acceptor acceptor(executor, {asio::ip::tcp::v4(), port});
    for (;;) {
        asio::ip::tcp::socket socket = co_await acceptor.async_accept(asio::use_awaitable);
        asio::co_spawn(executor, handle_request(std::move(socket)), asio::detached);
    }
}

Were it not for the compiler magic of co_await and its friend co_spawn, this would be a nest of callback-oriented code, and while that's perfectly fine, I guess, it's not a very pleasant experience to write (I suppose I didn't have this reaction to next.js back before it got async support, so it must just be the overhead of writing a type to manage the context for this). With C++ coroutines, all of that is handled automatically, and honestly it's wonderful. With a bit more machinery in the standard library, C++ coroutines have the opportunity to bring a lot of benefit to the language in the future.

Boost URL is new enough that it was officially added to Boost during the period I was working on this project, and while I'm sure it was in the works for quite a while, it is definitely not coming a moment too soon.

What Didn't

I had a "my machine first" mentality that has reared its ugly head, and if I had been building Docker images and toolchains from the start, I would have a much more realistic view of what the future looks like for this piece of software. Bolting this stuff on at the end is frustrating and nearly derailed the entire project (oof, ouch, my poor local build environment).

vcpkg was the source of a number of mysterious problems at the end, and I'm still not exactly sure what caused it to suddenly start shorting out where it did. This may just be a result of the idiosyncratic way that iRODS handles its dependencies and externals, but it might also point to vcpkg not yet being quite good enough for the purpose it seeks to fulfill. (I'm sure the development team has plenty of ideas for closing that gap, even if C and C++ compilation and dependency management are likely to remain a source of trouble for quite a while.)

I think the best way to prevent this from happening again is to make sure the project builds that way from the very beginning. Local compilation is nice, but if I can't build it in a Docker container, what hope do other people have of building it? People who aren't the author of a program tend to be far less tolerant of difficulty building it than the author is, just as they are less tolerant of debugging its problems.

daily report(7/12)

I have spent today cleaning up comments from previous commits. I have come to the conclusion that the hooks exist for the sake of the plugins that actually maintain their own state, so the configuration_hook stuff will have to go into the iRODS core library rather than the executable code.

That way I can put all the authentication initialization in the place where it makes sense, the authentication plugins themselves.

What that might require is adding a hook to load new plugins, but that strikes me as being a potentially thorny situation.

Daily report(7/1)

Today was successful. I was able to get the control plane to stop crashing and the configuration tests to run to completion properly. The benchmark is written, and reloading the configuration is so much faster than killing the iRODS server and starting it again.

Of course, that means it has limitations. Currently I believe the authentication modules do not work properly when the settings are changed. Honestly, this isn't a very simple program; Apache is way simpler than iRODS, especially in terms of dependencies, and I'm skeptical that this can be made to work perfectly in all situations.

The problem I had yesterday was the control plane catching the SIGHUP signal and terminating the server. With that fixed, it's working like a charm; all I need to do now is see if we can replace some more of the calls to restart in the tests. Might even shave a percent off the test time 🙂

Daily Report(6/27)

Today has been spent addressing comments made on the pull request.

  • Address the fact that the visibility of the acquire_*_lock functions implied the class was meant to be derived from
  • Replace the rodsLog calls in get_server_property with the modern logging system
  • Restore some whitespace

In addition, I updated the rodsLog calls in rodsServer and irods_server_properties.

I still need to rewrite the hook manager to avoid introducing yet another singleton.

Daily Report(6/22)

I am starting today by running more of the test suites to see if the changes have broken anything unrelated. So far so good, except for finding that the build+test script was running the specific test "0" (an uninitialized variable).

Resource testing seems to fail erratically in the same spots as before. I'm going to assume that's fixed in main.

I should really pull from main sooner rather than later, rebasing could get tricky if I'm not careful.

Oddly enough, moving the delay server tests toward using .reload_configuration() made things slower. This seems to be because, until now, the delay server did not refresh the amount of time it waits between passes. I suspect there will be many more instances of things like that.

Proving that true, the delay server will need to be kicked over whenever the number of executors or the size of the queue changes, as boost::asio::thread_pool does not appear to support changing the number of threads mid-lifetime. I should be able to swap it out after calling .stop and .join on the thread pool (I will not use placement new for assignment) :p

Right now, it appears not to be killing the delay server properly. Or grandpa (the top-level iRODS process) is falling down unexpectedly.

I need to run over some of this stuff with K and T tomorrow.

The delay server should use the hook manager to change the executor count, and it should probably use .reload instead of .capture.

Daily report(6/20)

Currently the issue I am facing is the rule language. Given that this was a sticking point last year, I can't say I'm surprised. Right now I can't see what msi_get_server_property is actually returning, and given that the rules run in the iRODS agent server, it may need additional logic there.

I am having trouble getting the rule to write out the property to stdout, which is not making this task simpler. I have a suspicion that there will continue to be weird issues that will pop up.

The output appears normal to me.

Weekly journal(6/13-6/18)

This week was spent working on writing the configuration_hook_manager's code. I have started working on writing tests.

The primary issues I have run into include moving between machines and a performance problem with my desktop's hard drive that is substantially slowing my work.

msiModAVUMetadata is a complicated little microservice that I'm not sure about. It seems weirdly designed as a function to be called, but it does feel like you're using some little bit of imeta.

I have added a flag into my build+test script in order to allow specifying the tests you want to run.

Right now I need to add the configuration reloading code back to the cron thread; it'll just use SIGUSR1 as a trigger to enable it. Then again, that might not be necessary, as the configuration is indeed being reloaded.

Currently I am having trouble getting the test to finish, and there is a conspicuous lack of logs related to the specific failure. Python's lack of ahead-of-time syntax checking is annoying.

Daily report(6/16/2022)

Work continues slowly on the configuration_hook_manager. I moved my installation over to an SSD, so development is happening faster now.

At some point I should look into running an lsp server from inside the irods builder, I might actually be able to get good completion and navigation.

I have moved the configuration update logic so that it reloads on every iteration of the main thread's loop. This may need revision or further amendment in other main loops.

The next thing I need to do is write tests for this, and then revise the existing tests to use it instead of .restart() where possible. There are three places I would like to see it work: some basic delay server properties, and some agent server properties that are relevant.

I don't think a more comprehensive JSON differ would do more to satisfy the needs of the server thus far. If something like that becomes necessary, it shouldn't be too hard to adapt one.

The tests will require a rule that queries the server's server_properties, so I have written a microservice called msi_get_server_property to expose them. With luck this will let the tests avoid all the timing issues that reading the log is susceptible to.

iRODS daily report(6/15/2022)

I have continued to work on configuration_hook_manager. It is slowly coalescing into something that can truthfully be said to compile. There are a few issues I still have to figure out, such as how to handle nested properties, and whether dispatching every event to every hook makes sense from a performance point of view (okay, fine, it won't matter yet, but it might one day).

Another wrinkle in this design is that it does not currently detect changes to nested objects such as advanced_settings in a manner I find satisfying; namely, anything inside such an object has to watch for changes on its own.

Much of today was spent getting the development/testing environment working 100% again. The build script now cleans up everything from previous invocations, and takes options to skip rebuilding iRODS and to set the number of tests to run at once (I can't run 16 at once, my hard drive is too slow for that, but 2 works).

But as I write this, I'm not sure this is the right approach; I think the immediately desired outcome (making testing faster by eliminating restarts) might not even need the configuration hook manager.

I have yet to attach the SIGHUP handler to the reload mechanism. And I still need to talk to J about their ideas for this.

iRODS progress report 6/3/2022

What's been done

  • Development environment copied to laptop and working.
  • Using shared_mutex to synchronize access to the server_properties object
    • [x] Server compiles
    • [x] Server tests run successfully without adding regular calls to server_properties::capture
    • [ ] Server tests run successfully with calls to capture.
  • [ ] Integrate calls to capture back into the cron system

Problems run into so far

  • I didn't have vim installed on my laptop
  • The build script was set to use the 16 cores on my desktop
  • Even after adding the shared_mutex, it still crashes when capture is called

What's next

  • Use --leak-containers to keep the container up long enough for me to muck around in it after running the script and see what happens. Alternatively, I could use it to just stand up a server.
  • Investigate where the map() function is called on server_properties. It might be a good idea to do away with it if it is preventing thread safety.