Today was successful. I was able to get the control plane to stop crashing and get the configuration tests to run to completion properly. The benchmark is written and reloading the configuration is so much faster than killing the irods server and starting it again.
Of course, it has limitations. The current belief is that the authentication modules do not work properly when the settings are changed. Honestly, this isn't a simple program; Apache is far simpler than iRODS, especially in terms of dependencies, and I'm skeptical that reloading can be made to work perfectly in all situations.
The problem I had yesterday was the control plane catching the SIGHUP signal and terminating the server. With that fixed, it's working like a charm. All I need to do now is see if we can replace some more of the calls to restart in the tests. Might even shave a percent off the test time 🙂
Today has been frustrating, beset by constant crashes and test failures. The earlier bug where the delay server would die constantly because runtime values were erased appears to have returned with a vengeance.
So far I have been isolating what is causing the fault in the reload function. The culprit turned out to be the .end() sentinel.
After all that, I have it working again, well, working enough to fail tests.
Today has been spent addressing comments made on the pull request.
- Address the comment that the visibility of the acquire_*_lock functions implied the class was meant to be derived from
- Replace rodsLog calls in get_server_property with the modern logging system
- Restore some whitespace
In addition, I updated the rodsLog calls in rodsServer and irods_server_properties.
I still need to rewrite the hook manager to avoid introducing yet another singleton.
Today has been spent getting the tests working for the pull request. I apparently need to reformat the files, but for some reason git clang-format isn't making that especially easy.
I am starting today by running more of the test suites to see if the changes have broken anything unrelated. So far so good, except for finding that the build+test script was running the specific test "0" (an uninitialized variable).
Resource testing still fails erratically in the same spots as before. I'm going to assume that that's fixed in main.
I should really pull from main sooner rather than later; rebasing could get tricky if I'm not careful.
Oddly enough, moving the delay server tests towards using .reload_configuration() made things slower. This seems to be because, until now, the delay server did not refresh the amount of time it waits for. I suspect that there will be many more instances of things like that.
Showing that to be true, the delay server will need to be kicked over whenever the number of executors or the size of the queue changes, as it does not appear that boost::asio::thread_pool supports changing the number of threads mid-lifetime. I should be able to swap it out after calling .stop and .join on the thread pool (I will not use placement new for assignment) :p
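The swap-the-pool idea can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`, which also has a fixed worker count for its lifetime. This is only an analogy to the C++ plan, and `ResizablePool` is a hypothetical wrapper, not anything in iRODS. One difference to keep in mind: `shutdown(wait=True)` lets queued work finish, whereas, as I understand it, calling stop on the Boost pool may abandon work that has not started.

```python
from concurrent.futures import ThreadPoolExecutor


class ResizablePool:
    """Hypothetical wrapper that replaces the pool when the size changes."""

    def __init__(self, workers):
        self.workers = workers
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def resize(self, workers):
        if workers == self.workers:
            return  # nothing to do
        # Drain the old pool completely before building its replacement,
        # so there is never a moment with two live pools.
        self._pool.shutdown(wait=True)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.workers = workers

    def close(self):
        self._pool.shutdown(wait=True)
```

Futures handed out before a resize stay valid, because the old pool is drained rather than discarded.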
Right now, it appears not to be killing the delay server properly, or grandpa (the top-level iRODS process) is falling down unexpectedly.
I need to run over some of this stuff with K and T tomorrow.
The delay server should use the hook manager to change the executor count, and it should probably use .reload instead of .capture.
Today I started by deciding to work on what I anticipate to be an especially complicated bit, the authentication system. Later in the day, I admitted I should work up to that, so instead I'm looking for low-hanging fruit.
One of the first things I have found is that the changes in the server_property object are not being recorded adequately. I have fixed this somehow.
I have added '/' as a subobject path character in
The first candidate I investigated for replacing the calls to .restart in the tests is test_delay_queue. So far it appears that, in many places, it may be using the restarts to do things beyond changing settings.
Tests where .restart was replaced with .reload_configuration
Currently the issue I am facing is the rule language. Given that this was a sticking point last year, I can't say that I'm surprised. Currently I can't see what msi_get_server_property is actually returning, and given that they are running in the iRODS agent server, it may need additional logic there.
I am having trouble getting the rule to write out the property to stdout, which is not making this task simpler. I have a suspicion that there will continue to be weird issues that will pop up.
The output appears normal to me.
This week was spent writing the configuration_hook_manager's code. I have started writing tests.
The primary issues that I have run into are moving between machines and the slow hard drive in my desktop, which is substantially slowing down my work.
msiModAVUMetadata is a complicated little microservice that I'm not sure about. It seems kinda weirdly designed as a function to be called, but it does feel like you're using some little bit of imeta.
I have added a flag into my build+test script in order to allow specifying the tests you want to run.
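If the script were Python, a flag like that could look roughly like the sketch below. The flag name `--tests` and the helper names are placeholders, not what the real script uses. Defaulting explicitly to "run everything" also keeps the selection from falling through to some accidental value.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="build+test sketch")
    # Zero or more test names; None means the flag was not given at all.
    parser.add_argument("--tests", nargs="*", default=None,
                        help="names of the tests to run (default: all)")
    return parser.parse_args(argv)


def select_tests(args, all_tests):
    """Return the tests to run, preserving the order of all_tests."""
    if not args.tests:
        return list(all_tests)
    wanted = set(args.tests)
    return [name for name in all_tests if name in wanted]
```

For example, `select_tests(parse_args(["--tests", "test_delay_queue"]), all_tests)` keeps only `test_delay_queue`, provided it appears in `all_tests`.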
Right now I need to add the configuration reloading code back to the cron thread; it'll just use SIGUSR1 as a trigger to enable it. That might not be necessary, as the configuration is indeed being reloaded.
Currently I am having trouble getting the test to finish, and there is a conspicuous lack of logs related to the specific failure. Python's lack of ahead-of-time syntax checking is annoying.
Work continues slowly on the configuration_hook_manager. I moved my installation over to an SSD, so now development is happening faster.
At some point I should look into running an LSP server from inside the iRODS builder. I might actually be able to get good completion and navigation.
I have moved the configuration update logic so that it reloads every loop from the main thread. This may need revision or further amendment in other main loops.
The next thing I need to do is write tests for this, and then revise the existing tests to use this instead of .restart() when possible. There are three places I would like to see work, some basic delay server properties, and some agent server properties that are relevant.
I don't think a more comprehensive JSON differ is needed to satisfy the server's needs thus far. If something like that becomes necessary, it shouldn't be too hard to adapt it.
The tests will require a rule to query the server_properties of the server, so I have written a microservice called msi_get_server_property that allows a rule to query them. With luck it will let the tests avoid all the timing issues that reading the log is susceptible to.
Today I have been investigating enhancing the reload() method with a list of difference objects. So far this has been relatively simple to implement.
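To make "a list of difference objects" concrete, here is a minimal sketch of diffing two nested JSON-like configurations. It is Python for brevity and is not the real reload() code. The `Change` type, the `MISSING` marker, and the use of '/' to join nested key names are all assumptions made for the example.

```python
from typing import Any, List, NamedTuple

MISSING = object()  # marks a key that is absent on one side


class Change(NamedTuple):
    path: str  # nested keys joined with '/'
    old: Any
    new: Any


def diff_config(old: dict, new: dict, prefix: str = "") -> List[Change]:
    """Return one Change per leaf value that differs between old and new."""
    changes = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}/{key}" if prefix else key
        a = old.get(key, MISSING)
        b = new.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Both sides are objects: descend instead of reporting the parent.
            changes.extend(diff_config(a, b, path))
        elif a != b:
            changes.append(Change(path, a, b))
    return changes
```

Lists are compared as whole values here, which is about as simple as a differ can get; anything finer-grained would need per-element handling.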
Next week I plan to figure out the design of what will become the configuration_change_hook_manager facility, but let's go through some preliminary thoughts:
- The hook should be addressed by the configuration property's name.
- The hook should have the new and old values passed to it.
- The hook manager should be invoked on the list of changes to the configuration
- configuration_hook should follow a builder pattern as the
- It should make noise in the log about unhooked configuration changes.
- This is important for debugging and for properly informing admins about the potential issues that not restarting their server might cause.
- It might be worth adding a configuration option to make it error out when a configuration change isn't able to be handled.
- I should talk to J about this since their work is likely to intersect it.
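The preliminary thoughts above can be sketched as follows. This is Python for brevity and only a guess at the shape of the thing, not the eventual C++ design. The class name, the `strict` option, and the exception type are all invented for the example, and the builder idea is left out because it is not worked out yet.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Tuple

Hook = Callable[[Any, Any], None]   # called as hook(old_value, new_value)
Change = Tuple[str, Any, Any]       # (property_name, old_value, new_value)


class UnhandledConfigurationChange(Exception):
    """Raised in strict mode when a changed property has no hook."""


class ConfigurationHookManager:
    def __init__(self, strict: bool = False):
        self._hooks: Dict[str, List[Hook]] = {}
        self._strict = strict
        self._log = logging.getLogger("configuration_hooks")

    def add_hook(self, name: str, hook: Hook) -> None:
        # Hooks are addressed by the configuration property's name.
        self._hooks.setdefault(name, []).append(hook)

    def dispatch(self, changes: Iterable[Change]) -> List[str]:
        """Run hooks for each change; return names that had no hook."""
        unhandled = []
        for name, old, new in changes:
            hooks = self._hooks.get(name)
            if not hooks:
                # Make noise so admins know a restart may still be needed.
                self._log.warning(
                    "no hook for %s (changed from %r to %r)", name, old, new)
                unhandled.append(name)
                continue
            for hook in hooks:
                hook(old, new)
        if unhandled and self._strict:
            raise UnhandledConfigurationChange(", ".join(unhandled))
        return unhandled
```

In the default mode `dispatch` only logs and reports the unhooked names; with `strict=True` it raises after running every hook that does exist, which corresponds to the "error out" option.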