A federated architecture for TC (and for all online editions?)
Over the last year we have progressively rolled out TC to more people, and more projects. There is now usually at least one person, and often three or more, logged in to the system and doing something. As a result, people really notice if the system goes down. And, alas, it does. Much of the time, it is not our fault -- as when someone somewhere decided it would be a good idea to install a patch in the cloud system on which our university hosts many of our research virtual servers. Down went our server, for nearly a week. At other times, some mysterious interaction within the constellation of applications that compose TC causes memory over-runs, and down we go again. This has led us to start thinking, hard, about what we can do to make TC as bulletproof and reliable as possible. One thing is clear: if, in five years, TC is still relying entirely on the Saskatchewan servers, it will be dead.
So here is our thinking about a solution. Clearly we need some kind of federation of servers, mirroring and backing each other up. This led us to look at how systems like Gmail and Facebook work: no matter where you log in from, you get the latest version of your data, served to you from multiple servers worldwide. Google, Facebook and the rest invisibly shuttle your data, and you, from server to server, between sessions and even within sessions. That was the starting point of our thinking. But we realized we needed more than this. In Saskatchewan, we have built our own interfaces, for our own projects, accessible through a usask.ca web address; our Birmingham partners are building their own interfaces, for their projects, accessible through bham.ac.uk addresses. We want the system to work so that, should the Birmingham system go down mid-session, the user would be invisibly moved to the Saskatchewan system, still working through a bham.ac.uk address and interface, without any loss of time or data. And if both Saskatchewan and Birmingham are down, the user is moved to a server in Mexico, or somewhere else, again quite invisibly.
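The routing policy described here can be sketched very simply: a front end at each site tries its own server first, then falls through an ordered list of mirrors until one answers a health check. The sketch below is purely illustrative; the hostnames and the health-check mechanism are hypothetical, not the actual TC deployment.

```python
# Illustrative failover selector: the kind of policy a bham.ac.uk front
# end might apply so users keep their local address even when the local
# server is down. All hostnames here are made-up examples.

MIRRORS = [
    "https://tc.bham.ac.uk",   # local server, always preferred
    "https://tc.usask.ca",     # Saskatchewan mirror
    "https://tc.example.mx",   # a hypothetical third mirror
]

def pick_server(is_alive, mirrors=MIRRORS):
    """Return the first mirror that passes its health check.

    The `is_alive` predicate is injected so the policy can be tested
    without a network; in production it might be an HTTP GET against a
    /health endpoint with a short timeout.
    """
    for url in mirrors:
        if is_alive(url):
            return url
    raise RuntimeError("all mirrors are down")

# Example: Birmingham is down, so traffic falls through to Saskatchewan.
alive = {"https://tc.usask.ca"}
print(pick_server(lambda u: u in alive))  # https://tc.usask.ca
```

The point of the ordered list is that each site ships its own ordering, so every user stays on their local branding and address for as long as their local server is up.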
This architecture, it seems to me, would address multiple issues. First, it would go a long way toward solving the problems of long-term maintenance, interoperability and sustainability which dog us all. Second, it would permit local servers and systems to retain their local branding, naming and interfaces. So Virginia, or Texas, or Madrid could build something in the TC universe and have it look and function quite unlike anything anywhere else. The result is likely to be immense advances in how we make and present richly encoded data, as developers and scholars can concentrate on what is needed for creation and dissemination, without endlessly reinventing what goes in between.
So: that is the aim. But how to get there? Within the TC universe, Zeth Green has been looking at some possibilities: see his blog at http://zeth.uk/2014/06/12/distributed_global_textual_community.html. There is one area where his thinking and ours converge: the problems of synchronizing distributed databases. Given the complexity of the databases we are developing, and the need to keep them both responsive and current in real time, this is a major difficulty. So one solution, which Zeth canvasses, is NOT to use a database as the fundamental datastore. He outlines one such solution, building on GitHub. Xiaohan Zhang has been looking at others. At the same time, we are becoming ever more aware of the looming problem of multiple hierarchies. We deal very effectively (I think) with two hierarchies in TC. But at some point we will have to think about more than two. That point might be now: if we are going to rebuild the entire data model underlying TC to enable true federation, we might take this on too. Or maybe not. Further, we are not concerned only with backing up the data: we want to back up the entire interface, as it were, so that the whole Birmingham system will run from the Saskatchewan site should Birmingham go down -- and vice versa. Now that, too, is a challenge.
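The attraction of a git-style datastore over a live database is worth making concrete. In git's model, every version of a document is an immutable, content-addressed object, and a named ref points at the current version; synchronizing two sites then reduces to copying missing objects and updating refs, with no running database to keep consistent. The sketch below shows that model in miniature; it is my own toy illustration of the general principle, not TC's design or Zeth's implementation (and its ref handling is naive last-writer-wins, with no conflict detection).

```python
# Toy content-addressed store in the style of git's object model.
# Every saved version is immutable and keyed by its SHA-1; a ref maps a
# document name to the hash of its current version.
import hashlib

class ObjectStore:
    def __init__(self):
        self.objects = {}   # sha1 hex -> bytes, append-only
        self.refs = {}      # document name -> sha1 of current version

    def save(self, ref, data):
        """Store one version of a document and point the ref at it."""
        sha = hashlib.sha1(data).hexdigest()
        self.objects[sha] = data
        self.refs[ref] = sha
        return sha

    def load(self, ref):
        """Return the bytes of the current version of a document."""
        return self.objects[self.refs[ref]]

    def sync_from(self, other):
        """Pull from a peer: copy objects we lack, adopt its refs.

        Deliberately naive: a real system needs merge/conflict rules
        when both sites have moved the same ref.
        """
        for sha, data in other.objects.items():
            self.objects.setdefault(sha, data)
        self.refs.update(other.refs)

# Usage: Birmingham saves a transcript; Saskatchewan pulls and now
# serves the identical bytes under the same name.
bham, usask = ObjectStore(), ObjectStore()
bham.save("transcripts/example-page", b"<ab>In principio...</ab>")
usask.sync_from(bham)
print(usask.load("transcripts/example-page"))
```

Because objects are immutable and content-addressed, a pull can never corrupt existing data: the only mutable state is the tiny ref table, which is exactly what makes this easier to federate than a conventional database.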
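On the multiple-hierarchies problem: one well-known way to go beyond two concurrent hierarchies is standoff markup, where the base text is kept as a flat character sequence and each hierarchy is recorded separately as spans over it. The fragment below sketches that general technique with invented sample data; it is not TC's data model, just an illustration of why the approach scales to any number of hierarchies.

```python
# Standoff sketch: a flat base text plus any number of independent
# hierarchies, each a list of (start, end, label) character spans.
# The text and labels are invented sample data.

text = "In principio erat verbum"

hierarchies = {
    # document hierarchy: physical page and line
    "document":  [(0, 24, "page:1r"), (0, 24, "line:1")],
    # textual hierarchy: verse structure
    "text":      [(0, 24, "verse:John.1.1")],
    # a third hierarchy, overlapping freely with the other two
    "editorial": [(18, 24, "hi:underline")],
}

def spans_at(pos):
    """All labels, from every hierarchy, covering character `pos`."""
    return {name: [label for start, end, label in spans
                   if start <= pos < end]
            for name, spans in hierarchies.items()}

# Character 20 sits inside "verbum", so it is covered by spans from
# all three hierarchies at once -- something no single XML tree allows.
print(spans_at(20))
```

Adding a fourth or fifth hierarchy is just another entry in the dictionary, which is the structural point: the hierarchies never have to nest inside one another, so none of them constrains the others.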