There is a paragraph in Harari's "Homo Deus" that is almost visionary when you look at the current situation in Britain, Germany and other western countries (except the USA). ““Ordinary voters are beginning to sense that the democratic mechanism no longer empowers them. The world is changing all around, and they don’t understand how or why. Power is shifting away from them, but they are unsure where it has gone. In Britain voters imagine that power might have shifted to the EU, so they vote for Brexit. In the USA voters imagine that ‘the establishment’ monopolizes all the power, so they support anti-establishment candidates such as Bernie Sanders and Donald Trump. The sad truth is that nobody knows where all the power has gone.” ”And, a few lines later: ““Precisely because technology is now moving so fast, and parliaments and dictators alike are overwhelmed by data they cannot process quickly enough, present-day politicians are thinking on a far smaller scale than their predecessors a century ago. Consequently, in the early twenty-first century politics is bereft of grand visions. Government has become mere administration. It manages the country, but it no longer leads it.”― Yuval Noah Harari, Homo Deus: A Brief History of Tomorrow ”
Harari quickly adds, that not all visions have been good, and that a state that just needs administration is probably not in a bad shape. But it leaves a void. A void that is even more problematic as after 40 years of Neoliberalism and the triumph of human liberalism, the field of public participation and engagement has eroded. The idea of society as being fair und just and for everybody to live in comfort has disappeared and been replaced by market religion. Who are the culprits? The digital revolution with its massive influence on our behavior? Jakob Augstein has collected several opinions in the new book "Reclaim Autonomy" but if you read through those, they mostly blame the big 5 (Google, Facebook, Amazon etc.) for all the evil that fell on us. (I am going to comment some more on this book later). The refugees overwhelming the western states have become another negative vision and it has been claimed by the far right corner of the society (Filling the void left by the dying social democracy). I recently went to a talk by Harald Melcher on "the process of civilization" where he claimed, that we seem to have no positive visions left. Which visions are we currently left with? The "data religion" as described in Harari is a rather cold and techno-centric vision that leaves lots of people not knowledgeable enough in Cybernetics and Biotechnology behind, as he admits. And then there is only two others left: The dream of an Islamic world according to the IS, or "lets make America great again", which is an isolationist and backward looking vision.
Again: where did all the power go? It went into process. James Beniger's Control Revolution is now - supported by Big Data and Machine Intelligence - running the world like a clockwork. Except that some of those processes are on a track that is going to hit obstacles rather sooner than later. Obstacles like the climate change, the pension problem, automations effect on jobs, diesel exhaust gases etc. ( TTIP would have been the mother of all control revolutions: economy regulating itself with the help of some lawyers from New York and London). People notice those problems and -as they have a lot to lose in western societies - they get scared. People know that they are only passengers in the western representative democracy. And now they wonder if there is somebody left in the driver seat. Fear turns into rage easily and that is what we see.
How about driving by yourself? Well that's REALLY uncomfortable for all participants, but mostly for those in power as can be seen with the BREXIT. Or with the Trump election which surprised the political elite and Silicon Valley a lot. I don't think people are unwilling to accept change. They accepted the Energiewende and are paying for it. Just to see later on, how it got lost somewhere. The economic imperalism behind globalization, Europe, TTIP etc. is now causing its downfall als people are starting to lose trust in the systems. Will more direct democracy replace the losing power of representation? The economy and political parties will fight it to the end, as it is quite uncomfortable for "the process". It is so much easier to tell people, that things are without alternative ("alternativlos" - Merkels most favourite word). But how come we hear about alternatives like free public transit once the political establisment faces a serious crisis like right now in Germany? That leaves us with the hope that change is still possible! For the rest, I guess we will just have to wait and see what is going to happen in Asia. Could be that they are not only taking over economically but also with respect to future visions of society based on social credits.
I wanted to write a bit on security several times now but never really got around to do it. It started when Linus Thorvalds and the Google team that chases security problems had that little dispute on how to deal with security relevant bugs in the kernel. While the Google team implemented a hard kernel panic when they realized some exploitable bug, Thorvalds got mad and told them to fix the bugs instead of shutting down live systems. I guess both sides have some good arguments for their strategy. I am sure users hate kernel panics which in most cases would not have prevented a security exploit because the system wasn't under attack anyway. But what if the bug doesn't get fixed because nobody cares?
That scenario reminds me about something that happend when I was a young system programmer in the Unix kernel devision of Siemens. As usual in such kernel teams, young professionals take over responsibilities for drivers and other low level stuff and gradually move into more complex areas. One morning I had just started my workstation and ran a testprogram against one of my drivers, just to be rewarded with a nice kernel panic! As more and more developers slowly trickled into the office, you could hear astonished calls throughout the floor. Soon we realized that we ALL got those panics with our respective drivers! What had happened. As usual, a new kernel had been built over night and we installed it in the morning on our development machines. Something had to be wrong with the kernel built last night. We started to look at the error message which said something about interrupt levels not being cleared at return to user. Hm, sounded kind of strange. Finally, after some confusion, guessing and reboot, a mail from an experienced developer in the kernel team told us what was going on: He had detected one major reason for performance issues in the kernel: Many drivers have to block interrupts from the hardware while they are doing critical operations like updating shared structures (this was in the old days of monolithic kernels and shared threads between kernel and user area). After the critical section they unlock interrupts as quickly as possible because otherwise events from the hardware are not handled as fast as possible. The problem was very simple: The driver developers tended to forget the unlock operation. We all know that developers are unable to handle pairwise operations, because there are endless ways of bypassing the unlock statement. This is true for memory allocations, open/close style APIs and of course also for locking and unlocking interrupts. Some kind soul had made a little change in the kernel at exactly the moment, when a users call into the kernel would return to the user. If the interrupts were still blocked, the code would simply reset the mask and unlock everything. It may have even printed some warning in some logfile, but I never saw anything and most developers probably wouldn't have cared anyway.
After many futile attempts to make the colleagues aware of the performance consequences of forgetting the unlock call, he simply changed the code at return to the user into a call to panic! What can I say? It worked. On that day EVERY driver developer found SEVERAL places in his or her code, where the unlock call had been forgotten. The impact on kernel performance was measurable. Of course everybody was mad at the guy who changed the code to panic. But honestly - would things have improved otherwise? It would have been better to tell the colleagues in advance about what he was going to do. But changing the code to panic was definitely the right way to go.
Is this example the same as calling panic when a bug is detected, that might have a security consequence? Yes and no I guess. Yes, because it WILL cause the bug being fixed quickly. No, because unlike the first case, the kernel itself might be corrupted. And how would you deal with this? In the first year of my work as a kernel engineer I remember a course from Instruction Set (a British company doing IT-courses). I saw a call to panic in the kernel code and asked why the code did this? I remember saying that this could cause data loss for users and that there should be a better alternative. The instructor simply asked: Which one? Once a kernel discovers an inconsistency in its own data structures, it cannot trust anything within the kernel anymore. By continuing and trying to save user data e.g. it could even destroy more of those data. So the best decision is to stop working at all. (on a sideline: what if the kernel would run in an autonomous car? Can we just call panic? I guess not. So what kind of architecture is needed in this case?). I don't do kernel work anymore and I don't know how microkernels could handle those situations better. But I think if a cloud datacenter discovers that its zookeeper cluster is inconsistent, there is little that can be done except for a reboot of everything if a fallback to a previous, consistent state is not possible. It is like in the case of the lost Ariane rocket due to an exception not handled. It sounds really stupid to lose a 500 Mio Euro rocket because of such a thing. But once you start thinking hard about other options you realize that those are far and few.
So how should we handle bugs in the kernel that (might) have security consequences? Are they just bugs as Thorvalds claims? I tend to agree. A potential buffer overflow is a software bug first and for all. The fact that it can be used for a security exploit is secondary. Bad input validation that crashes or corrupts an application is a software bug and nothing else. Quickly pointing at potential hackers or even blame them for insecure systems just lets developers hide behind their buggy and bad code.
This brings us to another trigger for writing all this: the paper Adrian Coyler discussed today in the Morning Paper: Daniel Bernsteins 10 year old paper on his product qmail. Reading how Berstein created qmail without a security bug in many years makes you realize that secure software IS possible. And sadly, how little changed in software development with respect to security in all those years. One year after the Bernstein paper Roland Schmitz and myself published our second book, this time on secure systems. We discussed the Bernstein paper in our book and extracted his methods for bug-free software to be put in our toolbox for damage reducing systems. And Adrian Coyler mentions three things Bernstein said to not do: Don't blame insecure software on hackers. It is your software bugs that make the attacks possible. Not the other way around. Don't focus on attacks - focus on making your system robust, e.g. with watching out for integer overflows.
People are usually suprised when I claim that pen-testing is a very limited and unsustainable approach to secure software or systems. But pen-testing won't make your systems any safer except for the few bugs found an fixed. Nothing in the architecture has been changed and you are just waiting for the next security bug to be detected by hackers. As Bernstein said, we need to fix the foundations of our systems and the way we develop software, not focus on attacks.
Coyler was surprised that Bernstein accused the concept of least privilege to lull us into a false sense of security. Bernstein said, that minimizing privilige does not mean minimizing trusted code. And he is right of course. Trusted code is nothing else than "ambient authority" lying around, waiting to be exploited with attacks like "confused deputy". If those terms don't mean anything to you, read our book or go to erights.org for further information from the capability movement.
And with the last of Bernstein's no no's: speed over security, we get straight back to todays problems. Qmail carefully splits dataflows between users and lets those flows run with the respective users rights instead of root. There are certainly faster ways of doing things, but are the as safe? And a total focus on performance over security seems to be behind todays IT-nightmares with the names Meltdown and Spectre. The morning paper has discussed the main scientific papers and on highscalability.com the latest papers on Retpoline - Googles magic code rewrite technology that supposedly even cures Spectre-attacks (this is actually disputed) can be found. I won't go into the details here but one thing is clear: we could have known this long ago. Two years ago in my master seminar we looked at Intel Uefi and system management firmware attacks. The firmware used unprotected function pointers to call into the OS. A year later we learned about major problems in Intels remote management firmware. I do not know any fundamental cure against covert channels. It is true that the attacks mentioned combine an isolation failure caused by parallelization with a covert channel. The isolation failures itself would be inconsequential. But again, as no fundamental cure against covert channels is known (finally, all information processing has a physical dimension as well, which, given exact tools, can be traced and measured. This fits nicely to the revelation, that EVERYTHING on earth is uniqe and can be used for identification, as long as tools have the necessary granularity for their measurements. From how far can we read RFIDs? Your CPU/Network Connection etc. are all unique even without IDs, just because of their unique behavior.) optimizations for speed that weaken isolation (e.g. shared branch prediction) turn out to be deadly. Looks like hardware developers are not so much different from software developers. Bruce Schneier in his January cryptogram predicts 2018 to be the year of the processor bugs. Now that researchers really take a look at what is going on in our CPUs we can expect many more problems. Just compare the features of a 8088 with todays core-7 CPUs. When I started developing system software I frequently had to port realtime executives to new CPUs. Since we were only developing in C, few assembly code was needed for interrupts and context switches and a short look into the CPU handbook usually was enough to come up with the necessary code. I probably wouldn't understand half of a Intel handbook on core-7 CPUs. Hardware transactional memory, support for VMs and way below the surface the translation from CISC to RISC. Not to mention the magic chip designers use to keep the CPU cool by turning off units etc. I think Schneier is right, it will be an interesting year for everybody responsible for critical infrastructures.
A litte bit of guessing: I would pick complex features like hardware transactional memory to find race conditions and isolation breaking optimizations for performance. I remember a chapter in Jim Gray's famous book on transactions with the title "A shadow world" (or something close to it). To speed up parallel transactions you let every one work on its own copy of the world and at the end cancel those histories which can't become reality due to conflicts. This requires extensive state management and I suspect that there are many functional units involved which will show side-channel effects both for sensing and acting. A test scenario would be to run parallel HTM blocks in different processes and look for cross-process influence.
So what can we do for those infrastructures when our whole hardware/software stack is compromisable? Only the overall architecture of a critical infrastructure can bring damage reduction and resilience. No single IT component will do. Concentration and remote control will weaken our solutions. Decentralization and data flow over control flow will harden them. Many and sometimes even expensive or unconventional ideas will have to be used to ensure damage reduction. In my talk on our security day I gave an overview on the desperate situation in the industry and some ideas for such a system. Looks like the talk is still quite relevant.
Another Gamesday is coming up with a broad an interesting line-up of topics and speakers. We will start with a technical talk on rendering techniques by Clemens Kern, a long term game engine specialist and Alumnus. Next, Simon Haentsch will explain his way into the game industry, followed by Alina Dreesmann from Deck13. She will show us, that environmental storytelling removes the need for huge numbers of NPCs. Finally, Maximilian Krauss from our games institute will show you, how the serious parts of a serious game can indeed improve your game experience. Lessons learned from a current research projekt.
Agenda 14.15 Welcome, Prof. Walter Kriha 14.20 "Rendering pipelines in games: How to render many lights", M.Sc. Clemens Kern 15.25 "Mein Weg in die Gamesbranche", Simon Häntsch, Iron Beard Games, Stuttgart 16.30 "Environmental Storytelling - warum Welten für sich selbst sprechen sollten", Alina Dreesmann, Leveldesigner, DECK13 Interactive GmbH 17.05 "How 'the serious' can enrich your game experience", Maximilian Krauss, HdM Stuttgart 18.15 Wrap Up