Sunday, October 15, 2017

Eventual consistency does not exist

There's a lot of talk about reactive systems and reactive coding nowadays. It's very hot and very trendy, and there are a bunch of new cool technologies related to it. We are being led to think that reactive platforms are fundamentally built on an event-driven architecture. Several new platforms are evolving from this pattern and many fun new things are being done with it. However, there is also a lot of FUD and dis/misinformation.

Event-driven architecture differs from what you might be used to if you have worked with request/response systems. Its mental picture is more or less orthogonal to request/response in terms of execution. With a request/response system you lock each resource by calling it and making it hold until you are done. This creates a "synchronization", because all the systems required by that call will be updated at the "same time". This is not entirely true if there are

I don't think reactive can or will work as a multipurpose platform. I think most advocates of this pattern are selling something where you, as a developer, will have to either stick your head in the sand or quit your job to avoid cleaning up the mess that these systems create.

Everyone always tries to give you the "silver bullet" for all of your problems. If there were a universal system for everything, we would be out of a job.

Eventual consistency does not exist in the real world. Here's why.

The short version: a system can't be eventually consistent if there is no one with the information needed to tell whether the system is consistent or not. And if there is an observer that can tell that it's consistent, it is immediately consistent, because the observer has the correct (consistent) information needed to decide it. (I like to think of this as the Schrödinger's cat scenario, just the moment after opening the box.)

Something is only eventually consistent if there is something transitioning between states and there is an observer that, whenever it looks at that particular state, immediately knows how to react properly to it. If the observer, by definition, knows that the system is thoroughly consistent, it means that every event that has happened up to the point when the observer looks at the system has occurred in a predetermined (sequential) order in relation to every other event. That in turn means the system has to be immediately consistent at some point in time, and any other state is not consistent. The mutual arrangement can be relaxed if and only if the producers have no dependency on each other.

An example of this:

You have two number generators, A and B. The assignment is to have A and B produce random numbers for the consumer R. Random by definition implies no mutual arrangement between a number produced by A and one produced by B, nor between two numbers from A or two numbers from B. Two numbers generated by A have no mutual arrangement and no dependency. This system could be called "eventually consistent" because there is no state (mutual arrangement) kept in the system. However, because there is no state to share, there is no consistency either: consistency is undefined (as an observer I don't know what state the system should have). That means that R, our consumer, can't tell anything about the system (there is no consistent state). To define something as consistent you need to know the outcome at any given sample, which by definition you can't know for randomly produced numbers. Ergo this system is inconsistent, not eventually consistent, because we don't have enough information about what state the system needs to be in to be consistent.

There's another system here as well, unrelated to the system of producing random numbers, and that is the rate at which numbers are produced. This could, in itself, be consistent or not consistent, but it's unrelated to the example above. That problem is not eventually consistent but immediately consistent, if we care about the producer/consumer rate.

Note that this example would not be the same with only one generator; that falls into a completely different category.

Second example
You have two number generators A and B, and the assignment is to have A and B generate a sequence of numbers. A sequence of numbers implies that the sequence, by definition, has some sort of order, in this case the natural order. We can solve this problem in several ways and still achieve the correct (consistent) result. One such way is to have A and B both produce numbers and to have a receiver R. Receiver R must now somehow keep information (state) about what A and B have done and when. Because A and B don't share their state, R needs to interpret the supposed state in order to arrive at a correct and consistent state in relation to everything else that is happening in the system. The only part of this system that is able to say whether the system is consistent is R, because R has the definition of what is correct. A and B, because they are not sharing state, don't know what state the system is currently in, only their own.

So is R now able to tell the state of the system, or in this case its "eventual" state? With R = {1,2,3}, does R know that a fourth number is coming? If it expects another value, the state R = {1,2,3} is invalid, so we don't have an "eventual" state and we need to wait before we can tell that the state is correct. If R = {1,2,4}, we know we are in an incorrect state, since we are expecting a 3 before the 4. And in the extreme, if the correct state is all natural numbers, we'll never be in a consistent state (it's not defined) because we'll never reach it. If we define that 3 is the right state because of the nature of natural numbers, we are immediately consistent because it's 3. I want to point out that there is information here that is not described by the system, namely the "natural order of numbers". This is an implicit requirement, because R cannot ask A or B for 4: R has no idea whether 4 has actually been sent, and therefore it might have an incomplete picture of the situation. This could be entirely correct, though, and not an issue at this time.
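
A minimal sketch of the receiver R in Java, just to make the three cases above concrete. The class name SequenceReceiver, the three state names and the expectedLast parameter are my own assumptions for illustration and not part of any real library. The point it tries to show is that R needs an explicit definition of "done" before it can call any state consistent.

```java
import java.util.TreeSet;

// Hypothetical receiver R: collects numbers from any producer and judges its own state.
class SequenceReceiver {
    enum State { CONSISTENT, INCOMPLETE, INVALID }

    private final TreeSet<Integer> received = new TreeSet<>();
    private final int expectedLast; // assumption: R is told what the final number is

    SequenceReceiver(int expectedLast) {
        this.expectedLast = expectedLast;
    }

    // Called for every number, no matter whether A or B produced it.
    void accept(int number) {
        received.add(number);
    }

    State state() {
        int next = 1;
        for (int number : received) {
            if (number != next) {
                return State.INVALID; // a gap, e.g. {1,2,4}: 3 is missing before 4
            }
            next++;
        }
        int last = next - 1;
        if (last > expectedLast) {
            return State.INVALID; // more numbers than R's definition allows
        }
        return last == expectedLast ? State.CONSISTENT : State.INCOMPLETE;
    }

    // Convenience helper: feed the numbers in and report the resulting state.
    static State stateOf(int expectedLast, int... numbers) {
        SequenceReceiver r = new SequenceReceiver(expectedLast);
        for (int n : numbers) {
            r.accept(n);
        }
        return r.state();
    }

    public static void main(String[] args) {
        System.out.println(stateOf(3, 1, 2, 3)); // CONSISTENT
        System.out.println(stateOf(4, 1, 2, 3)); // INCOMPLETE
        System.out.println(stateOf(4, 1, 2, 4)); // INVALID
    }
}
```

Note that the same set {1,2,3} is either consistent or not depending only on what R has been told to expect, which is the implicit information discussed above.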

The presence of synchronization.

Some people compare "eventual consistency" with the real world, for example with the Sun's light beams, which are "eventually consistent" with the Sun's position in relation to how you see the Sun. The point of the comparison is that the light takes about 8 minutes to reach your eye after it has left the Sun's surface, and this is supposed to be "eventual consistency".
There are fallacies in this analogy.

The difference between the real-life scenario and a computer system is that in real life, time synchronizes ALL events. No events can happen out of order (in relation to time, even relative time), and this leaves the system, in this case the Sun, the sunlight and you, consistent. So the real world does have a synchronizer, it works at all times, and it makes the reader of the system always read a consistent state no matter how or when it observes the information. On top of this, computer systems can fail, and they do so quite frequently. The weird part is that "eventualists" themselves emphasize that one should embrace system failure, though then again there are no guarantees that the system will be consistent. Real-world events just don't crash with an exception or suddenly produce, out of thin air, "compensating" events (that never really happened) that undo everything that just happened.

Systems can seemingly have the property of consistency or "eventual consistency", but ironically this is harder to maintain (unless there is specifically a synchronizing factor) in a system where the load is high. And this again contradicts the claim made by "eventualists".

In a discrete world, where computers run, the guarantee of synchronization is synthetic and has to be designed and accounted for. There are no guarantees of it unless it is explicitly designed in, and this is much more acute in a system with several machines and gets worse as their number grows. This is particularly true in an asynchronous environment, where the "eventual consistency" idea is somehow the norm; it should be the other way around. Such systems also suffer, because of their discrete properties, from events that can be duplicated or have to be undone. This can result in the observer seeing partial results.

More importantly, you can most often find a solution to a specific problem where "eventual consistency" is strongly consistent, or consistent "enough", because you can guarantee the ordering of what happens. But I do think there will be other, newer problems (requirements), after you have come up with the initial solution, where this breaks apart.

The fallacy here lies in comparing a discrete resolution with a continuous one. These components are everywhere, and here are some examples.

* Thread execution is a continuous resolution
* Threads, when sharing data, are a discrete resolution
* Queues are discrete resolutions, even when run in a thread, if they leave the thread's "compound", for example by being run on several computers
* Networks are discrete resolutions
* IO is a discrete resolution
* Thread pools and thread priorities are discrete resolutions in relation to other threads
* Interacting machines are discrete resolutions

Wherever there are discrete resolutions, these are the places where "eventual consistency" can and will go wrong, since there is no actual order of events. A continuous resolution, like an isolated thread (no outside interaction), is always consistent.

A simple example where events could be out of order.

Consider a system A which produces a sequence of natural numbers. Because you want to produce and process these numbers in a scalable fashion, you have two queues (these are everywhere, like in an OS, a load balancer or a network card), and for the sake of this example they run on two different machines. You then have an event source which records these events to create a "source of truth". Now the events, which would be 1, 2, ..., will probably arrive in the order they were produced if the queues are equally fast. But if one of them is slower, this won't be the case anymore, and your event source will be corrupted.
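
To illustrate, here is a small deterministic simulation in Java. Everything in it is an assumption made up for this sketch: the class name TwoQueueLog, the rule that odd numbers go through the first queue and even numbers through the second, that event n is sent at time n, and that ties in arrival time are broken by the lower number. It only shows how unequal queue latency reorders what the log records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of producer A -> two queues -> one recording log.
class TwoQueueLog {

    // Returns the numbers 1..count in the order they reach the log.
    // Odd numbers travel through queue 0, even numbers through queue 1.
    static List<Integer> arrivalOrder(int count, int latencyQueue0, int latencyQueue1) {
        List<int[]> arrivals = new ArrayList<>(); // each entry: {arrivalTime, number}
        for (int n = 1; n <= count; n++) {
            int latency = (n % 2 == 1) ? latencyQueue0 : latencyQueue1;
            arrivals.add(new int[] { n + latency, n }); // event n is sent at time n
        }
        // The log records events in arrival order; ties go to the lower number.
        arrivals.sort(Comparator.comparingInt((int[] a) -> a[0])
                                .thenComparingInt(a -> a[1]));
        List<Integer> log = new ArrayList<>();
        for (int[] arrival : arrivals) {
            log.add(arrival[1]);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(arrivalOrder(6, 1, 1)); // [1, 2, 3, 4, 5, 6]
        System.out.println(arrivalOrder(6, 1, 4)); // [1, 3, 2, 5, 4, 6]
    }
}
```

With equal latency the log matches the produced order. As soon as the second queue is slower, the log contains every number but in an order that never existed at the producer, and nothing in the log itself tells you so.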

Also, if you have a huge amount of traffic, like LinkedIn, there's no way of telling whether the data is correct, due to the sheer amount of data coming in. You won't be able to look at all of that data and spot that one particular event is out of order in relation to another. Now, it might not matter for LinkedIn, but in other areas like money transactions or trading it's a huge deal and even illegal (since you need to be able to account for all transactions in your system, and if they are in the wrong order and you made decisions based on that, you have effectively corrupted your reality).

Focus points
The major problem with "eventual consistency" is not the writing part, which can be mitigated, but the reading part, where a system can make a decision based on values that do not belong to the same consistent state.

I want to conclude by recognizing that some problems and solutions can be "eventually consistent", but I'll argue that those are, as in the case of the Sun, immediately consistent because the reader doesn't know anything else. However, designing a solution to be exclusively "eventually consistent" is just stupid and a waste of your employer's or customers' money.

Saturday, July 30, 2016

Over engineering with CQRS pattern

I don't understand the CQRS pattern and the hype about it. Technically, I believe I understand why people think they want to use it.

But really? Do we really need a complete pattern to explain that I'm writing to or reading from a source? Is reading code so hard that I have to "encapsulate" the specific behavior of writing something to a database? What's wrong with update/insert?

I think we're overdoing simple things which are NOT hard and going full-blown nuts with technical "solutions" to problems which are already solved. It's incredibly over-engineered for something simple, and it doesn't deliver the "benefits" of the pattern. The problem as I see it is that one is trying to create encapsulated behavior, but the thing is that you can't encapsulate behavior, only the technical things which implement it.

Do I really have to say SaveXCommand? Isn't SaveX enough? Are we so stupid that we can't figure out what SaveX means? Are the problems people work with so simple that we can only divide them into a query or a command? Really? Why stack an HTTP POST on a command? I already KNOW it's a POST! Or, even more unbelievable, I've seen solutions using a POST and a command at the same time by including the command in the URL! What the hell?

Why stop at creating commands or queries for a supposed database? Shouldn't we consider writing to or reading from memory a command or a query? Wrap every memory access in a command instead of simply doing an assignment? Heck, you might be writing to the disk; shouldn't you be writing a command for that? Hell, now Samsung and Intel are releasing drives which are 100000 times faster than current SSDs. We might end up a lot closer to memory swapping because it's cheaper, and "memory" might end up in a database.

Not only that, there seem to be a lot of different ways of implementing it, since apparently there are no really clear rules about it. I've seen people trying to implement it, and it always ends with each of them believing the other is a complete imbecile. Why is that, and why the damn effort? Why make a pipeline or a "bus" instead of treating a thread or task as such and working with that instead? The CQRS pattern creates incredibly polluted code with technical details all over the code base, and it smells so bad I choke on it.

IMO it's trying to solve a problem which is already solved, so stop doing that.

Thursday, July 14, 2016

Are the GoF patterns dead?

I read this post on the topic of getting rid of the GoF patterns. I think the author is right that patterns are bound to a paradigm, and that patterns in one language might not be as "effective" in another language, even within the same paradigm. I can really relate to this.

But I think patterns should be considered dead because people try to pick a pattern and force a solution into it, instead of solving the problem and then, and only then, identifying the patterns. By then one could argue that there's no need to do that.

Also, if you are working in an agile environment, things change; if you build something it will change over time, and hopefully it will change rapidly. If you invest a lot of time building things to conform to a pattern and the first thing you have to do in the next iteration is tear it down again, I'd say it's a waste of time and effort. To use the design patterns you have to understand the problem and its quirks, and when you have only a vague idea of what needs to be done to solve the problem, "designing" with those patterns before the solution is known will set you off on the wrong course and put you in a situation where you are trying to implement a pattern and not a solution. Also, even if you know the solution and how it should be done, you should solve the problem, not the pattern, which is happening most of the time,

Thursday, May 5, 2016

Trouble with maven-jaxb2-plugin and Eclipse?

Does your build work with plain mvn but won't build in Eclipse (Mars.2)? Does the project fail with:

"Execution default of goal org.jvnet.jaxb2.maven2:maven-jaxb2-plugin:0.13.1:generate failed: A required class was missing while executing org.jvnet.jaxb2.maven2:maven-jaxb2-plugin:0.13.1:generate: com/sun/xml/bind/api/ErrorListener"

The reason for this is that two dependencies,

org.glassfish.jaxb:jaxb-xjc:jar:2.2.11

and

org.glassfish.jaxb:jaxb-runtime:jar:2.2.11

have a "malformed" parent POM. If you check the Eclipse error log, it may have an error message like:

"The POM for org.glassfish.jaxb:jaxb-xjc:jar:2.2.11 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details"

The reason for this is the following section from their shared parent, com.sun.xml.bind.mvn:jaxb-parent:pom:2.2.11:

   <profile>
            <id>default-tools.jar</id>
            <activation>
                <file>
                    <exists>${java.home}/../lib/tools.jar</exists>
                </file>
            </activation>
            <properties>
                <tools.jar>${java.home}/../lib/tools.jar</tools.jar>
            </properties>
        </profile>
        <profile>
            <id>default-tools.jar-mac</id>
            <activation>
                <file>
                    <exists>${java.home}/../Classes/classes.jar</exists>
                </file>
            </activation>
            <properties>
                <tools.jar>${java.home}/../Classes/classes.jar</tools.jar>
            </properties>
        </profile>
        <profile>
            <id>default-rt.jar</id>
            <activation>
                <file>
                    <exists>${java.home}/../jre/lib/rt.jar</exists>
                </file>
            </activation>
            <properties>
                <rt.jar>${java.home}/../jre/lib/rt.jar</rt.jar>
            </properties>
        </profile>
        <profile>       <!--todo: remove me-->
            <id>default-rt.jar-mac</id>
            <activation>
                <file>
                    <exists>${java.home}/../Classes/classes.jar</exists>
                </file>
            </activation>
            <properties>
                <rt.jar>${java.home}/../Classes/classes.jar</rt.jar>
            </properties>
        </profile>

where the incorrect java.home value causes those two dependencies not to be loaded, so the jaxb2 plugin won't have the classes it needs to perform schema generation.

Maven inside Eclipse resolves the java.home property from the Java runtime that Eclipse itself runs on, so if Eclipse is run with a JRE, those paths point to a non-existent library.

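To fix this you need to set the JAVA_HOME environment variable. However, it seems like the authors have had their JAVA_HOME point to the <path to JDK>/bin folder, so in order to reach the jre and lib folders they had to use .. to get there. (Another explanation is that on JDKs of this generation the java.home property points to the jre folder inside the JDK, and .. leads from there back to the JDK root.) The proper way to set up JAVA_HOME, however, is to point it at the root directory of the installation. So either way you'll break something.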

To fix this in turn, you have to add the -vm parameter to eclipse.ini, after --launcher.appendVmargs and before -vmargs:
 
-startup
plugins/org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.300.v20150602-1417
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
--launcher.XXMaxPermSize
256M
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vm
<path to JDK>\bin\javaw.exe
-vmargs
-Dosgi.requiredJavaVersion=1.7
-Xms256m
-Xmx1024m

You should use the JDK runtime, since the plugin needs tools.jar, and the path should be absolute.

To check the java.home parameter, go to Help -> About Eclipse -> Installation Details -> Configuration tab and look for java.home. It should be pointing to your JDK installation.
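
If you want to see what a given Java runtime reports outside of Eclipse, a small check like the following can help. This is only a sketch: the class name JavaHomeCheck and the helper toolsJarFor are made up for this post, and it assumes a Java 7/8 style layout where tools.jar exists at all (newer JDKs no longer ship it). It mirrors the ${java.home}/../lib/tools.jar expression from the profile above.

```java
import java.io.File;

// Hypothetical helper: shows where the POM profile would look for tools.jar.
class JavaHomeCheck {

    // Same resolution as ${java.home}/../lib/tools.jar: go one level up, then into lib.
    static File toolsJarFor(String javaHome) {
        File parent = new File(javaHome).getParentFile();
        return new File(parent, "lib" + File.separator + "tools.jar");
    }

    public static void main(String[] args) {
        // java.home is a standard system property of the running JVM.
        String javaHome = System.getProperty("java.home");
        File toolsJar = toolsJarFor(javaHome);
        System.out.println("java.home = " + javaHome);
        System.out.println(toolsJar + (toolsJar.exists() ? " exists" : " is missing"));
    }
}
```

Run it with the same java executable that Eclipse is started with; if it reports the file as missing, the profile in the parent POM will most likely not activate either.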

If you only have the JDK installed on your machine you probably won't see this issue. But if you, like me, have a notorious IT department, they might install the JRE, and your Eclipse installation will run with the JRE instead of the JDK. It won't help to put tools.jar in the project, since plugins run in a different classloader to prevent them from mixing classes and versions with the project they are running in. So the plugin will still fail. Hope it helps!

Monday, April 25, 2016

Frustration

There's so much frustration in the software industry, and not only that, it tends to get religious as well. Sometimes fiercely, sometimes in a sober fashion, but mostly it's fierce. Sometimes it's between languages and sometimes it's between platforms.

I recently found myself in this situation when I changed jobs to a .NET shop, coming from a basically "use whatever works" environment (in terms of platform) to a proprietary or, more properly, constricted one. It's like working in a straitjacket: whenever you turn, it tightens its grip around you until there's no other way of doing things. Not only are you bound by the "physics" of the .NET and Windows platform, it soaks into your mind like some sort of brainwashing. I'll get to that a bit later.

I started checking out C# and initially I thought "This is Java done right", and then I realized that it's even more than that. I had all these things which were really nice, like LINQ, delegates, even async/await, which seemed like heaven initially. I had finally got a language which I could use with really powerful tools without a fuss. I really thought I'd found something which would make me happy, and that this shiny new tool would make me forget the "sorrows" of the JVM ecosystem.

Disappointments

I immediately started to check out frameworks to see what there was. I checked if there were ports of the most common frameworks... and found fairly little. Not only were those I found not as "good" as their counterparts (YES, there are good ones), there weren't many of them. I turned to the platform itself and found that most of the time you simply use what the platform has. Fair enough.

Most projects in this shop are developed individually by contractors who are hired whenever they are needed, and more than one person has worked on any single project. There have been several system overhauls where the code has been rewritten to suit the products, and supposedly it is in a state which would qualify as "good". Quite nervous about joining such a professional shop, I eagerly dug into the code to see what I could expect and what I thought would be expected of me.

I was stunned. I really couldn't believe it, so I had to ask. And from one of the more senior developers, one of those who had worked there the longest, I got a blank stare, as if not understanding what I was talking about. It was a freaking mess. I thought it might be just the part of the code I was looking at, but there was more. There were the patterns you'd expect and whatnot, but all these "tools" just made it somehow worse. I can't really explain it more thoroughly, but the code was just f*ked up.

Somewhat discouraged, I started coding on my own, firmly determined to do a good job. I resorted to every tool I knew of to produce good code (and which had provided a solid baseline for coding in other languages). A few good frameworks like xUnit came to the rescue (damn, that's a really nice framework) and I started keeping quite a good pace with features. But then things started to turn ugly, most notably Dispose and the life cycle of objects. Why did they do this? It gets really ugly, and not only when considering thread safety. I'm not going through all of that, but you can read some of it here.

Then it came to the solution part of the "ecosystem". It was just Microsoft's things. And really, I do know that you can do things without them, but it just sort of never gets right. You can do one thing, but then you cannot do the other. And when you need help it's really hard to find out what you need to do to get things working. And there's the "easy" solution, which is always the wrong one.

I do recognize that you can write good code with C#, it's not that. It's the straitjacket which limits the possible solutions. And this somehow restricts the solution space of the developers. You can solve a lot of things by using LINQ, but it's so easy to misuse it that you don't see that you just did. LINQ is a perfect example of "too much abstraction": it seems like a viable tool to use when it really is the wrong one, yet it's so easy to make that decision, and that's why I think the code gets so messy. There are too many tools which are so easy to use that you use them for everything and don't see the alternatives, and suddenly it's NOT the right tool anymore and you are in a dark corner with no way out.

An additional problem is that you start thinking like the platform as well, since you become blind to other solutions. Though I've fallen into that trap a few times myself, I feel like it's a real problem for C# developers, or at least that's my experience. I don't blame the developers here, but the platform.

Quite frankly, working with C# taught me that it's good to work with different platforms, though .NET is not the one, at least not for me. Well, maybe when Windows becomes Linux or some derivative of it.




Saturday, March 21, 2015

Why TDD is good for you


I must confess, I've realized that TDD is a must-have in a project. Though I don't even like TDD, right now there's no better way of developing software. Writing software is like solving a perverted traveling salesman problem where the destination keeps changing and the paths you took somehow start to make less sense as the software progresses. To know if you ended up with the best solution, you'd have to build all the different versions of the solution and compare them, and that's not a viable idea.

Given that you have different developers with different backgrounds, you need something that lets each of them know that whatever somebody else produces will have the same structure and background. And you'll have to keep doing it throughout the project; if you stray from the methodology you'll end up with inconsistencies in your code. And really, in a project you don't want to spend too much time (or any) discussing technical details related to pure code.

What I don't like with TDD

One important aspect is that I think TDD gets credit for things which TDD cannot solve. One such thing is that it somehow creates layers. I think this is deduced from the fact that TDD helps with creating better abstractions, which is a really nice feature, but it's really a side effect of practicing TDD. And it's not really TDD's fault that it can't build layers, because abstractions cannot be totally abstract: if they could, you could send in any data (or none) and get back exactly the data you want, which would be impossible. There's only one thing that could pull that off, and I doubt that it actually exists, although there are many believers out there.

Also, TDD does different things for different languages, so you get more from TDD in certain languages while others benefit less from it (one could argue that the languages which need more TDD practice are bad languages).

TDD will influence your code and therefore your solution, and this will inevitably lead to “test induced design damage”. This means, in the spirit of Conway's law, that your code will be tainted by TDD, and code written with TDD in mind will be easier to integrate into a TDD project. It also means that code which is NOT written according to TDD will be hard to fit into a TDD project (and no, it's not about a framework's ability to be integrated). And it means that it will be very hard to start practicing TDD in a project which did not start out as a TDD project. Most of the time, TDD projects are not consistent about this, and it will hurt you in the end.

Also, I think TDD shows that your language is failing to describe what you really want, so you need to rely on something external to verify that what you have written is correct according to your understanding. Instead of having TDD as some sort of “document”, I'd rather have the power to express all those assertions in the code itself. I usually consider large tests a code smell, since they give away that the code either does too many things, does not express enough intention, or is not powerful enough.

But most importantly, TDD creates a focus on tests and unit tests, when TDD is not about those. It's about dealing with information and always conforming to that information. The test case is about verifying this, a sort of implementation of the TDD abstraction, but we should also be able to get rid of it.

If one considers that TDD is language and tech agnostic, meaning we need TDD as a framework to actually deliver working code, then the amount of work needed to verify your code should say something about the chosen language. If the language requires a lot of test cases to verify that you did what you intended to do, that language is a poor choice. I'm not going to point at specific languages here; I leave that to some future discussion.

I really hope that we one day can get rid of TDD, but as for now, there are simply no better ways to write software.

Saturday, January 3, 2015

Good separation of concerns

Separation of concerns is really important when writing software. It's tightly coupled with working, correct code, which might not be obvious at first glance. One of my personal views on this is that untyped languages are not a good choice for anything long-term: types are crucial for good separation of concerns. I've worked with typed languages in large projects, and they showed that even when using types it's hard to keep things separated. It's not impossible to achieve good separation in an untyped language, but most of the time it just breaks down. Just the fact that you need TDD to "verify" your code is a clear indicator of this.

An example of erroneous mixing of concerns:

// Prints the runtime type and value of whatever is passed in
var print_info = function(x){
    console.log('Variable x is of type "'+typeof(x)+'" and have the value of "'+x+'"');
}
var x = "123"; // Variable x is type string with value "123"

print_info(x);

x-=0; // Variable x has now type number with value 123
 
print_info(x);

Output is:
Variable x is of type "string" and have the value of "123"
Variable x is of type "number" and have the value of "123"

The above example is really simple, but it shows an important aspect of mixing concerns; most of the time these things are more subtle and not so obvious.

There are several pitfalls when designing code, and it is hard to know when you are making good decisions while building software. One good rule is the "Gun rule", which is quite simple:

A modern gun has very good separation of concerns (although it is a very despicable piece of technology). Most notably, the bullet is an example of excellent separation of concerns. You can manufacture bullets separately and still deliver functionality; there is even room for modifying the bullets without needing to change the gun. Obviously there are certain factors you can't change without changing the gun, such as size.

One other factor is that a gun is useless without a bullet and a bullet is equally useless without a gun, so in functionality they are tightly coupled. For the gun to work you need to deploy the bullet with the gun. And this is a good indicator of how they should be deployed: together. If you need different release cycles for them, you should separate them into two deployment artifacts, but they should share resources. The Java VM really facilitates this through dynamic class loading (a really good feature, but for some reason not very well understood); other technologies might have problems with this and might require a full restart.

If you now equip the gun with a scope or perhaps a laser pointer this sure makes the gun better, but it is not entirely necessary for the operation of the gun. The gun will work with and without those additions and they are good separations of concerns by themselves. These are candidates for deploying on their own.

One misconception is that just because you need a different release cycle, or you have identified a module with good separation of concerns, you need to deploy it on a separate instance. With the gun as an example: having the gun in one hand and the bullet in the other doesn't render it more useful or more modular, even though it seems like a good idea and adheres to certain architectural ideas. Taken to the extreme, this idea would have you deploy each class in a separate runtime, but that doesn't make it more modular or better.

You should look for those things which can be removed while functionality is still maintained. In fact, being able to remove whole blocks of functionality without impacting function is a good indicator of good separation of concerns (the same goes for adding them). If you have to tear something apart, that's an indicator it's not separated enough.

Another thing which is overlooked with separation of concerns is that too much effort is spent on making abstractions. Too much abstraction actually harms your separation of concerns: everything is so abstract that you really have no idea what happens.

As an example: instead of using a specific object to "tunnel" data through layers, you decide to use a map like this:
interface SomeInterface {
   public abstract void someMethod(Map<String, String> map);
}

This is convenient because you can now cut through anything, just because Map and String are both in a library which happens to be global. Now you can also bunch together things which, modularly, shouldn't be together, and more importantly there's nothing that stops you from adding more things which make no sense at all. Fortunately, in Java one could do this instead:
interface SomeInterface {
   public abstract void someMethod(Object map);
}
And then cast it to the Map whenever you need that information, but um, that kind of defeats a lot of things. Not only do you lose the typing, you also lose the intention and the function of the data. And when you lose that information, you also lose separation of concerns, because now you don't know where your separations start and where they end.
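
For contrast, here is a rough sketch of the "specific object" alternative. The names TypedParameterExample, CustomerContact and ContactHandler are hypothetical and only there for illustration; the idea is simply that the parameter type says what the data is and what it is for, so nothing unrelated can be squeezed in.

```java
// Hypothetical example: a dedicated type instead of Map<String, String> or Object.
class TypedParameterExample {

    // The data being tunneled through the layers, with its intention spelled out.
    static final class CustomerContact {
        private final String name;
        private final String email;

        CustomerContact(String name, String email) {
            this.name = name;
            this.email = email;
        }

        String name() { return name; }
        String email() { return email; }
    }

    // The interface now states exactly what it needs; no casting, no string keys.
    interface ContactHandler {
        String describe(CustomerContact contact);
    }

    static String demo() {
        ContactHandler handler = c -> c.name() + " <" + c.email() + ">";
        return handler.describe(new CustomerContact("Ada", "ada@example.org"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Ada <ada@example.org>
    }
}
```

Adding an unrelated value now means changing CustomerContact or introducing a new type, and both of those are visible decisions rather than one more key in a map.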