Saturday, July 25, 2015

The Importance of being DRY

When I have time, I enjoy solving problems from Project Euler. So far I have solved 15 problems, though I started not long ago. What I have noticed is that, judging by the forums, the execution times of most posted solutions are not great. For example, in one discussion of Problem 12 an answer took 10 seconds to run, while my solution takes under 100 msec. Overall I manage to write fairly fast code in C#. Here are some of my timings (problems 1, 5 and 15 were solved with pen and paper):

Problem    Time, msec
2          0.002
3          29.848
4          5.689
6          0.009
7          0.231
8          0.440
9          0.331
10         225.789
11         0.424
12         93.75
13         0.022
14         577.291

To be honest, I don't consider myself a super programmer, although after more than 20 years of programming I am probably not bad either. From my perspective, the real problem is that many (younger?) people misunderstand or misuse a few (in fact, one) important things:
  1. The DRY (Don't Repeat Yourself) principle applies not only to software design, but to computations as well. I have seen very many examples of code with two nested loops where the inner loop repeats computations that could easily be done once in the outer loop (see the first sketch after this list).
  2. Dynamic programming and look-up tables. In fact, these are also applications of the DRY principle: the algorithm is reorganized so that each computation is performed only once (see the second sketch below).
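
To make the first point concrete, here is a minimal C# sketch (a made-up example, not from any particular Euler problem): the naive version recomputes in the inner loop a value that depends only on the outer loop variable, while the DRY version computes it once per outer iteration.

    // Naive version: SumOfDigits(i) does not depend on j,
    // yet it is recomputed on every pass of the inner loop.
    static int CountPairsNaive(int n)
    {
        int count = 0;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                if (SumOfDigits(i) + SumOfDigits(j) == 10)
                    count++;
        return count;
    }

    // DRY version: the part that depends only on i is computed once per outer iteration.
    static int CountPairsDry(int n)
    {
        int count = 0;
        for (int i = 1; i <= n; i++)
        {
            int di = SumOfDigits(i);   // hoisted out of the inner loop
            for (int j = 1; j <= n; j++)
                if (di + SumOfDigits(j) == 10)
                    count++;
        }
        return count;
    }

    static int SumOfDigits(int x)
    {
        int s = 0;
        for (; x > 0; x /= 10) s += x % 10;
        return s;
    }

One could go further and precompute SumOfDigits for all values into an array, which is exactly the look-up table idea from the second point.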
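And here is a sketch of the look-up table idea in the spirit of Problem 14 (the Collatz chains), just to illustrate how memoization removes repeated work; I am not claiming this is the fastest possible approach.

    using System.Collections.Generic;

    static class Collatz
    {
        // Look-up table: each chain length is computed only once and then reused.
        static readonly Dictionary<long, int> Cache = new Dictionary<long, int> { { 1, 1 } };

        static int ChainLength(long n)
        {
            int cached;
            if (Cache.TryGetValue(n, out cached))
                return cached;
            long next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            int length = 1 + ChainLength(next);
            Cache[n] = length;
            return length;
        }

        // Starting number below 'limit' that produces the longest chain.
        public static int LongestChainStart(int limit)
        {
            int best = 1, bestLength = 1;
            for (int i = 2; i < limit; i++)
            {
                int len = ChainLength(i);
                if (len > bestLength) { bestLength = len; best = i; }
            }
            return best;
        }
    }

Without the cache, the chain for every starting number would be followed to the end from scratch, and the same sub-chains would be recomputed millions of times.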
This makes me wonder about the reasons that eventually lead to not-so-fast code, which we all sooner or later have to deal with in slow UIs, game engines, digital cameras, web-page loading, image processing, and so on. Even well-known commercial software can use naive implementations of algorithms that could often be sped up at least 2-3 times by purely algorithmic means, without resorting to parallel computing. After thinking a bit, I see the following possible reasons:
  1. Viewing programming as something that requires knowledge of a technology or a programming language rather than skills and a good foundation in algorithms and data structures. There are lots of new and interesting languages and technologies, like Clojure and Hadoop, which are actively advertised and attract a lot of attention. People tend to like languages that are either trending or so expressive that a tricky algorithm fits in 1-3 lines of code. What is forgotten is that learning a language is basically learning syntax, which by itself does not improve one's programming skills. It is like buying a cool new machine or tool to build a house: if you are bad at architecture and finishing, the tool will not help much.
  2. Not enough attention to mathematics. Not many people nowadays connect programming and software development with mathematics, even though just 60-70 years ago programming was done only by people with a strong mathematical background.
  3. The paradigm shift towards parallel computing. It is increasingly common to think that to make computations faster, we need a parallel version. About 10 years ago computational clusters were very popular; nowadays GPGPU is adopted by many. Even though there are many cases where this line of thinking is justified, it somehow overshadows algorithmic improvements and smarter organization of computations.
  4. Lack of fundamental knowledge of programming. How many younger programmers know about loop invariants and can correctly state them? (A small illustration follows after this list.) They can discuss Hadoop, Node.js, Clojure and other cool things, but not so many have read "A Discipline of Programming" by E. Dijkstra. An even simpler example: I was surprised to learn that some programmers around my (!) age were not aware of reading files in binary mode.
  5. Technological 'corruption'. The reality is that we have gigabytes of RAM, gigahertz CPUs, and terabytes on hard drives. This is more than enough for a huge number of problems, and it allows one not to care about resource consumption while solving assignments. This can lead to 'reckless' programming, like code for multiplying 100x100 matrices that requires 10 MB of memory. Nobody cares, 10 MB is peanuts. But what would happen if the same code had been written 15 years ago? And what happens if the matrix dimensions grow 100-fold?
  6. Placing too much hope in assembly language. It is not so rare to see on a project web-site or in a publication that 'critical parts of the code were written in assembler'. While useful in itself, assembler does not work wonders. Yes, it reduces the overhead introduced by the compiler, thanks to tight control over all operations and variables. However, if the underlying algorithm is computationally poor and contains repetitive calculations, they will remain in the assembler code as well.
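
Coming back to point 4, here is a small illustration of my own (not taken from Dijkstra's book) of what it means to state a loop invariant:

    // Find the maximum of a non-empty array.
    // Loop invariant: at the start of each iteration, max equals the maximum of a[0..i-1].
    static int Max(int[] a)
    {
        int max = a[0];                  // the invariant holds for i = 1
        for (int i = 1; i < a.Length; i++)
        {
            if (a[i] > max)
                max = a[i];              // the invariant is restored: max == maximum of a[0..i]
        }
        // On exit i == a.Length, so max is the maximum of the whole array.
        return max;
    }

The example is trivial, but being able to state the invariant is what lets one argue that a less trivial loop is actually correct.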
Can this be fixed? Yes, of course, and I believe there is more good code than bad. Hopefully this proportion will hold, and projects like Project Euler are very helpful here, because they make people think about their code and skills by analyzing other people's solutions. And this is good :)
