Software Archaeology

The history of human civilization goes back thousands of years. Archaeologists investigate the fragile traces of former cultures in order to better understand the past. This is not easy, because many artifacts are partially or completely destroyed.

It would be a mistake to think that only archaeologists deal with information loss. It is just as easy to lose information about systems and organizations that were built quite recently, only a few years ago. In “Institutional memory and reverse smuggling”, an engineer tells a story about a petrochemical company where knowledge about plant design and processes was lost after decades of operation, and the company had to bring in a former engineer to smuggle the knowledge back in.

Legacy code

This problem of knowledge loss is probably even more pressing in software engineering. Successful software, especially enterprise software, can run and evolve for decades. COBOL programs written in the 1960s and 1970s are still running on IBM mainframes. The Linux kernel was released 24 years ago, and Microsoft Word is 31 years old. As the average tenure of an engineer in a company is somewhere between 1 and 5 years, most software engineering is brownfield engineering: applications are maintained and extended by people who were not the initial developers. High-quality code and good documentation can make maintenance easier, but where have you seen high-quality code and good documentation? Hence engineers are doomed to be archaeologists: they dig through source code and data, or even reverse engineer binary code, to recover business requirements and system design, and then check how well these requirements and this design match current business needs.

When rewriting is not an option

COBOL is not cool anymore, and mainframes should have given way to cloud infrastructure a long time ago. Why do we still support these old systems? Because they work! Thousands of man-years and millions of dollars were invested in these systems, and it is too costly and risky to replace them. In some areas, such as 20-year-old code where one line can bring down an air traffic control system, code changes are especially risky.

This does not mean that software, once written, should be maintained forever. Several factors can make engineers consider rewriting the code, partially or completely.

  1. Death of a company or a branch of a company that uses custom software.

  2. A vendor stops supporting proprietary software you depend on, e.g. Windows XP extended support ended in 2014.

  3. The hardware platform stops being supported. For example, IBM could stop selling or supporting its mainframes. Currently, the move to mobile platforms affects many software applications; Steven Sinofsky has written a lot about this platform shift.

  4. Significant change of functional requirements. This factor is probably becoming more and more important as the pace of market change grows.

  5. Significant change of non-functional requirements, e.g. users need the system to provide a web interface, whereas currently it works as a Windows desktop application, or the software does not scale well.

  6. Maintenance cost gets too high, e.g. the supply of COBOL/Fort/C++/Perl engineers is shrinking, their salaries are rising, and at some point it is more cost-effective to rewrite the system from scratch.

If we put aside the factors we have little influence over, it all comes down to comparing the cost of maintenance with the cost of rewriting.
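To make that comparison concrete, here is a minimal sketch of a break-even calculation. The function names and all figures are hypothetical, and the model is deliberately naive: it assumes constant yearly costs and ignores risk, discounting, and the cost of running both systems in parallel during migration.

```python
def cumulative_cost(upfront, yearly, years):
    """Total money spent after `years`, given a one-time and a recurring cost."""
    return upfront + yearly * years


def break_even_year(maintain_per_year, rewrite_cost, new_maintain_per_year, horizon=30):
    """Return the first year in which rewriting is no more expensive than
    keeping the old system, or None if that never happens within `horizon`.

    All inputs are illustrative estimates in the same currency unit.
    """
    for year in range(1, horizon + 1):
        keep = cumulative_cost(0, maintain_per_year, year)
        rewrite = cumulative_cost(rewrite_cost, new_maintain_per_year, year)
        if rewrite <= keep:
            return year
    return None


# Hypothetical numbers: the old system costs 500 per year to maintain,
# a rewrite costs 2000 up front and 100 per year afterwards.
print(break_even_year(500, 2000, 100))
```

If the new system is not cheaper to maintain than the old one, the function returns None, which matches the intuition that a rewrite justified only by cost never pays off in that case.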

Driving factors for expensive maintenance

If there are so many reasons to rewrite software, why do we still have 50-year-old COBOL code and 20-year-old Java code? One of the reasons could be that applications are usually not designed to be easily rewritten. If you have a monolithic application with 1 million lines of code, which took a few years to build, rewriting it will probably take another few years, will probably reintroduce some bugs that had already been fixed in the previous system, and will add some new ones.

It is much better if a system is modular and you can rewrite it one piece at a time, thus reducing risk. Once microservice architecture becomes mainstream, we will probably have healthier software systems. Finally, we should admit that we are not able to predict the future and design a system to operate for more than 3-5 years. Let’s just plan for it to be cheaply thrown away in 3 years and rewritten according to new business requirements and the state of the art of software engineering.

We are not there yet

In the meantime, we have to be archaeologists. And we can learn how to do it effectively.

Usually, the first step in understanding how a system works is to look for documentation. Up-to-date documentation is perfect, though it is rare. If there is some documentation, it can be helpful too. Be aware that it is probably not up to date; think of documentation as a set of stories about different versions of the system in the past.

General software design experience helps a lot. It is likely that the system design conforms to some widely used patterns, and you can recognize these patterns. For example, if you identify the layers of the system as a front-end application, a public RESTful API, back-end message-queue-based services and a relational database, then you can visualize control and data flow through these layers, and research how each of them processes the kind of request you are interested in. Observing the high-level architecture is usually more efficient than digging right into low-level code, especially if the code base is huge.

If you have specific questions, ask around; maybe someone knows the answer. Remember who the experts are in each area of the system.

Take notes while digging. Draw diagrams; they help you understand dependencies between subsystems. Publish these notes and diagrams to establish a new, probably better, and certainly more up-to-date layer of documentation. It will make life easier for future archaeologists.
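One lightweight way to turn notes into a diagram is to record dependencies as plain data and emit Graphviz DOT text from them. The sketch below assumes you render the result with Graphviz yourself; the function name and the subsystem names in the example are hypothetical.

```python
def to_dot(dependencies):
    """Render a {subsystem: [subsystems it depends on]} mapping as Graphviz DOT text.

    Output is sorted so that the generated file is stable under version control.
    """
    lines = ["digraph system {"]
    for source in sorted(dependencies):
        for target in sorted(dependencies[source]):
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical notes gathered while reading the code.
notes = {
    "frontend": ["api"],
    "api": ["queue", "database"],
}
print(to_dot(notes))
```

Keeping the notes as data rather than as a hand-drawn picture means the diagram can be regenerated whenever a new dependency is discovered.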

Get tools that help you do full-text search in the code base. It is often faster to search for the name of a class or a method across the whole code base than to navigate through the code in an IDE. You can use special software that indexes code, or simpler tools like the Search In Files functionality in your favorite file manager. It is good to keep the code base on an SSD to make search faster.
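If no dedicated tool is at hand, a brute-force search is easy to script. The following is a minimal sketch using only the Python standard library; the function name, the default file extensions and the list of skipped directories are assumptions you would adapt to your own code base. It scans every file on each call, so a real indexing tool will be much faster on large repositories.

```python
import os
import re

# Directories that rarely contain hand-written source (an assumption; adjust as needed).
SKIPPED_DIRS = {".git", "node_modules", "bin", "obj"}


def search_codebase(root, pattern, extensions=(".py", ".java", ".cs", ".sql")):
    """Yield (path, line_number, line) for every line under `root` matching `pattern`.

    `pattern` is a regular expression; only files whose names end with one of
    `extensions` are scanned.
    """
    regex = re.compile(pattern)
    suffixes = tuple(extensions)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as source:
                    for number, line in enumerate(source, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip()
            except OSError:
                # Unreadable file: skip it rather than abort the whole search.
                continue
```

A call such as `list(search_codebase(".", r"\bOrderService\b"))` would list every line mentioning a (hypothetical) class named OrderService, with its file and line number.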

Use database diagramming tools (such as those built into SQL Server Management Studio) to understand the DB schema. You can also use reverse engineering tools to generate class diagrams, project dependency diagrams and the like, to visualize the code structure. There are such architecture tools in the higher editions of Visual Studio, and some simple architecture tools in the latest versions of ReSharper.

Once you have gained knowledge of the system, remember: with great power comes great responsibility. Steer the system design toward the bright side: make it easier to understand, easier to change, and easier to throw out and rewrite again :)