AI & legacy code refactoring

What is legacy code?

The term legacy code has many connotations and about as many definitions. All the connotations boil down to one thing: the code is bad. Developers might say it ‘smells’, or ‘is not very pretty’, or use other subjective terms, but essentially legacy code is:

Difficult to read and/or understand
Difficult to maintain and/or adapt
Difficult to tackle with confidence and/or speed.

Code does not have to be old, which in software development is relative, to be legacy code. And legacy code may well be doing its job very successfully; for example, significant parts of the banking industry rely on legacy code. Perhaps counter-intuitively, developers are producing legacy code BEFORE any users actually experience the output if the code has the above characteristics. I would argue that producing code using the test-driven development (TDD) methodology avoids all of the above pitfalls, and therefore prevents developers creating more legacy code, but that is a post for another time.

The understandable desire to rewrite legacy code

When confronted with legacy code it is always tempting to throw it all away and rewrite it. Like the old joke about the traveller who asks for directions and is told ‘I wouldn’t start from here’, wishing away the need to tackle legacy code (i.e. to start from somewhere else) is very tempting; however, there are practical and business reasons why that option is not always available.

Business reasons are variations on cost and trust. The perceived complexity of the change (from the requestor’s perspective) is misaligned with the cost (i.e. time) the change would take to implement if rewriting the existing functionality were included. This may be a poor or ill-informed perspective, but that doesn’t mean the perspective lacks influence, especially in larger and/or more hierarchical organisations. A related point is a lack of trust in developers to do it ‘right’ this time. If there is legacy code, what would stop the expensive rewrite from being legacy code too?

There are practical reasons too why rewriting the legacy code BEFORE implementing the change may not be possible. There is no guarantee that anyone in the company understands how the code is supposed to work (as opposed to how it is working), or that they have the time or inclination to be subject-matter experts while the developers retrofit their understanding to the code.

Years ago, when I worked in a relatively young company (sub ten years old), we needed to amend the ‘calculation engine’ for determining annualised product costs and savings versus customers' existing or alternative products. The business founders were still in leadership roles, but everyone else from the time had left the business, meaning no-one really knew what all the nested if-statements were for, whether the code order was important, or why calculations appeared to be repeated. There were slow, outside-in Selenium automated tests, but they were not granular enough to explain the code’s relative complexity. And, of course, there were no unit tests.

How can AI help?

Here is a fictitious example of legacy code with lots of inherent issues:

def process_order(order):
    if order['status'] == 'NEW':
        if order['type'] == 'ONLINE':
            if order['payment_method'] == 'CREDIT_CARD':
                # process credit card payment
                print("Processing online order with credit card")
            elif order['payment_method'] == 'PAYPAL':
                # process PayPal payment
                print("Processing online order with PayPal")
            elif order['payment_method'] == 'BANK_TRANSFER':
                # process bank transfer
                print("Processing online order with bank transfer")
            else:
                print("Unknown payment method for online order")
        elif order['type'] == 'IN_STORE':
            if order['payment_method'] == 'CASH':
                # process cash payment
                print("Processing in-store order with cash")
            elif order['payment_method'] == 'CREDIT_CARD':
                # process credit card payment
                print("Processing in-store order with credit card")
            else:
                print("Unknown payment method for in-store order")
        else:
            print("Unknown order type")
    elif order['status'] == 'CANCELLED':
        # handle cancelled order
        print("Order is cancelled")
    elif order['status'] == 'SHIPPED':
        # handle shipped order
        print("Order has been shipped")
    else:
        print("Unknown order status")

Key issues with this code:

Deep nesting: there are multiple levels of nested if-statements, making it difficult to follow the flow of the logic
Repetitive logic: there is a lot of repeated logic, such as checking for payment methods in different contexts
Difficult to maintain: adding new conditions or changing existing logic is prone to errors due to the complex nesting.

If I am being brutally honest, this code is not even that bad. It is bad, but I have seen far worse: methods spanning hundreds of lines, with nested try...catch logic and early returns that exit from multiple points in the function. Probably not what any one developer wanted, or set out to write, but the cumulative effect of multiple 'tactical' decisions or 'temporary' changes.

First steps

The code must be covered by automated unit tests before any kind of refactoring can be undertaken. The AI suggested, interestingly, to 'slightly refactor' the code first, which seemed premature and at odds with the aim, but it probably reflects the untestability (is that a word?) of the code.

The comment ("# process credit card payment") and the print statement beneath it were replaced with a return statement with similar wording that can be tested.

def process_order(order):
    if order['status'] == 'NEW':
        if order['type'] == 'ONLINE':
            if order['payment_method'] == 'CREDIT_CARD':
                return "Processing online order with credit card"

...

import unittest


class TestProcessOrder(unittest.TestCase):

    def test_new_online_credit_card(self):
        order = {
            'status': 'NEW',
            'type': 'ONLINE',
            'payment_method': 'CREDIT_CARD'
        }
        result = process_order(order)
        self.assertEqual(result, "Processing online order with credit card")

...
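The test above relies on the function returning a value. Where changing the code before testing it is not an option, one alternative is to pin down the existing behaviour by capturing what the function prints. The following is a minimal sketch of that idea, not what the AI produced: the process_order here is a shortened stand-in for the print-based original, and captured_output and TestProcessOrderOutput are hypothetical names of my own.

```python
import io
import unittest
from contextlib import redirect_stdout


def process_order(order):
    # Shortened stand-in for the print-based original (only a few branches kept).
    if order['status'] == 'NEW':
        if order['type'] == 'ONLINE':
            if order['payment_method'] == 'CREDIT_CARD':
                print("Processing online order with credit card")
            else:
                print("Unknown payment method for online order")
        else:
            print("Unknown order type")
    elif order['status'] == 'CANCELLED':
        print("Order is cancelled")
    else:
        print("Unknown order status")


def captured_output(order):
    """Hypothetical helper: run process_order and return what it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        process_order(order)
    # Drop the trailing newline that print() adds.
    return buffer.getvalue().strip()


class TestProcessOrderOutput(unittest.TestCase):

    def test_existing_behaviour_is_pinned(self):
        # Each case pairs an input order with the message the code prints today.
        cases = [
            ({'status': 'NEW', 'type': 'ONLINE', 'payment_method': 'CREDIT_CARD'},
             "Processing online order with credit card"),
            ({'status': 'NEW', 'type': 'ONLINE', 'payment_method': 'CHEQUE'},
             "Unknown payment method for online order"),
            ({'status': 'NEW', 'type': 'MAIL_ORDER'}, "Unknown order type"),
            ({'status': 'CANCELLED'}, "Order is cancelled"),
            ({'status': 'REFUNDED'}, "Unknown order status"),
        ]
        for order, expected in cases:
            with self.subTest(order=order):
                self.assertEqual(captured_output(order), expected)
```

Saved in a test module, this can be run with python -m unittest. It only works because the fictitious example prints; real legacy code with side effects on databases or other systems would need those dependencies isolated first, which is likely why some 'slight refactoring' is hard to avoid.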

Replacing all the comments with testable return statements (easier in a dynamic language) enables the code to be restructured into multiple, smaller functions that support a simpler logic flow. Separating out the competing order statuses makes it easier to support new statuses, like 'declined' or 'recurring', without the cognitive load, which nesting imposes, of keeping the other statuses in mind at the same time.
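To make the restructuring concrete, here is one possible shape for it. This is a sketch under my own assumptions rather than the AI's actual output: it assumes every branch has already been converted to return its message, and all names other than process_order (the handler functions and the lookup tables) are hypothetical.

```python
# Messages for NEW orders, keyed by (order type, payment method).
# In real code the values would more likely be functions that do the work.
NEW_ORDER_MESSAGES = {
    ('ONLINE', 'CREDIT_CARD'): "Processing online order with credit card",
    ('ONLINE', 'PAYPAL'): "Processing online order with PayPal",
    ('ONLINE', 'BANK_TRANSFER'): "Processing online order with bank transfer",
    ('IN_STORE', 'CASH'): "Processing in-store order with cash",
    ('IN_STORE', 'CREDIT_CARD'): "Processing in-store order with credit card",
}

# Fallback message per known order type when the payment method is not recognised.
UNKNOWN_PAYMENT_MESSAGES = {
    'ONLINE': "Unknown payment method for online order",
    'IN_STORE': "Unknown payment method for in-store order",
}


def handle_new(order):
    order_type = order['type']
    if order_type not in UNKNOWN_PAYMENT_MESSAGES:
        return "Unknown order type"
    key = (order_type, order['payment_method'])
    return NEW_ORDER_MESSAGES.get(key, UNKNOWN_PAYMENT_MESSAGES[order_type])


def handle_cancelled(order):
    return "Order is cancelled"


def handle_shipped(order):
    return "Order has been shipped"


# One entry per status; each handler can be read and tested on its own.
STATUS_HANDLERS = {
    'NEW': handle_new,
    'CANCELLED': handle_cancelled,
    'SHIPPED': handle_shipped,
}


def process_order(order):
    handler = STATUS_HANDLERS.get(order['status'])
    if handler is None:
        return "Unknown order status"
    return handler(order)
```

With this shape, supporting a 'declined' status would mean writing one new handler and adding one entry to STATUS_HANDLERS, without touching the logic for the other statuses. Whether lookup tables suit the real code depends on how much work each branch actually does, which the fictitious example hides behind its comments.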

What are the limitations, if any?

The AI-suggested refactoring did a pretty good job of writing unit tests for the fictitious code example, even if the unit tests required code changes first, which is not ideal. For longer or more obscure code, AI could well help produce the first level of understanding that gives developers enough confidence to tackle the code. AI could increase productivity, but it won't revolutionise legacy code refactoring because it won't explain why the code is shaped like it is, whether the abstractions are at the right level and/or the same level, or the wider point of whether the code is required at all.

The missing piece from many refactoring discussions is the need to know why code does something, not just how it does it. The 'why' questions extend beyond discussions of code semantics, naming conventions, layout, etc. to understanding the business domain, end-to-end user journeys, and the machine and human dependencies extending outwards from or inwards to the application. Until AI can engage stakeholders, make inferences, or take calculated risks (like turning something off to see who moans), its contribution will remain limited to practical, time-saving leaps of understanding.

Conclusion

I would like to explore the use of AI for refactoring legacy code further to validate the above, because it is not without value; however, there are still limits as to how much AI can shorten the process. Getting to testable code as quickly as possible enables developers to iterate beyond the mess of the code and focus on delivering business/customer value. Every professional wants to feel their work is valuable, important, and necessary, so the more AI can replace 'grunt' work, the more every professional will benefit.