DetectEquivalentPrograms Decision Problem

Ainslee Plesko, Coltin Colucci, Alexander Goddard, and Preston Smith

September 8, 2025

Introduction

Goal: Build an Equivalent Program Detection System
Input: Program 1 and Program 2
Output: “Yes” or “No”
Motivation: Program runtime efficiency

Theory of computation

What is theory of computation?
- Understanding what can be computed
- Analyzing computational complexity
- Proving limits of computation
- “Proofgrammers” combine proofs and programming

Equivalence Detector

How does it work?

detect_equivalence(P1: str, P2: str) relies on these helpers:
- get_syntax_tree(program_one, program_two)
- standardize_naming(source: ast.AST)
- same_string(str_one: str, str_two: str)
- same_error(error_one: Exception, error_two: Exception)

Takes two programs as strings and returns “Yes” if it finds they are equivalent, “No” otherwise.
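A minimal sketch of how the run-and-compare part of these pieces might fit together. Only the names detect_equivalence, same_string, and same_error come from the outline above; the helper run_program, the function bodies, and the rule that two failures match when their exception types match are assumptions made for illustration.

```python
import contextlib
import io


def run_program(source: str):
    """Hypothetical helper: run source, return (printed_text, exception_or_None)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})  # only suitable for trusted programs that terminate
    except Exception as error:
        return buffer.getvalue(), error
    return buffer.getvalue(), None


def same_string(str_one: str, str_two: str) -> bool:
    return str_one == str_two


def same_error(error_one: Exception, error_two: Exception) -> bool:
    # Assumption: failures "match" when the exception types are identical.
    return type(error_one) is type(error_two)


def detect_equivalence(p1: str, p2: str) -> str:
    out_one, err_one = run_program(p1)
    out_two, err_two = run_program(p2)
    if err_one is not None or err_two is not None:
        both_failed = err_one is not None and err_two is not None
        return "Yes" if both_failed and same_error(err_one, err_two) else "No"
    return "Yes" if same_string(out_one, out_two) else "No"


print(detect_equivalence("print(1 + 2)", "print(3)"))
print(detect_equivalence("print(1 / 0)", "print(int('x'))"))
```

Because this sketch executes each program once, it only observes a single run; it says nothing about behaviour on other inputs.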

Standardize Naming Function

import ast


def standardize_naming(source: ast.AST) -> ast.AST:
    """
    Standardizes all variable, function, and class names in the AST to numbered placeholders.
    This helps ignore differences in naming conventions when comparing program structure.
    """
    used_name = {}  # Maps original names to standardized numbered names
    name_id = 1  # Counter for generated names; incremented before use, so the first placeholder is 2
    # Walk through all nodes in the AST
    for node in ast.walk(source):
        # Standardize variable names
        if isinstance(node, ast.Name):
            if node.id in used_name:
                node.id = used_name[node.id]
            else:
                name_id += 1
                used_name[node.id] = str(name_id)
                node.id = str(name_id)
        # Standardize function and class names
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            if node.name in used_name:
                node.name = used_name[node.name]
            else:
                name_id += 1
                used_name[node.name] = str(name_id)
                node.name = str(name_id)
        # Standardize function argument names
        if isinstance(node, ast.arg):
            if node.arg in used_name:
                node.arg = used_name[node.arg]
            else:
                name_id += 1
                used_name[node.arg] = str(name_id)
                node.arg = str(name_id)
    return source

Output of Standardize Naming

Input

def calculate_sum(numbers):
  toal = 0
  for n in numbers:
    toal += n
  if toal > 10:
    print("Large sum")
  else:
    print("Small sum")
  return toal

calculate_sum([1, 2, 3, 7])

Output

def 2(3):
    4 = 0
    for 5 in 3:
        4 += 5
    if 4 > 10:
        6('Large sum')
    else:
        6('Small sum')
    return 4
2([1, 2, 3, 7])
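The listing above can be regenerated by parsing the input, renaming, and unparsing. The sketch below uses a compact variant of standardize_naming that is intended to behave the same way as the version shown earlier (numbering starts at 2, nodes visited in ast.walk order); the `fields` lookup table is an illustrative choice, and ast.unparse requires Python 3.9 or later.

```python
import ast


def standardize_naming(tree: ast.AST) -> ast.AST:
    """Compact variant of the renaming pass; placeholders start at 2."""
    used_name = {}
    name_id = 1
    # The attribute that holds the identifier differs per node type.
    fields = {ast.Name: "id", ast.FunctionDef: "name",
              ast.ClassDef: "name", ast.arg: "arg"}
    for node in ast.walk(tree):
        field = fields.get(type(node))
        if field is None:
            continue
        original = getattr(node, field)
        if original not in used_name:
            name_id += 1
            used_name[original] = str(name_id)
        setattr(node, field, used_name[original])
    return tree


source = '''
def calculate_sum(numbers):
  toal = 0
  for n in numbers:
    toal += n
  if toal > 10:
    print("Large sum")
  else:
    print("Small sum")
  return toal

calculate_sum([1, 2, 3, 7])
'''

result = ast.unparse(standardize_naming(ast.parse(source)))
print(result)
```

The result is text for comparison only: numeric placeholders such as 2 and 3 are not valid Python identifiers, so the standardized program cannot be executed.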

Detect Equivalence Program

Tractable vs Intractable vs Uncomputable

Tractable
- “can be solved efficiently”
- computable in theory AND in practice
Intractable
- “method for solving exists, but is hopelessly time consuming”
- computable in theory BUT (maybe) not in practice
Uncomputable
- “cannot be solved by any computer program”
- NOT computable in theory or practice

Tractable vs Intractable vs Uncomputable (Part 2)

Tractable (in restricted cases)
- the detector runs in polynomial time
  - “The complexity class Poly is the set of computational problems that can be solved by a Python program with running time in \(O(n^k)\), for some \(k \ge 0\).”
Uncomputable (in the general case)
- no program can decide equivalence for every pair of programs, however long it is allowed to run
  - as the input programs grow in complexity, so does the work the detector has to do
  - with an infinite input space, testing every input never finishes, so the problem is not merely inefficient to solve
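One restricted case where a check is guaranteed to finish is a pair of programs that always halt and take inputs from a small finite domain. The sketch below assumes two hypothetical functions, p1 and p2, defined over the byte values 0 to 255; the number of runs grows linearly with the size of the domain.

```python
def p1(x: int) -> int:
    # Hypothetical program: double the input, keep the low 8 bits.
    return (x * 2) % 256


def p2(x: int) -> int:
    # Hypothetical program with a different structure but the same intent.
    return (x << 1) & 0xFF


def equivalent_on(domain) -> bool:
    """Exhaustively compare p1 and p2 on every value in a finite domain."""
    return all(p1(x) == p2(x) for x in domain)


print(equivalent_on(range(256)))
```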

Limitations

The major limitation of our program is that the two programs need to have the same structure.
- If the two programs have different structures but produce the same results for all inputs, we get a false negative.
- For example, changing the inequality in program P1 from \(toal > 10\) to \(10 < toal\) now gives “No” even though the programs are still equivalent.
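The false negative described above can be checked directly with Python's ast module. This is a small sketch, assuming the structural comparison amounts to comparing ast.dump output; the two one-statement programs are illustrative stand-ins for the full P1.

```python
import ast

p1 = "print(toal > 10)"
p2 = "print(10 < toal)"

# The two comparisons parse to different trees: the operands are swapped
# and the operator is Gt in one program and Lt in the other.
same_structure = ast.dump(ast.parse(p1)) == ast.dump(ast.parse(p2))

# Yet the two expressions agree on every value tried here.
same_behaviour = all((toal > 10) == (10 < toal) for toal in range(-100, 100))

print(same_structure, same_behaviour)
```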

Why It’s Not Possible in the General Case

There are infinitely many ways to structure a program that produces the same results, so it’s infeasible to write a workaround for every pair of structures that are different but equivalent.
Running every possible input through both programs is not feasible either: getting through an infinite set of inputs would take an infinite amount of time.
If one of the programs never halts on some input, its output can never be compared with the other program’s output.
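A short illustration of why testing finitely many inputs cannot settle the question. The functions p1 and p2 below are hypothetical: they agree on every input a bounded test is likely to try and differ on exactly one input outside that range.

```python
def p1(n: int) -> int:
    return n * 2


def p2(n: int) -> int:
    # Differs from p1 on a single input that a bounded test may never reach.
    return 0 if n == 10**6 else n * 2


def agree_on(inputs) -> bool:
    """Return True when p1 and p2 give the same result on every tested input."""
    return all(p1(n) == p2(n) for n in inputs)


print(agree_on(range(1000)))
print(p1(10**6) == p2(10**6))
```

Passing the bounded test shows only that no difference was found among the inputs tried, not that the programs are equivalent.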

Conclusion

detect_equivalence works by running both programs on the same input and checking if they both produce the same output or fail in the same way.
This approach only works for restricted cases; in the general case, determining program equivalence is uncomputable.
Our implementation works for simple programs where all possible inputs can be tested, but it cannot handle complex programs with infinite input spaces.
This challenge demonstrates that tools can help with specific cases, but no tool can decide equivalence for all programs.