Text Error Correction System

Aidan Dyga, Ainslee Plesko, Grant Anderson, Issei Hasegawa

September 1, 2025

Introduction

  • Goal: Build a Text Error Correction System
  • Input: Original (un-mutated) text file
  • Output: Encoded text in which errors can be detected after mutation/transmission
  • Motivation: Ensure data integrity over unreliable channels

Problem

  • Problem Statement:
    • Data can be corrupted during transmission/storage
    • Need a way to detect (and possibly correct) errors in text
  • Challenges:
    • Random mutations (bit flips, character changes)
    • Efficient encoding/decoding
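
The two mutation types named above can be simulated for testing. This is a minimal sketch, not the project's own test harness: the function names, the lowercase replacement alphabet, and the per-character mutation rate are all assumptions made for illustration.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed replacement alphabet

def mutate_chars(text: str, rate: float, rng: random.Random) -> str:
    """Character changes: replace each non-newline character with probability `rate`.

    The replacement is drawn at random, so it can occasionally equal the original.
    """
    out = []
    for ch in text:
        if ch != "\n" and rng.random() < rate:
            out.append(rng.choice(ALPHABET))
        else:
            out.append(ch)
    return "".join(out)

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Bit flips: invert one bit, counting from the least significant bit of byte 0."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)
```

Passing a seeded `random.Random` keeps a mutation run reproducible, which makes failures easier to replay.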

Example Solution

  • Encode each line
    • Every line is duplicated and encoded with redundancy (a checksum) so that mutations can be detected
  • Message Decoding
    • After the transfer, each line is decoded for comparison
  • Line Verification
    • Each line is compared with its encoded variant and corrected where necessary

Python Implementation
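
The slides do not reproduce the code itself, so what follows is only a sketch of how the encode, decode, and verify steps could look. It assumes each line is stored as two copies plus a CRC-32 checksum (from the standard `zlib` module), joined by tabs. The record layout, the separator, and every function name are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

SEP = "\t"  # assumed field separator; lines that contain tabs would need escaping

def checksum(line: str) -> str:
    """CRC-32 of the UTF-8 bytes, as 8 hex digits."""
    return format(zlib.crc32(line.encode("utf-8")), "08x")

def encode_line(line: str) -> str:
    """Store the line twice, followed by its checksum."""
    return SEP.join([line, line, checksum(line)])

def decode_line(record: str) -> Tuple[Optional[str], str]:
    """Return (recovered line or None, status) for one encoded record."""
    parts = record.split(SEP)
    if len(parts) != 3:
        return None, "unrecoverable"
    first, second, crc = parts
    if first == second:
        # Both copies agree; any damage is confined to the checksum field.
        return first, "ok" if checksum(first) == crc else "corrected"
    # The copies differ: trust whichever one still matches the checksum.
    for candidate in (first, second):
        if checksum(candidate) == crc:
            return candidate, "corrected"
    return None, "unrecoverable"

def encode_text(text: str) -> str:
    return "\n".join(encode_line(line) for line in text.split("\n"))

def decode_text(encoded: str) -> List[Tuple[Optional[str], str]]:
    return [decode_line(record) for record in encoded.split("\n")]

# Demo: corrupt only the first copy of the first line, then recover it.
encoded = encode_text("hello world\nsecond line")
damaged = encoded.replace("hello", "hellp", 1)
print(decode_text(damaged))
```

This layout only repairs damage confined to one field of a record. If both copies are mutated, the line is reported as unrecoverable rather than silently accepted.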

Why is this a Tractable Problem?

  • This problem is tractable because it can be solved in time linear in the input size.
    1. Linear Time Complexity
    • The system runs in time proportional to the input size, O(n·L) for n lines of at most L characters each, so it scales efficiently.
    2. Local Verification
    • Errors are detected and corrected per line using checksums and redundancy, avoiding complex global computation.
    3. Feasible Error Model
    • With simple mutation errors (like single-copy corruption), correction is straightforward and does not require intractable algorithms.
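
The linear bound can be written out as a rough estimate. The reading of the symbols is an assumption, since the slides do not define them: n is the number of lines, L_i is the length of line i, L is the longest line length, and c is a constant covering the per-character cost of duplicating, checksumming, and comparing a line.

```latex
T(n, L) = \sum_{i=1}^{n} c \, L_i \;\le\; c \, n \, L = O(n \cdot L)
```

Each line is processed independently, so the sum has no cross-line terms.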

Limitations

    1. Homophones
    • The program may not be able to correct words that are spelled correctly but used in the wrong grammatical context.
    • Because they are spelled correctly, they may not get flagged by the checker.
    • Ex: ‘their’ vs ‘there’
    2. Numerical Errors
    • Because the corrector is based on textual errors, like spelling, numerical differences may not be caught.
    • Ex: ‘1889’ vs ‘1989’
    3. Incorrect Proper Nouns
    • Unless a proper noun is included within the program’s dictionary, it might get flagged as an error.
    • A misspelled proper noun that happens to spell another word may also be missed.
    4. Structural or Logical Errors
    • Although it can check at the sentence level, it cannot check the coherence or flow of an entire paragraph.
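
The first three limitations can be reproduced with a toy dictionary lookup. This is an illustrative sketch, not the project's checker: the word set and the function name are hypothetical, and a real checker would load a full word list.

```python
# Hypothetical toy dictionary, just large enough for the examples below.
KNOWN_WORDS = {"they", "left", "their", "there", "coats", "over", "in", "the", "year"}

def flag_unknown_words(sentence: str) -> list:
    """Return the alphabetic words that a plain dictionary lookup would flag."""
    flagged = []
    for raw in sentence.split():
        word = raw.strip(".,;:!?'\"").lower()
        # Numerals and empty tokens are skipped: spelling rules do not apply to them.
        if word and word.isalpha() and word not in KNOWN_WORDS:
            flagged.append(word)
    return flagged

print(flag_unknown_words("They left there coats over their."))  # swapped homophones
print(flag_unknown_words("In the year 1889"))                   # wrong year
print(flag_unknown_words("They left thier coats"))              # real misspelling
print(flag_unknown_words("They left Kyoto"))                    # correct proper noun
```

The first two calls flag nothing, even though each sentence is wrong. The third flags the genuine misspelling, and the fourth flags a correctly spelled proper noun that is missing from the word list.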

Conclusion

  • Text error correction is essential for reliable communication
  • Simple codes can efficiently detect errors
  • Tractable due to low computational complexity
  • More advanced codes can also correct errors, not just detect them