ShorNet

ShorNet is a multi-layer perceptron for predicting the two prime factors of large 1024-bit numbers given in hexadecimal. It was conceived as a function approximation of Shor's algorithm for prime factorization, which is intended to run on theoretical future quantum computers.

Background

ShorNet was conceived as a project for learning about neural networks and cryptography, and for testing the capabilities of the Mac Studio with M3 Ultra. It was inspired by the success of Google DeepMind's AlphaFold, which is highly effective at predicting protein folding, a problem likewise once thought to be solvable only by mega- to giga-qubit quantum computers.

Sample usage

import torch
import torch.nn as nn

# Input your 1024-bit hex number to factorize
number_to_factorize = "0x7703af0000000000fa6ead00000000008d2a480000000000772ba480000000007c0a7100000000006b72bb00000000001b842200000000000f9c57100000000015e642c00000000050bf8f00000000003b7c390000000000127718200000000052345d8000000000e7d9db00000000004058fd00000000005eb1d50000000000"

# Model architecture
# Residual block: LayerNorm -> Linear -> ReLU -> LayerNorm -> Linear, plus a skip connection
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear1, self.linear2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.activation = nn.ReLU()
    
    def forward(self, x):
        identity = x
        x = self.activation(self.linear1(self.norm1(x)))
        return self.linear2(self.norm2(x)) + identity

# Predicts the chunked encodings of both factors: input projection -> residual blocks -> two output heads (p and q)
class PrimeFactorMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers=6):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.residual_blocks = nn.ModuleList([ResidualBlock(hidden_dim) for _ in range(num_layers)])
        self.norm, self.activation = nn.LayerNorm(hidden_dim), nn.ReLU()
        self.p_head, self.q_head = nn.Linear(hidden_dim, output_dim), nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        x = self.input_proj(x)
        for block in self.residual_blocks:
            x = block(x)
        x = self.activation(self.norm(x))
        return self.p_head(x), self.q_head(x)

# Helper functions
def int_to_tensor(num, bits=1024, chunk_size=64):
    # Split the integer into big-endian 64-bit chunks, each scaled to [0, 1].
    chunks = bits // chunk_size
    tensor = torch.zeros(chunks)
    for i in range(chunks):
        mask = (1 << chunk_size) - 1
        chunk_val = (num >> (i * chunk_size)) & mask
        tensor[chunks - i - 1] = chunk_val / ((1 << chunk_size) - 1)
    return tensor

def tensor_to_int(tensor, chunk_size=64):
    # Reassemble an integer from big-endian chunks scaled back from [0, 1].
    num = 0
    for i, chunk_val in enumerate(tensor):
        val = int(round(chunk_val.item() * ((1 << chunk_size) - 1)))
        num |= val << ((len(tensor) - i - 1) * chunk_size)
    return num
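
# Note (added for clarity; not part of the original script): torch.zeros()
# creates a float32 tensor, so each 64-bit chunk is stored with only about
# 24 bits of mantissa precision. The chunk encoding is therefore lossy, and
# tensor_to_int(int_to_tensor(x)) only approximates x.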

# Convert input to integer
n = int(number_to_factorize, 16)

# Load model (download from HuggingFace or local path)
model = PrimeFactorMLP(input_dim=16, hidden_dim=2048, output_dim=8)
model.load_state_dict(torch.load('final_model.pth', map_location='cpu')['model_state_dict'])
model.eval()

# Prepare input
n_tensor = int_to_tensor(n).unsqueeze(0)

# Predict factors
with torch.no_grad():
    p_pred, q_pred = model(n_tensor)
    p = tensor_to_int(p_pred[0])
    q = tensor_to_int(q_pred[0])

# Print results
print(f"Input number: {number_to_factorize[:34]}...")
print(f"Predicted P: 0x{p:0128x}")
print(f"Predicted Q: 0x{q:0128x}")
print(f"Product: 0x{p*q:0256x}")
print(f"Bit match: {((p*q) == n)}")

Results

It will come as no surprise that the model does not effectively predict the prime factors of a 1024-bit product, but its behavior has some interesting properties with implications of their own.

Patterns in the Model's Predictions

  • Consistent Bit Accuracy: The model achieves remarkably consistent bit matching across samples, averaging around 80-82% of bits correct for both prime factors (a measurement sketch follows this list). This uniformity suggests the model is capturing fundamental structural patterns rather than just memorizing training examples.
  • High-Order Bit Preservation: Looking at the hex representations, the model often predicts the leftmost (most significant) digits fairly accurately. This indicates it prioritizes the high-order bits, which have the largest impact on the product.
  • Q Factor Pattern Convergence: The predicted Q values show a noticeable pattern - many predictions have similar digit sequences in the middle sections. For example, many Q predictions contain segments like 80...7f...80... in similar positions, suggesting the model is applying a learned template or pattern.
  • Relative Error Consistency: The average relative errors for P and Q are almost identical (0.159 vs 0.160), indicating the model doesn't favor one factor over the other.
  • Prediction Regularization: The predicted factors often appear "smoother" and less random than the true factors. This suggests the network is applying some form of regularization or pattern-based prediction rather than capturing the true randomness of prime numbers.
  • Value Range Compression: There appears to be compression in the predicted values compared to the true values, particularly for the Q factor - the model's predictions don't span as wide a numerical range as the actual values.
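
One way to measure the bit accuracy and relative error cited above is sketched below. This is illustrative code added for this writeup, not part of the original evaluation; p_true and p are hypothetical placeholders for a known factor and the corresponding model prediction.

def bit_accuracy(true_int, pred_int, bits=512):
    # Fraction of bit positions at which the two integers agree.
    wrong_bits = bin(true_int ^ pred_int).count("1")
    return (bits - wrong_bits) / bits

def relative_error(true_int, pred_int):
    # |true - predicted| / true, the quantity averaged to ~0.16 above.
    return abs(true_int - pred_int) / true_int

# Hypothetical usage with a known factor p_true and a prediction p:
# print(f"Bit accuracy:   {bit_accuracy(p_true, p):.3f}")
# print(f"Relative error: {relative_error(p_true, p):.3f}")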

Implications

The model seems to have learned to:

  1. Approximate the general magnitude of each prime factor (getting high-order bits mostly right)
  2. Apply certain templates or patterns it found useful during training
  3. Balance errors between the two factors rather than optimizing for one

Final analysis

While the model achieves a significant search-space reduction, 80% bit accuracy still means roughly 20% of bits (about 100 bits of each 512-bit prime) are incorrect. Assuming the incorrect bit positions could be identified, this reduces the search space from 2^512 to approximately 2^100, a dramatic improvement but still exponentially large. Even using a prediction as the starting point for a classical iterative brute-force approach, the worst-case calculation time therefore remains exponential. Methods such as probabilistic search algorithms or lattice reduction might extract further computational advantage from the predictions as starting points, but such investigations were beyond the scope of this machine learning project.
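
As a back-of-the-envelope sketch of the arithmetic behind these figures (added for illustration, using the ~80% average bit accuracy reported above):

bits_per_prime = 512
avg_bit_accuracy = 0.80
unknown_bits = round((1 - avg_bit_accuracy) * bits_per_prime)  # ~102 bits
print(f"Unknown bits per prime: {unknown_bits}")
print(f"Residual search space:  ~2^{unknown_bits}, versus 2^{bits_per_prime} with no prediction")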

  • Claude 3.7 Sonnet was used for large portions of the coding and analysis in this project