|
---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---
|
|
|
# Autoencoder Implementation for Hugging Face Transformers |
|
|
|
A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models. |
|
|
|
|
|
### Install-and-Use from the Hub (code repo) |
|
|
|
If you want to use the implementation directly from the Hub code repository (without a packaged pip install), you can download the repo and add it to `sys.path`: |
|
|
|
```python
from huggingface_hub import snapshot_download
import sys, torch

# 1) Download the code+weights for your repo "as is"
repo_dir = snapshot_download(
    repo_id="amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],  # note the * wildcards
)

# 2) Add to import path so plain imports work
sys.path.append(repo_dir)

# 3) Import your classes from the repo code
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction

# 4) Load the placeholder weights from the local folder (no internet, no code refresh)
model = AutoencoderForReconstruction.from_pretrained(repo_dir)

# 5) Quick smoke test
x = torch.randn(8, 20)
out = model(input_values=x)
print("latent:", out.last_hidden_state.shape, "reconstructed:", out.reconstructed.shape)
```
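
Alternatively, if the repo's `config.json` registers its custom classes through an `auto_map` entry (an assumption to verify against the repo), the standard remote-code loading path works without touching `sys.path`:

```python
from transformers import AutoModel

# Remote-code path: requires an `auto_map` entry in the repo's config.json.
# trust_remote_code executes the repo's Python files, so review them first.
model = AutoModel.from_pretrained("amaye15/autoencoder", trust_remote_code=True)
```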
|
|
|
## Features
|
|
|
- **Full Hugging Face Integration**: Compatible with `AutoModel`, `AutoConfig`, and `AutoTokenizer` patterns |
|
- **Standard Training Workflows**: Works with `Trainer`, `TrainingArguments`, and all HF training utilities |
|
- **Model Hub Compatible**: Save and share models on Hugging Face Hub with `push_to_hub()` |
|
- **Flexible Architecture**: Configurable encoder-decoder architecture with various activation functions |
|
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss |
|
- **Multiple Autoencoder Types (7)**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders |
|
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more |
|
- **Learnable Preprocessing**: Neural Scaler, Normalizing Flow, MinMax Scaler (learnable), Robust Scaler (learnable), and Yeo-Johnson preprocessors (2D and 3D tensors) |
|
- **Extensible Design**: Easy to extend for new autoencoder variants and custom loss functions |
|
- **Production Ready**: Proper serialization, checkpointing, and inference support |
|
|
|
|
|
## Architecture
|
|
|
The implementation consists of three main components: |
|
|
|
### 1. AutoencoderConfig |
|
Configuration class that inherits from `PretrainedConfig`: |
|
- Defines model architecture parameters |
|
- Handles validation and serialization |
|
- Enables `AutoConfig.from_pretrained()` functionality |
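
As a quick illustration, the config round-trips through the standard `PretrainedConfig` serialization API (a minimal sketch, assuming defaults for the parameters not shown):

```python
from configuration_autoencoder import AutoencoderConfig

# Save to and restore from a config.json on disk
config = AutoencoderConfig(input_dim=784, latent_dim=64)
config.save_pretrained("./ae_config")  # writes ./ae_config/config.json
restored = AutoencoderConfig.from_pretrained("./ae_config")
assert restored.latent_dim == 64
```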
|
|
|
### 2. AutoencoderModel |
|
Base model class that inherits from `PreTrainedModel`: |
|
- Implements encoder-decoder architecture |
|
- Provides latent space representation |
|
- Returns structured outputs with `AutoencoderOutput` |
|
|
|
### 3. AutoencoderForReconstruction |
|
Task-specific model for reconstruction: |
|
- Adds reconstruction loss calculation |
|
- Compatible with `Trainer` for easy training |
|
- Returns `AutoencoderForReconstructionOutput` with loss |
|
|
|
## Quick Start
|
|
|
### Basic Usage |
|
|
|
```python
import torch

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction

# Create configuration
config = AutoencoderConfig(
    input_dim=784,               # Input dimensionality (e.g., 28x28 images flattened)
    hidden_dims=[512, 256],      # Encoder hidden layers
    latent_dim=64,               # Latent space dimension
    activation="gelu",           # Activation function (18+ options available)
    reconstruction_loss="mse",   # Loss function (12+ options available)
    autoencoder_type="classic",  # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",  # or "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
)

# Create model
model = AutoencoderForReconstruction(config)

# Forward pass
input_data = torch.randn(32, 784)  # Batch of 32 samples
outputs = model(input_values=input_data)

print(f"Reconstruction loss: {outputs.loss}")
print(f"Latent shape: {outputs.last_hidden_state.shape}")
print(f"Reconstructed shape: {outputs.reconstructed.shape}")
```
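
Because the model is tagged for feature extraction, a common follow-up is to keep only the latent codes; a minimal sketch, continuing from the snippet above:

```python
import torch

# Encode a batch to latent features without tracking gradients
model.eval()
with torch.no_grad():
    latents = model(input_values=input_data).last_hidden_state  # shape: (32, 64)
# `latents` can now feed a downstream classifier, clustering step, etc.
```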
|
|
|
|
|
### Training with Hugging Face Trainer |
|
|
|
```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
            "labels": self.data[idx],  # For an autoencoder, input = target
        }

# Prepare data
train_dataset = AutoencoderDataset(your_training_data)
val_dataset = AutoencoderDataset(your_validation_data)

# Training arguments
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1000,
    load_best_model_at_end=True,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train
trainer.train()

# Save model
model.save_pretrained("./my_autoencoder")
config.save_pretrained("./my_autoencoder")
```
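
The saved directory can then be reloaded through the usual `from_pretrained` path:

```python
# Reload the trained autoencoder from the directory saved above
model = AutoencoderForReconstruction.from_pretrained("./my_autoencoder")
model.eval()
```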
|
|
|
### Using AutoModel Framework |
|
|
|
```python
from register_autoencoder import register_autoencoder_models
from transformers import AutoConfig, AutoModel

# Register models with the AutoModel framework
register_autoencoder_models()

# Now you can use standard HF patterns
config = AutoConfig.from_pretrained("./my_autoencoder")
model = AutoModel.from_pretrained("./my_autoencoder")

# Use the model
outputs = model(input_values=your_data)
```
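
Sharing on the Hub works through the standard `push_to_hub()` call (the repo id below is a placeholder; you need to be logged in via `huggingface-cli login`):

```python
# Push model weights and config to a Hub repo you control
model.push_to_hub("your-username/my-autoencoder")
config.push_to_hub("your-username/my-autoencoder")
```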
|
|
|
## Configuration Options
|
|
|
The `AutoencoderConfig` class supports extensive customization: |
|
|
|
```python
config = AutoencoderConfig(
    input_dim=784,                   # Input dimension
    hidden_dims=[512, 256, 128],     # Encoder hidden layers
    latent_dim=64,                   # Latent space dimension
    activation="gelu",               # Activation function (see full list below)
    dropout_rate=0.1,                # Dropout rate (0.0 to 1.0)
    use_batch_norm=True,             # Use batch normalization
    tie_weights=False,               # Tie encoder/decoder weights
    reconstruction_loss="mse",       # Loss function (see full list below)
    autoencoder_type="variational",  # Autoencoder type (see types below)
    beta=0.5,                        # Beta parameter for β-VAE
    temperature=1.0,                 # Temperature for Gumbel softmax
    noise_factor=0.1,                # Noise factor for denoising AE
    # Recurrent autoencoder parameters
    rnn_type="lstm",                 # RNN type: "lstm", "gru", "rnn"
    num_layers=2,                    # Number of RNN layers
    bidirectional=True,              # Bidirectional encoding
    sequence_length=None,            # Fixed sequence length (None for variable)
    teacher_forcing_ratio=0.5,       # Teacher forcing ratio during training
    # Learnable preprocessing parameters
    use_learnable_preprocessing=False,  # Enable learnable preprocessing
    preprocessing_type="none",          # "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
    preprocessing_hidden_dim=64,        # Hidden dimension for preprocessing networks
    preprocessing_num_layers=2,         # Number of layers in preprocessing networks
    learn_inverse_preprocessing=True,   # Learn inverse transformation
    flow_coupling_layers=4,             # Number of coupling layers for flows
)
```
|
|
|
### Available Activation Functions
|
|
|
**Standard Activations:** |
|
- `relu`, `leaky_relu`, `relu6`, `elu`, `prelu` |
|
- `tanh`, `sigmoid`, `hardsigmoid`, `hardtanh` |
|
- `gelu`, `swish`, `silu`, `hardswish` |
|
- `mish`, `softplus`, `softsign`, `tanhshrink`, `threshold` |
|
|
|
### Available Loss Functions
|
|
|
**Regression Losses:** |
|
- `mse` - Mean Squared Error |
|
- `l1` - L1/MAE Loss |
|
- `huber` - Huber Loss |
|
- `smooth_l1` - Smooth L1 Loss |
|
|
|
**Classification/Probability Losses:** |
|
- `bce` - Binary Cross Entropy |
|
- `kl_div` - KL Divergence |
|
- `focal` - Focal Loss |
|
|
|
**Similarity Losses:** |
|
- `cosine` - Cosine Similarity Loss |
|
- `ssim` - Structural Similarity Loss |
|
- `perceptual` - Perceptual Loss |
|
|
|
**Segmentation Losses:** |
|
- `dice` - Dice Loss |
|
- `tversky` - Tversky Loss |
|
|
|
### Available Autoencoder Types
|
|
|
**Classic Autoencoder (`classic`)** |
|
- Standard encoder-decoder architecture |
|
- Direct reconstruction loss minimization |
|
|
|
**Variational Autoencoder (`variational`)** |
|
- Probabilistic latent space with mean and variance |
|
- KL divergence regularization |
|
- Reparameterization trick for sampling |
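
For reference, the reparameterization trick in a few lines (an illustrative sketch; the helper name is hypothetical and not necessarily what `modeling_autoencoder.py` uses):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu/logvar
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std
```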
|
|
|
**Beta-VAE (`beta_vae`)** |
|
- Variational autoencoder with an adjustable β parameter
|
- Better disentanglement of latent factors |
|
|
|
**Denoising Autoencoder (`denoising`)** |
|
- Adds noise to input during training |
|
- Learns robust representations |
|
- Configurable noise factor |
|
|
|
**Sparse Autoencoder (`sparse`)** |
|
- Encourages sparse latent representations |
|
- L1 regularization on latent activations |
|
- Useful for feature selection |
|
|
|
**Contractive Autoencoder (`contractive`)** |
|
- Penalizes large gradients of latent w.r.t. input |
|
- Learns smooth manifold representations |
|
- Robust to small input perturbations |
|
|
|
**Recurrent Autoencoder (`recurrent`)** |
|
- LSTM/GRU/RNN encoder-decoder architecture |
|
- Bidirectional encoding for better sequence representations |
|
- Variable length sequence support with padding |
|
- Teacher forcing during training for stable learning |
|
- Sequence-to-sequence reconstruction |
|
|
|
|
## Model Outputs
|
|
|
### AutoencoderOutput |
|
|
|
The base model `AutoencoderModel` returns the following output: |
|
```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
    attentions: Tuple[torch.FloatTensor] = None     # Not used
```
|
|
|
### AutoencoderForReconstructionOutput |
|
```python
@dataclass
class AutoencoderForReconstructionOutput(ModelOutput):
    loss: torch.FloatTensor = None                  # Reconstruction loss
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
```
|
|
|
## Advanced Usage
|
|
|
### Custom Loss Functions |
|
|
|
You can easily extend the model with custom loss functions: |
|
|
|
```python
class CustomAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Custom loss implementation
        return your_custom_loss(reconstructed, target)
```
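
For instance, a log-cosh reconstruction loss plugged into that hook (a sketch, assuming `_compute_reconstruction_loss(reconstructed, target)` is the override point as shown above):

```python
import torch

class LogCoshAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Log-cosh: quadratic near zero like MSE, linear in the tails like L1
        return torch.log(torch.cosh(reconstructed - target)).mean()
```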
|
|
|
### Recurrent Autoencoder for Sequences |
|
|
|
Perfect for time series, text, and sequential data: |
|
|
|
```python
config = AutoencoderConfig(
    input_dim=50,                 # Feature dimension per timestep
    latent_dim=32,                # Compressed representation size
    autoencoder_type="recurrent",
    rnn_type="lstm",              # or "gru", "rnn"
    num_layers=2,                 # Number of RNN layers
    bidirectional=True,           # Bidirectional encoding
    teacher_forcing_ratio=0.7,    # Teacher forcing during training
    sequence_length=None,         # Variable length sequences
)

# Usage with sequence data
model = AutoencoderForReconstruction(config)
batch_size, seq_len, input_dim = 16, 30, 50  # example shapes
sequence_data = torch.randn(batch_size, seq_len, input_dim)
outputs = model(input_values=sequence_data)
```
|
|
|
### Learnable Preprocessing |
|
|
|
Deep learning-based data normalization that adapts to your data: |
|
|
|
```python
import torch

# Neural Scaler - a learnable alternative to StandardScaler
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
    preprocessing_hidden_dim=64,
)

# Normalizing Flow - invertible transformations
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="normalizing_flow",
    flow_coupling_layers=4,
)

# Works with all autoencoder types and sequence data
model = AutoencoderForReconstruction(config)
data = torch.randn(32, 20)  # example batch matching input_dim
outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```
|
|
|
```python
# Learnable MinMax Scaler - scales to [0, 1] with learnable bounds
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
)

# Learnable Robust Scaler - robust to outliers using median/IQR
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="robust_scaler",
)

# Learnable Yeo-Johnson - power transform for skewed distributions
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
)
```
|
|
|
|
|
### Variational Autoencoder Extension |
|
|
|
The configuration supports variational autoencoders: |
|
|
|
```python
config = AutoencoderConfig(
    autoencoder_type="variational",
    beta=0.5,  # β-VAE parameter
    # ... other parameters
)
```
|
|
|
### Integration with Datasets Library |
|
|
|
```python
from datasets import Dataset

# Convert your data to an HF Dataset
dataset = Dataset.from_dict({
    "input_values": your_data_list,
})

# Use with Trainer
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    # ... other arguments
)
```
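
One practical note (a general `datasets` tip, not something this repo requires): format the dataset as torch tensors so the `Trainer` receives tensors rather than Python lists:

```python
# Return torch tensors instead of Python lists when the dataset is indexed
dataset = dataset.with_format("torch")
```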
|
|
|
## Project Structure
|
|
|
```
autoencoder/
├── __init__.py                   # Package initialization
├── configuration_autoencoder.py  # Configuration class
├── modeling_autoencoder.py       # Model implementations
├── register_autoencoder.py       # AutoModel registration
├── pyproject.toml                # Project metadata and dependencies
└── README.md                     # This file
```
|
|
|
## Contributing
|
|
|
This implementation follows Hugging Face conventions and can be easily extended: |
|
|
|
1. **Adding new architectures**: Extend `AutoencoderModel` or create new model classes |
|
2. **Custom configurations**: Add parameters to `AutoencoderConfig` |
|
3. **Task-specific heads**: Create new classes like `AutoencoderForReconstruction` |
|
4. **Integration**: Register new models with the AutoModel framework |
|
|
|
## References
|
|
|
- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers) |
|
- [Custom Models Guide](https://huggingface.co/docs/transformers/custom_models) |
|
- [AutoModel Documentation](https://huggingface.co/docs/transformers/model_doc/auto) |
|
|
|
## Use Cases
|
|
|
This autoencoder implementation is perfect for: |
|
|
|
- **Dimensionality Reduction**: Compress high-dimensional data to lower dimensions |
|
- **Anomaly Detection**: Identify outliers based on reconstruction error (see the sketch after this list)
|
- **Data Denoising**: Remove noise from corrupted data |
|
- **Feature Learning**: Learn meaningful representations for downstream tasks |
|
- **Data Generation**: Generate new samples similar to training data |
|
- **Pretraining**: Initialize encoders for other tasks |
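
For the anomaly-detection case, the usual recipe is to score each sample by its reconstruction error and flag the largest scores; a minimal sketch (the mean-plus-3-std threshold is an assumption, pick a rule suited to your data):

```python
import torch

batch = torch.randn(256, 784)  # stand-in for real data of shape (N, input_dim)

model.eval()
with torch.no_grad():
    out = model(input_values=batch)
    # Per-sample mean squared reconstruction error
    errors = ((out.reconstructed - batch) ** 2).mean(dim=-1)

# Example rule: flag anything beyond mean + 3 std of the error distribution
threshold = errors.mean() + 3 * errors.std()
anomalies = errors > threshold
print(f"Flagged {anomalies.sum().item()} of {len(batch)} samples")
```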
|
|
|
## Model Comparison
|
|
|
| Feature | Standard PyTorch | This Implementation |
|---------|------------------|---------------------|
| HF Integration | ❌ | ✅ |
| AutoModel Support | ❌ | ✅ |
| Trainer Compatible | ❌ | ✅ |
| Hub Integration | ❌ | ✅ |
| Config Management | Manual | ✅ Automatic |
| Serialization | Manual | ✅ Built-in |
| Checkpointing | Manual | ✅ Built-in |
|
|
|
## Performance Tips
|
|
|
1. **Batch Size**: Use larger batch sizes for better GPU utilization |
|
2. **Learning Rate**: Start with 1e-3 and adjust based on convergence (see the snippet after this list)
|
3. **Architecture**: Gradually decrease hidden dimensions for better compression |
|
4. **Regularization**: Use dropout and batch normalization for better generalization |
|
5. **Loss Function**: Choose appropriate loss based on your data type |
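
For instance, tips 1 and 2 map directly onto `TrainingArguments` (the values are starting points, not tuned recommendations):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    learning_rate=1e-3,               # tip 2: common starting point, adjust on convergence
    per_device_train_batch_size=128,  # tip 1: larger batches improve GPU utilization
)
```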
|
|
|
## License
|
|
|
This implementation is provided as an example and follows the same license terms as Hugging Face Transformers. |
|
|