LeNooby09 ba11fdb7dd Integrate Nested Learning paradigm with Continuum Memory System, Hope architecture, and DeepMomentum optimizer
- **Core Paradigm**: Adds Nested Learning constructs (arXiv:2512.24695), unifying optimizer and architecture through nested update levels.
- **Continuum Memory System (CMS)**: Introduces multi-time-scale memory in `model.py` as a spectrum of parallel decay-based modules, expanding context retention.
- **Hope**: Implements a self-modifying neural memory variant, applying meta-head gating for higher-order in-context learning.
- **DeepMomentum Optimizer**: Provides a stacked associative-memory optimizer in `train.py` for richer gradient tracking over variable time scales.
- **Configurations**: Extends `configs/default.yaml` with flags and tunables for CMS, Hope,
2026-04-28 22:24:08 +00:00

Project Cynosure

A training suite for neural language models that combines recent architecture and optimization techniques in one pipeline.

Cynosure incorporates state-of-the-art research from DeepSeek, Google DeepMind, MiniMax, Qwen, Mistral, and others into a unified training pipeline.

Features

  • Multi-Head Latent Attention (MLA) — Compressed KV cache inspired by DeepSeek V3
  • Mixture-of-Experts (MoE) — Fine-grained experts with auxiliary-loss-free load balancing
  • SwiGLU activations, RMSNorm, RoPE — Modern transformer building blocks
  • FP8 mixed-precision training — Memory-efficient training at scale
  • Multi-Token Prediction (MTP) — Predict multiple future tokens simultaneously
  • GRPO reinforcement learning — RL-based alignment and fine-tuning
  • Evolution Strategies (ES at Scale + EGGROLL) — Gradient-free, inference-only post-training. Full-rank ES (arXiv:2509.24372) and low-rank EGGROLL (arXiv:2511.16652).
  • Titans Neural Memory — Surprise-driven long-term memory
  • Manifold Hyper-Connections (mHC) — Learned residual stream routing
  • Test-Time Training (TTT) — Adaptive inference-time updates
  • TurboQuant — Online vector quantization

See research/OPTIMIZATION_NOTES.md for detailed documentation of all integrated techniques.

Quick Start

pip install -r requirements.txt
python -m training.train --config configs/default.yaml

Evolution-Strategies post-training

The same entrypoint dispatches into a gradient-free, inference-only loop when training_mode is set to es (small-population full-rank ES) or eggroll (large-population low-rank ES with the EGGROLL throughput trick):

python -m training.train --config configs/es.yaml       # arXiv:2509.24372
python -m training.train --config configs/eggroll.yaml  # arXiv:2511.16652

How it works

  • Full-rank ES samples N≈30 Gaussian perturbations of every trainable parameter, evaluates each via inference + a scalar reward_fn, and applies the centred-rank update θ ← θ + α/(Nσ) · Σ wᵢ εᵢ. Perturbations are regenerated from (seed, step, member) so they never need to be stored.
  • EGGROLL patches every nn.Linear.forward so that population member i runs with the perturbed weight W + σ · uᵢ vᵢᵀ, where the perturbation has rank r. The whole population runs in a single tiled forward pass; after the rewards are reduced, the accumulated update (rank at most N·r) is written back to W.
  • The reward function is a Python callable resolved by dotted path (evolution.reward_fn: training.rewards.countdown_reward). Built-ins: exact_match_reward, length_reward, format_reward, countdown_reward, gsm8k_reward, compose. Plug in your own by passing any importable Callable[[seqs, prompts, ...], Tensor[B]].
  • Theory: research/Evolution_Strategies_at_Scale_LLM_Fine_Tuning.md, research/Evolution_Strategies_at_the_Hyperscale.md. Implementation: training/evolution.py, training/rewards.py.
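
The full-rank ES update described above can be sketched in plain Python. This is a minimal illustration, not the code in training/evolution.py: the function names, the string-based seeding scheme, and the choice of centred ranks in [-0.5, 0.5] are assumptions made for the example, and parameters are a flat list of floats rather than model tensors.

```python
import random


def perturbation(seed, step, member, dim):
    """Regenerate one member's Gaussian noise from (seed, step, member).

    Assumption: a string seed is used here for simplicity; the real
    implementation may derive its generator state differently.
    """
    rng = random.Random(f"{seed}-{step}-{member}")
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def centred_ranks(rewards):
    """Map rewards to evenly spaced weights in [-0.5, 0.5] by rank."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two population members")
    order = sorted(range(n), key=lambda i: rewards[i])
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = rank / (n - 1) - 0.5
    return weights


def es_step(theta, reward_fn, *, seed, step, population=30, sigma=0.01, alpha=0.05):
    """One update: theta <- theta + alpha / (N * sigma) * sum_i w_i * eps_i."""
    dim = len(theta)

    # Evaluate every perturbed candidate; only the scalar rewards are kept.
    rewards = []
    for member in range(population):
        eps = perturbation(seed, step, member, dim)
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        rewards.append(reward_fn(candidate))

    weights = centred_ranks(rewards)
    scale = alpha / (population * sigma)

    # Regenerate each perturbation from its seed instead of storing it.
    update = [0.0] * dim
    for member, w in enumerate(weights):
        eps = perturbation(seed, step, member, dim)
        for j in range(dim):
            update[j] += w * eps[j]

    return [t + scale * u for t, u in zip(theta, update)]
```

On a toy reward such as the negative squared distance to a target vector, calling es_step repeatedly with an increasing step value moves theta toward the target. Because the weights depend only on reward ranks, the step size is insensitive to the scale of the reward.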

The training_mode: supervised | es | eggroll switch lives at the top level of the YAML config — leaving it unset (or set to supervised) keeps the existing supervised + GRPO pipelines byte-identical.
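
As a rough sketch of how the switch and the dotted-path reward lookup could be interpreted, assuming the YAML config has already been loaded into a dict: resolve_training_mode and resolve_dotted are hypothetical helper names for illustration, and the actual logic in training/train.py may differ.

```python
import importlib

VALID_MODES = ("supervised", "es", "eggroll")


def resolve_training_mode(config):
    """Return the training mode, treating a missing or empty key as supervised."""
    mode = config.get("training_mode") or "supervised"
    if mode not in VALID_MODES:
        raise ValueError(f"training_mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


def resolve_dotted(path):
    """Import 'package.module.name' and return the callable called 'name'."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    obj = getattr(importlib.import_module(module_name), attr)
    if not callable(obj):
        raise TypeError(f"{path!r} does not resolve to a callable")
    return obj
```

With helpers like these, a config containing evolution.reward_fn: training.rewards.countdown_reward would be resolved by passing that string to resolve_dotted, and any importable function with a compatible signature could be substituted without touching the training code.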

Project Structure

Cynosure/
├── train/                 # Training package
│   ├── training/          # Model architecture & training pipeline
│   │   ├── model.py       # Transformer model with all optional modules
│   │   ├── train.py       # Training loop (supervised / GRPO / DPO / ES)
│   │   ├── evolution.py   # Evolution-Strategies steps (ES at Scale + EGGROLL)
│   │   ├── rewards.py     # Reward-function library + rollout helper
│   │   └── dataset.py     # Data loading
│   ├── configs/           # YAML configuration files
│   │   ├── default.yaml   # Default training configuration
│   │   ├── es.yaml        # Full-rank ES (arXiv:2509.24372)
│   │   └── eggroll.yaml   # EGGROLL low-rank ES (arXiv:2511.16652)
│   ├── export_model.py    # Export trained model for inference
│   ├── research/          # Architecture & optimization research notes
│   ├── data/              # Training datasets (placeholder)
│   ├── requirements.txt
│   └── setup.py
└── inference/             # Standalone inference module (no training dependency)
    ├── model.py           # Self-contained model architecture
    ├── generate.py        # Generation loop, sampling, CLI
    ├── network/           # Bundled model (config.yaml + checkpoint.pt)
    └── requirements.txt   # Inference-only dependencies

Exporting a Model for Inference

After training, export the model so the inference module can run standalone:

cd train
python export_model.py \
    --config configs/default.yaml \
    --checkpoint checkpoints/cynosure_final.pt

This copies the config and checkpoint into ../inference/network/.
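
The copy step can be pictured roughly as follows. This is a sketch under assumptions, not the contents of export_model.py: the function name export_network is hypothetical, the destination file names config.yaml and checkpoint.pt are taken from the project structure listing, and the real script may do additional work.

```python
import shutil
from pathlib import Path


def export_network(config_path, checkpoint_path, network_dir):
    """Copy a config and a checkpoint into a network directory under fixed names."""
    network_dir = Path(network_dir)
    network_dir.mkdir(parents=True, exist_ok=True)

    config_dst = network_dir / "config.yaml"
    checkpoint_dst = network_dir / "checkpoint.pt"

    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(config_path, config_dst)
    shutil.copy2(checkpoint_path, checkpoint_dst)
    return config_dst, checkpoint_dst
```

Using fixed destination names means the inference module can locate the bundled network without being told which training config or checkpoint produced it.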

Inference

The inference module is fully standalone — it does not require the training package. Once a network is exported into inference/network/, run:

# Single prompt (run from project root)
python -m inference --prompt "Once upon a time"

# Interactive mode (omit --prompt)
python -m inference

# Or with explicit paths
python -m inference \
    --config path/to/config.yaml \
    --checkpoint path/to/checkpoint.pt \
    --prompt "Hello world"

Sampling options: --strategy greedy|sample|top_k|top_p, --temperature, --top-k, --top-p, --repetition-penalty.
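
For readers unfamiliar with these strategies, the sketch below shows how greedy, plain sampling, top-k, and top-p selection differ for a single step. It works on a plain list of logits and is only an illustration: the function names are hypothetical, repetition penalty is left out, and inference/generate.py operates on tensors and may implement the details differently.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, *, strategy="top_p", temperature=1.0, top_k=50, top_p=0.9, rng=random):
    """Pick the next token index from raw logits using the named strategy."""
    if strategy == "greedy":
        return max(range(len(logits)), key=lambda i: logits[i])

    probs = softmax(logits, temperature)
    if strategy == "top_k":
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep = set(order[:top_k])
        mass = sum(probs[i] for i in keep)
        probs = [p / mass if i in keep else 0.0 for i, p in enumerate(probs)]
    elif strategy == "top_p":
        probs = top_p_filter(probs, top_p)
    elif strategy != "sample":
        raise ValueError(f"unknown strategy: {strategy!r}")

    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Passing a seeded random.Random instance as rng makes the sampled output reproducible, which is convenient when comparing strategies on the same prompt.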

Programmatic usage:

from inference import load_model, generate, get_tokenizer

# Uses inference/network/ by default
model = load_model(device="cuda")
tokenizer = get_tokenizer("gpt2")
input_ids = tokenizer.encode("Hello world", return_tensors="pt").cuda()
output_ids = generate(model, input_ids, max_new_tokens=128, strategy="top_p")
print(tokenizer.decode(output_ids[0]))

License

Research project — not affiliated with CD Projekt Red.