Project Cynosure
A neural language model training suite leveraging cutting-edge architectures and optimization techniques.
Cynosure incorporates state-of-the-art research from DeepSeek, Google DeepMind, MiniMax, Qwen, Mistral, and others into a unified training pipeline.
Features
- Multi-Head Latent Attention (MLA) — Compressed KV cache inspired by DeepSeek V3
- Mixture-of-Experts (MoE) — Fine-grained experts with auxiliary-loss-free load balancing
- SwiGLU activations, RMSNorm, RoPE — Modern transformer building blocks
- FP8 mixed-precision training — Memory-efficient training at scale
- Multi-Token Prediction (MTP) — Predict multiple future tokens simultaneously
- GRPO reinforcement learning — RL-based alignment and fine-tuning
- Evolution Strategies (ES at Scale + EGGROLL) — Gradient-free, inference-only post-training. Full-rank ES (arXiv:2509.24372) and low-rank EGGROLL (arXiv:2511.16652).
- Titans Neural Memory — Surprise-driven long-term memory
- Manifold Hyper-Connections (mHC) — Learned residual stream routing
- Test-Time Training (TTT) — Adaptive inference-time updates
- TurboQuant — Online vector quantization
See research/OPTIMIZATION_NOTES.md for detailed documentation of all integrated techniques.
Quick Start
pip install -r requirements.txt
python -m training.train --config configs/default.yaml
Evolution-Strategies post-training
The same entrypoint dispatches into a gradient-free, inference-only loop when
training_mode is set to es (small-population full-rank ES) or eggroll
(large-population low-rank ES with the EGGROLL throughput trick):
python -m training.train --config configs/es.yaml # arXiv:2509.24372
python -m training.train --config configs/eggroll.yaml # arXiv:2511.16652
How it works
- Full-rank ES samples N≈30 Gaussian perturbations of every trainable parameter, evaluates each via inference + a scalar reward_fn, and applies the centred-rank update θ ← θ + α/(Nσ) · Σ wᵢ εᵢ. Perturbations are regenerated from (seed, step, member) so they never need to be stored.
- EGGROLL replaces every nn.Linear.forward with a per-population rank-r outer-product perturbation σ · uᵢ vᵢᵀ. The whole population runs in a single tiled forward pass; the cumulative rank-N·r update is written back to W after the rewards are reduced.
- The reward function is a Python callable resolved by dotted path (evolution.reward_fn: training.rewards.countdown_reward). Built-ins: exact_match_reward, length_reward, format_reward, countdown_reward, gsm8k_reward, compose. Plug in your own by passing any importable Callable[[seqs, prompts, ...], Tensor[B]].
- Theory: research/Evolution_Strategies_at_Scale_LLM_Fine_Tuning.md, research/Evolution_Strategies_at_the_Hyperscale.md. Implementation: training/evolution.py, training/rewards.py.
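The full-rank update above can be sketched on a toy parameter vector. This is a minimal illustration using only the standard library, not the code in training/evolution.py: the helper names (perturbation, centred_ranks, es_step), the string-based seeding scheme, and the default values for sigma and alpha are all assumptions made for the example.

```python
import random


def perturbation(seed, step, member, dim):
    """Regenerate one member's Gaussian noise from (seed, step, member)."""
    # Assumption: a string seed is one simple way to get a reproducible stream.
    rng = random.Random(f"{seed}:{step}:{member}")
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def centred_ranks(rewards):
    """Map rewards to weights in [-0.5, 0.5] by rank (worst to best)."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two population members")
    order = sorted(range(n), key=lambda i: rewards[i])
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = rank / (n - 1) - 0.5
    return weights


def es_step(theta, reward_fn, step, *, seed=0, population=30, sigma=0.1, alpha=0.05):
    """One update: theta <- theta + alpha / (N * sigma) * sum_i w_i * eps_i."""
    dim = len(theta)
    # Evaluate every perturbed candidate; only the scalar rewards are kept.
    rewards = []
    for member in range(population):
        eps = perturbation(seed, step, member, dim)
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        rewards.append(reward_fn(candidate))
    weights = centred_ranks(rewards)
    # Rebuild each noise vector from its key instead of storing it.
    update = [0.0] * dim
    for member in range(population):
        eps = perturbation(seed, step, member, dim)
        for j in range(dim):
            update[j] += weights[member] * eps[j]
    scale = alpha / (population * sigma)
    return [t + scale * u for t, u in zip(theta, update)]
```

Calling es_step in a loop with an increasing step index gives each step a fresh, reproducible population. Because the weights depend only on ranks, the update is unaffected by the scale of the reward.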
The training_mode: supervised | es | eggroll switch lives at the top level
of the YAML config — leaving it unset (or set to supervised) keeps the
existing supervised + GRPO pipelines byte-identical.
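As a rough sketch of how such a switch and the dotted-path reward lookup might be read from an already-parsed config, consider the following. It assumes the YAML has been loaded into a plain dict; the function names (resolve_training_mode, resolve_dotted, load_reward_fn) are hypothetical and are not taken from training/train.py.

```python
import importlib

VALID_MODES = ("supervised", "es", "eggroll")


def resolve_training_mode(config):
    """Return the mode, treating a missing or empty key as supervised."""
    mode = config.get("training_mode") or "supervised"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown training_mode: {mode!r}")
    return mode


def resolve_dotted(path):
    """Import 'package.module.attr' and return the named attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def load_reward_fn(config):
    """Look up evolution.reward_fn only for the gradient-free modes."""
    if resolve_training_mode(config) == "supervised":
        return None  # supervised runs never touch the reward library
    return resolve_dotted(config["evolution"]["reward_fn"])
```

With this shape, a config that omits training_mode never reaches the ES code path, which is one way to keep the supervised behaviour unchanged.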
Project Structure
Cynosure/
├── train/ # Training package
│ ├── training/ # Model architecture & training pipeline
│ │ ├── model.py # Transformer model with all optional modules
│ │ ├── train.py # Training loop (supervised / GRPO / DPO / ES)
│ │ ├── evolution.py # Evolution-Strategies steps (ES at Scale + EGGROLL)
│ │ ├── rewards.py # Reward-function library + rollout helper
│ │ └── dataset.py # Data loading
│ ├── configs/ # YAML configuration files
│ │ ├── default.yaml # Default training configuration
│ │ ├── es.yaml # Full-rank ES (arXiv:2509.24372)
│ │ └── eggroll.yaml # EGGROLL low-rank ES (arXiv:2511.16652)
│ ├── export_model.py # Export trained model for inference
│ ├── research/ # Architecture & optimization research notes
│ ├── data/ # Training datasets (placeholder)
│ ├── requirements.txt
│ └── setup.py
├── inference/ # Standalone inference module (no training dependency)
│ ├── model.py # Self-contained model architecture
│ ├── generate.py # Generation loop, sampling, CLI
│ ├── network/ # Bundled model (config.yaml + checkpoint.pt)
│ └── requirements.txt # Inference-only dependencies
Exporting a Model for Inference
After training, export the model so the inference module can run standalone:
cd train
python export_model.py \
--config configs/default.yaml \
--checkpoint checkpoints/cynosure_final.pt
This copies the config and checkpoint into ../inference/network/.
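The copy step can be pictured with a short sketch like the one below. It is not the contents of export_model.py: the function name export_for_inference is hypothetical, and the target filenames config.yaml and checkpoint.pt are taken from the inference/network/ description in the project structure.

```python
import shutil
from pathlib import Path


def export_for_inference(config_path, checkpoint_path, network_dir):
    """Copy a config and checkpoint into the bundled-network directory."""
    network_dir = Path(network_dir)
    network_dir.mkdir(parents=True, exist_ok=True)
    # Fixed target names let the inference module find both files by default.
    shutil.copy2(config_path, network_dir / "config.yaml")
    shutil.copy2(checkpoint_path, network_dir / "checkpoint.pt")
    return network_dir
```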
Inference
The inference module is fully standalone — it does not require the training package.
Once a network is exported into inference/network/, run:
# Single prompt (run from project root)
python -m inference --prompt "Once upon a time"
# Interactive mode (omit --prompt)
python -m inference
# Or with explicit paths
python -m inference \
--config path/to/config.yaml \
--checkpoint path/to/checkpoint.pt \
--prompt "Hello world"
Sampling options: --strategy greedy|sample|top_k|top_p, --temperature, --top-k, --top-p, --repetition-penalty.
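For readers unfamiliar with these strategies, here is a minimal standard-library sketch of how one next-token choice could be made from a list of logits. It is an illustration, not the code in inference/generate.py: the function names and default values are assumptions, temperature must be positive, and the repetition penalty is left out.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalise(probs, keep)


def pick_token(logits, strategy="greedy", temperature=1.0, top_k=50, top_p=0.9, rng=random):
    """Choose one token index according to the named strategy."""
    if strategy == "greedy":
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    if strategy == "top_k":
        probs = top_k_filter(probs, top_k)
    elif strategy == "top_p":
        probs = top_p_filter(probs, top_p)
    elif strategy != "sample":
        raise ValueError(f"unknown strategy: {strategy!r}")
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Passing a seeded random.Random instance as rng makes the sampled strategies reproducible.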
Programmatic usage:
from inference import load_model, generate, get_tokenizer
# Uses inference/network/ by default
model = load_model(device="cuda")
tokenizer = get_tokenizer("gpt2")
input_ids = tokenizer.encode("Hello world", return_tensors="pt").cuda()
output_ids = generate(model, input_ids, max_new_tokens=128, strategy="top_p")
print(tokenizer.decode(output_ids[0]))
License
Research project — not affiliated with CD Projekt Red.