# NeuralBit - BitNet Language Model Framework
A Kotlin framework for training transformer-based language models with ternary weights ({-1, 0, +1}) using the BitNet approach.
## Overview
NeuralBit implements BitNet-style transformers where weights are quantized to ternary values ({-1, 0, +1}). This enables:
- Memory efficiency: Ternary weights need only log₂(3) ≈ 1.58 bits each, vs 32 bits for float32
- Computational efficiency: Simplified matrix operations
- Energy efficiency: Ideal for edge devices and embedded systems
The framework is based on Microsoft's BitNet research, using:
- Absmean quantization for weights to ternary values
- Absmax quantization for activations
- RMSNorm for layer normalization (sketched after this list)
- Pre-norm transformer architecture
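RMSNorm, for example, normalizes by the root-mean-square of the activations instead of by mean and variance. A minimal sketch, assuming a plain `FloatArray` interface (the framework's actual `RMSNorm.kt` may differ):

```kotlin
import kotlin.math.sqrt

// Minimal RMSNorm sketch: y = x / sqrt(mean(x^2) + eps) * gain.
// Illustrative only; not the signature of the framework's RMSNorm.kt.
fun rmsNorm(x: FloatArray, gain: FloatArray, eps: Float = 1e-6f): FloatArray {
    val meanSquare = x.map { it * it }.average().toFloat()  // mean(x^2)
    val scale = 1f / sqrt(meanSquare + eps)
    return FloatArray(x.size) { i -> x[i] * scale * gain[i] }
}
```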
## Features

- **BitLinear**: Linear layer with ternary weights {-1, 0, +1}
- **BitMultiHeadAttention**: Multi-head attention with BitLinear projections
- **BitTransformerBlock**: Pre-norm transformer block with RMSNorm
- **BitNetLM**: Complete GPT-style decoder-only language model
- **Tokenizers**: Character-level and word-level tokenization
- **LLMTrainer**: Training loop with gradient clipping
- **Text Generation**: Autoregressive generation with temperature and top-k sampling
- **ZIM File Support**: Train directly from Kiwix/Wikipedia ZIM archives (no preprocessing needed)
- **GPU Acceleration**: Auto-detected CUDA support via DJL/PyTorch for faster training
- **Parallel Training**: Multi-threaded CPU matrix operations as an automatic fallback when no GPU is available
## Installation

Add the dependency to your `build.gradle.kts`:

```kotlin
dependencies {
    implementation("tech.lenooby09:neural-bit:0.2.0")
}
```
## Quick Start

### Training a Language Model
```kotlin
import tech.lenooby09.neuralbit.transformer.*

// Prepare your text data
val text = "The quick brown fox jumps over the lazy dog. ".repeat(100)

// Create a tokenizer from the text
val tokenizer = CharTokenizer.fromText(text)

// Configure the model
val config = BitNetConfig(
    vocabSize = tokenizer.vocabSize,
    embedDim = 128,
    numLayers = 4,
    numHeads = 4,
    maxSeqLen = 64
)

// Create the model
val model = BitNetLM(config)
println(model.summary())

// Create dataset for training
val dataset = TextDataset.fromText(text, tokenizer, seqLen = 32)

// Train the model
val trainer = LLMTrainer(model, learningRate = 0.001f)
for (epoch in 0 until 10) {
    val loss = trainer.trainEpoch(dataset, batchSize = 4)
    println("Epoch ${epoch + 1}: loss = $loss")
}

// Generate text
val prompt = tokenizer.encode("The quick")
val generated = model.generate(prompt, maxTokens = 50, temperature = 0.8f)
println(tokenizer.decode(generated))
```
### Using the NeuralBit API

```kotlin
import tech.lenooby09.neuralbit.NeuralBit

// Create components using the NeuralBit API
val tokenizer = NeuralBit.charTokenizer(text)
val config = NeuralBit.smallConfig(tokenizer.vocabSize)
val model = NeuralBit.createModel(config)
val trainer = NeuralBit.createTrainer(model, learningRate = 0.001f)
val dataset = NeuralBit.createDataset(text, tokenizer, seqLen = 32)
```
### Pre-built Configurations

```kotlin
// Tiny model (for testing)
val tiny = BitNetConfig.tiny(vocabSize = 100)    // 64 dim, 2 layers, 2 heads

// Small model
val small = BitNetConfig.small(vocabSize = 256)  // 128 dim, 4 layers, 4 heads

// Base model
val base = BitNetConfig.base(vocabSize = 32000)  // 512 dim, 6 layers, 8 heads
```
## BitNet Configuration Options

| Parameter | Description | Default |
|---|---|---|
| `vocabSize` | Vocabulary size | Required |
| `embedDim` | Embedding dimension | Required |
| `numLayers` | Number of transformer layers | Required |
| `numHeads` | Number of attention heads | Required |
| `ffnHiddenDim` | FFN hidden dimension | `embedDim * 4` |
| `maxSeqLen` | Maximum sequence length | `512` |
| `tieWeights` | Tie embedding and output weights | `true` |
| `learnedPositions` | Use learned positional embeddings | `false` |
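Assuming every parameter in the table maps to a `BitNetConfig` constructor argument, as the Quick Start snippet suggests, a fully specified configuration might look like this (values are hypothetical):

```kotlin
// Hypothetical configuration overriding the optional defaults;
// parameter names follow the table above.
val config = BitNetConfig(
    vocabSize = 32000,        // required
    embedDim = 256,           // required
    numLayers = 6,            // required
    numHeads = 8,             // required
    ffnHiddenDim = 1024,      // default: embedDim * 4
    maxSeqLen = 256,          // default: 512
    tieWeights = true,        // default: true
    learnedPositions = false  // default: false
)
```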
## Command-Line Interface
NeuralBit provides a CLI for training and text generation.
### Building the Standalone JAR

```bash
# Build a fat JAR with all dependencies
./gradlew shadowJar

# The JAR is created at build/libs/neural-bit-1.0-SNAPSHOT-all.jar
java -jar build/libs/neural-bit-1.0-SNAPSHOT-all.jar <command> [options]
```
### Using Gradle Tasks

```bash
# Unified CLI (recommended)
./gradlew cli --args="<command> [options]"

# Direct commands
./gradlew train --args="[options]"
./gradlew generate --args="[options]"
```
### Training a Model

```bash
# Train with synthetic data (default)
neuralbit train --epochs 5

# Train from a text file
neuralbit train --text-file story.txt --epochs 20

# Train from a ZIM file (Wikipedia/Kiwix offline archives)
neuralbit train --zim-path wikipedia.zim --max-articles 5000 --epochs 10

# Train with custom architecture
neuralbit train --embed-dim 128 --layers 4 --heads 4 --seq-len 64

# Minimal training example
neuralbit train --text "hello world " --epochs 3 --quiet
```
**Train Options:**

| Option | Short | Description | Default |
|---|---|---|---|
| `--text-file` | `-f` | Path to text file for training | |
| `--text` | `-t` | Direct text input | |
| `--zim-path` | `-z` | Path to ZIM file (Wikipedia/Kiwix format) | |
| `--max-articles` | | Max articles to extract from ZIM | 10000 |
| `--device` | `-d` | Compute device: auto, cpu, gpu | auto |
| `--threads` | | Number of CPU threads | CPU cores |
| `--batch-size` | `-b` | Batch size (GPU: 64+ recommended) | 4 (auto-scaled to 64 for GPU) |
| `--embed-dim` | | Embedding dimension | 64 |
| `--layers` | | Number of transformer layers | 2 |
| `--heads` | | Number of attention heads | 2 |
| `--seq-len` | `-s` | Sequence length | 32 |
| `--epochs` | `-e` | Number of epochs | 10 |
| `--lr` | `-l` | Learning rate | 0.001 |
| `--quiet` | `-q` | Suppress output | |
### Generating Text

```bash
# Generate with default prompt (demo mode)
neuralbit generate

# Generate with custom prompt
neuralbit generate --prompt "Once upon a time" --max-tokens 100

# Control randomness with temperature
neuralbit generate -p "The quick" -t 0.7 --max-tokens 50

# Use top-k sampling
neuralbit generate -p "Hello" -k 10 -n 30
```
**Generate Options:**

| Option | Short | Description | Default |
|---|---|---|---|
| `--prompt` | `-p` | Starting text for generation | |
| `--max-tokens` | `-n` | Maximum tokens to generate | 100 |
| `--temperature` | `-t` | Sampling temperature (lower = more deterministic) | 1.0 |
| `--top-k` | `-k` | Sample from top K tokens only | 0 (disabled) |
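For intuition, here is how temperature and top-k interact during sampling. This is a standalone sketch of the standard algorithm, not NeuralBit's internal sampler:

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Sample one token id from a logits vector with temperature and optional top-k.
// Standard algorithm, sketched for illustration.
fun sampleToken(logits: FloatArray, temperature: Float = 1.0f, topK: Int = 0): Int {
    // Lower temperature sharpens the distribution (more deterministic).
    val scaled = logits.map { it / temperature }
    // With top-k > 0, only the k highest-scoring tokens survive.
    val candidates = scaled.withIndex()
        .sortedByDescending { it.value }
        .let { if (topK > 0) it.take(topK) else it }
    // Numerically stable softmax over the surviving candidates.
    val maxLogit = candidates.first().value
    val weights = candidates.map { exp((it.value - maxLogit).toDouble()) }
    // Draw from the resulting categorical distribution.
    var r = Random.nextDouble() * weights.sum()
    for ((i, w) in weights.withIndex()) {
        r -= w
        if (r <= 0) return candidates[i].index
    }
    return candidates.last().index
}
```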
## Architecture

```
neural-bit/
├── core/
│   └── FloatTensor.kt        # Float tensor for computations
├── layer/
│   ├── Layer.kt              # Layer interface
│   └── BitLinear.kt          # BitNet linear layer (ternary weights)
├── activation/
│   └── RMSNorm.kt            # RMSNorm, LayerNorm for transformers
├── transformer/              # BitNet LLM components
│   ├── Embedding.kt          # Token + positional embeddings
│   ├── Attention.kt          # Multi-head attention with BitLinear
│   ├── TransformerBlock.kt   # Transformer block with FFN
│   ├── BitNetLM.kt           # Complete language model
│   ├── Tokenizer.kt          # Character and word tokenizers
│   └── TextDataset.kt        # Dataset and trainer for LLMs
├── data/
│   └── Dataset.kt            # Dataset interface
├── cli/
│   └── NeuralBitCLI.kt       # Command-line interface
└── NeuralBit.kt              # Main API
```
## How BitNet Works

### Weight Quantization

BitLinear uses absmean quantization to convert weights to ternary values:

```
W_ternary = RoundClip(W / γ, -1, 1)
where γ = mean(|W|)
```
This maps each weight to {-1, 0, +1}:
- Values close to 0 become 0
- Positive values become +1
- Negative values become -1
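In code, absmean quantization can be sketched as follows (a minimal illustration of the formula above, not the actual `BitLinear` internals):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Absmean quantization sketch: scale by the mean absolute weight,
// then round and clip each value to {-1, 0, +1}.
fun quantizeWeightsTernary(weights: FloatArray): Pair<IntArray, Float> {
    val gamma = weights.map { abs(it) }.average().toFloat() + 1e-8f  // γ = mean(|W|); eps avoids /0
    val ternary = IntArray(weights.size) { i ->
        (weights[i] / gamma).roundToInt().coerceIn(-1, 1)  // RoundClip(W / γ, -1, 1)
    }
    return ternary to gamma  // γ is kept so outputs can be rescaled
}
```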
### Activation Quantization

Activations are quantized using absmax quantization, which scales them into the signed 8-bit range:

```
x_quantized = Clip(x × Q_b / γ, -Q_b, Q_b)
where γ = max(|x|) and Q_b = 127 for 8-bit activations
```
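A matching sketch for absmax activation quantization (again illustrative only, with `Q_b = 127` for 8-bit):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Absmax quantization sketch: scale so the largest activation magnitude
// maps to Q_b = 127, then round and clip to the signed 8-bit range.
fun quantizeActivations(x: FloatArray, qb: Int = 127): Pair<IntArray, Float> {
    val gamma = x.maxOf { abs(it) } + 1e-8f  // γ = max(|x|); eps avoids /0
    val quantized = IntArray(x.size) { i ->
        (x[i] * qb / gamma).roundToInt().coerceIn(-qb, qb)
    }
    return quantized to gamma  // γ is used to dequantize after the matmul
}
```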
### Training

During training:

1. Full-precision "latent" weights are maintained alongside the model
2. Weights are quantized to ternary for the forward pass
3. Gradients flow through the quantization step via the Straight-Through Estimator (STE)
4. The optimizer updates the full-precision latent weights
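The Straight-Through Estimator can be sketched as follows; the names here are hypothetical, and the real update logic lives in `LLMTrainer`. It reuses `quantizeWeightsTernary` from the weight-quantization sketch above:

```kotlin
// STE sketch: quantize on the forward pass, but treat quantization as
// the identity on the backward pass. Names are hypothetical.
class SteWeight(val latent: FloatArray) {
    fun forward(): IntArray {
        // The forward pass sees only the ternary weights.
        val (ternary, _) = quantizeWeightsTernary(latent)
        return ternary
    }

    fun applyGradient(gradWrtQuantized: FloatArray, learningRate: Float) {
        // STE: d(quantize)/d(latent) ≈ 1, so the gradient computed for the
        // quantized weights updates the full-precision latent weights directly.
        for (i in latent.indices) {
            latent[i] -= learningRate * gradWrtQuantized[i]
        }
    }
}
```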
## References

- BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
- BitNet b1.58 Replication
## License
MIT License - see LICENSE file for details.