NeuralBit - BitNet Language Model Framework

A Kotlin framework for training transformer-based language models with ternary weights ({-1, 0, +1}) using the BitNet approach.

Overview

NeuralBit implements BitNet-style transformers where weights are quantized to ternary values ({-1, 0, +1}). This enables:

  • Memory efficiency: ternary weights take ~1.58 bits each (log2 3) instead of 32 bits for float32
  • Computational efficiency: with weights in {-1, 0, +1}, matrix multiplications reduce to additions and subtractions
  • Energy efficiency: well suited to edge devices and embedded systems

The framework is based on Microsoft's BitNet research, using:

  • Absmean quantization of weights to ternary values
  • Absmax quantization for activations
  • RMSNorm for layer normalization
  • Pre-norm transformer architecture

Features

  • BitLinear: Linear layer with ternary weights {-1, 0, +1}
  • BitMultiHeadAttention: Multi-head attention with BitLinear projections
  • BitTransformerBlock: Pre-norm transformer block with RMSNorm
  • BitNetLM: Complete GPT-style decoder-only language model
  • Tokenizers: Character-level and word-level tokenization
  • LLMTrainer: Training loop with gradient clipping
  • Text Generation: Autoregressive generation with temperature and top-k sampling
  • ZIM File Support: Train directly from Kiwix/Wikipedia ZIM archives (no preprocessing needed)
  • GPU Acceleration: Auto-detected CUDA GPU support via DJL/PyTorch for faster training
  • Parallel Training: Multi-threaded CPU matrix operations, used as the automatic fallback when no GPU is available

Installation

Add the dependency to your build.gradle.kts:

dependencies {
    implementation("tech.lenooby09:neural-bit:0.2.0")
}

Quick Start

Training a Language Model

import tech.lenooby09.neuralbit.transformer.*

// Prepare your text data
val text = "The quick brown fox jumps over the lazy dog. ".repeat(100)

// Create a tokenizer from the text
val tokenizer = CharTokenizer.fromText(text)

// Configure the model
val config = BitNetConfig(
    vocabSize = tokenizer.vocabSize,
    embedDim = 128,
    numLayers = 4,
    numHeads = 4,
    maxSeqLen = 64
)

// Create the model
val model = BitNetLM(config)
println(model.summary())

// Create dataset for training
val dataset = TextDataset.fromText(text, tokenizer, seqLen = 32)

// Train the model
val trainer = LLMTrainer(model, learningRate = 0.001f)
for (epoch in 0 until 10) {
    val loss = trainer.trainEpoch(dataset, batchSize = 4)
    println("Epoch ${epoch + 1}: loss = $loss")
}

// Generate text
val prompt = tokenizer.encode("The quick")
val generated = model.generate(prompt, maxTokens = 50, temperature = 0.8f)
println(tokenizer.decode(generated))

Using the NeuralBit API

import tech.lenooby09.neuralbit.NeuralBit

// Create components using the NeuralBit API
val tokenizer = NeuralBit.charTokenizer(text)
val config = NeuralBit.smallConfig(tokenizer.vocabSize)
val model = NeuralBit.createModel(config)
val trainer = NeuralBit.createTrainer(model, learningRate = 0.001f)
val dataset = NeuralBit.createDataset(text, tokenizer, seqLen = 32)

Pre-built Configurations

// Tiny model (for testing)
val tiny = BitNetConfig.tiny(vocabSize = 100)  // 64 dim, 2 layers, 2 heads

// Small model
val small = BitNetConfig.small(vocabSize = 256)  // 128 dim, 4 layers, 4 heads

// Base model
val base = BitNetConfig.base(vocabSize = 32000)  // 512 dim, 6 layers, 8 heads

BitNet Configuration Options

Parameter        | Description                        | Default
---------------- | ---------------------------------- | -------------
vocabSize        | Vocabulary size                    | Required
embedDim         | Embedding dimension                | Required
numLayers        | Number of transformer layers       | Required
numHeads         | Number of attention heads          | Required
ffnHiddenDim     | FFN hidden dimension               | embedDim * 4
maxSeqLen        | Maximum sequence length            | 512
tieWeights       | Tie embedding and output weights   | true
learnedPositions | Use learned positional embeddings  | false
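
For example, the optional fields in this table can be overridden explicitly when building a config (a sketch assuming BitNetConfig exposes them as constructor arguments; the values are illustrative):

val config = BitNetConfig(
    vocabSize = 256,            // required
    embedDim = 128,             // required
    numLayers = 4,              // required
    numHeads = 4,               // required
    ffnHiddenDim = 128 * 4,     // defaults to embedDim * 4
    maxSeqLen = 256,            // defaults to 512
    tieWeights = true,          // tie embedding and output weights (default)
    learnedPositions = false    // fixed positional embeddings (default)
)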

Command-Line Interface

NeuralBit provides a CLI for training and text generation.

Building the Standalone JAR

# Build a fat JAR with all dependencies
./gradlew shadowJar

# The JAR is created at build/libs/neural-bit-1.0-SNAPSHOT-all.jar
java -jar build/libs/neural-bit-1.0-SNAPSHOT-all.jar <command> [options]

Using Gradle Tasks

# Unified CLI (recommended)
./gradlew cli --args="<command> [options]"

# Direct commands
./gradlew train --args="[options]"
./gradlew generate --args="[options]"

Training a Model

# Train with synthetic data (default)
neuralbit train --epochs 5

# Train from a text file
neuralbit train --text-file story.txt --epochs 20

# Train from a ZIM file (Wikipedia/Kiwix offline archives)
neuralbit train --zim-path wikipedia.zim --max-articles 5000 --epochs 10

# Train with custom architecture
neuralbit train --embed-dim 128 --layers 4 --heads 4 --seq-len 64

# Minimal training example
neuralbit train --text "hello world " --epochs 3 --quiet

Train Options:

Option          | Short | Description                                | Default
--------------- | ----- | ------------------------------------------ | -----------------------------
--text-file     | -f    | Path to text file for training             |
--text          | -t    | Direct text input                          |
--zim-path      | -z    | Path to ZIM file (Wikipedia/Kiwix format)  |
--max-articles  |       | Max articles to extract from ZIM           | 10000
--device        | -d    | Compute device: auto, cpu, gpu             | auto
--threads       |       | Number of CPU threads                      | CPU cores
--batch-size    | -b    | Batch size (GPU: 64+ recommended)          | 4 (auto-scaled to 64 for GPU)
--embed-dim     |       | Embedding dimension                        | 64
--layers        |       | Number of transformer layers               | 2
--heads         |       | Number of attention heads                  | 2
--seq-len       | -s    | Sequence length                            | 32
--epochs        | -e    | Number of epochs                           | 10
--lr            | -l    | Learning rate                              | 0.001
--quiet         | -q    | Suppress output                            |
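
For example, a single command combining several of the options above for a GPU run (illustrative values):

# GPU training from a ZIM archive with a larger batch size
neuralbit train --zim-path wikipedia.zim --device gpu --batch-size 64 \
    --embed-dim 128 --layers 4 --heads 4 --seq-len 64 --epochs 10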

Generating Text

# Generate with default prompt (demo mode)
neuralbit generate

# Generate with custom prompt
neuralbit generate --prompt "Once upon a time" --max-tokens 100

# Control randomness with temperature
neuralbit generate -p "The quick" -t 0.7 --max-tokens 50

# Use top-k sampling
neuralbit generate -p "Hello" -k 10 -n 30

Generate Options:

Option        | Short | Description                                        | Default
------------- | ----- | -------------------------------------------------- | ------------
--prompt      | -p    | Starting text for generation                       |
--max-tokens  | -n    | Maximum tokens to generate                         | 100
--temperature | -t    | Sampling temperature (lower = more deterministic)  | 1.0
--top-k       | -k    | Sample from top K tokens only                      | 0 (disabled)

Architecture

neural-bit/
├── core/
│   └── FloatTensor.kt    # Float tensor for computations
├── layer/
│   ├── Layer.kt          # Layer interface
│   └── BitLinear.kt      # BitNet linear layer (ternary weights)
├── activation/
│   └── RMSNorm.kt        # RMSNorm, LayerNorm for transformers
├── transformer/          # BitNet LLM components
│   ├── Embedding.kt      # Token + positional embeddings
│   ├── Attention.kt      # Multi-head attention with BitLinear
│   ├── TransformerBlock.kt # Transformer block with FFN
│   ├── BitNetLM.kt       # Complete language model
│   ├── Tokenizer.kt      # Character and word tokenizers
│   └── TextDataset.kt    # Dataset and trainer for LLMs
├── data/
│   └── Dataset.kt        # Dataset interface
├── cli/
│   └── NeuralBitCLI.kt   # Command-line interface
└── NeuralBit.kt          # Main API

How BitNet Works

Weight Quantization

BitLinear uses absmean quantization to convert weights to ternary values:

W_ternary = RoundClip(W / γ, -1, 1)

where γ = mean(|W|)

This maps each weight to {-1, 0, +1}:

  • Weights with magnitude below γ/2 round to 0
  • Larger positive weights become +1
  • Larger negative weights become -1
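
A minimal Kotlin sketch of this absmean rule (illustrative only; the framework's BitLinear applies the same idea to full weight matrices):

import kotlin.math.abs
import kotlin.math.roundToInt

// RoundClip(W / γ, -1, 1) with γ = mean(|W|)
fun ternaryQuantize(weights: FloatArray): Pair<FloatArray, Float> {
    val gamma = weights.map { abs(it) }.average().toFloat().coerceAtLeast(1e-8f)
    val ternary = FloatArray(weights.size) { i ->
        (weights[i] / gamma).roundToInt().coerceIn(-1, 1).toFloat()
    }
    return ternary to gamma  // γ is kept to rescale the layer output
}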

Activation Quantization

Activations are quantized using absmax quantization:

x_quantized = Clip(x × 127 / γ, -127, 127)

where γ = max(|x|)
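
A matching sketch of absmax quantization to the 8-bit range (again illustrative, not the actual implementation):

import kotlin.math.abs
import kotlin.math.roundToInt

// Clip(x * 127 / γ, -127, 127) with γ = max(|x|)
fun absmaxQuantize(x: FloatArray): Pair<FloatArray, Float> {
    val gamma = (x.maxOfOrNull { abs(it) } ?: 0f).coerceAtLeast(1e-8f)
    val quantized = FloatArray(x.size) { i ->
        (x[i] * 127f / gamma).roundToInt().coerceIn(-127, 127).toFloat()
    }
    return quantized to gamma  // γ is needed to dequantize after the matmul
}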

Training

During training:

  1. Full-precision weights are maintained as "latent" weights
  2. Weights are quantized for the forward pass
  3. Gradients flow through using Straight-Through Estimator (STE)
  4. Full-precision weights are updated with the optimizer
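
The Straight-Through Estimator can be illustrated with a toy one-layer example (a sketch only; the real LLMTrainer works on full tensors and batches):

import kotlin.math.abs
import kotlin.math.roundToInt

fun main() {
    // Latent full-precision weights for a toy layer y = w · x
    val latent = floatArrayOf(0.9f, -0.2f, 0.05f)
    val x = floatArrayOf(1f, 2f, 3f)
    val target = 2f
    val lr = 0.01f

    for (step in 1..100) {
        // 1-2. Quantize the latent weights for the forward pass (absmean, rescaled by γ)
        val gamma = latent.map { abs(it) }.average().toFloat().coerceAtLeast(1e-8f)
        val wq = FloatArray(latent.size) { i ->
            (latent[i] / gamma).roundToInt().coerceIn(-1, 1) * gamma
        }
        val y = wq.indices.sumOf { (wq[it] * x[it]).toDouble() }.toFloat()

        // 3-4. STE: d(wq)/d(latent) is treated as identity, so the loss gradient
        // flows straight into the latent full-precision weights
        val dLdy = 2f * (y - target)                 // loss = (y - target)^2
        for (i in latent.indices) latent[i] -= lr * dLdy * x[i]
    }
    println("trained latent weights: ${latent.toList()}")
}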

References

  • BitNet: Scaling 1-bit Transformers for Large Language Models (Wang et al., 2023), arXiv:2310.11453
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Ma et al., 2024), arXiv:2402.17764

License

MIT License - see LICENSE file for details.