NeuralBit - BitNet Language Model Framework

A Kotlin framework for training transformer-based language models with ternary weights ({-1, 0, +1}) using the BitNet approach.

Overview

NeuralBit implements BitNet-style transformers where weights are quantized to ternary values ({-1, 0, +1}). This enables:

  • Memory efficiency: ternary weights take ~1.58 bits each (log2 3) instead of 32 bits for float32
  • Computational efficiency: with weights in {-1, 0, +1}, matrix multiplications reduce to additions and subtractions
  • Energy efficiency: well suited to edge devices and embedded systems

The framework is based on Microsoft's BitNet research, using:

  • Absmean quantization of weights to ternary values
  • Absmax quantization for activations
  • RMSNorm for layer normalization
  • Pre-norm transformer architecture

Features

  • BitLinear: Linear layer with ternary weights {-1, 0, +1}
  • BitMultiHeadAttention: Multi-head attention with BitLinear projections
  • BitTransformerBlock: Pre-norm transformer block with RMSNorm
  • BitNetLM: Complete GPT-style decoder-only language model
  • Tokenizers: Character-level and word-level tokenization
  • LLMTrainer: Training loop with gradient clipping
  • Text Generation: Autoregressive generation with temperature and top-k sampling
  • ZIM File Support: Train directly from Kiwix/Wikipedia ZIM archives (no preprocessing needed)
  • GPU Acceleration: Auto-detected CUDA GPU support via DJL/PyTorch for faster training
  • Parallel Training: Multi-threaded CPU matrix operations, used as the automatic fallback when no GPU is available

Installation

Add the dependency to your build.gradle.kts:

dependencies {
    implementation("tech.lenooby09:neural-bit:0.2.0")
}

Quick Start

Training a Language Model

import tech.lenooby09.neuralbit.transformer.*

// Prepare your text data
val text = "The quick brown fox jumps over the lazy dog. ".repeat(100)

// Create a tokenizer from the text
val tokenizer = CharTokenizer.fromText(text)

// Configure the model
val config = BitNetConfig(
    vocabSize = tokenizer.vocabSize,
    embedDim = 128,
    numLayers = 4,
    numHeads = 4,
    maxSeqLen = 64
)

// Create the model
val model = BitNetLM(config)
println(model.summary())

// Create dataset for training
val dataset = TextDataset.fromText(text, tokenizer, seqLen = 32)

// Train the model
val trainer = LLMTrainer(model, learningRate = 0.001f)
for (epoch in 0 until 10) {
    val loss = trainer.trainEpoch(dataset, batchSize = 4)
    println("Epoch ${epoch + 1}: loss = $loss")
}

// Generate text
val prompt = tokenizer.encode("The quick")
val generated = model.generate(prompt, maxTokens = 50, temperature = 0.8f)
println(tokenizer.decode(generated))

Using the NeuralBit API

import tech.lenooby09.neuralbit.NeuralBit

// Create components using the NeuralBit API
val tokenizer = NeuralBit.charTokenizer(text)
val config = NeuralBit.smallConfig(tokenizer.vocabSize)
val model = NeuralBit.createModel(config)
val trainer = NeuralBit.createTrainer(model, learningRate = 0.001f)
val dataset = NeuralBit.createDataset(text, tokenizer, seqLen = 32)

Pre-built Configurations

// Tiny model (for testing)
val tiny = BitNetConfig.tiny(vocabSize = 100)  // 64 dim, 2 layers, 2 heads

// Small model
val small = BitNetConfig.small(vocabSize = 256)  // 128 dim, 4 layers, 4 heads

// Base model
val base = BitNetConfig.base(vocabSize = 32000)  // 512 dim, 6 layers, 8 heads

BitNet Configuration Options

Parameter        | Description                        | Default
---------------- | ---------------------------------- | -------------
vocabSize        | Vocabulary size                    | Required
embedDim         | Embedding dimension                | Required
numLayers        | Number of transformer layers       | Required
numHeads         | Number of attention heads          | Required
ffnHiddenDim     | FFN hidden dimension               | embedDim * 4
maxSeqLen        | Maximum sequence length            | 512
tieWeights       | Tie embedding and output weights   | true
learnedPositions | Use learned positional embeddings  | false
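
For example, the optional fields in this table can be overridden explicitly when building a config (a sketch assuming BitNetConfig exposes them as constructor arguments; the values are illustrative):

val config = BitNetConfig(
    vocabSize = 256,            // required
    embedDim = 128,             // required
    numLayers = 4,              // required
    numHeads = 4,               // required
    ffnHiddenDim = 128 * 4,     // defaults to embedDim * 4
    maxSeqLen = 256,            // defaults to 512
    tieWeights = true,          // tie embedding and output weights (default)
    learnedPositions = false    // fixed positional embeddings (default)
)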

Command-Line Interface

NeuralBit provides a CLI for training and text generation.

Building the Standalone JAR

# Build a fat JAR with all dependencies
./gradlew shadowJar

# The JAR is created at build/libs/neural-bit-1.0-SNAPSHOT-all.jar
java -jar build/libs/neural-bit-1.0-SNAPSHOT-all.jar <command> [options]

Using Gradle Tasks

# Unified CLI (recommended)
./gradlew cli --args="<command> [options]"

# Direct commands
./gradlew train --args="[options]"
./gradlew generate --args="[options]"

Training a Model

# Train with synthetic data (default)
neuralbit train --epochs 5

# Train from a text file
neuralbit train --text-file story.txt --epochs 20

# Train from a ZIM file (Wikipedia/Kiwix offline archives)
neuralbit train --zim-path wikipedia.zim --max-articles 5000 --epochs 10

# Train with custom architecture
neuralbit train --embed-dim 128 --layers 4 --heads 4 --seq-len 64

# Minimal training example
neuralbit train --text "hello world " --epochs 3 --quiet

Train Options:

Option          | Short | Description                                | Default
--------------- | ----- | ------------------------------------------ | -----------------------------
--text-file     | -f    | Path to text file for training             |
--text          | -t    | Direct text input                          |
--zim-path      | -z    | Path to ZIM file (Wikipedia/Kiwix format)  |
--max-articles  |       | Max articles to extract from ZIM           | 10000
--device        | -d    | Compute device: auto, cpu, gpu             | auto
--threads       |       | Number of CPU threads                      | CPU cores
--batch-size    | -b    | Batch size (GPU: 64+ recommended)          | 4 (auto-scaled to 64 for GPU)
--embed-dim     |       | Embedding dimension                        | 64
--layers        |       | Number of transformer layers               | 2
--heads         |       | Number of attention heads                  | 2
--seq-len       | -s    | Sequence length                            | 32
--epochs        | -e    | Number of epochs                           | 10
--lr            | -l    | Learning rate                              | 0.001
--quiet         | -q    | Suppress output                            |
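
For example, a single command combining several of the options above for a GPU run (illustrative values):

# GPU training from a ZIM archive with a larger batch size
neuralbit train --zim-path wikipedia.zim --device gpu --batch-size 64 \
    --embed-dim 128 --layers 4 --heads 4 --seq-len 64 --epochs 10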

Generating Text

# Generate with default prompt (demo mode)
neuralbit generate

# Generate with custom prompt
neuralbit generate --prompt "Once upon a time" --max-tokens 100

# Control randomness with temperature
neuralbit generate -p "The quick" -t 0.7 --max-tokens 50

# Use top-k sampling
neuralbit generate -p "Hello" -k 10 -n 30

Generate Options:

Option        | Short | Description                                        | Default
------------- | ----- | -------------------------------------------------- | ------------
--prompt      | -p    | Starting text for generation                       |
--max-tokens  | -n    | Maximum tokens to generate                         | 100
--temperature | -t    | Sampling temperature (lower = more deterministic)  | 1.0
--top-k       | -k    | Sample from top K tokens only                      | 0 (disabled)

Architecture

neural-bit/
├── core/
│   └── FloatTensor.kt    # Float tensor for computations
├── layer/
│   ├── Layer.kt          # Layer interface
│   └── BitLinear.kt      # BitNet linear layer (ternary weights)
├── activation/
│   └── RMSNorm.kt        # RMSNorm, LayerNorm for transformers
├── transformer/          # BitNet LLM components
│   ├── Embedding.kt      # Token + positional embeddings
│   ├── Attention.kt      # Multi-head attention with BitLinear
│   ├── TransformerBlock.kt # Transformer block with FFN
│   ├── BitNetLM.kt       # Complete language model
│   ├── Tokenizer.kt      # Character and word tokenizers
│   └── TextDataset.kt    # Dataset and trainer for LLMs
├── data/
│   └── Dataset.kt        # Dataset interface
├── cli/
│   └── NeuralBitCLI.kt   # Command-line interface
└── NeuralBit.kt          # Main API

How BitNet Works

Weight Quantization

BitLinear uses absmean quantization to convert weights to ternary values:

W_ternary = RoundClip(W / γ, -1, 1)

where γ = mean(|W|)

This maps each weight to {-1, 0, +1}:

  • Weights with magnitude below γ/2 round to 0
  • Larger positive weights become +1
  • Larger negative weights become -1
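
A minimal Kotlin sketch of this absmean rule (illustrative only; the framework's BitLinear applies the same idea to full weight matrices):

import kotlin.math.abs
import kotlin.math.roundToInt

// RoundClip(W / γ, -1, 1) with γ = mean(|W|)
fun ternaryQuantize(weights: FloatArray): Pair<FloatArray, Float> {
    val gamma = weights.map { abs(it) }.average().toFloat().coerceAtLeast(1e-8f)
    val ternary = FloatArray(weights.size) { i ->
        (weights[i] / gamma).roundToInt().coerceIn(-1, 1).toFloat()
    }
    return ternary to gamma  // γ is kept to rescale the layer output
}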

Activation Quantization

Activations are quantized using absmax quantization:

x_quantized = Clip(x × 127 / γ, -127, 127)

where γ = max(|x|)
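
A matching sketch of absmax quantization to the 8-bit range (again illustrative, not the actual implementation):

import kotlin.math.abs
import kotlin.math.roundToInt

// Clip(x * 127 / γ, -127, 127) with γ = max(|x|)
fun absmaxQuantize(x: FloatArray): Pair<FloatArray, Float> {
    val gamma = (x.maxOfOrNull { abs(it) } ?: 0f).coerceAtLeast(1e-8f)
    val quantized = FloatArray(x.size) { i ->
        (x[i] * 127f / gamma).roundToInt().coerceIn(-127, 127).toFloat()
    }
    return quantized to gamma  // γ is needed to dequantize after the matmul
}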

Training

During training:

  1. Full-precision weights are maintained as "latent" weights
  2. Weights are quantized for the forward pass
  3. Gradients flow through using Straight-Through Estimator (STE)
  4. Full-precision weights are updated with the optimizer
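
The Straight-Through Estimator can be illustrated with a toy one-layer example (a sketch only; the real LLMTrainer works on full tensors and batches):

import kotlin.math.abs
import kotlin.math.roundToInt

fun main() {
    // Latent full-precision weights for a toy layer y = w · x
    val latent = floatArrayOf(0.9f, -0.2f, 0.05f)
    val x = floatArrayOf(1f, 2f, 3f)
    val target = 2f
    val lr = 0.01f

    for (step in 1..100) {
        // 1-2. Quantize the latent weights for the forward pass (absmean, rescaled by γ)
        val gamma = latent.map { abs(it) }.average().toFloat().coerceAtLeast(1e-8f)
        val wq = FloatArray(latent.size) { i ->
            (latent[i] / gamma).roundToInt().coerceIn(-1, 1) * gamma
        }
        val y = wq.indices.sumOf { (wq[it] * x[it]).toDouble() }.toFloat()

        // 3-4. STE: d(wq)/d(latent) is treated as identity, so the loss gradient
        // flows straight into the latent full-precision weights
        val dLdy = 2f * (y - target)                 // loss = (y - target)^2
        for (i in latent.indices) latent[i] -= lr * dLdy * x[i]
    }
    println("trained latent weights: ${latent.toList()}")
}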

References

  • BitNet: Scaling 1-bit Transformers for Large Language Models (Wang et al., 2023), arXiv:2310.11453
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Ma et al., 2024), arXiv:2402.17764

License

MIT License - see LICENSE file for details.