mmBERT PII Detector (Merged for Rust)

This is an mmBERT model for PII (Personally Identifiable Information) detection. The LoRA adapter has been merged into the base weights, so the model loads as a single checkpoint for Rust inference with the candle framework.

Model Details

  • Base Model: jhu-clsp/mmBERT-base
  • Task: Token classification for PII detection
  • Languages: Multilingual (1800+ languages)
  • Training: LoRA fine-tuned then merged with base model
  • Inference: Optimized for Rust candle-binding

PII Types Detected

  • Names (PERSON)
  • Email addresses (EMAIL)
  • Phone numbers (PHONE)
  • Addresses (ADDRESS)
  • Credit card numbers (CREDIT_CARD)
  • Social Security Numbers (SSN)
  • Dates (DATE)
  • And more...

Usage with Rust (candle-binding)

use candle_semantic_router::model_architectures::traditional::TraditionalModernBertTokenClassifier;

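// Load the merged model from a local directory.
let classifier = TraditionalModernBertTokenClassifier::new("path/to/model", true)?;
// Classify each token; every result is a (token, label, confidence) tuple.
// The address below is a placeholder.
let results = classifier.classify_tokens("Contact user@example.com or call 555-1234")?;
for (token, label, confidence) in results {
    // "O" marks tokens that are not PII, so only the rest are reported.
    if label != "O" {
        println!("PII: {} -> {} ({:.1}%)", token, label, confidence * 100.0);
    }
}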
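
The loop above reports one line per token, but a name or phone number usually spans several tokens. The sketch below shows one way to merge consecutive tokens into spans, as a minimal illustration and not part of candle-binding. It assumes the results can be collected into (String, String, f32) tuples, and that labels are either bare (PERSON) or BIO-prefixed (B-PERSON, I-PERSON); the model's actual label scheme should be checked in its config. PiiSpan, base_label, group_spans, and min_confidence are hypothetical names introduced here.

```rust
/// One detected PII span: entity label, joined token text, mean confidence.
/// Hypothetical helper type, not part of candle-binding.
#[derive(Debug, Clone, PartialEq)]
pub struct PiiSpan {
    pub label: String,
    pub text: String,
    pub confidence: f32,
}

/// Strip an optional BIO prefix ("B-" or "I-") from a label.
/// Assumption: the model may emit either bare or BIO-prefixed labels.
fn base_label(label: &str) -> &str {
    label
        .strip_prefix("B-")
        .or_else(|| label.strip_prefix("I-"))
        .unwrap_or(label)
}

/// Merge consecutive tokens that share a non-"O" label into spans and
/// drop spans whose mean confidence is below `min_confidence`.
pub fn group_spans(tokens: &[(String, String, f32)], min_confidence: f32) -> Vec<PiiSpan> {
    // Each group holds (label, token texts, sum of confidences).
    let mut groups: Vec<(String, Vec<String>, f32)> = Vec::new();
    let mut prev_label: Option<String> = None;

    for (token, label, conf) in tokens {
        let base = base_label(label);
        if base == "O" {
            // A non-PII token ends any open span.
            prev_label = None;
            continue;
        }
        // Extend the open span unless the label changes or a "B-" tag starts a new entity.
        let continues = prev_label.as_deref() == Some(base) && !label.starts_with("B-");
        if continues {
            if let Some(group) = groups.last_mut() {
                group.1.push(token.clone());
                group.2 += *conf;
            }
        } else {
            groups.push((base.to_string(), vec![token.clone()], *conf));
        }
        prev_label = Some(base.to_string());
    }

    groups
        .into_iter()
        .map(|(label, parts, sum)| PiiSpan {
            confidence: sum / parts.len() as f32,
            // Joining with spaces is a simplification; subword tokens may need different handling.
            text: parts.join(" "),
            label,
        })
        .filter(|span| span.confidence >= min_confidence)
        .collect()
}
```

Calling group_spans(&tokens, 0.5) on the collected results would return one entry per contiguous entity, which is usually easier to redact or log than individual tokens. The 0.5 threshold is only an illustrative value.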

Training Configuration

  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Epochs: 10
  • Batch Size: 64
  • Learning Rate: 2e-5
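
To illustrate what "merged" means for these settings, the sketch below folds a LoRA update into a base weight matrix as W + (alpha / rank) * B * A. It assumes the standard LoRA scaling factor alpha / rank, which would be 64 / 32 = 2.0 for the values listed above; whether this model's training used that exact scaling is an assumption. The Matrix alias and merge_lora function are hypothetical, and nested vectors stand in for real tensors.

```rust
/// Dense row-major matrix as nested vectors (for illustration only).
type Matrix = Vec<Vec<f32>>;

/// Fold a LoRA update into a base weight: W + (alpha / rank) * B * A.
/// Shapes: W is d_out x d_in, B is d_out x rank, A is rank x d_in.
/// Assumption: standard LoRA scaling (alpha / rank).
pub fn merge_lora(w: &Matrix, b: &Matrix, a: &Matrix, alpha: f32) -> Matrix {
    let rank = a.len();
    assert!(rank > 0, "adapter rank must be non-zero");
    assert_eq!(w.len(), b.len(), "W and B must have the same number of rows");
    assert!(
        b.iter().all(|row| row.len() == rank),
        "B must have `rank` columns"
    );
    let d_in = a[0].len();
    assert!(
        a.iter().all(|row| row.len() == d_in) && w.iter().all(|row| row.len() == d_in),
        "A and W must have the same number of columns"
    );

    let scale = alpha / rank as f32;

    w.iter()
        .zip(b.iter())
        .map(|(w_row, b_row)| {
            w_row
                .iter()
                .enumerate()
                .map(|(j, &w_ij)| {
                    // (B * A)[i][j] = sum over k of B[i][k] * A[k][j]
                    let delta: f32 = b_row
                        .iter()
                        .zip(a.iter())
                        .map(|(&b_ik, a_row)| b_ik * a_row[j])
                        .sum();
                    w_ij + scale * delta
                })
                .collect()
        })
        .collect()
}
```

After this step the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary token classifier.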

License

Apache 2.0


Maintenance

  • Repo README updated by automation (timestamp: 2026-01-30 UTC).