mmBERT PII Detector (Merged for Rust)
This is a merged mmBERT model for PII (Personally Identifiable Information) detection, optimized for Rust inference using the candle framework.
Model Details
- Base Model: jhu-clsp/mmBERT-base
- Task: Token classification for PII detection
- Languages: Multilingual (1800+ languages)
- Training: LoRA fine-tuned, then merged into the base model
- Inference: Optimized for Rust candle-binding
PII Types Detected
- Names (PERSON)
- Email addresses (EMAIL)
- Phone numbers (PHONE)
- Addresses (ADDRESS)
- Credit card numbers (CREDIT_CARD)
- Social Security Numbers (SSN)
- Dates (DATE)
- And more...
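When consuming predictions, it can help to map the string labels above onto a typed enum. The following is a minimal sketch, not part of candle-binding: `PiiKind` and `parse_tag` are hypothetical names, and the optional `B-`/`I-` prefix handling is an assumption about the tag scheme that should be checked against the model's label configuration.

```rust
/// PII categories listed above. `Other` covers labels not named in the list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PiiKind {
    Person,
    Email,
    Phone,
    Address,
    CreditCard,
    Ssn,
    Date,
    Other(String),
}

/// Parse a predicted tag into a category; returns `None` for the non-PII tag "O".
/// Assumption: tags may carry a BIO prefix such as "B-" or "I-".
pub fn parse_tag(tag: &str) -> Option<PiiKind> {
    if tag == "O" {
        return None;
    }
    // Drop the BIO prefix if present, otherwise use the tag unchanged.
    let name = tag
        .strip_prefix("B-")
        .or_else(|| tag.strip_prefix("I-"))
        .unwrap_or(tag);
    Some(match name {
        "PERSON" => PiiKind::Person,
        "EMAIL" => PiiKind::Email,
        "PHONE" => PiiKind::Phone,
        "ADDRESS" => PiiKind::Address,
        "CREDIT_CARD" => PiiKind::CreditCard,
        "SSN" => PiiKind::Ssn,
        "DATE" => PiiKind::Date,
        other => PiiKind::Other(other.to_string()),
    })
}
```

Unknown labels fall through to `PiiKind::Other`, so the sketch stays usable if the model emits types beyond those listed here.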
Usage with Rust (candle-binding)
    use candle_semantic_router::model_architectures::traditional::TraditionalModernBertTokenClassifier;

    // The `?` operator requires an enclosing function that returns a `Result`.
    let classifier = TraditionalModernBertTokenClassifier::new("path/to/model", true)?;
    let results = classifier.classify_tokens("Contact user@example.com or call 555-1234")?;

    // Each result is a (token, label, confidence) tuple; "O" marks non-PII tokens.
    for (token, label, confidence) in results {
        if label != "O" {
            println!("PII: {} -> {} ({:.1}%)", token, label, confidence * 100.0);
        }
    }
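Token-level output is often easier to use once adjacent tokens are grouped into entities. The sketch below uses only the standard library and assumes the results have been collected as `(String, String, f32)` tuples of token, label, and confidence. `PiiSpan` and `group_spans` are hypothetical helpers, not candle-binding APIs, and joining tokens with a single space is a simplification that ignores subword boundaries.

```rust
/// A contiguous run of tokens that share one PII label.
#[derive(Debug, Clone, PartialEq)]
pub struct PiiSpan {
    pub label: String,
    pub text: String,
    /// Lowest token confidence inside the span.
    pub min_confidence: f32,
}

/// Group consecutive tokens with the same non-"O" label into spans.
/// Tokens with confidence below `threshold` are skipped and end the current span.
pub fn group_spans(tokens: &[(String, String, f32)], threshold: f32) -> Vec<PiiSpan> {
    let mut spans: Vec<PiiSpan> = Vec::new();
    // True while the previous token was accepted into the last span.
    let mut open = false;
    for (token, label, confidence) in tokens {
        if label.as_str() == "O" || *confidence < threshold {
            open = false;
            continue;
        }
        let extend = open && spans.last().map_or(false, |s| s.label == *label);
        if extend {
            let last = spans.last_mut().expect("a span exists when extend is true");
            last.text.push(' ');
            last.text.push_str(token);
            last.min_confidence = last.min_confidence.min(*confidence);
        } else {
            spans.push(PiiSpan {
                label: label.clone(),
                text: token.clone(),
                min_confidence: *confidence,
            });
        }
        open = true;
    }
    spans
}
```

The threshold is a caller-chosen value. Nothing in this model card prescribes one, so it should be tuned on representative data.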
Training Configuration
- LoRA Rank: 32
- LoRA Alpha: 64
- Epochs: 10
- Batch Size: 64
- Learning Rate: 2e-5
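For context on what "merged" means with these settings: in the standard LoRA formulation, the low-rank update is folded into each adapted weight as W' = W + (alpha / rank) * B * A, so rank 32 and alpha 64 give a scaling factor of 2. The sketch below illustrates that arithmetic on plain nested vectors. It is not the tooling used to produce this model, and `lora_scale` and `merge_lora` are hypothetical names.

```rust
/// LoRA scaling factor: alpha / rank.
pub fn lora_scale(alpha: f32, rank: usize) -> f32 {
    alpha / rank as f32
}

/// Merge a LoRA update into a base weight matrix: W' = W + scale * (B x A).
/// Shapes (row-major): `w` is out x in, `b` is out x rank, `a` is rank x in.
pub fn merge_lora(
    w: &[Vec<f32>],
    b: &[Vec<f32>],
    a: &[Vec<f32>],
    scale: f32,
) -> Vec<Vec<f32>> {
    let rank = a.len();
    w.iter()
        .zip(b.iter())
        .map(|(w_row, b_row)| {
            w_row
                .iter()
                .enumerate()
                .map(|(j, &w_ij)| {
                    // Entry (i, j) of B x A is the dot product of row i of B and column j of A.
                    let delta: f32 = (0..rank).map(|k| b_row[k] * a[k][j]).sum();
                    w_ij + scale * delta
                })
                .collect()
        })
        .collect()
}
```

After merging, inference needs only the single matrix W', which is why a merged checkpoint can be loaded without adapter-aware code.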
License
Apache 2.0
Maintenance
- Repo README updated by automation (timestamp: 2026-01-30 UTC).