Standard RoBERTa models excel at modeling context but often lack explicit knowledge of language rules. The World Atlas of Language Structures (WALS), a database of structural features of the world's languages, is one source of such explicit knowledge.

Could we train RoBERTa to output zip-compatible representations of WALS features? That would be a form of neural compression, in effect a variational autoencoder for typology. The phrase "136zip best" might then refer to the optimal compression rate: the point that best trades information loss against model size.
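
To make the idea of a "compression rate" concrete before any neural model is involved, it may help to look at a plain lossless baseline. The following is a minimal sketch, assuming typological features are stored as small integer codes. The toy feature vectors and the helper names (`pack`, `compression_ratio`, `roundtrip`) are hypothetical and are not real WALS data. Python's `zlib` (DEFLATE, the method commonly used in zip archives) stands in for the learned compressor speculated about above.

```python
import zlib

# Hypothetical toy data: each language is a list of categorical feature
# codes (small integers). These values are invented for illustration and
# are NOT taken from WALS.
TOY_FEATURES = {
    "lang_a": [1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2] * 8,
    "lang_b": [3, 1, 4, 1, 5, 2, 6, 5, 3, 5, 1, 2] * 8,
}


def pack(values):
    """Pack feature codes (each in 0..255) into one byte per feature."""
    return bytes(values)


def compression_ratio(values, level=9):
    """Return compressed size divided by raw size (lower means smaller)."""
    raw = pack(values)
    return len(zlib.compress(raw, level)) / len(raw)


def roundtrip(values):
    """Compress, then decompress; DEFLATE is lossless, so nothing is lost."""
    return list(zlib.decompress(zlib.compress(pack(values))))


if __name__ == "__main__":
    for name, values in TOY_FEATURES.items():
        print(name, round(compression_ratio(values), 3))
```

A lossless baseline like this has zero information loss by construction, so it only marks one end of the trade-off. A learned, lossy encoder would have to be judged on both its ratio and its reconstruction error.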