Training a 2-bit SDR transformer: the SDR learns the mapping of 'Internet' to 'Usage'.
Prompt: [The internet is]
Generated: [ available to millions by making such strong connections. The network encompasses the function]
Step 122550 | LR: 0.000045 | Loss: 4.1755 | PPL: 65.1 | Grad Norm: 1.408 | SDR Active Bits: 25.8 | Attn Entropy: 2.179
Step 122600 | LR: 0.000045 | Loss: 4.9385 | PPL: 139.6 | Grad Norm: 0.990 | SDR Active Bits: 26.9 | Attn Entropy: 2.148
Step 122650 | LR: 0.000045 | Loss: 3.9783 | PPL: 53.4 | Grad Norm: 1.298 | SDR Active Bits: 23.7 | Attn Entropy: 2.450
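The PPL column agrees with perplexity = exp(Loss), assuming Loss is the mean per-token cross-entropy in nats. For example, exp(4.1755) ≈ 65.07, which matches the logged 65.1. Below is a minimal sketch that parses the three log lines above and checks this relationship. It assumes the " | "-separated "Key: value" format shown here. The helper names `parse_log_line` and `check_perplexity` are hypothetical and are not part of the original training code.

```python
import math

# The three log lines from the run above, copied verbatim.
LOG_LINES = [
    "Step 122550 | LR: 0.000045 | Loss: 4.1755 | PPL: 65.1 | Grad Norm: 1.408 | SDR Active Bits: 25.8 | Attn Entropy: 2.179",
    "Step 122600 | LR: 0.000045 | Loss: 4.9385 | PPL: 139.6 | Grad Norm: 0.990 | SDR Active Bits: 26.9 | Attn Entropy: 2.148",
    "Step 122650 | LR: 0.000045 | Loss: 3.9783 | PPL: 53.4 | Grad Norm: 1.298 | SDR Active Bits: 23.7 | Attn Entropy: 2.450",
]


def parse_log_line(line: str) -> dict:
    """Hypothetical helper: turn 'Step N | Key: value | ...' into a dict of floats."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if part.startswith("Step "):
            # The step field uses a space separator instead of a colon.
            fields["Step"] = float(part.split()[1])
        else:
            key, value = part.split(":", 1)
            fields[key.strip()] = float(value)
    return fields


def check_perplexity(fields: dict, tol: float = 0.06) -> tuple:
    """Hypothetical helper: compare exp(Loss) with the logged PPL.

    PPL is logged to one decimal place, so a rounding gap of up to ~0.05
    is expected. This assumes Loss is mean cross-entropy in nats.
    """
    expected = math.exp(fields["Loss"])
    return expected, abs(expected - fields["PPL"]) <= tol


for line in LOG_LINES:
    f = parse_log_line(line)
    expected, ok = check_perplexity(f)
    print(f"step {int(f['Step'])}: exp(loss) = {expected:.2f}, logged PPL = {f['PPL']}, match = {ok}")
```

If a future log line fails this check, the logged loss is likely not a mean natural-log cross-entropy. For example, it might be summed over tokens or computed in bits, which would require 2**Loss instead.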