Welcome to Pico LM, a research initiative dedicated to demystifying language model learning.
We create two complementary frameworks (pico-train and pico-analyze) for training and analyzing small to mid-scale language models (1M–1B parameters). Our mission is to provide a transparent, research-oriented workflow that illuminates how these models learn.
For full documentation and code, visit our two main repositories:
This HuggingFace organization hosts our pre-trained models and datasets, while the GitHub repository provides the code to train and analyze your own model suites from scratch.
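As a quick sanity check, a checkpoint from this organization can be pulled with the standard `transformers` Auto classes. This is a minimal sketch only: the repository id below is a placeholder, and `trust_remote_code=True` is included in case the model class ships as custom code; see the individual model cards for the exact loading instructions.

```python
# Minimal sketch: load a pre-trained Pico decoder from the Hub.
# "pico-lm/pico-decoder-medium" is a hypothetical repository id -- substitute
# one of the models actually listed in this organization.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pico-lm/pico-decoder-medium"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Language models learn by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```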
All code and artifacts are released under the permissive Apache 2.0 license.
Pro Tip: To learn more about these libraries and explore detailed tutorials, visit our official website, picolm.io, and get fully acquainted with the Pico ecosystem.
Our complete suite of models, ranging from 11M to 570M parameters, all trained with Pico:
🚧 Disclaimer: These models are still under construction. The models released in this repository have been trained for 125,000 steps (corresponding to ~250B tokens); training will conclude at 200,000 steps.
🚧 Coming Soon: pico-decoder-xl (1B+ parameters). Watch this space or star our GitHub repository for updates!
All models are trained on the pretokenized-dolma dataset. They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.
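For reference, here is a minimal sketch of streaming that training corpus from the Hub with the `datasets` library. The dataset id `pico-lm/pretokenized-dolma` and the `input_ids` column name are assumptions inferred from the dataset's name; check the dataset card for the actual schema.

```python
# Sketch: stream the pre-tokenized training corpus from the Hub.
from datasets import load_dataset

# Assumed dataset id; streaming avoids downloading the full corpus.
dataset = load_dataset("pico-lm/pretokenized-dolma", split="train", streaming=True)

example = next(iter(dataset))
print(list(example.keys()))        # inspect the available columns
print(len(example["input_ids"]))   # assumed column of pre-tokenized ids
```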
In each model repository, we version-control a checkpoint every 1,000 steps; each checkpoint contains:
We visualize the learning process in our Weights & Biases (Wandb) dashboard.
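Because checkpoints are version-controlled inside each model repository, intermediate training states can be loaded by revision. The sketch below assumes the checkpoints are exposed as branches named by step count (e.g. `step-100000`); the actual naming scheme may differ, so consult a model repository's branch list for the real revision names.

```python
# Sketch: load an intermediate checkpoint to study learning dynamics over time.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "pico-lm/pico-decoder-medium",   # hypothetical repository id
    revision="step-100000",          # assumed checkpoint branch name
    trust_remote_code=True,
)
```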
Model Details:
| Aspect | Details |
|---|---|
| Architecture | Llama-style transformer (decoder-only)<br>RMSNorm normalization<br>RoPE (Rotary Positional Embeddings)<br>Multi-head attention with KV-cache<br>SwiGLU activation function |
| Sequence Length | 2048 |
| Batch Size | 1024 |
| Optimizer | AdamW |
| Learning Rate | 3e-4 (one-cycle warmup) |
| Gradient Clipping | 1.0 |
| Precision | Mixed precision training |
| Vocabulary Size | 50,280 |
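As an illustration of the optimization recipe in the table above, here is a hedged PyTorch sketch using a stand-in model. pico-train's actual training loop is wired up differently, and PyTorch's `OneCycleLR` is only one way to realize a one-cycle warmup schedule, so treat this purely as an illustration of the listed hyperparameters.

```python
# Illustrative sketch of the table's optimization settings in plain PyTorch.
import torch

# Stand-in for the actual Llama-style decoder.
model = torch.nn.Linear(768, 50280)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, total_steps=200_000  # 3e-4 peak LR, one-cycle schedule
)

def training_step(batch: torch.Tensor) -> None:
    optimizer.zero_grad()
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):  # mixed precision
        loss = model(batch).float().mean()                          # placeholder loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)         # gradient clipping at 1.0
    optimizer.step()
    scheduler.step()

training_step(torch.randn(1024, 768))  # batch size 1024, as in the table
```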
All datasets are tokenized using the OLMo tokenizer.
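To tokenize your own text the same way, the OLMo tokenizer can be loaded through `transformers`. The repository id below is an assumption (one publicly available copy of the tokenizer), not necessarily the exact source used by pico-train.

```python
# Sketch: load the OLMo tokenizer from an assumed public repository.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")  # assumed repo id

# The vocabulary should roughly match the 50,280 entries listed above.
print(len(tokenizer))
print(tokenizer("Pico makes learning dynamics transparent.")["input_ids"])
```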
If you use Pico in academic or professional work, please cite it:
@software{pico2025,
author = {Diehl Martinez, Richard},
title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
  year = {2025},
url = {https://github.com/pico-lm}
}
Thanks for checking out Pico!
Star our GitHub repositories or join our community discussions to stay updated. If you find a bug or have questions, open an issue; contributions are welcome!