Generative Pretrained Transformer

Mihai-Cristian Farcaș / 11 February 2025

GPT-like model implementation built from scratch using PyTorch.


My PyTorch implementation of a GPT-like language model with text preprocessing utilities

Overview

This project implements a transformer-based language model similar to GPT, designed for character-level text generation. It includes utilities for vocabulary generation and dataset splitting.

In this example, I trained and tested it on the fabulous book The Brothers Karamazov, downloaded from Project Gutenberg. Feel free to swap in a different text file, or even try training it on an established dataset (OpenWebText, for example), though on larger datasets vocab.py and split.py might not work properly.

Features

  • Character-level language modeling
  • Multi-head self-attention mechanism
  • Memory-efficient data loading using memory mapping (see the sketch after this list)
  • Text preprocessing utilities
  • Configurable model architecture
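
To give an idea of what the memory-mapped loading looks like, here is a quick sketch; the file name train_split.txt, the hyperparameter values, and the use of raw bytes as token ids are placeholders of mine, not necessarily what GPT.ipynb does:

import random
import numpy as np
import torch

block_size = 128   # context length (illustrative value)
batch_size = 32

def get_batch(path="train_split.txt"):
    # np.memmap keeps the file on disk; only the slices we index are read into RAM.
    data = np.memmap(path, dtype=np.uint8, mode="r")
    ix = [random.randint(0, len(data) - block_size - 2) for _ in range(batch_size)]
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y

Each call returns a batch of inputs x and the same sequences shifted by one position as the targets y.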

Requirements

  • Python 3.9+
  • PyTorch
  • Jupyter Notebooks
  • CUDA (optional, for GPU acceleration on Windows)
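
To cover these requirements quickly, something along these lines should work (standard PyPI package names; for a CUDA-enabled PyTorch build, follow the official install selector on pytorch.org instead of the plain pip install):

python3 -m venv venv
source venv/bin/activate
pip install torch notebook ipykernel

On Windows, activate the environment with venv\Scripts\activate instead.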

Project Structure

  • vocab.py - Generates vocabulary from input text
  • split.py - Splits text data into training and validation sets
  • GPT.ipynb - Main model implementation and training

Usage

1. Initialization (details on my GitHub)

2. Prepare Your Data

  • First, add your desired data file and generate the vocabulary from your text:
python3 vocab.py
  • Then, split your data into training and validation sets:
python3 split.py
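
Roughly speaking, the two scripts boil down to something like this sketch; the file names and the 90/10 split ratio here are assumptions of mine, so check the actual scripts in the repo for the exact behaviour:

# vocab.py-style step: collect every distinct character in the corpus.
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()
vocab = sorted(set(text))
with open("vocab.txt", "w", encoding="utf-8") as f:
    f.write("".join(vocab))

# split.py-style step: carve the text into training and validation files.
n = int(0.9 * len(text))
with open("train_split.txt", "w", encoding="utf-8") as f:
    f.write(text[:n])
with open("val_split.txt", "w", encoding="utf-8") as f:
    f.write(text[n:])

After running both, you should end up with a vocabulary file plus separate training and validation text files.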

3. Train the Model

  • Install a new kernel to use in your Jupyter Notebook:
python3 -m ipykernel install --user --name=venv --display-name "GPTKernel"
  • Run Jupyter Notebook:
jupyter notebook
  • Open GPT.ipynb.

  • Select GPTKernel and run the cells sequentially.

The notebook contains:

  • Model architecture implementation
  • Training loop
  • Text generation functionality
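
For the last two items, here is roughly the shape they take. This sketch assumes a model whose forward pass returns (logits, loss) and accepts an optional targets argument, like the architecture sketch further down; the hyperparameter values are placeholders:

import torch

def train(model, get_batch, max_iters=3000, lr=3e-4, eval_interval=250, device="cpu"):
    # Minimal loop: sample a batch, compute the loss, backpropagate, update.
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(max_iters):
        xb, yb = get_batch()
        xb, yb = xb.to(device), yb.to(device)
        _, loss = model(xb, yb)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % eval_interval == 0:
            print(f"step {step}: train loss {loss.item():.4f}")

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=128):
    # Autoregressive sampling: feed the running sequence back in, one character at a time.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits, _ = model(idx_cond)
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx

Pass device="cuda" to train if you have a GPU available.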

Model Architecture

The model implements a transformer architecture with:

  • Multi-head self-attention
  • Position embeddings
  • Layer normalization
  • Feed-forward networks
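
To make that list concrete, here is a minimal sketch of how those pieces can fit together in PyTorch. The layer sizes, dropout, and the use of nn.MultiheadAttention are placeholder choices of mine, not necessarily what GPT.ipynb does, so check the notebook for the real configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    # One pre-norm transformer block: multi-head self-attention, then a feed-forward network.
    def __init__(self, n_embd=384, n_head=6, block_size=128, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )
        # Causal mask: True above the diagonal means "may not attend to future positions".
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
        x = x + attn_out                      # residual connection around attention
        x = x + self.ffwd(self.ln2(x))        # residual connection around the feed-forward net
        return x

class MiniGPT(nn.Module):
    # Token + position embeddings, a stack of blocks, a final layer norm, and a vocabulary head.
    def __init__(self, vocab_size, n_embd=384, n_head=6, n_layer=6, block_size=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)     # (B, T, n_embd)
        x = self.ln_f(self.blocks(x))
        logits = self.lm_head(x)                      # (B, T, vocab_size)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss

A model built this way (for example MiniGPT(vocab_size=len(vocab))) is what the training and generation sketches above expect.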

You can find the source code on my GitHub.

Thanks for reading! Cheers! 🍻

P.S. If you enjoyed this post, consider giving me some feedback and subscribing to my newsletter for more insights and updates. You can do so from my contact page. 🚀