This paper describes a bus mastering implementation of the PCI Express protocol using a Xilinx FPGA. While the theoretical peak performance of PCI Express is quite high, attaining that performance is a complex endeavor on top of an already complex protocol. The implementation is described and its performance is analyzed. Source code is offered for free download via the web.