Home page for the PyPDF2 project

Download Latest PyPDF2 from PyPI

PyPDF2's origin

In 2005, Mathieu Fenniak launched pyPdf "as a PDF toolkit ..." focused on

document manipulation: by-page splitting, concatenation, and merging;
document introspection;
page cropping; and
document encryption and decryption.
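
The by-page splitting named in the list above can be sketched as follows. This is only an illustration, not code from either library's documentation: it assumes a PyPDF2 release that ships the PdfReader and PdfWriter classes (earlier releases spelled them PdfFileReader and PdfFileWriter, with camelCase methods), and the helper names page_chunks and split_pdf are hypothetical.

```python
import io


def page_chunks(total_pages, pages_per_file):
    """Return (start, stop) index pairs that together cover range(total_pages)."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_pdf(source, pages_per_file):
    """Split a PDF (path or binary stream) into a list of in-memory PDFs (bytes)."""
    # Imported here so page_chunks stays usable without PyPDF2 installed.
    # Assumption: a PyPDF2 release that provides PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(source)
    parts = []
    for start, stop in page_chunks(len(reader.pages), pages_per_file):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        buffer = io.BytesIO()
        writer.write(buffer)
        parts.append(buffer.getvalue())
    return parts
```

Calling split_pdf("report.pdf", 2) on a five-page file (the file name is a placeholder) would yield three PDFs holding pages 1-2, 3-4, and 5.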

At the end of 2011, after consultation with Mathieu and others, Phaseit sponsored PyPDF2 as a fork of pyPdf on GitHub. The initial impetus was to handle a wider range of input PDF instances; Phaseit's commercial work often encounters PDF instances "in the wild" that it needs to manage (mostly concatenate and paginate), but that deviate so much from PDF standards that pyPdf can't read them. PyPDF2 reads a considerably wider range of real-world PDF instances.

Neither pyPdf nor PyPDF2 aims to be universal, that is, to provide all possible PDF-related functionality; here are descriptions of other PDF libraries, including Python-based ones. Note that the similar-appearing pyfpdf of Mariano Reingart is most comparable to ReportLab, in that both ReportLab and pyfpdf emphasize document generation. Interestingly enough, pyfpdf builds in a basic HTML->PDF converter; while PyPDF2 has no knowledge of HTML, HTML->PDF conversion is another interest of Phaseit's.

So what is PyPDF2 truly about? Think about the popular pdftk [document] for a moment. PyPDF2 does what pdftk does, but within your current Python process, and it handles a wider range of variant PDF formats [explain]. PyPDF2 has its own FAQ to answer other questions that have arisen.
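
As a rough point of comparison, the concatenation that pdftk performs with "pdftk a.pdf b.pdf cat output out.pdf" might look like this inside a Python process. It is a sketch under assumptions, not the project's documented recipe: it presumes a PyPDF2 release that provides PdfReader and PdfWriter, and the function names concatenate and concatenate_files are hypothetical.

```python
def concatenate(readers, writer):
    """Append every page of every reader to writer, in the order given."""
    for reader in readers:
        for page in reader.pages:
            writer.add_page(page)
    return writer


def concatenate_files(input_paths, output_path):
    """Concatenate the PDFs at input_paths into one file at output_path."""
    # Imported lazily; assumes a PyPDF2 release with PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    readers = [PdfReader(path) for path in input_paths]
    writer = concatenate(readers, PdfWriter())
    with open(output_path, "wb") as handle:
        writer.write(handle)
```

Keeping the page-copying loop separate from the file handling means the same loop works on readers built from in-memory streams, which is the practical advantage of staying in-process rather than shelling out to pdftk.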

[Compare pyPdf and PyPDF2. Spelling. Technical details: read; new functionality. Explain the other fork, and pyfpdf.]

Plans for PyPDF2 include:

[much] more complete documentation--probably just a list of the entry points which PyPDF2 adds to pyPdf;
[explain status of port to Python3;]
PyPackaging [emphasize use with Zope/Plone [more references]];
[licensing [update]];
[re-do donation button for JS-less contexts];
[clean-up HTML5];
[... other ...].

Current plans for PyPDF2 do not include the ability to:

merge annotations [but explain we're thinking about it seriously];
...

Let us know if you think these targets should change.

[README, ...]

The Reddit /r/python crowd chatted obliquely and briefly about PyPDF2 in March 2012.

Online demonstrations of PyPDF2

Phaseit supports a couple of online PyPDF2 tools, including a live Web application which diagnoses PDF instances. The FAQ for the latter explains more.

Documentation

pyPdf's documentation applies; that is, PyPDF2 aims to be a strict superset of pyPdf, doing everything that pyPdf's documentation claims for pyPdf, but also a bit more.

What are the differences between the two libraries? There are several cases where pyPdf raises an (undocumented) exception and PyPDF2 either provides a more descriptive exception or simply returns a correct result.

PyPDF2 also extends pyPdf's application programming interface in quite a few regards [strict/non-strict] [none of which we've yet documented ...]. However, in July 2012, Mike Driscoll helpfully explained parts of what we're doing.

Remarks on GitHub

In 2011, when Phaseit committed to long-term support of PyPDF2 as an open-source project, GitHub seemed the most advantageous host because of ...

You can retrieve PyPDF2 sources even without GitHub familiarity or a GitHub account. If you have a command-line git client, you can simply run git clone git://github.com/knowah/PyPDF2.git, and copies of all the PyPDF2 sources will show up in a PyPDF2 subdirectory (folder) of your current working directory. On Debian-derived hosts, installation of the client should be as easy as sudo apt-get install git-core. [Explain wget, Win* clients, ...]

Contact the PyPDF2 maintainers

We welcome questions or comments about PyPDF2 through e-mail to PyPDF2@Phaseit.net. Announcements about PyPDF2 occasionally appear by way of Twitter.

Behind the scenes, one of PyPDF2's strengths is its extensive collection of test scripts (initiated by Mathieu with pyPdf). These are "behind the scenes" because many of the test instances are PDF files supplied by correspondents. If PyPDF2 fails you in some way, please understand that you can send us all artifacts--files, scripts, and so on--necessary to reproduce the symptom, with confidence that we'll maintain your confidentiality. That's how we are.