In 2005, Mathieu Fenniak launched pyPdf "as a PDF toolkit ..." focused on
Neither pyPdf nor PyPDF2 aims to be universal, that is, to provide all possible PDF-related functionality; other PDF libraries, including Python-based ones, are described elsewhere. Note that the similarly named pyfpdf of Mariano Reingart is most comparable to ReportLab, in that both emphasize document generation. pyfpdf also builds in a basic HTML->PDF converter; PyPDF2 has no knowledge of HTML, although HTML->PDF conversion is another interest of Phaseit's.
So what is PyPDF2 truly about? Think about the popular [document] pdftk for a moment. PyPDF2 does what pdftk does, but it does so within your current Python process, and it handles a wider range of variant PDF formats [explain]. PyPDF2 has its own FAQ to answer other questions that have arisen.
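As a rough illustration of pdftk-style work done inside a Python process, the sketch below concatenates several PDF files. It assumes the pyPdf-style class and method names (PdfFileReader, PdfFileWriter, getNumPages, getPage, addPage, write) found in PyPDF2 1.x releases; merge_pdfs is a hypothetical helper name, not part of the library.

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDF files; return the page count."""
    # Imported lazily so this module still loads where PyPDF2 is absent.
    # Assumption: the pyPdf-style names of PyPDF2 1.x are available.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    open_files = []
    total_pages = 0
    try:
        for path in input_paths:
            # Keep each source open until the output has been written,
            # because page contents are read when the writer needs them.
            handle = open(path, "rb")
            open_files.append(handle)
            reader = PdfFileReader(handle)
            for index in range(reader.getNumPages()):
                writer.addPage(reader.getPage(index))
                total_pages += 1
        with open(output_path, "wb") as out:
            writer.write(out)
    finally:
        for handle in open_files:
            handle.close()
    return total_pages
```

With hypothetical file names, merge_pdfs(["cover.pdf", "body.pdf"], "book.pdf") would play roughly the role of pdftk cover.pdf body.pdf cat output book.pdf, without starting a separate process.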
[Compare pyPdf and PyPDF2. Spelling. Technical details: read; new functionality. Explain the other fork, and pyfpdf.]
Plans for PyPDF2 include:
[README, ...]
The Reddit /r/python crowd chatted obliquely and briefly about PyPDF2 in March 2012.
Phaseit supports a couple of online PyPDF2 tools, including a live Web application which diagnoses PDF instances. The FAQ for the latter explains more.
pyPdf's documentation applies; that is, PyPDF2 aims to be a strict superset of pyPdf, doing everything that pyPdf's documentation claims for pyPdf, but also a bit more.
What are the differences between the two libraries? In several cases where pyPdf raises an (undocumented) exception, PyPDF2 either raises a more descriptive exception or simply returns a correct result.
PyPDF2 also extends pyPdf's application programming interface in quite a few regards [strict/non-strict] [none of which we've yet documented ...]. However, in July 2012, Mike Driscoll helpfully explained parts of what we're doing.
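To make the strict/non-strict distinction concrete, here is a minimal sketch of reading a possibly malformed file. It rests on two assumptions drawn from PyPDF2 1.x releases rather than from this page: that PdfFileReader accepts a strict keyword argument, and that parse failures surface as PyPDF2.utils.PdfReadError. The helper name count_pages is hypothetical.

```python
def count_pages(path, strict=False):
    """Return the page count of a PDF, or None if it cannot be parsed."""
    # Lazy imports; both names are assumptions based on PyPDF2 1.x.
    from PyPDF2 import PdfFileReader
    from PyPDF2.utils import PdfReadError

    with open(path, "rb") as handle:
        try:
            # strict=False asks the reader to tolerate recoverable
            # deviations from the PDF format instead of failing on them.
            reader = PdfFileReader(handle, strict=strict)
            return reader.getNumPages()
        except PdfReadError:
            # The file was too damaged to read, even leniently.
            return None
```

Passing strict=True instead would be the choice when a caller prefers an early failure on any malformed input over a best-effort result.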
In 2011, when Phaseit committed to long-term support of PyPDF2 as an open-source project, GitHub seemed the most advantageous host because of ...
You can retrieve PyPDF2 sources even without GitHub familiarity or a GitHub account. If you have a command-line git client, simply run git clone git://github.com/knowah/PyPDF2.git, and copies of all the PyPDF2 sources will show up in a PyPDF2 directory (folder) inside your current working directory. On Debian-derived hosts, installing the client should be as easy as sudo apt-get install git-core. [Explain wget, Win* clients, ...]
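The same retrieval can be scripted from Python. The sketch below is only an illustration: it wraps the git clone command quoted above using the standard subprocess module, the repository URL is copied verbatim from this page and may no longer be served at that address, and clone_command and fetch_sources are hypothetical helper names.

```python
import subprocess

# URL exactly as quoted in the text; substitute another if it has moved.
REPO_URL = "git://github.com/knowah/PyPDF2.git"


def clone_command(url=REPO_URL, destination=None):
    """Build the argument list for `git clone`, without running it."""
    command = ["git", "clone", url]
    if destination is not None:
        command.append(destination)
    return command


def fetch_sources(url=REPO_URL, destination=None):
    """Run `git clone`; requires a command-line git client on the PATH."""
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(clone_command(url, destination), check=True)
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact command before anything touches the network.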
We welcome questions or comments about PyPDF2 through e-mail to PyPDF2@Phaseit.net. Announcements about PyPDF2 occasionally appear by way of Twitter.
Behind the scenes, one of PyPDF2's strengths is its extensive collection of test scripts (initiated by Mathieu with pyPdf). These are "behind the scenes" because many of the test instances are PDF files supplied by correspondents. If PyPDF2 fails you in some way, please understand that you can send us all artifacts--files, scripts, and so on--necessary to reproduce the symptom, with confidence that we'll maintain your confidentiality. That's how we are.