This document describes how digital signatures are represented in a PDF document and what signature-related features the PDF language supports. Adobe® Reader® and Acrobat® have implemented all of PDF’s features and therefore provide comprehensive support for the authentication of digital data based on public key infrastructure (PKI) technologies. OpenCertSign provides an API for retrieving information about the signatures in a PDF document, such as revocation data, validity, the number of revisions, the portion of the document each signature covers, and more.
Digital signatures can be used for many types of documents where traditional pen-and-ink signatures were used in the past. However, the mere existence of a digital signature is not adequate assurance that a document is what it appears to be. Moreover, government and enterprise settings often need to impose additional constraints on their signature workflows, such as restricting user choices and document behavior during and after signing.
For these reasons, the PDF language provides mechanisms for two broad categories of tasks:
- Fully trusting an electronic document by enabling verification that the signed document has not been altered and that it was signed by someone the recipient trusts.
- Creating and controlling feature-rich and secure digital signature workflows.
Representing a signature in a PDF file
In a PDF, signature information is contained in a signature dictionary. Objects in the dictionary are defined by the PDF Reference. The signature dictionary can reference, or be referenced by, other dictionaries, and it usually is. The entries in these dictionaries determine the nature and features of the signature, and by extension, what data can be available to any PDF viewer designed to process the signature data.
At a high level, these features can be grouped into these categories:
- Adding a digital signature to a document.
- Checking that signature for validity.
- Permissions and restrictions that control the signature workflow.
Naturally, PDF includes features which are related to these activities but are not essential to them. For example, support for adding signing reasons is tangential to signing, but valuable for many workflows.
Public key infrastructure
PDF’s digital signature capabilities are designed for compatibility with all the standards associated with mainstream public key infrastructures (PKI) deployed in enterprise and government settings. A PKI is the set of people, policies, procedures, hardware, and software used in creating, distributing, managing, revoking, and using the digital IDs that contain the public/private key pairs used when signing a PDF.
In the context of PDF signature workflows, “PKI” generally refers to the digital ID issuers, users, administrators, and any hardware or software used in those workflows. PDF viewers that implement and conform to the PDF language specification are able to interact with all of these components in a seamless and robust way.
When signing an important paper document, a person usually signs it in front of a notary public or other trusted authority after providing them satisfactory evidence of their identity. Because the notary is deemed trustworthy, you can trust the signature the notary witnesses. Using a PKI is a method of providing a similar kind of trust.
Some common PKI components directly related to providing trust include:
- Certificate authority (CA): An ultimate trust authority that sells or issues digital IDs (such as Verisign or Geotrust). The CA signs its own certificate (self-signs) and its certificate is typically the “root” certificate at the top of the certificate chain.
- Intermediate certificates (ICAs): A type of CA whose certificate resides in the certificate chain between the end entity and root certificates. The certificate is not self-signed, and the ICA often provides services such as policies, timestamping, revocation lists, etc.
- End entity certificate (EE): The signer’s certificate and the last element of a signing chain. By definition, an end entity certificate does not contain the basic constraint value CA.
- Digital ID: An electronic representation of data based on the ITU-T X.509 v3 standard, associated with a person or entity. It is stored in a password-protected file on a computer or network, a USB token, a smart card, etc. A digital ID contains a public key certificate, a private key, and other data.
- Public key certificate: A file that contains the numeric public key portion of a public/private key pair along with the associated extensions and attributes used to define the certificate’s owner, validity period, and usage.
- Private key: The secret key in a PKI system, used to decrypt incoming messages and sign outgoing ones. A private key is always paired with its public key when the key pair is generated.
While the digital ID and its issuing entities are central to any PKI, the PKI also includes many other enterprise-owned and 3rd party items. A PKI administrator will usually manage the creation and distribution of digital IDs, LDAP servers, timestamp servers, revocation lists, and other items. The PDF language supports all the data needed to interface with those components.
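The relationship between the end entity, intermediate, and root certificates can be sketched in a few lines of Python. This is a minimal illustration only: the Cert class, its fields, and the example names are hypothetical, and the code follows issuer names without verifying any cryptographic signature, which a real validator must do.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    # Hypothetical, simplified stand-in for an X.509 certificate.
    subject: str
    issuer: str
    is_ca: bool  # stands in for the basic constraints CA value


def build_chain(end_entity: Cert, known: Dict[str, Cert]) -> List[Cert]:
    """Follow issuer names from the end entity up to a self-signed root."""
    chain = [end_entity]
    current = end_entity
    while current.subject != current.issuer:  # self-signed means root
        parent = known.get(current.issuer)
        if parent is None:
            raise ValueError(f"issuer not found: {current.issuer}")
        if not parent.is_ca:
            raise ValueError(f"issuer is not a CA: {parent.subject}")
        if parent in chain:
            raise ValueError("loop in certificate chain")
        chain.append(parent)
        current = parent
    return chain


# Example data (all names are made up for illustration).
root = Cert("Example Root CA", "Example Root CA", True)
ica = Cert("Example Issuing CA", "Example Root CA", True)
signer = Cert("Alice Signer", "Example Issuing CA", False)

known_certs = {c.subject: c for c in (root, ica)}
chain = build_chain(signer, known_certs)
print(" -> ".join(c.subject for c in chain))
```

The chain starts at the signer's end entity certificate and ends at the self-signed root, matching the order described in the list above.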
PKI, PDF, and signing
PDF includes support for signatures to be embedded in the document itself, rather than managed as separate data or added on to an existing document format. This means that the viewing application can perform certain types of modification without invalidating the signature. With other digital signature formats, the user may need either two applications to handle both the document and the signature, or would need to manage two separate files for each signed document.
Each digital signature in a PDF document is associated with a signature handler. The signature is placed in a PDF signature dictionary which contains the name of the signature handler that will be used to process that signature. The signature handler built into Adobe Acrobat leverages Public/Private Key (PPK) cryptography technologies. PPK is based on the idea that a value encrypted with a private key can only be decrypted using the corresponding public key (the reverse may also be true when encrypting documents for specific recipients, but that is outside the scope of this document). For more information on PKI, see Public key infrastructure.
When a PDF is signed, the signer’s certificate is embedded in the PDF file. The signature value may also include additional information such as a signature graphic, a time stamp, and other data that may be specific to the user, system, or application.
PDF Signing process (high-level view)
The signing process is as follows:
- A document to be signed is turned into a stream of bytes.
- The entire PDF file is cached (on disk or in memory) with a suitably-sized space left for the signature value as well as with worst-case values in the ByteRange array. ByteRange is an array of four numbers that form two pairs. The first number in each pair is the offset in the file (from the beginning, starting from 0) of the beginning of a stream of bytes to be included in the hash. The second number is the length of that stream. The two pairs define two sequences of bytes that define what is to be hashed. The actual signature value is stored in the /Contents key between the end of the first sequence and the beginning of the second one. In Figure 4, the hash is calculated for bytes 0 through 839, and 960 through 1200.
- Once the location of the signature value is known in terms of offsets in the file, the ByteRange array is overwritten using the correct values. Because the byte offsets must not change, extra bytes following the new array statement are overwritten with zeros.
- The hash is computed over the bytes specified by the real ByteRange value, using a hash algorithm such as SHA-512 (this will be configurable in OCS in a future release). Acrobat always computes the hash for a document signature over the entire PDF file, starting from byte 0 and ending with the last byte in the physical file, but excluding the signature value bytes.
- The hash value is encrypted with the signer’s private key and a hex-encoded PKCS#7 signature object is generated.
- The signature object is cached (in memory or in the file on disk), overwriting the placeholder /Contents value. Any space not used for the signature object is overwritten with zeros.
- The PDF file is reloaded to ensure that the in-memory and on-disk versions are identical.
- The hash is signed using the private key from the signer of the document.
- The signed hash is integrated in the cached hex-encoded PKCS#7 document by overwriting the prepared space in the placeholder /Contents value.
- The document is saved. It now contains a signature and can be validated in Adobe Reader.
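The placeholder and ByteRange mechanics in the steps above can be sketched with Python's standard library. This is an illustration under loud assumptions: the "file" is a made-up byte string rather than a real PDF, the placeholder is only 32 hex digits long, and the "signature" is a dummy hex string instead of a PKCS#7 object.

```python
import hashlib

PLACEHOLDER_LEN = 32  # hex digits reserved for the signature (tiny, for illustration)

# A fake "file": bytes before the signature value, the placeholder, bytes after.
head = b"%PDF-1.7 ... /Contents "
placeholder = b"<" + b"0" * PLACEHOLDER_LEN + b">"
tail = b" ... %%EOF\n"
pdf = bytearray(head + placeholder + tail)

# ByteRange = [offset1, length1, offset2, length2]; the gap between the two
# sequences is exactly the /Contents value, delimiters included.
gap_start = len(head)
gap_end = gap_start + len(placeholder)
byte_range = [0, gap_start, gap_end, len(pdf) - gap_end]


def digest_over(data: bytes, br) -> bytes:
    """Hash the two byte sequences named by ByteRange, skipping the gap."""
    h = hashlib.sha512()
    h.update(data[br[0]:br[0] + br[1]])
    h.update(data[br[2]:br[2] + br[3]])
    return h.digest()


before = digest_over(bytes(pdf), byte_range)

# Dummy stand-in for the real signature object: any hex string that fits.
fake_signature = b"deadbeef"
padded = fake_signature.ljust(PLACEHOLDER_LEN, b"0")  # unused space becomes zeros
pdf[gap_start + 1:gap_end - 1] = padded  # overwrite between "<" and ">"

after = digest_over(bytes(pdf), byte_range)
print(byte_range, before == after)
```

Because the signature value lives entirely inside the gap, writing it into the file changes neither the file length nor the digest, which is why the hash can be computed before the signature exists.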
PKI, PDF, and signature validation
Since private and public keys are merely numbers, anyone can generate a public and private key pair using any number of tools. Applications like Acrobat provide a mechanism to generate a self-signed certificate which binds a simple user-provided identity to a public key generated by the application; it is then signed using the corresponding private key. Obviously, there is nothing to prevent someone from generating a self-signed certificate with someone else’s name. Hence, an unknown self-signed certificate does not have a high level of assurance.
To solve this type of trust problem, organizations use a PKI that includes an independent authority that issues, records, and tracks digital IDs. Because PDF supports embedding the signer’s public key as part of the signature, document recipients always have it for signature validation. To validate a signature, the validator simply retrieves the signer’s certificate and compares it to their own list of trusted certificates:
- The recipient’s application generates a one-way hash of the document using the same algorithm the signer used, excluding the signature value.
- The encrypted hash value in the document is decrypted using the signer’s public key.
- The decrypted hash value is compared to the locally generated hash value.
- If they are identical, the signature is reported as known.
Whether the signature is trusted and whether it is valid are separate issues. Signature trust depends on the recipient’s application configuration. Signature status also depends on a document integrity check.
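The hash comparison at the heart of validation can be illustrated with a toy example. The sketch below uses textbook RSA with a deliberately tiny key built from two Mersenne primes and no padding. It is an assumption-laden teaching aid, not how a PDF signature handler works in practice: real signatures use large keys, a padding scheme, and a PKCS#7 container. It needs Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy textbook RSA key. Far too small and unpadded for real use;
# it only makes the arithmetic visible.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent


def digest_as_int(data: bytes) -> int:
    """SHA-256 digest, reduced mod N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer: transform the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recipient: recover the digest with the public key and compare it
    to a locally computed digest of the same bytes."""
    recovered = pow(signature, E, N)
    return recovered == digest_as_int(data)


document = b"bytes covered by ByteRange"
sig = sign(document)
print(verify(document, sig))                  # unchanged document
print(verify(document + b" tampered", sig))   # altered document
```

A match only shows that the bytes are unchanged and that the signature was made with the private key belonging to the embedded public key. Whether that key's certificate is trusted is decided separately.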
PDF language signature features
PDF is itself an open ISO standard. Digital signature support in PDF is fully described in ISO 32000, and Adobe provides tools for interacting with PDF and the Acrobat family of products’ APIs in its open SDK.
Support for alternate signature methodologies
The majority of signatures are purely mathematical, such as the public/private-key encrypted document digest. However, they may also be a biometric form of identification, such as a handwritten signature, fingerprint, or retinal scan. Signature handlers process the data and control the form of authentication according to the rules defined in the PDF ISO standard.
Support for two signature types
PDF defines two types of signatures: approval and certification. Both types are byte range signatures over all file contents. Both take a visual snapshot of the document at the time it was signed and thus provide a high level of document integrity.
The differences are as follows:
- Approval: There can be any number of approval signatures in a document.
- Certification: There can be only one certification signature and it must be the first one in a document.
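The ordering rule for the two types can be expressed as a small check. This is a sketch only: the function name and the use of plain strings for signature types are assumptions made for illustration.

```python
from typing import List


def check_signature_order(kinds: List[str]) -> bool:
    """kinds lists signature types in signing order, each either
    'certification' or 'approval' (hypothetical labels)."""
    cert_positions = [i for i, k in enumerate(kinds) if k == "certification"]
    if len(cert_positions) > 1:
        return False  # at most one certification signature
    # If present, the certification signature must come first.
    return not cert_positions or cert_positions[0] == 0


print(check_signature_order(["certification", "approval", "approval"]))
print(check_signature_order(["approval", "certification"]))
```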
PDF is designed to allow interoperability between signature handlers and conforming readers; that is, a PDF signed with handler ABC should be able to be validated with handler XYZ from a different vendor.
When present, the SubFilter entry in the signature dictionary specifies the encoding of the signature value and key information, while the Filter entry specifies the preferred handler that should be used to validate the signature. There are several defined values for the SubFilter entry, all based on public-key cryptographic standards published by RSA Security and also as part of the standards issued by the Internet Engineering Task Force (IETF) Public Key Infrastructure (PKIX) working group.
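To make the Filter and SubFilter entries concrete, the sketch below pulls them out of a hand-written signature dictionary using regular expressions. The sample dictionary, its ByteRange numbers, and the helper names are illustrative assumptions. A real reader would use a full PDF parser rather than pattern matching.

```python
import re
from typing import List, Optional

# Hand-written sample in PDF syntax; all values are illustrative only.
SAMPLE = (
    b"<< /Type /Sig /Filter /Adobe.PPKLite /SubFilter /adbe.pkcs7.detached "
    b"/ByteRange [0 1000 1200 300] /Contents <3082> /Reason (Approved) >>"
)


def read_name(raw: bytes, key: bytes) -> Optional[str]:
    """Return the name object stored under /key, or None if absent."""
    m = re.search(rb"/" + re.escape(key) + rb"\s*/([^\s/<>\[\]()]+)", raw)
    return m.group(1).decode("ascii") if m else None


def read_byte_range(raw: bytes) -> Optional[List[int]]:
    """Return the four ByteRange integers, or None if absent."""
    m = re.search(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]", raw)
    return [int(g) for g in m.groups()] if m else None


print(read_name(SAMPLE, b"Filter"))     # preferred handler
print(read_name(SAMPLE, b"SubFilter"))  # encoding of the signature value
print(read_byte_range(SAMPLE))
```

A validator from a different vendor can ignore the Filter value and still process the signature as long as it understands the SubFilter encoding, which is what makes the interoperability described above possible.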
Some documents may require more than one signature. With traditional paper documents and wet-ink signatures, adding a signature is as easy as drawing another line on the paper. In the paper world, a person signing a document would be wise to save a copy of the document as it was signed. Then if another person changes the document, the signer can easily argue that the document had been altered.
However, with PDF, any attempt to alter the document by modifying the file (such as signing it again) will invalidate the existing digital signature. This is so because the hash value calculated at verification time will not match the encrypted hash created at signing time. PDF solves this problem by supporting the ability to do incremental updates (see Incremental updates).
As long as additional signatures are not prevented by other permissions restrictions, a signer can just add another signature field to the document and sign it without invalidating the earlier signature.
The PDF file format defines an incremental update capability. Incremental updates are transparent to the person viewing the document, but allow for the detection and audit of modifications to the file. This feature of the PDF language generally, and of signed PDF files specifically, allows any PDF file to be modified by adding the modification information to the end of the file in an incremental update section.
No changes whatsoever are required to the bytes representing the earlier version of the file. This allows additional signatures to be added to a PDF file without modifying any data covered by an earlier signature.
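The append-only property can be illustrated with plain byte strings. In this sketch the "revisions" are made-up bytes rather than real PDF objects, and the helper name is hypothetical. The point is only that an update leaves the earlier bytes intact while the earlier signature's ByteRange stops short of the new end of file.

```python
# First revision: bytes before the signature value, the value, bytes after.
head = b"%PDF-1.7\n1 0 obj << /Contents "
gap = b"<" + b"0" * 16 + b">"
tail = b" >> endobj\n%%EOF\n"
revision1 = head + gap + tail
byte_range1 = [0, len(head), len(head) + len(gap), len(tail)]


def covers_whole_file(byte_range, file_length: int) -> bool:
    """True when the second ByteRange sequence ends at the last byte."""
    return byte_range[2] + byte_range[3] == file_length


# Incremental update: new material is only ever appended.
update = b"2 0 obj << /Type /Sig >> endobj\n%%EOF\n"
revision2 = revision1 + update

print(revision2.startswith(revision1))                # earlier bytes untouched
print(covers_whole_file(byte_range1, len(revision1)))
print(covers_whole_file(byte_range1, len(revision2)))
```

The last check is how a viewer can tell that the file was modified after the first signature was applied, even though that signature's own hash still matches.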
Viewing previously signed document versions
The Incremental updates facility of the PDF language allows PDF viewers to effectively retain all signed revisions of any PDF file. This makes it possible for users to actually see the version of the PDF file that was signed. Acrobat takes full advantage of PDF’s ability to “remember” a document’s state at the time of signing by providing two features:
- View Signed Version: Display the document as it existed at the time that the signature was applied by right-clicking on a signature and choosing View Signed Version. It can be mimicked manually by removing any bytes in the PDF file after the %%EOF marker corresponding to the signature.
- Compare Signed Version to Current Version: Compare a document’s current version with the signed version by right-clicking on a signature and choosing Compare Signed Version to Current Version.
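The manual approach mentioned for View Signed Version, cutting the file after the relevant end-of-file marker, can be sketched as follows. The function name is hypothetical and the search is naive: it assumes every occurrence of the marker bytes ends a revision, which need not hold if the same bytes appear inside stream data.

```python
from typing import List

MARKER = b"%%EOF"


def revisions(pdf: bytes) -> List[bytes]:
    """Return one prefix of the file per end-of-file marker found."""
    result = []
    pos = pdf.find(MARKER)
    while pos != -1:
        end = pos + len(MARKER)
        # Keep the single end-of-line that follows the marker, if any.
        if pdf[end:end + 2] == b"\r\n":
            end += 2
        elif pdf[end:end + 1] in (b"\r", b"\n"):
            end += 1
        result.append(pdf[:end])
        pos = pdf.find(MARKER, end)
    return result


# Made-up two-revision file for illustration.
rev1 = b"%PDF-1.7\nsigned content\n%%EOF\n"
rev2 = rev1 + b"later changes\n%%EOF\n"

found = revisions(rev2)
print(len(found), found[0] == rev1)
```

Each returned prefix is a complete earlier version of the file, so the first element here is byte-for-byte the document as it existed before the later changes were appended.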
Legal content attestations
In order to facilitate document trust, conforming writers of certification signatures such as Acrobat should also leverage PDF’s legal attestation dictionary. Dictionary entries specify all content that may result in unexpected rendering of the document contents. Additionally, authors may provide further clarification of such content by means of the Attestation entry. Reviewers should establish for themselves that they trust the author and document contents.
PDF enables feature-rich certificate processing and handling because certificate data is embedded in the signature. PDF viewers and signature handlers can be designed to use this data as needed. For example, when PKCS#7 signatures are used, the signature object can contain some or all of the following:
- Timestamp information
- Embedded revocation information
- Revocation checking details for both CRLs and OCSP
- Certificate policies and attribute certificates
Signature creation workflow