Wednesday, September 15, 2010

Finally! Apple Embraces A Standard for Metadata in PDFs

I draw your attention to the header file CGPDFContext.h in the iOS 4 SDK:

void CGPDFContextAddDocumentMetadata
(CGContextRef context, CFDataRef metadata) CG_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_0);

The iOS 4 SDK has been out for several months but I hadn't noticed this change until today. Not that I have any use for it today on iOS, it's on the Mac where it is crucially needed.

Starting with Mac OS X 10.7, developers will be able to embed arbitrary XML data in the PDFs they can generate with Apple's APIs. [Update: I guess there is no actual requirement of XML based on just the API. I'd recommend standardizing on XML though.] They have decided to use the method advocated by Adobe in which a metadata stream object with a compressed XML payload is inserted into a PDF (Apple does the compression for you, you just have to provide a block of XML data). This is a good way of doing it, but more importantly, it is a simple and standard way of doing it. I would have preferred having an additional vendor tag where I could mark the metadata with a "com.genhelp.mydrawingapp" identifier, but that is not crucial; I can get that data from the XML.

Why is this Important?

As I've explained several times before, on the Mac we used to have this feature which I will call Round Trip Editing, wherein a user could make a drawing in one application, copy and paste the drawing into another application, and then later copy and paste back into the original application, and still be able to edit the drawing. Generations of Mac users relied on this feature to go from applications like ChemDraw into PowerPoint and back again.

This feature has been lost as applications transition from using the archaic PICT clipboard flavor to the modern and beautiful PDF clipboard flavor. There was no direct way to embed large data in Apple generated PDFs, and thus developers were left on their own to munge the format if they dared. And, with no standard or expectation of data embedding, applications did not bother to preserve the original PDF resulting in data loss.

Getting round trip editing working again has required 3 steps. Apple has had to provide an API for data embedding. Content generating applications have to be modified to use that API. Office applications have to be modified to return original PDFs to the clipboard when selecting a single image. With Mac OS X 10.7, step one will be here; about 4 OS versions tardy.

BTW Here is how to call it:

CFDataRef MakeAPDF(CFDataRef someXML)

CGRect mediaRect = CGRectMake(0, 0, 400, 600);
// use your own rect instead

CFMutableDataRef result = CFDataCreateMutable(kCFAllocatorDefault, 0);
CGDataConsumerRef PDFDataConsumer = CGDataConsumerCreateWithCFData(result);

// mark the PDF as coming from your program
CFMutableDictionaryRef auxInfo = CFDictionaryCreateMutable(kCFAllocatorDefault, 1, NULL, NULL);
CFDictionaryAddValue(auxInfo, kCGPDFContextCreator, CFSTR("Your Programs Name"));
CFDictionaryRef auxillaryInformation = CFDictionaryCreateCopy(kCFAllocatorDefault, auxInfo);

// create a context to draw into
CGContextRef graphicContext = CGPDFContextCreate(PDFDataConsumer, &mediaRect, auxillaryInformation);

// actually make the call to embed your XML
CGPDFContextAddDocumentMetadata(graphicContext, metaData);

CGContextBeginPage(graphicContext, &mediaRect);
// do your drawing, like this grey rectangle
CGContextSetGrayFillColor(graphicContext, 0.5, 0.5);
CGContextAddRect(graphicContext, mediaRect);
// end your drawing

return result;


And here's how to get the data back. Check that the XML is yours instead of some other program's.
CFDataRef ExtractMetaDataFromPDFData(CFDataRef pdf)

CFDataRef result = 0;

const UInt8 * pdfData = CFDataGetBytePtr(pdf);
CFIndex pdfDataLength = CFDataGetLength(pdf);
CGDataProviderRef dataProvider = CGDataProviderCreateWithData(kCFAllocatorDefault, pdfData, pdfDataLength, NULL);
CGPDFDocumentRef pdfDocument = CGPDFDocumentCreateWithProvider(dataProvider);

CGPDFDictionaryRef docDict = CGPDFDocumentGetCatalog(pdfDocument);
CGPDFStreamRef metastream = 0;
if(CGPDFDictionaryGetStream(docDict,"Metadata", &metastream))
CGPDFDataFormat format = CGPDFDataFormatRaw;
CFDataRef streamData = CGPDFStreamCopyData(metastream, &format);
if(format == CGPDFDataFormatRaw)
result = streamData;

return result; // check to see if this is your XML
//remember to release result when done