Monday, March 14, 2011

Embed fonts in a PDF file using pdfLaTeX

This post explains how to embed fonts in a PDF file.
Embedding fonts is useful when you are preparing a paper for a conference submission, or when you want your PDF file to look exactly the same on other people's machines as it does on yours.
I will explain how to do it on a Linux machine. I am not sure how to achieve the same on a Windows computer.
We will use the "pdffonts" tool to examine the PDF file.

$ pdffonts  mypaper.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
HVGYIY+NimbusRomNo9L-Medi            Type 1            yes yes no     110  0
TFVQMQ+NimbusRomNo9L-Regu            Type 1            yes yes no     111  0
XHGNKU+NimbusRomNo9L-MediItal        Type 1            yes yes no     113  0
UUGCZC+NimbusRomNo9L-ReguItal        Type 1            yes yes no     114  0
FDULPW+CMSY7                         Type 1            yes yes no     148  0
SPCNWZ+NimbusMonL-Regu               Type 1            yes yes no     150  0
ABCDEE+Times                         TrueType          yes yes no     152  0
Arial                                TrueType          no  no  no     153  0
Arial                                CID TrueType      yes no  yes    154  0
Arial                                TrueType          no  no  no     220  0
Arial                                CID TrueType      yes no  yes    221  0
ABCDEE+Times                         TrueType          yes yes no     222  0
Arial,Italic                         TrueType          no  no  no     223  0
ZLLMAJ+CMMI10                        Type 1            yes yes no     257  0
Arial                                TrueType          no  no  no     259  0
ABCDEE+Calibri                       TrueType          yes yes no     260  0
Arial,Italic                         TrueType          no  no  no     261  0
Arial                                TrueType          no  no  no     282  0
Arial,Italic                         TrueType          no  no  no     283  0

$

The important columns are "name" and "emb". The "name" column displays the name of the font, and the "emb" column shows whether that font is embedded in your PDF file: "yes" means the font is embedded and "no" means it is not.
For example, in the above output, some of the Arial entries and all of the Arial,Italic entries are not embedded in the PDF file.
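
If you want to check the "emb" column from a program instead of by eye, the following is a minimal sketch in C. The function name font_is_embedded is my own, and it assumes that the last five whitespace-separated fields of a row are emb, sub, uni, object number and generation, as in the listing above. Counting from the right avoids trouble with type names that contain spaces, such as "Type 1" or "CID TrueType".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of pdffonts.
 * Returns 1 if the row says the font is embedded, 0 if it is not,
 * and -1 if the line is not a font row (header, dashes, blank line).
 * Assumption: the last five fields are emb, sub, uni, object number
 * and generation, as in the pdffonts listing shown in this post.
 */
int font_is_embedded(const char *line) {
  char buf[512];
  char *fields[64];
  int n = 0;

  /* strtok modifies its input, so work on a local copy */
  strncpy(buf, line, sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = '\0';

  for (char *tok = strtok(buf, " \t\r\n"); tok != NULL && n < 64;
       tok = strtok(NULL, " \t\r\n")) {
    fields[n++] = tok;
  }

  /* a font row has a name, at least one type word and five more fields */
  if (n < 7) {
    return -1;
  }

  /* the emb field is the fifth field counted from the end */
  const char *emb = fields[n - 5];
  if (strcmp(emb, "yes") == 0) {
    return 1;
  }
  if (strcmp(emb, "no") == 0) {
    return 0;
  }
  return -1;
}
```

One way to use it is to read the output of pdffonts line by line with fgets and print every line for which the function returns 0.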

To embed the missing fonts into your PDF file using pdflatex:
$ updmap --edit
The above command opens the updmap configuration file (updmap.cfg), which controls how the font map files used by pdflatex are generated.
Find the pdftexDownloadBase14 directive and make sure it is set to true. That is, when you are done, the following line should be in the file:
pdftexDownloadBase14 true

Save the file and rebuild your PDF file using "pdflatex".
Then check your PDF file using the "pdffonts" command. All the fonts used in your PDF file should now be embedded.
If some fonts are still missing, it might be because you have included another PDF file (as a graphic) in your "mypaper.pdf" file.
In that case, you need to embed the fonts into those included PDF files as well.

If your PDF file includes figures, follow the steps given below:
1. Convert your PDF file to a PS file
  $ pdftops mypaper.pdf

2. Convert the PS file back to PDF using the "prepress" settings
  $ ps2pdf14 -dPDFSETTINGS=/prepress mypaper.ps

Converting from PDF to PS and back to PDF may cause some formatting errors. I recommend double-checking your PDF file for them.

3. Check the PDF fonts using the pdffonts command
  $ pdffonts mypaper.pdf

Friday, March 11, 2011

LibXML Tutorial

In this blog post I will show some basic functions of libxml, a freely licensed XML library for the C language.
It is meant to give beginners an idea of how to manipulate XML files with the libxml API. It does not cover the whole API; it only demonstrates a few basic functions.

For the detailed API list, please visit the official libxml website.

To parse an XML file:
  xmlDocPtr doc;  // pointer to the parsed XML document

  // Parse the XML file
  doc = xmlParseFile(xmlFileName);

  // Check that the document was parsed successfully.
  if (doc == NULL ) {
    fprintf(stderr,"Error! The document was not parsed successfully.\n");
    return;
  }


To get the root element of the document:

  xmlNodePtr cur;  // node pointer used to walk the document

  // Retrieve the document's root element.
  cur = xmlDocGetRootElement(doc);

  // Check to make sure the document actually contains something
  if (cur == NULL) {
    fprintf(stderr,"Document is Empty\n");
    xmlFreeDoc(doc);
    return;
  }


To get the first child node of the current element (the remaining children are reached through cur->next):

  cur = cur->xmlChildrenNode;


To search for an attribute:

  // search for the "hash" attribute in the node pointed to by cur
  attr = xmlHasProp(cur, (const xmlChar*)"hash");


To add a new attribute:

  /*
   * A new attribute "hash" is added to the element node pointed to by cur,
   * and its value is set to "12345678"
   */
  attr = xmlNewProp(cur, (const xmlChar*)"hash", (const xmlChar*)"12345678");


To save the XML document to disk:

xmlSaveFormatFile (xmlFileName, doc, 1);



A complete example is given below.
Suppose the data.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "secPolicy2.dtd">
<root>
  <url>
    <host hash="12345678">www.example1.com</host>
    <sctxid>2</sctxid>
  </url>
  <url>
    <host>www.example2.com</host>
    <sctxid>2</sctxid>
  </url>
  <url>
    <host>www.example3.com</host>
    <sctxid>3</sctxid>
  </url>

</root>

The following program reads the above XML file, whose name is supplied as a command line argument.
It adds a "hash" attribute with the default value "12345678" to every "host" element that does not already have one.

/*
 * Filename = xmlexample.c
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/*
 * Parse URL Element Node in XML file
 * <url>
 *    <host hash="hash_val_of_hostname">www.example.com</host>
 *    <sctxid>Integer</sctxid>
 * </url>
 */
void parseURL (xmlDocPtr doc, xmlNodePtr cur) {
  xmlChar *key;
  xmlAttrPtr attr;

  // Get the first child node of the "url" element
  cur = cur->xmlChildrenNode;

  while (cur != NULL) {
    // check for the "host" child element of the "url" node
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"host"))) {
      key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
      fprintf(stderr,"host: %s\n", key);
      xmlFree(key);
  
      // search for "hash" attribute in the "host" node
      attr = xmlHasProp(cur, (const xmlChar*)"hash");
     
      // if the attribute is not found, add it with the default value
      if (attr == NULL) {
        attr = xmlNewProp(cur, (const xmlChar*)"hash", (const xmlChar*)"12345678");
      }

      /* The attribute should now be set.
       * Retrieve its value and display it.
       */
      key = xmlGetProp(cur, (const xmlChar*)"hash");
      fprintf(stderr, "hash: %s\n", key);
      xmlFree(key);

    } // end of "host" check
     
    // check for the "sctxid" child element of the "url" node
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"sctxid"))) {
      key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
      fprintf(stderr,"sctxid: %s\n", key);
      xmlFree(key);
    } // end of "sctxid" check

    cur = cur->next;
  } // end of while loop

  return;

} // end of parseURL function()

/*
 * Parsing the XML file and Reading the Element Nodes
 */
static void parseDoc(char *xmlFileName) {
  xmlDocPtr doc;  // pointer to the parsed XML document
  xmlNodePtr cur; // node pointer used to walk individual nodes

  // Parse XML file
  doc = xmlParseFile(xmlFileName);

  // Check to see that the document was successfully parsed.
  if (doc == NULL ) {
    fprintf(stderr,"Error!. Document is not parsed successfully. \n");
    return;
  }

  // Retrieve the document's root element.
  cur = xmlDocGetRootElement(doc);

  // Check to make sure the document actually contains something
  if (cur == NULL) {
    fprintf(stderr,"Document is Empty\n");
    xmlFreeDoc(doc);
    return;
  }

  /* We need to make sure the document is the right type.
   * "root" is the expected name of the root element in this XML file
   */
  if (xmlStrcmp(cur->name, (const xmlChar *) "root")) {
    fprintf(stderr,"Document is of the wrong type, root node != root\n");
    xmlFreeDoc(doc);
    return;
  }

  /* Get the first child node of cur.
   * At this point, cur points at the document root,
   * which is the element "root"
   */
  cur = cur->xmlChildrenNode;

  // This loop iterates through the elements that are children of "root"
  while (cur != NULL) {
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"url"))){
      parseURL (doc, cur);
    }
    cur = cur->next;
  }

  /* Save the XML document to disk.
   * Otherwise, your changes exist only in memory
   * and will not be reflected in the file.
   */
  xmlSaveFormatFile (xmlFileName, doc, 1);

  /* free the document */
  xmlFreeDoc(doc);

  /*
   * Free the global variables that may
   * have been allocated by the parser.
   */
  xmlCleanupParser();

  return;

} // end of parseDoc function


int main(int argc, char **argv) {
  char *xmlFileName;

  if (argc <= 1) {
    fprintf(stderr, "Usage: %s inputfile.xml\n", argv[0]);
    return 1;  // non-zero exit status signals incorrect usage
  }

  // Get the file name from argv[1]
  xmlFileName = argv[1];

  // Custom function to parse the XML file
  parseDoc (xmlFileName);

  return 0;  // zero exit status signals success
}


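To compile the above program, use the following command (the libraries are listed after the source file so that linkers which resolve symbols in command line order can find them):
$ gcc -o xmlexample xmlexample.c `xml2-config --cflags --libs`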

To run the program, use the following command:
$ ./xmlexample data.xml