Monday, November 14, 2011

Email Address verification using Perl script

Checking the correctness of a single email address is easy and can be done manually. However, if you want to validate a batch of email addresses, an automated script is very handy.
I would like to thank my colleague and friend "Sai Sathyanarayam" for giving me this script. I think it might be useful to others, so I am posting it here.

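#!/usr/bin/perl
# Open the "email.txt" file from the current directory.
# email.txt contains one email address per line, each followed by a , (comma).
open(FILE, "<", "email.txt") or die "Cannot open email.txt: $!";
while ($line = <FILE>) {
    chomp $line;
    # Keep only the text before the first comma; a line without a comma is invalid.
    if ($line =~ /,/) { $line = $`; } else { print "$line is invalid\n"; next; }
    if ($line =~ /^(\w|\-|\_|\.)+\@((\w|\-|\_)+\.)+[a-zA-Z]{2,}$/) {
        print "$line is valid\n";
    } else {
        print "$line is invalid\n";
    }
}
close(FILE);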

Sample email.txt file is as follows:,,

To perform the validation, run the following command:
$ perl
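
For comparison, the same check can be sketched in Python. This is only an illustration, not part of the original script: the function names and the sample addresses are made up, and the pattern is a direct transcription of the Perl regular expression (with re.ASCII so that \w means letters, digits and underscore only).

```python
import re

# Same pattern as the Perl script: local part, "@", one or more
# dot-terminated labels, then a top-level domain of two or more letters.
EMAIL_RE = re.compile(r"[\w.-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}", re.ASCII)


def is_valid(address):
    """Return True if the whole string matches the pattern."""
    return EMAIL_RE.fullmatch(address) is not None


def check_lines(lines):
    """Yield (address, valid) for each 'address,' line, like the Perl loop."""
    for line in lines:
        line = line.strip()
        if "," not in line:
            # The script treats a line without a comma as invalid.
            yield line, False
            continue
        address = line.split(",", 1)[0]
        yield address, is_valid(address)


if __name__ == "__main__":
    # Hypothetical sample data, not the original email.txt.
    sample = ["alice@example.com,", "bob@example,", "not an address,"]
    for address, ok in check_lines(sample):
        print(address, "is valid" if ok else "is invalid")
```

Like the Perl version, this only checks the shape of the address; it does not verify that the mailbox exists.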

Friday, August 26, 2011

JaegerMonkey Architecture

JaegerMonkey is the JavaScript engine used in Firefox 4.0 and later versions. Firefox 3.0 and earlier used the SpiderMonkey JavaScript engine. TraceMonkey is a tracing engine that improves on SpiderMonkey; it was used in Firefox 3.5 and later versions. Before we look into the architecture of JaegerMonkey, let's first glance at the TraceMonkey JavaScript engine, the predecessor of JaegerMonkey.

TraceMonkey Overview
TraceMonkey uses a trace monitor called jstracer. The jstracer monitors a script as it is interpreted by SpiderMonkey. Whenever jstracer sees code that would benefit from native compilation, it activates its recorder. The recorder records the execution of the IR and creates NanoJIT Low Level Intermediate Representation (LIR), which is then compiled into native code. NanoJIT produces optimized code. More information on TraceMonkey and its architecture diagram is available here.

JaegerMonkey Architecture
JaegerMonkey, used in Firefox 4.0 and later versions, is a Just-in-Time (JIT) JavaScript execution engine. The JaegerMonkey JIT engine produces native code for JavaScript. Usually a JIT engine takes an intermediate representation (IR) from a compiler, produces native (machine) code, and executes it on the fly. Therefore, JIT engines do not parse the code, check its syntax, or create the intermediate representation (IR) of the code.
Hence, we can divide the JavaScript engine in Mozilla Firefox into two parts: a front-end and a back-end. The front-end is responsible for parsing the script, checking its syntax, and generating the intermediate representation (IR) required for native code generation. The back-end is responsible for generating native code and for memory management.

In Mozilla Firefox the front-end is SpiderMonkey, which parses the script and generates an intermediate representation (IR) of it. In SpiderMonkey the intermediate representation is the bytecode of the script. This bytecode is then fed to the JaegerMonkey JIT engine to be compiled into machine code. JaegerMonkey is a method-based JIT JavaScript engine which compiles the script into non-optimized machine code. JaegerMonkey uses Nitro (borrowed from the WebKit project) as its back-end assembler. Nitro does the memory management and code generation in JaegerMonkey.

Nitro contains two parts: an assembler and a memory unit. The assembler handles code assembly, and the memory unit handles allocation and deallocation of memory for native code. The bulk of the bytecode to native code translation is performed in the mjit::Compiler class, which can be found in js/src/methodjit/Compiler.cpp. This compiler class translates SpiderMonkey bytecode instructions to their native code block equivalents using the AssemblerBuffer and LinkBuffer helper classes.

JaegerMonkey uses inline caches to improve performance. An inline cache makes object type lookups faster. JavaScript is dynamically typed at runtime. To support this, the JSOP_GETPROP bytecode in SpiderMonkey returns the value of a specific property by first looking up the type of the object. SpiderMonkey uses a property cache which stores the Shape of existing objects. A Shape is a structure in SpiderMonkey that defines how the object can be accessed.

Inline Caching for good locality
When the JIT compiles a property access bytecode, the emitted machine code looks as follows:

type          <- load addressof(object) + offsetof(JSObject, type)
shapeIsKnown  <- type equals IMPOSSIBLE_TYPE
None          <- goto slowLookupCode if shapeIsKnown is False
property      <- load addressof(object) + IMPOSSIBLE_SLOT

JaegerMonkey uses self-modifying code to inline cache the Shape of the object. Self-modifying code is code that modifies code that currently exists in memory. The first time JaegerMonkey performs a property access on an object, its shape is unknown, so shapeIsKnown is false and slowLookupCode is executed. After slowLookupCode resolves the property, it fills in the appropriate values for IMPOSSIBLE_TYPE and IMPOSSIBLE_SLOT. The next time this piece of code is executed, if the type of the object has not changed, shapeIsKnown is true and there is no need to go into slowLookupCode. This technique of modifying JIT-compiled code to reflect a probable value is called inline caching: inline, as in "in the emitted code"; caching, as in "cache a probable value".
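
The idea can be modelled in a few lines of Python. This is only a sketch under loud assumptions: it is not JaegerMonkey code, all class and attribute names are hypothetical, and updating two Python attributes stands in for patching IMPOSSIBLE_TYPE and IMPOSSIBLE_SLOT in the emitted machine code.

```python
class Shape:
    """Hypothetical stand-in for a Shape: maps property names to slot indices."""

    def __init__(self, slots):
        self.slots = dict(slots)


class Obj:
    """An object is a shape plus a flat list of slot values."""

    def __init__(self, shape, values):
        self.shape = shape
        self.values = list(values)


class PropertyAccessSite:
    """One property-access site with a single-entry inline cache."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None  # plays the role of IMPOSSIBLE_TYPE
        self.cached_slot = None   # plays the role of IMPOSSIBLE_SLOT
        self.slow_lookups = 0

    def get(self, obj):
        if obj.shape is self.cached_shape:        # shapeIsKnown
            return obj.values[self.cached_slot]   # fast path: direct slot load
        return self._slow_lookup(obj)

    def _slow_lookup(self, obj):
        # Stand-in for slowLookupCode: resolve the property, then "patch" the site.
        self.slow_lookups += 1
        slot = obj.shape.slots[self.name]
        self.cached_shape, self.cached_slot = obj.shape, slot
        return obj.values[slot]


if __name__ == "__main__":
    point = Shape({"x": 0, "y": 1})
    site = PropertyAccessSite("y")
    for o in (Obj(point, [1, 2]), Obj(point, [3, 4])):
        print(site.get(o))
    print("slow lookups:", site.slow_lookups)
```

Only the first access takes the slow path; every later access to an object with the same shape is a comparison followed by an indexed load.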

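However, JavaScript is dynamically typed, so the same property access can see objects of several different types. This is handled by polymorphic inline caching (PIC). Let's consider an example of code that benefits from a PIC:

var vals = [1, "hello", [1, 2, 3]];
for (var i in vals) {
    vals[i].toString();
}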

In the above code the vals array contains different data types: a Number, a String and an Array. For each object in the array, the interpreter has to perform an expensive type lookup and determine the correct toString method to call. JaegerMonkey uses PIC slots to solve this problem, that is, it builds a chain of cache entries. It creates several blocks of native code that perform property lookups for the types the object has already been seen as. If the first type does not match, a branch is taken to the next code block to perform a lookup. If the type matches, it performs a fast slot lookup. In our example, the first time it sees a Number object and fills a cache entry for it. The second time it is a String, so a new piece of code memory is created for type String, and the jump of the first lookup (that is, the Number type mismatch in our example) is modified to go to this newly created piece of code instead of slowLookupCode, and so on.
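
The chain of cache entries can be sketched in Python as follows. Again this is an illustration, not engine code: the names are hypothetical, Python types stand in for Shapes, and a list of (type, handler) pairs stands in for the chain of patched native code blocks.

```python
# Stand-in for the slow, generic lookup of the right "toString" routine.
TO_STRING = {
    int: lambda v: str(v),
    str: lambda v: v,
    list: lambda v: ",".join(str(item) for item in v),
}


class PolymorphicSite:
    """One call site with a chain of (type, handler) cache entries."""

    def __init__(self):
        self.chain = []       # checked in order, like the chained code blocks
        self.slow_lookups = 0

    def to_string(self, value):
        for cached_type, handler in self.chain:
            if type(value) is cached_type:   # type guard of one cache entry
                return handler(value)        # fast path
        # No entry matched: fall through to the slow lookup and extend the chain.
        self.slow_lookups += 1
        handler = TO_STRING[type(value)]
        self.chain.append((type(value), handler))
        return handler(value)


if __name__ == "__main__":
    site = PolymorphicSite()
    vals = [1, "hello", [1, 2, 3]]
    for _ in range(2):
        print([site.to_string(v) for v in vals])
    print("slow lookups:", site.slow_lookups)
```

After the first pass over vals the chain has three entries, so the second pass never reaches the slow lookup.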


Friday, April 1, 2011

How to Merge Multiple PDF files into single PDF file on Ubuntu

Multiple PDF files can be merged into a single PDF in two different ways: Ghostscript or pdftk.

A. Use Ghostscript to merge PDF files
1. Install the two packages, Ghostscript and PDFtk.
 $ sudo apt-get install ghostscript pdftk

2. Use the following command to combine multiple files into a single PDF file. The output file name is "singleCombinedPdfFile.pdf". The input files are all PDF files in the current directory, because we used "*.pdf".

$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedPdfFile.pdf -dBATCH *.pdf

If you want to join the PDF files in a specific order, list the file names explicitly:
$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedPdfFile.pdf -dBATCH 1.pdf 2.pdf 3.pdf

B. Use pdftk (PDF toolkit) to merge multiple PDF files into Single PDF file
1.  To merge PDF files by using names of the source PDF files:
$ pdftk one.pdf  two.pdf  three.pdf  cat  output  123-combined.pdf

2. To merge PDF files using a wildcard, when the number of files is large and it is not feasible to type all the file names:
$ pdftk *.pdf cat output combined.pdf

3. Select specific pages from Multiple PDFs and create new PDF document:
$ pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf
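
One thing to keep in mind with "*.pdf" is that the shell expands it in lexical order, so 10.pdf comes before 2.pdf. A minimal Python sketch that builds the pdftk argument list in numeric order instead is shown below; the helper names are my own, and the script only prints the command rather than running it.

```python
import re


def natural_key(name):
    """Split '10.pdf' into ['', 10, '.pdf'] so that numbers compare numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pdftk_merge_command(inputs, output):
    """Return the pdftk argument list that merges inputs in natural order."""
    return ["pdftk", *sorted(inputs, key=natural_key), "cat", "output", output]


if __name__ == "__main__":
    cmd = pdftk_merge_command(["10.pdf", "2.pdf", "1.pdf"], "combined.pdf")
    print(" ".join(cmd))
    # To actually run it, pass cmd to subprocess.run(cmd, check=True).
```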

Monday, March 14, 2011

Embed fonts in PDF file using PDFLaTex

This post explains how to embed fonts in a PDF file.
Embedding the fonts is useful when you are preparing a paper for a conference submission, or when you want to ensure that your PDF file looks exactly the same on other machines as it does on yours.
In this post I will explain how to do it on a Linux machine. I am not sure how to achieve the same on a Windows computer.
We will use the "pdffonts" tool to examine the PDF file.

$ pdffonts  mypaper.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
HVGYIY+NimbusRomNo9L-Medi            Type 1            yes yes no     110  0
TFVQMQ+NimbusRomNo9L-Regu            Type 1            yes yes no     111  0
XHGNKU+NimbusRomNo9L-MediItal        Type 1            yes yes no     113  0
UUGCZC+NimbusRomNo9L-ReguItal        Type 1            yes yes no     114  0
FDULPW+CMSY7                         Type 1            yes yes no     148  0
SPCNWZ+NimbusMonL-Regu               Type 1            yes yes no     150  0
ABCDEE+Times                         TrueType          yes yes no     152  0
Arial                                TrueType          no  no  no     153  0
Arial                                CID TrueType      yes no  yes    154  0
Arial                                TrueType          no  no  no     220  0
Arial                                CID TrueType      yes no  yes    221  0
ABCDEE+Times                         TrueType          yes yes no     222  0
Arial,Italic                         TrueType          no  no  no     223  0
ZLLMAJ+CMMI10                        Type 1            yes yes no     257  0
Arial                                TrueType          no  no  no     259  0
ABCDEE+Calibri                       TrueType          yes yes no     260  0
Arial,Italic                         TrueType          no  no  no     261  0
Arial                                TrueType          no  no  no     282  0
Arial,Italic                         TrueType          no  no  no     283  0


The important columns are "name" and "emb". The "name" column displays the name of the font, and the "emb" column shows whether that font is embedded in your PDF file. A "yes" in the "emb" column indicates that the font is embedded, and a "no" indicates that it is not.
For example, in the above output, the Arial and Arial,Italic fonts are not embedded in the PDF file.

To embed the un-embedded fonts into your PDF file using PDFLaTex:
$  updmap --edit
The above command opens the configuration file for pdflatex.
Find the pdftexDownloadBase14 directive and make sure it is true. That is, when you're done, the following line should be in the file:
pdftexDownloadBase14 true

Save the file and rebuild your PDF file using "pdflatex".
Then check your PDF file using the "pdffonts" command. It should now have all the fonts used in your PDF file embedded.
If some fonts are still missing, it might be because you have included another PDF file (as a graphic) in your "mypaper.pdf" file.
In that case, you need to embed the fonts into those included PDF files as well.

If you included figures in your PDF file, then follow the steps given below:
1.  Convert your PDF file to a PS file
  $ pdftops  mypaper.pdf

2. Convert the PS file back to PDF using the "prepress" settings
  $ ps2pdf14 -dPDFSETTINGS=/prepress mypaper.ps

Converting from PDF to PS and back from PS to PDF may cause some formatting errors. I recommend double-checking your PDF file for formatting errors.

3. Check PDF fonts using pdffonts command
  $ pdffonts mypaper.pdf
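
When the font list is long, picking out the "no" rows by eye gets tedious. The following Python sketch lists the fonts that are not embedded. It assumes the six-column layout shown in this post (name, type, emb, sub, uni, object ID) and font names without spaces; the function name and the shortened sample are my own.

```python
def unembedded_fonts(pdffonts_output):
    """Return the names of fonts whose 'emb' column is 'no', without duplicates."""
    missing = []
    for line in pdffonts_output.splitlines():
        # Split from the right: emb, sub, uni, object number, generation.
        parts = line.rsplit(None, 5)
        if len(parts) != 6 or parts[1] not in ("yes", "no"):
            continue  # header, separator or blank line
        name = parts[0].split()[0]  # parts[0] holds "name   type"
        if parts[1] == "no" and name not in missing:
            missing.append(name)
    return missing


SAMPLE = """\
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
ABCDEE+Times                         TrueType          yes yes no     152  0
Arial                                TrueType          no  no  no     153  0
Arial                                CID TrueType      yes no  yes    154  0
Arial,Italic                         TrueType          no  no  no     223  0
"""

if __name__ == "__main__":
    for font in unembedded_fonts(SAMPLE):
        print(font)
```

To use it on a real file, the text could come from running pdffonts through subprocess.run(["pdffonts", "mypaper.pdf"], capture_output=True, text=True).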

Friday, March 11, 2011

LibXML Tutorial

In this blog post I will show some basic functions of libxml, a freely licensed XML library for the C language.
This post gives beginners an idea of how to manipulate XML files using libxml library functions. It does not cover the whole XML API available in libxml; it only shows how to use the libxml API with the help of some basic functions.

For the detailed XML API list, please visit the official website of libxml.

To Parse XML file:
  xmlDocPtr doc;  // pointer to the parsed XML document
  // Parse XML file
  doc = xmlParseFile(xmlFileName);

  // Check to see that the document was successfully parsed.
  if (doc == NULL ) {
    fprintf(stderr,"Error! Document was not parsed successfully.\n");
    return;
  }

To Get the root element of the Document:

  // Retrieve the document's root element.
  cur = xmlDocGetRootElement(doc);

  // Check to make sure the document actually contains something
  if (cur == NULL) {
    fprintf(stderr,"Document is Empty\n");
    xmlFreeDoc(doc);
    return;
  }

To Get the child Nodes of the current node element:

  cur = cur->xmlChildrenNode;

To Search for an attribute:

// search for "hash" attribute in the node pointed by cur
 attr = xmlHasProp(cur, (const xmlChar*)"hash");

To add new Attribute:

 /* New attribute "hash" is added to the element node pointed to by cur,
  * and the default value of the attribute is set to "12345678"
  */
 attr = xmlNewProp(cur, (const xmlChar*)"hash", (const xmlChar*)"12345678");

To Save XML document to Disk:

xmlSaveFormatFile (xmlFileName, doc, 1);

A complete example is given below.
Suppose the data.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "secPolicy2.dtd">
<root>
  <url>
    <host hash="12345678"></host>
  </url>
</root>

The following program reads the above XML file, supplied as a command line argument.
It adds the "hash" attribute with the default value "12345678" if it is not present in the "host" element node.

/*
 * Filename = xmlexample.c
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/*
 * Parse URL Element Node in XML file
 * <url>
 *    <host hash="hash_val_of_hostname"></host>
 *    <sctxid>Integer</sctxid>
 * </url>
 */
void parseURL (xmlDocPtr doc, xmlNodePtr cur) {
  xmlChar *key;
  xmlAttrPtr attr;

  // Get the children element nodes of the "url" node
  cur = cur->xmlChildrenNode;

  while (cur != NULL) {
    // check for "host" child element node of "url" node
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"host"))) {
      key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
      // key is NULL when the element has no text content
      fprintf(stderr,"host: %s\n", key ? (const char *)key : "");
      xmlFree(key);
      // search for "hash" attribute in the "host" node
      attr = xmlHasProp(cur, (const xmlChar*)"hash");
      // if attr is not found then set it
      if (attr == NULL) {
        /* Add the attribute and the value of the attribute */
        attr = xmlNewProp(cur, (const xmlChar*)"hash", (const xmlChar*)"12345678");
        /* Attribute is now set and has a value.
         * Just retrieve the value and display it
         */
        key = xmlGetProp(cur, (const xmlChar*)"hash");
        fprintf(stderr,"hash: %s\n", key);
        xmlFree(key);
      } else {
        /* Attribute is available.
         * Just retrieve the value and display it
         */
        key = xmlGetProp(cur, (const xmlChar*)"hash");
        fprintf(stderr, "hash: %s\n", key);
        xmlFree(key);
      }
    } // end of if "host"
    // check for "sctxid" child element node of "url" node
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"sctxid"))) {
      key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
      fprintf(stderr,"sctxid: %s\n", key ? (const char *)key : "");
      xmlFree(key);
    } // end of if "sctxid"
    cur = cur->next;
  } // end of while loop
} // end of parseURL function

/*
 * Parsing the XML file and Reading the Element Nodes
 */
static void parseDoc(char *xmlFileName) {
  xmlDocPtr doc;  // pointer to the parsed XML document
  xmlNodePtr cur; // node pointer. It interacts with individual nodes

  // Parse XML file
  doc = xmlParseFile(xmlFileName);

  // Check to see that the document was successfully parsed.
  if (doc == NULL ) {
    fprintf(stderr,"Error! Document was not parsed successfully.\n");
    return;
  }

  // Retrieve the document's root element.
  cur = xmlDocGetRootElement(doc);

  // Check to make sure the document actually contains something
  if (cur == NULL) {
    fprintf(stderr,"Document is Empty\n");
    xmlFreeDoc(doc);
    return;
  }

  /* We need to make sure the document is the right type.
   * "root" is the root type of the documents used in user Config XML file
   */
  if (xmlStrcmp(cur->name, (const xmlChar *) "root")) {
    fprintf(stderr,"Document is of the wrong type, root node != root\n");
    xmlFreeDoc(doc);
    return;
  }

  /* Get the first child node of cur.
   * At this point, cur points at the document root,
   * which is the element "root"
   */
  cur = cur->xmlChildrenNode;

  // This loop iterates through the elements that are children of "root"
  while (cur != NULL) {
    if ((!xmlStrcmp(cur->name, (const xmlChar *)"url"))){
      parseURL (doc, cur);
    }
    cur = cur->next;
  }

  /* Save XML document to the Disk
   * Otherwise, your changes will not be reflected in the file.
   * Currently they are only in memory
   */
  xmlSaveFormatFile (xmlFileName, doc, 1);

  /* free the document */
  xmlFreeDoc(doc);

  /*
   * Free the global variables that may
   * have been allocated by the parser.
   */
  xmlCleanupParser();
} // end of parseDoc function

int main(int argc, char **argv) {
  char *xmlFileName;

  if (argc <= 1) {
    printf("Usage: %s inputfile.xml\n", argv[0]);
    return (1);
  }

  // Get the file name from the argv[1]
  xmlFileName = argv[1];

  // Custom function to parse XML file
  parseDoc (xmlFileName);

  return (0);
}


To compile the above program, use the following command (the libraries are listed after the source file so that the linker can resolve them):
$ gcc -o xmlexample xmlexample.c `xml2-config --cflags --libs`

To run the program, use following command:
$ ./xmlexample data.xml
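
To make the logic of the program easier to see, here is the same "add the hash attribute if it is missing" step sketched in Python. Note the assumptions: it uses the standard xml.etree.ElementTree module instead of libxml, the function name is hypothetical, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

DEFAULT_HASH = "12345678"


def add_missing_hashes(root):
    """Set hash=DEFAULT_HASH on every <url>/<host> that lacks it; return how many were added."""
    added = 0
    for host in root.findall("./url/host"):
        if host.get("hash") is None:      # same test as xmlHasProp() == NULL
            host.set("hash", DEFAULT_HASH)  # same effect as xmlNewProp()
            added += 1
    return added


if __name__ == "__main__":
    # Hypothetical document with one <host> that has no "hash" attribute.
    root = ET.fromstring(
        "<root><url><host>example.org</host><sctxid>7</sctxid></url></root>"
    )
    print("attributes added:", add_missing_hashes(root))
    print(ET.tostring(root, encoding="unicode"))
```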

Monday, February 28, 2011

Mercurial HG HOWTO guide

In this tutorial I will cover the basic commands you need to use Mercurial.
"hg help" is your first friend and the Mercurial Wiki is your second.

Help for a command:
$ hg help <command>
$ hg <command> --help

Commands to Create, Clone Repository
To make a new repository:
$ hg init <path>

To copy a repository from an existing repository:
$ hg clone <sourcePath>  [<DestinationPath>]

To clone a specific branch of the repository:
$ hg clone -r <branchName> <sourcePath> [<destinationPath>]

To copy an existing repository to a new location:
$ hg clone . <newPath>

To get changes from server repository and update working set:
$ hg pull -u

To get changes for specific branch from server repository:
$ hg pull -r <branchName>

To see what changes will come in on a PULL command:
$ hg incoming

To publish changes to specific branch on server repository:
$ hg push -r <branchName>

To see what changes will go out on a PUSH command:
$ hg outgoing

Commands for Add, Remove, Rename, Copy Operation
To add specific files to the repository:
$ hg add <filename1> <filename2> ...

To stop tracking a file in the repository without deleting it from the file system:
$ hg forget <filename1> <filename2> ...

To remove a file from the repository and delete it from the file system as well:
$ hg remove <filename1> <filename2> ...
Add -f to force the removal of files that are modified or newly added.

To add all new files and remove all deleted files from repository:
$ hg addremove

To move or rename files in the repository:
$ hg move <oldfilename> <newfilename>

To copy files in the repository:
$ hg copy <oldfilename> <newfilename>

Commands for Commit, Revert Changes
To commit changes locally and then publish them to the server repository:
$ hg commit
$ hg push

To commit as a particular user:
$ hg commit -u <username>

To revert all uncommitted changes in the working directory:
$ hg revert -a

To revert uncommitted changes to specific files:
$ hg revert <filename1> <filename2> ...

Commands to View Changes
To view changes between working set on your local repository and repository tip:
$ hg diff

To view changes between working set on your local repository and specific revision:
$ hg diff -r <revisionNumber>

To view changes between two revisions:
$ hg diff -r <revisionNumber> -r <revisionNumber>

To check what are changes in working set:
$ hg status

To list all changesets:
$ hg log
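
The output of "hg status" is one line per file: a one-letter status code, a space, and the path. As a small illustration, the Python sketch below groups such output by status. The function name and the sample file names are hypothetical; the status letters are the ones Mercurial prints.

```python
# Status letters printed by "hg status".
STATUS_CODES = {
    "M": "modified",
    "A": "added",
    "R": "removed",
    "C": "clean",
    "!": "missing",
    "?": "not tracked",
    "I": "ignored",
}


def parse_status(output):
    """Group the paths in 'hg status' output by their status."""
    groups = {}
    for line in output.splitlines():
        # Expect "<code> <path>"; skip anything else (e.g. copy-source lines).
        if len(line) < 3 or line[1] != " " or line[0] == " ":
            continue
        code, path = line[0], line[2:]
        groups.setdefault(STATUS_CODES.get(code, code), []).append(path)
    return groups


if __name__ == "__main__":
    # Hypothetical output, not taken from a real repository.
    sample = "M src/main.c\nA docs/new.txt\n? notes.tmp\n! gone.c\n"
    for status, paths in parse_status(sample).items():
        print(status + ":", ", ".join(paths))
```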

Commands to Update Working Set
To change working set to tip:
$ hg pull
$ hg up

To change working set with discarding any current work:
$ hg update -C

To change working set to specific revision:
$ hg update -r <revisionNumber>

To change working set to specific branch:
$ hg update -r <branchName>

To see the list of branches available for merging:
$ hg heads

Commands for Handling tags and Branches
To delete a tag:
$ hg tag --remove <tagtext>

To tag a revision:
$ hg tag [-r <revisionNumber>] <tagtext>

To list tags:
$ hg tags

To create new branch:
$ hg branch <branchName>
$ hg commit -m "New Branch created <branchName>"

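To close a branch (Mercurial does not delete named branches; it marks them as closed):
$ hg update <branchName>
$ hg commit --close-branch -m "Closing branch <branchName>"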

To see the list of branches available:
$ hg branches

For HG Diff command setting in .hgrc file in /home/username folder: 


Commands related to Patch:
Generating a patch:
$ hg diff  >  patchfilename

Discarding all local changes:
$ hg revert -a

Thursday, February 3, 2011

Ubuntu commands

I am an Ubuntu user. I am writing this post to help people like me who forget the commands they have used before.

How to check Installed Ubuntu version and its Codename
lsb_release -a

How to check Disk Space:
$ df -Th

Video editor in Ubuntu:
$ avidemux

To Upgrade Ubuntu Version Online:
Press ALT+F2 , then type "update-manager -d" without quotes and hit Enter key.

Mounting ISO image as a drive on Ubuntu:
$ sudo mount -o loop  ~/Desktop/filename.iso  /media/cdrom0

HOWTO: Move the Minimize/Maximize/Close Buttons back to the Right Side
Press ALT+F2
Type gconf-editor
Go to the following:
apps --> metacity --> general
Find the button_layout parameter, right mouse click, and select Edit Key
Change the value to the following:
Don't forget the colon on the left side of the text. "menu" is not necessary.

To change the screen resolution from the command prompt:
$ xrandr -s 1024x768

How to change the MAC address of a network card (NIC) in Ubuntu:
Type the following commands either in the /etc/rc.local file or at the command prompt (as root):

      ifconfig eth0 down
      ifconfig eth0 hw ether NEW_MAC_ADDR
      ifconfig eth0 up

Alternate way:
  1. Install "macchanger"
  $ sudo apt-get install macchanger
  2. Bring down the interface whose MAC address you want to change
  $ sudo ifconfig eth1 down
  3. Assign a random MAC address (-r) or one of your choice (-m)
  $ sudo macchanger -r eth1
  4. Bring the interface back up
  $ sudo ifconfig eth1 up

To determine a folder's size from the command line:
$ du  -sh  "/path/to/folder"
For example, to check the size of the current directory:
$ du -sh .
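
If you need the same number inside a script, the following Python sketch adds up the sizes of all files under a folder. It is an approximation of "du -sh", not a replacement: it sums file sizes rather than disk blocks, so the result can differ slightly from what du prints. The function names are my own.

```python
import os


def folder_size(path):
    """Return the total size in bytes of all files below path (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def human(nbytes):
    """Format a byte count with a 1024-based unit suffix."""
    for unit in ("B", "K", "M", "G", "T"):
        if nbytes < 1024 or unit == "T":
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024


if __name__ == "__main__":
    print(human(folder_size(".")), ".")
```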

To view or cancel a print job on a Unix printer:
1. Log in as a user or super-user on the Unix computer.
2. Use the "lpq -Pprinter" command to view the printer queue.
    $ lpq -Ppsc011
3. Use "lprm -Pprinter [user_id]" to cancel your job.
    $ lprm -Ppsc011 g0xyzqwe

 Public key error while trying to run update command
$ sudo apt-get update
gives following error :
W: GPG error: lucid Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY D2B5F4E7C3BB95BB

Solution: Take the last 8 characters of the key and run the following commands, replacing <keyserver> with the key server you use:
$ gpg --keyserver <keyserver> --recv-keys C3BB95BB
$ gpg --export --armor C3BB95BB | sudo apt-key add -

To download the key on another computer and then install it on your computer:
$ gpg --keyserver <keyserver> --recv-keys C3BB95BB
$ gpg --export --armor C3BB95BB > key.asc

Now Copy key.asc file on your computer and run following command:
$ cat key.asc | sudo apt-key add -
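
Taking the last 8 characters of the key by hand is easy to get wrong when several keys are missing. A small Python sketch that pulls the short key IDs out of the apt-get output is shown below; the function name is hypothetical and the sample text is the warning quoted above.

```python
import re


def missing_key_ids(apt_output):
    """Return the short ID (last 8 hex digits) of every NO_PUBKEY in the output."""
    ids = []
    for long_id in re.findall(r"NO_PUBKEY\s+([0-9A-Fa-f]{8,40})", apt_output):
        short_id = long_id[-8:].upper()
        if short_id not in ids:
            ids.append(short_id)
    return ids


if __name__ == "__main__":
    warning = (
        "W: GPG error: lucid Release: The following signatures couldn't be "
        "verified because the public key is not available: "
        "NO_PUBKEY D2B5F4E7C3BB95BB"
    )
    print(missing_key_ids(warning))
```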

To reflect the changes done in ~/.bashrc file without restarting your Terminal window:
$ source ~/.bashrc

To get Ubuntu Command prompt:
press : Ctrl + Alt + F2    or
  Ctrl  +  Alt  + F5

and to get back GUI screen (Xserver):
press:  Ctrl + Alt + F7

What to do if ubuntu hangs? Or how to restart Xserver:
press : Ctrl + Alt + Backspace
or press: Alt + Backspace
This usually restarts xserver. You need to login again to the system.

Using Bash History Effectively:
Type the following command to get a list of all related commands with their history numbers:
$ history | grep -i "search string"
Once you've found the command you want, you can execute it specifically by its number
$ !<history_command_number>

How to prevent Auto Locking the computer Screen:
1. Open file "/etc/default/acpi-support" and comment following line:

    # Comment this out to disable screen locking on resume

2.  Open "System > Preferences > Screen Saver" and uncheck following option:
     Lock screen when screensaver is active

How to Merge Multiple PDF files into single PDF file:
1. Install the two packages, Ghostscript and PDFtk.
 $ sudo apt-get install ghostscript pdftk

2. Use the following command to combine multiple files into a single PDF file. The output file name is "singleCombinedPdfFile.pdf". The input files are all PDF files in the current directory, because we used "*.pdf".

$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedPdfFile.pdf -dBATCH *.pdf

If you want to join the PDF files in a specific order, list the file names explicitly:
$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedPdfFile.pdf -dBATCH 1.pdf 2.pdf 3.pdf

Wireless Networks: "Wireless is disabled"
$ rfkill  list all
Both "Soft blocked" and "Hard blocked" should show "no". If either of them is "yes", the connection cannot be enabled. To remove a soft block, type the following:
$ rfkill unblock wifi
If the output shows "Hard blocked: yes", the hardware wireless button is not switched on.

NetworkManager UI doesn't work:
1. sudo emacs /etc/NetworkManager/NetworkManager.conf
2. Set managed=true

How to get HostName from IP address?
 $ nmblookup -A 10.X.Y.Z

How to convert an image into another format from the command line and add a border frame in the color of your choice:
$ gm convert -mattecolor "#697B8F" -frame "6 6" k.jpg icon.png

To count all lines of code, including subdirectories:
$ find . -name '*.c' -exec cat {} + | wc -l
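
Counting lines across a source tree can also be sketched in Python, which makes it easy to change the file suffix or add filters later. The function name is my own; like "wc -l", it counts newline characters.

```python
import os


def count_lines(root, suffix=".c"):
    """Count newline characters in every file under root whose name ends with suffix."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    total += fh.read().count(b"\n")
    return total


if __name__ == "__main__":
    print(count_lines("."))
```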

How to recover/reset forgotten Gnome Keyring Password?
$ rm ~/.gnome2/keyrings/login.keyring

How to find out the maximum RAM capacity and the number of RAM slots available in the Computer?
$ sudo dmidecode -t 16

To see complete memory information, including the info displayed by above command along with currently installed memory information (RAM speed, size, etc.), use:
$ sudo dmidecode -t memory