Kailas Patil

Friday, July 30, 2010

What is Really Simple Syndication (RSS)?

Really Simple Syndication (RSS) is a way to subscribe to a source of information, such as a Web site.

RSS works by having the website author maintain a list of notifications on the website in a standard format. This list of notifications is called an "RSS Feed". Many blog services create RSS Feeds automatically. For other websites, RSS Feeds can be created manually or with software (such as Software Garden, Inc.'s ListGarden).

The RSS Feed is an XML file that lives on a Web server. Once the feed is published there, it waits for RSS Readers to subscribe to it. An RSS Reader periodically reads the RSS Feed file, finds which items are new, converts them to HTML, and displays only those new items.

Monday, May 24, 2010

SpiderMonkey Internals

The JavaScript engine compiles and executes scripts containing JavaScript statements and functions. The engine handles memory allocation for the objects needed to execute scripts, and it cleans up—garbage collects—objects it no longer needs.

The word JavaScript may bring to mind features such as event handlers (like onclick), DOM objects, window.open, and XMLHttpRequest. But in Mozilla, all of these features are actually provided by other components, not by the SpiderMonkey engine itself. SpiderMonkey provides a few core JavaScript data types (numbers, strings, Arrays, Objects, and so on) and a few methods, such as Array.push. It also makes it easy for each application to expose some of its own objects and functions to JavaScript code; browsers, for example, expose DOM objects this way.

SpiderMonkey itself implements built-in JavaScript functions such as eval(), charAt(), escape(), unescape(), encodeURI(), decodeURI(), etc.

JavaScript event handlers (like onclick) are implemented in the content/events/ module in Firefox. The file that handles events is content/events/src/nsEventStateManager.cpp.

window.open is implemented in the dom/ module. In Firefox, the native function behind window.open is nsGlobalWindow::OpenJS(), in dom/src/base/nsGlobalWindow.cpp. Because the window object is not standardized in the DOM specifications, each vendor has its own implementation of it.

XMLHttpRequest is implemented in the content/base/ module in Firefox, in the file content/base/src/nsXMLHttpRequest.cpp.

C/C++ code accesses SpiderMonkey via the JSAPI, by including the header "jsapi.h". The JSAPI provides functions for setting up the JavaScript runtime, compiling and executing scripts, creating and examining JavaScript data structures, handling errors, enabling security checks, and debugging scripts.

In order to run any JavaScript code in SpiderMonkey, an application must have three key elements: a JSRuntime, a JSContext, and a global object.

Runtimes. A JSRuntime, or runtime, is the space in which the JavaScript variables, objects, scripts, and contexts used by your application are allocated. Every JSContext and every object in an application lives within a JSRuntime. They cannot travel to other runtimes or be shared across runtimes. Most applications only need one runtime.

A program typically has only one JSRuntime, even if it has many threads.

Contexts. A JSContext, or context, is like a little machine that can do many things involving JavaScript code and objects. It can compile and execute scripts, get and set object properties, call JavaScript functions, convert JavaScript data from one type to another, create objects, and so on. Almost all JSAPI functions require a JSContext * as the first argument.

Global objects. Lastly, the global object contains all the classes, functions, and variables that are available for JavaScript code to use. Whenever JavaScript code does something like window.open("http://www.mozilla.org/"), it is accessing a global property, in this case window. JSAPI applications have full control over what global properties scripts can see. The application starts out by creating an object and populating it with the standard JavaScript classes, like Array and Object. Then it adds whatever custom classes, functions, and variables (like window) the application wants to provide. Each time the application runs a JS script (using, for example, JS_EvaluateScript), it provides the global object for that script to use. As the script runs, it can create global functions and variables of its own. All of these functions, classes, and variables are stored as properties of the global object.

References:

1. https://developer.mozilla.org/En/SpiderMonkey/JSAPI_User_Guide#JSAPI_basics

2. https://developer.mozilla.org/En/JSRuntime

Wednesday, May 5, 2010

V8 Build Error

Hi,

I downloaded V8 and followed the build instructions given at http://code.google.com/apis/v8/build.html#build.

However, when I ran the command "scons" (without quotes), it showed the following errors on my machine:

cc1plus: error: dereferencing pointer 'dest' does break strict-aliasing rules

src/api.cc:3495: note: initialized from here

scons: *** [obj/release/api.o] Error 1

scons: building terminated because of errors.

This happens because the build compiles with -Werror, which makes the compiler treat warnings (here, a strict-aliasing warning) as errors.

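To solve this problem, open the "SConstruct" file in the V8 source directory and look for "V8_EXTRA_FLAGS".

Then comment out the '-Werror' line (use # to comment a line) in the WARNINGFLAGS list of the gcc / all section, leaving the other flags as they are:

V8_EXTRA_FLAGS = {
  'gcc': {
    'all': {
      'WARNINGFLAGS': [
        ...
        # '-Werror',
        ...
      ]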

After doing this, I was able to build V8. However, I was still not able to embed V8 in a C++ application.

The reason was that I was running 64-bit (x64) Ubuntu, while my V8 build was 32-bit.

I ran the following command to build a 64-bit V8 library:

$ scons mode=debug arch=x64

Here, mode=debug builds the debug version instead of the default release version, and arch=x64 builds the 64-bit version instead of the default 32-bit version.

Sunday, April 11, 2010

Measure CPU Usage Time of a Thread

On Linux there are two functions you can use to measure the CPU time used by a thread in a multithreaded application: getrusage() (with RUSAGE_THREAD) and clock_gettime() (with CLOCK_THREAD_CPUTIME_ID). You can use the same functions to measure the CPU time consumed by the whole process, with RUSAGE_SELF and CLOCK_PROCESS_CPUTIME_ID respectively.

A sample program that measures the CPU time used by a thread is given below:

#define _GNU_SOURCE          /* RUSAGE_THREAD is Linux-specific; define before any #include */
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

int main(void)
{
   int kResult = 0;
   struct rusage start, end;
   struct timeval timeS, timeE;
   double t = 0.0, cpu = 0.0;

   struct timespec st, endt;
   double t1 = 0.0, cpu1 = 0.0;

   // measure CPU time at the start: user + system time of this thread
   kResult = getrusage(RUSAGE_THREAD, &start);
   if (kResult == -1)
      perror("getrusage");

   timeS = start.ru_stime; // system time
   t = (double) timeS.tv_sec + (double) timeS.tv_usec / 1000000.0;
   timeS = start.ru_utime; // user time
   t = t + (double) timeS.tv_sec + (double) timeS.tv_usec / 1000000.0;

   // CPU time of this thread according to clock_gettime, in microseconds
   kResult = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &st);
   if (kResult == -1)
      perror("clock_gettime");

   t1 = (double) st.tv_sec * 1000000.0 + (double) st.tv_nsec / 1000.0;

   // Do some operation to consume CPU (placeholder busy loop)
   volatile double x = 0.0;
   for (long i = 0; i < 100000000L; i++)
      x += (double) i;

   // measure CPU time again
   kResult = getrusage(RUSAGE_THREAD, &end);
   if (kResult == -1)
      perror("getrusage");

   timeE = end.ru_stime; // system time
   cpu = (double) timeE.tv_sec + (double) timeE.tv_usec / 1000000.0;
   timeE = end.ru_utime; // user time
   cpu = cpu + (double) timeE.tv_sec + (double) timeE.tv_usec / 1000000.0;

   cpu = cpu - t; // seconds
   fprintf(stderr, "Total CPU usage time using 'getrusage' = %.6f s\n", cpu);

   kResult = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &endt);
   if (kResult == -1)
      perror("clock_gettime");

   cpu1 = (double) endt.tv_sec * 1000000.0 + (double) endt.tv_nsec / 1000.0;
   cpu1 = (cpu1 - t1) / 1000000.0; // microseconds -> seconds

   fprintf(stderr, "Total CPU usage time using 'clock_gettime' = %.6f s\n", cpu1);
   return 0;
}

Note: with glibc versions older than 2.17, clock_gettime() lives in librt, so linking fails unless you pass -lrt to the linker.

Friday, March 19, 2010

Remove Old Kernels from Ubuntu

To remove old kernels from Ubuntu:

First, check the version of the kernel you are running:
$ uname -r

Do not remove this version.

To remove an old kernel version, use the command:
$ sudo apt-get purge linux-image-2.6.XX-XX-generic

Then remove the headers of that version:
$ sudo apt-get purge linux-headers-2.6.XX-XX

Tuesday, March 16, 2010

Cannot Update Ubuntu

Recently, when I tried to update Ubuntu, I got the following error message:

E: Could not get lock /var/lib/apt/lists/lock - open (11 Resource temporarily unavailable)

E: Unable to lock the list directory

After searching the Ubuntu forums, I found a solution to this problem.

Use the following command to check whether any package manager (such as Synaptic or apt-get) is running:

$ ps -e | grep apt

If so, kill those processes using:

$ sudo kill -9 processID

Then remove the lock file from your system:

$ sudo rm /var/lib/apt/lists/lock

Now try to update your system again. It should work; at least, it worked for me.

DOT Graphs

DOT (files usually named filename.dot) is a plain-text language for describing graphs, including directed graphs.
The dot tool from Graphviz renders a DOT file in formats such as .ps, .pdf, .gif and .png.

For example,

$ dot -Tps src.dot -o dest.ps

$ dot -Tpdf src.dot -o dest.pdf

A DOT file looks as follows:

digraph graphName {
"Node 1" -> "Node 2" ;
"Node 1" -> "Node 2" -> "Node 3";
}

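Statements are terminated by a semicolon (;), which DOT treats as optional, and an arrow (->) denotes a directed edge. A chain such as "Node 1" -> "Node 2" -> "Node 3" creates an edge between each pair of neighbouring nodes.

digraph declares a directed graph, whereas graph declares an undirected graph, whose edges are written with -- instead of ->.

Within a main graph, a subgraph defines a subset of the graph's nodes and edges.
For example,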
digraph GraphName {
"Node 1" -> "Node 2";
subgraph SubGraphName {
   "Node 3" -> "Node 1";
   "Node 4" -> "Node 3";
  }
}

Tuesday, March 2, 2010

JavaScript Injection

JavaScript Injection is a technique that allows you to alter the content of the current web page without leaving it. It is especially useful when you want to spoof the contents that forms send to the server.

Basics of JS Injection:

JS injection means inserting and executing your own script in a page. You can execute a script from the URL bar of the web page you want to alter. To do so, first clear the URL bar completely (no http:// or anything else), then type your code (don't press Enter yet).

JavaScript can be executed from the URL bar using the javascript: URL scheme.

Try the following code in the URL bar of a web page to display a message, then press Enter:

javascript:alert("Hello World!");

If a pop-up window saying Hello World! appears, congratulations: your JS injection test succeeded.

Cookie Editing:

This time we will go one level deeper and try to modify state that the server relies on.

Cookies are one mechanism for representing such state: servers often use cookies to identify a client's session and authorization. Therefore, it is worth learning how to alter cookies using JS injection.

To see the cookies set by a web site, enter the following script in the URL bar:

javascript:alert(document.cookie);

This shows the cookies that are visible to scripts on the page (cookies marked HttpOnly are not). To modify a Key=Value pair, use the following syntax:

javascript:void(document.cookie="Key=Value");

This command either alters an existing Key=Value pair or adds a new one if it doesn't exist. The void operator discards the result of the assignment; without it, the browser would replace the page with the assigned string.

For example, suppose the server set Authorization=no in a cookie and you want to change this Key=Value pair. You can use the script given below:

javascript:void(document.cookie="Authorization=yes");

It is also useful to append alert(document.cookie); to the same line to see what effect your change had.

Form Modifications:

One way to change the values that a form sends to the web server is to save the web page to local disk, change its form field values to whatever you want, and then submit the form to the server from the saved copy.

For example:

The following HTML snippet contains a hidden field that is submitted when the submit button is clicked. Suppose we want to change the email address so that the data meant for the webmaster is emailed to us instead.

<form action="/missions/basic/process.php" method="post">

<input type="hidden" name="to" value="webmaster@mywebsite.com" />

<input type="submit" value="Click to Submit" />

</form>

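First, we save this web page to local disk and then modify it as shown below. Note that the relative action URL has to be made absolute, because the saved copy is no longer loaded from the web site.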

<form action="http://mywebsite.com/missions/basic/process.php" method="post">

<input type="hidden" name="to" value="altered@emailaddress.com" />

<input type="submit" value="Click to Submit" />

</form>

However, some websites check whether the form was actually submitted from their own pages. To get around this, we can edit the form in place using JavaScript injection.

Every form on a web page is stored in the document.forms array as forms[x], where x is the position of the form in the page, counting from top to bottom (a named form can also be accessed by its name). Note that the numbering starts at 0, so the first form on the page is forms[0], the second is forms[1], and so on.

Let's consider our previous form example:

<form action="/missions/basic/process.php" method="post">

<input type="hidden" name="to" value="webmaster@mywebsite.com" />

<input type="submit" value="Click to Submit" />

</form>

Note: since this is the first form on the page, it is forms[0].

To check the value using JS, use the following command:
      javascript:alert(document.forms[0].to.value)

In this case, it would pop up an alert that says "webmaster@mywebsite.com".

Here is how to inject your own email address into it, using the same technique as in the cookie editing section:
   javascript:void(document.forms[0].to.value="altered@emailaddress.com");

This script changes the email address to altered@emailaddress.com. You can use the alert() function again to check your work.

These are the most basic things you need to know about JS injection, and they are useful in many cases.

Tuesday, February 23, 2010

How Do Download Accelerators Work?

In this post I will explain the basic principle behind download accelerators (such as DAP, wxDownload Fast, etc.).

How they speed up downloading

First, let me explain the difference between downloading a file normally and downloading it with a download accelerator. A regular browser creates only one connection to the server to download a file, whereas a download accelerator creates multiple connections, downloads the file in chunks (one per connection), and joins the chunks when the download completes. The number of chunks depends on the accelerator's configuration. I configured wxDownload Fast to create 3 chunks, that is, to open 3 connections to the server for every file.

Let's consider an example.

I used wxDownload Fast as the download accelerator and downloaded the file ymsgr8us.exe (Yahoo Messenger), whose size is 9.9 MB.
The initial request that wxDownload Fast sent to the server was an ordinary request for the file, as shown below:

Hypertext Transfer Protocol
GET /dl/9073e1f8a8d00eb735874cd9d3b6769c/4b824b4c/30%2Fymsgr8us.exe HTTP/1.1\r\n
Request Method: GET
Request Version: HTTP/1.1
HOST: us.download.soft32.com\r\n
User-Agent: wxDownload Fast\r\n
Range: bytes=0-\r\n

From the Content-Length header field of the HTTP response, wxDownload Fast learned the actual size of the file. Once a download accelerator knows the file size, it decides how big each chunk should be based on the number of connections (chunks) it is configured to create. In our case I configured three chunks, so it divides the Content-Length value by 3.

Then it opens another connection to the server and sends another HTTP request. The request made by wxDownload Fast was as follows:

Hypertext Transfer Protocol
GET /download/63-164279-1/ymsgr8us.exe HTTP/1.1\r\n
Request Method: GET
Request URI: /download/63-164279-1/ymsgr8us.exe
Request Version: HTTP/1.1
HOST: www.soft32.com\r\n
User-Agent: wxDownload Fast\r\n
Range: bytes=3474600-\r\n

Note the Range header field in this request. It was:
Range: bytes=3474600-

It instructs the server to return the file from byte offset 3474600 onwards (offsets start at 0). Although the download manager requested the entire file on its first connection, it terminates that connection as soon as it has received the bytes up to 3474599. Hence it does not waste resources downloading the same bytes twice.

Now you can imagine what the third HTTP request looks like. It is given below; observe the Range header field.

Hypertext Transfer Protocol
GET /download/63-164279-1/ymsgr8us.exe HTTP/1.1\r\n
Request Method: GET
Request URI: /download/63-164279-1/ymsgr8us.exe
Request Version: HTTP/1.1
HOST: www.soft32.com\r\n
User-Agent: wxDownload Fast\r\n
Range: bytes=6949200-\r\n

This is the basic principle that download accelerators (such as wxDownload Fast) follow to download a file faster than a normal browser download. It only works when the server supports HTTP range requests.

Friday, February 19, 2010

Heritrix and HTMLUnit

Hi folks, in this post I will explain how to build Heritrix from its source code and how to integrate HTMLUnit into Heritrix.

The first questions that come to mind are: what is Heritrix, and why do we need to integrate HTMLUnit into it?

Well, Heritrix is an open-source web crawler. Heritrix does not include a page-level DOM model or a JavaScript interpreter. Therefore, if you want to crawl the web looking for malicious scripts or obfuscated JS, you need a JS interpreter. This is where HTMLUnit comes into play: HTMLUnit is a headless browser that includes a JS interpreter.

Steps to Build Heritrix:

   1. Download the latest version of the JDK RPM from the Sun website and install it.
   2. Set the JAVA_HOME and PATH environment variables in your .bashrc file (~/.bashrc):
        export JAVA_HOME=/usr/java/jdk1.6.x.x
        export PATH=$JAVA_HOME/bin:$PATH
      Now the JDK is ready to be used by Heritrix and Maven.
   3. We need Maven 1.0.2 to build Heritrix. Note: we need the Heritrix source so that we can modify it later, so do not use the Heritrix binaries available on the Internet; build Heritrix from source. Also note the Maven version: it is very important. Do not try the latest version of Maven, as it may not work.
   4. Download the Maven 1.0.2 binary and extract it somewhere on disk. Then set the MAVEN_HOME environment variable: edit /etc/profile and insert the following lines before the unset i and unset pathmunge commands at the end of the file.
        export MAVEN_HOME=/path_of_Maven_directory
        pathmunge $MAVEN_HOME/bin before
      Now log out and log in again so that these environment variable changes take effect.
   5. Run the maven -v command to check that Maven works.
   6. Run the maven jar command. This creates the /root/.maven/repository directory.
   7. Go into the Heritrix directory and run the command maven dist.
   8. This creates a target subdirectory with many other subdirectories inside it; the target/distribution directory holds the Heritrix build.
      If the build fails because of a missing dependency JAR file, download that file from the Internet and store it in either /root/.maven/cache or /root/.maven/repository/.../jar/.
   9. Heritrix is now built successfully. Extract the build and test it.
  10. Launch Heritrix with the command:
        $ $HERITRIX_HOME/bin/heritrix --admin=LOGIN:PASSWORD
      where $HERITRIX_HOME is the location of your untarred heritrix.?.?.?.tar.gz.

Integrating HTMLUnit into Heritrix:

This is a little bit tricky. If you are at this point, you already have Heritrix, the Sun JDK and Maven.

Follow the steps given below:
Step 1: Download HTMLUnit (I used HTMLUnit 2.5). We don't need the HTMLUnit source code, so download the HTMLUnit binary; we only need its JAR files.

Step 2: Copy all HTMLUnit JAR files into the lib subdirectory of the Heritrix folder. Do not replace files that are already there; if you replace them, you need to modify the project.properties file. Only add the files that are not there yet.

Step 3: Edit the project.xml file in the Heritrix directory, because we need to tell Heritrix where the HTMLUnit classes can be found. Add a dependency tag for each HTMLUnit JAR file.

A sample dependency tag is given below:
   <dependency>
     <id>htmlunit</id>
     <version>2.5</version>
     <url>http://htmlunit.sourceforge.net/</url>
     <properties>
       <war.bundle>true</war.bundle>
       <ear.bundle>true</ear.bundle>
       <ear.bundle.dir>APP-INF/lib</ear.bundle.dir>
       <description>
         Used to handle JS obfuscation. It is a headless browser.
       </description>
       <license>Apache 2.0
         http://www.apache.org/licenses/LICENSE-2.0</license>
     </properties>
   </dependency>
Add such a dependency tag for every HTMLUnit JAR file, changing the id and version to match each JAR.

Step 4: Edit the project.properties file in the Heritrix directory to tell Maven not to download these dependency files from the Internet, but to look in a local directory instead. The syntax for this is easy to find in the existing entries of project.properties; simply follow it.

For example:
   maven.jar.htmlunit = ${basedir}/lib/htmlunit-2.5.jar

Add an entry for each HTMLUnit JAR file (that is, for each dependency added in Step 3).

Step 5: Done. Now build Heritrix again.