Monday, December 21, 2009

How to automate existing instances of Internet Explorer - Tutorial (4)

As explained in a previous tutorial, a web macro can also start by connecting to an already running instance of the Internet Explorer browser. Usually a web macro opens a new Internet Explorer window and navigates to a starting URL, but sometimes automating an existing IE window is the more natural choice. Let's start with a simple JScript/WSH web macro:
// Create core object.
var core = new ActiveXObject("Twebst.Core");

// Find a browser for which the displayed URL contains "yahoo".
core.useRegExp = true;
var b1 = core.FindBrowser("url=.*yahoo.*");

// Find a browser for which the title is "Google" (exact match).
core.useRegExp = false;
var b2 = core.FindBrowser("title=Google");


The FindBrowser method searches for a browser page by its title and/or URL. If core.useRegExp is false, the search uses exact matching; if core.useRegExp is true, the search value is treated as a regular expression. Once connected to an IE browser instance, you get a Browser object to work with.
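To make the difference between the two search modes concrete, here is a minimal C++ sketch of how a single title= or url= condition could be evaluated against a page. It is only an illustration, not Twebst's implementation: PageInfo and MatchesCondition are hypothetical names, and I assume that in regular-expression mode the pattern has to match the whole title or URL (which would explain the leading and trailing .* in the pattern above).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical model of a browser page: only the two fields a
// FindBrowser-style search looks at.
struct PageInfo {
    std::string title;
    std::string url;
};

// Evaluates one "attribute=value" condition such as "url=.*yahoo.*" or
// "title=Google" against a page.
// useRegExp == false: the attribute must equal the value exactly.
// useRegExp == true : the value is a regular expression that must match the
//                     whole attribute (std::regex_match), an assumption.
bool MatchesCondition(const PageInfo& page, const std::string& condition, bool useRegExp)
{
    const std::string::size_type eq = condition.find('=');
    if (eq == std::string::npos) {
        return false;  // Malformed condition: no "=" separator.
    }

    const std::string name  = condition.substr(0, eq);
    const std::string value = condition.substr(eq + 1);

    std::string actual;
    if (name == "title") {
        actual = page.title;
    } else if (name == "url") {
        actual = page.url;
    } else {
        return false;  // Only title and url are modelled in this sketch.
    }

    return useRegExp ? std::regex_match(actual, std::regex(value)) : actual == value;
}
```

Note that with useRegExp set to false, "url=.*yahoo.*" would only match a URL that literally equals ".*yahoo.*".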

This works with IE6, IE7 and IE8 on Windows XP, Vista and Windows 7 (32- and 64-bit), and it's compatible with Internet Explorer Protected Mode.

What can you do with a Browser object? Quite a lot: Navigate to a URL, FindElement and FindFrame inside it, Close the browser, get information about the browser, automate modal and modeless HTML dialogs, and wait for navigation and loading to complete.

Enough talk! It's time to download Twebst Automation Studio and see for yourself.


Monday, December 14, 2009

IE macro - How to get programmatic control over HTML elements - Tutorial (3)

Finding elements inside HTML frames/iframes hierarchy with Twebst Web Automation Library

Once you have started a web macro and have a Browser object to work with, it's time to get programmatic access to the HTML controls inside the web page. As any web developer knows, you can easily find elements inside a document using document.getElementById.

In the world of web automation things are a bit different: web macros need to access HTML elements in any web document and at any level of the DOM hierarchy. The target HTML element can be inside frames/iframes loaded from various domains and subject to cross-frame scripting security restrictions (see my older posts: "When IHTMLWindow2::get_document returns E_ACCESSDENIED" and "When IHTMLWindow2.document throws UnauthorizedAccessException").

Here's a simple VBScript web macro that automates a Google search in Internet Explorer:
Dim core
Dim browser
Set core = CreateObject("Twebst.Core")
Set browser = core.StartBrowser("")

Call browser.FindElement("input text", "name=q").InputText("codecentrix")
Call browser.FindElement("input submit", "name=btnG").Click()

What is so great about the code above is the way the FindElement method works: first it waits for the browser to load the HTML document, then it searches for the element through the whole frame/iframe hierarchy.
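Conceptually, searching through the frame hierarchy is a depth-first walk over nested documents. The C++ sketch below shows the idea on a simplified in-memory model. Element, Document and FindElementInFrames are hypothetical names, and the sketch deliberately ignores the two hard parts a real library has to handle: waiting for each document to finish loading and getting past cross-frame security restrictions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of a page: every document (top-level page,
// frame or iframe) holds its own elements plus its child frames.
struct Element {
    std::string tag;                           // e.g. "input text"
    std::map<std::string, std::string> attrs;  // e.g. {"name", "q"}
};

struct Document {
    std::vector<Element> elements;
    std::vector<Document> frames;              // nested frames/iframes (C++17)
};

// Depth-first search: look in the current document first, then recurse into
// each child frame in order. Returns the first element whose tag matches and
// whose attribute attrName equals attrValue, or nullptr if there is none.
const Element* FindElementInFrames(const Document& doc, const std::string& tag,
                                   const std::string& attrName, const std::string& attrValue)
{
    for (const Element& e : doc.elements) {
        const auto it = e.attrs.find(attrName);
        if (e.tag == tag && it != e.attrs.end() && it->second == attrValue) {
            return &e;
        }
    }
    for (const Document& frame : doc.frames) {
        if (const Element* found = FindElementInFrames(frame, tag, attrName, attrValue)) {
            return found;
        }
    }
    return nullptr;
}
```

Searching the top document before its frames means an element in the main page wins over an identical one nested in an iframe.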

Once an Element is found you can perform actions on it like Click, RightClick, InputText, Check, Uncheck, Select, Highlight, etc. Even better, you don't have to write these statements yourself: they are generated automatically by Twebst Web Recorder, which you can get for free:

Get Free Web Recorder for Internet Explorer


Wednesday, December 09, 2009

IE8 automation: How to programmatically open a URL - Tutorial (2)

Get started with web macros in IE

Every web macro has to start somewhere and somehow. There are basically two ways to begin automating the Internet Explorer browser:
  • start a new IE browser and navigate to a URL to begin with

  • connect to an existing instance of Internet Explorer browser and continue automation
In this short tutorial I'll show you how to open a new Internet Explorer browser and load a URL with Twebst Automation Studio. Let's start with a short JScript web macro:
var core    = new ActiveXObject('Twebst.Core');
var browser = core.StartBrowser('');
The code is quite self-explanatory. It creates a Twebst core object and then opens a given URL in a new IE browser instance. What you get back from the StartBrowser call is a Browser object that can be used to further automate Internet Explorer.

What can you do with a Browser object? Quite a lot: navigate to a URL, find HTML elements and frames inside it, close the browser, get information about the browser, automate modal and modeless HTML dialogs, and wait for navigation and loading to complete.

That's all for now, but these features will be covered in the next tutorials!

(Automation Library and Macro Recorder for Internet Explorer)

Sunday, November 08, 2009

Automate HTML File Upload Control - Tutorial (1)

If you have ever tried to programmatically change the value of an HTML File Upload Control, you probably noticed that the <input type="file" /> element is read-only! This happens for good security reasons: web page scripts should not be able to upload arbitrary files without the user's consent. Unfortunately, the control is read-only for browser extensions and add-ons too (BHOs, toolbars, sidebars, etc.).

In IE6 and IE7 the upload control can be automated by setting the focus to the "Internet Explorer_Server" window and generating Win32 keyboard events. In IE8 the upload control is read-only: the user can only set a value by pressing the "Browse" button and choosing a file.

With Twebst Automation Studio, automating the HTML File Upload Control couldn't be easier. Here's a short VBScript sample:

Option Explicit
Dim core
Dim browser
Set core = CreateObject("Twebst.Core")
Set browser = core.StartBrowser("")

Call browser.FindElement("input file", "id=upField").InputText("C:\somepath\photo1.jpg")

Tuesday, November 03, 2009

New Web Macro Recorder released!

After several months of complete radio silence, things have moved forward for Twebst Library, which has turned into a more powerful and mature product.
Please welcome Twebst Web Automation Studio!

Twebst Automation Studio is an advanced web automation framework for Internet Explorer that can be used within any environment that supports COM, from scripting languages (JScript, VBScript) to high level programming languages (C#, VB.Net, C++).

The framework includes two components: Twebst Web Recorder, which automatically generates most of the automation code, and Twebst Library, a collection of programmable objects that make up a web automation API.

But a picture is worth a thousand words...

Web Macro Recorder in Action

Twebst Web Recorder features:

  • Easily create web macros through an intuitive graphical interface.
  • Generate web automation code using the language of your choice: JScript, VBScript, C#, VB.Net, C++
  • Record web actions on all HTML controls (button, combo-box, list-box, edit-box)
And one more thing: FREE version available!

Tuesday, May 26, 2009

Web Data Extraction with C++ Web Macro

Web data extraction, or web scraping, can be implemented in various ways. Today I will use Twebst Web Automation Library to extract search results from Google using DOM parsing and Internet Explorer automation (you need to install Twebst Library first).

Here are the steps the C++ web macro will perform to extract results from a Google search:
  • Open an Internet Explorer browser and navigate to Google site.
  • Find the search edit box and type in the search term.
  • Find the submit button and click it.
  • Wait until the page is loaded and find a DIV with id=res
  • Find the collection of all H3 elements inside the DIV element.
  • Extract the text and URL of each result and display them.
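Steps 4 to 6 boil down to a DOM traversal. Here is a rough C++ sketch of that traversal on a plain in-memory tree, just to show its shape. Node, FindById, FindAllByTag and ExtractResults are hypothetical helpers, and the tree stands in for the live Internet Explorer DOM that the real macro works on through Twebst.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal DOM node: tag name, attributes, own text and children.
struct Node {
    std::string tag;
    std::map<std::string, std::string> attrs;
    std::string text;
    std::vector<Node> children;  // allowed for incomplete types since C++17
};

// Returns the first node (depth-first, including root) whose id equals id.
const Node* FindById(const Node& root, const std::string& id)
{
    const auto it = root.attrs.find("id");
    if (it != root.attrs.end() && it->second == id) return &root;
    for (const Node& child : root.children)
        if (const Node* found = FindById(child, id)) return found;
    return nullptr;
}

// Appends every descendant of root with the given tag, in document order.
void FindAllByTag(const Node& root, const std::string& tag, std::vector<const Node*>& out)
{
    for (const Node& child : root.children) {
        if (child.tag == tag) out.push_back(&child);
        FindAllByTag(child, tag, out);
    }
}

// Steps 4-6: locate the DIV with id=res, take every H3 inside it and pair the
// H3 text with the href of the first anchor inside that H3.
std::vector<std::pair<std::string, std::string>> ExtractResults(const Node& page)
{
    std::vector<std::pair<std::string, std::string>> results;
    const Node* resultDiv = FindById(page, "res");
    if (!resultDiv) return results;

    std::vector<const Node*> headers;
    FindAllByTag(*resultDiv, "h3", headers);
    for (const Node* h3 : headers) {
        std::vector<const Node*> anchors;
        FindAllByTag(*h3, "a", anchors);
        std::string url;
        if (!anchors.empty()) {
            const auto href = anchors.front()->attrs.find("href");
            if (href != anchors.front()->attrs.end()) url = href->second;
        }
        results.emplace_back(h3->text, url);
    }
    return results;
}
```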

Enough talk! Let the code speak for itself.

#include <atlbase.h>   // CComBSTR, CComQIPtr
#include <mshtml.h>    // IHTMLAnchorElement
#include <iostream>
using namespace std;

// Assumes the Twebst type library has been #import-ed, that pCore is an
// initialized Twebst core object and that SearchCondition is the helper from
// the full sample that builds a search-condition argument.

// Start a new Internet Explorer instance and navigate to a given URL.
IBrowserPtr pBrowser = pCore->StartBrowser("");

// Find search edit box in page and type some text into it.
IElementPtr pSearchEdit = pBrowser->FindElement("input text", SearchCondition("name=q"));
pSearchEdit->InputText("codecentrix");

// Find search button and click it.
IElementPtr pSearchBtn = pBrowser->FindElement("input submit", SearchCondition("text=Google Search"));
pSearchBtn->Click();

// Find the DIV element where the results are displayed.
IElementPtr pResultDiv = pBrowser->FindElement("div", SearchCondition("id=res"));

// Get all found results and print them in console.
IElementListPtr pResultList = pResultDiv->FindAllElements("h3", SearchCondition());

// Display only the header of each result (text and URL).
for (int i = 0; i < pResultList->length; ++i)
{
    // Get current H3 in the list.
    IElementPtr pCrntResult = pResultList->Getitem(i);

    // Find first and only anchor inside H3.
    IElementPtr pCrntAnchor = pCrntResult->FindElement("a", SearchCondition());
    CComQIPtr<IHTMLAnchorElement> spCrntAnchor = pCrntAnchor->nativeElement;

    // Get URL from IHTMLAnchorElement (bstrURL must be empty before get_href).
    CComBSTR bstrURL;
    if (spCrntAnchor)
    {
        spCrntAnchor->get_href(&bstrURL);
    }

    // Display results.
    wcout << static_cast<const wchar_t*>(pCrntResult->text) << L"\n"
          << (bstrURL.m_str ? bstrURL.m_str : L"") << L"\n\n";
}


Tuesday, May 19, 2009

IE Web Login Automation

One highly repetitive web task is logging on to a web site. This is a common scenario where Twebst Web Automation Library really shines. Here is a short JScript web macro that automatically logs you on to the Yahoo Mail site. All you have to do is replace "UUUUUUUUUU" and "PPPPPPPPPP" with your user name and password in the code below.

// Open a browser and navigate to yahoo mail login page.
var core = new ActiveXObject("Twebst.Core");
var browser = core.StartBrowser("");

// Find login fields.
var u = browser.FindElement("input text");
var p = browser.FindElement("input password");
var s = browser.FindElement("input submit");

// Log on to the site by filling in the user name and password fields and then clicking the submit button.
u.InputText("UUUUUUUUUU");
p.InputText("PPPPPPPPPP");
s.Click();

FindElement searches the whole frame/iframe hierarchy for the first input element of type text/password/submit. Additional conditions can be specified for the search (like searching for an element by id/name or any other HTML attribute). Search conditions can use regular expressions if needed.
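The way several conditions could combine is sketched below in C++. I assume here that all conditions must hold at the same time (a logical AND) and that, in regular-expression mode, each value must match the whole attribute. MatchesAll and Attributes are hypothetical names, and the sketch says nothing about how Twebst actually receives multiple conditions.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical view of an HTML element: attribute name -> attribute value.
using Attributes = std::map<std::string, std::string>;

// Returns true when the element satisfies *every* "name=value" condition in
// the list (logical AND, an assumption). A missing attribute never matches.
// With useRegExp the value is a regular expression that must match the whole
// attribute value.
bool MatchesAll(const Attributes& attrs, const std::vector<std::string>& conditions, bool useRegExp)
{
    for (const std::string& condition : conditions) {
        const auto eq = condition.find('=');
        if (eq == std::string::npos) return false;  // Malformed condition.

        const auto it = attrs.find(condition.substr(0, eq));
        if (it == attrs.end()) return false;

        const std::string pattern = condition.substr(eq + 1);
        const bool ok = useRegExp ? std::regex_match(it->second, std::regex(pattern))
                                  : it->second == pattern;
        if (!ok) return false;
    }
    return true;  // An empty condition list matches any element.
}
```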

Another important point is that the FindElement method waits for the web page to be completely loaded before searching for the element (the timeout can be specified with the core.loadTimeout property). Read more about Twebst Library...
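In its simplest form, waiting for a page with a timeout is a loop that checks a condition until it holds or time runs out. The sketch below is a generic polling helper, WaitUntil (a hypothetical name). The isLoaded callback stands in for whatever "document complete" check the library uses internally, which may well be event-driven rather than polling.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls isLoaded every pollInterval until it returns true or timeout elapses.
// Returns true if the condition became true in time, false on timeout.
// The condition is always checked at least once, even with a zero timeout.
bool WaitUntil(const std::function<bool()>& isLoaded,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds pollInterval = std::chrono::milliseconds(50))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (isLoaded()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(pollInterval);
    }
}
```

Using steady_clock rather than the system clock keeps the timeout correct even if the wall-clock time is adjusted while the macro is waiting.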


Monday, May 18, 2009

Twebst Web Automation Library v1.40 released

Twebst version 1.40 is launched!
Main changes include IE8 compatibility, better support for working with the embedded IE browser control, support for modal and modeless HTML dialogs, and functions for clipboard access.

Here is the list of new features and enhancements:
- NEW: IE8 is now supported
- ENH: core.AttachToNative* methods work now with hosted IE browser control
- BUG: various fixes
- NEW: core.foregroundBrowser property
- NEW: core.productName property
- NEW: core.productVersion property
- NEW: core.GetClipboardText method
- NEW: core.SetClipboardText method
- NEW: core.AttachToWnd method
- NEW: core.NativeWindowToNativeBrowser method
- NEW: core.NativeWindowToNativeDocument method
- NEW: browser.FindModalHtmlDialog method
- NEW: browser.FindModelessHtmlDialog method
- NEW: element.GetAttribute method
- NEW: element.SetAttribute method
- NEW: element.RemoveAttribute method
- NEW: element.tagName property
- NEW: element.FindParentElement method
- NEW: core.RightClick method
- Find more ...

Free Download Twebst Library 1.40

Wednesday, April 29, 2009

Homemade Handcrafted Help System

Good documentation is very important for any serious project. Sooner or later, programmers find themselves working on the help system. That is what happened to me during the later stages of the Twebst Web Automation Library project. Even though this project is not open source, I try to make as many parts of it public as possible. Today I will present the Twebst Help System and how I automated the tedious and annoying task of building it.

Twebst is a library of COM objects used to automate the Internet Explorer browser. The objects, with their properties and methods, have to be documented. The page structure is the same for every object/method/property, and each page also contains code samples that need syntax highlighting. This is good news, because it leaves a lot of room for templates and automation.

Here is the solution:
  • The template is an XML document. When documenting an object/method or property the focus is on the content rather than on formatting the text. There is one XML file for each object/method/property.
  • A WSH script written in JScript parses the XML document and adds syntax highlighting to the sample code in the documentation page. Regular expressions are used for parsing.
  • Cross references are added automatically by the same script.
  • Then an XSL transformation is applied to convert the XML source to an HTML document that is eventually written to disk.
  • The whole process is optimized by skipping unnecessary operations, such as generating the HTML when it already exists and is newer than its XML source.
  • Finally, the HTML documents refer to a CSS style sheet so the look can be changed easily.
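The "only regenerate when the XML is newer" optimization is essentially what make does: compare modification times. Here is a minimal C++17 sketch of that check. NeedsRebuild is a hypothetical helper; the actual build script is written in JScript and may do this differently.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Returns true when the HTML page must be regenerated from its XML source:
// either the HTML file does not exist yet, or the XML source was modified
// after the HTML was last written.
bool NeedsRebuild(const fs::path& xmlSource, const fs::path& htmlOutput)
{
    if (!fs::exists(htmlOutput)) return true;
    return fs::last_write_time(xmlSource) > fs::last_write_time(htmlOutput);
}
```

Running the check per XML file keeps a rebuild after a small edit limited to the pages that actually changed.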

It goes like this:
XML + JScript -> XML with syntax coloring and cross references + XSL -> HTML + CSS -> CHM
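The first arrow in that chain, adding syntax coloring with regular expressions, can be illustrated with a short C++ sketch. The keyword list and the kw CSS class are made up for the example, and the real script does this in JScript over the XML source. The sketch assumes the code sample has already been HTML-escaped.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Wraps JScript keywords in a <span class="kw"> element so that a CSS style
// sheet can colour them. The \b word boundaries ensure only whole words are
// matched ("variable" is not highlighted because it starts with "var").
std::string HighlightKeywords(const std::string& code)
{
    static const std::regex keywords(R"(\b(var|new|function|return|if|else|for|while)\b)");
    return std::regex_replace(code, keywords, "<span class=\"kw\">$1</span>");
}
```

Keeping the colours in the style sheet rather than in the generated markup fits the last step of the list above: the look can change without regenerating the pages.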

For local help, the CHM compiler is invoked as a final step and a CHM help file is generated. All you have to do is run the Build.js script you will find in the archive below.


Prerequisites: In order to build the CHM file you'll need HTML Help Workshop from Microsoft.

Wednesday, March 18, 2009

SQL CE 3.5 SP1 and MSVCR80

I have been involved in the development of Web Replay (the best password manager on the market) since its inception. Recently, we upgraded its SQL CE engine to the latest version, 3.5 SP1. Web Replay is built against Visual C++ runtime version 9.0, which is deployed by including the Microsoft Visual C++ 2008 Redistributable Package (x86) in the setup. But that was not enough, because Microsoft SQL CE 3.5 SP1 depends on Visual C++ runtime version 8.0, which is NOT included in that redistributable package!

If you run into this problem, you have no option but to include the Visual C++ 2005 redistributable in your setup as well. You might think that skipping the upgrade and staying with SQL CE 3.1 is an option. Well, it isn't! SQL CE 3.1 crashes randomly on Windows XP SP3 (even Microsoft SQL Server Management Studio reports a memory corruption error before freezing completely).

Friday, February 20, 2009

RegisterHotKey in C#

I like the .NET Framework for its huge class library covering almost anything you'll ever need. One thing I could not find was a way to register a Windows hot key in C#: the framework does not seem to provide system-wide hot keys. So I had to use the good old RegisterHotKey Win32 function.

To call it from C#, use P/Invoke to declare the unmanaged function exported by user32.dll:

using System;
using System.Runtime.InteropServices;

[DllImport("user32.dll", SetLastError = true)]
private static extern bool RegisterHotKey(IntPtr hWnd, int id, int fsModifiers, int vk);

The hWnd parameter is the handle of the form that will receive the WM_HOTKEY message; a Form's handle is available through its Handle property. To process the WM_HOTKEY message, your form needs to override its WndProc method.

Just one more thing you should know: RegisterHotKey and the ShowInTaskbar property don't mix well. Changing ShowInTaskbar re-creates the form's window, so your WndProc method stops receiving hot keys. You need to call RegisterHotKey again, because the Handle property returns a new window handle after ShowInTaskbar has been changed.

Monday, January 05, 2009

WSH and clipboard access

I did some Windows Script Host programming recently and was pleasantly surprised by its power, features and flexibility. One thing I couldn't accomplish was accessing the clipboard from WSH. Searching the Internet, I found some solutions, like this one based on Internet Explorer automation. There are several problems with this approach, as you can read in my article about Internet Explorer automation: What's wrong with Internet Explorer Automation?

My solution for scripting the clipboard content in WSH is a regular COM object created with VC++ and ATL.

Download full source code and compiled DLL:
To install the COM object run register.bat

I found scripting the clipboard useful enough to add this feature to the next release of Twebst Web Automation Library.