Saturday, May 31, 2014

Automating WebBrowser with C# and Open Twebst

The .NET WebBrowser class is actually an Internet Explorer control. You can add it to your Windows Forms application to display HTML content.

Using the WebBrowser class from C# code, you have access to the HTML document, but automating the IE control is not as easy as it should be. You have to take into account things like:
  • browser events and timeouts: navigating, document complete
  • an asynchronous programming model: web automation code mixed with event handlers
  • document ready state
  • HTML events - you'll have to fire the right events so web pages will react properly
  • dealing with the DOM and cross-frame security restrictions
  • manually writing web automation code
Open Twebst addresses all these issues with a consistent programming model and an integrated web recorder that assists you in generating web automation code. Basically, things work like this with Open Twebst:
  • attach to the WebBrowser control so you can use Twebst objects
  • optionally, register for the OnCancel event so you can cancel web automation at any point (the current web automation method will throw an error when canceled)
  • generate web automation code with the Open Twebst web recorder
  • use the code in your C#/VB.NET Windows Forms project
Web automation code executes on the UI thread, but Open Twebst methods dispatch Windows messages under the hood, so the application does not freeze. The OnCancel event is raised periodically, so you have a chance to stop the automation code at any point.

Here is a sample that comes with the Open Twebst setup; you can also download it from here:

Sunday, April 27, 2014

Excel web automation in 6 easy steps

This short tutorial is about automating the Internet Explorer browser from an Excel VBA macro using Open Twebst, an open-source IE automation framework.
  • download and install Open Twebst from:
  • start recording your web actions and choose the VBA code generator
  • open Excel, create a workbook and insert a new macro: View/Macros/View Macros => fill out a Macro name + Create
  • copy the VBA code from the Twebst recorder window and paste it into the VBA macro editor
  • add a reference to the Open Twebst library from the Excel VBA macro editor: Tools/References => Open Twebst 1.0 Type Library
  • save the workbook; choose the Excel Macro-Enabled Workbook (*.xlsm) type in the Save dialog box
Here is the whole story in pictures:

Wednesday, March 05, 2014

Web Automation - dealing with navigation timeout in Open Twebst

The most important aspect of web automation is object recognition. In Open Twebst, the FindElement method uses the HTML tag name and attributes to find objects inside web pages. But before the DOM can be accessed, the web page must be loaded. Deciding when the web page is completely loaded is not an exact science; the loading time depends on various things like:

  • the speed of the internet connection
  • the size of the HTML document
  • the size and number of images inside the web document
  • some web pages seem to update and load continuously
  • sometimes the Open Twebst library fails to detect whether the page is completely loaded when the page embeds resources like PDF or Office documents

To address page loading, Open Twebst has three useful properties: core.loadTimeout, core.searchTimeout and core.loadTimeoutIsError. They affect the Find methods and work together like this:

  • wait a maximum of core.loadTimeout milliseconds for the page to be completely loaded. If core.loadTimeout is zero, don't wait; start searching for the element right away.
  • if the page is completely loaded, search for the element for up to core.searchTimeout milliseconds. If the element is not found, null is returned.
  • if the page is not completely loaded and core.loadTimeoutIsError is true (the default value), an exception is thrown. If core.loadTimeoutIsError is false, try to find the element for up to core.searchTimeout milliseconds; null is returned if it is not found.
  • searchTimeout is useful for elements that are created dynamically by page scripts and Ajax after the DOM is created.
Usually we have to wait for the page to load before performing actions on HTML controls; some web pages just don't work if we access the DOM as soon as the controls are available, because not all the content is loaded and the initialization scripts have not yet executed.

However, it is possible to set core.loadTimeout to zero (don't wait for the page to load) and set core.searchTimeout to a larger value, such as 1 minute. This way the element is retrieved as soon as it is loaded.
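
The way the three settings interact can be sketched in a few lines. This is a language-neutral illustration written in Python, not Open Twebst code: find_element, is_loaded, lookup, LoadTimeoutError and the 10 ms polling interval are all hypothetical names and values chosen for the example. It also assumes that a zero load timeout skips the load check entirely, which is how the description above reads.

```python
import time
from typing import Callable, Optional


class LoadTimeoutError(Exception):
    """Hypothetical error: the page was still loading when the load timeout expired."""


def wait_until(condition: Callable[[], bool], timeout_ms: float,
               poll_ms: float = 10) -> bool:
    """Poll condition() until it returns True or timeout_ms has elapsed."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000.0)


def find_element(is_loaded: Callable[[], bool],
                 lookup: Callable[[], Optional[str]],
                 load_timeout_ms: float,
                 search_timeout_ms: float,
                 load_timeout_is_error: bool = True) -> Optional[str]:
    """Model of the load/search timeout rules described in the text."""
    # Step 1: wait for the page to load, unless the load timeout is zero
    # (assumption: zero means the load check is skipped altogether).
    if load_timeout_ms > 0:
        loaded = wait_until(is_loaded, load_timeout_ms)
        if not loaded and load_timeout_is_error:
            raise LoadTimeoutError("page did not finish loading in time")

    # Step 2: keep looking for the element until the search timeout expires.
    found = []

    def try_lookup() -> bool:
        element = lookup()
        if element is not None:
            found.append(element)
            return True
        return False

    wait_until(try_lookup, search_timeout_ms)
    return found[0] if found else None
```

Calling find_element with load_timeout_ms=0 and a generous search_timeout_ms corresponds to the "retrieve the element as soon as it appears" configuration described above: the lookup is retried until it succeeds or the search timeout runs out.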

Other related features: core.isLoading, browser.WaitToLoad.

Sunday, February 23, 2014

Open Twebst has a new home

Open Twebst has a new home; the transition to hosting on GitHub is now complete:

Twebst went open source on February 19, 2012. Two years, ~3700 downloads and 16 forks (on Google Code) later, the version is released: