IE Automation Library and Web Macro Recorder: IHTMLDocument2

Showing posts with label IHTMLDocument2. Show all posts

Sunday, November 08, 2009

Automate HTML File Upload Control - Tutorial (1)

If you ever tried to programatically change the value of a HTML File Upload Control you probably noticed that <input type="file" /> element is read-only! This happens for good security reasons: web pages scripts should not be able to upload random files without user consent. Unfortunatelly the control is read-only for browser extensions and add-ons too (BHO, toolbars, side bars, etc).

For IE6 and IE7 the upload control may be automated by setting the focus to "Internet Explorer_Server" window and generating Win32 keyboard events. For IE8, the upload control is read-only, the user can only set a value by pressing "Browse" button and choosing a file.

With Twebst Automation Studio, automating HTML File Upload Control can't be easier. Here's a short VBScript sample of doing it:


Option Explicit
Dim core
Dim browser
Set core = CreateObject("Twebst.Core")
Set browser = core.StartBrowser("http://www.2shared.com/")

Call browser.FindElement("input file", "id=upField").InputText("C:\somepath\photo1.jpg")

Get Twebst Automation Studio Free Edition!

Thursday, June 19, 2008

IHTMLDocument3::getElementsByTagName and IHTMLElementCollection

A common task when writing Internet Explorer extensions is to browse a collection of objects based on a specified element tag-name. To work with collections IE provides IHTMLElementCollection interface that represents a collection of elements in an HTML document.

Usually a collection is retrieved by calling methods of IHTMLDocument2 interface. For some tag-names there specialized methods to retrieve collection (IHTMLDocument2::get_anchors , IHTMLDocument2::get_applets, IHTMLDocument2::get_forms, IHTMLDocument2::get_images, IHTMLDocument2::get_links, IHTMLDocument2::get_scripts).

To get all elements collection there is IHTMLDocument2::get_all.
One way to get a collection of elements having a specified tag-name is:

// CComQIPt<IHTMLDocument2> spDocument is a document object.

CComQIPtr<IHTMLElementCollection> spAllCollection;
HRESULT hRes = spDocument->get_all(&spAllCollection);
_ASSERTE(SUCCEDED(hRes) && (spAllCollection != NULL));

// Get the sub-collection of elements that have the "input" tag name.
CComVariant varTagName(CComBSTR("input"));
CComQIPtr<IDispatch> spDispCollection;
hRes = spAllCollection->tags(varTagName, &spDispCollection);

CComQIPtr<IHTMLElementCollection> spInputCollection = spDispCollection;
_ASSERTE(spInputCollection != NULL);

// Now you can browse spInputCollection using
// IHTMLElementCollection::item and IHTMLElementCollection::get_length methods.

The second method is:

// CComQIPt spDocument is a document object.
// Query for IHTMLDocument3 interface
CComQIPtr<IHTMLDocument3> spDoc3 = spDocument;
_ASSERTE(spDoc3 != NULL);

// Get the collection of elements that have the "input" tag name.
CComQIPtr<IHTMLElementCollection> spInputCollection;
HRESULT hRes = spDoc3->getElementsByTagName(CComBSTR("input"), &spInputCollection);
_ASSERTE(SUCCEDEED(hRes));

// Now you can browse spInputCollection using
// IHTMLElementCollection::item and IHTMLElementCollection::get_length methods.

Monday, March 31, 2008

.Net and COM interop story

.Net allows programmers to reuse COM components in their managed code. To make this possible a managed wrapper object around the native object is needed. Besides that, one can use the COM object like any other managed object. Even if it sounds simple, you have to be aware of the differences between the CLR's object lifetime management and the COM version of object lifetime management.

COM programmers have to call Release on every interface that has been AddRef'ed. For C# programmers using COM objects that means AddRef is called when:
- a COM object is created.
- a COM object is returned by calling a method or a property.
- a COM object is cast'ed to another COM interface type.

To release a COM object in C# there are two options:
- leave the GC to collect managed wrappers and to call their finalizers that will call Release on native COM object.
- manually call Marshal.ReleaseComObject on every interface used in the code.

Let's see a short example using COM objects exposed by IE. The code bellow changes the color of every link in a HTML document.


// IHTMLDocument2 doc;
foreach (IHTMLElement elem in doc.all)
{
    IHTMLAnchorElement anchor = elem as IHTMLAnchorElement;
    if (anchor != null)
    {
        elem.style.color = "red";
    }
}

This first approach leaves the task of releasing COM objects to garbage collector. Let's manually release COM objects now:

// IHTMLDocument2 doc;
IHTMLElementCollection allCollection = doc.all;
foreach (IHTMLElement crntElem in allCollection)
{
   IHTMLAnchorElement anchor = crntElem as IHTMLAnchorElement;
   if (anchor != null)
   {
       IHTMLStyle style = crntElem.style;
       style.color = "red";

       Marshal.ReleaseComObject(style);
       Marshal.ReleaseComObject(anchor);
   }

   Marshal.ReleaseComObject(crntElem);
}

Marshal.ReleaseComObject(allCollection);

As you can see the number of code lines doubles! I personally prefer to leave the task of releasing COM objects to GC even if they will be eventually released after some time when GC comes into action.

Some might be tempted to call GC.Collect after a large chunk of code that work with COM objects but this could be even worse because other managed objects could be promoted to next GC generation and their lifespan is therefore longer than necessary.

In theory it is possible to create a lot of large COM objects that will exceed the native heap while the managed heap has a lot of available memory because managed wrappers are smaller in size. GC won't be called in this scenario so the native heap won't be freed.

If your application suffers from this kind of memory allocation problem, maybe using COM objects from managed code is not the best approach for you.

Saturday, February 02, 2008

When IHTMLWindow2.document throws UnauthorizedAccessException

This is basically a C# translation of one of my older articles "When IHTMLWindow2::get_document returns E_ACCESSDENIED". Some .Net people encountered difficulties to use it, so I decided to make their life easier.

The main problem is the confusion created by System.IServiceProvider .Net interface because it has the same name as the COM interface. Once this issue is passed the code translation is straightforward. Here's the interop code to declare the COM interface IServiceProvider.

// This is the COM IServiceProvider interface, not System.IServiceProvider .Net interface!
[ComImport(), ComVisible(true), Guid("6D5140C1-7436-11CE-8034-00AA006009FA"),
InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
public interface IServiceProvider
{
   [return: MarshalAs(UnmanagedType.I4)][PreserveSig]
   int QueryService(ref Guid guidService, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppvObject);
}

You find here full source code of the sample assembly.

This technique was successfully implemented and tested in Twebst web automation library.

Monday, November 12, 2007

When IWebBrowser2::get_HWND returns E_FAIL

IWebBrowser2::get_HWND "gets the handle of the Microsoft Internet Explorer main window". As any COM method get_HWND returns a HRESULT value. According to MSDN, the method "returns S_OK if successful, or an error value otherwise".

It was hard for me to imagine how this method could fail but I still got an E_FAIL return value. This happened because the IWebBrowser2 object was not the top level browser. A web page containing frames/iframes is represented by a hierarchy of IHTMLWindow objects. Each window has an associated IHTMLDocument2 object exposed by IHTMLWindow2::get_document. An IHTMLWindow can be also converted to a IWebBrowser2 object. Here's my solution to get the main window handle starting from a non-top level browser object (this is a common scenario when adding your custom menu item in the IE context menu).

// IHTMLWindow2 to IWebBrowser2
CComQIPtr<IWebBrowser2> IHTMLWindow2ToIWebBrowser2(CComQIPtr<IHTMLWindow2> spHTMLWindow)
{
     ATLASSERT(spHTMLWindow != NULL);

     // Query for a service provider.
     CComQIPtr<IWebBrowser2>     spBrowser;
     CComQIPtr<IServiceProvider> spServiceProvider = spHTMLWindow;

     if (spServiceProvider != NULL)
     {
          // Ask the service provider for a IWebBrowser2 object.
          spServiceProvider->QueryService(IID_IWebBrowserApp, IID_IWebBrowser2, (void**)&spBrowser);
     }

     return spBrowser;
}

// IWebBrowser2 to IHTMLWindow2
CComQIPtr<IHTMLWindow2> IWebBrowserToIHTMLWindow(CComQIPtr<IWebBrowser2> spBrowser)
{
     ATLASSERT(spBrowser != NULL);
     CComQIPtr<IHTMLWindow2> spWindow;

     // Get the document of the browser.
     CComQIPtr<IDispatch> spDisp;
     spBrowser->get_Document(&spDisp);

     // Get the window of the document.
     CComQIPtr<IHTMLDocument2> spDoc = spDisp;
     if (spDoc != NULL)
     {
          spDoc->get_parentWindow(&spWindow);
     }

     return spWindow;
}


CComQIPtr<IWebBrowser2> TopBrowser(CComQIPtr<IWebBrowser2> spBrowser)
{
     ATLASSERT(spBrowser != NULL);

     // Retrieve IHTMLWindow2 from browser.
     CComQIPtr<IHTMLWindow2> spHTMLWnd = IWebBrowserToIHTMLWindow(spBrowser);
     if (spHTMLWnd != NULL)
     {
          // Find top window.
          CComQIPtr<IHTMLWindow2> spTopWindow;
          HRESULT hResult = spHTMLWnd->get_top(&spTopWindow);

          if (SUCCEEDED(hResult) && (spTopWindow != NULL))
          {
               // Convert the browser object to window.

               return IHTMLWindow2ToIWebBrowser2(spTopWindow);
          }
     }

     return CComQIPtr<IWebBrowser2>();
}

This technique was successfully implemented and tested in My web automation library.

Wednesday, October 10, 2007

When IHTMLWindow2::get_document returns E_ACCESSDENIED

Internet Explorer extensions usually needs to access HTML elements. When extensions are initialized they get a IWebBrowser2 pointer representing the browser. Starting with this pointer one can get any HTML element in the web page but to do that we need to browse a hierarchy of frames first. The simplest web pages only have one frame and one document. Web pages containing <frame> or <iframe> have a hierarchy of frames, each frame having its own document.

Here are the objects involved and the corresponding interfaces:

browser      - IWebBrowser2
frame/iframe - IHTMLWindow2
document     - IHTMLDocument2
element      - IHTMLElement

The list bellow shows what method to call to get one object from another:

browser      -> document        IWebBrowser2::get_Document
document     -> frame           IHTMLDocument2::get_parentWindow
frame        -> document        IHTMLWindow2::get_document
frame        -> parent frame    IHTMLWindow2::get_parent
frame        -> children frames IHTMLWindow2::get_frames

A normal call chain to get a HTML element is:

browser -> document -> frame -> child frame -> ... -> child frame -> document -> element

This will work almost all the time. The problems arise when different frames contain documents loaded from different internet domains. In this case IHTMLWindow2::get_document returns E_ACCESSDENIED when trying to get the document from the frame object. I think this happens to prevent cross frame scripting atacks.

Here is HtmlWindowToHtmlDocument function I wrote to be used instead IHTMLWindow2::get_document to bypass the restriction:


// Converts a IHTMLWindow2 object to a IHTMLDocument2. Returns NULL in case of failure.
// It takes into account accessing the DOM across frames loaded from different domains.
CComQIPtr<IHTMLDocument2> HtmlWindowToHtmlDocument(CComQIPtr<IHTMLWindow2> spWindow)
{
     ATLASSERT(spWindow != NULL);

     CComQIPtr<IHTMLDocument2> spDocument;
     HRESULT hRes = spWindow->get_document(&spDocument);
    
     if ((S_OK == hRes) && (spDocument != NULL))
     {
          // The html document was properly retrieved.
          return spDocument;
     }

     // hRes could be E_ACCESSDENIED that means a security restriction that
     // prevents scripting across frames that loads documents from different internet domains.
     CComQIPtr<IWebBrowser2>  spBrws = HtmlWindowToHtmlWebBrowser(spWindow);
     if (spBrws == NULL)
     {
          return CComQIPtr<IHTMLDocument2>();
     }

     // Get the document object from the IWebBrowser2 object.
     CComQIPtr<IDispatch> spDisp;
     hRes = spBrws->get_Document(&spDisp);
     spDocument = spDisp;

     return spDocument;
}


// Converts a IHTMLWindow2 object to a IWebBrowser2. Returns NULL in case of failure.
CComQIPtr<IWebBrowser2> HtmlWindowToHtmlWebBrowser(CComQIPtr<IHTMLWindow2> spWindow)
{
     ATLASSERT(spWindow != NULL);

     CComQIPtr<IServiceProvider>  spServiceProvider = spWindow;
     if (spServiceProvider == NULL)
     {
          return CComQIPtr<IWebBrowser2>();
     }

     CComQIPtr<IWebBrowser2> spWebBrws;
     HRESULT hRes = spServiceProvider->QueryService(IID_IWebBrowserApp, IID_IWebBrowser2, (void**)&spWebBrws);
     if (hRes != S_OK)
     {
          return CComQIPtr<IWebBrowser2>();
     }

     return spWebBrws;
}

Here is the C# version of the code: "When IHTMLWindow2.document throws UnauthorizedAccessException".

This technique was successfully implemented and tested in My web automation library.

IE Automation Library and Web Macro Recorder