URL filtering for UIWebView on the iPhone

iCab Mobile provides a filter manager which lets users filter advertising banners and other unwanted content out of web pages. It keeps a list of simple URL-based filter rules (which the user can even edit), and when a web page contains resources (image files, JavaScript files, stylesheets, etc.) whose URLs match one of these rules, those resources won’t be loaded.

At first glance, though, implementing such filters seems impossible. The public API of the UIWebView class offers nothing that reveals which resources a UIWebView object is loading, and, even worse, nothing that can force it to skip the resources you want to filter out.

But of course there is a solution, otherwise this blog post wouldn’t make much sense ;-).

To implement filters, we don’t look at UIWebView at all: as mentioned above, nothing in its API allows filtering.

To find a hook where we can intercept all the HTTP requests made by UIWebView, we need to know a little about Cocoa’s URL loading system, because UIWebView uses it to fetch all of its data from the web. One part of the URL loading system is the NSURLCache class, and this is the hook we’re looking for. The iPhone OS currently doesn’t cache any data on “disk” (this may change in later iPhone OS releases), so the cache managed by NSURLCache is usually empty. Nevertheless, UIWebView still checks whether each requested resource is in the cache. So all we need to do is subclass NSURLCache and override the method

- (NSCachedURLResponse*)cachedResponseForRequest:(NSURLRequest*)request

This method is called for every resource the UIWebView requests. So we only need to check whether the URL of the request matches one of the filters. If it does, we create a fake response with (almost) no content; otherwise we just call the superclass implementation. That’s basically all there is to it.

Here are some more details:

1. Subclassing NSURLCache:
In the header file there’s almost nothing to do:

FilteredWebCache.h:

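#import <Foundation/Foundation.h>

// NSURLCache subclass that answers requests for blocked URLs itself.
@interface FilteredWebCache : NSURLCache
{
}
@end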

Now the main code for the subclass:

FilteredWebCache.m:

#import "FilteredWebCache.h"
#import "FilterManager.h"

@implementation FilteredWebCache

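// Called by the URL loading system for every resource UIWebView requests.
- (NSCachedURLResponse*)cachedResponseForRequest:(NSURLRequest*)request
{
    NSURL *url = [request URL];
    BOOL blockURL = [[FilterMgr sharedFilterMgr] shouldBlockURL:url];
    if (blockURL) {
        // Build a fake response for the blocked URL. Its body is a single
        // space, because the data must be longer than zero bytes.
        NSURLResponse *response =
              [[NSURLResponse alloc] initWithURL:url
                                        MIMEType:@"text/plain"
                           expectedContentLength:1
                                textEncodingName:nil];

        NSCachedURLResponse *cachedResponse =
              [[NSCachedURLResponse alloc] initWithResponse:response
                             data:[NSData dataWithBytes:" " length:1]];

        // Store the fake response in the cache instead of returning it
        // directly; returning it directly leads to an over-release crash.
        [super storeCachedResponse:cachedResponse forRequest:request];

        // Manual reference counting: balance the two alloc calls above.
        [cachedResponse release];
        [response release];
    }
    // For blocked URLs this should now find the fake response stored above;
    // for all other URLs it is the normal cache lookup.
    return [super cachedResponseForRequest:request];
}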
@end

The code first checks whether the URL should be blocked (the FilterMgr class from FilterManager.h does all these checks; it isn’t shown here). If so, it creates a new response object with (almost) no content and stores it in the cache. One might assume that it would be enough to just return the fake response object without storing it in the cache. But if we do this, the app crashes very soon, because our fake response object is over-released by the iPhone OS. I don’t know exactly why this happens. It might be a bug in the iPhone OS (and also in Mac OS X 10.5.x, where the same thing happens, while it works fine in 10.4.x and all older Mac OS X releases), or it might be caused by undocumented internal dependencies between the classes of the URL loading system. So we simply store our fake response in the cache. This ensures that every response object we return is really stored in the cache, which is what the iPhone OS expects, and then it doesn’t crash.

Update: It seems that the “fake” response must also be initialized with an NSData object whose length is greater than zero. That’s why the code uses a single space character as the response body.
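The FilterMgr class isn’t part of this post, so here is a hypothetical sketch of what a minimal version could look like. It assumes that every rule is a plain string which blocks a resource when it occurs anywhere in the URL, ignoring case. Only +sharedFilterMgr and -shouldBlockURL: come from the code above; the rules property and the matching strategy are my assumptions for illustration, not iCab Mobile’s actual implementation. It uses dispatch_once, which requires iOS 4 or later.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

// Hypothetical sketch of FilterMgr. Only +sharedFilterMgr and -shouldBlockURL:
// appear in the original code; the `rules` property and the substring
// matching are assumptions made for illustration.
@interface FilterMgr : NSObject

// Filter rules as plain strings, e.g. @"/ads/" or @"doubleclick.net".
// Declared atomic (the default) because cachedResponseForRequest: may run
// on a background thread while the user edits the rules.
@property (copy) NSArray *rules;

+ (FilterMgr *)sharedFilterMgr;
- (BOOL)shouldBlockURL:(NSURL *)url;

@end

@implementation FilterMgr

+ (FilterMgr *)sharedFilterMgr
{
    static FilterMgr *sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sharedInstance = [[FilterMgr alloc] init];
    });
    return sharedInstance;
}

- (BOOL)shouldBlockURL:(NSURL *)url
{
    NSString *urlString = [url absoluteString];
    if (urlString == nil) {
        return NO;
    }
    // Take one snapshot of the (immutable, copied) rules array so that a
    // concurrent update can't change what we iterate over.
    NSArray *currentRules = self.rules;
    for (NSString *rule in currentRules) {
        // A rule matches if it occurs anywhere in the URL, ignoring case.
        if ([rule length] > 0 &&
            [urlString rangeOfString:rule
                             options:NSCaseInsensitiveSearch].location != NSNotFound) {
            return YES;
        }
    }
    return NO;
}

@end
```

The app would fill `[FilterMgr sharedFilterMgr].rules` from the user-editable list at launch, and FilteredWebCache can call shouldBlockURL: unchanged. Substring matching is the simplest possible rule format; wildcard or host-based rules would only change the loop body.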

2. Creating a new cache:
We also need to create a new cache and tell the iPhone OS to use it instead of the default one, so that our method really gets called when the URL loading system checks the cache for a resource. This should be done very early in the app’s launch process, before any UIWebView object starts loading web pages.

NSString *path = ...// the path to the cache file
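NSUInteger diskCapacity = 10*1024*1024;   // 10 MB on disk
NSUInteger memoryCapacity = 512*1024;     // 512 KB in memory

// Create the filtering cache and make it the shared cache that the URL
// loading system (and therefore UIWebView) consults.
FilteredWebCache *cache =
      [[FilteredWebCache alloc] initWithMemoryCapacity:memoryCapacity
                                          diskCapacity:diskCapacity
                                              diskPath:path];
[NSURLCache setSharedURLCache:cache];
[cache release];   // the shared cache keeps its own reference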

We have to provide the path where the cache file is stored. NSURLCache creates the cache file automatically, so we don’t need to create it ourselves; we only have to define where it should be saved. This must be somewhere inside our application’s “sandbox”, for example in the “tmp” folder or in the “Documents” folder.
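The setup code leaves the path open. As a hypothetical sketch, assuming the “tmp” folder is acceptable, a small helper could build a location inside the sandbox like this; the function name FilteredWebCachePath and the name “FilteredWebCache” are made up for illustration. Note that Apple’s documentation for -initWithMemoryCapacity:diskCapacity:diskPath: says that on iOS the path is treated as the name of a subdirectory of the app’s default cache directory, so check how the SDK you target interprets it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a location for the cache inside the app's
// sandbox, here an entry named "FilteredWebCache" in the tmp folder.
// NSURLCache creates the file itself; we only choose where it goes.
static NSString *FilteredWebCachePath(void)
{
    return [NSTemporaryDirectory() stringByAppendingPathComponent:@"FilteredWebCache"];
}
```

The result would be passed as the diskPath: argument when creating the FilteredWebCache instance.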

That’s all the magic needed to implement URL-based filters for UIWebView on the iPhone. As you can see, it’s not that complicated.

Note: If the filters can change while your app is running, you may need to remove the fake responses from the cache again. The NSURLCache class provides methods for this (removeCachedResponseForRequest: and removeAllCachedResponses), so this isn’t a problem. If your filters are static, you don’t need to care about this.
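Here is a hypothetical sketch of such a cleanup step. The function name and the idea of passing in the URLs that were blocked before the rules changed are my assumptions, not part of the original code. It also assumes that a fake entry can be found again with a plain NSURLRequest for the same URL; requests that differ in other respects might not match the stored entry.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: call it after the filter rules have changed so that
// stale fake responses are no longer served from the cache.
// If `urls` is nil, the whole cache is cleared; otherwise only the entries
// for the given URLs are removed.
static void RemoveFakeResponses(NSURLCache *cache, NSArray *urls)
{
    if (urls == nil) {
        // Simplest option: drop everything, including any real cached data.
        [cache removeAllCachedResponses];
        return;
    }
    for (NSURL *url in urls) {
        // Assumes the fake entry is found again via a plain request for the URL.
        NSURLRequest *request = [NSURLRequest requestWithURL:url];
        [cache removeCachedResponseForRequest:request];
    }
}
```

Keeping track of which URLs were blocked (for example, by collecting them in cachedResponseForRequest:) is left to the app. If that bookkeeping isn’t worth it, passing nil simply empties the cache, which on the iPhone mostly contains in-memory data anyway.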