

URL filtering for UIWebView on the iPhone

iCab Mobile provides a filter manager that lets you filter out advertising banners and other unwanted content from web pages. It keeps a list of simple URL-based filter rules (which the user can even edit), and when a web page contains resources (image files, JavaScript files, stylesheets etc.) whose URLs match one of these rules, those resources won’t be loaded.

At first, implementing such filters seems to be impossible. When you look at the public API of the UIWebView class, you won’t see anything that would let you find out which resources the UIWebView object is loading, and, even worse, nothing that would let you force the UIWebView not to load the resources you want to filter out.

But of course there is a solution, otherwise this blog post wouldn’t make much sense ;-).

To implement filters we don’t have to look at UIWebView at all. As I mentioned above, nothing in the UIWebView API would allow us to implement filtering.

To find a hook where we can intercept all the HTTP requests made by UIWebView, we have to know a little bit about the URL loading system of Cocoa, because UIWebView uses the URL loading system to get all of its data from the web. One part of the URL loading system is the NSURLCache class, and this is the hook we’re looking for. Though the iPhone OS doesn’t cache any data on “disk” at the moment (this may change in later iPhone OS releases), and the cache managed by the NSURLCache class is therefore usually empty, UIWebView nevertheless checks whether the requested resources are in the cache. So all we need to do is subclass NSURLCache and override the method

- (NSCachedURLResponse*)cachedResponseForRequest:(NSURLRequest*)request

This method is called for all resources the UIWebView requests. So all we need to do is check whether the URL of the request matches one of the filters. If it does, we create a fake response with no real content; otherwise we just call the superclass. This is basically all we need to do.

Here are some more details:

1. Subclassing NSURLCache:
In the header file there’s almost nothing to do:

FilteredWebCache.h:

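#import <Foundation/Foundation.h>

// All of the work happens in the implementation; no extra ivars are needed.
@interface FilteredWebCache : NSURLCache
{
}
@end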

Now the main code for the subclass:

FilteredWebCache.m:

#import "FilteredWebCache.h"
#import "FilterManager.h"

@implementation FilteredWebCache

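- (NSCachedURLResponse*)cachedResponseForRequest:(NSURLRequest*)request
{
    // Ask the filter manager whether this URL matches one of the filter rules.
    NSURL *url = [request URL];
    BOOL blockURL = [[FilterMgr sharedFilterMgr] shouldBlockURL:url];
    if (blockURL) {
        // Build a fake one-byte response to stand in for the blocked resource.
        NSURLResponse *response =
              [[NSURLResponse alloc] initWithURL:url
                                        MIMEType:@"text/plain"
                           expectedContentLength:1
                                textEncodingName:nil];

        // The data object must not be empty, otherwise the app crashes.
        NSCachedURLResponse *cachedResponse =
              [[NSCachedURLResponse alloc] initWithResponse:response
                             data:[NSData dataWithBytes:" " length:1]];

        // Store the fake response instead of returning it directly;
        // returning an unstored response leads to a crash.
        [super storeCachedResponse:cachedResponse forRequest:request];

        [cachedResponse release];
        [response release];
    }
    // For blocked URLs this returns the fake response stored above;
    // for everything else it is the normal cache lookup.
    return [super cachedResponseForRequest:request];
}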
@end

The code first checks whether the URL should be blocked (the FilterMgr class imported via FilterManager.h does all these checks; it isn’t shown here). If so, it creates a new response object with no real content and stores it in the cache. One might assume that we could just return the fake response object without storing it in the cache. But if we do this, the app crashes very soon because our fake response object is over-released by the iPhone OS. I don’t know exactly why this happens. It might be a bug in the iPhone OS (and also in Mac OS X 10.5.x, where the same thing happens; it works fine in 10.4.x and all older Mac OS X releases), or it might be caused by some undocumented internal dependencies between the different classes of the URL loading system. So we just store our fake response in the cache. This makes sure that every response object we return is really stored in the cache, which is what the iPhone OS expects, and then it won’t crash.

Update: It seems that it is also necessary to initialize the “fake” response with an NSData object whose length is greater than zero.
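The FilterMgr class itself is not shown in this post. Purely as an illustration, a minimal stand-in could look something like the sketch below. Everything in it is an assumption and not the code used in iCab Mobile: the rule strings are made up, and the matching is a plain case-insensitive substring test. Only the names sharedFilterMgr and shouldBlockURL: are taken from the code above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the FilterMgr class used by FilteredWebCache.
@interface FilterMgr : NSObject
{
    NSArray *rules;   // substrings; a URL containing any of them is blocked
}
+ (FilterMgr *)sharedFilterMgr;
- (BOOL)shouldBlockURL:(NSURL *)url;
@end

@implementation FilterMgr

+ (FilterMgr *)sharedFilterMgr
{
    // The cache may be consulted from several threads, so guard the creation.
    static FilterMgr *shared = nil;
    @synchronized (self) {
        if (shared == nil) {
            shared = [[FilterMgr alloc] init];
        }
    }
    return shared;
}

- (id)init
{
    self = [super init];
    if (self != nil) {
        // Made-up example rules; a real app would load the user's rule list.
        rules = [[NSArray alloc] initWithObjects:
                    @"/ads/", @"ads.example.com", nil];
    }
    return self;
}

- (BOOL)shouldBlockURL:(NSURL *)url
{
    NSString *urlString = [url absoluteString];
    if (urlString == nil) {
        return NO;
    }
    for (NSString *rule in rules) {
        NSRange match = [urlString rangeOfString:rule
                                         options:NSCaseInsensitiveSearch];
        if (match.location != NSNotFound) {
            return YES;   // the URL contains a blocked substring
        }
    }
    return NO;
}

@end
```

Because this method is called for every single resource of every page, a real implementation should keep the check as cheap as possible. The shared instance here is meant to live for the whole lifetime of the app, so the sketch has no dealloc method.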

2. Creating a new Cache:
We also need to create a new cache and tell the iPhone OS to use it instead of the default one, so that we really get called when the URL loading system checks the cache for a resource. This should be done before any of the UIWebView objects start to load web pages, very early in the launch process of the app.

NSString *path = ...// the path to the cache file
NSUInteger diskCapacity = 10*1024*1024;   // 10 MB on disk
NSUInteger memoryCapacity = 512*1024;     // 512 KB in memory

FilteredWebCache *cache =
      [[FilteredWebCache alloc] initWithMemoryCapacity: memoryCapacity
                             diskCapacity: diskCapacity diskPath:path];
// Install our cache as the shared cache used by the URL loading system.
[NSURLCache setSharedURLCache:cache];
[cache release];

We have to provide a path where the cache file is stored. The cache file is automatically created by the NSURLCache objects, so we don’t need to create the file, we only have to define where the file should be saved (this must be somewhere in the “sandbox” of our application, for example in the “tmp” folder or in the “Documents” folder).

This is all the magic to implement URL-based filters for UIWebView on the iPhone. You see, it’s not that complicated.

Note: If the filters can change while your app is running, you may need to remove the fake responses from the cache again. The NSURLCache class provides a method for this (removeCachedResponseForRequest:), so this isn’t a problem. If your filters are static, then you don’t need to care about this.
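As a rough sketch of that clean-up, the helper below removes the cached responses for a list of URLs that are no longer filtered. The function name and the idea of passing in an array of previously blocked URLs are assumptions made for this example; how your app keeps track of what it has blocked is up to you. It also assumes that a fresh request for the same URL is enough to address the stored entry.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes the cached (fake) responses for the given
// URLs from the shared cache and returns how many requests were processed.
static NSUInteger RemoveFakeResponsesForURLs(NSArray *urls)
{
    NSURLCache *cache = [NSURLCache sharedURLCache];
    NSUInteger count = 0;
    for (NSURL *url in urls) {
        // Build a request for the same URL and drop its cache entry.
        NSURLRequest *request = [NSURLRequest requestWithURL:url];
        [cache removeCachedResponseForRequest:request];
        count++;
    }
    return count;
}
```

If you don’t keep track of the blocked URLs, calling removeAllCachedResponses on the shared cache empties the whole cache instead.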

Posted in iPhone & iPod Touch, Programming.



78 Responses


  1. iavian says

    Nice article. How to use FilteredWebCache in Webview ?

  2. Alexander says

    You mean WebView on the Mac? It works the same way. But you don’t need the FilteredWebCache there, because on the Mac the WebView class has several delegate methods where you can intercept all HTTP requests and filter them out much more easily.

    For example you can do all the filtering in the delegate method…
    webView:resource:willSendRequest:redirectResponse:fromDataSource:

  3. phoenix says

    Good dig.
    I’m just wondering why you don’t use the UIWebView delegate method to intercept the requests? For example, use

    (BOOL)webView:(UIWebView*)webView shouldStartLoadWithRequest:(NSURLRequest*)request navigationType:(UIWebViewNavigationType)navigationType

  4. Alexander says

    @phoenix
    This UIWebView delegate is only called for the main URL of a page (the HTML code) but not for embedded resources like images, stylesheets, external javascript code etc. This means for filtering advertising banners this delegate is useless because it is not called for these.

    On the Mac you would use WebView instead of UIWebView and for WebView there’s a delegate method which is called for all resources so here you can use the delegate and don’t need to use the Cache.

  5. iavian says

    Thanks for clarification , but how to use it on UIWebView [iPhone SDK] ?

  6. Alexander says

    You only need to initialize an object of the class FilteredWebCache and set it as “shared Cache” as explained in the blog post. Checking if a certain URL must be filtered is done in the “FilterMgr” class in my example, and this class must be implemented according to your requirements. UIWebView will internally call the shared cache object to find out if a resource needs to be loaded from the internet. So you don’t need to create any connections to your UIWebView objects, the iPhone OS already has all the required connections. You only need to set the FilteredWebCache as the “shared Cache” as soon as possible, before any of the UIWebView objects will load any data.

  7. iavian says

    I made it work 90% , however when it goes inside if block , the app crashes @ return [super cachedResponseForRequest:request];

  8. Alexander says

    @iavian
    OK, I’ve checked this again and you’re right. The problem seems to be the “fake” response which was created with an empty NSData object. If you create the NSData object with at least one byte (doesn’t matter which value it has), then it doesn’t crash anymore. I think this can be called a bug of the iPhone OS, because empty server responses are valid and there’s no reason why these shouldn’t be cached as well. Also in the Apple docs there’s nothing mentioned about any restrictions for the data object.

    In iCab Mobile I’ve used a static replacement object for filtered data, wrapped in an NSData object, so it was never empty. But for this tutorial these details are not important and so I just used an empty NSData object. I should have tested this before, but I didn’t expect that this would cause any crashes.

    I’ve updated the source of the blog post now. Now it will no longer crash.

  9. iavian says

    That works, thanks for the help & detailed article

  10. Mirko says

    Thanks for a great post. I have one question.
    What about POST requests, they are not cached (RFC 2616 section 13), so Safari will not even check the cache for POST requests.
    How to solve that problem?

  11. Alexander says

    @Mirko
    I think POST requests are usually not a real problem because these requests are usually coming from form submissions only. And the user usually submits the form himself/herself and it is unlikely that you need to filter these requests.

    But if you nevertheless need to filter these requests, you should know that UIWebView will call the delegate method

    - (BOOL)webView:shouldStartLoadWithRequest:navigationType:

    for form submissions (the “navigationType” argument will have the value UIWebViewNavigationTypeFormSubmitted in this case). And you only need to return NO in this delegate method to block the request.

  12. Mirko says

    Thanks Alexander! ;)

    I have one more short question, not really related to the post.
    How did you make the iCab Mobile navigation bar?
    It looks like a UINavigationBar with a prompt displaying the page title, but it’s thinner than a regular UINavigationBar, and it also contains two buttons and an input field. Interface Builder doesn’t allow me to do that.

    Thanks,
    Mirko

  13. Alexander says

    @Mirko
    You’re right, you can’t do this with IB. Basically this is just an empty UINavigationBar where the buttons, title and URL field are added as subviews programmatically. UINavigationBar is a subclass of UIView, so you can add subviews as you can do this with other UIViews.

  14. Mirko says

    Thanks for your answer Alex,
    I couldn’t get the label into navbar’s top view.
    Can you post your code for this if possible?

    Once again huge Thanks!

  15. Claus Kinkel says

    Thank you for your helpful articles! Good iPhone tutorials are really hard to find.

    I’m also interested in the code for programatically filling UINavigationBar items and title.

    Also can you please write an article about creating progressbar for UIWebView, there is really little information about that on the Web.

  16. Alexander says

    @Mirko & Claus Kinkel
    I’ll write an article about populating a UINavigationBar object. This is probably better than posting the code here in the comments. But it’s really easy, because you only need to use the UINavigationBar like an ordinary UIView object in which you place other objects.

  17. BiB1 says

    Hi,
    I’m trying to play with webView, but I’ve run into a small problem.
    For example, for a mail address, [request URL] contains “mailto:myName@myFai.com”.
    [[request URL] scheme] gives me the “prefix”: “mailto”,
    but how can I get just “myName@myFai.com”?

    Thanks
    BiB1

  18. Alexander says

    @BiB1
    The NSURL class also has the method “resourceSpecifier”, which returns everything after the colon. So for your “mailto” URL it would return “myName@myFai.com”.

  19. Cheryl Lindsay says

    Hi,
    I have tried playing with caches but I have one problem.
    When I invoke reload method, [webView reload], caches are not used and my filters are not working. How can I handle UIWebView refreshes while still invoking my custom cache?
    If I just reopen URL it will create one history instance.
    I thought of rewriting UIWebView history mechanism because of this, but hope thats not needed :(

  20. Alexander says

    @Cheryl Lindsay
    Yes, reloading will bypass the cache. UIWebView won’t do a smart reload where it would check first if the data on the server is really newer than the data that is already in the cache. So all data is loaded from the internet.

    But you could use a simple line of JavaScript code instead of calling the “reload” method to do the reload. For example you could use

    [webView stringByEvaluatingJavaScriptFromString:@"location.replace(location.href)"];

    instead of

    [webView reload]

    to reload the page. The “location.replace()” function loads the current page again, but should replace the current history entry instead of adding a new entry to the history.

  21. Mark Aufflick says

    The pain with this approach (not that I have any better ideas) is that the first hit still results in a web download – so you save no time or bandwidth. In the simulator anyway (no firewall logs on the device for me to check, though i could setup a proxy).

  22. Alexander says

    @Mark Aufflick
    Are you sure? Why should the first request result in a download?

    The first request of a resource that is filtered will be answered by a newly created “fake” response from the web cache. So UIWebView should have no need to get the data from the web anymore.

    I’ve just checked this with a real device (iPod Touch) and a proxy which logs all the requests from my device. None of the “filtered” requests can be found in the proxy logs. So I don’t see any problems so far.

  23. Manu says

    Thanks a lot, clever hook for a closed source code.

  24. Dj says

    I was trying to cache a webpage but noticed that “- (void)storeCachedResponse:(NSCachedURLResponse *)cachedResponse forRequest:(NSURLRequest *)request
    ” method doesn’t store the cached response. I have customized the NSURLCache the way you have mentioned but still the cached response isn’t getting stored. As the cached response isn’t getting stored while retrieving the cache I am getting data as nil. What could be a possible solution to this issue..

  25. Alexander says

    @Dj
    I don’t know what exactly you’re doing. I think you could have done something wrong, or maybe my solution doesn’t match to your problem.
    You can send me an example project by email and I’ll check what’s going wrong.

  26. Dj says

    I am implementing an application that will use safari to browse the web. My module is to save the webpage appearing on the browser before the user quits the app. So for this I created an instance of NSURLConnection and invoked the initWithRequest method. The delegate methods are getting called properly. But when I try to fetch the cached response I am getting nil object . After observing closely , I found that -(NSCachedURLResponse *)connection:(NSURLConnection *)connection willCacheResponse:(NSCachedURLResponse *)cachedResponse
    method is invoked properly. But after I save the cached response storeCachedResponse method , it’s isnt getting stored in the cache.

  27. m says

    Thanks so much! I’ve been trying to do this with webView:shouldStartLoadWithRequest:navigationType: but never got it to work properly. I’m going to give this a shot this weekend. Great post!

  28. m says

    wow. brilliant – works like a charm – you can filter out whatever you want!
    Thanks again!

  29. m says

    is there a way to know which UIWebView the request came from? So we could filter for some webviews, but not others?

  30. Alexander says

    @m
    No, you can’t see which UIWebView the request came from.

  31. m says

    i would have expected using [ NSURLCache removeAllCachedResponses] would have cleared everything I’ve already filtered out. And if I disable filtering, it would download the entire page. But it doesnt appear to be so.
    If I use [webView reload] it redownloads everything. But I really wanted to avoid to 2 page loads for a UIWebView where I dont want filtering to take place.

  32. m says

    oh my previous post didnt work.
    When I display a webview where I dont want filtering to take place, I set a global and then bypass my filtering. Problem is that if that page uses something I’ve previously filtered out it doesnt reloaded it. So I’m using [webView reload] to make that happen (with 2 page loads – ugh!). I was hoping I could use removeAllCachedResponses to tell the cache to forget what I’ve filtered out previously.

  33. Alexander says

    @m
    Reloading via [webView reload] will bypass the cache and so it will always load everything. But if you reload a web page via javascript, you can reload the page in a way, so that the cache is still used and you can still filter out requests:
    [webView stringByEvaluatingJavaScriptFromString:@"location.replace(location.href)"]

    Another problem is that internally UIWebView uses memory caches as well, so even if you empty the NSURLCache, you may still get the “old” states because of the internal memory caches.

    In a normal web browser context where filters are usually used to filter out banner ads, the internal caches are not a big deal. The filters are usually more or less static and won’t change. But if the filters do change often, you may get in trouble…

  34. Paul says

    I have done something very similar here with subclassing NSURLCache. I’m trying to route http:// image requests through this NSURLCache method which only apparently tries to cache things that are in the bundle(?), so I replaced http:// with http___ in my view which triggers cachedResponseForRequest. Then I switch out the http___ with http:// and do the request and cache it to disk with Three20’s TTURLRequest. I am able to get the data from my request and put together the cache response, but the image which I’m trying to cache appears only as a missing image from the UIWebView that is requesting it. Has anyone run into this before? It’s driving me batty. I can post code if necessary.

    Thanks
    Paul

  35. Alexander says

    @Paul
    I’m not sure what exactly you’re doing. If you start your own request within the cachedResponseForRequest method, you may have some problems if your request is done asynchronously, because then the WebView that has called the cachedResponseForRequest method will process the image before it is actually loaded, and so you only get the missing image placeholder.

  36. Taeyun Kim says

    It’s a great article.
    BTW, I want to supply alternative data ‘asynchronously’ from the other web sites rather than empty or local one synchronously. But the above method can only supply data synchronously since NSCachedURLResponse object must have the data before cachedResponseForRequest function returns.
    I hooked NSURLConnection’s init*() methods (ex: initWithRequest) using Objective C’s method swizzling, but the methods were not called. WebView does not seem to use NSURLConnection at all.
    If you have any suggestion or hint, please let me know.
    Thanks in advance.

  37. Michael says

    Hey, great blog post, maybe only one of its kind on the Internet…?

    Anyways, I’ve managed to block images from loading in my UIWebView, but in their place I have those ugly little blue squares with a question mark.

    Any suggestions on how to just get a blank square instead of the little blue placeholder?

    Thanks in advance (BTW I bought your app lol so you owe me this answer)

  38. Alexander says

    @Michael
    In my example I’m just returning an empty “dummy” response instead of an image from within the Cache object. But I think if you would return the data of a real image file (like a 1×1 pixel fully transparent GIF image file), you won’t get these “missing image” icons anymore.

    But please note that in many cases it can be nevertheless a good idea to let UIWebView show the “missing image” icons. This shows the user that some objects are not loaded. So if the users misses some important information, the “missing image” icons can help to show the user that a filter might be responsible for this and switching off the filters might solve the issue. If there’s no indication at all that something is missing, the user might be confused.

  39. Gabriel says

    Hello, thanks for sharing.
    What if you must filter the actual contents of the response? not only from rules on the URL? In the code presented in this article, I can’t see how to edit/parse the actual response data.
    Thanks.

  40. Alexander says

    @Gabriel
    Filtering the content of a web page is a totally different task. When using UIWebView you can only modify the content after the page is loaded, and this can only be done using JavaScript. Using JavaScript you can modify the page content at code level and DOM level. This can be difficult if you need to do this on all web pages that are loaded. If you know exactly which pages are loaded and have to be modified, you can explicitly target the code structure of these pages.

  41. Gabriel says

    @Alexander
    I have thought of the JS approach, but here are two remarks (suppose I am writing some Parental-Control Child-Safe Browser app):

    1) The JS code that your application can make the UIWebView execute, is run after the page has been completed loaded (ie . after the body onload event), so that you cannot actually prevent other inline onload handlers written in the HTML source, for example.

    2) It is quite complex to parse in JS the whole HTML file for specific keywords or phrases in order to finally strip them out or completely blank out the page, whereas it is pretty simple to do so while working on an NSString content, prior to handing it over the UIWebView.

    Would you recommend another approach, more low-level than the JS one,
    Thanks a lot.

  42. Alexander says

    @Gabriel
    You’re right. But the problem is that you don’t have many choices on iOS. The only way to manipulate the web page that is shown in a UIWebView object is through JavaScript. Apple has removed all other ways to access the content, which you might know from the Mac APIs.

    If you need to manipulate the content before it is displayed, you could load the data manually yourself (using the NSURLConnection API) and then feed the result to UIWebView. But this can get extremely complicated, because you have to parse the content yourself to find all the references to external files and you need to load them yourself as well. And after all the files are loaded, filtered and modified according to your needs, you have to combine all the stuff again so that a valid web page with valid references to these external files can be passed to UIWebView.

    Maybe you can hide the UIWebView while loading the web page, manipulate the page after the page load has finished and afterwards show the page to the user. This could be a much easier solution than loading all the stuff yourself, as described above. But in all cases, it’s most likely no longer possible to allow the user to navigate and interact with the page while it is still loading.

    My blog post addresses the “normal” filtering, to block ads and similar stuff. Here the URL-based filtering is usually working just fine. If you need to filter the content itself, you have to do much more and this can get messy and complicated.

  43. Polo says

    Hello! This article is great! I have a problem and am hoping to get your help; I would be very grateful. I use a UIWebView to show a web page, but an image inside the page is so big that my memory usage keeps increasing until the app finally crashes. Can you help me solve this problem? Thank you!

  44. Alexander says

    If the image is so big that it cannot be loaded/decoded without running out of memory, then you cannot do anything here. The only choice would be to reduce the size of the image that is embedded. And the important size is the dimension (width and height) of the image, not the file size.

    A rough estimate of the final memory requirement for an image is width * height * 4 bytes (each pixel takes one byte each for the red, green, blue and alpha channels). So if an image is 1000*4000 pixels large, it needs about 16 MB of RAM. While the image is being decoded, additional buffers might need additional memory.

    If you deal with images yourself, make sure that you release images you don’t need anymore as soon as possible. Don’t rely on the default autorelease pools to clean up the memory when you’re creating lots of temporary objects in loops. The default autorelease pools can do their work only after your delegate and action methods have given control back to the system. In general, make sure that you don’t forget to release or autorelease objects you’ve created.

  45. david says

    Hey, can someone who has successfully set this up on the iPhone provide an example or something? My UIWebView continues to load everything even though my FilterURLCache tells it not to. The cache is being called, it’s just not doing anything.

    Thanks

  46. Alexander says

    @david
    If your FilterURLCache methods are called then the filter should work just fine. Please note that the UIWebView object also uses memory caches, so as long as a certain image is still in the memory cache, it is still displayed. Which means if you create a filter after a web page was loaded, almost everything of the web site will be in the memory cache and displayed even when you load the page again afterwards. Also using the “reload” method of the UIWebView object will always bypass the FilterURLCache.

    Also make sure you create the new cache and configure it as a new default cache before you’re loading web pages within your UIWebView.

  47. JC says

    Are you using a variation on this method with NSURLCache to save pages for “offline viewing?” Was wondering how to store the contents of a web page without having to reload each element of the page (and parse through and determine all of the images, .cs, .js etc).

  48. Alexander says

    @JC
    If you override the method “storeCachedResponse:forRequest:” of the NSURLCache class, you’ll get the data of the files of a web page while it is loaded. The data can be retrieved from the “cachedResponse” argument.

  49. KH says

    Alexander, Great post.

    Question: Would it be possible to stream contents from a local resource embedded into the application as the response? Let’s say there is a javascript (Resource/abc.js) file that is in the application “Resource” and if the UIWebView page requests for this file, stream it from the local resource instead of getting the file remotely?

  50. Alexander says

    @KH
    Yes and no :-) You can’t stream it directly from the local resource, but you could save the data of the local resource in the cache if it is not yet in the cache. And then just return the cached data. This way you don’t need to load the data from the internet and can feed UIWebView with your local data instead.



