How to Snatch HTML Using Visual Basic Code!


Need to visit a competitor Web page and parse out the latest rival product prices? Looking to retrieve data from a company that hasn’t yet figured out Web services? Whatever your motives, if you’re looking to grab the HTML of a Web page, the following little function should be able to help.

Just call the following GetPageHTML function, passing in the URL of the page you want to retrieve. It’ll return a string containing the HTML:

Public Function GetPageHTML( _
           ByVal URL As String) As String
  ' Retrieves the HTML from the specified URL
  Dim objWC As New System.Net.WebClient()
  Return New System.Text.UTF8Encoding().GetString( _
     objWC.DownloadData(URL))
End Function

Here’s an example of its usage:

strHTML = GetPageHTML("")

An extremely short function, but incredibly useful.
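Once the HTML is in a string, picking data out of it is ordinary string work. As a minimal sketch of the price-parsing scenario mentioned at the start, assuming the target page writes its prices as dollar amounts such as $19.99, a regular expression can collect them. The PriceParser module, the ExtractPrices name, and the pattern are all hypothetical and not part of the original function:

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module PriceParser
    ' Hypothetical helper: returns every dollar amount (e.g. $19.99)
    ' found in the supplied HTML. The pattern is an assumption about
    ' how the target page formats its prices.
    Public Function ExtractPrices(ByVal strHTML As String) _
          As List(Of String)
        Dim objPrices As New List(Of String)
        ' Match a dollar sign, one or more digits, and optional cents
        For Each objMatch As Match In _
              Regex.Matches(strHTML, "\$\d+(?:\.\d{2})?")
            objPrices.Add(objMatch.Value)
        Next
        Return objPrices
    End Function
End Module
```

Pass it the string returned by GetPageHTML. A pattern like this is fragile if the page layout changes, so treat it as a starting point rather than a robust parser.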

How to Snatch HTML, with a Timeout

The last function is great for many applications. You pass it a URL, and it’ll work on grabbing the page HTML. The problem is that it will keep trying until it either times out or retrieves the page.

Sometimes, you don’t have that luxury. Say you’re running a Web site that needs to retrieve the HTML, parse it, and display results to a user. You can’t wait two minutes for the server to respond, then download the page and feed it back to your visitor. You need a response within ten seconds—or not at all.

Unfortunately, despite numerous developer claims to the contrary, this cannot be done through the WebClient class. Rather, you need to use some of the more in-depth System.Net classes to handle the situation. Here’s my offering, wrapped into a handy little function:

Public Function GetPageHTML(ByVal URL As String, _
      Optional ByVal TimeoutSeconds As Integer = 10) _
      As String
   ' Retrieves the HTML from the specified URL,
   ' using a default timeout of 10 seconds
   Dim objRequest As Net.WebRequest
   Dim objResponse As Net.WebResponse
   Dim objStreamReceive As System.IO.Stream
   Dim objEncoding As System.Text.Encoding
   Dim objStreamRead As System.IO.StreamReader

   Try
       ' Set up our Web request
       objRequest = Net.WebRequest.Create(URL)
       objRequest.Timeout = TimeoutSeconds * 1000
       ' Retrieve data from request
       objResponse = objRequest.GetResponse
       objStreamReceive = objResponse.GetResponseStream
       objEncoding = System.Text.Encoding.GetEncoding( _
           "utf-8")
       objStreamRead = New System.IO.StreamReader( _
           objStreamReceive, objEncoding)
       ' Set function return value
       GetPageHTML = objStreamRead.ReadToEnd()
       ' Check if available, then close response
       If Not objResponse Is Nothing Then
           objResponse.Close()
       End If
   Catch
       ' Error occurred grabbing data, simply return nothing
       Return ""
   End Try
End Function

Here, the code creates objects to request the data from the Web, setting the absolute server timeout. If the machine responds within the given timeframe, the response is fed into a stream, converted into the UTF8 text format we all understand, and then passed back as the result of the function. You can use it a little like this:

strHTML = GetPageHTML("", 5)
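Because the function returns an empty string when anything goes wrong, including a timeout, the caller has to check for that before parsing. A small sketch of such a check follows; the PageFetchDemo module, the DescribeResult name, and the message wording are hypothetical:

```vb
Public Module PageFetchDemo
    ' Hypothetical helper: turns the value returned by GetPageHTML
    ' into a short status message suitable for showing to a visitor.
    Public Function DescribeResult(ByVal strHTML As String) As String
        ' An empty (or missing) string signals that the fetch failed
        If strHTML Is Nothing OrElse strHTML.Length = 0 Then
            Return "The page could not be retrieved."
        End If
        Return "Retrieved " & strHTML.Length.ToString() & " characters."
    End Function
End Module
```

Note that an empty result cannot distinguish a timeout from a bad URL or a network error; if that matters, the Catch block would need to report the exception instead of discarding it.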

Admittedly, this all seems like a lot of work just to add a timeout. But it does its job—and well. Enjoy!

TOP TIP Remember, the timeout we’ve added is for our request to be acknowledged by the server, rather than for the full HTML to have been received.
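If a slow download is also a concern, one option is the ReadWriteTimeout property of HttpWebRequest, which limits how long each individual read from the response stream may block. The sketch below assumes the URL uses http or https (other schemes would make the cast fail); the TimeoutSketch module and CreateTimedRequest name are hypothetical:

```vb
Imports System.Net

Public Module TimeoutSketch
    ' Hypothetical helper: builds a request that limits both the wait
    ' for the server's response and each read from the response stream.
    ' Assumes an http or https URL, so the cast to HttpWebRequest holds.
    Public Function CreateTimedRequest(ByVal URL As String, _
          ByVal TimeoutSeconds As Integer) As HttpWebRequest
        Dim objRequest As HttpWebRequest = _
            DirectCast(WebRequest.Create(URL), HttpWebRequest)
        ' How long GetResponse waits for the server to answer
        objRequest.Timeout = TimeoutSeconds * 1000
        ' How long a single stream read or write may block
        objRequest.ReadWriteTimeout = TimeoutSeconds * 1000
        Return objRequest
    End Function
End Module
```

Even with both values set, the total download time is not capped: ReadWriteTimeout applies per read, so a server that trickles data steadily can still take longer than the figure you pass in.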

About the Author

Karl Moore (MCSD, MVP) is an experienced author living in Yorkshire, England. He is the author of numerous technology books, including the new Ultimate VB .NET and ASP.NET Code Book (ISBN 1-59059-106-2), and regularly features at industry conferences and on BBC radio. Moore also runs his own creative consultancy, White Cliff Computing Ltd. Visit his official Web site at
