How do I read the contents of a remote web page?

You can include static text and HTML files from remote servers by using a component (such as AspHTTP, ASPTear 1.50, or VB's built-in InetCtrls) to retrieve the remote URL's content. 

 

You can also try this method out; it was tested with the MSXML objects which are installed with Windows 2000. You should make sure you have the latest versions of MSXML and XML Core Services (see MSXML Downloads). If you download the newer version, take special note of the new ProgID you should be using: MSXML 4.0 supports side-by-side installation, which means the version-independent ProgID below will actually use the older version. To use version 4.0 explicitly, specify the version-specific ProgID MSXML2.ServerXMLHTTP.4.0. 

 

<% 

    url = "http://www.espn.com/main.html" 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 

    xmlhttp.open "GET", url, false 

    xmlhttp.send "" 

    Response.write xmlhttp.responseText 

    set xmlhttp = nothing 

%>

 

And here it is in JavaScript: 

 

<script language=javascript runat=server> 

    var url = "http://www.espn.com/main.html"; 

    var xmlhttp = new ActiveXObject("MSXML2.ServerXMLHTTP"); 

    xmlhttp.open("GET", url, 0); 

    xmlhttp.send(""); 

    Response.Write(xmlhttp.responseText); 

    xmlhttp = null; 

</script>
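
Going back to the ProgID note above: if you are not sure which MSXML versions are installed on a server, one option is to try the version-specific ProgIDs in order and fall back to the version-independent one. The following is only a sketch; createFirstAvailable and the candidates list are hypothetical names, not part of MSXML, and the object creation is passed in as a function so the logic can be tried outside of ASP as well.

```javascript
// Hypothetical helper: try each ProgID in order and return the first object
// that can be created, together with the ProgID that worked.
// "factory" is whatever creates the object. In server-side JScript that would
// be: function (id) { return new ActiveXObject(id); }
function createFirstAvailable(progIds, factory) {
    for (var i = 0; i < progIds.length; i++) {
        try {
            return { progId: progIds[i], object: factory(progIds[i]) };
        } catch (e) {
            // This version is not installed; try the next one.
        }
    }
    return null; // nothing in the list could be created
}

// Newest first; the last entry is the version-independent ProgID used above.
var candidates = [
    "MSXML2.ServerXMLHTTP.6.0",
    "MSXML2.ServerXMLHTTP.4.0",
    "MSXML2.ServerXMLHTTP.3.0",
    "MSXML2.ServerXMLHTTP"
];
```

Writing out the ProgID that was actually used can be handy while debugging, since different MSXML versions may behave slightly differently.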

 

If you use a URL that doesn't exist, or you are behind a firewall that blocks certain web sites, or the site is behind a firewall that blocks traffic to port 80 / 443, or you are using a proxy server, or the site requires authentication, the request will fail. A typical error, which indicates that the host name could not be resolved, looks like this: 

 

msxml4.dll (0x80072EE7) 

Server name or address could not be resolved

 

To correct this, you will have to figure out which of these issues is standing in your way, and discuss workarounds with your (or their) network administrator. 

 

Don't forget that if the remote page uses relative URLs for images, style sheets, JavaScript files, frames, or links, they will not resolve correctly when the content is served from your server. To overcome this, add a BASE HREF tag so that relative URLs are resolved against the original location. For example, the above code (which gets all the text from espn.com, but is formatted oddly and doesn't function 100% as intended) needs only a slight modification to work correctly: 

 

<% 

    url = "http://www.espn.com/main.html" 

 

    ' add a BASE HREF tag 

    Response.write "<BASE HREF='" & url & "'>" 

 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 

    xmlhttp.open "GET", url, false 

    xmlhttp.send "" 

    Response.write xmlhttp.responseText 

    set xmlhttp = nothing 

%>
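
The sample above writes the BASE HREF tag in front of the remote content. If you would rather place it inside the remote page's own HEAD element, something along these lines may work. This is a minimal sketch under the assumption that the remote HTML contains an ordinary opening head tag; buildBaseTag and injectBase are hypothetical helper names.

```javascript
// Escape a value so it is safe inside a double-quoted HTML attribute.
function htmlAttr(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Build the BASE tag for the given URL.
function buildBaseTag(url) {
    return '<base href="' + htmlAttr(url) + '">';
}

// Insert the BASE tag right after the opening <head> tag if there is one;
// otherwise fall back to putting it in front of the content.
function injectBase(html, url) {
    var tag = buildBaseTag(url);
    var headOpen = /<head(\s[^>]*)?>/i; // matches <head> but not <header>
    if (headOpen.test(html)) {
        return html.replace(headOpen, function (match) { return match + tag; });
    }
    return tag + html;
}
```

In the JavaScript version of the page you would then write Response.Write(injectBase(xmlhttp.responseText, url)) instead of writing the response text directly.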

 

For information on increasing or decreasing the time allowed for the XMLHTTP objects to retrieve a response from a remote server, see Article #2407. 

 

If you need to POST data, you can do so by adding a header that tells the receiver you're sending FORM data: 

 

<% 

    url = "http://www.espn.com/main.html" 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 

    xmlhttp.open "POST", url, false 

    xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded" 

    xmlhttp.send "x=1&y=2" 

    Response.write xmlhttp.responseText 

    set xmlhttp = nothing 

%>
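
The body "x=1&y=2" above is simple enough to write by hand, but values containing spaces, ampersands or other special characters need to be URL-encoded first. Here is a rough sketch of how the body could be built; encodeForm is a hypothetical helper name, and it relies only on the standard encodeURIComponent function.

```javascript
// Encode one name or value for an application/x-www-form-urlencoded body.
// Spaces are sent as "+", which is the convention for form posts.
function encodeFormPart(value) {
    return encodeURIComponent(String(value)).replace(/%20/g, "+");
}

// Turn an object such as { x: 1, y: 2 } into "x=1&y=2".
function encodeForm(fields) {
    var pairs = [];
    for (var name in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, name)) {
            pairs.push(encodeFormPart(name) + "=" + encodeFormPart(fields[name]));
        }
    }
    return pairs.join("&");
}
```

The result would be passed to the send call in place of the hard-coded string, for example xmlhttp.send(encodeForm({ x: 1, y: 2 })).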

 

Another thing you may want to do, going back to the original script, is make sure the server is there! If not, you can display a message, and you can customize it to show whether the server was not found at all, or the server was found but returned a bad response (e.g. a 404 Page Not Found). Note that if you do not need to parse the content of the remote web page, using the HEAD method here is far more efficient than using GET or POST, since only the headers are retrieved from the remote server, not any of the content. 

 

<%  

    ' deliberate typo:  

    url = "http://www.espn.co/main.html"  

 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")  

    on error resume next  

    xmlhttp.open "HEAD", url, false  

    xmlhttp.send ""  

    status = xmlhttp.status 

    if err.number <> 0 or status <> 200 then 

        if status = 404 then 

            Response.Write "Page does not exist (404)." 

        elseif status = 401 then 

            Response.Write "Access denied (401)." 

        elseif status >= 500 and status < 600 then 

            Response.Write "500 Internal Server Error on remote site." 

        else 

            Response.write "Server is down or does not exist." 

        end if 

    else  

        Response.Write "Server is up and URL is available."  

    end if  

    set xmlhttp = nothing  

%>
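
If you check the status in more than one page, it may be worth pulling the decision into a small function. The sketch below mirrors the branches of the script above; describeResult is a hypothetical name, errNumber stands for the error number of the failed call (0 when there was no error), and treating any 2xx status as success is an assumption that goes slightly beyond the original check for 200.

```javascript
// Map the outcome of a request to a message.
// errNumber: 0 if open/send succeeded, anything else if they raised an error.
// status:    the HTTP status code reported by the xmlhttp object.
function describeResult(errNumber, status) {
    if (errNumber !== 0) {
        return "Server is down or does not exist.";
    }
    if (status >= 200 && status < 300) {
        return "Server is up and URL is available.";
    }
    if (status === 401) {
        return "Access denied (401).";
    }
    if (status === 404) {
        return "Page does not exist (404).";
    }
    if (status >= 500 && status < 600) {
        return "Error on remote site (" + status + ").";
    }
    return "Unexpected response (" + status + ").";
}
```

Keeping the messages in one place also makes it easier to add further status codes later, such as 403 or 301.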

 

You might want to parse the results, instead of sending them straight to the client: 

 

<% 

    url = "http://www.espn.com/main.html"  

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")  

    on error resume next 

    xmlhttp.open "GET", url, false 

    xmlhttp.send ""  

    if err.number <> 0 then 

        response.write "Url not found" 

    else 

        if instr(xmlhttp.responseText,"Stanley Cup")>0 then 

            response.write "There's a story about the playoffs." 

            response.write "<a href='" & url & "'>Go there</a>?" 

        else 

            response.write "There is no story about the playoffs." 

        end if 

    end if 

    set xmlhttp = nothing 

%>
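
Note that the instr check above is case-sensitive. If you want a case-insensitive match, or want to pull a specific piece out of the page such as its title, a couple of small helpers can do the job. This is only a sketch that assumes reasonably well-formed HTML; mentions and extractTitle are hypothetical names, and a regular expression is not a substitute for a real HTML parser.

```javascript
// Case-insensitive check for a phrase anywhere in the page.
function mentions(html, phrase) {
    return html.toLowerCase().indexOf(phrase.toLowerCase()) !== -1;
}

// Return the text of the first <title> element, with runs of whitespace
// collapsed to single spaces, or null if the page has no title.
function extractTitle(html) {
    var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    if (!match) {
        return null;
    }
    return match[1].replace(/\s+/g, " ").replace(/^ | $/g, "");
}
```

Both functions work on the string returned by xmlhttp.responseText, so they fit into the JavaScript version of the script without other changes.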

 

You may be interested in performing an asynchronous request, e.g. hitting an ASP page that acts like a batch file that gets fired but does not need to return any results. You can simply change the third parameter of the open call to true (and leave out the reference to the responseText value, which is not available until the request has completed): 

 

<% 

    url = "http://www.espn.com/main.html" 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 

    xmlhttp.open "GET", url, true 

    xmlhttp.send "" 

    set xmlhttp = nothing 

%>

 

Finally, you may want to spoof your user agent, since the MSXML object sends something like "Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)" -- many sites will view this as a spider or 'screen scraper', and for various reasons, might present alternate content -- here are two samples: 

 

<% 

    url = "http://www.espn.com/main.html"  

 

 

    ' this sample identifies itself as the actual browser being used: 

 

 

    br = Request.ServerVariables("HTTP_USER_AGENT") 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")  

    on error resume next 

    xmlhttp.open "GET", url, false 

    xmlhttp.setRequestHeader "User-Agent",br 

    xmlhttp.send ""  

    if err.number <> 0 then 

        response.write "Url not found" 

    else 

        response.write xmlhttp.responseText 

    end if 

    set xmlhttp = nothing 

 

 

 

    ' this sample identifies itself as "My funky browser." 

 

 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")  

    on error resume next 

    xmlhttp.open "GET", url, false 

    xmlhttp.setRequestHeader "User-Agent","My funky browser." 

    xmlhttp.send ""  

    if err.number <> 0 then 

        response.write "Url not found" 

    else 

        response.write xmlhttp.responseText 

    end if 

    set xmlhttp = nothing 

%>

 

 

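If you encounter errors, you can use the VBScript Err object to determine the problem. (ParseError is a property of the XML DOM document, e.g. xmlhttp.responseXML.parseError, and is not available on the ServerXMLHTTP object itself.) 

 

<% 

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")  

    ' ... stuff ... 

    on error resume next 

    xmlhttp.send ""  

    if err.number <> 0 then 

        ' show the error code in hex, plus its description 

        response.write "Error: 0x" & Hex(err.number) & "<br>" & Server.HTMLEncode(err.description) 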

        response.end 

    end if 

    ' ... stuff ... 

%>

 

A common error you might receive: 

 

msxml3.dll error '80072efd'  

A connection with the server could not be established

 

Make sure that the URL is actually reachable. You may have spelled the domain name wrong, or the site may actually be down. 

 

Test using a browser from that machine, or simply run a tracert / ping. Note that ping won't always return results, because many sites block all such traffic (mainly to help eliminate DoS attacks). However, ping should at least show you the IP address, which means that the domain name was resolved correctly through DNS. If the name cannot be resolved at all, the problem may lie with your DNS server.
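
Script code usually sees these error codes as negative decimal numbers rather than in the hex form shown in the error messages, which makes them hard to recognize. The sketch below converts the number back to hex and looks it up in a small table. explainError is a hypothetical name, and the table holds only the two errors from this article plus the common timeout code; it is not a complete list.

```javascript
// Convert a (possibly negative) 32-bit error number to its hex form and
// return a short explanation when the code is one of the known ones.
function explainError(number) {
    var hex = (number >>> 0).toString(16).toUpperCase(); // unsigned view
    while (hex.length < 8) {
        hex = "0" + hex;
    }
    var known = {
        "80072EE2": "The operation timed out",
        "80072EE7": "Server name or address could not be resolved",
        "80072EFD": "A connection with the server could not be established"
    };
    return "0x" + hex + ": " + (known[hex] || "Unknown error");
}
```

For example, an error number of -2147012889 corresponds to 0x80072EE7, the name resolution error shown earlier.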
