java - What is the best way to check every link of a website?
I want to create a crawler that follows every link on a site and checks each URL to see whether it works or not. At the moment my code uses url.openStream() in a loop. What is the best way to create such a crawler?

Use an HTML parser such as Jsoup:
    import java.util.HashSet;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Set<String> validLinks = new HashSet<String>();
    Set<String> invalidLinks = new HashSet<String>();

    Document document = Jsoup.connect("http://example.com").get();
    Elements links = document.select("a");

    for (Element link : links) {
        String url = link.absUrl("href");

        // Skip URLs that have already been checked.
        if (!validLinks.contains(url) && !invalidLinks.contains(url)) {
            try {
                // ignoreHttpErrors(true) keeps Jsoup from throwing on 4xx/5xx,
                // so the status code range check below is actually reached.
                int statusCode = Jsoup.connect(url)
                        .ignoreHttpErrors(true)
                        .execute()
                        .statusCode();

                if (200 <= statusCode && statusCode < 400) {
                    validLinks.add(url);
                } else {
                    invalidLinks.add(url);
                }
            } catch (Exception e) {
                invalidLinks.add(url);
            }
        }
    }
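The snippet above only checks the links found on the starting page. Since the question asks for a crawler that follows every link, here is a minimal sketch of one way to recurse into pages on the same site; the class and method names (LinkChecker, crawl, check) are hypothetical and not from the original answer:

    import java.util.HashSet;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class LinkChecker {
        private final Set<String> visited = new HashSet<String>();
        private final Set<String> validLinks = new HashSet<String>();
        private final Set<String> invalidLinks = new HashSet<String>();
        private final String baseUrl;

        public LinkChecker(String baseUrl) {
            this.baseUrl = baseUrl;
        }

        // Fetch a page, check every link on it, and recurse into
        // links that point back into the same site.
        public void crawl(String pageUrl) {
            if (!visited.add(pageUrl)) {
                return; // already crawled this page
            }
            try {
                Document document = Jsoup.connect(pageUrl).get();
                for (Element link : document.select("a[href]")) {
                    String url = link.absUrl("href");
                    check(url);
                    if (url.startsWith(baseUrl)) {
                        crawl(url); // follow internal links only
                    }
                }
            } catch (Exception e) {
                invalidLinks.add(pageUrl);
            }
        }

        // Record whether a single URL answers with a usable status code.
        private void check(String url) {
            if (validLinks.contains(url) || invalidLinks.contains(url)) {
                return; // already checked
            }
            try {
                int statusCode = Jsoup.connect(url)
                        .ignoreHttpErrors(true)
                        .execute()
                        .statusCode();
                if (200 <= statusCode && statusCode < 400) {
                    validLinks.add(url);
                } else {
                    invalidLinks.add(url);
                }
            } catch (Exception e) {
                invalidLinks.add(url);
            }
        }
    }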
To make the status check more efficient you could send a HEAD request instead of a GET, but then you would have to use URLConnection, as Jsoup does not support HEAD by design (a HEAD response returns no content).
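A minimal sketch of that HEAD check with plain HttpURLConnection; the helper name isReachable and the timeout values are assumptions, not part of the original answer:

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Returns true if the URL answers a HEAD request with a 2xx/3xx status.
    // No response body is downloaded, which is what makes HEAD cheaper than GET.
    static boolean isReachable(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(url).openConnection();
            connection.setRequestMethod("HEAD");
            connection.setConnectTimeout(5000); // assumed timeout values
            connection.setReadTimeout(5000);
            int statusCode = connection.getResponseCode();
            connection.disconnect();
            return 200 <= statusCode && statusCode < 400;
        } catch (Exception e) {
            return false;
        }
    }

Note that some servers reject or mishandle HEAD requests, so falling back to a GET when HEAD returns an unexpected status can be safer.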