java - What is the best way to check each link of a website?
I want to create a crawler that follows every link on a site and checks each URL to see whether it works or not. What is the best way to create such a crawler?

Use an HTML parser such as Jsoup rather than reading the raw response with url.openStream():

    import java.util.HashSet;
    import java.util.Set;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Set<String> validLinks = new HashSet<String>();
    Set<String> invalidLinks = new HashSet<String>();

    Document document = Jsoup.connect("http://example.com").get();
    Elements links = document.select("a");

    for (Element link : links) {
        String url = link.absUrl("href"); // resolve relative links to absolute URLs
        if (!validLinks.contains(url) && !invalidLinks.contains(url)) {
            try {
                int statusCode = Jsoup.connect(url).execute().statusCode();
                if (200 <= statusCode && statusCode < 400) {
                    validLinks.add(url);
                } else {
                    invalidLinks.add(url);
                }
            } catch (Exception e) {
                // error statuses and unreachable hosts end up here
                invalidLinks.add(url);
            }
        }
    }

To make that loop even more efficient you would want to send a HEAD request instead, but then you would have to use URLConnection, as Jsoup does not support HEAD by design (a HEAD response does not return any content to parse).
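A minimal sketch of that HEAD-based check with HttpURLConnection, reusing the same 2xx/3xx validity range as above; the helper name isLinkValid and the timeout values are my own choices, not part of the original answer:

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Returns true when the URL answers with a 2xx or 3xx status.
    // A HEAD request transfers only the response headers, not the body.
    public static boolean isLinkValid(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(url).openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no content
            connection.setConnectTimeout(5000);  // assumed timeout, in ms
            connection.setReadTimeout(5000);
            int statusCode = connection.getResponseCode();
            return 200 <= statusCode && statusCode < 400;
        } catch (Exception e) {
            return false; // malformed URL or unreachable host
        }
    }

The loop above could then call isLinkValid(url) in place of Jsoup.connect(url).execute().statusCode(), so that each link check downloads nothing but headers.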
 
  