java - What is the best way to check each link of a website?


I want to create a crawler that follows every link on a site and checks whether each URL works or not. At the moment my code just uses url.openStream().

So, what is the best way to create such a crawler?
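(For context, the existing check presumably looks something like the following sketch; the asker's actual code isn't shown, and the example.com URL is a placeholder.)

    import java.io.IOException;
    import java.net.URL;

    public class NaiveLinkCheck {
        public static void main(String[] args) {
            try {
                // The link "works" if opening (and closing) a stream does not throw.
                new URL("http://example.com").openStream().close();
                System.out.println("Link works");
            } catch (IOException e) {
                System.out.println("Link is broken");
            }
        }
    }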

Use an HTML parser.

    import java.util.HashSet;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Collect every <a href> on the page and probe each URL once.
    Set<String> validLinks = new HashSet<String>();
    Set<String> invalidLinks = new HashSet<String>();

    Document document = Jsoup.connect("http://example.com").get();
    Elements links = document.select("a");

    for (Element link : links) {
        String url = link.absUrl("href"); // resolve relative links against the base URL

        if (!validLinks.contains(url) && !invalidLinks.contains(url)) {
            try {
                int statusCode = Jsoup.connect(url).execute().statusCode();

                if (200 <= statusCode && statusCode < 400) {
                    validLinks.add(url);
                } else {
                    invalidLinks.add(url);
                }
            } catch (Exception e) {
                // Jsoup throws on 4xx/5xx responses by default; treat those as broken links.
                invalidLinks.add(url);
            }
        }
    }

You may want to send a HEAD request inside that loop to make it more efficient, but then you will have to use URLConnection instead, as Jsoup does not support HEAD by design (a HEAD response carries no content to parse).
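For instance, a minimal sketch of such a HEAD check using java.net.HttpURLConnection (the helper name isLive and the 5-second timeouts are assumptions for illustration, not part of the original answer):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class LinkChecker {

        // Returns true if the URL answers a HEAD request with a 2xx/3xx status.
        public static boolean isLive(String url) {
            try {
                HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
                connection.setRequestMethod("HEAD"); // headers only, no response body is transferred
                connection.setConnectTimeout(5000);
                connection.setReadTimeout(5000);
                int statusCode = connection.getResponseCode();
                return 200 <= statusCode && statusCode < 400;
            } catch (IOException e) {
                return false; // unreachable host, timeout, or malformed URL
            }
        }
    }

Note that HttpURLConnection follows HTTP redirects by default, so a link that 301/302-redirects to a live page would still be counted as valid here.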
