I am running through a list of URLs to check whether each one is dead or redirected, then logging the results. I also have some exceptions that mark domains redirecting to places like godaddy.com or hugedomains.com as dead, since they effectively are.
My issue is that the results are spotty. For example, the domains
- custommarbleproducts.com
- danielharderandsons.com
Redirect to these:
I try to filter out "?reqp=1&reqr=", and it works only some of the time. If I run the script on ten dead or redirected URLs, four will be marked dead; when I re-run it, three or five will be marked dead, and not necessarily the same ones (a URL marked dead last time might not be this time). I am looking for more consistent results. Here are the functions:
function get_url_status($url) {
    // Start every check with an empty cookie jar.
    $cookie = realpath(dirname(__FILE__)) . "/cookie.txt";
    file_put_contents($cookie, "");

    // One cURL handle per check.
    $ch = curl_init($url);
    if ($ch === false) {
        return '';
    }
    curl_setopt($ch, CURLOPT_NOBODY, 1); // HEAD-style request, skip the body
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1); // set referer on redirect
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $final_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    // Registrar and parking-page landers count as dead.
    $parked_markers = array(
        "hugedomains.com", "namecheap.com", "uniregistry.com", "afternic.com",
        "buydomains.com", "/?nr=0", "?reqp=1&reqr=", "godaddy.com",
    );
    foreach ($parked_markers as $marker) {
        if (strpos($final_url, $marker) !== FALSE) {
            return 'dead';
        }
    }

    // An HTTP code of 0 means no response was received at all.
    if (in_array($http_code, array(404, 403, 500, 0))) {
        return 'dead';
    } elseif (($http_code == 200) || ($url == $final_url)) {
        // A 200 reached after following redirects is also reported as ok.
        return 'ok';
    } elseif ($http_code >= 300 && $http_code < 400) {
        // Still a 3xx after following redirects: report where we ended up.
        return $final_url;
    } else {
        return '';
    }
}
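function quote_string($string) {
    $string = str_replace('"', "'", $string);
    // Decode the two HTML entities that show up in the logged values.
    $string = str_replace('&amp;', '&', $string);
    $string = str_replace('&nbsp;', ' ', $string);
    // Collapse runs of whitespace into a single space.
    $string = preg_replace('!\s+!', ' ', $string);
    return '"' . trim($string) . '"';
}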
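The parked-domain checks in get_url_status() can be pulled out and tested without touching the network. Below is a minimal sketch, assuming a hypothetical helper named is_parked_url() (not part of the script above) that parses the final URL with parse_url() and parse_str() instead of searching the whole string. The host list and the reqp, reqr and nr parameter names are copied from the strpos() checks; everything else is an assumption. It does not by itself explain why the results change between runs, but it makes the matching step deterministic for a given final URL.

```php
<?php
// Hypothetical helper: does this final URL look like a parked-domain lander?
// Hosts and query parameters mirror the strpos() checks in get_url_status().
function is_parked_url(string $final_url): bool
{
    $parked_hosts = array(
        'hugedomains.com', 'namecheap.com', 'uniregistry.com',
        'afternic.com', 'buydomains.com', 'godaddy.com',
    );

    // parse_url() returns null or false when there is no host; treat that as ''.
    $host = strtolower((string) parse_url($final_url, PHP_URL_HOST));
    foreach ($parked_hosts as $parked) {
        $suffix = '.' . $parked;
        // Match the registrar domain itself or any subdomain of it.
        if ($host === $parked || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }

    // Parse the query string so the order of parameters does not matter.
    $params = array();
    parse_str((string) parse_url($final_url, PHP_URL_QUERY), $params);

    if (isset($params['reqp'], $params['reqr']) && $params['reqp'] === '1') {
        return true;
    }
    if (isset($params['nr']) && $params['nr'] === '0') {
        return true;
    }
    return false;
}
```

Unlike the strpos() version, this only looks at the host and the query string, so a registrar name that merely appears in a path (for example in a blog post URL) is not treated as parked, and "reqr=...&reqp=1" matches regardless of parameter order.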
Does anyone have any ideas to make this more reliable?