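Batch-downloading Google Music with a PHP script
Reposted
There are plenty of tools online for batch-downloading from Google Music, but I don't trust them: who knows how many trojans they carry.
The script downloads every album of each listed artist. After roughly 200 songs, though, the site starts asking for a CAPTCHA, and the simplest workaround is... redialing the connection, which usually brings a new IP address. I only wrote the redial part for my own ZTE 831 modem (handed out by Beijing Netcom a few years ago).
It shells out to curl and wget. On Windows you would probably need to convert the download file names to GBK, but I didn't bother, since the lowest-power computer in the house runs Ubuntu.
Frankly, the audio on Google Music is terrible: the encoding is inconsistent and catalogs are incomplete (depending on which record labels' rights were bought), but it does save a lot of effort.
Eight years ago my IBM 45 GB glass-platter drive and then a Seagate 80 GB drive died one after the other, and I lost a lot, including much of my own work. Even losing 10 GB of MP3s was depressing. As I lamented on a forum back then, the real trouble is not re-downloading gigabytes of music but the track list, which I had curated over a long time; losing that hurt far more than losing the MP3 files themselves.
A reader named [url=http://tigerlee.me/]tiger.lee[/url] pointed out that getTitle, the function that extracts the page title, has trouble with Chinese titles. If you run into the same problem, replace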
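The redial trick amounts to a retry loop: fetch the download page, and if it looks like a CAPTCHA page, reconnect and fetch again. Below is a minimal sketch that assumes the same heuristic as the script below (a usable page is longer than 1000 bytes and does not contain "captcha"). `looksLikeCaptcha`, `fetchDownloadPage`, its callbacks and the retry limit are hypothetical names that are not part of the original script. The reconnect callback stands in for whatever redials your own modem.

```php
<?php
// Hypothetical helpers: fetch a download page and, while it looks like a
// CAPTCHA page, call a reconnect callback (e.g. a modem redial) and retry.

// Same heuristic as the script: a usable page is > 1000 bytes with no "captcha".
function looksLikeCaptcha(string $sPage): bool {
    return strlen($sPage) <= 1000 || strpos($sPage, "captcha") !== false;
}

// Returns the page, or null if it is still blocked after $iMaxTries fetches.
function fetchDownloadPage(string $sURL, callable $fnFetch, callable $fnReconnect, int $iMaxTries = 3): ?string {
    for ($i = 1; $i <= $iMaxTries; $i++) {
        $sPage = (string) $fnFetch($sURL); // file_get_contents() may return false
        if (!looksLikeCaptcha($sPage)) {
            return $sPage;
        }
        if ($i < $iMaxTries) {
            $fnReconnect(); // assumed to block until the new connection is up
        }
    }
    return null;
}

// Example wiring (redialZte831 is a hypothetical function):
// $sPage = fetchDownloadPage($sURL, "file_get_contents", "redialZte831");
```

Injecting the fetch and reconnect steps as callables keeps the modem-specific part in one place, so the loop itself works unchanged with a different router.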
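If you do want to run it on Windows, the conversion could look roughly like the sketch below. It assumes the iconv extension is loaded and that the system ANSI code page is GBK (CP936, as on Simplified Chinese Windows). `localFileName` is a hypothetical helper. PHP 7.1 and later accept UTF-8 paths in their own filesystem functions on Windows, so this mainly matters for older PHP versions and possibly for the command line handed to wget.

```php
<?php
// Hypothetical helper: on Windows, convert a UTF-8 path to GBK (assumed to be
// the ANSI code page, CP936); elsewhere return it unchanged.
// Requires the iconv extension and PHP >= 7.2 (for PHP_OS_FAMILY).
function localFileName(string $sUtf8Path, string $sOsFamily = PHP_OS_FAMILY): string {
    if ($sOsFamily !== "Windows") {
        return $sUtf8Path;
    }
    // //IGNORE drops characters that GBK cannot represent
    $sConverted = iconv("UTF-8", "GBK//IGNORE", $sUtf8Path);
    return $sConverted === false ? $sUtf8Path : $sConverted;
}
```

In the script this would be applied once, e.g. `$sFile = localFileName($sFile);`, right after the file name is built.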
[code]
$sFind = trim(str_replace("- 谷歌音乐搜索", "", $sFind));
[/code]
with
[code]
$sFind = trim(preg_replace("/ - .*/", "", $sFind));
[/code]
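The regex version no longer depends on the exact Chinese suffix in the title. It simply drops everything from the first " - " onward. The following self-contained sketch of the patched title extraction shows the effect. `cleanTitle` is a hypothetical stand-alone name; the script's own `getTitle` uses its `getInner` helper instead. The sketch also shows the trade-off: an artist or album name that itself contains " - " gets cut short as well.

```php
<?php
// Sketch of the patched title cleanup: take the text between <title> tags,
// drop everything from the first " - " (the site suffix), decode entities.
function cleanTitle(string $sPage): string {
    $iStart = strpos($sPage, "<title>");
    $iEnd = strpos($sPage, "</title>");
    if ($iStart === false || $iEnd === false) {
        return ""; // no title found
    }
    $iStart += strlen("<title>");
    $sTitle = substr($sPage, $iStart, $iEnd - $iStart);
    $sTitle = trim(preg_replace("/ - .*/", "", $sTitle));
    return html_entity_decode($sTitle, ENT_COMPAT, "UTF-8");
}
```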
See the comments for details. The script itself follows:
[phpcode]
$aURL = array(
"http://www.google.cn/music/artist?id=Adb2bebd95c26ab51", // Daydream
"http://www.google.cn/music/artist?id=A7cdc1fd29ed38664", // Within Temptation
"http://www.google.cn/music/artist?id=A45d890322e5b4818", // Paul Simon
"http://www.google.cn/music/artist?id=A0589e046b4d8b04d", // Secret Garden
"http://www.google.cn/music/artist?id=A1710a233e9073df8", // Enigma
"http://www.google.cn/music/artist?id=A93bd6a3bf469e09f", // Bob Dylan
"http://www.google.cn/music/artist?id=Ae2300d8b0232c06c", // Sarah Brightman
"http://www.google.cn/music/artist?id=A9ba1ff50b05e1da1", // Nightwish
"http://www.google.cn/music/artist?id=A2d10d82d9f6cbc05", // Nine Inch Nails
"http://www.google.cn/music/artist?id=Af482c993463f67fe", // Marilyn Manson
"http://www.google.cn/music/artist?id=A14b22d6c4a40d18a", // Metallica
"http://www.google.cn/music/artist?id=Ad8854f0fe8a5bdf2", // 孟庭苇
"http://www.google.cn/music/artist?id=Af288bddaf3d525ee", // 张学友
"http://www.google.cn/music/artist?id=A4907197a9f9f6d64", // 田震
"http://www.google.cn/music/artist?id=A3234f5bba37cfd44", // 女子十二乐坊
"http://www.google.cn/music/artist?id=Ac78376a4f9d9387c", // 刘若英
"http://www.google.cn/music/artist?id=Afd4f4777602bc2cf", // Santana
);
// Utility functions
// colorspan(): print a message in ANSI color 3x (1 red, 2 green, 3 yellow, 4 blue)
function colorspan($sMessage, $iColor = 2) {
echo "\033[3".$iColor."m".$sMessage."\033[0m";
}
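// strdig(): return the part of $sHaystack after the first $sNeedle, or the part
// before it when $bReverse is TRUE; return $sHaystack unchanged if not found
function strdig($sHaystack, $sNeedle, $bReverse = FALSE) {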
$iPos = strpos($sHaystack, $sNeedle);
if ($iPos === FALSE) {
return $sHaystack;
}
if ($bReverse) {
return substr($sHaystack, 0, $iPos);
}
$iPos += strlen($sNeedle);
return substr($sHaystack, $iPos);
}
// getInner(): the text between the first $sStart and the following $sEnd
function getInner($sContent, $sStart, $sEnd) {
$sContent = strdig($sContent, $sStart);
$sContent = strdig($sContent, $sEnd, TRUE);
return $sContent;
}
function getTitle($sPage) {
$sFind = getInner($sPage, "<title>", "</title>");
$sFind = trim(str_replace("- 谷歌音乐搜索", "", $sFind));
$sFind = html_entity_decode($sFind, ENT_COMPAT, "UTF-8");
return $sFind;
}
function slashfilter($sContent) {
return str_replace(array("\\", "/", ":"), "", $sContent);
}
function debugLog($sContent = "", $sFileName = "/tmp/php_debug.txt") {
// Dump arrays as readable text before writing
if (is_array($sContent)) {
$sContent = print_r($sContent, TRUE);
}
$sMode = "a";
if (file_exists($sFileName)) {
clearstatcache();
$iFilesize = filesize($sFileName);
if (($iFilesize < 1)||($iFilesize > 900000000)) { // 900M
$sMode = "w";
}
}
if ($hFile = @fopen($sFileName, $sMode."b")) {
fwrite($hFile, $sContent);
fclose($hFile);
}
}
foreach ($aURL as $sURL) {
$aInfo = array();
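// Fetch the artist name and the album list
$sArtistPage = file_get_contents($sURL);
$aInfo["Artist"] = getTitle($sArtistPage);
// The album list sits between the "所有专辑" ("All albums") heading and "Recommendation"
$sAlbumAll = getInner($sArtistPage, "所有专辑", "Recommendation");
// Album IDs appear in HTML comments like <!--freemusic/album/result/ID-->
$sPattern = '#<!--freemusic/album/result/(.*?)-->#';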
preg_match_all($sPattern, $sAlbumAll, $aAlbum);
$aAlbumHash = $aAlbum[1];
// Fetch each album page
foreach ($aAlbumHash as $sAlbumHash) {
echo $sAlbumHash;
echo "\n";
$sURL = "http://www.google.cn/music/album?id=".$sAlbumHash;
$sAlbumPage = file_get_contents($sURL);
$aInfo["Album"] = getTitle($sAlbumPage);
$sAlbumDir = implode(DIRECTORY_SEPARATOR, $aInfo);
if (is_dir($sAlbumDir)) {
// Uncomment to skip albums whose directory already exists
// continue;
}
$aSongHash = explode("freemusic/song/result", $sAlbumPage);
unset($aSongHash[0]);
// Process each song row
$iTrack = 0;
foreach ($aSongHash as $sSongPage) {
$iTrack++;
$bNotDown = FALSE;
// Get the song name; &quot; is the entity as it appears in the raw page source (assumed)
$sStart = "&quot;);\">";
$sSongName = strdig($sSongPage, $sStart);
$sSongName = getInner($sSongName, $sStart, "</a>");
$sSongName = html_entity_decode($sSongName, ENT_COMPAT, "UTF-8");
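$aInfo["Song"] = sprintf("%02d", $iTrack)." ".$sSongName;
// Strip \ / : from every name part before building the path, so a title like "AC/DC" cannot create extra directories
$aInfo = array_map("slashfilter", $aInfo);
$sFile = implode(DIRECTORY_SEPARATOR, $aInfo).".mp3";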
$sFile = str_replace("\"", "\\\"", $sFile);
if (file_exists($sFile)&&(filesize($sFile) > 10)) {
colorspan("\n".$sFile." already downloaded, skipping\n", 3);
continue;
}
// Get the download URL; the song ID sits between the first two &quot; entities (assumed)
$sSongHash = getInner($sSongPage, "&quot;", "&quot;");
while (1) {
$sURL = "http://www.google.cn/music/top100/musicdownload?id=".$sSongHash;
$sSongPage = file_get_contents($sURL);
file_put_contents("last.html", $sSongPage);
if (strpos($sSongPage, "暂不支持下载") !== FALSE) { // page text meaning "download not supported yet"
colorspan("\n".$sFile." download not supported yet\n", 4);
$bNotDown = TRUE;
break;
}
$sStart = "<a href=\"";
$sSongURL = strdig($sSongPage, $sStart);
$sSongURL = strdig($sSongURL, $sStart);
$sSongURL = getInner($sSongURL, $sStart, "\">");
$sSongURL = "http://www.google.cn".htmlspecialchars_decode($sSongURL);
$aInfo = array_map("slashfilter", $aInfo);
// A normal page is longer than 1000 bytes and contains no "captcha"; otherwise make the modem redial
if (strlen($sSongPage) > 1000 && strpos($sSongPage, "captcha") === FALSE) {
break;
}
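print_r($aInfo);
colorspan("\nCAPTCHA detected, trying to redial", 1);
// The redial code below only runs once this exit is removed; it targets the
// author's ZTE 831 (adjust address, port and user:password for your modem)
exit;
$sCmd = "curl -s --user user:password http://192.168.0.1:8081/disconnect.cgi";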
exec($sCmd);
do {
echo ".";
sleep(10);
$sCmd = "curl -s --user user:password http://192.168.0.1:8081/connect.html | grep pppstatus=\\'Up";
$sConnect = exec($sCmd);
} while (strlen($sConnect) < 10);
colorspan("\nRedial complete, resuming downloads in 60 seconds", 1);
sleep(60);
}
if (!empty($bNotDown)) {
continue;
}
// Create the target directory if needed and start the download
colorspan($sFile);
echo "\n\n";
$sWgetArg = "";
if (file_exists("limit.txt")) {
// To throttle downloads, create limit.txt next to the script; delete it to lift the limit
$sWgetArg = "--limit-rate=150k ";
}
echo $sCmd = "wget ".$sWgetArg."-c \"".$sSongURL."\" -O \"".$sFile."\"";
echo "\n\n";
debugLog(print_r($aInfo, TRUE), "log.txt");
debugLog($sSongURL."\n", "log.txt");
debugLog($sFile."\n\n", "log.txt");
$sDir = dirname($sFile);
if (!is_dir($sDir)) {
mkdir($sDir, 0777, TRUE);
}
exec($sCmd);
}
}
}
[/phpcode]
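The script builds the wget command by hand. It wraps the URL and path in double quotes and escapes `"` in `$sFile`, but a title containing `$`, a backtick or a backslash would still be interpreted by the shell. The escaped path also no longer matches the file wget actually writes, so `file_exists()` misses it. A minimal alternative is to let PHP's built-in `escapeshellarg()` do the quoting. `buildWgetCommand` is a hypothetical name, and the flags are the ones the script already uses. If you adopt it, drop the manual `str_replace` on `$sFile`.

```php
<?php
// Hypothetical helper: build the same wget call as the script, but let
// escapeshellarg() quote the URL and output path (on POSIX shells it wraps
// each value in single quotes, so $, `, \ and " lose their special meaning).
function buildWgetCommand(string $sSongURL, string $sFile, bool $bLimitRate): string {
    $sCmd = "wget ";
    if ($bLimitRate) {
        $sCmd .= "--limit-rate=150k ";
    }
    return $sCmd . "-c " . escapeshellarg($sSongURL) . " -O " . escapeshellarg($sFile);
}
```

In the download loop this would replace the `$sWgetArg`/`$sCmd` lines with `$sCmd = buildWgetCommand($sSongURL, $sFile, file_exists("limit.txt"));`.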
[b]Update 2009-11-12: script updated. The marker freemusic_album_result in the HTML was changed to freemusic/album/result; I can't see what the point of that change was.[/b]
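One way to survive this kind of renaming is to accept either separator in the pattern. The sketch below assumes the old markers looked like `<!--freemusic_album_result_ID-->`. Only the name `freemusic_album_result` is known; the separator before the ID is a guess, so the pattern accepts `_` or `/` there too. `extractAlbumIds` is a hypothetical helper.

```php
<?php
// Sketch: collect album IDs from HTML comments, accepting both the old
// marker (freemusic_album_result) and the new one (freemusic/album/result).
// The lazy (.*?) stops at the first "-->" after each marker.
function extractAlbumIds(string $sHtml): array {
    $sPattern = '#<!--freemusic[_/]album[_/]result[_/](.*?)-->#';
    preg_match_all($sPattern, $sHtml, $aMatch);
    return $aMatch[1];
}
```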