This post was last edited by ttgogo on 2021-05-02 19:38
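Preface

Normally, without extra tools or plugins, the bilibili web client does not seem to offer any way to download videos for free. So I learned how to download bilibili videos (excluding members-only videos) with a Python crawler. The details, and the technique, are below.

While analysing the bilibili page source I found that video and audio are served separately. After downloading, one file has only sound and the other only picture, which is obviously not what we want. The solution is to merge the downloaded audio and video with ffmpeg, a powerful open-source tool. So for the full experience, first download, install and configure ffmpeg: download it from the official website, unzip it, and add its bin folder to the PATH environment variable.

Python modules used: requests, re, json, subprocess, os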
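Everything below depends on ffmpeg being reachable through PATH, so a quick check can save some confusion. A minimal sketch using only the standard library; `find_ffmpeg` is a hypothetical helper, not part of the original script:

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable found via PATH, or None."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        # PATH changes only apply to terminals opened after the change.
        print("ffmpeg not found: add its bin folder to PATH, then open a new terminal")
    else:
        print("ffmpeg found at", path)
```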
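Preparation

The video's URL is obvious and easy to get, and the headers are easy to find too, but we still need one more important piece of information. Inspect the target page in the browser (F12) to find our next target: the download links for the video and the audio. After some searching, the fourth script tag inside head seems to contain what we want. data:image/s3,"s3://crabby-images/c13ed/c13ed49d4380e483478f9e111c0c958d3aa0e1a2" alt="image-20210321234819818"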
Opening this link directly, however, returns 403 Forbidden: we are not allowed to access it. data:image/s3,"s3://crabby-images/2e5c7/2e5c755770a6d0b735169fabb687143972800e1c" alt="image-20210321235102571"
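Why is that? The Request Headers contain no Referer, so I tried adding Referer information to the request to see whether access would then be granted (I went straight to Burp Suite for this). data:image/s3,"s3://crabby-images/01014/0101400de996e7a9360faeed876f21311b1b7f4a" alt="image-20210321235900739"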
After clicking Forward, a file download dialog appears. data:image/s3,"s3://crabby-images/b23ea/b23eae35b18b39375e901857bb0a5299f31e7509" alt="image-20210322000022154"
Opening the downloaded file confirms that it is indeed the target video. data:image/s3,"s3://crabby-images/9135f/9135f17a89a10bcf6cc707f1b9b642227c20eef7" alt="image-20210322000250594"
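The Burp experiment boils down to one extra header on the media request. A minimal sketch of such a request, built with the standard library's `urllib.request` only so that it can be constructed and inspected without sending anything (the post itself uses requests); the header values are the ones used in this post's scripts, and `build_media_request` is a hypothetical name:

```python
import urllib.request

# User-Agent and Referer values copied from the scripts in this post.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
)
REFERER = "https://message.bilibili.com/"


def build_media_request(media_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a Referer header.

    Without the Referer, the media link answered 403 in the test above.
    """
    return urllib.request.Request(
        media_url,
        headers={"User-Agent": USER_AGENT, "Referer": REFERER},
    )


if __name__ == "__main__":
    req = build_media_request("https://example.com/placeholder.m4s")  # placeholder URL
    print(req.get_method(), req.get_header("Referer"))
    # Actually downloading would be: urllib.request.urlopen(req).read()
```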
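Fetching the data

Use the requests library to send a request to the target site. The request must carry headers such as User-Agent and Referer so that it looks like it came from a browser. If the server responds normally, we get a Response object whose body is the page content we want.

Test code:

```python
import requests

# Browser-like headers; the Referer is what the media links require (see the 403 above).
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)  # send a GET request and return the response
    return response


html_data = send_request("https://www.bilibili.com/video/BV1Qy4y147H1").text
print(html_data)
```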
Output: data:image/s3,"s3://crabby-images/a9243/a9243ea0ef58e73fb6dce48d04c43d27b712df1e" alt="image-20210322002017006"
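Parsing the content

The content we get back may be HTML, JSON or another format, and can be parsed with an HTML parsing library, regular expressions and so on. The title is easy to find: it sits right in head. data:image/s3,"s3://crabby-images/55aa5/55aa544efdc3af5284019e55960e43ca964e793f" alt="image-20210322003021075"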
Extract it with a regular expression:

```python
title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
```
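The one-line regex above assumes a bare `<title>` tag and the exact suffix string, and it leaves HTML entities such as `&amp;` undecoded. A sketch of a slightly more tolerant version; cutting at the last `_哔哩哔哩` instead of matching the whole suffix assumes bilibili keeps that prefix, and `extract_title` is a hypothetical name:

```python
import html
import re

# Start of the site suffix appended to page titles (assumption: the rest may change).
SUFFIX_MARKER = "_哔哩哔哩"


def extract_title(html_data: str) -> str:
    """Return the video title from a bilibili page's <title> tag."""
    # [^>]* tolerates attributes on the tag; re.S lets the title span lines.
    match = re.search(r"<title[^>]*>(.*?)</title>", html_data, re.S)
    if match is None:
        raise ValueError("no <title> tag found")
    title = html.unescape(match.group(1)).strip()
    # Drop the last occurrence of the suffix marker and everything after it.
    return title.rsplit(SUFFIX_MARKER, 1)[0].strip()


if __name__ == "__main__":
    sample = "<title>Tom &amp; Jerry_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili</title>"  # made-up page
    print(extract_title(sample))  # Tom & Jerry
```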
The audio and video download links are in the JSON data. data:image/s3,"s3://crabby-images/aa89e/aa89eb11fc4f066a3d835325d7fd560975b0e236" alt="image-20210322003718230"
Extract them with a regular expression, then walk down the dictionary keys and list indexes:

```python
json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
json_data = json.loads(json_data)  # decode the JSON text into Python dicts and lists
audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
```
Test code:

```python
import requests
import re
import json
import pprint

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response


def get_video_data(html_data):
    title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
    json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
    json_data = json.loads(json_data)
    # pprint.pprint(json_data)  # uncomment to inspect the full JSON structure
    audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
    video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
    video_data = [title, audio_url, video_url]
    return video_data


html_data = send_request("https://www.bilibili.com/video/BV1Qy4y147H1").text
video_data = get_video_data(html_data)
for item in video_data:
    print(item)
```
Output: data:image/s3,"s3://crabby-images/37399/373990bbb85633d48191a3965fb09014e4c70ab5" alt="image-20210322004738048"
Saving the data

Download the audio and video through their links and save them locally.

Test code:

```python
import requests
import re
import json

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response


def get_video_data(html_data):
    title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
    json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
    json_data = json.loads(json_data)
    audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
    video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
    video_data = [title, audio_url, video_url]
    return video_data


def save_data(file_name, audio_url, video_url):
    print("Downloading the audio of " + file_name + "...")
    audio_data = send_request(audio_url).content
    print("Finished downloading the audio of " + file_name + "!")
    print("Downloading the video of " + file_name + "...")
    video_data = send_request(video_url).content
    print("Finished downloading the video of " + file_name + "!")
    # bilibili's DASH audio is normally AAC in an MP4 container, not MP3; the .mp3
    # extension is just a label, since ffmpeg detects the format from the contents.
    with open(file_name + ".mp3", "wb") as f:
        f.write(audio_data)
    with open(file_name + ".mp4", "wb") as f:
        f.write(video_data)


html_data = send_request("https://www.bilibili.com/video/BV1Qy4y147H1").text
video_data = get_video_data(html_data)
save_data(video_data[0], video_data[1], video_data[2])
```
Output: data:image/s3,"s3://crabby-images/d9b63/d9b63f85bdb196859d0ccca30c342c6084fa8dcf" alt="image-20210322010005615"
data:image/s3,"s3://crabby-images/e0dbe/e0dbe60ad3bfc8ff46a1eb0d8bb679edcf2b4481" alt="image-20210322010033745"
Merging audio and video

Merge the separate audio and video files. In several tests, running the ffmpeg command with the video title as the file name made ffmpeg fail, and I have not found the cause yet. As a workaround, I first rename the files to simple names such as 1.mp3 and 1.mp4, which lets the merge complete, and then delete them.

Test code:

```python
import requests
import re
import json
import subprocess
import os

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response


def get_video_data(html_data):
    title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
    json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
    json_data = json.loads(json_data)
    audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
    video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
    video_data = [title, audio_url, video_url]
    return video_data


def save_data(file_name, audio_url, video_url):
    print("Downloading the audio of " + file_name + "...")
    audio_data = send_request(audio_url).content
    print("Finished downloading the audio of " + file_name + "!")
    print("Downloading the video of " + file_name + "...")
    video_data = send_request(video_url).content
    print("Finished downloading the video of " + file_name + "!")
    with open(file_name + ".mp3", "wb") as f:
        f.write(audio_data)
    with open(file_name + ".mp4", "wb") as f:
        f.write(video_data)


def merge_data(video_name):
    # Rename to simple names first, since using the title made the ffmpeg call fail
    os.rename(video_name + ".mp3", "1.mp3")
    os.rename(video_name + ".mp4", "1.mp4")
    print("Merging " + video_name + "...")
    # Copy the video stream as-is and encode the audio as AAC into output.mp4
    subprocess.call("ffmpeg -i 1.mp4 -i 1.mp3 -c:v copy -c:a aac -strict experimental output.mp4", shell=True)
    os.rename("output.mp4", video_name + ".mp4")
    os.remove("1.mp3")
    os.remove("1.mp4")
    print("Finished merging " + video_name + "!")


html_data = send_request("https://www.bilibili.com/video/BV1Qy4y147H1").text
video_data = get_video_data(html_data)
save_data(video_data[0], video_data[1], video_data[2])
merge_data(video_data[0])
```
Output: data:image/s3,"s3://crabby-images/888cf/888cf8707ada16219b66e43b19c03716c196a8de" alt="image-20210322010405722"
After merging, the video plays normally with both picture and sound. data:image/s3,"s3://crabby-images/b3fa8/b3fa8ee0bd4627dc03523d19169222a269e1f345" alt="image-20210322010551889"
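The post leaves open why ffmpeg failed when the title was used as the file name. One plausible cause (an assumption, not verified against the author's titles) is `shell=True` with an unquoted command string: a title containing spaces or characters such as `&` gets split or interpreted by the shell. A sketch that passes the same ffmpeg options as an argument list instead, so each path reaches ffmpeg as a single argument; `build_merge_cmd` and `merge` are hypothetical names:

```python
import subprocess
from typing import List


def build_merge_cmd(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Same options as the post's command line, but as a list of arguments."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",             # copy the video stream without re-encoding
        "-c:a", "aac",              # encode the audio as AAC
        "-strict", "experimental",  # kept from the original command
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    # No shell=True: each list item is handed to ffmpeg unchanged, so spaces or
    # shell metacharacters in a title are not re-parsed by a shell.
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_merge_cmd(video_path, audio_path, output_path), check=True)


if __name__ == "__main__":
    title = "My video & more"  # hypothetical title with spaces and a shell metacharacter
    print(build_merge_cmd(title + ".mp4", title + ".mp3", title + " (merged).mp4"))
```

The output gets a different name from the video-only input, since ffmpeg should not write over a file it is still reading. With this approach merge_data might be able to skip the temporary 1.mp3/1.mp4 renames, though that has not been tested against the post's setup.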
Final code

```python
# -*- coding: utf-8 -*-
# @Time : 2021/3/21 16:11
# @Author : wawyw
# @File : bilibili_video.py
# @Software : PyCharm
import requests
import re
import json
import subprocess
import os

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response


def get_video_data(html_data):
    title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
    json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
    json_data = json.loads(json_data)
    audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
    video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
    video_data = [title, audio_url, video_url]
    return video_data


def save_data(file_name, audio_url, video_url):
    print("Downloading the audio of " + file_name + "...")
    audio_data = send_request(audio_url).content
    print("Finished downloading the audio of " + file_name + "!")
    print("Downloading the video of " + file_name + "...")
    video_data = send_request(video_url).content
    print("Finished downloading the video of " + file_name + "!")
    with open(file_name + ".mp3", "wb") as f:
        f.write(audio_data)
    with open(file_name + ".mp4", "wb") as f:
        f.write(video_data)


def merge_data(video_name):
    # Rename to simple names first, since using the title made the ffmpeg call fail
    os.rename(video_name + ".mp3", "1.mp3")
    os.rename(video_name + ".mp4", "1.mp4")
    print("Merging " + video_name + "...")
    subprocess.call("ffmpeg -i 1.mp4 -i 1.mp3 -c:v copy -c:a aac -strict experimental output.mp4", shell=True)
    os.rename("output.mp4", video_name + ".mp4")
    os.remove("1.mp3")
    os.remove("1.mp4")
    print("Finished merging " + video_name + "!")


def main():
    url = input("Enter the bilibili video link to download it: ")
    html_data = send_request(url).text
    video_data = get_video_data(html_data)
    save_data(video_data[0], video_data[1], video_data[2])
    merge_data(video_data[0])


if __name__ == "__main__":
    main()
```
Result: data:image/s3,"s3://crabby-images/660b9/660b9046c66b2cee71b9f2eb35ca00ba7862d092" alt="image-20210322010945179"
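main() passes whatever the user types straight to send_request. A small sketch of a hypothetical helper that accepts either a full link or a bare BV id and rebuilds the page URL, assuming the usual BV-id format ("BV" followed by 10 letters or digits, as in BV1Qy4y147H1). Note that it discards any query string such as `?p=2`:

```python
import re
from typing import Optional

# Assumed BV-id format: "BV" followed by 10 letters or digits.
BV_RE = re.compile(r"BV[0-9A-Za-z]{10}")


def normalize_video_url(user_input: str) -> Optional[str]:
    """Return a video page URL if the input contains a BV id, else None."""
    match = BV_RE.search(user_input)
    if match is None:
        return None
    return "https://www.bilibili.com/video/" + match.group(0)


if __name__ == "__main__":
    print(normalize_video_url("BV1Qy4y147H1"))
```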
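This time we really did get videos for free all the way through, but creating videos is genuinely hard work for bilibili uploaders and we learn a lot from them, so whenever you can, support them with a like, a coin and a favorite~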
Packaging as an exe

First install PyInstaller, directly with pip in cmd:

```
pip install pyinstaller
```
Next, put the ffmpeg folder and the .py file in the same folder. Because ffmpeg is shipped together with the program, the paths in the code need a small change. The modified code is:

```python
import requests
import re
import json
import subprocess
import os
import shutil

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "referer": "https://message.bilibili.com/",
}


def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response


def get_video_data(html_data):
    title = re.findall('<title>(.*?)</title>', html_data)[0].replace("_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili", "")
    json_data = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html_data)[0]
    json_data = json.loads(json_data)
    audio_url = json_data["data"]["dash"]["audio"][0]["backupUrl"][0]
    video_url = json_data["data"]["dash"]["video"][0]["backupUrl"][0]
    video_data = [title, audio_url, video_url]
    return video_data


def save_data(file_name, audio_url, video_url):
    print("Downloading the audio of " + file_name + "...")
    audio_data = send_request(audio_url).content
    print("Finished downloading the audio of " + file_name + "!")
    print("Downloading the video of " + file_name + "...")
    video_data = send_request(video_url).content
    print("Finished downloading the video of " + file_name + "!")
    with open(file_name + ".mp3", "wb") as f:
        f.write(audio_data)
    with open(file_name + ".mp4", "wb") as f:
        f.write(video_data)


def merge_data(video_name):
    # Use the bundled ffmpeg: move the files into ffmpeg/bin next to ffmpeg.exe
    os.rename(video_name + ".mp3", "1.mp3")
    os.rename(video_name + ".mp4", "1.mp4")
    shutil.move("1.mp3", "ffmpeg/bin/1.mp3")
    shutil.move("1.mp4", "ffmpeg/bin/1.mp4")
    print("Merging " + video_name + "...")
    os.chdir("ffmpeg/bin/")
    # cmd searches the current directory first, so this runs ffmpeg/bin/ffmpeg.exe
    subprocess.call("ffmpeg -i 1.mp4 -i 1.mp3 -c:v copy -c:a aac -strict experimental output.mp4", shell=True)
    os.rename("output.mp4", video_name + ".mp4")
    os.remove("1.mp3")
    os.remove("1.mp4")
    # Move the merged video back up to the program's folder
    shutil.move("%s.mp4" % video_name, "../../%s.mp4" % video_name)
    print("Finished merging " + video_name + "!")


def main():
    url = input("Enter the bilibili video link to download it:\n")
    html_data = send_request(url).text
    video_data = get_video_data(html_data)
    save_data(video_data[0], video_data[1], video_data[2])
    merge_data(video_data[0])


if __name__ == "__main__":
    main()
```
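The modified script relies on os.chdir and on the working directory being the exe's folder. An alternative sketch (not what the post does) resolves ffmpeg relative to the program itself, so its absolute path could be passed as the first element of a subprocess argument list without changing directory. Under PyInstaller, `sys.frozen` is set and `sys.executable` is the exe; the ffmpeg/bin layout is the one used in this post, and both helper names are hypothetical:

```python
import os
import sys
from typing import Optional


def app_dir() -> str:
    """Folder of the running program: the .exe when frozen, else the script."""
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys.frozen; sys.executable is then the .exe itself.
        return os.path.dirname(sys.executable)
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def ffmpeg_exe(base_dir: Optional[str] = None) -> str:
    """Path of ffmpeg inside the ffmpeg/bin folder placed next to the program."""
    name = "ffmpeg.exe" if os.name == "nt" else "ffmpeg"
    return os.path.join(base_dir or app_dir(), "ffmpeg", "bin", name)


if __name__ == "__main__":
    print(ffmpeg_exe())
```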
Once the code is modified, switch cmd to the folder where we just put the files and run:

```
pyinstaller -F -i bilibili.ico bilibili_video_download.py
```
data:image/s3,"s3://crabby-images/f26dc/f26dc09b48f9be13185c1b5d9158ee967508d664" alt="image-20210417235918679"
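(Here -i bilibili.ico sets the program's icon and is optional.) When the command finishes, several new folders appear in the current directory. Open the one named dist: it contains an executable named bilibili_video_download that carries the icon we set. Move this exe up one level, into the same folder as ffmpeg. data:image/s3,"s3://crabby-images/df8dc/df8dc9a20f2c7c8d2b5d02e544c4484f2bef6400" alt="image-20210418000736927"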
Run the exe and enter a video URL to download it. data:image/s3,"s3://crabby-images/78951/78951f15884104d91da8097e7b38d4cf0ec68fb0" alt="image-20210418001107366"
Download complete! data:image/s3,"s3://crabby-images/2e798/2e798057125d4ad96b977403b53f6ffd7d7cb753" alt="image-20210418001228931"
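All related resources are available through the download link attached to this post for anyone who needs them. After downloading, extract the archive, run bilibili_video_download.exe and enter the video's link to download it.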