Make Your Python Application Fly with Multithreading

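Very often we end up writing Python code that makes remote requests, reads multiple files, or processes some data. In many of these cases I see programmers reach for a simple for loop that takes forever to finish. For example: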

import requests
from time import time

url_list = [
    "https://img.wangzhan5u.com/images/10/sydxc400.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc410.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc420.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc430.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc440.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc450.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc460.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc470.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc480.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc490.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc500.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc510.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc520.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc530.jpg",
]


def download_file(url):
    # Request the URL and return only the HTTP status code.
    response = requests.get(url, stream=True)
    return response.status_code


start = time()
# Fetch the URLs one at a time; each request waits for the previous one to finish.
for url in url_list:
    print(download_file(url))
print(f'Time taken: {time() - start}')

Output:

<--truncated-->
Time taken: 4.128157138824463

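This is a realistic example: the code opens each URL, waits for the response, prints its status code, and only then moves on to the next URL. Code like this, which spends most of its time waiting on the network, is a very good candidate for multithreading.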

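Modern systems can run a large number of threads, which means you can work on several tasks at once with very little overhead. Why not use that to make the code above process these URLs faster?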

We will use ThreadPoolExecutor from the concurrent.futures library. It is very easy to use. Let me show you some code first and then explain how it works.

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time

url_list = [
    "https://img.wangzhan5u.com/images/10/sydxc400.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc410.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc420.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc430.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc440.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc450.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc460.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc470.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc480.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc490.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc500.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc510.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc520.jpg",
    "https://img.wangzhan5u.com/images/10/sydxc530.jpg",
]


def download_file(url):
    # Request the URL and return only the HTTP status code.
    response = requests.get(url, stream=True)
    return response.status_code


start = time()
processes = []
# Up to 10 worker threads run download_file concurrently.
with ThreadPoolExecutor(max_workers=10) as executor:
    for url in url_list:
        # submit() schedules the call and returns a Future right away.
        processes.append(executor.submit(download_file, url))
# Leaving the with block waits for all tasks, so every result is ready here.
for task in as_completed(processes):
    print(task.result())
print(f'Time taken: {time() - start}')

Output:

<--truncated-->
Time taken: 0.4583399295806885

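Our code ran nearly 9 times faster (from about 4.13 seconds down to about 0.46 seconds), and we did not do anything particularly involved. With more URLs, the time saved would be even greater.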
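To reproduce the effect without touching the network, here is a minimal sketch that replaces the HTTP request with time.sleep. The names fake_download, run_sequential, and run_threaded are hypothetical helpers invented for this illustration, and the delays are arbitrary; real timings will depend on your machine and connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(delay):
    """Hypothetical stand-in for a network request: it only waits `delay` seconds."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Run the tasks one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_download(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays, max_workers=5):
    """Run the tasks in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_download, d) for d in delays]
        # Collect results in submission order; result() blocks until each is done.
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1] * 5
    _, sequential_time = run_sequential(delays)
    _, threaded_time = run_threaded(delays)
    print(f"Sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Because the threads spend their time waiting rather than computing, the waits overlap, and the threaded run should take roughly as long as the slowest single task instead of the sum of all of them.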

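So what is happening? When we call executor.submit, we add a new task to the thread pool. We store the Future object it returns in the processes list. Later we iterate over that list and print the results.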
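A small sketch of what submit hands back may help. The functions square and fail are hypothetical tasks made up for this illustration; the point is that submit returns a Future immediately, and result() either returns the task's value or re-raises the exception the task raised.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(n):
    """Hypothetical task used only for illustration."""
    return n * n


def fail():
    """Hypothetical task that always raises."""
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)    # returns right away with a Future
    is_future = isinstance(future, Future)
    value = future.result()                # blocks until the task has finished

    failing = executor.submit(fail)
    try:
        failing.result()                   # re-raises the task's exception here
    except ValueError as exc:
        error_message = str(exc)

print(is_future, value, error_message)
```

In the download example this means a failed request would surface when task.result() is called, so that call is a natural place for a try/except.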

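The as_completed function yields the items (tasks) from the processes list as soon as each one completes. A task can reach the completed state for two reasons: it has finished executing, or it was cancelled. We can also pass a timeout argument to as_completed; if some tasks are still unfinished after that many seconds, iterating raises a TimeoutError (concurrent.futures.TimeoutError) instead of yielding them.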
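The following sketch illustrates both behaviours of as_completed under simulated delays. The helpers wait_and_return, completion_order, and finished_before are hypothetical names introduced here, and the delay values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeoutError


def wait_and_return(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


def completion_order(delays):
    """Return results in the order the tasks finish, not the order submitted."""
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(wait_and_return, d) for d in delays]
        # Iterating inside the with block lets results arrive as they finish.
        return [f.result() for f in as_completed(futures)]


def finished_before(delays, timeout):
    """Return the results of tasks that completed within `timeout` seconds."""
    done = []
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(wait_and_return, d) for d in delays]
        try:
            for f in as_completed(futures, timeout=timeout):
                done.append(f.result())
        except FuturesTimeoutError:
            # Some tasks were still running when the timeout expired.
            pass
    # Leaving the with block still waits for the unfinished tasks to end.
    return done


if __name__ == "__main__":
    print(completion_order([0.3, 0.05, 0.15]))
    print(finished_before([0.05, 0.6], timeout=0.3))
```

Note that the timeout only stops the waiting; it does not cancel tasks that are already running.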

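You should explore multithreading further. For small projects it is the quickest way to speed up your code. If you want to learn more, read the official concurrent.futures documentation at https://img.wangzhan5u.com/images/10/sydxcconcurrent.futures.html, which is very helpful.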
