
A Simple Test of Scanning Website Directories with Python


As a beginner, I tried writing a simple Python script to scan a website's directories: any page that does not return a 404 has its directory name saved to a txt file, while 404 pages (i.e., nonexistent ones) are discarded. If the site sits behind a firewall you may get a lot of false positives, or the scan may simply hang, so feel free to optimize it further. The simple code is as follows:

import requests
import time

def getstatus(path, urls):
    url = urls + path
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    proxies = {'http': None, 'https': None}
    try:
        # timeout keeps the scan from hanging forever on a single request
        response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
        print(url)
        if response.status_code == 404:
            print(response.status_code)
        else:
            # Append any non-404 path to the result file
            with open('h2bcc-result.txt', 'a') as f2:
                f2.write('\n' + path)
            print(response.status_code)
    except Exception:
        # Back off briefly on errors, then move on
        # (the original slept 5000 seconds, which effectively froze the scan)
        time.sleep(5)





urls = input('Enter the site URL you want to scan: ')

# Write a header line for this scan into the result file
with open('h2bcc-result.txt', 'a') as f1:
    f1.write('\n' + '====' + urls + '====')

# Read the wordlist, skipping blank lines and comments
with open('h2bcc.txt') as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        getstatus(line, urls)

In the code above, h2bcc.txt is your directory wordlist, and h2bcc-result.txt is the text file where the results are saved.
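As mentioned above, a firewall or catch-all error page that answers every request with 200 will flood the results with false positives. One common mitigation (a sketch of my own, not part of the original script) is to first request a random path that almost certainly does not exist, record the length of that baseline response, and discard any "hit" whose body is roughly the same size. The `is_real_hit` helper and the 5% tolerance below are illustrative assumptions:

```python
import random
import string

def random_path(length=12):
    """Generate a path that is very unlikely to exist on the target."""
    return '/' + ''.join(random.choices(string.ascii_lowercase, k=length))

def is_real_hit(status, body_len, baseline_len, tolerance=0.05):
    """Treat a response as a real hit only if it is non-404 AND its body
    length differs from the catch-all baseline by more than `tolerance`."""
    if status == 404:
        return False
    if baseline_len == 0:
        return True
    return abs(body_len - baseline_len) / baseline_len > tolerance

# Example: suppose the baseline "soft 404" page is 5120 bytes; a 200 response
# of nearly the same size is probably just the same catch-all page.
print(is_real_hit(200, 5100, 5120))  # same page -> False
print(is_real_hit(200, 9800, 5120))  # clearly different -> True
```

To use it, fetch `random_path()` once before the scan, save `len(response.content)` as the baseline, and call `is_real_hit` on each candidate before writing it to the result file.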

Related tags: python, scanning, crawler
Looking forward to your incisive comments. Come on!