<template>
  <div>
    <!-- Title -->
    <h1>🤗 Hugging Face</h1>
    <br />
    <p></p>
    <p>
      In my view, Hugging Face is the GitHub of large models: besides pretrained model weights, it also hosts many open-source datasets. Click 👉<a
        href="https://huggingface.co/"
        style="color: #002fa7; font-weight: bolder; text-decoration: none">here</a>👈 to visit the Hugging Face homepage.
    </p>
    <p>
      If you do not have a proxy or VPN, you can try the mirror site hosted in mainland China instead: 👉<a href="https://hf-mirror.com/"
        style="color: #002fa7; font-weight: bolder; text-decoration: none">hf-mirror</a>
    </p>
    <p>If you see the page below, the site is reachable:</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <el-card>
          <img src="./img/huggingface.png" alt="" style="width: 100%" />
        </el-card>
      </el-col>
    </el-row>
    <el-divider></el-divider>

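    <p>Model weights are often very large. The largest model I have downloaded is Qwen-72B-Chat, at over 140 GB, so the download puts real demands on your network.
      As noted above, without a proxy you cannot even open the Hugging Face site, and even with one, unstable connections often cause downloads to fail.
      There are many tutorials online about downloading Hugging Face model weights; I have read a lot of them and hit most of the pitfalls so you don't have to.
      Below I use the Qwen-7B base model as an example to walk through four download methods. Please bear with any gaps in the write-up.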
    </p>
    <el-divider></el-divider>
    <!-- Method 1 -->
    <h2>1️⃣ Download directly from the web page</h2>
    <br>
    <p>Open the 👉<a href="https://huggingface.co/Qwen/Qwen-7B"
        style="color: #002fa7; font-weight: bolder; text-decoration: none" target="_blank">Qwen-7B</a>
      page on Hugging Face (or on the mirror site).</p>
    <p>
      Click the Files and versions tab to see the Qwen model files:
    </p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <el-card>
          <img src="./img/Qwen.png" alt="" style="width: 100%" />
        </el-card>
      </el-col>
    </el-row>
    <p>Then click the
      <svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true"
        focusable="false" role="img" width="1em" height="1em" viewBox="0 0 32 32">
        <path fill="currentColor"
          d="M26 24v4H6v-4H4v4a2 2 0 0 0 2 2h20a2 2 0 0 0 2-2v-4zm0-10l-1.41-1.41L17 20.17V2h-2v18.17l-7.59-7.58L6 14l10 10l10-10z">
        </path>
      </svg>
      button to the right of a file name to download that file through your browser.
    </p>
    <p>
      This method works when the repository has only a few files. With nested directories or hundreds of GB of weights, it is no longer practical.
    </p>
    <el-divider />
    <h2>2️⃣ Download with a script</h2>
    <br>
    <p>A site called <a href="https://aliendao.cn/#/" target="_blank">aliendao.cn (“异形岛”)</a> mirrors a number of mainstream Hugging Face models, which can be downloaded with a Python script.</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <el-card>
          <img src="./img/异形岛.png" alt="" style="width: 100%" />
        </el-card>
      </el-col>
    </el-row>
    <p>
      The script is shown below. You can create a .py file and paste it in, or download it directly from this 👉 <a href="https://e.aliendao.cn/model_download.py" target="_blank">link</a>:
    </p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <codemirror ref="codeEditor" :value="code" :options="editorOptions" readonly
          style="border-radius: 10px; overflow: hidden;font-weight: bolder;font-size: large;" />
      </el-col>
    </el-row>
    <br>
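    <p>Then change into the directory that contains the script and run: python model_download.py --repo_id Qwen/Qwen-7B to download the Qwen-7B (Tongyi Qianwen) model. To download a different model, just change the value of --repo_id.</p>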

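    <p>This method is not fast, 🚅 about 600 KB/s, but it is stable. I recommend it 😏 especially for models that are hundreds of GB. Subscribing to the site's enterprise tier raises the speed.</p>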

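    <p>😥 The big drawback is that the newest models on Hugging Face may not have been synced to this site yet, in which case the script reports a token error. Most well-known models download fine ⏬️.</p>
    <!-- Method 3 -->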
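    <p>Part of why the script copes with flaky connections is that its _download_file_resumable function continues a partial file with an HTTP Range request instead of starting over. The sketch below isolates just that idea. The helper name resume_headers and the file names are made up for illustration and are not part of the script.</p>
    <pre v-pre>
```python
import os
import tempfile


def resume_headers(save_path):
    """Return HTTP headers that ask the server only for the bytes still missing.

    Hypothetical helper: it mirrors the idea in the script, not its exact code.
    """
    offset = os.path.getsize(save_path) if os.path.exists(save_path) else 0
    if offset == 0:
        return {}  # nothing on disk yet, request the whole file
    # An open-ended range means "from this byte to the end of the file".
    return {"Range": "bytes=" + str(offset) + "-"}


# Simulate a download that was interrupted after 5 bytes.
with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "model.bin")
    with open(partial, "wb") as f:
        f.write(b"12345")
    print(resume_headers(partial))
    print(resume_headers(os.path.join(tmp, "missing.bin")))
```
    </pre>
    <p>The returned dictionary would be passed as headers to the HTTP client, and the response body appended to the existing file.</p>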
    <el-divider />
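    <h2>3️⃣ Use the official CLI, pointed at the mirror site</h2>
    <p>The fatal weakness of this method is that downloads are unstable and often get interrupted, which makes it hard to use.</p>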
    <p>
      First, install the official CLI:
    </p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code1"></linux_shell>
      </el-col>
    </el-row>
    <br>
    <p>
      Because huggingface.co is not reachable, point the CLI at the mirror site by setting the HF_ENDPOINT environment variable:
    </p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code2"></linux_shell>
      </el-col>
    </el-row>
    <br>

    <p>
      Once installed, the CLI is ready to use. First decide which model you want; for example, the download command for <a href="https://huggingface.co/Qwen/Qwen-7B"
        style="color: #002fa7; font-weight: bolder; text-decoration: none" target="_blank">Qwen-7B</a> is:
    </p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code3"></linux_shell>
      </el-col>
    </el-row>
    <br>
    <p>Here is a breakdown of the command: 💪✨</p>

    <p>
      <ul>
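        <li>💡🚀huggingface-cli: the command-line tool provided by Hugging Face for working with the models and datasets on the Hugging Face Hub.</li>
        <li>💻📝download: the subcommand that downloads resources from the Hub.</li>
        <li>💥🚫--resume-download: if a download is interrupted, resume from where it stopped instead of starting over.</li>
        <li>📂👀--local-dir-use-symlinks False: do not create symlinks in the local directory. The model files are placed directly in the target directory rather than as links pointing to their original location.</li>
        <li>👨‍💼👩‍💼Qwen/Qwen-7B: the owner name and the model name, here the model named Qwen-7B published by the Qwen account.</li>
        <li>💾🏠--local-dir: the local directory the model is downloaded to and saved in. The name that follows the flag is arbitrary.</li>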
      </ul>
    </p>
    <p>Some models also require an access token for download. You can get a token 👉<a href="https://huggingface.co/settings/tokens"
        style="color: #002fa7; font-weight: bolder; text-decoration: none" target="_blank">here</a>👈</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code4"></linux_shell>
      </el-col>
    </el-row>
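    <p>If you download several models, only the repository ID, the target directory, and sometimes the token change between commands. As a rough sketch, the command can be assembled in Python from the same flags explained above. The function name build_download_command is hypothetical, and nothing here calls the CLI; it only builds the argument list.</p>
    <pre v-pre>
```python
import shlex


def build_download_command(repo_id, local_dir, token=None):
    """Assemble the huggingface-cli download command as an argument list."""
    argv = ["huggingface-cli", "download"]
    if token:
        # Only models that require authentication need an access token.
        argv += ["--token", token]
    argv += [
        "--resume-download",
        "--local-dir-use-symlinks", "False",
        repo_id,
        "--local-dir", local_dir,
    ]
    return argv


command = build_download_command("Qwen/Qwen-7B", "Qwen-7B")
print(shlex.join(command))
```
    </pre>
    <p>To actually run it, the list could be handed to subprocess.run(command, check=True) after HF_ENDPOINT has been exported in the environment.</p>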
    <br>
    <el-divider />
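    <h2>4️⃣ Download with git</h2>
    <p>This method clones the repository over HTTPS. Unfortunately git has no resume mechanism: if the clone fails, you have to delete everything and start over. Git also keeps its own copy of the files, so downloading a 100 GB model actually takes about 200 GB of disk space. I don't really recommend it, although git downloads are fast.</p>
    <p>For installing git, see the 👉<a href="https://git-scm.com/"
        style="color: #002fa7; font-weight: bolder; text-decoration: none" target="_blank">official documentation</a>👈; I won't go into it here.</p>
    <p>After git is installed, first set up Git LFS, the extension git uses to download large files:</p>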
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code5"></linux_shell>
      </el-col>
    </el-row>
    <br>
    <p>Open the Hugging Face site, find the model you want, and click Clone repository</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <el-card>
          <img src="./img/gitdown.png" alt="" style="width: 100%" />
        </el-card>
      </el-col>
    </el-row>
    <br>
    <p>Then copy the HTTPS clone URL</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="12">
        <el-card>
          <img src="./img/gitdown1.png" alt="" style="width: 100%" />
        </el-card>
      </el-col>
    </el-row>
    <br>
    <p>Paste it into your git terminal, but remember to replace huggingface.co with hf-mirror.com so that the clone succeeds</p>
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code6"></linux_shell>
      </el-col>
    </el-row>
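    <p>If you clone many repositories, the host swap is easy to script. This is a minimal sketch using only the Python standard library; the function name to_mirror_url is made up for illustration.</p>
    <pre v-pre>
```python
from urllib.parse import urlsplit, urlunsplit

MIRROR_HOST = "hf-mirror.com"


def to_mirror_url(clone_url, mirror_host=MIRROR_HOST):
    """Swap huggingface.co for the mirror host, leaving the repo path untouched."""
    parts = urlsplit(clone_url)
    if parts.netloc != "huggingface.co":
        return clone_url  # not a Hugging Face URL, leave it alone
    return urlunsplit(parts._replace(netloc=mirror_host))


print(to_mirror_url("https://huggingface.co/Qwen/Qwen-7B"))
```
    </pre>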
    <br>
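    <p>Setting GIT_LFS_SKIP_SMUDGE=1 tells git to skip downloading the large files, that is, the LFS files. If that is what you want, put this variable in front of the clone command:</p>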
    <br>
    <el-row type="flex" justify="center">
      <el-col :span="16">
        <linux_shell :linuxCode="code7"></linux_shell>
      </el-col>
    </el-row>
    <br>



  </div>
</template>

<script>
import linux_shell from '@/components/myLinux'
export default {
  name: "Huggingface",
  components: {
    linux_shell
  },
  data() {
    return {
      code: `# usage     : python model_download.py --repo_id repo_id
# example   : python model_download.py --repo_id facebook/opt-350m
import argparse
import time
import requests
import json
import os
from huggingface_hub import snapshot_download
import platform
from tqdm import tqdm
from urllib.request import urlretrieve


def _log(_repo_id, _type, _msg):
    date1 = time.strftime('%Y-%m-%d %H:%M:%S')
    print(date1 + " " + _repo_id + " " + _type + " :" + _msg)


def _download_model(_repo_id, _repo_type):
    if _repo_type == "model":
        _local_dir = 'dataroot/models/' + _repo_id
    else:
        _local_dir = 'dataroot/datasets/' + _repo_id
    try:
        if _check_Completed(_repo_id, _local_dir):
            return True, "check_Completed ok"
    except Exception as e:
        return False, "check_Complete exception," + str(e)
    _cache_dir = 'caches/' + _repo_id

    _local_dir_use_symlinks = True
    if platform.system().lower() == 'windows':
        _local_dir_use_symlinks = False
    try:
        if _repo_type == "model":
            snapshot_download(repo_id=_repo_id, cache_dir=_cache_dir, local_dir=_local_dir, local_dir_use_symlinks=_local_dir_use_symlinks,
                              resume_download=True, max_workers=4)
        else:
            snapshot_download(repo_id=_repo_id, cache_dir=_cache_dir, local_dir=_local_dir, local_dir_use_symlinks=_local_dir_use_symlinks,
                              resume_download=True, max_workers=4, repo_type="dataset")
    except Exception as e:
        error_msg = str(e)
        if ("401 Client Error" in error_msg):
            return True, error_msg
        else:
            return False, error_msg
    _removeHintFile(_local_dir)
    return True, ""


def _writeHintFile(_local_dir):
    file_path = _local_dir + '/~incomplete.txt'
    if not os.path.exists(file_path):
        if not os.path.exists(_local_dir):
            os.makedirs(_local_dir)
        open(file_path, 'w').close()


def _removeHintFile(_local_dir):
    file_path = _local_dir + '/~incomplete.txt'
    if os.path.exists(file_path):
        os.remove(file_path)


def _check_Completed(_repo_id, _local_dir):
    _writeHintFile(_local_dir)
    url = 'https://huggingface.co/api/models/' + _repo_id
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
    else:
        return False
    for sibling in data["siblings"]:
        if not os.path.exists(_local_dir + "/" + sibling["rfilename"]):
            return False
    _removeHintFile(_local_dir)
    return True


def download_model_retry(_repo_id, _repo_type):
    i = 0
    flag = False
    msg = ""
    while True:
        flag, msg = _download_model(_repo_id, _repo_type)
        if flag:
            _log(_repo_id, "success", msg)
            break
        else:
            _log(_repo_id, "fail", msg)
            if i > 1440:
                msg = "retry over one day"
                _log(_repo_id, "fail", msg)
                break
            timeout = 60
            time.sleep(timeout)
            i = i + 1
            _log(_repo_id, "retry", str(i))
    return flag, msg


def _fetchFileList(files):
    _files = []
    for file in files:
        if file['type'] == 'dir':
            filesUrl = 'https://e.aliendao.cn/' + file['path'] + '?json=true'
            response = requests.get(filesUrl)
            if response.status_code == 200:
                data = json.loads(response.text)
                for file1 in data['data']['files']:
                    if file1['type'] == 'dir':
                        filesUrl = 'https://e.aliendao.cn/' + \
                            file1['path'] + '?json=true'
                        response = requests.get(filesUrl)
                        if response.status_code == 200:
                            data = json.loads(response.text)
                            for file2 in data['data']['files']:
                                _files.append(file2)
                    else:
                        _files.append(file1)
        else:
            if file['name'] != '.gitattributes':
                _files.append(file)
    return _files


def _download_file_resumable(url, save_path, i, j, chunk_size=1024*1024):
    headers = {}
    r = requests.get(url, headers=headers, stream=True, timeout=(20, 60))
    if r.status_code == 403:
        _log(url, "download", 'Download failed; please use a valid token')
        return False
    bar_format = '{desc}{percentage:3.0f}%|{bar}|{n_fmt}M/{total_fmt}M [{elapsed}<{remaining}, {rate_fmt}]'
    _desc = str(i) + ' of ' + str(j) + '(' + save_path.split('/')[-1] + ')'
    total_length = int(r.headers.get('content-length'))
    if os.path.exists(save_path):
        temp_size = os.path.getsize(save_path)
    else:
        temp_size = 0
    retries = 0
    if temp_size >= total_length:
        return True
    # Small file: write it in one pass and show a completed progress bar
    if total_length < chunk_size:
        with open(save_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
        with tqdm(total=1, desc=_desc, unit='MB', bar_format=bar_format) as pbar:
            pbar.update(1)
    else:
        headers['Range'] = f'bytes={temp_size}-{total_length}'
        r = requests.get(url, headers=headers, stream=True,
                         verify=False, timeout=(20, 60))
        data_size = round(total_length / 1024 / 1024)
        with open(save_path, 'ab') as fd:
            fd.seek(temp_size)
            initial = temp_size//chunk_size
            for chunk in tqdm(iterable=r.iter_content(chunk_size=chunk_size), initial=initial, total=data_size, desc=_desc, unit='MB', bar_format=bar_format):
                if chunk:
                    temp_size += len(chunk)
                    fd.write(chunk)
                    fd.flush()
    return True


def _download_model_from_mirror(_repo_id, _repo_type, _token, _e):
    if _repo_type == "model":
        filesUrl = 'https://e.aliendao.cn/models/' + _repo_id + '?json=true'
    else:
        filesUrl = 'https://e.aliendao.cn/datasets/' + _repo_id + '?json=true'
    response = requests.get(filesUrl)
    if response.status_code != 200:
        _log(_repo_id, "mirror", str(response.status_code))
        return False
    data = json.loads(response.text)
    files = data['data']['files']
    for file in files:
        if file['name'] == '~incomplete.txt':
            _log(_repo_id, "mirror", 'downloading')
            return False
    files = _fetchFileList(files)
    i = 0  # incremented before each download, so the progress label starts at 1
    for file in files:
        url = 'http://61.133.217.142:20800/download' + file['path']
        if _e:
            url = 'http://61.133.217.139:20800/download' + \
                file['path'] + "?token=" + _token
        file_name = 'dataroot/' + file['path']
        if not os.path.exists(os.path.dirname(file_name)):
            os.makedirs(os.path.dirname(file_name))
        i = i + 1
        if not _download_file_resumable(url, file_name, i, len(files)):
            return False
    return True


def download_model_from_mirror(_repo_id, _repo_type, _token, _e):
    if _download_model_from_mirror(_repo_id, _repo_type, _token, _e):
        return
    else:
        #return download_model_retry(_repo_id, _repo_type)
        _log(_repo_id, "download", 'Download failed; please use a valid token')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--repo_id', default=None, type=str, required=True)
    parser.add_argument('--repo_type', default="model",
                        type=str, required=False)  # "model" or "dataset"
    # --mirror downloads from the aliendao.cn mirror; the fallback to hf is currently commented out
    # Defaults to True
    parser.add_argument('--mirror', action='store_true',
                        default=True, required=False)
    parser.add_argument('--token', default="", type=str, required=False)
    # --e selects the paid enterprise tier
    parser.add_argument('--e', action='store_true',
                        default=False, required=False)
    args = parser.parse_args()
    if args.mirror:
        download_model_from_mirror(
            args.repo_id, args.repo_type, args.token, args.e)
    else:
        download_model_retry(args.repo_id, args.repo_type)`,
      code1: 'pip install -U huggingface_hub',
      code2: 'export HF_ENDPOINT=https://hf-mirror.com',
      code3: 'huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen-7B --local-dir bloom-560mCopy',
      code4:'huggingface-cli download --token hf_*** --resume-download --local-dir-use-symlinks False meta-llama/Llama-2-7b-hf --local-dir Llama-2-7b-hf',
      code5:'git lfs install',
      code6:'git clone https://hf-mirror.com/Qwen/Qwen-7B',
      code7:'GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/Qwen/Qwen-7B',
      editorOptions: {
        mode: 'python',
        theme: 'blackboard',
        lineNumbers: true,
        readOnly: true,
        styleActiveLine: true, // optional: highlight the active line
        fontFamily: '"Microsoft YaHei", monospace', // add Microsoft YaHei as a font option
      }
    }
  }
};
</script>

<style scoped>
.container {
  width: 100%;
  height: 100%;
  background: repeating-linear-gradient(45deg,
      #92c9b1,
      #92c9b1 20px,
      #b3e0d2 20px,
      #b3e0d2 40px);
}

p {
  font-weight: bolder;
  margin-top: 10px;
}

a {
  color: #002fa7;
  font-weight: bolder;
  text-decoration: none
}
li{
  margin-top: 5px;
}
</style>