Terminology

  • Language Model
  • Large Language Model
  • GPT
  • LLM
  • Tokenization
  • Vector
  • Embedding

Concepts

Quantization

Large models are, first and foremost, very large, mainly because the underlying deep neural network contains an enormous number of parameters and weights. GPT-2 has 1.5 billion parameters; stored as 32-bit floats, that already requires about 6 GB of memory (4 bytes * 1,500,000,000 = 6 GB RAM). A 65-billion-parameter model such as LLaMA requires 260 GB, and continuous inputs and outputs demand even more on top of that, so an ordinary personal computer simply cannot run models of this size.
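A quick back-of-the-envelope check of this arithmetic (a minimal sketch; the parameter counts are rounded and the estimate ignores activations, the KV cache, and framework overhead):

def model_memory_gb(num_params, bytes_per_param=4):
    # Memory needed just to hold the weights, in gigabytes
    return num_params * bytes_per_param / 1e9

print(model_memory_gb(1.5e9))      # GPT-2, float32: ~6 GB
print(model_memory_gb(65e9))       # 65B model, float32: ~260 GB
print(model_memory_gb(65e9, 2))    # 65B model, float16/bfloat16: ~130 GB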

During model inference, the bandwidth between RAM and the CPU, rather than the number or speed of the processing cores, is usually the performance bottleneck, because the processor can be starved of data while executing its work. One way to ease this is to use smaller data types, such as 16-bit floats, which cut both RAM usage and memory bandwidth. NVIDIA's recent hardware supports the bfloat16 data type, which keeps the full exponent range of float32 but gives up roughly two thirds of its precision. Research has shown that this strikes a good balance between quality and performance, and that models are not particularly sensitive to the loss in precision.

Format     Significand   Exponent
bfloat16   8 bits        8 bits
float16    11 bits       5 bits
float32    24 bits       8 bits
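In PyTorch, torch.finfo makes the trade-off visible: bfloat16 keeps roughly the same representable range as float32 (same exponent width) but much coarser precision, while float16 has finer precision but a far smaller range. A minimal sketch (the exact formatting of the output depends on the PyTorch version):

import torch

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # eps reflects precision (significand width); max reflects exponent range
    print(dtype, "eps:", info.eps, "max:", info.max)

# torch.float32   eps: ~1.19e-07  max: ~3.40e+38
# torch.float16   eps: ~9.77e-04  max: 65504.0
# torch.bfloat16  eps: ~7.81e-03  max: ~3.39e+38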

The torch_dtype argument defaults to float32, but float16 or bfloat16 can also be specified.
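For example, when loading a model with the transformers library, the dtype is passed through from_pretrained; a minimal sketch (gpt2 is used here only as a stand-in model id, any causal LM works the same way):

import torch
from transformers import AutoModelForCausalLM

# Load the weights directly in bfloat16 instead of the default float32,
# roughly halving the memory needed for the parameters.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
print(model.dtype)  # torch.bfloat16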

Although you can go further, from 16 bits down to 8 or even 4 bits, hardware-accelerated floating-point arithmetic is generally not available at those widths. To still benefit from hardware acceleration, you can switch to small integers together with vectorized instruction sets. For example, Intel's AVX (Advanced Vector Extensions) instructions provide hardware acceleration for integer arithmetic, enabling fast vector operations.

A simple quantization approach is post-training quantization. Concretely, you first find the minimum and maximum of the model's weights, then split that range into buckets according to what your integer type can represent (an 8-bit integer has 256 distinct values, a 4-bit integer has 16), and finally map each floating-point weight to the nearest integer bucket.
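A minimal sketch of this idea with 8-bit buckets (real implementations typically quantize per layer, per channel, or per group, and handle outliers more carefully):

import torch

def quantize_uint8(weights):
    # Map the float range [min, max] onto the 256 buckets of an 8-bit integer
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255
    q = torch.round((weights - w_min) / scale).to(torch.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Recover an approximation of the original floating-point weights
    return q.to(torch.float32) * scale + w_min

w = torch.randn(4, 4)
q, scale, w_min = quantize_uint8(w)
print((w - dequantize(q, scale, w_min)).abs().max())  # small rounding error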

In this way, even a model originally trained with 32-bit floats can be converted into a smaller integer format while still benefiting from hardware-accelerated instruction sets. This not only shrinks the model but also speeds up inference, especially on resource-constrained devices.

In deep learning and machine learning, models are frequently quantized to speed up computation and reduce storage requirements. Quantization converts the model's floating-point weights and activations into smaller fixed-point integers, which usually lowers computational complexity and improves performance on edge devices.

Model Types

GGML

GGML is a library dedicated to running machine learning models. In programming, a library is a collection of precompiled code that multiple programs can share; by using libraries, developers avoid writing everything from scratch and can work faster and more efficiently.

GGML's distinguishing feature is that it runs efficiently on the CPU. Traditionally, models like these are run on GPUs (graphics processing units), which are extremely good at parallel computation and therefore well suited to the heavy math involved. However, dedicated GPUs with large amounts of memory are very expensive. GGML, in contrast, achieves acceptable speed on commodity hardware.

Here, "commodity hardware" means computing resources that are relatively cheap and easy to obtain, as opposed to custom-built or high-end hardware.

GPTQ

GPT Quantization comes from this paper: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Generative Pre-trained Transformer models, known as GPT or OPT, stand out for their breakthrough performance on complex language modeling tasks, but they also carry extremely high computational and storage costs. In particular, because of their sheer size, even inference with large, highly accurate GPT models may require multiple high-performance GPUs, which limits how usable these models are. Although model compression work aims to relieve this pressure, the applicability and performance of existing compression techniques are limited by the scale and complexity of GPT models.

In this paper, we propose GPTQ, a new one-shot weight quantization method based on approximate second-order information that is both highly accurate and highly efficient. Concretely, GPTQ can quantize a GPT model with 175 billion parameters in roughly four GPU hours, reducing the bit width per weight to 3 or 4 bits with almost no loss of accuracy relative to the uncompressed baseline. Compared with previously proposed one-shot quantization methods, our approach more than doubles the compression gains while preserving accuracy, which allows us, for the first time, to run generative inference on a 175-billion-parameter model within a single GPU.

Moreover, we show that our method still delivers reasonable accuracy under extreme quantization, where weights are quantized to 2 bits or even ternary levels. Experiments show that these improvements yield end-to-end inference speedups of roughly 3.25x on a high-end GPU (NVIDIA A100) and 4.5x on a more cost-effective GPU (NVIDIA A6000).

Overall, GPTQ's one-shot weight quantization successfully reduces the computational and storage costs of GPT models with essentially no loss of accuracy, which goes some way toward solving their usability problem. Compared with GGML, GPTQ focuses more on optimizing specific (especially very large) models: it makes it possible to run larger models on a single GPU and to achieve faster inference on both high-end and more cost-effective GPUs.
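In practice, GPTQ-quantized checkpoints published on the Hugging Face Hub can usually be loaded through transformers with the GPTQ integration installed (a hedged sketch; the repository name below is a hypothetical placeholder, and the exact packages required, such as optimum and auto-gptq, depend on your environment):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/llama-2-13b-gptq"  # hypothetical 4-bit GPTQ repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantized weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")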

GGUF

GGUF (formerly GGML) is a quantization format that lets users run large language models (LLMs) on the CPU, while also allowing some layers to be offloaded to the GPU for extra speed. Inference on the CPU is usually slower than on a GPU, but for anyone running models on a CPU or on Apple devices it is an excellent format.
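For example, with the llama-cpp-python bindings a GGUF file can be run primarily on the CPU while offloading some layers to the GPU (a sketch; the file path and layer count are placeholders to adjust for your own model and hardware):

from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,   # number of layers to offload to the GPU; 0 = pure CPU
    n_ctx=2048,        # context window size
)
output = llm("Q: What is quantization? A:", max_tokens=64)
print(output["choices"][0]["text"])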

HF

HF here refers to the original, unquantized Hugging Face Transformers checkpoint format (typically float16 or float32 weights).

Model Selection Recommendations

Best performance: if you are after maximum performance, choose a machine with a high-end GPU (such as NVIDIA's latest RTX 4090) or a dual-GPU setup so it can accommodate the largest models (65B and 70B). A system with sufficient RAM (16 GB minimum, 64 GB ideally) is the best choice.

On a budget: if your budget is limited, focus on the WizardLM GGML/GGUF models that fit within system RAM. Keep in mind that although you can offload some of the weights to system RAM, this comes at a cost in performance.

CPU Requirements

For best performance, a modern multi-core CPU is recommended. An Intel Core i7 from the 8th generation onward or an AMD Ryzen 5 from the 3rd generation onward are both good choices. A CPU with 6 or 8 cores is ideal. Higher clock speeds also improve how quickly instructions are processed, so 3.6 GHz or faster is recommended.

CPU instruction sets such as AVX, AVX2, and AVX-512, where available, can further improve performance. The key is a reasonably modern consumer CPU with a decent core count and clock speed, along with baseline vector-processing support at least at the AVX2 level (which is required for CPU inference with llama.cpp). A CPU with these specifications should handle the WizardLM model sizes well.
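On Linux you can check which of these instruction sets the CPU exposes by reading /proc/cpuinfo (a minimal sketch; on macOS or Windows you would use sysctl or a tool like CPU-Z instead):

# Check for AVX / AVX2 / AVX-512 support (Linux only)
with open("/proc/cpuinfo") as f:
    flags = set(f.read().split())

for isa in ("avx", "avx2", "avx512f"):
    print(isa, "supported" if isa in flags else "not found")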

Tokenization

Tokenization is the process of splitting a sentence into tokens. Common types of tokenization include:

# Character Tokenization

text = "Hello World"
tokenized_text = list(text)
print(tokenized_text)
# ['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']


# Word Tokenization

text = "Hello World"
tokenized_text = text.split(" ")
print(tokenized_text)
# ['Hello', 'World']

# Subword Tokenization
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # used below to look up embeddings
tokenizer_output = tokenizer.tokenize("This is an example of the bert tokenizer")
print(tokenizer_output)
# ['this', 'is', 'an', 'example', 'of', 'the', 'bert', 'token', '##izer']


# Assign IDs to the tokens

token_ids = tokenizer.encode("This is an example of the bert tokenizer")
print(token_ids)
# [101, 2023, 2003, 2019, 2742, 1997, 1996, 14324, 19204, 17629, 102]


# Convert IDs back to tokens
tokens = tokenizer.convert_ids_to_tokens(token_ids)
print(tokens)
# ['[CLS]', 'this', 'is', 'an', 'example', 'of', 'the', 'bert', 'token', '##izer', '[SEP]']

# Convert a token ID into its embedding vector, using "example" as the token
example_token_id = tokenizer.convert_tokens_to_ids(["example"])[0]
example_embedding = model.embeddings.word_embeddings(torch.tensor([example_token_id]))
print(example_embedding)

# tensor([[ 7.0699e-03,  3.9590e-02, -6.2164e-02, -8.4340e-02, -1.2362e-02,
#           ...  (768 values in total, truncated)  ...
#          -1.7751e-02, -2.9460e-03, -7.4038e-02]], grad_fn=<EmbeddingBackward0>)
        
# Print the embedding dimensions

print(example_embedding.shape)
# torch.Size([1, 768])




Embedding

Computing the similarity between two words:

# Look up the static (context-free) word embeddings for "king" and "queen"
king_token_id = tokenizer.convert_tokens_to_ids(["king"])[0]
king_embedding = model.embeddings.word_embeddings(torch.tensor([king_token_id]))

queen_token_id = tokenizer.convert_tokens_to_ids(["queen"])[0]
queen_embedding = model.embeddings.word_embeddings(torch.tensor([queen_token_id]))

# Cosine similarity: values closer to 1 mean the vectors point in similar directions
cos = torch.nn.CosineSimilarity(dim=1)
similarity = cos(king_embedding, queen_embedding)
print(similarity[0])
# 0.6469

# An unrelated pair ("example" vs. "queen") scores much lower
similarity = cos(example_embedding, queen_embedding)
print(similarity[0])
# 0.2392

Common Large Language Models

Chinese Models

Fine-tuning Large Models

Projects to Note

References