
Paper | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Tags: Paper, Transformer   Updated: 2024/08/17   Views: 470   Originally published: 2021-11-18

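References

  • NLP炼丹笔记 (NLP Alchemy Notes): "Switch Transformers 朴实无华 大招秒杀" (roughly: "Switch Transformers: plain and unadorned, yet a knockout move")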

