Python Transformers By Huggingface Hands On: 101 practical implementation hands-on of ALBERT/ViT/BigBird and other latest models with huggingface transformers

English Pages 204 [186]

Table of contents:
Table of Contents
Introduction
Latest Trend in Deep Learning
Cautions
Disclaimer
Trademarks
Feedback
Jupyter Notebook
Chapter 1 pipeline
1:Set up Google’s Colaboratory Environment
2:Sentiment Analysis
3:Question Answering
Chapter 2 Fine-tuning and Evaluation of DistilBERT using real data
Preparation: GPU preparation
4:IMDB Data Set
5:Label Encoding
6:Split training and validation data
7:Tokenize and Encoding
8:Creating your own dataset class
9:Load Pre-trained Model(DistilBertForSequenceClassification)
10:Define TrainingArguments
11:Transfer to GPU
12:Fine-tuning by Trainer class
13:Fine-Tuning by PyTorch
Chapter 3 Model Performance Evaluation
14:Accuracy
15:Recall/Precision/F1-Score
16:Classification Report
Chapter 4 Composition using GPT series
17:Preparing a writing environment with GPT Neo
18:Tokenize by GPT-Neo
19:Composition by GPT-Neo
20:distilgpt2 environment setting
21:Composition by distilgpt2
22:DialoGPT Environment Setting
23:Composition by DialoGPT
Chapter 5 MLM(Masked Language Model)
24:MLM pipeline loading BERT
25:MLM pipeline loading DistilBERT
26:MLM pipeline loading ALBERT
Chapter 6 CLIP: Bridging Image Recognition and Natural Language Processing
27:CLIP module install
28:Sample Image Dataset
29:Load CLIP based pre-trained model
30:Check the network of CLIP based pre-trained model
31:CLIP Preprocessing
32:Check the image after preprocessing
33:Encode and Decode
34:Inference by CLIP
35:Get the logit of CLIP inference
36:Display the CLIP caption prediction result
Chapter 7 Wav2Vec2 Automatic Speech Recognition
37:Wav2Vec module install
38:Load Pre-trained Wav2Vec2
39:Preparing a Data Set for Automatic Speech Recognition(TIMIT_ASR)
40:Check the audio data in Colab
41:Wav2Vec2 Pre-processing
42:ASR by Wav2Vec2
Chapter 8 Multi-class classification in BERT
43:Load the pre-trained BERT for Multi-class classification
44:Prepare our own dataset for three-class classification of BERT
45:BERT Classification before fine-tuning
46:BERT fine-tuning for 3 class classification
47:Visualizing the learning process of Fine-tuning BERT for Three-Class Classification
48:BERT Classification after fine-tuning
49:Classification accuracy
Chapter 9 Automatic Summarization by BART
50:Setting up the BART library and loading the pre-training model
51:Preprocessing using regular expressions
52:Tokenizing with the BART prior learning model
53:Cast the BART tokenize output to numpy array
54:BART Inference
55:Decode the BART inference’s result
Chapter 10 Ensemble learning with two BERTs
56:Setting up the BERT ensemble learning library
57:Preparation of dataset of your own for BERT ensemble
58:Definition of BERT ensemble network
59:Load the pretrained BERT for ensemble training
60:BERT ensemble learning Data Augmentation
61:BERT ensemble learning Defining a custom dataset
62:BERT Ensemble Learning: DataLoader
63:BERT ensemble fine-tuning
64:BERT ensemble learning prediction using training data
65:BERT ensemble learning Prediction outside of training data
Chapter 11 BigBird
66:Setting up the BigBird library and loading the pre-training model
67:Preparation of Data for BigBird inference
68:BigBird tokenization and encoding
69:BigBird inference
Chapter 12 PEGASUS
70:PEGASUS library setup and pre-training model loading
71:Tokenization and Encode
72:PEGASUS Automatic Summarization
Chapter 13 M2M100
73:Install the M2M100 library and load the pre-training model
74:Preparation of M2M100 translation source (Chinese text)
75:M2M100 Tokenize in source language
76:M2M100 automatic translation
77:M2M100 Decode the output of generate method
78:M2M100 Specify source language (Japanese) and create text
79:M2M100 Japanese text tokenization
80:M2M100 Japanese/English translation
81:M2M100 Japanese to English Translation Decode
Chapter 14 Mobile BERT
82:Install the MobileBERT library and load the pre-training model
Code(MOBILE BERT)
Code(BERT)
83:Mobile BERT vs. BERT Tokenizer
84:Last hidden layer during Mobile BERT inference
85:Mobile BERT Fill-in-the-Blanks Quiz
Chapter 15 GPT, DialoGPT, DistilGPT2
86:Setting up the DistilGPT2 library and loading the pre-training model
87:Visualization with distilgpt2 tool
88:distilgpt2 text generation
89:Loading DialoGPT (Dialogue Text Pre-Learning Model)
90:Text Generation by DialoGPT
Chapter 16 Practical exercise Moderna vs. Pfizer (compare with BERT and t-SNE)
91:Keyword search from Wikipedia
92:Retrieved from Wikipedia "Moderna COVID-19 vaccine" full text
93:Retrieved from Wikipedia, Pfizer–BioNTech COVID-19 vaccine
94:Installing a module to handle document vectors in BERT
95:Load the pre-trained BERT to pipeline
96:Get document vector representations by BERT
97:Meaning of Vector Dimensionality in BERT
98:Definition of the function getting the document vector representation of BERT [CLS] token and Simple Preprocessing for BERT
99:Get BERT [CLS] vectors of Moderna/Pfizer Covid-19 vaccine
100:Frequency aggregation by tokenizer
101:Visualization by t-SNE "Moderna" vs. "Pfizer"
Reference
In Closing
