CS5720 - Week 7

Using Pre-trained Word Embeddings

Popular Pre-trained Embeddings

πŸ“š Google Word2Vec
300d vectors, 3M vocab
Trained on Google News
🌐 Stanford GloVe
Vectors in 50, 100, 200, and 300 dimensions
Common Crawl, Wikipedia, Twitter
⚑ Facebook FastText
Subword (character n-gram) information
157 languages available
πŸ₯ Domain-Specific
BioWordVec, Law2Vec
Specialized vocabularies
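Most of these can be loaded in a few lines with the gensim library. A minimal sketch, assuming gensim is installed; "glove-wiki-gigaword-100" is one of gensim's downloader model names (the vectors are downloaded on first use):

# Load pre-trained GloVe vectors via gensim's downloader
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")    # 100d GloVe (Wikipedia + Gigaword)
print(wv["language"][:5])                   # first 5 components of one word vector
print(wv.most_similar("language", topn=3))  # nearest neighbours by cosine similarity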

Integration Techniques

πŸ” Feature Extraction
🎯 Fine-tuning
🧱 As Embedding Layer
❓ Out-of-Vocabulary Words
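Feature extraction is the simplest of these: mean-pool the word vectors of a sentence into a single fixed-length feature vector and feed it to any classifier. A minimal sketch, assuming the wv vectors loaded above; skipping OOV tokens here is just one simple out-of-vocabulary strategy:

import numpy as np

def sentence_vector(tokens, wv, dim=100):
    # Mean-pool the vectors of in-vocabulary tokens; skip OOV tokens
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

features = sentence_vector("word embeddings capture meaning".split(), wv)
print(features.shape)  # (100,) -- ready for e.g. logistic regression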

Typical Workflow

πŸ“₯
Load Embeddings
Download and load pre-trained vectors
πŸ—ΊοΈ
Map Vocabulary
Align with your dataset vocab
πŸš€
Initialize Model
Set up embedding layer
πŸŽ“
Train Model
Fine-tune or freeze
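The "Map Vocabulary" step is where the embedding_matrix used in the Keras example below comes from. A minimal sketch, assuming a Keras Tokenizer and the wv vectors from earlier; initializing OOV rows with small random values is one common choice:

from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np

texts = ["word embeddings capture meaning"]  # stand-in for your training corpus
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

vocab_size = len(tokenizer.word_index) + 1   # +1: Keras reserves index 0 for padding
embedding_dim = wv.vector_size               # must match the pre-trained vectors

# Small random rows give OOV words a usable starting point
embedding_matrix = np.random.normal(scale=0.1, size=(vocab_size, embedding_dim))
embedding_matrix[0] = 0.0                    # keep the padding row at zero

for word, idx in tokenizer.word_index.items():
    if word in wv:                           # copy the pre-trained vector when available
        embedding_matrix[idx] = wv[word]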
# Quick example: using GloVe vectors with Keras
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(
    input_dim=vocab_size,          # vocabulary size computed above
    output_dim=embedding_dim,      # must equal the pre-trained vector dimensionality
    weights=[embedding_matrix],    # the matrix built in the mapping step
    trainable=False                # freeze pre-trained weights
)
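Setting trainable=True instead turns this into fine-tuning: the pre-trained weights serve as initialization and are updated on your task. A common recipe is to train with the layer frozen first, then unfreeze it for a few epochs at a reduced learning rate.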
Prepared by Dr. Gorkem Kar