Languages: English
License: apache-2.0
Vector store of embeddings for CFA Level 1 Curriculum
This is a FAISS vector store created with Sentence Transformer embeddings using LangChain. Use it for similarity search, question answering, or anything else that leverages embeddings!
Creating these embeddings can take a while, so here's a convenient, downloadable one.
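For intuition, similarity search over a vector store amounts to ranking stored embedding vectors by their distance to the embedding of the query. The sketch below is a minimal pure-Python illustration using made-up 3-dimensional vectors and a hypothetical `top_k` helper. It is not how FAISS is implemented, and real sentence embeddings have hundreds of dimensions.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors closest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Toy "embeddings" (hypothetical values, for illustration only)
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

nearest = top_k(query, stored, k=2)
print(nearest)
```

A vector store pairs each stored vector with its source text, so the returned indices map back to the passages shown to the user.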
How to use
Download the data, then load it for use with LangChain:
pip install -qqq langchain sentence_transformers faiss-cpu huggingface_hub
import os
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.vectorstores.faiss import FAISS
from huggingface_hub import snapshot_download
# download the vectorstore
cache_dir = "cfa_level_1_cache"
vectorstore = snapshot_download(
    repo_id="nickmuchi/CFA_Level_1_Text_Embeddings",
    repo_type="dataset",
    revision="main",
    cache_dir=cache_dir,  # files are stored under this local folder
)
# get path to the vectorstore folder that you just downloaded;
# we'll look inside the cache_dir for the folder we want
target_dir = os.path.join("cfa", "cfa_level_1")
target_path = None
# Walk through the directory tree recursively
for root, dirs, files in os.walk(cache_dir):
    # os.walk lists bare directory names, so test the joined nested path
    candidate = os.path.join(root, target_dir)
    if os.path.isdir(candidate):
        # Keep the full path of the target directory and stop searching
        target_path = candidate
        break
# load embeddings
# this is what was used to create embeddings for the text
embed_instruction = "Represent the financial paragraph for document retrieval: "
query_instruction = "Represent the question for retrieving supporting documents: "
model_sbert = "sentence-transformers/all-mpnet-base-v2"
sbert_emb = HuggingFaceEmbeddings(model_name=model_sbert)
model_instr = "hkunlp/instructor-large"
instruct_emb = HuggingFaceInstructEmbeddings(
    model_name=model_instr,
    embed_instruction=embed_instruction,
    query_instruction=query_instruction,
)
# load vector store to use with langchain
docsearch = FAISS.load_local(folder_path=target_path, embeddings=sbert_emb)
# similarity search
question = "How do you hedge the interest rate risk of an MBS?"
search = docsearch.similarity_search(question, k=4)
for item in search:
    print(item.page_content)
    print(f"From page: {item.metadata['page']}")
    print("---")
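The step that locates the vector store folder inside the cache can be wrapped in a small reusable helper. The sketch below uses only the standard library. `find_nested_dir` is a hypothetical name, and the demo builds a throwaway directory tree with made-up folder names rather than touching the real download cache.

```python
import os
import tempfile

def find_nested_dir(base_dir, relative_dir):
    """Return the first path under base_dir that ends with relative_dir, or None."""
    for root, _dirs, _files in os.walk(base_dir):
        candidate = os.path.join(root, relative_dir)
        if os.path.isdir(candidate):
            return candidate
    return None

# Demo on a temporary tree that mimics a nested cache layout (made-up names)
with tempfile.TemporaryDirectory() as tmp:
    expected = os.path.join(tmp, "snapshots", "abc123", "cfa", "cfa_level_1")
    os.makedirs(expected)
    found = find_nested_dir(tmp, os.path.join("cfa", "cfa_level_1"))
    missing = find_nested_dir(tmp, "not_there")
    print(found == expected, missing)
```

With the real download, this would be called as `find_nested_dir(cache_dir, os.path.join("cfa", "cfa_level_1"))`. A `None` result means the folder was not found, which is worth checking before passing the path to `FAISS.load_local`.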