Vector store of embeddings for CFA Level 1 Curriculum

This is a FAISS vector store created with Sentence Transformer embeddings using LangChain. Use it for similarity search, question answering, or anything else that leverages embeddings! 😃

Creating these embeddings can take a while, so here's a convenient, downloadable one 🤗
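
For readers new to vector stores, the idea behind similarity search can be sketched in a few lines of plain Python. This is only an illustration with made-up three-dimensional vectors and a brute-force Euclidean-distance scan; the function name `nearest_neighbors` and the toy data are assumptions for the example, not part of this dataset or of the FAISS API.

```python
import math

def nearest_neighbors(query, vectors, k=2):
    """Return the indices of the k vectors closest to `query` (Euclidean distance)."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]

# Toy 3-dimensional "embeddings" (hypothetical); real sentence embeddings
# have hundreds of dimensions.
toy_vectors = [
    [0.9, 0.1, 0.0],  # index 0
    [0.0, 1.0, 0.0],  # index 1
    [0.8, 0.2, 0.1],  # index 2
]
toy_query = [1.0, 0.0, 0.0]

print(nearest_neighbors(toy_query, toy_vectors, k=2))  # [0, 2]
```

A vector store does the same kind of lookup over the embedded text chunks, with an index structure that keeps the search fast at scale.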

How to use

Download the data, then load it to use with LangChain.

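# run in a notebook cell or a shell; InstructorEmbedding is needed by HuggingFaceInstructEmbeddings below
pip install -qqq langchain sentence_transformers InstructorEmbedding faiss-cpu huggingface_hub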
import os
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings

from langchain.vectorstores.faiss import FAISS
from huggingface_hub import snapshot_download

# download the vectorstore for the book you want

cache_dir = "cfa_level_1_cache"
vectorstore = snapshot_download(repo_id="nickmuchi/CFA_Level_1_Text_Embeddings",
                                repo_type="dataset",
                                revision="main",
                                # `book` is never defined in this snippet, so the filter is left disabled
                                # and the whole repo is downloaded:
                                # allow_patterns=f"books/{book}/*", # to download only the one book
                                cache_dir=cache_dir,
                                )

# get path to the vectorstore folder that you just downloaded
# we'll look inside the cache_dir for the folder we want
target_dir = "cfa/cfa_level_1"

# walk through the directory tree recursively

target_path = None
for root, dirs, files in os.walk(cache_dir):
    # target_dir has two path components, so it can never appear in `dirs`
    # (which holds bare directory names); test for it under each directory instead
    candidate = os.path.join(root, target_dir)
    if os.path.isdir(candidate):
        target_path = candidate
        break
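
If the folder layout inside the repo is not known in advance, one alternative is to search the download for the index files themselves. The sketch below assumes the store was saved with LangChain's default index name, in which case the folder contains `index.faiss` and `index.pkl`; the helper name `find_faiss_dirs` is hypothetical and uses only the standard library.

```python
import os

def find_faiss_dirs(root):
    """Return the sorted list of directories under `root` that contain
    both index.faiss and index.pkl (assumed default LangChain file names)."""
    matches = []
    for current, _dirs, files in os.walk(root):
        if "index.faiss" in files and "index.pkl" in files:
            matches.append(current)
    return sorted(matches)
```

Calling `find_faiss_dirs(cache_dir)` after the download should return one path per vector store found; if it returns an empty list, the files were probably saved under a different index name or were not downloaded.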

# load embeddings
# this is what was used to create embeddings for the text

embed_instruction = "Represent the financial paragraph for document retrieval: "
query_instruction = "Represent the question for retrieving supporting documents: "

model_sbert = "sentence-transformers/all-mpnet-base-v2"
sbert_emb = HuggingFaceEmbeddings(model_name=model_sbert)

model_instr = "hkunlp/instructor-large"
instruct_emb = HuggingFaceInstructEmbeddings(model_name=model_instr,
                                             embed_instruction=embed_instruction,
                                             query_instruction=query_instruction)

# load vector store to use with langchain
docsearch = FAISS.load_local(folder_path=target_path, embeddings=sbert_emb)

# similarity search
question = "How do you hedge the interest rate risk of an MBS?"
search = docsearch.similarity_search(question, k=4)

for item in search:
    print(item.page_content)
    print(f"From page: {item.metadata['page']}")
    print("---")
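
For question answering, the retrieved passages are typically concatenated into a context string that is passed to a language model. A minimal sketch of that step follows; `Passage` is a hypothetical stand-in that mirrors only the `page_content` and `metadata` attributes used above, and `build_context` and its `max_chars` budget are assumptions for illustration rather than LangChain functions.

```python
from dataclasses import dataclass, field

@dataclass
class Passage:
    """Hypothetical stand-in exposing the two attributes read in the loop above."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_context(passages, max_chars=2000):
    """Join passages into one string, tagging each with its page number,
    and stop before the combined chunk length exceeds max_chars."""
    parts, used = [], 0
    for p in passages:
        chunk = f"[page {p.metadata.get('page', '?')}] {p.page_content.strip()}"
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)
```

Because the function relies only on those two attributes, the list returned by the similarity search above could be passed to it directly, for example `build_context(search)`.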