Dataset: nickmuchi/CFA_Level_1_Text_Embeddings

Tasks: Question Answering, Summarization, Conversational

Languages: English

Tags: faiss, langchain, instructor, embeddings

License: apache-2.0

Vector store of embeddings for CFA Level 1 Curriculum

This is a FAISS vector store created with Sentence Transformer embeddings using LangChain. Use it for similarity search, question answering or anything else that leverages embeddings! 😃

Creating these embeddings can take a while, so here is a convenient, downloadable copy 🤗
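A similarity search over a flat FAISS index amounts to ranking every stored vector by its distance to the query vector and keeping the closest `k`. Assuming the store was built with LangChain's defaults (Euclidean, i.e. L2, distance), the sketch below reproduces that brute-force ranking in plain Python on hypothetical toy vectors; FAISS does the same job in optimized native code, and real sentence embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k stored vectors nearest to the query, closest first."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" standing in for real paragraph vectors
store = [
    [0.9, 0.1, 0.0],  # e.g. a paragraph about bond duration
    [0.0, 1.0, 0.0],  # e.g. a paragraph about equity valuation
    [0.8, 0.2, 0.1],  # e.g. a paragraph about interest rate risk
]
query = [1.0, 0.0, 0.0]
print(top_k(query, store, k=2))
```

Here `top_k(query, store, k=2)` ranks indices 0 and 2 first, the two toy paragraphs that point roughly the same way as the query; `similarity_search(question, k=4)` does the equivalent after embedding the question.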

How to use

Download the data and load it for use with LangChain:

pip install -qqq langchain sentence_transformers faiss-cpu huggingface_hub InstructorEmbedding

import os
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings

from langchain.vectorstores.faiss import FAISS
from huggingface_hub import snapshot_download

# download the dataset repo into a local cache
cache_dir = "cfa_level_1_cache"
vectorstore = snapshot_download(repo_id="nickmuchi/CFA_Level_1_Text_Embeddings",
                                repo_type="dataset",
                                revision="main",
                                cache_dir=cache_dir,
                                )

# get the path to the `vectorstore` folder that you just downloaded;
# we'll look inside `cache_dir` for the folder we want
target_dir = "cfa/cfa_level_1"
target_path = None

# walk through the directory tree recursively
for root, dirs, files in os.walk(cache_dir):
    # `dirs` holds single folder names only, so check the nested path directly
    candidate = os.path.join(root, target_dir)
    if os.path.isdir(candidate):
        # full path of the target directory
        target_path = candidate
        break

if target_path is None:
    raise FileNotFoundError(f"'{target_dir}' not found under {cache_dir}")

# load embeddings
# this is what was used to create embeddings for the text

embed_instruction = "Represent the financial paragraph for document retrieval: "
query_instruction = "Represent the question for retrieving supporting documents: "

model_sbert = "sentence-transformers/all-mpnet-base-v2"
sbert_emb = HuggingFaceEmbeddings(model_name=model_sbert)

model_instr = "hkunlp/instructor-large"
instruct_emb = HuggingFaceInstructEmbeddings(model_name=model_instr,
                                             embed_instruction=embed_instruction,
                                             query_instruction=query_instruction)

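# load the vector store for use with LangChain
# (recent LangChain releases also require allow_dangerous_deserialization=True here,
# because the saved store includes a pickle file)
docsearch = FAISS.load_local(folder_path=target_path, embeddings=sbert_emb)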

# similarity search
question = "How do you hedge the interest rate risk of an MBS?"
search = docsearch.similarity_search(question, k=4)

for item in search:
    print(item.page_content)
    print(f"From page: {item.metadata['page']}")
    print("---")
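The `os.walk` step only has to find the folder that `FAISS.load_local` reads, and LangChain's `FAISS.save_local` writes two files there: `index.faiss` and `index.pkl` with the default `index_name`. A sketch of an alternative lookup that searches the cache for that file pair instead of a hard-coded folder name; `find_faiss_folder` is a hypothetical helper, and it assumes the store was saved with the default `index_name`.

```python
import os
import tempfile
from typing import Optional


def find_faiss_folder(cache_dir: str, index_name: str = "index") -> Optional[str]:
    """Return the first folder under cache_dir that holds both <index_name>.faiss
    and <index_name>.pkl (the file pair written by FAISS.save_local), else None."""
    wanted = {f"{index_name}.faiss", f"{index_name}.pkl"}
    for root, _dirs, files in os.walk(cache_dir):
        if wanted.issubset(files):
            return root
    return None


# Demo on a throwaway directory that imitates a Hugging Face cache layout
# (the folder names below are hypothetical placeholders)
with tempfile.TemporaryDirectory() as tmp:
    store_dir = os.path.join(tmp, "datasets--example", "snapshots", "abc123", "cfa", "cfa_level_1")
    os.makedirs(store_dir)
    for name in ("index.faiss", "index.pkl"):
        # create empty placeholder files
        open(os.path.join(store_dir, name), "wb").close()
    print(find_faiss_folder(tmp) == store_dir)
```

The returned path can be passed as `folder_path` to `FAISS.load_local`, just like `target_path` above.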

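`hkunlp/instructor-large` embeds text together with a task instruction, and the card uses one instruction for the stored financial paragraphs and another for incoming questions. As far as we know, LangChain's `HuggingFaceInstructEmbeddings` hands the INSTRUCTOR model `[instruction, text]` pairs, with `embed_instruction` on the document side and `query_instruction` on the query side. The sketch below only builds those pairs, with no model call, to show that asymmetry; `document_pairs` and `query_pair` are hypothetical helper names.

```python
from typing import List

# Instructions copied from the card
EMBED_INSTRUCTION = "Represent the financial paragraph for document retrieval: "
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents: "


def document_pairs(texts: List[str]) -> List[List[str]]:
    """Pair every stored paragraph with the document-side instruction (hypothetical helper)."""
    return [[EMBED_INSTRUCTION, text] for text in texts]


def query_pair(question: str) -> List[str]:
    """Pair a search question with the query-side instruction (hypothetical helper)."""
    return [QUERY_INSTRUCTION, question]


docs = ["Duration measures a bond's price sensitivity to changes in yield."]
print(document_pairs(docs))
print(query_pair("How do you hedge the interest rate risk of an MBS?"))
```

Whichever model is used, the question must be embedded with the same model that was passed to `FAISS.load_local`, otherwise the query vector and the stored vectors are not comparable.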