Detailed parameters
Which task is used by this model?
In general, the 🤗 Hosted Inference API accepts a simple string as an input. However, more advanced usage depends on the “task” that the model solves.
The “task” of a model is shown on its model page.
Natural Language Processing
Fill Mask task
Tries to fill in a hole with a missing word (a token, to be precise). This is the base task for BERT models.
Recommended model: bert-base-uncased (it’s a simple model, but fun to play with).
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/bert-base-uncased"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "The answer to the universe is [MASK]."})
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be filled from; it must contain the [MASK] token (check the model card for the exact name of the mask token) |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
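The advice on wait_for_model suggests a simple pattern: send the request normally, and only resend it with wait_for_model set to true if the first attempt returns a 503. The sketch below assumes a hypothetical `send` callable that posts a JSON string and returns `(status_code, body_text)`, which keeps it testable without network access; `query_with_model_wait` is likewise a hypothetical helper name.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: takes a JSON-encoded body, returns (status_code, body_text).
Send = Callable[[str], Tuple[int, str]]


def query_with_model_wait(payload: Dict[str, Any], send: Send) -> Any:
    """Send payload; on a 503 (model not ready), resend once with wait_for_model=True."""
    status, body = send(json.dumps(payload))
    if status == 503:
        # Copy the payload rather than mutating the caller's dict.
        retry = dict(payload)
        retry["options"] = {**payload.get("options", {}), "wait_for_model": True}
        status, body = send(json.dumps(retry))
    if status != 200:
        raise RuntimeError(f"request failed with status {status}: {body}")
    return json.loads(body)
```

With requests, `send` could be a small function that calls `requests.post(API_URL, headers=headers, data=body)` and returns `(response.status_code, response.text)`.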
The return value is either a dict or a list of dicts if you sent a list of inputs. Example response (scores rounded):
```python
[
    {
        "sequence": "the answer to the universe is no.",
        "score": 0.1696,
        "token": 2053,
        "token_str": "no",
    },
    {
        "sequence": "the answer to the universe is nothing.",
        "score": 0.0734,
        "token": 2498,
        "token_str": "nothing",
    },
    {
        "sequence": "the answer to the universe is yes.",
        "score": 0.0580,
        "token": 2748,
        "token_str": "yes",
    },
    {
        "sequence": "the answer to the universe is unknown.",
        "score": 0.044,
        "token": 4242,
        "token_str": "unknown",
    },
    {
        "sequence": "the answer to the universe is simple.",
        "score": 0.0402,
        "token": 3722,
        "token_str": "simple",
    },
]
```
Returned values | |
---|---|
sequence | The actual sequence of tokens that ran against the model (may contain special tokens) |
score | The probability for this token. |
token | The id of the token |
token_str | The string representation of the token |
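Since each prediction carries a score, picking the most likely completion or filtering out unlikely ones is plain list processing. A minimal sketch using part of the example response above; the helper names and the 0.06 threshold are arbitrary choices for illustration.

```python
from typing import Dict, List


def best_prediction(predictions: List[Dict]) -> Dict:
    """Return the fill-mask prediction with the highest score."""
    return max(predictions, key=lambda p: p["score"])


def tokens_above(predictions: List[Dict], min_score: float) -> List[str]:
    """Return the token strings whose score is at least min_score, keeping response order."""
    return [p["token_str"] for p in predictions if p["score"] >= min_score]


predictions = [
    {"sequence": "the answer to the universe is no.", "score": 0.1696, "token": 2053, "token_str": "no"},
    {"sequence": "the answer to the universe is nothing.", "score": 0.0734, "token": 2498, "token_str": "nothing"},
    {"sequence": "the answer to the universe is yes.", "score": 0.0580, "token": 2748, "token_str": "yes"},
]
print(best_prediction(predictions)["sequence"])  # the answer to the universe is no.
print(tokens_above(predictions, 0.06))  # ['no', 'nothing']
```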
Summarization task
This task summarizes longer text into shorter text. Be careful: some models have a maximum input length, which means the summary cannot handle a full book, for instance. Choose your model accordingly. If you want to discuss your summarization needs, please get in touch with us: [email protected]
Recommended model: facebook/bart-large-cnn.
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
        "parameters": {"do_sample": False},
    }
)
```
```python
# Response
[
    {
        "summary_text": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.",
    },
]
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be summarized |
parameters | a dict containing the following keys: |
min_length | (Default: None). Integer defining the minimum length, in tokens, of the output summary. |
max_length | (Default: None). Integer defining the maximum length, in tokens, of the output summary. |
top_k | (Default: None). Integer defining how many of the most probable tokens are considered in the sampling operation that creates new text. |
top_p | (Default: None). Float defining which tokens are considered in the sampling operation. Tokens are added to the sample from most probable to least probable until the sum of their probabilities is greater than top_p. |
temperature | (Default: 1.0). Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always taking the highest-scoring token, and values approaching 100.0 get closer to a uniform probability. |
repetition_penalty | (Default: None). Float (0.0-100.0). The more a token is used within generation, the more it is penalized, making it less likely to be picked in successive generation passes. |
max_time | (Default: None). Float (0-120.0). The maximum amount of time, in seconds, that the query should take. Network overhead can add to this, so it is a soft limit. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
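The sampling parameters above (temperature, top_k, top_p) are easiest to understand on a toy next-token distribution. The following is a minimal sketch of the general technique on made-up logits, not the API's actual implementation; the function names are illustrative.

```python
import math
from typing import Dict, List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities; low temperature sharpens, high temperature flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use greedy decoding for the 0 limit)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs: List[float], k: int) -> Dict[int, float]:
    """Keep the k most probable token indices and renormalize their probabilities."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}


def top_p_filter(probs: List[float], p: float) -> Dict[int, float]:
    """Add tokens from most to least probable until their cumulative probability exceeds p."""
    kept: List[int] = []
    cumulative = 0.0
    for i in sorted(range(len(probs)), key=probs.__getitem__, reverse=True):
        kept.append(i)
        cumulative += probs[i]
        if cumulative > p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

Raising the temperature toward 100 pushes the three toy probabilities toward 1/3 each, while a temperature near 0 puts almost all the mass on the top token, matching the table's description.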
Return value is either a dict or a list of dicts if you sent a list of inputs
Returned values | |
---|---|
summary_text | The string after summarization |
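Because some summarization models cap the input length, a long document can be split into pieces that are summarized separately. A minimal sketch, assuming word count as a rough stand-in for token count (tokenizers usually produce more tokens than words, so leave a margin); the 400-word default and the `chunk_words` name are hypothetical, not documented limits.

```python
from typing import List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Each chunk becomes its own summarization payload.
payloads = [{"inputs": chunk} for chunk in chunk_words("one two three four five", max_words=2)]
print(payloads)
# [{'inputs': 'one two'}, {'inputs': 'three four'}, {'inputs': 'five'}]
```

The per-chunk summaries can then be joined, or summarized again, at the cost of losing context across chunk boundaries.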
Question Answering task
Want to have a nice know-it-all bot that can answer any question?
Recommended model: deepset/roberta-base-squad2.
Available with: 🤗 Transformers and AllenNLP
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": {
            "question": "What's my name?",
            "context": "My name is Clara and I live in Berkeley.",
        }
    }
)
```
When sending your request, you should send a JSON-encoded payload.
The return value is a dict. Example response (score rounded):
```python
{"score": 0.9327, "start": 11, "end": 16, "answer": "Clara"}
```
Returned values | |
---|---|
answer | A string that is the answer, taken from the context. |
score | A float that represents how likely it is that the answer is correct. |
start | The character index where the answer starts within context. |
end | The character index where the answer ends within context (exclusive). |
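The start and end offsets index into the context string you sent, with end exclusive: in the example, characters 11 to 16 of the context spell "Clara". Slicing the context therefore recovers the answer, which is handy for highlighting it in place. A minimal sketch; `answer_from_offsets` is a hypothetical helper name.

```python
from typing import Dict


def answer_from_offsets(context: str, result: Dict) -> str:
    """Slice the answer out of the context; start is inclusive and end is exclusive."""
    return context[result["start"]:result["end"]]


context = "My name is Clara and I live in Berkeley."
result = {"score": 0.9327, "start": 11, "end": 16, "answer": "Clara"}
print(answer_from_offsets(context, result))  # Clara
```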
Table Question Answering task
Don’t know SQL? Don’t want to dive into a large spreadsheet? Ask questions in plain English!
Recommended model: google/tapas-base-finetuned-wtq.
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/google/tapas-base-finetuned-wtq"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": {
            "query": "How many stars does the transformers repository have?",
            "table": {
                "Repository": ["Transformers", "Datasets", "Tokenizers"],
                "Stars": ["36542", "4512", "3934"],
                "Contributors": ["651", "77", "34"],
                "Programming language": [
                    "Python",
                    "Python",
                    "Rust, Python and NodeJS",
                ],
            },
        }
    }
)
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a dict containing the following keys: |
query (required) | The query in plain text that you want to ask the table |
table (required) | A table of data represented as a dict of lists, where the keys are the headers and the lists hold the column values. All lists must have the same size. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
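Since every column list must have the same size, it can help to validate the table before sending it. This sketch also converts every cell to a string, mirroring the example above where numbers are passed as strings; whether non-string cells are accepted is not stated here, so treat that conversion as a precaution. `make_table` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Sequence


def make_table(columns: Dict[str, Sequence[Any]]) -> Dict[str, List[str]]:
    """Check that all columns have the same number of rows and stringify every cell."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"all columns must have the same length, got {sorted(lengths)}")
    return {header: [str(value) for value in values] for header, values in columns.items()}


table = make_table({"Repository": ["Transformers", "Datasets"], "Stars": [36542, 4512]})
payload = {"inputs": {"query": "How many stars does the transformers repository have?", "table": table}}
```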
Return value is either a dict or a list of dicts if you sent a list of inputs
```python
# Response
{
    "answer": "AVERAGE > 36542",
    "coordinates": [[0, 1]],
    "cells": ["36542"],
    "aggregator": "AVERAGE",
}
```
Returned values | |
---|---|
answer | The plain-text answer |
coordinates | A list of (row, column) coordinates of the cells referenced in the answer |
cells | A list of the contents of the referenced cells |
aggregator | The aggregator used to get the answer |
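In the example, coordinates [[0, 1]] point to row 0 (Transformers) and column 1 (Stars) of the table sent, so they appear to be (row, column) pairs with columns in the order of the table's keys. The sketch below relies on that reading to look up cells and then applies the aggregator. The aggregator names NONE, SUM, AVERAGE and COUNT are an assumption based on TAPAS models fine-tuned on WTQ; other models may return different values.

```python
from typing import Dict, List, Sequence, Union


def referenced_cells(table: Dict[str, List[str]], coordinates: Sequence[Sequence[int]]) -> List[str]:
    """Look up cells from (row, column) pairs; column indices follow the table's key order."""
    headers = list(table.keys())
    return [table[headers[column]][row] for row, column in coordinates]


def apply_aggregator(cells: List[str], aggregator: str) -> Union[List[str], float, int]:
    """Combine the referenced cells (aggregator names are assumed, see above)."""
    if aggregator == "NONE":
        return cells
    if aggregator == "COUNT":
        return len(cells)
    numbers = [float(cell) for cell in cells]
    if aggregator == "SUM":
        return sum(numbers)
    if aggregator == "AVERAGE":
        return sum(numbers) / len(numbers)
    raise ValueError(f"unknown aggregator: {aggregator}")


table = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
    "Contributors": ["651", "77", "34"],
    "Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
}
cells = referenced_cells(table, [[0, 1]])
print(cells, apply_aggregator(cells, "AVERAGE"))  # ['36542'] 36542.0
```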
Sentence Similarity task
Calculate the semantic similarity between one text and a list of other sentences by comparing their embeddings.
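Comparing embeddings is commonly done with cosine similarity, the cosine of the angle between two vectors. Whether the API uses exactly this metric for a given model is an assumption here, but it is the usual choice for sentence-transformers models. A minimal sketch in plain Python:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 same direction, 0.0 orthogonal, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Because only the angle matters, vectors pointing the same way score 1.0 regardless of their length.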
Recommended model: sentence-transformers/all-MiniLM-L6-v2.
Available with: Sentence Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": {
            "source_sentence": "That is a happy person",
            "sentences": ["That is a happy dog", "That is a very happy person", "Today is a sunny day"],
        }
    }
)
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a dict containing the following keys: |
source_sentence (required) | The string that you wish to compare the other strings with. This can be a phrase, sentence, or longer passage, depending on the model being used. |
sentences (required) | A list of strings to compare against the source_sentence. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
The return value is a list of similarity scores, given as floats.
```python
# Response: one similarity score per sentence
[0.6945773363113403, 0.9429150819778442, 0.2568760812282562]
```
Returned values | |
---|---|
Scores | The associated similarity score for each of the given strings |
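The scores come back with one value per sentence, in the order the sentences were sent, so pairing them and sorting gives a ranking. A minimal sketch using the example above; `rank_sentences` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def rank_sentences(sentences: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each sentence with its score and sort from most to least similar."""
    if len(sentences) != len(scores):
        raise ValueError("expected one score per sentence")
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


sentences = ["That is a happy dog", "That is a very happy person", "Today is a sunny day"]
scores = [0.6945773363113403, 0.9429150819778442, 0.2568760812282562]
print(rank_sentences(sentences, scores)[0][0])  # That is a very happy person
```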
Text Classification task
Usually used for sentiment analysis, this task outputs the likelihood of each class for an input.
Recommended model: distilbert-base-uncased-finetuned-sst-2-english
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "I like you. I love you"})
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be classified |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
As the example shows, the return value for a single input is a list containing one list of dicts, one dict per class. Example response (scores rounded):
```python
[
    [
        {"label": "POSITIVE", "score": 0.9999},
        {"label": "NEGATIVE", "score": 0.0001},
    ]
]
```
Returned values | |
---|---|
label | The label for the class (model specific) |
score | A float that represents how likely it is that the text belongs to this class. |
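In the example, the response is a list containing one list of label/score dicts, which suggests one inner list per input. Under that assumption, the most likely label for each input can be extracted like this (`top_labels` is a hypothetical helper name):

```python
from typing import Dict, List


def top_labels(response: List[List[Dict]]) -> List[Dict]:
    """Return the highest-scoring class dict for each input."""
    return [max(classes, key=lambda c: c["score"]) for classes in response]


response = [[{"label": "POSITIVE", "score": 0.9999}, {"label": "NEGATIVE", "score": 0.0001}]]
print(top_labels(response))  # [{'label': 'POSITIVE', 'score': 0.9999}]
```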
Text Generation task
Use to continue text from a prompt. This is a very generic task.
Recommended model: gpt2 (it’s a simple model, but fun to play with).
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/gpt2"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "The answer to the universe is"})
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be generated from |
parameters | a dict containing the following keys: |
top_k | (Default: None). Integer defining how many of the most probable tokens are considered in the sampling operation that creates new text. |
top_p | (Default: None). Float defining which tokens are considered in the sampling operation. Tokens are added to the sample from most probable to least probable until the sum of their probabilities is greater than top_p. |
temperature | (Default: 1.0). Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always taking the highest-scoring token, and values approaching 100.0 get closer to a uniform probability. |
repetition_penalty | (Default: None). Float (0.0-100.0). The more a token is used within generation, the more it is penalized, making it less likely to be picked in successive generation passes. |
max_new_tokens | (Default: None). Int (0-250). The number of new tokens to be generated. This does not include the input length; it is an estimate of the size of the generated text you want. Each new token slows down the request, so look for a balance between response time and the length of the generated text. |
max_time | (Default: None). Float (0-120.0). The maximum amount of time, in seconds, that the query should take. Network overhead can add to this, so it is a soft limit. Use it in combination with max_new_tokens for best results. |
return_full_text | (Default: True). Bool. If set to False, the returned results will not contain the original query, making prompting easier. |
num_return_sequences | (Default: 1). Integer. The number of propositions you want to be returned. |
do_sample | (Default: True). Bool. Whether or not to use sampling; greedy decoding is used otherwise. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
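Since every parameter is optional and many default to None, one way to build the payload is to drop unset values and check the documented ranges client-side before sending. A minimal sketch; `generation_payload` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, Optional

# Documented ranges from the parameter table above.
DOCUMENTED_RANGES = {
    "temperature": (0.0, 100.0),
    "repetition_penalty": (0.0, 100.0),
    "max_new_tokens": (0, 250),
    "max_time": (0.0, 120.0),
}


def generation_payload(inputs: str, options: Optional[Dict[str, Any]] = None, **parameters: Any) -> Dict[str, Any]:
    """Build a text-generation payload, dropping None parameters and checking documented ranges."""
    params = {name: value for name, value in parameters.items() if value is not None}
    for name, value in params.items():
        if name in DOCUMENTED_RANGES:
            low, high = DOCUMENTED_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside the documented range {low}-{high}")
    payload: Dict[str, Any] = {"inputs": inputs}
    if params:
        payload["parameters"] = params
    if options:
        payload["options"] = options
    return payload


print(generation_payload("The answer to the universe is", max_new_tokens=50, temperature=None))
# {'inputs': 'The answer to the universe is', 'parameters': {'max_new_tokens': 50}}
```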
The return value is either a dict or a list of dicts if you sent a list of inputs. Example response:
```python
[
    {
        "generated_text": 'The answer to the universe is that we are the creation of the entire universe," says Fitch.\n\nAs of the 1960s, six times as many Americans still make fewer than six bucks ($17) per year on their way to retirement.'
    }
]
```
Returned values | |
---|---|
generated_text | The continued string (including the input unless return_full_text is set to False) |
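With return_full_text left at its default of True, generated_text starts with your prompt, as in the example response. If you prefer not to change that parameter, the prompt can be stripped client-side; this sketch only strips it when the text really starts with the prompt. `continuation_only` and the sample completion are hypothetical.

```python
def continuation_only(prompt: str, generated_text: str) -> str:
    """Remove the prompt prefix from generated_text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


prompt = "The answer to the universe is"
print(repr(continuation_only(prompt, "The answer to the universe is 42.")))  # ' 42.'
```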
Text2Text Generation task
Essentially the text generation task, but using an encoder-decoder architecture, so its options might change in the future.
Token Classification task
Usually used for sentence parsing, either grammatical or Named Entity Recognition (NER), to understand the keywords contained within a text.
Recommended model: dbmdz/bert-large-cased-finetuned-conll03-english
Available with: 🤗 Transformers, Flair
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/dbmdz/bert-large-cased-finetuned-conll03-english"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "My name is Sarah Jessica Parker but you can call me Jessica"})
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be classified |
parameters | a dict containing the following key: |
aggregation_strategy | (Default: simple). There are several aggregation strategies. none: every token gets classified without further aggregation. simple: entities are grouped according to the default schema (B- and I- tags get merged when the tag is similar). first: same as simple, except that words cannot end up with different tags; when there is ambiguity, a word uses the tag of its first token. average: same as simple, except that words cannot end up with different tags; scores are averaged across tokens and then the maximum label is applied. max: same as simple, except that words cannot end up with different tags; the word entity is the token with the maximum score. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
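To illustrate what the simple strategy does, here is a simplified sketch that merges B- and I- tagged tokens into entity groups. It works on hypothetical (word, tag) pairs and ignores scores and sub-word handling, so it is not the library's actual implementation; `group_bio` is an illustrative name.

```python
from typing import List, Optional, Tuple


def group_bio(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (word, tag) pairs such as ("Sarah", "B-PER") into (entity_type, text) groups."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the open entity, if any.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))

    for word, tag in tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_type:
            current_words.append(word)  # continue the open entity
            continue
        flush()
        if prefix in ("B", "I"):
            current_type, current_words = label, [word]  # start a new entity
        else:
            current_type, current_words = None, []  # "O": outside any entity
    flush()
    return entities
```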
The return value is either a dict or a list of dicts if you sent a list of inputs. Example response (scores rounded):
```python
[
    {
        "entity_group": "PER",
        "score": 0.9991,
        "word": "Sarah Jessica Parker",
        "start": 11,
        "end": 31,
    },
    {
        "entity_group": "PER",
        "score": 0.998,
        "word": "Jessica",
        "start": 52,
        "end": 59,
    },
]
```
Returned values | |
---|---|
entity_group | The type for the entity being recognized (model specific). |
score | How likely it is that the entity was correctly recognized. |
word | The string that was captured |
start | The character offset where the entity starts in the input. Useful to disambiguate if word occurs multiple times. |
end | The character offset where the entity ends in the input (exclusive). Useful to disambiguate if word occurs multiple times. |
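Because start and end are character offsets into the input (end exclusive: in the example, characters 11 to 31 are "Sarah Jessica Parker"), the entities can be marked up in place. A minimal sketch assuming entities do not overlap; the `[text](TYPE)` markup and the `highlight` name are arbitrary choices.

```python
from typing import Dict, List


def highlight(text: str, entities: List[Dict]) -> str:
    """Wrap each entity span as [span](entity_group), walking the text left to right."""
    pieces: List[str] = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["start"]):
        pieces.append(text[cursor:entity["start"]])
        pieces.append(f"[{text[entity['start']:entity['end']]}]({entity['entity_group']})")
        cursor = entity["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "My name is Sarah Jessica Parker but you can call me Jessica"
entities = [
    {"entity_group": "PER", "score": 0.9991, "word": "Sarah Jessica Parker", "start": 11, "end": 31},
    {"entity_group": "PER", "score": 0.998, "word": "Jessica", "start": 52, "end": 59},
]
print(highlight(text, entities))
# My name is [Sarah Jessica Parker](PER) but you can call me [Jessica](PER)
```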
Named Entity Recognition (NER) task
See the Token Classification task above.
Translation task
This task translates text from one language to another.
Recommended models: Helsinki-NLP/opus-mt-ru-en (Helsinki-NLP has uploaded many models covering many language pairs) and t5-base.
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-ru-en"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": "Меня зовут Вольфганг и я живу в Берлине",
    }
)
```
```python
# Response
[
    {
        "translation_text": "My name is Wolfgang and I live in Berlin.",
    },
]
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string to be translated, in the original language |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
Return value is either a dict or a list of dicts if you sent a list of inputs
Returned values | |
---|---|
translation_text | The string after translation |
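Many Helsinki-NLP models follow the naming pattern opus-mt-{source}-{target} (for example opus-mt-ru-en above), so a model URL can be guessed from a language pair. Not every pair exists, so treat the generated URL as an assumption to check on the Hub. The sketch also normalizes the response, which may be a dict or a list of dicts; both helper names are hypothetical.

```python
from typing import Any, List

API_BASE = "https://api-inference.huggingface.co/models/"


def opus_mt_url(source: str, target: str) -> str:
    """Guess the Inference API URL of a Helsinki-NLP opus-mt model for a language pair."""
    return f"{API_BASE}Helsinki-NLP/opus-mt-{source.lower()}-{target.lower()}"


def translation_texts(response: Any) -> List[str]:
    """Return the translated strings whether the response is a dict or a list of dicts."""
    items = response if isinstance(response, list) else [response]
    return [item["translation_text"] for item in items]


print(opus_mt_url("ru", "en"))
# https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-ru-en
```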
Zero-Shot Classification task
This task is super useful to try out classification with zero code: you simply pass a sentence or paragraph and the possible labels for that sentence, and you get a result.
Recommended model: facebook/bart-large-mnli.
Available with: 🤗 Transformers
Request:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
        "parameters": {"candidate_labels": ["refund", "legal", "faq"]},
    }
)
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string or list of strings |
parameters (required) | a dict containing the following keys: |
candidate_labels (required) | a list of strings that are potential classes for inputs. At most 10 candidate_labels are allowed; for more, simply run multiple requests. Results are likely to be misleading with too many candidate_labels anyway. If you want results to stay comparable across requests, you can run with multi_label=True and do the scaling on your end. |
multi_label | (Default: false). Boolean that is set to True if classes can overlap |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
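Following the advice on candidate_labels, more than 10 labels can be split across several requests, each with multi_label=True so that every label is scored on its own rather than against the others in its batch. A minimal sketch; `zero_shot_payloads` is a hypothetical helper, and any rescaling across batches is left to you.

```python
from typing import Any, Dict, List

MAX_CANDIDATE_LABELS = 10  # documented per-request limit


def zero_shot_payloads(text: str, labels: List[str], batch_size: int = MAX_CANDIDATE_LABELS) -> List[Dict[str, Any]]:
    """Split labels into batches of at most batch_size and build one payload per batch."""
    if not 1 <= batch_size <= MAX_CANDIDATE_LABELS:
        raise ValueError(f"batch_size must be between 1 and {MAX_CANDIDATE_LABELS}")
    return [
        {"inputs": text, "parameters": {"candidate_labels": labels[i:i + batch_size], "multi_label": True}}
        for i in range(0, len(labels), batch_size)
    ]
```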
Return value is either a dict or a list of dicts if you sent a list of inputs
Response:
```python
{
    "sequence": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "labels": ["refund", "faq", "legal"],
    "scores": [
        # 88% refund
        0.8778,
        0.1052,
        0.017,
    ],
}
```
Returned values | |
---|---|
sequence | The string sent as an input |
labels | The list of labels you sent, sorted from most to least likely |
scores | A list of floats giving the probability of each label, in the same order as labels. |
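Because labels and scores are parallel lists, zipping them gives a lookup by label name that does not depend on the order in which labels come back. A minimal sketch using the example response; `label_scores` is a hypothetical helper name.

```python
from typing import Dict


def label_scores(response: Dict) -> Dict[str, float]:
    """Map each label to its score; labels and scores are parallel lists."""
    return dict(zip(response["labels"], response["scores"]))


response = {
    "sequence": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "labels": ["refund", "faq", "legal"],
    "scores": [0.8778, 0.1052, 0.017],
}
scores = label_scores(response)
print(max(scores, key=scores.get))  # refund
```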
Conversational task
This task corresponds to any chatbot-like structure. Models tend to have a shorter max_length, so check carefully whether a given model can handle the long-range dependencies you need.
Recommended model: microsoft/DialoGPT-large.
Available with: 🤗 Transformers
Example:
```python
import json
import requests

API_TOKEN = "<your API token>"  # replace with your Hugging Face API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/microsoft/DialoGPT-large"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query(
    {
        "inputs": {
            "past_user_inputs": ["Which movie is the best ?"],
            "generated_responses": ["It's Die Hard for sure."],
            "text": "Can you explain why ?",
        },
    }
)
```
```python
# Response (the API may also return a "warnings" key, for example
# ["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation."])
{
    "generated_text": "It's the best movie ever.",
    "conversation": {
        "past_user_inputs": [
            "Which movie is the best ?",
            "Can you explain why ?",
        ],
        "generated_responses": [
            "It's Die Hard for sure.",
            "It's the best movie ever.",
        ],
    },
}
```
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a dict containing the following keys: |
text (required) | The last input from the user in the conversation. |
generated_responses | A list of strings corresponding to the earlier replies from the model. |
past_user_inputs | A list of strings corresponding to the earlier inputs from the user. Should be the same length as generated_responses. |
parameters | a dict containing the following keys: |
min_length | (Default: None). Integer defining the minimum length, in tokens, of the output. |
max_length | (Default: None). Integer defining the maximum length, in tokens, of the output. |
top_k | (Default: None). Integer defining how many of the most probable tokens are considered in the sampling operation that creates new text. |
top_p | (Default: None). Float defining which tokens are considered in the sampling operation. Tokens are added to the sample from most probable to least probable until the sum of their probabilities is greater than top_p. |
temperature | (Default: 1.0). Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always taking the highest-scoring token, and values approaching 100.0 get closer to a uniform probability. |
repetition_penalty | (Default: None). Float (0.0-100.0). The more a token is used within generation, the more it is penalized, making it less likely to be picked in successive generation passes. |
max_time | (Default: None). Float (0-120.0). The maximum amount of time, in seconds, that the query should take. Network overhead can add to this, so it is a soft limit. |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
Return value is either a dict or a list of dicts if you sent a list of inputs
Returned values | |
---|---|
generated_text | The bot's answer |
conversation | A convenience dictionary to send back with the next input (with the new user input added). |
past_user_inputs | List of strings. The last inputs from the user in the conversation, after the model has run. |
generated_responses | List of strings. The last outputs from the model in the conversation, after the model has run. |
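The conversation dict in each response already contains both histories, so the next request can be assembled from it plus the new user message. A minimal sketch; `next_turn` is a hypothetical helper and the follow-up question is made up.

```python
from typing import Any, Dict


def next_turn(response: Dict[str, Any], user_text: str) -> Dict[str, Any]:
    """Build the payload for the next turn from the previous response's conversation."""
    conversation = response["conversation"]
    return {
        "inputs": {
            "past_user_inputs": list(conversation["past_user_inputs"]),
            "generated_responses": list(conversation["generated_responses"]),
            "text": user_text,
        }
    }


response = {
    "generated_text": "It's the best movie ever.",
    "conversation": {
        "past_user_inputs": ["Which movie is the best ?", "Can you explain why ?"],
        "generated_responses": ["It's Die Hard for sure.", "It's the best movie ever."],
    },
}
payload = next_turn(response, "Who is the main actor?")
```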
Feature Extraction task
This task reads some text and outputs raw float values, which are usually consumed as part of a semantic database or semantic search.
Recommended models: Sentence Transformers models.
Available with: 🤗 Transformers and Sentence Transformers
When sending your request, you should send a JSON-encoded payload. Here are all the options:
All parameters | |
---|---|
inputs (required) | a string or a list of strings to get the features from |
options | a dict containing the following keys: |
use_cache | (Default: true). Boolean. The Inference API has a cache layer that speeds up requests it has already seen. Most models are deterministic (the results would be the same anyway), so cached results can be used as is. However, if you use a non-deterministic model, you can set this parameter to false to bypass the cache and get a genuinely new query. |
wait_for_model | (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any waiting in your application happens only at known places. |
The return value is a list of floats or a list of lists of floats, as described below.
Returned values | |
---|---|
A list of floats (or a list of lists of floats) | The numbers that make up the feature representation of the input. |
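If a model returns one vector per token for a single input string (one possible reading of the list-of-lists case, and an assumption to check against the model card), a common way to get one fixed-size vector is to average the token vectors. This mean pooling is a widespread convention rather than something this API guarantees; do not apply it when the nested list holds one vector per input. `mean_pool` is a hypothetical helper name.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length token vectors into a single vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vector) != dim for vector in token_vectors):
        raise ValueError("all token vectors must have the same length")
    return [sum(vector[i] for vector in token_vectors) / len(token_vectors) for i in range(dim)]


print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```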
Audio
Automatic Speech Recognition task
This task reads some audio input and outputs the said words within the audio files.
Recommended model: Check your langage.
English: facebook/wav2vec2-large-960h-lv60-self.
Available with: 🤗 Transformers, ESPnet and SpeechBrain
Request:
import json
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/wav2vec2-base-960h"
def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
data = query("sample1.flac")
When sending your request, you should send a binary payload that simply contains your audio file. We try to support most formats (FLAC, WAV, MP3, Ogg, etc.), and we automatically resample the audio to the rate expected by the given model (usually 16 kHz).
All parameters | |
---|---|
no parameter (required) | a binary representation of the audio file. No other parameters are currently allowed. |
Return value is a dict
Response:
{
    "text": "GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHOOL ROOMS DAY AFTER DAY FOR A FORTNIGHT HE'LL HAVE TO PUT IN AN APPEARANCE AT SOME PLACE OF WORSHIP ON SUNDAY MORNING AND HE CAN COME TO US IMMEDIATELY AFTERWARDS"
}
Returned values | |
---|---|
text | The string that was recognized within the audio file. |
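The `query` helper above uses `requests`. If you would rather avoid a third-party dependency, the sketch below uses the standard library's `urllib.request` instead. `build_audio_request` and `transcribe` are hypothetical names, and you supply the API URL and token yourself.

```python
import json
import urllib.request


def build_audio_request(filename: str, api_url: str, token: str) -> urllib.request.Request:
    """Wrap the raw bytes of an audio file in a POST request."""
    with open(filename, "rb") as f:
        data = f.read()
    # The body is the audio file exactly as stored on disk; no JSON wrapping.
    return urllib.request.Request(
        api_url,
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def transcribe(filename: str, api_url: str, token: str) -> str:
    """Send the audio file and return the recognized text."""
    request = build_audio_request(filename, api_url, token)
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    return result["text"]
```

Usage would look like `transcribe("sample1.flac", API_URL, API_TOKEN)`. With `urlopen`, a non-2xx status such as a 503 (model still loading) is raised as `urllib.error.HTTPError` rather than returned, so catch it if you want to retry.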
Audio Classification task
This task reads some audio input and outputs the likelihood of classes.
Recommended model: superb/hubert-large-superb-er.
Available with: 🤗 Transformers, SpeechBrain
Request:
import json
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/superb/hubert-large-superb-er"
def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
data = query("sample1.flac")
When sending your request, you should send a binary payload that simply contains your audio file. We try to support most formats (FLAC, WAV, MP3, Ogg, etc.), and we automatically resample the audio to the rate expected by the given model (usually 16 kHz).
All parameters | |
---|---|
no parameter (required) | a binary representation of the audio file. No other parameters are currently allowed. |
Return value is a list of dicts, one per class
Response (scores rounded to 4 decimal places):
[
    {"score": 0.5928, "label": "neu"},
    {"score": 0.2003, "label": "hap"},
    {"score": 0.128, "label": "ang"},
    {"score": 0.079, "label": "sad"},
]
Returned values | |
---|---|
label | The label for the class (model specific) |
score | A float that represents how likely it is that the audio file belongs to this class. |
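To act on the result, you often only need the most likely class. The sketch below reuses the example response above. `top_label` is a hypothetical helper, and it takes the maximum score rather than assuming the list arrives sorted.

```python
from typing import Dict, List


def top_label(results: List[Dict]) -> Dict:
    """Return the entry with the highest score."""
    return max(results, key=lambda r: r["score"])


# The example response shown above (superb/hubert-large-superb-er).
results = [
    {"score": 0.5928, "label": "neu"},
    {"score": 0.2003, "label": "hap"},
    {"score": 0.128, "label": "ang"},
    {"score": 0.079, "label": "sad"},
]
best = top_label(results)
print(best["label"], best["score"])  # neu 0.5928
```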
Computer Vision
Image Classification task
This task reads some image input and outputs the likelihood of classes.
Recommended model: google/vit-base-patch16-224.
Available with: 🤗 Transformers
Request:
import json
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/google/vit-base-patch16-224"
def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
data = query("cats.jpg")
When sending your request, you should send a binary payload that simply contains your image file. We support all image formats Pillow supports.
All parameters | |
---|---|
no parameter (required) | a binary representation of the image file. No other parameters are currently allowed. |
Return value is a list of dicts, one per predicted class
Response (scores rounded to 4 decimal places):
[
    {"score": 0.9374, "label": "Egyptian cat"},
    {"score": 0.0384, "label": "tabby, tabby cat"},
    {"score": 0.0144, "label": "tiger cat"},
    {"score": 0.0033, "label": "lynx, catamount"},
    {"score": 0.0007, "label": "Siamese cat, Siamese"},
]
Returned values | |
---|---|
label | The label for the class (model specific) |
score | A float that represents how likely it is that the image file belongs to this class. |
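The sketch below keeps only the classes above a confidence threshold, using the example response above. `confident_labels` and its 0.01 default are illustrative choices, not API parameters. A single label can hold several comma-separated synonyms (for example `"tabby, tabby cat"`), so avoid splitting labels on commas.

```python
from typing import Dict, List, Tuple


def confident_labels(results: List[Dict], threshold: float = 0.01) -> List[Tuple[str, float]]:
    """Keep (label, score) pairs at or above `threshold`, best first."""
    kept = [(r["label"], r["score"]) for r in results if r["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# The example response shown above for cats.jpg.
results = [
    {"score": 0.9374, "label": "Egyptian cat"},
    {"score": 0.0384, "label": "tabby, tabby cat"},
    {"score": 0.0144, "label": "tiger cat"},
    {"score": 0.0033, "label": "lynx, catamount"},
    {"score": 0.0007, "label": "Siamese cat, Siamese"},
]
print(confident_labels(results, threshold=0.01))
# [('Egyptian cat', 0.9374), ('tabby, tabby cat', 0.0384), ('tiger cat', 0.0144)]
```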
Object Detection task
This task reads some image input and outputs the class likelihoods and bounding boxes of detected objects.
Recommended model: facebook/detr-resnet-50.
Available with: 🤗 Transformers
Request:
import json
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50"
def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
data = query("cats.jpg")
When sending your request, you should send a binary payload that simply contains your image file. We support all image formats Pillow supports.
All parameters | |
---|---|
no parameter (required) | a binary representation of the image file. No other parameters are currently allowed. |
Return value is a list of dicts, one per detected object
Response (scores rounded to 4 decimal places):
[
    {
        "score": 0.9982,
        "label": "remote",
        "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 117},
    },
    {
        "score": 0.9960,
        "label": "remote",
        "box": {"xmin": 333, "ymin": 72, "xmax": 368, "ymax": 187},
    },
    {
        "score": 0.9955,
        "label": "couch",
        "box": {"xmin": 0, "ymin": 1, "xmax": 639, "ymax": 473},
    },
    {
        "score": 0.9988,
        "label": "cat",
        "box": {"xmin": 13, "ymin": 52, "xmax": 314, "ymax": 470},
    },
    {
        "score": 0.9987,
        "label": "cat",
        "box": {"xmin": 345, "ymin": 23, "xmax": 640, "ymax": 368},
    },
]
Returned values | |
---|---|
label | The label for the class (model specific) of a detected object. |
score | A float that represents how likely it is that the detected object belongs to the given class. |
box | A dict (with keys [xmin,ymin,xmax,ymax]) representing the bounding box of a detected object. |
Image Segmentation task
This task reads some image input and outputs the likelihood of classes and a mask for each detected segment.
Recommended model: facebook/detr-resnet-50-panoptic.
Available with: 🤗 Transformers
Request:
import json
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50-panoptic"
def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
data = query("cats.jpg")
When sending your request, you should send a binary payload that simply contains your image file. We support all image formats Pillow supports.
All parameters | |
---|---|
no parameter (required) | a binary representation of the image file. No other parameters are currently allowed. |
Return value is a list of dicts, one per segment. The masks can be decoded and checked with Pillow:
import base64
from io import BytesIO
from PIL import Image
# Decode the base64 mask of every returned segment.
masks = [d["mask"] for d in data]
mask_imgs = [Image.open(BytesIO(base64.b64decode(mask))) for mask in masks]
with Image.open("cats.jpg") as img:
    # cats.jpg is 640x480; every mask has the same size as the input image.
    assert img.size == (640, 480)
    for mask_img in mask_imgs:
        assert mask_img.size == img.size
        # Mode "L": a single channel of 8-bit pixels (black and white).
        assert mask_img.mode == "L"
# Pixel values therefore fall within [0, 255].
min_pxl_val, max_pxl_val = mask_imgs[0].getextrema()
assert 0 <= min_pxl_val <= max_pxl_val <= 255
Returned values | |
---|---|
label | The label for the class (model specific) of a segment. |
score | A float that represents how likely it is that the segment belongs to the given class. |
mask | A string (a base64-encoded, single-channel, black-and-white image) representing the mask of a segment. |
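Opening the masks with Pillow is not always necessary. If you only want to store them, for instance to inspect them in an image viewer, the standard library is enough, as the sketch below shows. `save_masks` is a hypothetical helper. The documentation does not state which image format the encoded mask uses, so the sketch guesses the file extension from the first bytes (PNG or JPEG) and falls back to `.bin`.

```python
import base64
from pathlib import Path
from typing import Dict, List

# Magic bytes of common image formats (used only to pick a file extension).
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def save_masks(segments: List[Dict], out_dir: str) -> List[Path]:
    """Decode each base64 mask and write it to `out_dir`, one file per segment."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, seg in enumerate(segments):
        raw = base64.b64decode(seg["mask"])
        # Fall back to .bin if the bytes match none of the known signatures.
        ext = next((e for sig, e in _SIGNATURES.items() if raw.startswith(sig)), ".bin")
        # Labels may contain spaces or commas, so keep only filename-safe characters.
        safe_label = "".join(c if c.isalnum() else "_" for c in seg["label"])
        path = out / f"{i:02d}_{safe_label}{ext}"
        path.write_bytes(raw)
        paths.append(path)
    return paths
```

With the response from the request above, `save_masks(data, "masks")` would write one file per segment, prefixed with its position in the response.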