Items and Users Properties
Item and user metadata are called “properties”. For instance, an item might be represented as follows:
{
"item_id": "57243",
"category_id": 3,
"genre": "drama",
"tags": ["family", "sci-fi"],
"price": 9.99,
"summary": "An eccentric yet compassionate extraterrestrial Time Lord zips through time and space [...]",
"poster": "https://www.themoviedb.org/tv/57243-doctor-who.jpg"
}
Why and How?
Using rich properties gives two advantages:
it improves the recommendations, especially for both cold-start problems, where the algorithm relies only on properties (such as Semantic Graph Embedding from genres and tags, or Deep Content Extraction from text and images)
it enables your client to dynamically filter the recommendations to items satisfying certain criteria (such as a price below a threshold given at runtime, or a geo-location close to given coordinates; see Filtering on Item Property)
As in a SQL database, properties must be defined before you can insert items or users with these properties. The API does not automatically create a new property when it detects an unknown key in an API request. This choice surfaces development errors as soon as they occur.
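Because unknown keys are not created automatically, it can help to catch them on the client side before sending anything. The sketch below is only an illustration: `DEFINED_ITEM_PROPERTIES` and `check_item_properties` are hypothetical names, and the set of property names is assumed from the example item above rather than taken from the API.

```python
# Hypothetical local copy of the property names already defined through the API.
DEFINED_ITEM_PROPERTIES = {"category_id", "genre", "tags", "price", "summary", "poster"}


def check_item_properties(item, defined=DEFINED_ITEM_PROPERTIES):
    """Raise KeyError if the item uses a property that was never defined."""
    # "item_id" identifies the item; it is not treated as a property here.
    unknown = sorted(set(item) - set(defined) - {"item_id"})
    if unknown:
        raise KeyError(f"undefined item properties: {unknown}")
    return item
```

A misspelled key such as `"prize"` instead of `"price"` would then fail locally, before any request is made.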
In most use cases of the Crossing Minds API, you don’t need to upload and maintain your item catalog yourself. Instead, it is more common to leverage a CDP integration (like Segment, mParticle, or Shopify), or to share entire data files with your dedicated ML Engineer. Nevertheless, for some use cases it is preferable to maintain all the properties yourself by calling the API endpoints. See Uploading and Maintaining An Item Catalog below.
Property Types
Item properties can be of various types. The available `value_type` may be found in the “Value Types” column of the following table:

Domain | Value Types | Kind | Filters | Examples | Example Values |
---|---|---|---|---|---|
integer | | scalar | | number of pages, year | |
integer | | categorical scalar | | category ID | |
float | | scalar | | price | |
string | | categorical | | UTF8 genre name | |
bytes | | categorical | | ASCII tag name, encrypted tag | |
text | | long text | | review, synopsis | |
url | | image | | poster, screenshot | |
Notes:
Domains with `=` in “Filters” support the `eq`, `neq`, `in`, `notin` operators in recommendation filters
Domains with `<` in “Filters” support the `lt`, `gt` operators in recommendation filters
Domains with both `=` and `<` in “Filters” also support the `lte`, `gte` operators in recommendation filters
Domains with `ft` in “Filters” support the `ftsearch` operator in recommendation filters
Categorical domains contribute to Semantic Graph Embedding
Scalar domains contribute to Shallow Content Extraction
Long text and image domains contribute to Deep Content Extraction
In integer domains, valid `<NBITS>` are `8`, `16`, `32` and `64`
In float domains, valid `<NBITS>` are `32` and `64`
In string and bytes domains, `<NCHARS>` encodes the maximum size. It defaults to the maximum of 255 chars
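To make the operator names in the notes concrete, here is a minimal local sketch of what each comparison means for a single property value. It assumes the operators have their usual meanings as suggested by their names; `OPERATORS` and `matches` are hypothetical helpers, not part of the API, and `ftsearch` is left out because its matching rules are not described here.

```python
import operator

# Assumed semantics, inferred from the operator names only.
OPERATORS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "in": lambda value, choices: value in choices,
    "notin": lambda value, choices: value not in choices,
    "lt": operator.lt,
    "gt": operator.gt,
    "lte": operator.le,
    "gte": operator.ge,
}


def matches(value, op, operand):
    """Return True if a single property value satisfies the filter."""
    return OPERATORS[op](value, operand)
```

For example, `matches(9.99, "lt", 10.0)` holds for the price of the item shown earlier, while `matches("drama", "notin", ["drama", "comedy"])` does not.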
Repeated Values
Any property may be “repeated”, meaning that a single item may have an array of many values for this property. This is typically the case with properties like “tags” or “genres”.
Properties with `repeated=True` also support recommendation filters.
For most filter operators, an item with a repeated property satisfies a filter on this property if any of the repeated values satisfies the filter.
See Filter Logic for the detailed logic of filtering on repeated values.
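The “any value” rule can be sketched locally as follows. This is an illustration of the rule stated above, not the API's implementation; `repeated_matches` is a hypothetical helper, and operators that deviate from the rule are covered by Filter Logic rather than by this sketch.

```python
def repeated_matches(values, predicate):
    """True if at least one of the repeated values satisfies the predicate."""
    return any(predicate(value) for value in values)


tags = ["family", "sci-fi"]
has_scifi = repeated_matches(tags, lambda tag: tag == "sci-fi")
has_horror = repeated_matches(tags, lambda tag: tag == "horror")
```

Under this rule an item with an empty list of values never satisfies the filter, since there is no value to match.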
Array-Optimized Format (Optional)
In JSON, the repeated values are represented using a simple list.
When using a client that supports binary serialization of the data (not JSON), you can use an “array-optimized” format to represent the repeated values of many items. This format saves memory, CPU and network bandwidth, but it is more complicated. It requires separating the item properties into two groups:
`items`, a single array to represent the non-repeated values, with at least the item ID;
`items_m2m`, a mapping from property name to arrays representing the repeated values.
The arrays in `items_m2m` store a collection of 2-tuples for each of the many-to-many relations. The first element `item_index` is the (0-based) index of the item with respect to `items`. The second element `value_id` is the property value.
For instance let’s take the following bulk of 4 items:
{
"items": [
{
"item_id": "a",
"price": 1.1,
"tags": [],
"genres": ["drama"]
},
{
"item_id": "b",
"price": 2.2,
"tags": [1, 2, 3],
"genres": ["drama", "comedy"]
},
{
"item_id": "c",
"price": 3.3,
"tags": [1, 2],
"genres": []
},
{
"item_id": "d",
"price": 4.4,
"tags": [1],
"genres": ["thriller", "romance"]
}
]
}
The array-optimized format would be:
`items`:

item_id | price |
---|---|
a | 1.1 |
b | 2.2 |
c | 3.3 |
d | 4.4 |

`items_m2m["tags"]`:

item_index | value_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
3 | 1 |

`items_m2m["genres"]`:

item_index | value_id |
---|---|
0 | drama |
1 | drama |
1 | comedy |
3 | thriller |
3 | romance |
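The conversion from the JSON bulk to the `items` / `items_m2m` layout can be sketched in plain Python. This only illustrates the layout described above: `to_array_optimized` is a hypothetical helper, the list of repeated property names is passed in by hand, and a real client would hold these values in binary arrays rather than Python lists.

```python
BULK = {
    "items": [
        {"item_id": "a", "price": 1.1, "tags": [], "genres": ["drama"]},
        {"item_id": "b", "price": 2.2, "tags": [1, 2, 3], "genres": ["drama", "comedy"]},
        {"item_id": "c", "price": 3.3, "tags": [1, 2], "genres": []},
        {"item_id": "d", "price": 4.4, "tags": [1], "genres": ["thriller", "romance"]},
    ]
}


def to_array_optimized(bulk, repeated=("tags", "genres")):
    """Split a JSON bulk into non-repeated rows and (item_index, value_id) pairs."""
    items = []
    items_m2m = {name: [] for name in repeated}
    for item_index, item in enumerate(bulk["items"]):
        # Non-repeated values stay in one row per item.
        items.append({k: v for k, v in item.items() if k not in repeated})
        # Each repeated value becomes one (item_index, value_id) pair.
        for name in repeated:
            for value_id in item.get(name, []):
                items_m2m[name].append((item_index, value_id))
    return {"items": items, "items_m2m": items_m2m}


optimized = to_array_optimized(BULK)
```

Items with an empty list, such as the tags of item `a`, simply contribute no pair to the corresponding array.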
Uploading and Maintaining An Item Catalog
In rare use cases, you cannot maintain your item catalog using only a CDP integration or sharing entire data files regularly. Then, you can use the API endpoints to upload and maintain the property values.
This setup is needed when it is important that newly created items are recommended immediately, without waiting for the next catalog sync. This use case is fairly rare, as typically items can be created in advance, and controlling which items are recommended is simply achieved with filters.
Use the API endpoint `POST items-properties/` to create new item properties.
Use the API endpoints `PUT items/<str:item_id>/properties/`, `PATCH items/<str:item_id>/properties/`, and `DELETE items/<str:item_id>/properties/` to replace, partially update, or delete property values of a single item.
Use the API endpoints `PUT items-bulk/properties/`, `PATCH items-bulk/properties/`, and `DELETE items-bulk/properties/` to replace, partially update, or delete item property values in bulk.
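As a quick reference, the endpoints listed above can be expressed as a small routing helper. The sketch only builds the HTTP method and relative path; it sends nothing, and the base URL, authentication and request bodies are deliberately left out. The function names and the `action` labels are hypothetical, and percent-encoding the item ID is an assumption about how IDs should appear in the path.

```python
from urllib.parse import quote

# Action labels are illustrative; the HTTP methods come from the list above.
METHODS = {"replace": "PUT", "partial_update": "PATCH", "delete": "DELETE"}


def single_item_endpoint(action, item_id):
    """Return (method, path) for the property values of one item."""
    return METHODS[action], f"items/{quote(str(item_id), safe='')}/properties/"


def bulk_endpoint(action):
    """Return (method, path) for the property values of many items."""
    return METHODS[action], "items-bulk/properties/"
```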
Sending large amounts of data over HTTP can degrade performance. We suggest following these guidelines for better results:
use partial updates when possible, to update only what needs to be updated. A partial update is “shallow”, meaning you can send only a subset of the key/value property mapping. For repeated properties, however, you still send the full list of values (which is what lets you “delete” previous values from the list)
use the bulk operations when possible, to send many updates at the same time. Do not send huge requests in a single HTTP call, as they would time out. A good rule of thumb is to keep the latency of each HTTP call below one second, which corresponds to roughly a few hundred KB. Depending on the size of your values, this may be anywhere from ~50 to ~500 items per bulk.
use the default `wait_for_completion=true` when developing, to get error messages synchronously. When moving to production, use `wait_for_completion=false` so that our backend moves the update to a background queue and immediately returns an empty response.
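Two of the guidelines above can be sketched locally: what a shallow partial update does to an item, and how to split a large update into bulks of bounded size. Both helpers are hypothetical, the thresholds (`max_bytes=200_000`, `max_items=500`) are illustrative readings of “a few hundred KB” and “~500 items”, and the JSON-encoded size is used only as a rough proxy for the request size.

```python
import json


def shallow_patch(current, patch):
    """Shallow update: each key in patch replaces the whole previous value."""
    updated = dict(current)
    updated.update(patch)  # a repeated property is replaced as a full list
    return updated


def chunk_items(items, max_bytes=200_000, max_items=500):
    """Yield lists of items whose encoded size and length stay under the limits."""
    chunk, size = [], 0
    for item in items:
        item_size = len(json.dumps(item).encode("utf-8"))
        # Close the current chunk before it would exceed either limit.
        if chunk and (size + item_size > max_bytes or len(chunk) >= max_items):
            yield chunk
            chunk, size = [], 0
        chunk.append(item)
        size += item_size
    if chunk:
        yield chunk
```

Each yielded chunk would then be sent as one bulk request. An item larger than `max_bytes` still ends up alone in its own chunk, so the limit is a target and not a hard guarantee.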