
      Jaccard Similarity

      HDC

      Overview

      Jaccard similarity, also known as the Jaccard index, was proposed by Paul Jaccard in 1901. It measures the similarity between two sets. In a graph, the neighbors of each node are collected into a set, and two nodes are considered similar if their neighborhood sets are similar.

      Jaccard similarity ranges from 0 to 1, where 1 indicates that two sets are identical, and 0 indicates that they share no common elements.

      Concepts

      Jaccard Similarity

      Given two sets A and B, the Jaccard similarity between them is the size of their intersection divided by the size of their union:

      $$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

      For example, let A = {b,c,e,f,g} and B = {a,d,b,g}. Their intersection is A∩B = {b,g} and their union is A∪B = {a,b,c,d,e,f,g}, so the Jaccard similarity between A and B is 2 / 7 = 0.285714.
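The set example above can be checked with a few lines of Python. This is a minimal sketch, not part of Ultipa: the function name `jaccard` is hypothetical, and returning 0.0 for two empty sets is an assumption made here to avoid a division by zero.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b| for two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as sharing nothing.
        return 0.0
    return len(a & b) / len(union)


A = {"b", "c", "e", "f", "g"}
B = {"a", "d", "b", "g"}

score = jaccard(A, B)
print(round(score, 6))  # 0.285714
```

Identical sets give 1.0 and disjoint sets give 0.0, which matches the range described in the Overview.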

      When applying Jaccard Similarity to compare two nodes in a graph, we use the 1-hop neighborhood set to represent each target node. The 1-hop neighborhood set:

      • contains no repeated nodes;
      • excludes the two target nodes.

      In this graph, the 1-hop neighborhood sets of nodes u and v are:

      • Nu = {a,b,c,d,e}
      • Nv = {d,e,f}

      Therefore, the Jaccard similarity between nodes u and v is 2 / 6 = 0.333333.
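The two rules for building a 1-hop neighborhood set can be sketched in Python as follows. The edge list is hypothetical: it is one graph that yields Nu = {a,b,c,d,e} and Nv = {d,e,f}, and it deliberately includes a repeated edge, a direct u-v edge, and a self-loop so that each rule has something to filter. The helper names `neighborhoods` and `node_jaccard` are also assumptions made for illustration.

```python
from collections import defaultdict


def neighborhoods(edges):
    """Map each node to its set of 1-hop neighbors.

    Edges are treated as undirected and self-loops are skipped.
    Using a set removes repeated neighbors automatically.
    """
    nbrs = defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        nbrs[src].add(dst)
        nbrs[dst].add(src)
    return nbrs


def node_jaccard(nbrs, u, v):
    """Jaccard similarity of two nodes, excluding the two target nodes."""
    nu = nbrs.get(u, set()) - {u, v}
    nv = nbrs.get(v, set()) - {u, v}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical edge list (not taken from the original figure).
edges = [
    ("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"), ("e", "u"),
    ("v", "d"), ("v", "e"), ("e", "v"), ("v", "f"),
    ("u", "v"),  # direct edge between the targets, excluded from both sets
    ("u", "u"),  # self-loop, ignored
]

nbrs = neighborhoods(edges)
similarity = node_jaccard(nbrs, "u", "v")
print(round(similarity, 6))  # 0.333333
```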

      In practice, you may need to model some node properties as nodes of their own before computing a similarity index that is based on common neighbors, such as Jaccard Similarity. For instance, when comparing two applications, information such as the phone number, email, and device IP of each application might be stored as properties of the @application node schema. These values must be modeled as nodes and connected to the applications in the graph before they can be used for comparison.

      Weighted Jaccard Similarity

      The Weighted Jaccard Similarity is an extension of the classic Jaccard Similarity that takes into account the weights associated with elements in the sets being compared.

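      The Weighted Jaccard Similarity sums the element-wise minimum of the two weight vectors and divides it by the sum of the element-wise maximum:

      $$J_W(u, v) = \frac{\sum_{i} \min\left(N'_u(i), N'_v(i)\right)}{\sum_{i} \max\left(N'_u(i), N'_v(i)\right)}$$

      where i ranges over the union of the two neighborhood sets, and N'u(i) and N'v(i) are the weights assigned to element i for nodes u and v.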

      In this weighted graph, the union of the 1-hop neighborhood sets Nu and Nv is {a,b,c,d,e,f}. For each element in the union set, assign a value equal to the sum of the edge weights between the target node and the corresponding node; assign 0 if no edge exists between them:

             a     b     c     d     e                 f
      N'u    1     1     1     1     0.5               0
      N'v    0     0     0     0.5   1.5 + 0.1 = 1.6   1

      Therefore, the Weighted Jaccard Similarity between nodes u and v is (0+0+0+0.5+0.5+0) / (1+1+1+1+1.6+1) = 0.151515.

      Please ensure that the sum of the edge weights between the target node and the neighboring node is greater than or equal to 0.
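The weighted example above can be reproduced with a short Python sketch. The function name `weighted_jaccard` and the dictionary layout are assumptions made for illustration. A missing key stands for "no edge" and counts as 0, and negative weights are rejected to reflect the requirement stated above.

```python
def weighted_jaccard(wu: dict, wv: dict) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    if any(w < 0 for w in list(wu.values()) + list(wv.values())):
        raise ValueError("summed edge weights must be >= 0")
    keys = wu.keys() | wv.keys()  # union of the two neighborhood sets
    numerator = sum(min(wu.get(k, 0), wv.get(k, 0)) for k in keys)
    denominator = sum(max(wu.get(k, 0), wv.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0


# Summed edge weights per neighbor, as in the table above.
w_u = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 0.5}
w_v = {"d": 0.5, "e": 1.5 + 0.1, "f": 1}

weighted = weighted_jaccard(w_u, w_v)
print(round(weighted, 6))  # 0.151515
```

When every weight is 1, the minima count the shared neighbors and the maxima count the union, so the result reduces to the classic Jaccard similarity.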

      Considerations

      • The Jaccard Similarity algorithm ignores the direction of edges and treats all edges as undirected.
      • The Jaccard Similarity algorithm ignores any self-loop.

      Example Graph

      Run the following statements on an empty graph to define its structure and insert data:

      ALTER GRAPH CURRENT_GRAPH ADD NODE {
        user (),
        sport()
      };
      ALTER GRAPH CURRENT_GRAPH ADD EDGE {
        like ()-[{weight int32}]->()
      };
      INSERT (userA:user {_id: "userA"}),
             (userB:user {_id: "userB"}),
             (userC:user {_id: "userC"}),
             (userD:user {_id: "userD"}),
             (running:sport {_id: "running"}),
             (tennis:sport {_id: "tennis"}),
             (baseball:sport {_id: "baseball"}),
             (swimming:sport {_id: "swimming"}),
             (badminton:sport {_id: "badminton"}),
             (iceball:sport {_id: "iceball"}),
             (userA)-[:like {weight: 2}]->(tennis),
             (userA)-[:like {weight: 1}]->(baseball),
             (userA)-[:like {weight: 3}]->(swimming),
             (userA)-[:like {weight: 2}]->(badminton),
             (userB)-[:like {weight: 1}]->(running),
             (userB)-[:like {weight: 3}]->(swimming),
             (userC)-[:like {weight: 2}]->(swimming),
             (userD)-[:like {weight: 1}]->(running),
             (userD)-[:like {weight: 2}]->(badminton),
             (userD)-[:like {weight: 2}]->(iceball);
      

      create().node_schema("user").node_schema("sport").edge_schema("like");
      create().edge_property(@like, "weight", int32);
      insert().into(@user).nodes([{_id:"userA"}, {_id:"userB"}, {_id:"userC"}, {_id:"userD"}]);
      insert().into(@sport).nodes([{_id:"running"}, {_id:"tennis"}, {_id:"baseball"}, {_id:"swimming"}, {_id:"badminton"}, {_id:"iceball"}]);
      insert().into(@like).edges([{_from:"userA", _to:"tennis", weight:2}, {_from:"userA", _to:"baseball", weight:1}, {_from:"userA", _to:"swimming", weight:3}, {_from:"userA", _to:"badminton", weight:2}, {_from:"userB", _to:"running", weight:1}, {_from:"userB", _to:"swimming", weight:3}, {_from:"userC", _to:"swimming", weight:2}, {_from:"userD", _to:"running", weight:1}, {_from:"userD", _to:"badminton", weight:2}, {_from:"userD", _to:"iceball", weight:2}]);
      

      Creating HDC Graph

      To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

      CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
        nodes: {"*": ["*"]},
        edges: {"*": ["*"]},
        direction: "undirected",
        load_id: true,
        update: "static"
      }
      

      hdc.graph.create("my_hdc_graph", {
        nodes: {"*": ["*"]},
        edges: {"*": ["*"]},
        direction: "undirected",
        load_id: true,
        update: "static"
      }).to("hdc-server-1")
      

      Parameters

      Algorithm name: similarity

      | Name | Type | Spec | Default | Optional | Description |
      |---|---|---|---|---|---|
      | ids/uuids | _id/_uuid | / | / | Yes | Specifies the first group of nodes by their _id or _uuid. If unset, all nodes in the graph are used as the first group of nodes. |
      | ids2/uuids2 | _id/_uuid | / | / | Yes | Specifies the second group of nodes for pairwise similarity by their _id or _uuid. If only ids2/uuids2 is set (and ids/uuids is not), the algorithm returns no result. |
      | type | String | jaccard | cosine | No | Specifies the type of similarity to compute; for Jaccard Similarity, keep it as jaccard. |
      | edge_weight_property | []"<@schema.?><property>" | / | / | Yes | Specifies numeric edge properties to be used as edge weights by summing their values; edges without these properties are ignored. |
      | return_id_uuid | String | uuid, id, both | uuid | Yes | Includes _uuid, _id, or both to represent nodes in the results. |
      | order | String | asc, desc | / | Yes | Sorts the results by similarity. |
      | limit | Integer | ≥-1 | -1 | Yes | Limits the number of results returned. Set to -1 to include all results. |
      | top_limit | Integer | ≥-1 | -1 | Yes | Limits the number of results returned for each node specified with ids/uuids in selection mode. Set to -1 to include all results with a similarity greater than 0. This parameter is invalid in pairing mode. |

      The algorithm supports two calculation modes:

      • Pairing mode: When both ids/uuids and ids2/uuids2 are set, each node in ids/uuids is paired with each node in ids2/uuids2 (excluding self-pairs), and their pairwise similarities are computed.
      • Selection mode: When only ids/uuids is set, the algorithm computes similarities between each specified node and all other nodes in the graph. Results include all (or a limited number of) nodes with a similarity > 0, sorted in descending order.

      File Writeback

      CALL algo.similarity.write("my_hdc_graph", {
        return_id_uuid: "id",
        ids: "userC",
        ids2: ["userA", "userB", "userD"],
        type: "jaccard"
      }, {
        file: {
          filename: "jaccard"
        }
      })
      

      algo(similarity).params({
        projection: "my_hdc_graph",
        return_id_uuid: "id",
        ids: "userC",
        ids2: ["userA", "userB", "userD"],
        type: "jaccard"  
      }).write({
        file: {
          filename: "jaccard"
        }
      })
      

      Result:

      _id1,_id2,similarity
      userC,userA,0.25
      userC,userB,0.5
      userC,userD,0
      

      Full Return

      CALL algo.similarity.run("my_hdc_graph", {
        return_id_uuid: "id",
        ids: ["userA","userB"], 
        ids2: ["userB","userC","userD"],
        type: "jaccard"
      }) YIELD jacc
      RETURN jacc
      

      exec{
        algo(similarity).params({
          return_id_uuid: "id",
          ids: ["userA","userB"], 
          ids2: ["userB","userC","userD"],
          type: "jaccard"
        }) as jacc
        return jacc
      } on my_hdc_graph
      

      Result:

      _id1    _id2    similarity
      userA   userB   0.2
      userA   userC   0.25
      userA   userD   0.166667
      userB   userC   0.5
      userB   userD   0.25
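These values can be cross-checked outside the database. The sketch below recomputes the pairing-mode result in plain Python from the `like` edges of the example graph. It is only an illustration of the documented behavior, not Ultipa's implementation; `likes`, `jaccard`, and `pairing_mode` are hypothetical names.

```python
from itertools import product

# 1-hop neighborhoods taken from the `like` edges of the example graph.
# Users connect only to sports, so no target node appears in a neighborhood.
likes = {
    "userA": {"tennis", "baseball", "swimming", "badminton"},
    "userB": {"running", "swimming"},
    "userC": {"swimming"},
    "userD": {"running", "badminton", "iceball"},
}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairing_mode(ids, ids2):
    """Pair every node in ids with every node in ids2, skipping self-pairs."""
    rows = []
    for n1, n2 in product(ids, ids2):
        if n1 == n2:
            continue  # self-pairs are excluded
        rows.append((n1, n2, round(jaccard(likes[n1], likes[n2]), 6)))
    return rows


rows = pairing_mode(["userA", "userB"], ["userB", "userC", "userD"])
for row in rows:
    print(*row)
```

The pair (userB, userB) is skipped, which is why the result has five rows rather than six.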

      Stream Return

      CALL algo.similarity.stream("my_hdc_graph", {
        return_id_uuid: "id",
        ids: ["userA"], 
        type: "jaccard",
        edge_weight_property: "weight",
        top_limit: 2    
      }) YIELD jacc
      RETURN jacc
      

      exec{
        algo(similarity).params({
          return_id_uuid: "id",
          ids: ["userA"], 
          type: "jaccard",
          edge_weight_property: "weight",
          top_limit: 2  
        }).stream() as jacc
        return jacc
      } on my_hdc_graph
      

      Result:

      _id1    _id2    similarity
      userA   userB   0.333333
      userA   userC   0.25
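
The weighted, selection-mode result above can also be recomputed by hand. The following Python sketch uses the `weight` values from the example graph and mimics `top_limit: 2`. It is an illustration under stated assumptions, not Ultipa's implementation: `weights`, `weighted_jaccard`, and `selection_mode` are hypothetical names, and only user nodes are compared because the sport nodes share no neighbors with userA and therefore score 0.

```python
# Edge weights of the `like` edges in the example graph, keyed by user.
weights = {
    "userA": {"tennis": 2, "baseball": 1, "swimming": 3, "badminton": 2},
    "userB": {"running": 1, "swimming": 3},
    "userC": {"swimming": 2},
    "userD": {"running": 1, "badminton": 2, "iceball": 2},
}


def weighted_jaccard(wu: dict, wv: dict) -> float:
    keys = wu.keys() | wv.keys()
    numerator = sum(min(wu.get(k, 0), wv.get(k, 0)) for k in keys)
    denominator = sum(max(wu.get(k, 0), wv.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0


def selection_mode(node, top_limit=-1):
    """Score `node` against every other node, keep similarity > 0, sort descending."""
    scored = [
        (node, other, weighted_jaccard(weights[node], weights[other]))
        for other in weights
        if other != node
    ]
    scored = [row for row in scored if row[2] > 0]
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored if top_limit == -1 else scored[:top_limit]


top = selection_mode("userA", top_limit=2)
for n1, n2, sim in top:
    print(n1, n2, round(sim, 6))
```

With `top_limit=-1` the sketch would also return userD, whose weighted similarity to userA is 2 / 11.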

      Copyright © 2019-2025 Ultipa Inc. – All Rights Reserved   |  Security   |  Legal Notices   |  Web Use Notices