UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Ultipa CLI
    • Overview
    • Importer
    • Import Configurations
      • Import from CSV
      • Import from JSON / JSONL
      • Import from a Relational Database
      • Import from Neo4j
      • Import from BigQuery
      • Import from Kafka
      • Import from Hive
      • Import from Salesforce
      • Import from RDF
      • Import from GraphML
    • Exporter
    • Export Configurations
      • Export to CSV
      • Export to JSON / JSONL
      • Export to GraphML
  1. Docs
  2. /
  3. Ultipa Tools
  4. /
  5. Data Import

Import from Hive

This page walks through importing data from Apache Hive into a graph using gqldb-importer. The importer connects to HiveServer2, runs one HiveQL query per node/edge type, and streams the result rows into the graph.

Usage Guides

Verify Connectivity

Make sure HiveServer2 is reachable from the host where gqldb-importer will run, and that the account you'll use has SELECT on the tables you plan to read. The default HiveServer2 port is 10000.

Generate Configuration File

Bash
./gqldb-importer -sample hive

A file named import.sample.hive.yml will be created in the current directory. Rename it before editing so a re-run of -sample hive doesn't clobber your changes:

Bash
mv import.sample.hive.yml import.hive.yml

Modify Configuration File

Edit import.hive.yml. Hive-specific configuration lives under the top-level hive: block; see the Import Configurations for the rest of the file (server, settings).

config snippet
hive:
  host: "localhost"
  port: 10000
  auth: "NONE"               # NONE, NOSASL, KERBEROS
  database: "default"
  username: ""
  password: ""

  nodes:
    - schema: "Person"
      query: "SELECT id AS _id, name, age FROM users"
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "FOLLOWS"
      query: "SELECT follower_id, following_id FROM follows"
      from_column: "follower_id"
      to_column: "following_id"

Authentication Mode

The auth field selects how the HiveServer2 connection authenticates:

ValueDescription
NONEPlain authentication. username and password are sent as-is.
NOSASLNo SASL framing. Typically used when HiveServer2 has SASL disabled.
KERBEROSKerberos / GSSAPI. The host must have a valid Kerberos ticket cache available to the importer process.

Execute Import

Bash
./gqldb-importer -c import.hive.yml

Writing the Queries

The importer treats each query as a flat row source. Column names in the result are what id_column, from_column, to_column, and properties reference.

Node query — return one row per node with one column acting as the node _id:

SQL
-- Maps directly to id_column: "_id", schema: "Person"
SELECT id AS _id, name, age FROM users WHERE active = TRUE

Edge query — return one row per edge with source and destination _id columns:

SQL
-- Maps to from_column: "follower_id", to_column: "following_id", schema: "FOLLOWS"
SELECT follower_id, following_id, created_at FROM follows

A few practical tips:

  • Filter and aggregate at the source — Hive scans can be expensive, and pre-filtering with WHERE is faster than discarding rows downstream.
  • For partitioned tables, include the partition column in the WHERE clause to enable partition pruning.
  • Hive's MAP, ARRAY, and STRUCT column types are not directly representable as graph properties — flatten or stringify them in the query before they reach the importer.