This page walks through importing data from Google BigQuery into a graph using gqldb-importer. The importer authenticates with a service-account key, runs one SQL query per node/edge type against your BigQuery datasets, and streams the result rows into the graph.
Create (or reuse) a Google Cloud service account with BigQuery Data Viewer and BigQuery Job User roles on the project containing your datasets, then download its key as a JSON file. The path to that file is what you'll set as certFile in the configuration.
Bash./gqldb-importer -sample bigQuery
A file named import.sample.bigQuery.yml will be created in the current directory. Rename it before editing so a re-run of -sample bigQuery doesn't clobber your changes:
Bashmv import.sample.bigQuery.yml import.bigQuery.yml
Edit import.bigQuery.yml. BigQuery-specific configuration lives under the top-level bigQuery: block; see the Import Configurations for the rest of the file (server, settings).
config snippetbigQuery: projectId: "my-gcp-project" certFile: "./service-account.json" nodes: - schema: "Person" query: "SELECT id AS _id, name, age FROM my_dataset.users" id_column: "_id" properties: age: int32 edges: - schema: "FOLLOWS" query: "SELECT follower_id, following_id FROM my_dataset.follows" from_column: "follower_id" to_column: "following_id"
projectId — the GCP project that holds the datasets to read from.certFile — relative or absolute path to the downloaded service-account JSON key.Bash./gqldb-importer -c import.bigQuery.yml
Use standard BigQuery SQL (GoogleSQL). Reference tables with their fully-qualified dataset.table (or project.dataset.table if reading across projects). The aliases returned by the query are what id_column, from_column, to_column, and properties reference.
Node query — return one row per node, with one column as the node _id:
SQL-- Maps directly to id_column: "_id", schema: "Person" SELECT id AS _id, name, age FROM my_dataset.users WHERE active = TRUE
Edge query — return one row per edge, with columns for the source and destination _ids:
SQL-- Maps to from_column: "follower_id", to_column: "following_id", schema: "FOLLOWS" SELECT follower_id, following_id, created_at FROM my_dataset.follows
A few practical tips:
WHERE rather than pulling every row and discarding downstream. BigQuery bills on bytes scanned._PARTITIONTIME filters) or split a single logical type across multiple entries to parallelize the extraction.SAFE_CAST when the source column type may not match the target GQLDB type cleanly.