BigQuery
Before starting, make sure the bq command is set up to work with the
ord-on-gcp project on Google Cloud. Follow the instructions
here.
For simplicity, we currently do not attempt to modify existing tables as new data is added to the database; each upload uses a new table and imports a full copy of the current database.
Set up a BigQuery table
Run bq_schema.py to build an updated BigQuery schema for the Reaction message:
$ cd "${ORD_SCHEMA_ROOT}"
$ python ord_schema/proto/bq_schema.py --output=bq_schema.json
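Before creating the table, it can help to sanity-check the generated schema. The sketch below is not part of ord_schema; it assumes that bq_schema.json uses the standard BigQuery JSON schema layout (a list of field objects, with nested RECORD fields listing their children under a "fields" key). The helper names count_fields and summarize_schema are hypothetical.

```python
import json


def count_fields(fields):
    """Recursively count the fields in a BigQuery JSON schema (a list of field dicts)."""
    total = 0
    for field in fields:
        total += 1
        # Assumption: nested RECORD fields carry their children under "fields".
        total += count_fields(field.get("fields", []))
    return total


def summarize_schema(path):
    """Load a schema file and return (top_level_count, total_count)."""
    with open(path) as f:
        schema = json.load(f)
    return len(schema), count_fields(schema)
```

Calling summarize_schema("bq_schema.json") after running bq_schema.py gives a quick way to notice an empty or unexpectedly small schema before it is passed to bq mk.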
Create a new BigQuery table and set the schema. Be sure to give the table a descriptive name that includes the date of the database snapshot:
$ DATASET=test
$ TABLE=ORD_2020_05_06
$ bq mk --table "${DATASET}.${TABLE}" bq_schema.json
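Because every upload goes into a new table, the table name is the only record of which snapshot it holds. A minimal sketch of generating a name in the ORD_YYYY_MM_DD pattern used above is shown here; the function snapshot_table_name and its prefix parameter are hypothetical and not part of ord_schema.

```python
import datetime


def snapshot_table_name(snapshot_date, prefix="ORD"):
    """Return a table name such as ORD_2020_05_06 for the given snapshot date."""
    # date objects accept strftime directives in format specs.
    return f"{prefix}_{snapshot_date:%Y_%m_%d}"
```

For example, snapshot_table_name(datetime.date.today()) yields a name for today's snapshot that can be assigned to TABLE.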
Load the database into BigQuery
Use proto_to_json.py to generate a JSONL dump of the database:
$ cd "${ORD_SCHEMA_ROOT}"
$ python ord_schema/proto/proto_to_json.py \
    --input="${ORD_DATA_ROOT}/data/*/*.pbtxt" \
    --output=bq_data.jsonl
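A quick local check of the dump can catch malformed output before the upload is attempted. The sketch below assumes that each non-blank line of bq_data.jsonl should be a single JSON object, which is what the NEWLINE_DELIMITED_JSON source format expects; the helper check_jsonl is hypothetical and not part of ord_schema.

```python
import json


def check_jsonl(path):
    """Return the number of records in a JSONL file; raise ValueError on a bad line."""
    count = 0
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                # Blank lines are skipped by this check and not counted.
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number}: {error}") from error
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            count += 1
    return count
```

The returned count can be compared against the number of reactions expected in the snapshot; it only confirms that the file is well-formed JSONL, not that the records match the schema.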
Upload the data to BigQuery:
$ bq load --source_format=NEWLINE_DELIMITED_JSON "${DATASET}.${TABLE}" bq_data.jsonl