BigQuery

Before starting, make sure the bq command is set up to work with the ord-on-gcp project on Google Cloud. Follow the instructions here.

For simplicity, we do not currently update existing tables as new data is added to the database. Instead, each upload creates a new table and imports a full copy of the current database.

Set up a BigQuery table

  1. Run bq_schema.py to build an updated BigQuery schema for the Reaction message:

    $ cd "${ORD_SCHEMA_ROOT}"
    $ python ord_schema/proto/bq_schema.py --output=bq_schema.json
    
  2. Create a new BigQuery table and set the schema. Be sure to give the table a descriptive name that includes the date of the database snapshot:

    $ DATASET=test
    $ TABLE=ORD_2020_05_06
    $ bq mk --table "${DATASET}.${TABLE}" bq_schema.json
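
Because every upload gets its own table, it can help to derive the table name from the snapshot date instead of typing it by hand. The following is a minimal sketch, not part of the ORD tooling: the function name `snapshot_table_name` is hypothetical, and the `ORD_YYYY_MM_DD` pattern is assumed from the example name above.

```shell
# Hypothetical helper (not part of ord_schema): build a table name such as
# ORD_2020_05_06 from a date given as YYYY-MM-DD. With no argument, today's
# date is used.
snapshot_table_name() {
  printf 'ORD_%s\n' "$(printf '%s' "${1:-$(date +%Y-%m-%d)}" | tr '-' '_')"
}

# Name the table after today's snapshot.
TABLE="$(snapshot_table_name)"
echo "${TABLE}"
```

Passing an explicit date, for example `snapshot_table_name 2020-05-06`, is useful when the snapshot was taken on a different day than the upload.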
    

Load the database into BigQuery

  1. Use proto_to_json.py to generate a JSONL dump of the database:

    $ cd "${ORD_SCHEMA_ROOT}"
    $ python ord_schema/proto/proto_to_json.py \
        --input="${ORD_DATA_ROOT}/data/*/*.pbtxt" \
        --output=bq_data.jsonl
    
  2. Upload the data to BigQuery, using the DATASET and TABLE variables set when the table was created:

    $ bq load --source_format=NEWLINE_DELIMITED_JSON "${DATASET}.${TABLE}" bq_data.jsonl
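
Before uploading, a quick sanity check on the JSONL dump can catch an empty or truncated file. The sketch below is an assumption about what is worth checking, not a step from the ORD workflow: the function name `check_jsonl` is hypothetical, and it only verifies that the file is non-empty and has no blank lines, then reports the line count (one line per record in newline-delimited JSON).

```shell
# Hypothetical helper (not part of ord_schema): basic checks on a JSONL file.
# $1: path to the file. Returns non-zero if the file is missing, empty, or
# contains a blank line; otherwise prints the number of records.
check_jsonl() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  if grep -q '^[[:space:]]*$' "$1"; then
    echo "blank line found in $1" >&2
    return 1
  fi
  echo "$(wc -l < "$1" | tr -d ' ') records in $1"
}
```

Running `check_jsonl bq_data.jsonl` before `bq load` gives a record count that can later be compared against the number of rows in the new table.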