bigquery: support for nested JSON types #10930

Open
leclark opened this issue Sep 27, 2024 · 1 comment
Labels
api: bigquery Issues related to the BigQuery API. priority: p2 Moderately-important priority. Fix may not be included in next release.

leclark commented Sep 27, 2024

Is your feature request related to a problem? Please describe.

This PR added initial JSON support to the bigquery managedwriter/adapt package.

Unfortunately, it doesn't support using a JSON object as a field value: passing a regular nested JSON object results in an error.

The current test uses strings as the field values instead of JSON objects.

	sampleJSONData := [][]byte{
		[]byte(`{"json_type":"{\"foo\": \"bar\"}"}`),
		[]byte(`{"json_type":"{\"key\": \"value\"}"}`),
		[]byte(`{"json_type":"\"a string\""}`),
	}

Describe the solution you'd like

Add support for nested JSON objects that aren't strings, and update the test to include a mix of types, including nested JSON objects.

e.g.

	sampleJSONData := [][]byte{
		[]byte(`{"json_type":{"foo": "bar"}}`),
		[]byte(`{"json_type":{"key": "value"}}`),
		[]byte(`{"json_type":"a string"}`),
		[]byte(`{"json_type":{"outer": {"a": "value", "b": true, "c": 1, "d":[1, 2, 3]}}}`),
	}

Describe alternatives you've considered

Converting JSON objects to strings, but that requires custom data handling per message type.

Additional context

N/A

@leclark leclark added the triage me I really want to be triaged. label Sep 27, 2024
@leclark leclark changed the title from "packagename: short description of feature request" to "bigquery: support for nested JSON types" Sep 27, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Sep 28, 2024
@alvarowolfx alvarowolfx added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels Sep 30, 2024
alvarowolfx (Contributor) commented Oct 1, 2024

@leclark thanks for the report.

BigQuery JSON fields should be set with a valid JSON string, which is the same format that we need to send via the Storage Write API (some details on field conversions). For nested JSON objects to work in the example that was provided, the protobuf field type would have to be google.protobuf.Any or message, which the Storage Write API would not accept because it would not match the table schema.

A workaround would be to have a custom type to insert data that converts the given field from a map[string]any to an escaped string, like this:

package main

import (
	"encoding/json"
	"fmt"
)

type Example struct {
	JsonField JsonFieldType `json:"json_field"`
}

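// JsonFieldType holds a nested JSON object that must be sent as a JSON string.
type JsonFieldType map[string]any

// MarshalJSON encodes the map as JSON, then encodes that JSON text as a JSON
// string, so the field value is an escaped string rather than a nested object.
func (f JsonFieldType) MarshalJSON() ([]byte, error) {
	// Convert to a plain map first; marshaling f directly would call
	// MarshalJSON again and recurse forever.
	out, err := json.Marshal(map[string]any(f))
	if err != nil {
		return nil, err
	}
	// Marshaling the string form escapes the inner quotes.
	return json.Marshal(string(out))
}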

func main() {
	value := map[string]any{
		"outer": map[string]any{
			"a": "value",
			"b": true,
			"c": 1,
			"d": []int{1, 2, 3, 4},
		},
	}
	ex := &Example{JsonField: value}
	out, _ := json.Marshal(ex)
	fmt.Println(string(out))
}
// OUTPUT: {"json_field":"{\"outer\":{\"a\":\"value\",\"b\":true,\"c\":1,\"d\":[1,2,3,4]}}"}
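
For raw payloads like the `sampleJSONData` rows in the original request, the same idea might be applied without a custom type per message. The following is only a sketch: `stringifyJSONFields` is a hypothetical helper (not part of the managedwriter/adapt package), and it assumes the caller already knows which top-level fields map to JSON columns. It uses only `encoding/json` and `bytes` from the standard library.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// stringifyJSONFields is a hypothetical helper, not a library API.
// It rewrites the named top-level fields of a JSON row so that each value
// becomes a JSON string containing the compact JSON text of the original value.
func stringifyJSONFields(raw []byte, fields ...string) ([]byte, error) {
	// Keep every field as raw bytes so untouched fields pass through unchanged.
	var row map[string]json.RawMessage
	if err := json.Unmarshal(raw, &row); err != nil {
		return nil, err
	}
	for _, name := range fields {
		val, ok := row[name]
		if !ok {
			continue // field absent in this row; nothing to convert
		}
		// Drop insignificant whitespace from the nested value.
		var buf bytes.Buffer
		if err := json.Compact(&buf, val); err != nil {
			return nil, err
		}
		// Marshaling the text as a Go string produces the escaped form.
		quoted, err := json.Marshal(buf.String())
		if err != nil {
			return nil, err
		}
		row[name] = quoted
	}
	return json.Marshal(row)
}

func main() {
	in := []byte(`{"json_type":{"foo": "bar"}}`)
	out, err := stringifyJSONFields(in, "json_type")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that under this approach a plain string value such as `"a string"` is also wrapped (it becomes `"\"a string\""`), which matches the shape of the existing test data, and a `null` value would be turned into the string `"null"`; whether that is desirable depends on the table schema.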
