-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-4026: [c++] Add new methods to CustomAttributes to allow non-string custom attributes in schemas #3266
base: main
Are you sure you want to change the base?
Changes from all commits
ab67c99
07a2f4d
6740ac4
3820153
005e082
2fd25e1
0f0efd2
e95f691
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,26 +16,63 @@ | |
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
#include "CustomAttributes.hh" | ||
#include "Exception.hh" | ||
|
||
#if defined(__clang__) | ||
// Even though CustomAttributes::ValueMode::STRING is deprecated, we still have to | ||
// handle/implement it. | ||
#pragma clang diagnostic ignored "-Wdeprecated-declarations" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it better to add |
||
#endif | ||
|
||
#include <map> | ||
#include <memory> | ||
|
||
#include "CustomAttributes.hh" | ||
#include "Exception.hh" | ||
|
||
#include "json/JsonDom.hh" | ||
|
||
namespace avro { | ||
|
||
CustomAttributes::CustomAttributes(ValueMode valueMode) { | ||
switch (valueMode) { | ||
case ValueMode::STRING: | ||
case ValueMode::JSON: | ||
valueMode_ = valueMode; | ||
break; | ||
default: | ||
throw Exception("invalid ValueMode: " + std::to_string(static_cast<int>(valueMode))); | ||
} | ||
} | ||
|
||
std::optional<std::string> CustomAttributes::getAttribute(const std::string &name) const { | ||
std::optional<std::string> result; | ||
std::map<std::string, std::string>::const_iterator iter = | ||
attributes_.find(name); | ||
auto iter = attributes_.find(name); | ||
if (iter == attributes_.end()) { | ||
return result; | ||
return {}; | ||
} | ||
result = iter->second; | ||
return result; | ||
return iter->second; | ||
} | ||
|
||
void CustomAttributes::addAttribute(const std::string &name, | ||
const std::string &value) { | ||
// Validate the incoming value. | ||
// | ||
// NOTE: This is a bit annoying that we accept the data as a string instead of | ||
// as an Entity. That means the compiler must convert the value to a string only | ||
// for this method to convert it back. But we can't directly refer to the | ||
// json::Entity type in the signatures for this class (and thus cannot accept | ||
// that type directly as a parameter) because then it would need to be included | ||
// from a header file: CustomAttributes.hh. But the json header files are not | ||
// part of the Avro distribution (intentionally), so CustomAttributes.hh cannot | ||
// #include any of the json header files. | ||
const std::string &jsonVal = (valueMode_ == ValueMode::STRING) | ||
? std::move("\"" + value + "\"") | ||
: value; | ||
try { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it expensive to validate the input? If we don't do this, what will happen for malformed input? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we don't validate it, then we could end up generating invalid Avro files. I suppose we could also make a private method that skips the validation and declare the Compiler as a friend class that can invoke it. From the compiler, the values will have already been validated when the JSON schema was originally parsed, so it would be fine to skip in those cases. The existing "string only" usage is error-prone enough, since it still requires that interior quotes and newlines be escaped (since all it does is add a leading and trailing |
||
json::loadEntity(jsonVal.c_str()); | ||
} catch (json::TooManyValuesException &e) { | ||
throw Exception("string has malformed or missing escapes"); | ||
} | ||
|
||
auto iter_and_find = | ||
attributes_.insert(std::pair<std::string, std::string>(name, value)); | ||
if (!iter_and_find.second) { | ||
|
@@ -45,9 +82,15 @@ void CustomAttributes::addAttribute(const std::string &name, | |
|
||
void CustomAttributes::printJson(std::ostream &os, | ||
const std::string &name) const { | ||
if (attributes().find(name) == attributes().end()) { | ||
auto iter = attributes_.find(name); | ||
if (iter == attributes_.end()) { | ||
throw Exception(name + " doesn't exist"); | ||
} | ||
os << "\"" << name << "\": \"" << attributes().at(name) << "\""; | ||
os << json::Entity(std::make_shared<std::string>(name)).toString() << ": "; | ||
if (valueMode_ == ValueMode::STRING) { | ||
os << "\"" << iter->second << "\""; | ||
} else { | ||
os << iter->second; | ||
} | ||
} | ||
} // namespace avro |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -92,7 +92,11 @@ Entity loadEntity(const char *text) { | |
Entity loadEntity(InputStream &in) { | ||
JsonParser p; | ||
p.init(in); | ||
return readEntity(p); | ||
Entity e = readEntity(p); | ||
if (p.hasMore()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not checking it in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just from the names and signatures, it looked like But for this method, the caller can't safely do anything with the input after it's called, so seemed more intuitive that it should enforce that the entire stream is consumed. None of these have doc comments, so it wasn't clear the intent everywhere, so I left things flexible and only changed this function, which appears safe to change without breaking any caller assumptions. |
||
throw TooManyValuesException(); | ||
} | ||
return e; | ||
} | ||
|
||
Entity loadEntity(const uint8_t *text, size_t len) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -177,6 +177,12 @@ AVRO_DECL Entity loadEntity(const uint8_t *text, size_t len); | |
|
||
void writeEntity(JsonGenerator<JsonNullFormatter> &g, const Entity &n); | ||
|
||
class AVRO_DECL TooManyValuesException : public virtual std::runtime_error { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see any other exception is defined except the one in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, we catch this type of exception. If we just threw |
||
public: | ||
explicit TooManyValuesException() : | ||
std::runtime_error("invalid JSON document: expecting a single JSON value but found more than one") {} | ||
}; | ||
|
||
} // namespace json | ||
} // namespace avro | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -49,6 +49,28 @@ char JsonParser::next() { | |||||
return ch; | ||||||
} | ||||||
|
||||||
bool JsonParser::hasMore() { | ||||||
if (peeked || hasNext && !isspace(nextChar)) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
return true; | ||||||
} | ||||||
if (!hasNext) { | ||||||
nextChar = ' '; | ||||||
} | ||||||
// We return true if there are any more tokens; we ignore | ||||||
// any trailing whitespace. | ||||||
while (isspace(nextChar)) { | ||||||
if (nextChar == '\n') { | ||||||
line_++; | ||||||
} | ||||||
if (!in_.hasMore()) { | ||||||
return false; | ||||||
} | ||||||
nextChar = in_.read(); | ||||||
} | ||||||
hasNext = true; | ||||||
return true; | ||||||
} | ||||||
|
||||||
void JsonParser::expectToken(Token tk) { | ||||||
if (advance() != tk) { | ||||||
if (tk == Token::Double) { | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -116,6 +116,8 @@ public: | |
return curToken; | ||
} | ||
|
||
bool hasMore(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible not to add |
||
|
||
void expectToken(Token tk); | ||
|
||
bool boolValue() const { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,27 +19,62 @@ | |
#ifndef avro_CustomAttributes_hh__ | ||
#define avro_CustomAttributes_hh__ | ||
|
||
#include "Config.hh" | ||
#include <iostream> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: is it possible to remove this line? It looks weird that a public header exposes iostream. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so. One of the methods of
|
||
#include <map> | ||
#include <optional> | ||
#include <string> | ||
|
||
#include "Config.hh" | ||
|
||
namespace avro { | ||
|
||
// CustomAttributes class stores avro custom attributes. | ||
// Each attribute is represented by a unique name and value. | ||
// User is supposed to create CustomAttributes object and then add it to Schema. | ||
class AVRO_DECL CustomAttributes { | ||
|
||
public: | ||
// Retrieves the custom attribute json entity for that attributeName, returns an | ||
// null if the attribute doesn't exist. | ||
enum class ValueMode : uint8_t { | ||
// When a CustomAttributes is created using this mode, all values are expected | ||
// to be strings. The value should not be quoted, but any interior quotes and | ||
// special characters (such as newlines) must be escaped. | ||
STRING | ||
[[deprecated("The JSON ValueMode is less error-prone and less limited.")]], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps add (in the comment) that it has been deprecated since 1.3.0 and will be removed in XXX version? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have no idea what to put for "XXX" version. Is there a standard support policy for deprecating things and making backwards-breaking changes like this? |
||
|
||
// When a CustomAttributes is created using this mode, all values are formatted | ||
// JSON values. So string values must be quoted and escaped. | ||
JSON | ||
}; | ||
|
||
// Creates a new CustomAttributes object where all values are strings. | ||
// All values passed to addAttribute() and returned from getAttribute() or the | ||
// attributes() map will not be enclosed in quotes. However, any internal quotes | ||
// WILL be escaped and other special characters MAY be escaped. | ||
// | ||
// To support non-string values, one must instead use | ||
// CustomAttributes(CustomAttributes::ValueMode::JSON) | ||
[[deprecated("Use CustomAttributes(ValueMode) instead.")]] | ||
CustomAttributes() : CustomAttributes(ValueMode::STRING) {} | ||
|
||
// Creates a new CustomAttributes object with the given value mode. | ||
CustomAttributes(ValueMode valueMode); | ||
|
||
// Retrieves the custom attribute string for the given name. Returns an empty | ||
// value if the attribute doesn't exist. | ||
// | ||
// See ValueMode for details on the format of the returned value. | ||
std::optional<std::string> getAttribute(const std::string &name) const; | ||
|
||
// Adds a custom attribute. If the attribute already exists, throw an exception. | ||
// Adds a custom attribute. | ||
// | ||
// See ValueMode for details on the require format of the value parameter. | ||
// | ||
// If the attribute already exists or if the given value is not a valid JSON | ||
// document or not a correctly escaped string, throw an exception. | ||
void addAttribute(const std::string &name, const std::string &value); | ||
|
||
// Provides a way to iterate over the custom attributes or check attribute size. | ||
// The values in this map are the same as those returned from getAttribute. | ||
const std::map<std::string, std::string> &attributes() const { | ||
return attributes_; | ||
} | ||
|
@@ -48,6 +83,7 @@ public: | |
void printJson(std::ostream &os, const std::string &name) const; | ||
|
||
private: | ||
ValueMode valueMode_; | ||
std::map<std::string, std::string> attributes_; | ||
}; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we change this? Is it a breaking change to downstream users who are calling
stringValue
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a method on
json::Entity
which I did not touch. This was the source of the exception: if there was a non-string value for a custom attribute, thestringValue
method would throw an exception. But now that we support all kinds of values, we use itstoString
method, which will return properly formatted JSON string to represent whatever is theEntity
's value.