reaction.proto
Schema for the Open Reaction Database.
Compound
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| identifiers | CompoundIdentifier | repeated | Set of identifiers used to uniquely define this compound. Solutions or mixed compounds should use the NAME identifier and list all constituent compounds in the "components" field. |
| mass | Mass | | |
| moles | Moles | | |
| volume | Volume | | |
| reaction_role | Compound.ReactionRole.ReactionRoleType | | |
| is_limiting | bool | | Whether this species was intended to be a limiting reactant. |
| preparation | CompoundPreparation | | |
| vendor_source | string | | Name of the vendor or supplier the compound was purchased from. |
| vendor_id | string | | Compound ID in the vendor database or catalog. |
| vendor_lot | string | | Batch/lot identification. |
| features | Compound.Feature | repeated | |
Compound.Feature
Compounds can accommodate any number of features. These may include simple
properties of the compound (e.g., molecular weight), heuristic estimates
of physical properties (e.g., ClogP), optimized geometries (e.g., through
DFT), and calculated stereoelectronic descriptors.
CompoundIdentifier
Compound identifiers uniquely define a single (pure) chemical species.
While we encourage the use of SMILES strings, these do not work well in
all cases (e.g., handling tautomerism, axial chirality). Multiple
identifiers may be specified for a single compound to avoid ambiguity.
We discourage chemicals from being defined only by a name. For compounds
that are prepared or isolated as salts, the identifier should include
specification of which salt.
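The guidance above can be illustrated with a minimal sketch in plain Python. It does not use the generated protobuf classes: the dataclasses below are stand-ins, the `type` and `value` field names are assumptions (the CompoundIdentifier fields are not listed in this document), and `is_name_only` is a hypothetical helper. Only the enum numbers are taken from the CompoundIdentifier.IdentifierType table.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class IdentifierType(IntEnum):
    # Subset of CompoundIdentifier.IdentifierType, numbered as in the schema.
    UNSPECIFIED = 0
    CUSTOM = 1
    SMILES = 2
    INCHI = 3
    NAME = 6


@dataclass
class CompoundIdentifier:
    # Field names are assumed for illustration.
    type: IdentifierType
    value: str


@dataclass
class Compound:
    identifiers: list = field(default_factory=list)


def is_name_only(compound):
    """Return True when every identifier on the compound is a plain NAME."""
    return bool(compound.identifiers) and all(
        identifier.type == IdentifierType.NAME
        for identifier in compound.identifiers
    )


# A salt given both a name and a SMILES string that includes the counterion.
salt = Compound([
    CompoundIdentifier(IdentifierType.NAME, "triethylamine hydrochloride"),
    CompoundIdentifier(IdentifierType.SMILES, "CCN(CC)CC.Cl"),
])

# A compound defined only by name, which the schema discourages.
named_only = Compound([CompoundIdentifier(IdentifierType.NAME, "brine")])
```

A check like `is_name_only` could be used to flag name-only compounds for review rather than to reject them, since the text discourages but does not forbid them.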
CompoundPreparation
Compounds may undergo additional preparation before being used in a
reaction after being received from a supplier or vendor. We encourage
the use of the ‘preparation’ enum when possible, even if the description
is an oversimplification of the full procedure, which can be described
in the ‘details’ field.
Data
Data is a container for arbitrary string or bytes data.
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | string | | |
| bytes_value | bytes | | |
| url | string | | URL for data stored elsewhere. |
| description | string | | |
| format | string | | Description of the file format (if applicable); usually the file extension. For example, 'png' or 'tiff' for images. If empty, we assume string data. |
DateTime
TODO(ccoley): If we want the DateTime to be a string that we parse as
needed, should it simply be “string datetime” when used? Or is there any
benefit to having a separate message type that could be changed in the
future if needed?
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | string | | |
ElectrochemistryConditions
ElectrochemistryConditions.Measurement
Length
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
| units | Length.LengthUnit | | |
Mass
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
| units | Mass.MassUnit | | |
Moles
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
| units | Moles.MolesUnit | | |
Percentage
Used for things like conversion and yield.
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
PressureConditions.Atmosphere
PressureConditions.Measurement
PressureConditions.PressureControl
Reaction
Throughout this schema, we introduce enums to encourage consistency in
nomenclature and to avoid unnecessary downstream data processing that would
otherwise be required to consolidate equivalent entries. However, we do
not wish to restrict what users are able to specify if their synthesis
does not fit cleanly into a pre-existing enum field. For that reason, many
enums contain a CUSTOM field, which must be accompanied by setting the
‘details’ field (or ‘<field_name>_details’, where appropriate).
NOTE(kearnes): In many places, we deliberately violate the style guide for
enums by nesting instead of prefixing; this is not done lightly. The primary
consideration is API consistency and the ability to use unqualified strings
as enum values. For instance, we want ‘CUSTOM’ to be a valid value for all
enums that support custom types.
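The CUSTOM rule described above lends itself to a simple validation check. The following is a minimal sketch in plain Python, not the schema's own validation code: `check_custom_has_details` is a hypothetical helper, and the enum is a stand-in that mirrors the numbering of CompoundPreparation.PreparationType.

```python
from enum import IntEnum


class PreparationType(IntEnum):
    # Mirrors CompoundPreparation.PreparationType.
    UNSPECIFIED = 0
    CUSTOM = 1
    NONE = 2
    REPURIFIED = 3
    SPARGED = 4
    DRIED = 5
    SYNTHESIZED = 6


def check_custom_has_details(enum_value, details):
    """Return an error message if CUSTOM is set without details, else None."""
    if enum_value.name == "CUSTOM" and not details.strip():
        return "CUSTOM requires a non-empty details field"
    return None


# CUSTOM with an explanation passes; CUSTOM alone does not.
ok = check_custom_has_details(PreparationType.CUSTOM, "freeze-pump-thaw degassed")
bad = check_custom_has_details(PreparationType.CUSTOM, "")
```

Because the check compares the member name rather than its number, the same helper would work for any enum that defines a CUSTOM member, which is the consistency the nested-enum design is meant to provide.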
ReactionAnalysis.ProcessedDataEntry
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | Data | | |
ReactionAnalysis.RawDataEntry
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | Data | | |
ReactionIdentifier
Reaction identifiers define descriptions of the overall reaction.
While we encourage the use of SMILES strings, these do not work well in
all cases. The <reaction_smiles> field should be able to be derived
from the information present in the ReactionInput and ReactionOutcome
fields of any Reaction message.
ReactionNotes
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| is_heterogeneous | bool | | Equivalent to "not single phase". |
| is_exothermic | bool | | Qualitative exothermicity (primarily for safety). |
| is_offgasses | bool | | Qualitative offgassing (primarily for safety). |
| is_sensitive_to_moisture | bool | | |
| is_sensitive_to_oxygen | bool | | |
| is_sensitive_to_light | bool | | |
| safety_notes | string | | |
| procedure_details | string | | Overflow field for full procedure details. |
ReactionObservation
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| time | Time | | |
| comment | string | | e.g., what color is the reaction? |
| image | Data | | |
ReactionOutcome
The outcomes of a reaction describe the conversion, yield, and/or other
analyses of the resulting product mixture after workup step(s). Each
outcome is associated with a reaction/residence time. To allow for
one Reaction message to contain the results of a full kinetic profiling
experiment, this is a repeated field of the Reaction message.
It is the parent message for product characterization and any analytical
data.
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| reaction_time | Time | | Reaction time (for flow, equivalent to residence time or spacetime). |
| conversion | Percentage | | Conversion with respect to the limiting reactant. |
| products | ReactionProduct | repeated | |
| analyses | ReactionOutcome.AnalysesEntry | repeated | Analyses are stored in a map to associate each with a unique key. The key is cross-referenced in ReactionProduct messages to indicate which analyses were used to derive which performance values/metrics. The string used for the key carries no meaning outside of this cross-referencing. |
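The cross-referencing between the analyses map and the ReactionProduct `analysis_*` fields can be checked mechanically. Below is a minimal sketch that uses plain dictionaries in place of protobuf messages; `missing_analysis_keys` is a hypothetical helper, and the `type` entry inside each analysis is an assumption made only for illustration.

```python
# ReactionProduct fields that hold keys into the analyses map.
REFERENCE_FIELDS = (
    "analysis_identity",
    "analysis_yield",
    "analysis_purity",
    "analysis_selectivity",
)


def missing_analysis_keys(analyses, products):
    """Return sorted keys that products reference but the map does not define."""
    missing = set()
    for product in products:
        for field_name in REFERENCE_FIELDS:
            for key in product.get(field_name, []):
                if key not in analyses:
                    missing.add(key)
    return sorted(missing)


# The key strings are arbitrary; they only need to match between the two places.
analyses = {"nmr_1": {"type": "NMR"}, "lcms_1": {"type": "LCMS"}}
products = [{
    "analysis_identity": ["nmr_1"],
    "analysis_yield": ["lcms_1"],
    "analysis_purity": ["hplc_2"],  # dangling reference
}]

dangling = missing_analysis_keys(analyses, products)
```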
ReactionOutcome.AnalysesEntry
ReactionProduct
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| compound | Compound | | |
| is_desired_product | bool | | |
| compound_yield | Percentage | | |
| purity | Percentage | | |
| selectivity | Selectivity | | |
| analysis_identity | string | repeated | Key(s) of the analysis used to confirm identity. |
| analysis_yield | string | repeated | Key(s) of the analysis used to assess yield. |
| analysis_purity | string | repeated | Key(s) of the analysis used to assess purity. |
| analysis_selectivity | string | repeated | Key(s) of the analysis used to assess selectivity. |
| isolated_color | string | | TODO(ccoley): How to allow specification of the state of matter of the purified compound? For example, "___ was recovered as a white powder in x% yield (y.z mg)". Or oils, crystal texture, etc. This is only relevant for compounds that are isolated. TODO(kearnes): Should this be an Observation message? |
| texture | ReactionProduct.Texture.TextureType | | |
| texture_details | string | | |
ReactionProvenance.RecordEvent
Metadata for the public database.
ReactionSetup
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| vessel | Vessel | | |
| is_automated | bool | | Specification of automated protocols. |
| automation_platform | string | | Automated platform name, brand, or model number. |
| automation_code | ReactionSetup.AutomationCodeEntry | repeated | Raw automation code or synthetic recipe definition. |
ReactionSetup.AutomationCodeEntry
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | Data | | |
StirringConditions.StirringMethod
StirringConditions.StirringRate
TemperatureConditions.Measurement
TemperatureConditions.TemperatureControl
Time
To allow users to describe synthetic processes in whatever units they find
most natural, we define a fixed list of allowable units for each measurement
type. Upon submission to a centralized database, or using a validation and
canonicalization script, we will convert all values to the default units
(the first nonzero item in each enum).
Each message also contains a precision field, which specifies the precision
of the measurement in the same units as the measurement itself. Often the
precision will be the standard deviation from an instrument calibration.
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
| units | Time.TimeUnit | | |
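The conversion to default units described for Time could look roughly like the sketch below. It is written in plain Python and is not the project's canonicalization script: `to_default_units` is a hypothetical helper, and the enum is a stand-in that mirrors Time.TimeUnit. HOUR is treated as the default because it is the first nonzero item in that enum.

```python
from enum import IntEnum


class TimeUnit(IntEnum):
    # Mirrors Time.TimeUnit; HOUR is the first nonzero item.
    UNSPECIFIED = 0
    HOUR = 1
    MINUTE = 2
    SECOND = 3


# How many of each unit make up one hour.
_UNITS_PER_HOUR = {
    TimeUnit.HOUR: 1.0,
    TimeUnit.MINUTE: 60.0,
    TimeUnit.SECOND: 3600.0,
}


def to_default_units(value, precision, units):
    """Convert a value and its precision to hours."""
    if units == TimeUnit.UNSPECIFIED:
        raise ValueError("units must be specified before conversion")
    per_hour = _UNITS_PER_HOUR[units]
    # The precision shares the units of the value, so it scales the same way.
    return value / per_hour, precision / per_hour, TimeUnit.HOUR


# 90 +/- 3 minutes expressed in the default unit.
value, precision, units = to_default_units(90.0, 3.0, TimeUnit.MINUTE)
```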
Volume
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | float | | |
| precision | float | | Precision of the measurement (with the same units as value). |
| units | Volume.VolumeUnit | | |
Compound.ReactionRole.ReactionRoleType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| REACTANT | 1 | TODO(ccoley): Do we want to use the definition of a reactant aligned with Reaxys, or say that any species that contributes heavy atoms to a desired product is a reactant? This field might be kind of a throwaway anyway... |
| REAGENT | 2 | |
| SOLVENT | 3 | |
| CATALYST | 4 | |
| WORKUP | 5 | |
CompoundIdentifier.IdentifierType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| SMILES | 2 | Simplified molecular-input line-entry system. |
| INCHI | 3 | IUPAC International Chemical Identifier. |
| MOLBLOCK | 4 | Molblock from an MDL Molfile V3000. |
| IUPAC_NAME | 5 | Chemical name following IUPAC nomenclature recommendations. |
| NAME | 6 | Any accepted common name, trade name, etc. |
| CAS_NUMBER | 7 | Chemical Abstracts Service Registry Number (with hyphens). |
| PUBCHEM_CID | 8 | PubChem Compound ID number. |
| CHEMSPIDER_ID | 9 | ChemSpider ID number. |
| CXSMILES | 10 | ChemAxon extended SMILES. |
| INCHI_KEY | 11 | IUPAC International Chemical Identifier key. |
| XYZ | 12 | XYZ molecule file. |
| UNIPROT_ID | 13 | UniProt ID (for enzymes). |
| PDB_ID | 14 | Protein Data Bank ID (for enzymes). |
| RDKIT_BINARY | 15 | RDKit binary format (for fast loading). |
CompoundPreparation.PreparationType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| NONE | 2 | Compound used as received. |
| REPURIFIED | 3 | Compound repurified (e.g., recrystallized). |
| SPARGED | 4 | Compound sparged, most likely to be the case with solvents. |
| DRIED | 5 | Moisture removed, e.g., using molecular sieves. |
| SYNTHESIZED | 6 | Compound synthesized in-house. |
Concentration.ConcentrationUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| MOLAR | 1 | |
| MILLIMOLAR | 2 | |
| MICROMOLAR | 3 | |
Current.CurrentUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| AMPERE | 1 | |
| MILLIAMPERE | 2 | |
ElectrochemistryConditions.ElectrochemistryType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| CONSTANT_CURRENT | 2 | |
| CONSTANT_VOLTAGE | 3 | |
FlowConditions.FlowType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| PLUG_FLOW_REACTOR | 2 | |
| CONTINUOUS_STIRRED_TANK_REACTOR | 3 | |
| PACKED_BED_REACTOR | 4 | |
FlowConditions.Tubing.TubingMaterialType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| STEEL | 2 | |
| COPPER | 3 | |
| PFA | 4 | |
| FEP | 5 | |
| TEFLONAF | 6 | |
| PTFE | 7 | |
| GLASS | 8 | |
| QUARTZ | 9 | |
| SILICON | 10 | e.g., a chip-based microreactor. |
| PDMS | 11 | |
FlowRate.FlowRateUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| MICROLITER_PER_MINUTE | 1 | |
| MICROLITER_PER_SECOND | 2 | |
| MILLILITER_PER_MINUTE | 3 | |
| MILLILITER_PER_SECOND | 4 | |
| MICROLITER_PER_HOUR | 5 | |
IlluminationConditions.IlluminationType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| AMBIENT | 2 | |
| DARK | 3 | |
| LED | 4 | |
| HALOGEN_LAMP | 5 | |
| DEUTERIUM_LAMP | 6 | |
| SOLAR_SIMULATOR | 7 | |
| BROAD_SPECTRUM | 8 | |
Length.LengthUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CENTIMETER | 1 | |
| MILLIMETER | 2 | |
| METER | 3 | |
| INCH | 4 | |
| FOOT | 5 | |
Mass.MassUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| GRAM | 1 | |
| MILLIGRAM | 2 | |
| MICROGRAM | 3 | |
| KILOGRAM | 4 | |
Moles.MolesUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| MOLES | 1 | |
| MILLIMOLES | 2 | |
| MICROMOLES | 3 | |
| NANOMOLES | 4 | |
Pressure.PressureUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| BAR | 1 | |
| ATMOSPHERE | 2 | |
| PSI | 3 | |
| KPSI | 4 | |
| PASCAL | 5 | |
| KILOPASCAL | 6 | |
PressureConditions.Atmosphere.AtmosphereType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| AIR | 2 | |
| NITROGEN | 3 | |
| ARGON | 4 | |
| OXYGEN | 5 | |
| HYDROGEN | 6 | |
PressureConditions.Measurement.MeasurementType
TODO(ccoley) get input on how to expand this enum, among others
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| PRESSURE_TRANSDUCER | 2 | |
PressureConditions.PressureControl.PressureControlType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| AMBIENT | 2 | |
| BALLOON | 3 | |
| SEALED | 4 | Fully sealed vessel (e.g., microwave vial). |
| SEPTUM_WITH_NEEDLE | 5 | Slight positive pressure maintained. |
| RELEASEVALVE | 6 | |
| BPR | 7 | Back pressure regulator, as used in flow synthesis. |
ReactionAnalysis.AnalysisType
TODO(ccoley): Solicit more feedback from experimentalists
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| LC | 2 | Liquid chromatography. |
| GC | 3 | Gas chromatography. |
| IR | 4 | Infrared spectroscopy. |
| NMR | 5 | NMR spectroscopy. |
| MP | 6 | Melting point characterization. |
| UV | 7 | Ultraviolet spectroscopy. |
| TLC | 8 | Thin-layer chromatography. |
| MS | 9 | Mass spectrometry. |
| HRMS | 10 | High-resolution mass spectrometry. |
| MSMS | 11 | Tandem mass spectrometry (MS/MS). |
| WEIGHT | 12 | Weight of an isolated compound. |
| LCMS | 13 | Combined LC/MS. |
| GCMS | 14 | Combined GC/MS. |
| ELSD | 15 | Evaporative light scattering detector. |
| CD | 16 | Circular dichroism. |
| SFC | 17 | Supercritical fluid chromatography. |
ReactionIdentifier.IdentifierType
Possible identifier types are listed in an enum for extensibility.
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| REACTION_SMILES | 2 | |
| ATOM_MAPPED_SMILES | 3 | |
| RINCHI | 4 | Reaction InChI. |
| NAME | 5 | Named reaction or reaction category. |
| RDKIT_BINARY | 6 | RDKit binary format (for fast loading). |
ReactionProduct.Texture.TextureType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| POWDER | 2 | |
| CRYSTAL | 3 | |
| OIL | 4 | |
ReactionWorkup.WorkupType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| ADDITION | 2 | Addition (quench, dilution, extraction solvent, etc.). Specify composition/amount in "components". |
| TEMPERATURE | 3 | Change of temperature. Specify conditions in "temperature". |
| CONCENTRATION | 4 | Concentration step, often using a rotovap. |
| EXTRACTION | 5 | Liquid extractions are often preceded by Additions. If there are multiple distinct additions prior to an extraction, it is assumed that the kept phases are pooled. Specify which phase to keep in "keep_phase". |
| FILTRATION | 6 | Filtration (can keep solid or filtrate). Specify which phase to keep in "keep_phase". |
| WASH | 7 | Washing a solid or liquid, keeping the original phase. Specify "components" of rinse. Rinses performed in multiple stages should be given multiple workup steps. |
| DRY_IN_VACUUM | 8 | Dried under vacuum. |
| DRY_WITH_MATERIAL | 9 | Dried with chemical additive. Specify chemical additive in "components". |
| FLASH_CHROMATOGRAPHY | 10 | Purification by flash chromatography. |
| OTHER_CHROMATOGRAPHY | 11 | Purification by other prep chromatography. |
| SCAVENGING | 12 | Scavenging step (e.g., pass through alumina pad). Specify any material additives in "components". |
| WAIT | 13 | Waiting step. Specify "duration". |
| STIRRING | 14 | Mixing step. Specify "stirring". |
| CRYSTALLIZATION | 15 | |
| PH_ADJUST | 16 | pH adjustments should specify "components" to define the species used, as well as "ph" for the target pH. |
| DISSOLUTION | 17 | Redissolution is considered a special form of addition. Specify "components". |
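Several workup types above name a field that should accompany them. As a minimal sketch, that pairing can be written as a lookup table and checked per step. The steps here are plain dictionaries rather than ReactionWorkup messages, `missing_fields` is a hypothetical helper, and treating each "Specify ..." note as a hard requirement is an assumption. SCAVENGING is left out because its additives are described as optional.

```python
# Fields each workup type is asked to specify, taken from the descriptions above.
REQUIRED_FIELDS = {
    "ADDITION": ("components",),
    "TEMPERATURE": ("temperature",),
    "EXTRACTION": ("keep_phase",),
    "FILTRATION": ("keep_phase",),
    "WASH": ("components",),
    "DRY_WITH_MATERIAL": ("components",),
    "WAIT": ("duration",),
    "STIRRING": ("stirring",),
    "PH_ADJUST": ("components", "ph"),
    "DISSOLUTION": ("components",),
}


def missing_fields(step):
    """List the expected fields that are absent or empty for one workup step."""
    required = REQUIRED_FIELDS.get(step["type"], ())
    return [name for name in required if not step.get(name)]


# An aqueous quench, an extraction that forgot keep_phase, then concentration.
workup = [
    {"type": "ADDITION", "components": ["water"]},
    {"type": "EXTRACTION"},
    {"type": "CONCENTRATION"},
]

problems = {}
for index, step in enumerate(workup):
    missing = missing_fields(step)
    if missing:
        problems[index] = missing
```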
Selectivity.SelectivityType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| EE | 2 | Enantiomeric excess as a percentage. |
| ER | 3 | Enantiomeric ratio (x:1). |
| DE | 4 | Diastereomeric ratio (x:1). |
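Because EE is recorded as a percentage and ER as a ratio x:1, it may help to see how the two relate. The sketch below uses the standard relationship ee = 100 (x - 1) / (x + 1); the function names are hypothetical and not part of the schema.

```python
def er_to_ee(er):
    """Convert an enantiomeric ratio x (meaning x:1) to ee in percent."""
    if er <= 0:
        raise ValueError("enantiomeric ratio must be positive")
    return 100.0 * (er - 1.0) / (er + 1.0)


def ee_to_er(ee):
    """Convert ee in percent to an enantiomeric ratio x (meaning x:1)."""
    if not -100.0 < ee < 100.0:
        raise ValueError("ee must lie strictly between -100 and 100")
    return (100.0 + ee) / (100.0 - ee)


# A 95:5 mixture is a ratio of 19:1.
ee_percent = er_to_ee(19.0)
ratio = ee_to_er(ee_percent)
```

An enantiopure sample has no finite x:1 ratio, which is why `ee_to_er` rejects exactly 100%.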
StirringConditions.StirringMethod.StirringMethodType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| NONE | 2 | |
| STIR_BAR | 3 | |
| OVERHEAD_MIXER | 4 | |
| AGITATION | 5 | |
StirringConditions.StirringRate.StirringRateType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| HIGH | 1 | |
| MEDIUM | 2 | |
| LOW | 3 | |
Temperature.TemperatureUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CELSIUS | 1 | |
| FAHRENHEIT | 2 | |
| KELVIN | 3 | |
TemperatureConditions.Measurement.MeasurementType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| THERMOCOUPLE_INTERNAL | 2 | Physically in reaction solution. |
| THERMOCOUPLE_EXTERNAL | 3 | On outside of vessel or, e.g., in oil bath. |
| INFRARED | 4 | Contactless infrared probe. |
TemperatureConditions.TemperatureControl.TemperatureControlType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| AMBIENT | 2 | |
| OIL_BATH | 3 | |
| WATER_BATH | 4 | |
| SAND_BATH | 5 | |
| ICE_BATH | 6 | |
| DRY_ALUMINUM_PLATE | 7 | |
| MICROWAVE | 8 | |
| DRY_ICE_BATH | 9 | |
| AIR_FAN | 10 | |
| LIQUID_NITROGEN | 11 | |
Time.TimeUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| HOUR | 1 | |
| MINUTE | 2 | |
| SECOND | 3 | |
Vessel.VesselMaterial.VesselMaterialType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| GLASS | 2 | |
| POLYPROPYLENE | 3 | |
| PLASTIC | 4 | |
Vessel.VesselPreparation.VesselPreparationType
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| NONE | 2 | |
| OVEN_DRIED | 3 | |
Vessel.VesselType.VesselTypeEnum
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| CUSTOM | 1 | |
| ROUND_BOTTOM_FLASK | 2 | |
| VIAL | 3 | |
| WELL_PLATE | 4 | |
| MICROWAVE_VIAL | 5 | |
| TUBE | 6 | |
| CONTINUOUS_STIRRED_TANK_REACTOR | 7 | |
| PACKED_BED_REACTOR | 8 | |
Voltage.VoltageUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| VOLT | 1 | |
| MILLIVOLT | 2 | |
Volume.VolumeUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| MILLILITER | 1 | |
| MICROLITER | 2 | |
| LITER | 3 | |
Wavelength.WavelengthUnit
| Name | Number | Description |
| --- | --- | --- |
| UNSPECIFIED | 0 | |
| NANOMETER | 1 | |
| WAVENUMBER | 2 | cm^{-1} |
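NANOMETER and WAVENUMBER are reciprocal rather than proportional, so converting between them is not a simple scale factor. The sketch below shows one way a value might be brought to nanometers, using the relationship wavelength (nm) = 10^7 / wavenumber (cm^{-1}). `to_nanometers` is a hypothetical helper, and the enum is a plain-Python stand-in for Wavelength.WavelengthUnit.

```python
from enum import IntEnum


class WavelengthUnit(IntEnum):
    # Mirrors Wavelength.WavelengthUnit.
    UNSPECIFIED = 0
    NANOMETER = 1
    WAVENUMBER = 2


def to_nanometers(value, units):
    """Express a wavelength or wavenumber value in nanometers."""
    if units == WavelengthUnit.NANOMETER:
        return value
    if units == WavelengthUnit.WAVENUMBER:
        if value <= 0:
            raise ValueError("wavenumber must be positive")
        # 1 cm = 1e7 nm, so a wavenumber in cm^-1 inverts to 1e7 / value nm.
        return 1.0e7 / value
    raise ValueError("units must be specified before conversion")


# 20000 cm^-1 corresponds to green light.
green_nm = to_nanometers(20000.0, WavelengthUnit.WAVENUMBER)
```

Because the relationship is reciprocal, a precision given in cm^{-1} cannot be converted by the same division; it would need separate error propagation.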