Fused tricyclic organic compounds are an important class of organic electronic materials. In designing molecules for organic electronics, knowing which chemical structures can be used to tune molecular properties is one of the keys to improving material performance. In this research, we applied machine learning and data analytics approaches to address this problem. The energy states (Highest Occupied Molecular Orbital (HOMO), Lowest Unoccupied Molecular Orbital (LUMO), singlet (ES) and triplet (ET) energy) of more than 10,000 fused tricyclics are calculated. Corresponding descriptors are also generated. We find that the Coulomb matrix is a poorer descriptor than high-level descriptors in a multilayer perceptron neural network. A correlation as high as 0.95 is obtained using a multilayer perceptron neural network, with a mean absolute error as low as 0.08 eV. The descriptors that are important in tuning the energy levels are revealed using the Random Forest algorithm. Correlations of such descriptors are also plotted. We found that the greater the number of tertiary amines, the deeper the HOMO and LUMO levels. The presence of NN in the aromatic rings can be used to tune the ES. However, there is no single dominant descriptor that can be correlated with the ET. Instead, a collection of descriptors is found to give a far better correlation with ET. This research demonstrates that machine learning and data analytics can guide the understanding of how certain chemical substructures correlate with molecular energy states.
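For readers unfamiliar with the Coulomb matrix descriptor mentioned above, the following is a minimal sketch of its commonly used definition: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The abstract does not state which variant (for example sorted or padded) the authors used, so this is an illustration only. The function name `coulomb_matrix`, the assumption that coordinates are given in bohr, and the two-atom example geometry are all hypothetical and not taken from the study.

```python
import math


def coulomb_matrix(charges, coords):
    """Build an unsorted Coulomb matrix (hypothetical helper, not the authors' code).

    charges: nuclear charges Z_i, one per atom.
    coords:  (x, y, z) tuples, assumed here to be in bohr.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i^2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: pairwise nuclear Coulomb repulsion
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Illustrative two-atom geometry (made up for this example, not from the dataset).
example_charges = [1, 1]
example_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

for row in coulomb_matrix(example_charges, example_coords):
    print(row)
```

The matrix depends only on nuclear charges and interatomic distances, so it carries no explicit substructure information such as counts of tertiary amines. This may be one reason a higher-level descriptor set can perform better in a regression model, although the abstract itself does not give the cause.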