“Pearson’s r, a correlation coefficient ranging from -1 to 1 that measures the correlation between a confidence score and whether or not the field (or record) is correctly labeled.”
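A minimal sketch of this metric, assuming correctness is encoded as 1 (correct) or 0 (incorrect) so that it can be correlated with the confidence score. The function name `pearson_r` and the sample data are hypothetical, not taken from the quoted paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one confidence score per field, 1 = correctly labeled.
confidences = [0.9, 0.8, 0.6, 0.3]
is_correct = [1, 1, 0, 0]
r = pearson_r(confidences, is_correct)
print(round(r, 3))
```

A value near 1 would mean high confidence tends to coincide with correct labels; note the coefficient is undefined if every field is correct (or every field is wrong).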
“average precision, used in the Information Retrieval community… the precision at each point in the ranked list where a relevant document is found and then averages these values. Instead of ranking documents by their relevance score, here we rank fields (and records) by their confidence score, where a correctly labeled field is analogous to a relevant document”
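The ranking procedure in the quote can be sketched as follows. This is an illustration under assumptions: `average_precision` is a hypothetical helper, ties in confidence keep their input order, and returning 0.0 when nothing is correct is a convention chosen here, not something the quote specifies.

```python
def average_precision(confidences, is_correct):
    """Average of precision values taken at each correctly labeled item
    when items are ranked by descending confidence."""
    ranked = sorted(zip(confidences, is_correct), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            # Precision over the top `rank` items of the ranked list.
            precisions.append(hits / rank)
    if not precisions:
        return 0.0  # assumed convention when no item is correct
    return sum(precisions) / len(precisions)


# Hypothetical data: correct items sit at ranks 1 and 3.
ap = average_precision([0.9, 0.8, 0.6, 0.3], [1, 0, 1, 0])
print(round(ap, 4))  # (1/1 + 2/3) / 2
```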
“accuracy-coverage graph. Better confidence estimates push the curve to the upper-right” Comparable to a precision-recall curve. See fig. 1.
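A rough sketch of how the points of such a graph could be computed. The definitions are assumptions, since the quote does not spell them out: coverage is taken to be the fraction of items kept when keeping only the most confident ones, and accuracy the fraction of those kept items that are correct. `accuracy_coverage_curve` is a hypothetical name.

```python
def accuracy_coverage_curve(confidences, is_correct):
    """Return (coverage, accuracy) points, one per cutoff in the
    confidence-ranked list (most confident items kept first)."""
    ranked = sorted(zip(confidences, is_correct), key=lambda p: p[0], reverse=True)
    total = len(ranked)
    points = []
    hits = 0
    for kept, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
        # Coverage: share of all items kept; accuracy: share of kept items correct.
        points.append((kept / total, hits / kept))
    return points


# Hypothetical data: the third most confident item is wrong.
curve = accuracy_coverage_curve([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 1])
for coverage, accuracy in curve:
    print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

With good confidence estimates the errors fall at the low-confidence end, so accuracy stays high until coverage is nearly complete, which is what pushes the curve to the upper-right.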
2018 - Confidence Modeling for Neural Semantic Parsing: measures “the relationship between confidence scores and F1 using Spearman’s ρ correlation coefficient which varies between −1 and 1 (0 implies there is no correlation).”
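For reference, Spearman's ρ can be sketched as Pearson's correlation applied to ranks, with tied values sharing the mean of their rank positions. The helper names and the confidence/F1 values below are hypothetical; this is not the paper's evaluation code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: confidence scores and the F1 observed for each.
confidence = [0.95, 0.7, 0.4, 0.2]
f1 = [0.9, 0.8, 0.5, 0.6]
rho = spearman_rho(confidence, f1)
print(round(rho, 3))
```

Because only ranks matter, ρ rewards any monotone relationship between confidence and F1, not just a linear one.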