Frequently Asked Questions

From CompSemWiki

What sections are used for training, development, and testing on the Penn Treebank?

This information comes courtesy of Nianwen (Bert) Xue:

The standard dev set is Section 01 and the standard test set is Section 23. Most people don't use 24 and 25. Mitch's explanation, which I think is a plausible one, is that Section 23 is more "mature" annotation, since it was done after the annotators had been well trained, whereas Section 00 was done when the annotators had just started learning to annotate.

  • Training: sections 02-22
  • Testing: section 23
  • Development: section 01
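
As a rough illustration, the split listed above could be encoded as follows. This is only a sketch: the function names are hypothetical, and the directory layout `<root>/<two-digit section>/<file>` (for example `wsj/02/wsj_0201.mrg`) is an assumption about how your copy of the corpus is organized.

```python
from pathlib import PurePosixPath

# Section-to-split mapping taken from the list above.
TRAIN_SECTIONS = range(2, 23)  # sections 02-22 (upper bound is exclusive)
DEV_SECTIONS = (1,)            # section 01
TEST_SECTIONS = (23,)          # section 23


def split_for_section(section: int) -> str:
    """Return 'train', 'dev', 'test', or 'unused' for a section number."""
    if section in TRAIN_SECTIONS:
        return "train"
    if section in DEV_SECTIONS:
        return "dev"
    if section in TEST_SECTIONS:
        return "test"
    return "unused"  # any section not assigned above, e.g. 00 or 24


def split_for_path(path: str) -> str:
    """Classify a file by its parent directory name.

    Assumes (hypothetically) that the parent directory is the
    two-digit section number, e.g. 'wsj/02/wsj_0201.mrg'.
    """
    section = int(PurePosixPath(path).parent.name)
    return split_for_section(section)
```

For example, `split_for_path("wsj/23/wsj_2301.mrg")` falls in the test split under these assumptions, while anything in section 00 or 24 is reported as unused.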