OntoHierarchy
Directory Hierarchy
- Root directory
  - Original copy: CORPORA = https://verbs.colorado.edu/svn/ontonotes/corpora/
  - Local copy: CORPORA = /home/verbs/shared/cleardata/
  - LANGUAGE = english | chinese | arabic
- Treebank directory
  - $CORPORA/$LANGUAGE/annotations/parse/
- Propbank directory
  - $CORPORA/$LANGUAGE/annotations/prop/
- Sense directory
  - $CORPORA/$LANGUAGE/annotations/sense/
- Raw directory 1 (all tokens including traces)
  - $CORPORA/$LANGUAGE/annotations/tokens/
- Raw directory 2 (all tokens excluding traces)
  - $CORPORA/$LANGUAGE/annotations/words/
- Frameset directory
  - $CORPORA/$LANGUAGE/metadata/frames/
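The layout above can be expressed as a small path helper. This is only a sketch: the layer keys (`treebank`, `propbank`, and so on) and the function name `layer_dir` are hypothetical labels chosen for illustration, while the root and the relative paths are copied from the listing.

```python
from pathlib import PurePosixPath

# Local copy root, as given in the listing above.
CORPORA = PurePosixPath("/home/verbs/shared/cleardata")
LANGUAGES = ("english", "chinese", "arabic")

# Keys are hypothetical labels; the relative paths come from the listing.
LAYER_SUBDIRS = {
    "treebank": "annotations/parse",
    "propbank": "annotations/prop",
    "sense": "annotations/sense",
    "tokens": "annotations/tokens",  # raw text, traces included
    "words": "annotations/words",    # raw text, traces excluded
    "frames": "metadata/frames",
}


def layer_dir(language, layer, corpora=CORPORA):
    """Return the directory holding one annotation layer for one language."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if layer not in LAYER_SUBDIRS:
        raise ValueError(f"unknown layer: {layer!r}")
    return corpora / language / LAYER_SUBDIRS[layer]


print(layer_dir("english", "propbank"))
# /home/verbs/shared/cleardata/english/annotations/prop
```

Passing a different `corpora` value (for example a path to a checkout of the original copy) reuses the same relative layout.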
English Treebank Hierarchy
- $CORPORA/english/parse/$GENRE/$SOURCE/$SECTION/
  - GENRE = bc | bn | mz | nw | wb
  - SOURCE = source of the corpus (e.g. wsj, sinorama)
  - SECTION = ##
- $CORPORA/english/parse/
  - bc/: broadcast conversations
    - cctv/
    - ccn/
    - ebc/
    - msnbc/
    - phoenix/
    - p2.5_a2e/
    - p2.5_c2e/
  - bn/: broadcast news
    - abc/
    - cnn/
    - mnb/
    - nbc/
    - pri/
    - voa/
    - p2.5_a2e/
    - p2.5_c2e/
  - mz/: magazines
    - sinorama/
  - nw/: newswire
    - xinhua/
    - wsj/
    - p2.5_a2e/
    - p2.5_c2e/
  - wb/: web text
    - ng_a2e/
    - ng_c2e/
    - ng_eng/
    - ng_p2.5_a2e/
    - ng_p2.5_c2e/
    - wl_c2e/
    - wl_eng/
    - wl_p2.5_a2e/
    - wl_p2.5_c2e/
- bc/: broadcast conversations
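As an illustration of the $GENRE/$SOURCE/$SECTION pattern, the sketch below checks a genre and source against the listing and builds the section directory. Several things here are assumptions rather than facts from this page: the names `ENGLISH_SOURCES` and `section_dir` are hypothetical, `##` is read as a zero-padded two-digit section number, and the treebank root is passed in by the caller instead of being hard-coded.

```python
from pathlib import PurePosixPath

# Genre -> source directories, copied from the listing above.
ENGLISH_SOURCES = {
    "bc": ["cctv", "ccn", "ebc", "msnbc", "phoenix", "p2.5_a2e", "p2.5_c2e"],
    "bn": ["abc", "cnn", "mnb", "nbc", "pri", "voa", "p2.5_a2e", "p2.5_c2e"],
    "mz": ["sinorama"],
    "nw": ["xinhua", "wsj", "p2.5_a2e", "p2.5_c2e"],
    "wb": ["ng_a2e", "ng_c2e", "ng_eng", "ng_p2.5_a2e", "ng_p2.5_c2e",
           "wl_c2e", "wl_eng", "wl_p2.5_a2e", "wl_p2.5_c2e"],
}


def section_dir(parse_root, genre, source, section):
    """Build <parse_root>/<genre>/<source>/<section> after validating the parts."""
    if genre not in ENGLISH_SOURCES:
        raise ValueError(f"unknown genre: {genre!r}")
    if source not in ENGLISH_SOURCES[genre]:
        raise ValueError(f"source {source!r} is not listed under genre {genre!r}")
    if not 0 <= section <= 99:
        raise ValueError("section must fit in two digits")
    # Assumption: '##' means a zero-padded two-digit number such as 02.
    return PurePosixPath(parse_root) / genre / source / f"{section:02d}"


root = "/home/verbs/shared/cleardata/english/parse"
print(section_dir(root, "nw", "wsj", 2))
# /home/verbs/shared/cleardata/english/parse/nw/wsj/02
```

Keeping the genre-to-source table as data makes it easy to spot a source requested under the wrong genre, such as `wsj` under `mz`, before any file access is attempted.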