These are evaluation data sets for a hashtag segmentor. The hashtags were manually segmented during our research at TABI Lab in Bogazici University.

Dev-BOUN: Development set of 500 manually segmented hashtags, selected from tweets about movies, TV shows, popular people, sports teams, etc.
Test-BOUN: Test set of 500 manually segmented hashtags, selected from tweets about movies, TV shows, popular people, sports teams, etc.
Dev-Stanford: Development set of 1000 distinct manually segmented hashtags, with multiple possible segmentations given for some of the hashtags. These are selected from the Stanford Sentiment Tweet Set.

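Because Dev-Stanford gives multiple possible segmentations for some hashtags, one reasonable way to score a segmentor is to count a prediction as correct when it matches any of the accepted segmentations. The Python sketch below illustrates that idea only. It is not the scoring script used in our research, it does not read the data files (their on-disk format is not described here), and the function names, the dictionary layout, and the example hashtags are all hypothetical.

```python
def normalize(segmentation):
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(segmentation.lower().split())


def segmentation_accuracy(predictions, gold):
    """Fraction of gold hashtags whose prediction matches any accepted segmentation.

    predictions: dict mapping hashtag -> predicted segmentation (assumed layout)
    gold:        dict mapping hashtag -> list of accepted segmentations (assumed layout)
    """
    if not gold:
        raise ValueError("gold set is empty")
    correct = 0
    for hashtag, accepted in gold.items():
        predicted = predictions.get(hashtag)
        if predicted is None:
            continue  # a missing prediction counts as an error
        if normalize(predicted) in {normalize(s) for s in accepted}:
            correct += 1
    return correct / len(gold)


# Made-up examples for illustration; these are not taken from the data sets.
gold = {
    "homesweethome": ["home sweet home"],
    "gameday": ["game day", "gameday"],  # more than one accepted segmentation
}
predictions = {"homesweethome": "Home Sweet Home", "gameday": "gameday"}
accuracy = segmentation_accuracy(predictions, gold)
```

For Dev-BOUN and Test-BOUN the same function applies if each hashtag is given a list holding its single segmentation.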
For more information, please see http://tabilab.cmpe.boun.edu.tr/projects/hashtag_segmentation/