Answers to Some Procedural and Performance Score Questions

To ensure that all CSI folks have ready access to these questions from Chris Higginbotham and my responses, I am posting them directly on the blog.

Scott

Hi Chris,

Sure. You ask some great questions (in blue below, and echoed by Nancy Ungvarsky). Thanks for your attention to procedural details. I’ll insert my responses below. I also noticed that you posted these questions on Facebook, so I will respond there as well as on the blog for the benefit of the group. My apologies for what I anticipate may be a lengthy response.

I have a couple of questions regarding identification and the scoring, too, if that’s ok.
First, if an animal is not visible in the photo, but it IS present, should we still ID it? For example, if viewing consecutive images shows that it was hidden behind the pile, but was definitely present, do we still score it as being there? I’ve had a couple of images like that…like a photo that looks like nothing is on the pile, but the next one shows a crow sticking its head up from behind. I called that one crow, because it was there, just hidden. Or should I just call that a “None”?

Take home: to count it, you must see it (at least partially). Our standard operating procedure is to count an animal only if it is visible in the image. Even if it is only slightly visible in your focal image (#1 of the consecutive images), such as the very tip of a tail as the animal is leaving the scene or just the crown of the head extending above the pile surface, it counts. Animals anywhere in the image (on the pile, in the background, or in the foreground) count as long as they are at least partially visible in your focal image. Thus, you have to see it (at least partially) to count it. So even though the animal that you describe above is in all likelihood behind the pile, we would score it as “none” if it were completely invisible. The fact that there was an invisible animal (recorded as “none”) in that focal image will come out in the wash when we consider the sequential data, stringing together the individual images and applying our rules defining independent encounters.

Second, does any discrepancy between the # of images viewed and the contribution mean that I’ve made that many mistakes?

Not necessarily! The derivation of the accuracy potential, which is used to calculate the contribution value, is somewhat involved, so I beg your pardon for the long-winded explanation that follows. The accuracy potential is not a perfect reflection of accuracy by any means, but it is the best on-the-fly surrogate that we have developed to date. The ideal reflection of accuracy would result from a group of trained ecologists independently viewing and then reaching a consensus on the categorization of each image. Those consensus categorizations would serve as a benchmark by which a contributor’s accuracy could be established. Given that the images being viewed in CSI this summer are being categorized for the first time by you folks, we don’t have the luxury of an established benchmark. We will ultimately select a random sample and compare it to a lab-determined benchmark to verify accuracy, but on the fly we must take advantage of our previously established relationship between the agreement threshold and accuracy: images that have reached the 80% agreement threshold are accurately identified (compared to the lab-determined benchmark) 94% of the time. So to calculate your accuracy potential, we need to consider two variables, CompletedImages and AgreedImageConcur.

CompletedImages = number of completed images [i.e., images that have reached a total of five views overall] for a particular user. This metric is dynamic and is likely to change after the user has completed a particular session. For example, if User X represents the third viewer of an image, then two more viewers will need to categorize this image before it gets tallied as a completed image for its five viewers (including User X).

AgreedImageConcur = number of “agreed upon” completed images [i.e., ≥ 80% agreement (4/5 or 5/5 agreement)] for which User X is in agreement with the majority categorization.

Accuracy Potential = AgreedImageConcur / CompletedImages (The accuracy potential is thus a value ranging from 0.0 to 1.0)

Contribution Value = Cumulative number of images categorized * Accuracy Potential
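To make the arithmetic concrete, here is a minimal sketch in Python of how these two quantities fit together. The function names and the example numbers (27 agreed-upon images out of 30 completed, 40 images categorized in total) are my own illustration and are not taken from the CSI software, but the hypothetical numbers happen to reproduce the 40-images/36-contribution scenario mentioned in the next question.

def accuracy_potential(agreed_image_concur, completed_images):
    # AgreedImageConcur / CompletedImages, a value from 0.0 to 1.0
    if completed_images == 0:
        return 0.0  # no completed images yet, so no basis for an estimate
    return agreed_image_concur / completed_images

def contribution_value(images_categorized, agreed_image_concur, completed_images):
    # Cumulative number of images categorized, weighted by the accuracy potential
    return images_categorized * accuracy_potential(agreed_image_concur, completed_images)

# Hypothetical example: a user has categorized 40 images, 30 of which have been
# "completed" (viewed five times), and the user agrees with the >= 80% majority
# categorization on 27 of those 30. Accuracy potential = 27/30 = 0.9, so the
# contribution value is 40 * 0.9 = 36.
print(contribution_value(40, 27, 30))  # -> 36.0

Note that in this sketch a contribution of 36 from 40 viewed images does not mean four outright errors; it simply means the user disagreed with the majority on 3 of the 30 images that have so far been completed, which is exactly the point developed below.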

In an ideal world, where there is little variation in the ability of observers and all observers are careful, the Accuracy Potential should be a great estimate of every user’s accuracy. In reality, however, it may somewhat underestimate the accuracy of particularly acute observers, since these people may detect hidden animals that a more typical observer would miss. As a result, acute viewers will have a lower AgreedImageConcur value. I know this from personal experience. I have had a fair amount of practice at this categorization process and, using consecutive images, although I am by no means perfect, I am pretty good at detecting and identifying obscured animals. Practice pays! Consequently, my assessment often differs from the majority view, and thus my accuracy potential is below average. We’re not talking about a huge discrepancy, just something in the neighborhood of 0.05 to 0.10. Thus, in a CSI competition, I would need to compensate by viewing some additional images.

All this said, like any averaged value, the meaningfulness of the Accuracy Potential is directly proportional to the magnitude of its base. As an analogy, I would not place much credence in the .400 batting average of a major league baseball player in the first week of the season. However, if that player were still hitting .400 in late September, after a full season, I would be in awe. Likewise, don’t be overly concerned early on about big fluctuations in your accuracy potential (as reflected by your contribution value).

This concerns me, because I think I’ve been very careful and thorough in my IDs, so if seeing 40 images viewed and a contribution of 36 means I’ve made 4 errors, I’d like to know where I’m missing things. If this is the case, is there any way we can see where we make mistakes to help us improve? Knowing our errors would make us more aware of them so they aren’t repeated.

Although it would certainly be a useful learning tool, at this point there is no way to flag mistakes once you have passed the quiz and entered the actual research mode. One goal this summer is to revamp the quiz mode, basing it on near-video images (for which the consecutive-image feature will be more akin to what you experience in research mode). As it stands now, once you pass the quiz you are barred from quiz mode forever and can’t take advantage of its teaching tools to help you learn from mistakes. As part of the revamp, I would like to create a new practice mode that could be used after successful completion of the quiz. Both quiz and practice mode would have lab-established benchmarks as references, as quiz mode does currently.

Hope this helps.

Thanks so much for your involvement and the care that you’re showing as you’re engaged in the analysis.
