<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>visible artifacts &#187; Summaries</title>
	<atom:link href="http://visual.placodermi.org/category/summaries/feed/" rel="self" type="application/rss+xml" />
	<link>http://visual.placodermi.org</link>
	<description></description>
	<lastBuildDate>Fri, 06 Jan 2012 17:19:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Summary: Awareness and coordination</title>
		<link>http://visual.placodermi.org/2009/02/16/summary-awareness-and-coordination/</link>
		<comments>http://visual.placodermi.org/2009/02/16/summary-awareness-and-coordination/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 07:00:42 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=665</guid>
		<description><![CDATA[Summary of the article "Awareness and coordination in shared workspaces" ]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-666" title="Awareness" src="http://visual.placodermi.org/wp-content/uploads/2009/02/awareness.jpg" alt="Awareness" width="840" height="630" />Summary of:</p>
<blockquote><p>Dourish, P. and Bellotti, V. (1992). Awareness and coordination in shared workspaces. In CSCW &#8217;92: <em>Proceedings of the 1992 ACM conference on Computer-supported cooperative work,</em> pages 107-114, New York, NY, USA. ACM Press.</p></blockquote>
<p>In this article, as the title says, the authors talk about awareness and coordination in shared workspaces, specifically in the context of collaborative writing.  By collaborative writing, we mean in this case that a team of people work together to write a single document.   By awareness, we&#8217;re talking about one team member&#8217;s awareness of what the other team members are doing, and what they have done.   This awareness has two aspects: the awareness of the <em>character</em> of another&#8217;s actions, and the awareness of the <em>content</em> of another&#8217;s actions.</p>
<p>Character refers to the kind of actions a person might perform in the context of the collaborative work: are they an editor, a reviewer, an author; what sections of the document are they responsible for; what part are they working on?   Character in this way is like the role that a person plays in the collaboration, although we should be careful when using the word &#8220;role&#8221;, since character and role are fluid, and can change as the collaboration progresses.   A person can use their awareness of the character of another&#8217;s actions in order to structure work and avoid duplication of effort.   The content refers to the work that a person has actually done: comments and annotations made; text written.  A person can use awareness of the content of another&#8217;s work to enable &#8220;fine-grained shared working and synergistic group behavior&#8221; (p. 112).</p>
<p>The purpose of awareness mechanisms within software is to help users see the character and content of their colleagues&#8217; work, presumably without having to look over their colleagues&#8217; shoulders or ask them what they are doing or have done.</p>
<h3>Prior models for providing awareness</h3>
<p>Previous attempts at building awareness mechanisms into collaborative editing software concerned <em>informational mechanisms</em>, via which collaborators explicitly inform each other of their actions (e.g. commit logs in RCS, CVS, Subversion, etc.), and <em>role restrictive mechanisms</em>, which in some way associate a set of possible actions and activities with roles built into the system, and then assign people to roles.</p>
<p>Dourish and Bellotti say that informational mechanisms are inherently problematic (cf. Subversion commit logs): the posting user gets no benefit from the update; the posting user determines what information to convey; the posting user determines (in some cases) how to convey the information; and the posting user has to do extra work in addition to the work of actually producing the work product (p. 109).  About roles: don&#8217;t enforce roles, because roles are fluid (in collaborative editing anyway), and hard to define.   People may be reviewers one minute and authors the next.   Secondly, different people may have different ideas about what a particular role should do.   Some may take &#8220;reviewer&#8221; to mean purely annotation, others may want to edit the document (pp. 108, 113).</p>
<h3>The shared feedback model</h3>
<p>Instead, Dourish and Bellotti propose a <em>shared feedback</em> model, which makes &#8220;information about individual activities apparent to other participants by presenting feedback on operations within the shared, rather than the private, workspace&#8221; (p. 109).  We want the following things (each a reaction to the problems described in the previous section):</p>
<ul>
<li>to collect and distribute awareness information passively, with low overhead for sender and receiver (no added workload)</li>
<li>to make information available as and when needed as a context for individual activities, allowing participants to extract the awareness information most relevant to them</li>
<li>to avoid strict roles</li>
<li>to present the awareness information within the shared workspace alongside the shared object, so that users can see the information and object concurrently, and find the information most relevant to the object</li>
</ul>
<p>Some benefits of the shared feedback approach are that</p>
<ul>
<li>one can peripherally monitor others&#8217; activities and comment on them, so that one is constantly both communicating one&#8217;s activities and allowing the opportunity for others to comment on them or observe them for themselves.</li>
<li>users can tailor their contribution to convey collaboration and coordination information and solicit responses</li>
</ul>
<p>Shared feedback is all about allowing users to see others&#8217; work and actions as they occur, allowing them to interpret, contribute and coordinate more efficiently.</p>
<h3>Shared feedback in semi-synchronous systems</h3>
<p>Most of the article is about everyone working simultaneously within the same document space, like sharing one single Word window, as opposed to using Subversion to check out a working copy of the document, making your changes and then committing them.  This is synchronous editing rather than asynchronous editing.  However, &#8220;we can certainly imagine asynchronous awareness information presented in the same workspace as the work object&#8221; (p. 113).  For example: &#8220;change bars,&#8221; which highlight the area changed by the last editor, and which may also show who did the changing, when, and the nature of the changes.   We can present past information about activity within the shared workspace at different levels of granularity.</p>
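<p>As a rough illustration of the asynchronous case, a change bar could be backed by a small record of who changed which lines, when, and how.  The sketch below is in Python; the <code>ChangeBar</code> class, its fields, the helper functions and the sample edits are all hypothetical and do not come from the paper or from ShrEdit.</p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeBar:
    """One past edit, as it might be shown beside the shared document."""
    author: str
    changed_at: datetime
    first_line: int
    last_line: int
    kind: str  # hypothetical labels such as "insert" or "rewrite"


def latest_change_per_line(changes):
    """Map each line number to the most recent change that touched it."""
    latest = {}
    for change in sorted(changes, key=lambda c: c.changed_at):
        for line in range(change.first_line, change.last_line + 1):
            latest[line] = change  # later changes overwrite earlier ones
    return latest


def authors_by_section(latest, section_size):
    """Coarser view: which authors last touched each block of lines."""
    sections = {}
    for line, change in latest.items():
        sections.setdefault((line - 1) // section_size, set()).add(change.author)
    return sections


# Made-up edits by two made-up collaborators.
changes = [
    ChangeBar("ann", datetime(2009, 2, 16, 9, 0), 1, 4, "insert"),
    ChangeBar("bob", datetime(2009, 2, 16, 9, 30), 3, 6, "rewrite"),
]
latest = latest_change_per_line(changes)
print(latest[2].author, latest[3].author)
print(authors_by_section(latest, section_size=5))
```

<p>The per-line map and the per-section map are two levels of granularity over the same history: the first answers &#8220;who last touched this line?&#8221;, the second &#8220;who has been active in this part of the document?&#8221;.</p>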
<h3>The experiment with ShrEdit</h3>
<p>Dourish and Bellotti studied a group of designers using an application called ShrEdit, a synchronous multiuser text editor, to solve design problems.   ShrEdit allowed people to have both shared and private windows.  A shared window shows a view onto the shared document, and each user has an edit cursor within their shared window.  Shared windows show text as it is being input by any of the participants in the shared document editing session.   Private windows are like typical text editors and show a document that only the user can see.  These were used for notes or for creating text which would later be pasted.  One user could see which other users were editing the shared document, and find the current location of their edit cursors.  ShrEdit provided some of the shared feedback model that Dourish and Bellotti promote, but had some notable gaps, such as that one could not see one&#8217;s colleagues&#8217; edit cursors.</p>
<p>The authors studied groups of three designers (all with previous experience of working together), each placed in a separate location and linked via audio and/or video to the others.    After a training period, each group was given a specific design problem to solve collaboratively in 90 minutes.</p>
<p>&#8220;The shared workspace provided a focus for the designers&#8217; work and discussions&#8221; (p. 110).   Talk was used heavily to maintain awareness of activities, as well as to discuss design ideas.   Participants moved between nearly independent work and tightly focused group consideration of specific elements or passages, relying on awareness to know when to do what.  The activities of each group member varied continuously and opportunistically as things changed.</p>
<p>The problems with ShrEdit concerned how much of others&#8217; activities was viewable.   Participants preferred to ask where others were, or construct special indexing schemes to refer to, rather than use the provisions built into the software.   Having others be aware of what one was doing was very important, and was accomplished through speaking aloud.</p>
<h3>Comments</h3>
<p>First, I am struck by how subtle the shared awareness in ShrEdit is; I could not really see it the first time through the paper and ended confused.  I realize that I expect an explicit, orthogonal mechanism which provides awareness information.  But it was in fact the very core of ShrEdit that provides the awareness: that everyone has a window on the same live document and that everyone can see each other&#8217;s edits as they occur.  That&#8217;s the awareness mechanism that ShrEdit provided, primarily.  It also provided some explicit mechanisms, but they ended up not actually being used because they were too much work.</p>
<p>Secondly, Dourish and Bellotti build voice/video links right into the experiment, and so much awareness information is communicated in speech, <em>with the shared workspace providing shared context</em> so that speech can be very efficient and contextualized.   And people universally used the voice/video link to do much of the coordination and awareness maintenance.   So they&#8217;re not assuming that the software replaces human interaction completely, which is an extremely interesting choice to me, because I&#8217;m assuming (in the world of wiki) that I have no direct human interaction but that all awareness information must be communicated via the system.</p>
<p>Third, I&#8217;m also struck by how related Cockburn&#8217;s osmotic communication, erg-seconds and shared workspaces (Cockburn, 2002) are to the provision of awareness information via software.  They&#8217;re really trying to solve the same thing.  Well, similar things.</p>
<p>There were two incidental take-homes for me: try not to add to people&#8217;s workloads in order to supply awareness information (pp. 109, 112); people tend to want to respect authorship (p. 111).</p>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2009/02/16/summary-awareness-and-coordination/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summary: Successful software engineering research</title>
		<link>http://visual.placodermi.org/2009/02/06/summary-successful-software-engineering-research/</link>
		<comments>http://visual.placodermi.org/2009/02/06/summary-successful-software-engineering-research/#comments</comments>
		<pubDate>Sat, 07 Feb 2009 05:32:08 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=649</guid>
		<description><![CDATA[Summary of the article "Successful software engineering research"]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-650" title="Successful" src="http://visual.placodermi.org/wp-content/uploads/2009/02/successful.jpg" alt="Successful" width="840" height="630" />Parnas, D. L. (1998). Successful software engineering research. <em>ACM SIGSOFT Software Engineering Notes</em>, 23(3):64-68.</p>
<p>Parnas&#8217; essay is a plea to computer science/software engineering researchers to be more relevant to practicing developers by doing research the way he does it: essentially, by doing design science research.   He explicitly mentions the rigor/relevance dilemma (or failure in this case) as the reason he is so concerned: software engineering researchers write papers that are not useful to practitioners.   In this case, it is a failure in the first sense that Straub and Ang (2008) describe: the choice of topics or themes that researchers tackle is not helpful to practitioners.</p>
<p>He was influenced heavily in his outlook on research by two things: his engineering colleagues and a particular experience in an industry setting early in his career.  He describes engineering research as follows:</p>
<blockquote><p>&#8220;The majority of those papers begin by describing a problem that is frequently encountered in connection with product design or production.  They proceed to develop a model of the essential or fundamental parts of the problem, abstracting from facts that they consider irrelevant, and then proceed to analyse that model.  Finally, they show how the results of their analysis can be applied to solve, or improve the solution of, the original problem.  Somewhere in the paper, there is a survey of alternative approaches, including those in the literature and those in use in other industrial environments&#8221; (p. 64)</p></blockquote>
<p>This is very close to the seven necessary parts of a design science study prescribed by Hevner et al. (2004).   The only things lacking from his description are explicit rigor and scientific contribution, and (because engineering is about physical objects) it mentions only one of the four kinds of IT artifacts: models.</p>
<p>What is interesting about Parnas&#8217; paper in contrast to Hevner is Parnas&#8217; emphasis on close ties to the practitioner community, emphasizing personal interaction with actual developers to learn what real problems they encounter instead of considering software engineering from an academic distance.   In a critical experience earlier in his career, Parnas was fortunate enough to be embedded in a company among developers and was able to observe them at close hand and inspect their code and other work products.  This had a profound impact on the way he chose problems to work on, and on the kinds of solutions he found.  It was from this experience that he developed the now commonly used (but then nearly unthinkable) idea of information hiding in software design.</p>
<p>He describes some promising areas for research which as yet have very little activity: inspection methods (allowing code inspectors to &#8220;proceed systematically, carefully considering all cases in a way that provides confidence that nothing has been overlooked&#8221;, p. 66); and documentation (studying ways to make documentation not be &#8220;unclear, incomplete, inconsistent or inaccurate&#8221;, p. 66).</p>
<p>He ends by giving advice to researchers: keep close to practitioners &#8212; read their code; look for causes instead of symptoms; look for problems that are longer term/bigger problems, not ones that developers are likely to solve themselves; ask why practitioners don&#8217;t use the ideas from software engineering research; be wary of fads and buzzwords.</p>
<h3>References</h3>
<ul>
<li>Hevner, A., March, S., Park, J., and Ram, S. (2004). Design science in information systems research. <em>MIS Quarterly</em>, 28(1):75-105.</li>
<li>Straub, D. W. and Ang, S. (2008). Readability and the relevance versus rigor debate. <em>MIS Quarterly</em>, 32(4):iii-xiii.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2009/02/06/summary-successful-software-engineering-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summary: No silver bullets</title>
		<link>http://visual.placodermi.org/2009/02/06/summary-no-silver-bullets/</link>
		<comments>http://visual.placodermi.org/2009/02/06/summary-no-silver-bullets/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 20:25:20 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=641</guid>
		<description><![CDATA[Summary of "No Silver Bullet: Essence and Accidents of Software Engineering"]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-647" title="2179136893_d91b9971f3_o1" src="http://visual.placodermi.org/wp-content/uploads/2009/02/2179136893_d91b9971f3_o1.jpg" alt="2179136893_d91b9971f3_o1" width="840" height="630" /></p>
<p>Brooks, Frederick P., &#8220;No Silver Bullet: Essence and Accidents of Software Engineering,&#8221; <em>Computer</em>, Vol 20, No 4 (April 1987), pp. 10-19</p>
<p>This is the famous article by Brooks which Cockburn referred to in <em>Agile Software Development</em>.  In it, Brooks defines some concepts relating to software development in order to help people see what makes software development hard in an effort (possibly) to dissuade people from believing that there is yet to be a silver bullet which will make software development easy enough for anyone to do well.   He says there are two classes of qualities of software: essential qualities, and accidental qualities.  The essential  qualities are extremely difficult to address or improve, while the accidental qualities can be improved via tool support.</p>
<p>The essential qualities all have to do with the design of software: understanding the problem and designing and testing the conceptual construct.   The accidental qualities have to do with implementing the construct in code and testing it.</p>
<p>The essential qualities are these:</p>
<ul>
<li><strong>Complexity:</strong> software design is complex: understanding domain concepts to the extent that one can define them and the relationships between them exactly is hard; this definition is essential if we want to translate those concepts into code.  Unlike physical objects, no two parts of a software construct are alike.  Scaling up a software entity necessarily means a scaling up in components and complexity.   The problems associated with complexity are: difficulty communicating the design among team members; difficulty enumerating all possible states of the program (which brings unreliability); programs that are hard to use; and programs that are hard to change without inducing unwanted side effects.  The complexity makes overview hard (seeing and tying up loose ends), and the tremendous learning burden means turnover is costly.</li>
<li><strong>Invisible</strong>: can&#8217;t see it all at once as you would a physical piece of something.  It is also very difficult even to visualize since there are many ways to look at the same design: &#8220;the reality of software is not inherently embedded in space.&#8221;  This inhibits communication of structure and design.</li>
<li><strong>Changeable</strong>: we&#8217;re just pushing bits around in the end; changing ideas.  Therefore it is always possible to change a piece of software.  Software is inherently flexible where physical products are not (McConnell, pp. 19-20).    Change comes in two ways: people use the software in ways not originally foreseen in the design and want better support for that; and the physical machine/OS changes underneath us.</li>
<li><strong>Conformity</strong>: software must conform to external constraints (particular hardware, legacy systems and data, regulations) that make it difficult to reduce complexity.    Much of the complexity of interacting with human systems is arbitrary, and since software is perceived as changeable, it is most often seen as the thing which should conform, not the human system.</li>
</ul>
<p>The accidental qualities are all about implementation and testing, and these have been solved almost as well as they are likely to be solved, so don&#8217;t expect many more breakthroughs.   Here are things that Brooks says have helped to reduce the impact of the accidental qualities (keep in mind he&#8217;s writing this in the middle of the 1980s):</p>
<ul>
<li><strong>High level languages</strong> (including object orientation).   High level languages allow programmers to express implementation with concepts closer to those we use in design.  We&#8217;re reducing or eliminating the translation step between data models, operations, sequences and communication on the one hand and registers and bits on the other.   This eliminates a lot of the translation complexity.   But Brooks says that &#8220;the most a high-level language can do is to furnish all the constructs that the programmer imagines in the abstract program&#8221; and although our level of thinking about those constructs is improving, it does so at an ever decreasing rate.</li>
<li><strong>Time sharing and better OS support</strong>:  Time sharing is contrasted with batch processing, and it helps because we spend much less time in the code and test cycle.  By better OS support, I mean the difference between DOS (which has no library and almost no system call support) and UNIX (which contains many libraries and a vast system call set, so that you do not have to implement display code, disk drivers, filesystem code, etc.).</li>
</ul>
<p>He then talks about hopes for the future, and concludes that none of them is a silver bullet: Ada and other high-level languages; object orientation; AI (both in the sense of software solving problems that were previously the domain of human intelligence, and in the sense of expert systems); automatic programming (we specify the requirements and the software writes code for us); graphical programming (a la LabVIEW); proof-based design verification; better tools; and workstations.</p>
<p>He identifies three promising areas that may help promote obtaining (note: not necessarily building) better software:</p>
<ul>
<li><strong>Buy versus build. </strong> Remember that in the mid 1980s, the software market was still small.</li>
<li><strong>Incremental, iterative development</strong> with feedback from end users: because this helps to refine requirements, prove the design, and improves morale</li>
<li><strong>Identify, retain and coddle great designers:</strong> some people are just better at software design than others.  Try to identify them, retain them in your organization, and coddle them so that they get the skills and compensation that they need.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2009/02/06/summary-no-silver-bullets/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Summary: What makes patterns interesting</title>
		<link>http://visual.placodermi.org/2009/02/05/summary-what-makes-patterns-interesting/</link>
		<comments>http://visual.placodermi.org/2009/02/05/summary-what-makes-patterns-interesting/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 07:11:29 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=637</guid>
		<description><![CDATA[Summary of "What makes patterns interesting in knowledge discovery systems."]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-638" title="Beliefs" src="http://visual.placodermi.org/wp-content/uploads/2009/02/beliefs.jpg" alt="Beliefs" width="840" height="630" />Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. <em>IEEE Transactions on Knowledge and Data Engineering</em>, 8(6):970-974.</p>
<p>This paper is about devising a measure of interestingness of patterns.  Early on in knowledge discovery research, researchers realized that knowledge discovery systems can easily generate a huge number of patterns from a data set, most of which may be of no interest to the user of the system (p. 970).    And so researchers began to work on the problem of <em>interestingness</em>.   How do we know when a pattern found by the knowledge discovery system would be of interest to the user, so that we can present only such patterns to the user?    Researchers have taken two approaches towards developing interestingness measures:  objective measures, and subjective measures.</p>
<p>Objective measures are based only on the structure of the pattern and the underlying data used in the discovery process.  Such measures can be useful in some cases, but largely fail to capture the full complexities of the pattern discovery process, and can generate patterns that are objectively interesting, but which are still of little interest to the user.  Subjective measures of interestingness consider patterns from the point of view of the user, as well as the structure of the pattern and underlying data.     Prior work on subjective measures was limited in scope to a particular domain of patterns.</p>
<p>The measure that the authors propose, and the scientific contribution that they hope to make, is a domain-independent subjective measure of interestingness which can be used on patterns revealed by a knowledge discovery process.</p>
<p>They start by saying that there are two characteristics of a pattern that cause it to be interesting to a user: <em>unexpectedness</em> (it is &#8220;surprising&#8221; to the user) and <em>actionability</em> (the user can act on it to his advantage).   They go on to say that they believe that the majority of unexpected patterns are actionable, and  conversely that the majority of actionable patterns are unexpected.  Thus actionability and unexpectedness act as viable approximations of each other &#8212; we can look for patterns which exhibit one of those two characteristics, and assume that it also has the other.</p>
<p>It is hard to determine the actionability of a pattern for several reasons: we need to associate an action with each potential pattern (or pattern cluster, if we can partition the pattern space into subsets of similar patterns), which can be difficult or impossible (we may not know the full pattern space); and the mapping of actions to patterns may change over time, which means we have to do this difficult mapping over and over again.   The authors therefore chose to develop a measure of the unexpectedness of a pattern.  So this paper is about a domain-independent measure of the subjective interestingness of a pattern which is a function of the unexpectedness of that pattern (p. 971).</p>
<p>The authors want to determine unexpectedness by examining the effects of new data on a person&#8217;s belief system: the more the new data changes a person&#8217;s belief system, the more unexpected it is.   A belief system is made up of one or more beliefs, which they define as logical statements with an associated confidence level given prior evidence.   If b is a belief, then this confidence level, or <em>degree of belief</em>, based on previous evidence &#958; can be expressed as d(b|&#958;).  People have two kinds of beliefs:  <em>hard beliefs</em>, which are unchanged by new evidence (we suspect the evidence to be faulty instead); and <em>soft beliefs</em>, which can be changed by new evidence.</p>
<p>A pattern which contradicts hard beliefs is always unexpected and thus is always interesting (p. 975), so computing interestingness of such data is trivial.  Thus, much of the rest of the paper is dedicated to devising a measure of unexpectedness in relation to soft beliefs.</p>
<p>The authors provide four potential measures of degrees of soft beliefs and how they are affected by new evidence.  They eliminate three of them as impractical or too limited in applicability and propose a Bayesian approach (p. 973), which updates the degree of belief in a particular belief given new evidence by using Bayes rule.</p>
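<p>A minimal sketch of such a Bayesian update in Python, assuming a single belief b that is either true or false, so that the degree of belief in &#8220;not b&#8221; is one minus the degree of belief in b.  The function name, parameter names and numbers are made up for illustration; this is plain Bayes&#8217; rule, not code from the paper.</p>

```python
def update_degree_of_belief(prior, likelihood_if_true, likelihood_if_false):
    """Return the degree of belief in b after seeing new evidence E.

    prior               -- degree of belief in b before E arrives
    likelihood_if_true  -- probability of E if b is true
    likelihood_if_false -- probability of E if b is false
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be in [0, 1]")
    numerator = likelihood_if_true * prior
    # Total probability of E, assuming belief in "not b" is 1 - prior.
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both cases")
    return numerator / evidence


# Hypothetical numbers: a fairly confident belief (0.8) meets evidence
# that is four times more likely if the belief is false.
posterior = update_degree_of_belief(
    prior=0.8, likelihood_if_true=0.1, likelihood_if_false=0.4
)
print(round(posterior, 3))
```

<p>With these made-up numbers the degree of belief drops from 0.8 to about 0.5, which is the kind of shift a soft belief is allowed to undergo.</p>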
<p>They then write an expression which uses this result to express interestingness as how much the degrees of belief within a belief system were changed as a result of new data.  They then show how, with this definition of interestingness, unexpected patterns do indeed cause more change in degrees of belief than expected patterns.   Since we&#8217;re already using unexpectedness as a proxy for interestingness, we can use this measure to estimate subjective interestingness: the bigger the measure, the more interesting the pattern.</p>
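<p>One way to picture a measure of this kind is as a weighted sum, over the beliefs in the system, of the absolute change in each degree of belief.  The Python sketch below is written in that spirit and is not the authors&#8217; exact expression; the function name, the equal default weights, and the example beliefs and numbers are all assumptions.</p>

```python
def interestingness(degrees_before, degrees_after, weights=None):
    """Weighted sum of absolute changes in degree of belief.

    degrees_before and degrees_after map a belief name to its degree of
    belief before and after the new pattern is taken into account.
    """
    if degrees_before.keys() != degrees_after.keys():
        raise ValueError("both mappings must cover the same beliefs")
    if weights is None:
        # Assumption: treat every belief as equally important.
        weights = {name: 1.0 / len(degrees_before) for name in degrees_before}
    return sum(
        weights[name] * abs(degrees_after[name] - degrees_before[name])
        for name in degrees_before
    )


# Hypothetical belief system with two soft beliefs.
before = {"weekend sales are higher": 0.9, "returns stay low": 0.7}
after_expected = {"weekend sales are higher": 0.92, "returns stay low": 0.7}
after_unexpected = {"weekend sales are higher": 0.4, "returns stay low": 0.7}

low = interestingness(before, after_expected)
high = interestingness(before, after_unexpected)
print(low < high)
```

<p>The pattern that barely moves the degrees of belief scores low, and the one that overturns a confident belief scores high, which is the behavior described above.</p>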
<p>They conclude by describing a general procedure for how one would use these concepts to look for interesting patterns in a database that gets updated with new data:</p>
<p style="padding-left: 30px;">&#8220;When new data arrives, degrees of all the beliefs are revised based on new data.  If some of the degrees change above predetermined threshold levels, this means that there are some interesting patterns in the data and that the discovery processes to extract interesting patterns should be launched&#8221; (p. 974).</p>
<p>They call this <em>belief-driven discovery</em>.  The authors then say: see our upcoming paper, in which we describe those discovery processes.</p>
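<p>The quoted procedure can be sketched as a simple threshold check in Python.  Everything here is hypothetical: the function names, the per-belief thresholds, and the <code>launch_discovery</code> callback, which stands in for the discovery processes the authors defer to their later paper.</p>

```python
def beliefs_to_investigate(old_degrees, new_degrees, thresholds):
    """Return the beliefs whose degree changed by more than their threshold."""
    return [
        name
        for name in old_degrees
        if abs(new_degrees[name] - old_degrees[name]) > thresholds[name]
    ]


def on_new_data(old_degrees, new_degrees, thresholds, launch_discovery):
    """Launch discovery only when some belief moved past its threshold."""
    changed = beliefs_to_investigate(old_degrees, new_degrees, thresholds)
    if changed:
        launch_discovery(changed)
    return changed


# Hypothetical degrees of belief before and after a batch of new data.
old = {"weekend sales are higher": 0.9, "returns stay low": 0.7}
new = {"weekend sales are higher": 0.4, "returns stay low": 0.68}
limits = {"weekend sales are higher": 0.2, "returns stay low": 0.2}

launched = []
changed = on_new_data(old, new, limits, launch_discovery=launched.append)
print(changed)
```

<p>Only the belief that moved by more than its threshold is passed on, so the expensive pattern search runs only when the belief system has actually been disturbed.</p>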
<h3>Critique</h3>
<p>In terms of Hevner et al. (2004):</p>
<ul>
<li>The <strong>artifacts</strong> that the paper describes are constructs: interestingness, unexpectedness, actionability.   Those constructs are intended to address a <strong>real business problem</strong> that knowledge discovery systems have: knowledge discovery systems can easily generate a huge number of patterns from a data set, most of which may be of no interest to the user of the system (p. 970), which makes the system not very useful.</li>
<li>The artifact could definitely provide <strong>utility</strong> if implemented correctly.</li>
<li>They do <strong>evaluate</strong> their concepts (mathematically) to show that they do solve the problem they set out to solve.</li>
<li>Their measure does offer <strong>scientific contribution</strong>, because it is a domain-independent measure of interestingness, which had not yet been devised.</li>
<li>They certainly are <strong>rigorous</strong> in defining and representing their artifacts by deriving mathematical metrics for estimating them.</li>
<li>They do detail their <strong>search space</strong> for potential metrics for the effect of new data on belief systems (part of building the interestingness measure).</li>
<li>They obviously <strong>communicate</strong> their results to a technical audience.</li>
</ul>
<p>The only potential failures to stick to the seven characteristics of good design research as described by Hevner et al. (2004) are in direct utility and in communicating to a managerial audience (they might have done this in another venue).</p>
<h3>Utility</h3>
<p>I see two issues around instantiating these metrics: defining beliefs, and computing prior and conditional probabilities.</p>
<p>Defining beliefs seems tricky.  Clearly, in order to use the metric they provide on p. 973, one has to express a user&#8217;s belief in a mathematically testable way.   One must also be able to quantify the degree to which one holds a belief (for soft beliefs).   It seems like both of these could be hard for most people.   I am guessing that how you might implement a belief is as a function or set of functions on a feature vector extracted from the dataset (p. 973).  Your belief is about what you believe the normal range of the output of that function should be given the data you&#8217;ve already seen.   Then you state how confident you are in that range of values.</p>
<p>In Bayes&#8217; rule, computing appropriate prior probabilities and likelihoods (the conditional probabilities of the evidence) can be hard (especially the likelihoods), and yet doing it correctly is critical for successful use of Bayes&#8217; rule.  If you do it wrong, you may still end up finding uninteresting patterns.  For prior probabilities, you also clearly need some experience with the data before you can even have a belief ?and begin getting a good estimate of your prior probability P(?| ?).  For the conditional probabilities, they do seem to address this on p. 973.    In both cases, you need to know what aspects of the data to look at, which goes back to defining beliefs.</p>
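<p>To make the dependence on the prior and the likelihoods concrete, here is a minimal Python sketch of a single Bayes&#8217; rule update for one soft belief.  The function name <code>update_belief</code> and all of the numbers are my own illustrative assumptions, not values or notation from the paper; the point is only that a poorly chosen prior or likelihood changes how far the belief moves.</p>

```python
def update_belief(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior degree of belief after seeing one piece of evidence.

    prior: degree of belief before the evidence (between 0 and 1).
    p_evidence_if_true: likelihood of the evidence if the belief holds.
    p_evidence_if_false: likelihood of the evidence if the belief does not hold.
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical numbers: a fairly confident belief meets surprising evidence.
prior = 0.8
posterior = update_belief(prior, p_evidence_if_true=0.1, p_evidence_if_false=0.6)
# How far the evidence moved the belief; a large shift suggests an unexpected pattern.
shift = abs(posterior - prior)
print(round(posterior, 3), round(shift, 3))
```

<p>As I read it, the size of that shift is the kind of quantity the interestingness measure builds on, which is why bad priors or likelihoods would lead straight back to uninteresting patterns.</p>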
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2009/02/05/summary-what-makes-patterns-interesting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summary: Mining email social networks</title>
		<link>http://visual.placodermi.org/2008/11/17/summary-mining-email-social-networks/</link>
		<comments>http://visual.placodermi.org/2008/11/17/summary-mining-email-social-networks/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 16:25:44 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=359</guid>
		<description><![CDATA[This paper is a good example of what I ultimately want to do: analyze digital artifacts of an open source software project to uncover the social structure of the community. ]]></description>
			<content:encoded><![CDATA[<p><a href="http://visual.placodermi.org/wp-content/uploads/2008/11/mining.jpg"><img class="alignnone size-full wp-image-363" title="Email mining" src="http://visual.placodermi.org/wp-content/uploads/2008/11/mining.jpg" alt="" width="500" height="375" /></a>This paper is a good example of what I ultimately want to do: analyze email archives and revision control system logs (at least) of an open source software project to uncover the social structure of the community.  While this paper is more interested in what is happening than in making a tool to uncover what is happening, it&#8217;s still quite relevant.  It&#8217;s also an example of one of the many papers that relate to open source software communities that wouldn&#8217;t show up in Web of Science search (because it&#8217;s a conference paper, not a journal paper) but which would prove very useful to me.</p>
<p>Of special interest to me are the technical aspects of how they did their work: figuring out the sequence of replies and attempting to deal with what they call <em>email alias unmasking</em>.   In the email archives of a traditional firm, one can trace back each email address to a real person (one can look in the organizational directory to figure out who corresponds to what address) and each person typically only has one email address.  Not so in such public forums as are used in OSS projects:  in such spaces, a single person may use many aliases (foo@gmail.com, bar@yahoo.com, foo.bar@abccorp.com may all be the same person), and discovering that those aliases all belong to one person is important for discovering the true social structure.</p>
<p><strong>Summary of</strong></p>
<p style="padding-left: 30px;">Bird, C., Gourley, A., Devanbu, P., Gertz, M., and Swaminathan, A. (2006).  Mining email social networks.  In <em>MSR &#8216;06: Proceedings of the 2006 international workshop on   Mining software repositories</em>, pages 137-143, New York, NY, USA. ACM Press.</p>
<p class="line862">The authors looked at the Apache developers mailing list archive and CVS repository commit logs, considering messages covering a period of four or five years. They did so with the goal of studying communication and collaboration technologies (C&amp;C) in software projects, particularly in open source software development. They are specifically interested in how activities in C&amp;C correspond to development activities in the source code: what are the social properties of the developer network; do active communicators also make a lot of source code changes; do developers and non-developers play different social roles; and do the most active developers have the highest status among developers. They examined an open source project because most or all of its communications are deliberately made publicly available.</p>
<p class="line862">The authors looked at each participant in the mailing list, and divided the group into developers (those who contributed code or documentation changes to the CVS repository) and non-developers (those who didn&#8217;t). For each participant, they looked at how many messages the person sent, how many of their messages were replied to, and three social networking measures: <em><a class="http" href="http://en.wikipedia.org/wiki/In-degree">in-degree</a></em> (the number of edges connecting to a node in a directed graph; in this case, the number of different people to whom a person has replied), <em><a class="http" href="http://en.wikipedia.org/wiki/Out-degree">out-degree</a></em> (the number of edges emerging from a node in a directed graph; in this case, the number of individuals who have replied to a person) and <em><a class="http" href="http://en.wikipedia.org/wiki/Betweenness#betweenness">betweenness</a></em> (the number of shortest paths that go through a node; high betweenness indicates that a person acts as a gatekeeper or broker, playing a role in many interactions). They also presented a directed sociogram of the Apache mailing list archive in which the arrows indicated who responded to whom more often (but didn&#8217;t do much with it).</p>
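<p class="line862">The degree measures described above can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the names and reply pairs are made up, and I assume each edge runs from the sender of a message to the person who replied to it, so that out-degree counts distinct repliers and in-degree counts distinct people replied to, as in the definitions above.  Betweenness needs a shortest-path computation and is left out.</p>

```python
from collections import defaultdict

# Hypothetical (original_sender, replier) pairs, one per reply message.
replies = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),  # repeated pair: still only one distinct edge
]

repliers = defaultdict(set)    # person: distinct people who replied to them
replied_to = defaultdict(set)  # person: distinct people they replied to

for sender, replier in replies:
    if sender != replier:  # ignore self-replies
        repliers[sender].add(replier)
        replied_to[replier].add(sender)

out_degree = {person: len(others) for person, others in repliers.items()}
in_degree = {person: len(others) for person, others in replied_to.items()}

print(out_degree)
print(in_degree)
```

<p class="line862">Using sets means that a long back-and-forth between the same two people counts once, which matches the "number of different people" wording.</p>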
<p class="line862">They found that messages sent, messages replied to, <em>in-degree</em>, and <em>out-degree</em> follow a <a class="http" href="http://en.wikipedia.org/wiki/Pareto_distribution">Pareto distribution</a> (a power law probability distribution; a few people send a lot, but most people send a little), the latter showing a &#8220;long tailed degree distribution, characteristic of <a class="http" href="http://en.wikipedia.org/wiki/Small_world_network">small world networks</a>&#8221; (p. 141). There was a strong relationship between the number of messages sent by someone and the number of distinct people that respond to them (p. 141). They found a high correlation (<a class="http" href="http://en.wikipedia.org/wiki/Spearman_rank_correlation">Spearman rank correlation</a> of 0.80) between messages sent and number of source changes made, indicating that C&amp;C activity is correlated with development work (p. 141). There was a lower correlation between messages sent and document changes.</p>
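<p class="line862">For readers unfamiliar with the Spearman rank correlation, here is a small self-contained Python sketch: rank each list (ties get the mean of their ranks), then compute the ordinary Pearson correlation of the ranks.  The per-developer counts are invented for illustration and are not data from the paper.</p>

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-developer counts, not data from the paper.
messages_sent = [120, 45, 300, 10, 80]
source_changes = [60, 20, 150, 25, 5]
rho = spearman(messages_sent, source_changes)
print(round(rho, 2))
```

<p class="line862">Because it works on ranks rather than raw counts, the measure is not dominated by the handful of very heavy posters that a Pareto-like distribution produces.</p>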
<p class="line862">Developers do act as brokers or gatekeepers more than non-developers (p. 142), and generally have higher status (computed as what?), and developers who make more source code changes play more significant roles in the mailing list. Higher activity in source code changes is strongly correlated with higher activity in the mailing list; document changes are less strongly correlated. Generally, high <em>in-degree</em>, <em>out-degree</em>, and <em>betweenness</em> are correlated with status (how?) and source code change activity.</p>
<h3 id="head-3e76540de10d6c0c43f2d0967f035c2f9fface14"><strong>Data extraction</strong></h3>
<p>They used the Reply-To: address and Message Id: of each message to which a message is a reply (if any) to determine who replied to whom, and suggest that you could look through the contents for quoted text attributions. The sender of a reply is &#8220;one who found the initial message of interest&#8221; (p. 139).</p>
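<p>Here is a rough Python sketch of how one might recover who replied to whom from message headers.  It rests on an assumption of mine rather than on the paper&#8217;s stated procedure: that each reply carries the standard <code>In-Reply-To</code> header holding the <code>Message-ID</code> of its parent.  The two messages and their addresses are made up.</p>

```python
from email import message_from_string
from email.utils import parseaddr

# Two hypothetical messages; a real archive would be read from mbox files.
RAW_MESSAGES = [
    "From: Alice <alice@example.org>\n"
    "Message-ID: <msg1@example.org>\n"
    "Subject: Proposed patch\n"
    "\n"
    "Here is a patch.\n",
    "From: Bob <bob@example.org>\n"
    "Message-ID: <msg2@example.org>\n"
    "In-Reply-To: <msg1@example.org>\n"
    "Subject: Re: Proposed patch\n"
    "\n"
    "Looks good.\n",
]

messages = [message_from_string(raw) for raw in RAW_MESSAGES]

# Map each Message-ID to the address of the person who sent that message.
sender_by_id = {}
for msg in messages:
    _, address = parseaddr(msg["From"])
    sender_by_id[msg["Message-ID"].strip()] = address.lower()

# A reply names its parent in In-Reply-To; look up who wrote the parent.
reply_pairs = []
for msg in messages:
    parent_id = msg["In-Reply-To"]
    if parent_id is None:
        continue
    parent_id = parent_id.strip()
    if parent_id in sender_by_id:
        _, replier = parseaddr(msg["From"])
        reply_pairs.append((sender_by_id[parent_id], replier.lower()))

print(reply_pairs)
```

<p>Each resulting pair is (original sender, replier).  Replies whose parent is missing from the archive are simply skipped here; a fuller version would need some fallback, such as the quoted-text attributions mentioned above.</p>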
<p>They are one of the few groups to deal explicitly with e-mail alias unmasking: many people have more than one e-mail address, and ensuring that we count all the e-mail from those different addresses as belonging to that person is not trivial. They used a clustering algorithm plus manual inspection to develop a lookup table of e-mail addresses to names. The similarity measure they used for the clustering is based on the fields in the From: line.</p>
<p>They compared normalized names to names and e-mails to e-mails using the <a class="http" href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a>, compared names to e-mails, and took the maximum of the three scores (p. 139). They did this for all pairs of &lt;name, e-mail&gt; tuples. They used a similar method for unmasking CVS aliases.</p>
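<p>A simplified Python sketch of that kind of scoring follows.  The Levenshtein distance is the standard dynamic-programming version; everything else (scaling the distance to a 0 to 1 score, comparing a name to the part of the address before the @ sign, and the function names) is my own assumption for illustration and is almost certainly cruder than what the authors did.</p>

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale edit distance to a score between 0 (different) and 1 (identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def alias_score(name_a, email_a, name_b, email_b):
    """Take the best of name-name, email-email and name-email similarity."""
    user_a, user_b = email_a.split("@")[0], email_b.split("@")[0]
    name_to_email = max(similarity(name_a, user_b), similarity(name_b, user_a))
    return max(similarity(name_a, name_b), similarity(user_a, user_b), name_to_email)


# Hypothetical normalized (lower-cased) name and address pairs.
score = alias_score("foo bar", "foo@gmail.com", "f. bar", "foo.bar@abccorp.com")
print(round(score, 2))
```

<p>Pairs scoring above some threshold would then be clustered as candidate aliases and checked by hand, which is where the manual inspection comes in.</p>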
<h3 id="head-910fb84d33e617c90fb334917f53c90371dba852">Social networking measures</h3>
<p>They comment on connectedness, but don&#8217;t use it except to say that the most highly connected people in the Apache network are, in fact, the most productive developers (p. 140), and that they are doing further research into that.</p>
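<p>The &#8220;small world network&#8221; is a statement about the mean shortest path and clustering of the network: most pairs of nodes are connected by short paths, and the neighbors of a node tend to be connected to each other. A power-law distribution of node degrees (few people are highly connected, and most people are not highly connected) is, strictly speaking, the defining property of a scale-free network rather than of a small-world network, although a network can have both properties.</p>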
<p>They used messages sent and out-degree to make the statement about number of messages sent vs. number of unique repliers. They&#8217;re doing further investigation into this.</p>
<p>They used betweenness with in-degree and out-degree to show that developers do act as brokers more than  non-developers (p. 142), and generally have higher status (computed as what?), and that developers who do more source code changes play more significant roles in the mailing list.</p>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2008/11/17/summary-mining-email-social-networks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Summary: The cognitive style of PowerPoint</title>
		<link>http://visual.placodermi.org/2008/10/11/the-cognitive-style-of-powerpoint/</link>
		<comments>http://visual.placodermi.org/2008/10/11/the-cognitive-style-of-powerpoint/#comments</comments>
		<pubDate>Sun, 12 Oct 2008 03:34:39 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=191</guid>
		<description><![CDATA[Discussion of Tufte, E. R. (2003). Cognitive style of PowerPoint.  Cheshire, CT: Graphics Press.]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-192" title="Tufte maxims" src="http://visual.placodermi.org/wp-content/uploads/2008/10/tufte.jpg" alt="" width="500" height="375" />Tufte, E. R. (2003). <em>Cognitive style of PowerPoint</em>.  Cheshire, CT: Graphics Press.</p>
<p>Tufte&#8217;s article is really a critique of poor presentation skills, and a plea for better reasoning skills about information.  He asks that presenters and audiences do not simply accept PowerPoint&#8217;s default templates and modes as the proper way to conduct serious presentations of evidence: &#8220;The evidence indicates that PowerPoint, compared to other common presentation tools, reduces the analytical quality of serious presentations of evidence.&#8221;  He ends his article with a suggestion for how such presentations should be carried out: &#8220;serious presentations might well begin with a concise briefing paper or technical report (the 4-pager) that everyone reads (people can read 3 or 4 times faster than presenters can talk).  Following the reading period, the presenter might provide a guided analysis of the briefing paper and then encourage and perhaps lead a discussion of the material at hand&#8221; (p. 30).</p>
<p>Note that Tufte is specific about the kind of presentation he means: a serious presentation, &#8220;with explanations to understand, evidence to evaluate, problems to solve, decisions to make, and in several examples, lives to save&#8221; (p. 27).  Kinds of presentations that Tufte is not talking about:</p>
<ul>
<li>the performance (David Byrne), Machine Project presentation</li>
<li>the interactive demonstration (computer education)</li>
<li>the marketing pitch (Steve Jobs at MacWorld)</li>
<li>the political speech (offshoot of marketing pitch)</li>
<li>other?</li>
</ul>
<h3><strong>Five steps to better serious presentations</strong></h3>
<p>He gives these maxims as what should guide good, serious presentations.</p>
<ol>
<li>Assume that your audience is intelligent, and found your presentation on principles of respect for them.   Use the core ideas of good teaching: &#8220;explanation, reasoning, finding things out, questioning, content, evidence, credible authority not patronizing authoritarianism&#8221; (p. 7).</li>
<li>Strive to be as clear as you can.  Be precise.  Use enough words (spoken and written), images, and information graphics in the right combination to clearly and precisely convey the information.   Give people all the data they need to make evaluations and decisions; don&#8217;t let the medium dictate how much you give.</li>
<li>Be honest and accurate.  Don&#8217;t distort the data.  Be respectful of human strengths and weaknesses.  For example, it is hard for humans to compare areas of objects (so pie charts are bad), but we&#8217;re good at comparing sets of data (e.g. images) within our eyespan.</li>
<li>Allow your audience to explore, evaluate and discuss the information.</li>
<li>Use appropriate media to support the level of detail you need to do this.</li>
</ol>
<h3><strong>Properties of computer projected presentations</strong></h3>
<p>Intrinsic characteristics:</p>
<ol>
<li>LCD projectors have low resolution, hence low data density.  This determines the total amount of information we can convey on any one slide, and it&#8217;s not much.  This limits the usefulness of text and information graphics on slides.  High information density images, however, are worth it.</li>
<li>Projected things need to have large type so that people in the audience can read them.   There are only so many words we can expect to fit on a slide and actually have people be able to read them.</li>
<li>Slides are presented sequentially and transitions are driven by the presenter, not the audience.   The audience can&#8217;t go back to previous slides to review (without interrupting the presentation) or go forward to reconnoiter.  They can&#8217;t explore the information that&#8217;s being presented, and so they are put into the role of passive receptors rather than active learners.</li>
</ol>
<p>Accidental characteristics:</p>
<ul>
<li>bullet lists</li>
<li>hierarchies</li>
<li>bad graphics</li>
<li>transition effects</li>
<li>templates</li>
<li>wizards</li>
<li>etc.</li>
</ul>
<p>All of the accidental characteristics are avoidable, while the intrinsic characteristics are not (without much effort), and so it is those intrinsic characteristics that determine the cognitive style of PowerPoint.  That accidental characteristics also caused serious problems in the Columbia slides is more a function of social consensus, lack of skill and lack of effort.</p>
<h3><strong>My critique</strong></h3>
<p>I think much of what he blames on PowerPoint is really the presenter&#8217;s lack of skill in organizing and presenting information.  He addresses this particular critique with a paraphrase of George Orwell: &#8220;PowerPoint becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of PowerPoint makes it easier for us to have foolish thoughts,&#8221; (p. 27) and &#8220;To make smarter presentations, try smarter tools.&#8221; (p. 28), implying that simply using PowerPoint can make you a bad presenter.</p>
<p>I think that skilled craftsmen can make quality work even with poor tools, while poor craftsmen can&#8217;t make quality work even with good tools.  Presenting at all is a skill, something that requires natural talent, training, a lot of practice and a lot of self-reflection.   You need to be able to organize information into a story that can be told sequentially (in time) and still make sense; you need to be able to tailor that information to your audience; you need to tailor your presentation to your goal &#8212; teach, inform, persuade, entertain;  you must be able to entertain your audience even if your goal is something else.   I think that if a skilled presenter is conscious of the intrinsic limitations of computer projected media, and can avoid or use to her advantage the accidental qualities, she can use PowerPoint (or Keynote, etc.) to advantage.</p>
<p>Secondly, I think he over-simplifies the kind of data one might need to convey in a serious presentation (video, images, sound, websites, technical demonstrations) and ignores how valuable a common focus is when collaborating.</p>
<h3><strong>Edward Tufte</strong></h3>
<p>Edward Tufte is a primal force in information visualization, and his books are commonly referenced in papers which propose or use visualization tools or techniques.  Google Scholar lists 2208 citations for <em>The Visual Display of Quantitative Information</em>, 1395 for <em>Envisioning Information</em>, and 800 or so for <em>Visual Explanations</em>.  Prof. Tufte (he is professor emeritus at Yale University) has made a mission of educating people on how to honestly, clearly, and effectively present information.</p>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2008/10/11/the-cognitive-style-of-powerpoint/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summary: The medium is the massage</title>
		<link>http://visual.placodermi.org/2008/09/21/the-medium-is-the-massage/</link>
		<comments>http://visual.placodermi.org/2008/09/21/the-medium-is-the-massage/#comments</comments>
		<pubDate>Sun, 21 Sep 2008 17:43:53 +0000</pubDate>
		<dc:creator>Chris Malek</dc:creator>
				<category><![CDATA[Summaries]]></category>

		<guid isPermaLink="false">http://visual.placodermi.org/?p=55</guid>
		<description><![CDATA[In The Medium is the Massage, McLuhan discusses different media and their effects on individuals, cognition and society.  The graphic design of this book is said to have been the basis for the design of Wired magazine.]]></description>
			<content:encoded><![CDATA[<p><a href="http://visual.placodermi.org/wp-content/uploads/2008/09/massage.jpg"><img class="alignnone size-full wp-image-57" title="massage" src="http://visual.placodermi.org/wp-content/uploads/2008/09/massage.jpg" alt="" width="500" height="375" /></a>McLuhan, M. and Q. Fiore (1967). <em>The Medium is the Massage. </em>New York: Bantam Books.</p>
<p><em>The Medium is the Massage</em> was written in 1967 (two years before I was born) and in it McLuhan discusses voice, writing, and &#8220;electric&#8221; media (esp. television) and their effects on individuals, cognition and society.   The graphic design of this book is said to have been the basis for the design of <em>Wired</em> magazine [1].</p>
<p>He says &#8220;Societies have been shaped more by the nature of the media by which men communicate than by the content of the communication&#8221; (p. 9).   The book seems to be at least partly a reaction against the printed word, with its rationality and linearity, with its emphasis on the visual at the expense of the auditory and tactile, and with its encouragement to people to go off and be individuals and abandon the company of their fellows, as opposed to electronic media&#8217;s global village.</p>
<p>I think McLuhan is strongly emphasizing the participatory role of information technology (primarily electronic communication) in some cases, and its mass media broadcast role in others.</p>
<ul>
<li>Electronic communication technology lessens distance (by reducing the time it takes to talk) between people and thus weakens social boundaries. McLuhan is believed to have coined the term &#8220;global village&#8221; [1], meaning it both in the sense of community (p. 10) and in the sense of &#8220;it takes a village to raise a child&#8221; (p. 13); electronic data storage and communication allow people to access information that their peers, family and locality don&#8217;t have.</li>
<li>Different media use our senses (sight, sound, touch, (smell, taste?)) in different ways, and this in turn affects the way we think and view the world, which in turn affects the environment we build for ourselves.  That&#8217;s the &#8220;massage&#8221; in the title.</li>
<li>The environment we live in is invisible to us, embedded in it as we are.  One purpose of art is to draw our attention to the world we live in.</li>
<li>Disruptive technologies can be incomprehensible to an established environment.  People try to use the new technologies to keep doing what they did with the old one (p. 74) (single loop) while not seeing that the new tech fundamentally changes what can be done (double loop).</li>
<li>New media emphasizes a holistic rather than reductionistic way of living: roles rather than goals.</li>
<li>The printing press helped bring about individualism in ways that the copied book (copied by monks) or the oral tradition (bards) could not. Printing technology created copyright, and electronic media may change how that is implemented. He says that the idea of copyright came with the advent of printing technology (p. 122): before printing, the effort of copying texts was so great that there was no public readership, no public.   Texts were shared among a small group of scholars and there was no commercial aspect.   Printing technology created public readership, and suddenly you could sell your books.  Fighting piracy became important, and thus copyright.  He starts to consider the effect of cheap reproduction (Xeroxing) on copyright and how it becomes easier to get around it (p. 123).</li>
</ul>
<p><strong>References</strong></p>
<ol>
<li> &#8220;Marshall McLuhan&#8221;, wikipedia.org, <a title="wikipedia:Mashall McLuhan" href="http://en.wikipedia.org/wiki/Marshall_McLuhan" target="_self">http://en.wikipedia.org/wiki/Marshall_McLuhan</a> , retrieved Sep 21, 2008.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://visual.placodermi.org/2008/09/21/the-medium-is-the-massage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

