
Our 5 Guiding Principles for SurveyCTO. What Are Yours? – Your Weekend Long Reads

By Guest Writer on April 13, 2019


As a social enterprise funded through user fees from hundreds of different organizations, we at Dobility have an unusual degree of freedom: for the most part, we don’t have donors or investors to answer to, and even individual users have limited power to drive our decisions. It’s a great place to be, but it means that we’re fully responsible for charting the path that best serves our overall user base, the sector, and our social mission.

Since the beginning, a set of core principles has guided the development and evolution of SurveyCTO, but those principles have never been publicly summarized – until now. None of these principles is entirely static, but we tend not to adjust them very often. I welcome feedback and even push-back in the comments. The principles are:

  1. Data security: private data should be kept as private as possible. Only people who legitimately need to see private data should be able to see it, and that generally doesn’t include our own server administrators or engineers (and it certainly doesn’t include hackers).
  2. Data quality: data collection should be transparent and carefully monitored. Improved transparency in data collection, coupled with timely and effective monitoring of data quality, is vitally necessary to combat the natural forces that push toward poor data quality.
  3. Data use: effective visualization and analysis requires other tools designed and built for those purposes. We don’t seek to compete with other products in the database, visualization, or analysis spaces. Rather, we prefer to integrate well with tools like Google Sheets, Airtable, Salesforce, Tableau, Power BI, Stata, SPSS, and R.
  4. Reliability and support: our users deserve a product that works reliably and the help they need, when they need it. Our users have more than enough challenges in their lives, so we have to work 24×7 to make sure that SurveyCTO creates as little stress as possible (and, generally, reduces stress instead).
  5. Empowerment: we should empower those who need to collect data to use our technology directly, with as few intermediaries as possible. The research, M&E, and field teams that use SurveyCTO know best how to design, deploy, and improve their data-collection instruments, so we should empower them with great technology they can themselves use; IT teams and outside consultants should serve as facilitators and supporters, not intermediaries or gatekeepers.

Below, I expand a little bit on what these principles mean in practice, and how they make us different from other vendors in the data-collection space. (Apologies for the length of this post, but I thought that concrete examples would be useful throughout.)

1. Data security: private data should be kept as private as possible.

Our approach to data security is simple but also radical: even though we host servers and data for our users, we ourselves don’t want to be able to see sensitive data (not our server administrators, not our engineers, not our support team, not our Amazon Web Services cloud providers, not hackers who might someday gain access to our servers or other systems, none of us).

After all, most of our users make confidentiality pledges as part of their data collection, and we want to respect those pledges. The first step in respecting them is ensuring that the smallest possible number of people can view confidential data.

In practice, this means that we strongly encourage users to generate their own 2,048-bit public/private encryption keys, and to use those keys to secure their data. What’s more, we never want to see the private keys, because we never want to be able to decrypt the data ourselves. In terms of implementation:

  • In SurveyCTO v1.0, we added a key-generation function to our desktop application, SurveyCTO Sync, so that it could be used to generate keys offline, including on cold-room computers totally disconnected from all networks. Later, for ease of use, we added an option directly into our server console, but we designed even that option to use JavaScript to generate keys in the local browser, effectively offline, so that the keys themselves would never pass through our servers. (A minimal sketch of this offline approach to key generation appears just after this list.)
  • In SurveyCTO v1.17, we added options to publish data directly to .kml files in SurveyCTO Sync, so that it would be easy to visualize GPS data in Google Earth, even offline (and without having to send sensitive data to cloud providers like Google Maps).
  • In SurveyCTO v1.22, we added the ability to exempt individual non-PII fields from private-key encryption, to allow for easier publishing and sharing (e.g., to dashboards) – without having to lower the overall level of protection for more-sensitive data.
  • In SurveyCTO v2.20, we added our Data Explorer for easy, in-browser data monitoring and exploration. This included the ability to decrypt data safely in-browser and view GPS and other data online, but in a fundamentally safer way than in other cloud products. Decrypted data is held in memory only, on the user’s computer, and nobody outside the local computer can see that data. Google Maps provides map tiles on request, for example, so Google does see evidence of the general areas being viewed, but the actual GPS locations (the actual pins) are never sent to Google or anybody else.
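To make this concrete, here is a minimal sketch of the kind of offline key generation described above, written in Python with the third-party cryptography library. It illustrates the general approach rather than SurveyCTO’s actual implementation (SurveyCTO generates keys in SurveyCTO Sync or in-browser via JavaScript), and the file names are hypothetical.

```python
# Illustrative sketch only (not SurveyCTO's code): generate a 2,048-bit RSA
# keypair entirely offline, so the private key never touches any server.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# The public key is safe to upload; it can only be used to encrypt data.
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# The private key stays with the research team; in practice you would also
# protect it with a passphrase and keep a safe backup.
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

with open("project_public_key.pem", "wb") as f:   # hypothetical file names
    f.write(public_pem)
with open("project_PRIVATE_key_keep_offline.pem", "wb") as f:
    f.write(private_pem)
```

Because only the holder of the private key can decrypt data encrypted with the public key, server administrators, cloud providers, and would-be attackers never have what they would need to read the sensitive data.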

A hallmark of our approach has been to avoid producing features that would tempt users away from protecting their data in the best possible way.

For example, years ago we could very easily have added the kind of cloud-based map view that basically every competitor platform has offered; but that would have required (a) that users either not use their own encryption keys to secure their GPS data or share those keys with us, and (b) that GPS positions be shared in the cloud, with Google and potentially other providers.

In fact, there was tremendous pressure to provide these kinds of features, and we lost out on a fair amount of business because we didn’t offer them. The argument was always “well, you can at least offer these features for unencrypted forms, can’t you?”

And it was true: we could have. But then we would have been building in a powerful incentive for people to not use encryption and to share their data more widely than seems prudent.

It took us several years to build what we considered to be a safe solution for viewing GPS and other data online, in-browser, but in the end we were able to pull it off. And we think that our technical approach has struck the right balance between safety and convenience.

2. Data quality: data collection should be transparent and carefully monitored.

In the field, when people are collecting data, there are so many things that can go wrong: the data can be entirely made-up, the wrong person or facility might be interviewed or inspected, questions might be asked in the wrong way or not asked at all, answers might be recorded incorrectly, etc., etc.

The challenges are many, and because fieldwork is some unfortunate combination of boring and difficult nearly everywhere in the world, similar challenges arise in nearly every setting. And sadly, all natural forces seem to push in the direction of poor data quality. Without visibility into the data-collection process, it’s very difficult to assess the quality of field-collected data, let alone manage for quality.

So a big part of our job has been to constantly improve the visibility. This means not only collecting richer data and meta-data in the field, but also making the process of reviewing and learning from that data as easy as possible. And here, “easy” means fitting into the challenging realities of field project management.

So, for example, a great many data-collection projects are difficult to get off the ground: they are running behind schedule, instruments are still changing up through training and piloting, and generally there’s not time to think about back-end data systems or QC monitoring before data starts streaming in.

Very few teams seem to be able to pre-plan for QC processes, so our job has been to try to require less and less pre-planning. In terms of implementation:

  • In SurveyCTO v1.0, we added options for random audio auditing as well as text audits that saved detailed meta-data about the time spent on individual fields or questions. We also introduced support for auto-generated mail merge templates for Microsoft Word, along with features to auto-merge incoming data with those templates in SurveyCTO Sync. This rendered incoming data more readable for those reviewing the data, allowing for a human scrutiny/QC process akin to those traditionally used in paper-based data-collection workflows.
  • In SurveyCTO v1.17, we added options to publish data directly to .kml files in SurveyCTO Sync, so that incoming data could be easily reviewed by geographical location.
  • In SurveyCTO v1.30, we added the new concept of “speed limits” to flag cases where forms are being completed too quickly, and to automatically trigger audio audits based on speed-limit violations. We also added a suite of “automated quality checks” (otherwise known as “high-frequency checks” or “statistical checks”) to flag potential data-quality issues based on the full distribution of responses so far. So, for example, if one enumerator’s responses are different enough from others’ (statistically speaking), then that enumerator’s responses can be flagged for further review. (A sketch of the kind of logic these checks automate appears just after this list.)
  • In SurveyCTO v2.20, we added the Data Explorer as the safe, in-browser way to review incoming data – both in aggregate and at the individual level. Importantly, this new interface allows for seamless movement between aggregate views and individual submissions, formats and labels individual submissions for effective review, and includes information and media from automated quality checks, audio audits, text audits, speed limits, and more, all in one place.
  • In SurveyCTO v2.40, we added a new review and correction workflow, so that our users can not only catch data-quality problems, but also correct them. It allows for systematic quality-control processes to be more easily put in place, and for key corrections to be made before data is released to dashboards or analysis.
  • In SurveyCTO v2.50, we added powerful new sensor meta-data options, like the ability to capture the percentage of interview time that seemed to involve conversation or how much noise, ambient light, or movement there was during the interview. By combining this data with the ability to automatically flag outliers for review, field QC processes can be even more effective.
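As a rough illustration of what speed limits and automated quality checks do behind the scenes, here is a small Python sketch using pandas. The column names, threshold, and outlier rule are hypothetical stand-ins, not SurveyCTO’s export schema or its actual algorithms.

```python
# Hypothetical sketch of a "speed limit" and a simple high-frequency check.
import pandas as pd

# Assume one row per submitted form, with illustrative columns:
# enumerator_id, duration_min (interview length), crop_yield (a response of interest).
submissions = pd.read_csv("submissions.csv")

# Speed limit: flag interviews completed implausibly fast.
MIN_MINUTES = 20  # threshold chosen by the survey team
submissions["too_fast"] = submissions["duration_min"] < MIN_MINUTES

# High-frequency check: flag enumerators whose mean response drifts far from
# the overall mean (a crude rule of roughly two standard errors).
overall_mean = submissions["crop_yield"].mean()
by_enum = submissions.groupby("enumerator_id")["crop_yield"].agg(["mean", "count", "std"])
by_enum["std_err"] = by_enum["std"] / by_enum["count"] ** 0.5
by_enum["flagged"] = (by_enum["mean"] - overall_mean).abs() > 2 * by_enum["std_err"]

print(submissions.loc[submissions["too_fast"], ["enumerator_id", "duration_min"]])
print(by_enum[by_enum["flagged"]])
```

The point of the sketch is simply to show the kind of logic these features automate: in SurveyCTO, checks like these run as data arrives and surface flagged submissions for human review, so teams don’t have to build this plumbing themselves.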

We believe that the right kind of visibility into the data-collection process is needed for managers to effectively manage their own field teams, but we also believe that it’s vital to correct long-standing problems in the overall market for field data collection.

Most organizations outsource data collection to firms that specialize in building and managing field teams, and, for those organizations that do the outsourcing, there has been far too little visibility into the data-collection process itself. Without visibility into data quality, it has been impossible to contract on data quality or enforce any kind of minimum standard.

Contracted firms deliver “clean” datasets to their clients, and the clients have very little ability to distinguish when they are or are not receiving accurate, high-quality data. We have steadily sought to provide technology that offers greater visibility, allowing for better client-side monitoring and an expanding range of potential contractual terms.

We’re excited about the progress we’ve made to-date, and we’re even more excited about the potential for continued improvement. For example, we’re actively thinking about how we might use machine learning technologies to build upon traditional methods of quality control and provide powerful new monitoring options to our users.

Whether it’s fancy stuff like machine learning or just getting better at the human-centered design process for presenting the right kinds of information to the right people, we’re looking forward to continuing to innovate in this area of data quality. After all, the quality of the data we collect plays a key role in driving the quality of our decision-making. Low-quality data is not only of questionable value to us, it can even be harmful.

3. Data use: effective visualization and analysis requires other tools designed and built for those purposes.

So this is a tough one, because a lot of our users would like an all-in-one, end-to-end solution, and it’s never easy to say no. It’s also never comfortable to admit that you’re not the best at something.

But the fact is that our users use the data they collect with SurveyCTO in a dizzying number of ways, and while the mainstream tech world has left offline, illiterate populations effectively unserved, there are massively well-funded efforts to provide data management, visualization, and analysis products that can work for our users.

It’s not at all obvious how we, as a small social enterprise operating in this space, can effectively compete with platforms and tools like Google Sheets, Airtable, Salesforce, Tableau, Power BI, Stata, SPSS, and R. Moreover, it’s not at all clear that we should even try.

Now, yes, mainstream products can sometimes be expensive, and so we’re sympathetic to those who want all the features of Product X but would rather pay a low, SurveyCTO-style price. But more and more mainstream tech companies are offering compelling nonprofit discounts, and many have CSR initiatives that offer grants for funding nonprofit adoption of best-in-class technology. Fundamentally, these programs seem like a better way to get that kind of great technology into nonprofit hands than having a small organization like Dobility try to duplicate all of it at a lower price point.

It’s also true, though, that smaller teams with lower technical capacity might have trouble integrating multiple solutions; for them, an all-in-one might be all that is really technically feasible. Here also, we’re sympathetic – but we still don’t think that reproducing best-in-class reporting or analysis functionality would be the right approach. At the very least, the more time and effort we put into that functionality, the more our prices will have to rise… which would then put SurveyCTO out-of-reach for many of the smaller teams we had originally hoped to serve.

What we’d like to do is get better and better about how we integrate with other solutions. We’d like deploying SurveyCTO+ProductX to be as clear and simple as possible, so that less and less technical skill is required. While we don’t think that SurveyCTO will ever be a true “all in one” that operates independently, we’d like setting up an overall system and workflow with SurveyCTO to be as close to “all-in-one easy” as possible. That will require more technical work for our R&D team, but then also better materials (checklists, videos, etc.) to support those who are setting up these systems.

And finally, we know that we can’t avoid all data visualization, analysis, or management. After all, we recognize that our data quality mission (above) fundamentally requires that people be able to monitor and correct the data effectively. This actually requires quite a lot in terms of real-time visualization and analysis.

So, on the surface, you’ll see us continue to build in ever-more-sophisticated visualization and analysis, and you might think that it should be easy to expand to include a wider range of visualization and analysis options.

But the thing is this: while visualization and analysis for monitoring and data quality technically overlap with visualization and analysis for data use, we can do a far better job delivering on our data-quality mission if we stay focused on those monitoring needs rather than being dragged down the slippery slope toward the myriad ways our users might use the data they collect.

At the end of the day, it’s all about where we fit in the world. Our focus is on secure, high-quality data collection. That already includes quite a lot – more than enough to keep us busy. And it is also an area that other companies tend to neglect. So we’d prefer to focus our energies there, and integrate with other solutions for data use and management. That way, at the end of the day, our users get the best overall solution for the best possible price.

4. Reliability and support: our users deserve a product that works reliably and the help they need, when they need it.

Free open-source software can often be less reliable than professionally-supported software, but even professional software seems to have gotten less and less reliable over time. New versions are pushed onto users constantly, and they too-frequently introduce new problems and headaches. In terms of reliability, standards seem to be falling. And meanwhile, the quality of technical support seems to be falling even faster.

At Dobility, we understand how incredibly stressful and complex field data collection can be, and we understand that those engaged in that work need to be able to rely on the technology they use – and be able to get effective help when they need it.

So much of the work we’ve done, from the earliest days of SurveyCTO, concerns reliability and support. When we began building on Open Data Kit’s (ODK’s) strengths, we started by fixing bugs and improving performance for long, complex surveys; we added redundancies to protect against data loss even in the most challenging settings; and we architected our server hosting environment in a way that we knew would prove costly but also expected to be fundamentally safe, reliable, and scalable.

Even today, with thousands of teams using SurveyCTO, every subscription has its own back-end database, its own software version, and its own server memory space. And every paying user has access to free, expert support 24×7.

In the early days, it was only me, a part-time developer, and a part-time QC person. To answer user support queries in a timely manner, I would wake up before dawn, I would pull my car over mid-drive to peck out a quick response, I would respond from the tops of mountains when on holiday. As the team has slowly grown, that level of dedication to responsiveness has continued.

And in fact, I’ve had the pleasure of being slapped on the wrist by newer members of our support team when my own responses are not sufficiently quick or helpful; when I jump in on a support query and get something wrong, I get in trouble.

And because SurveyCTO has grown so much over the years and is relied upon in such a vast array of challenging settings, it really does require a team approach to continue offering timely, helpful support; most queries still, behind the scenes, involve multiple people collaborating, suggesting, correcting, or, later, critiquing.

Our QC and R&D teams are also pulled into support cases whenever it looks like there might be some kind of product problem. Often it can be difficult to distinguish between software problems and, for example, form-programming problems. So our developers are continuously pulled into cases. It’s a huge time-sink and it distracts attention from new feature development, but we view it as absolutely critical to maintaining a high level of reliability and support. It’s a price we’re willing to pay.

Also, when there is a software problem, we generally drop everything and move heaven and earth as needed to get it fixed. As the CEO, I remain actively involved in helping to prioritize and roll out essentially every fix; and I try to reach out and personally apologize to anybody who’s been inconvenienced by some problem that we allowed to make it through our QC processes.

It’s a major focus of mine, and it’s distracted attention from, for example, building up the sales and marketing side of our business. But if you want to provide a reliable, well-supported product, then there is a price you have to pay as an organization. To-date, we’ve been willing to pay that price, and I very much hope for that to remain true even as we continue to grow.

5. Empowerment: we should empower those who need to collect data to use our technology directly, with as few intermediaries as possible.

Back when digital data collection technologies required technical experts to “program” digital forms, there was a lot of friction and expense that stood between the researchers and M&E professionals who knew what they wanted and the technology that could meet their needs.

Those who needed digital forms would describe or document their needs, experts would wield the technology on their behalf, and ultimately that technology would be deployed in the field. If problems or potential improvements were discovered in the field, often it was too hard to fix those problems or implement those improvements: there was just too much distance, friction, and expense between those using the technology and those who served as the technical gatekeepers.

So one core goal of SurveyCTO was to make the technology more directly accessible to those who actually need it – those who will actually use it – so that they can be more empowered to design, manage, and revise digital forms as they see fit. We wanted to eliminate layers of intermediaries so that the researchers and M&E teams would be able to wield the technology themselves. In terms of implementation:

  • In SurveyCTO v1.0, we introduced a hosted version of ODK that (a) could be automatically launched as-needed by anybody filling out a simple sign-up form, and (b) had an entirely new user interface that simplified common tasks. Everything from learning about the platform to creating new forms to using cold-room computers to creating encryption keys to creating field-validation expressions to using Microsoft Word’s mail-merge features for reviewing data became easier.
  • In SurveyCTO v1.30, we introduced “automated quality checks” so that those without a high level of statistical training could still benefit from statistical checks that are important for monitoring data quality.
  • In SurveyCTO v2.0, we introduced an entirely new user interface, in part because we’d added so much to the original product that the interface was becoming too complex; we needed a new design in order to keep adding new features and flexibility without making the product too hard to use. We also added a web interface for previewing forms, which was an important step in making it easier to develop and test new forms.
  • In SurveyCTO v2.10, we added the online, drag-and-drop form designer, in order to make SurveyCTO accessible to a broader range of new users. Even for those users who will ultimately prefer doing a lot of their work in Excel or Google Sheets, we wanted to provide an easier, more-structured way to get started in form design.
  • In SurveyCTO v2.20, we added the Data Explorer for being able to monitor and explore incoming data in a flexible and powerful way, without the need for expertise in outside visualization or analysis tools.
  • In SurveyCTO v2.30, we added a new “enterprise” feature-set to make managing multiple projects and teams easier.
  • In SurveyCTO v2.40, we added the new review and correction workflow, in order to further simplify the process of not only detecting data-quality issues – but also correcting them.
  • In SurveyCTO v2.51, we expanded the form designer to include a powerful new test view, which combined our original web preview functionality with powerful “inspection” tools that help users catch problems before they reach the field.

We’re proud of the work we’ve done in empowering users, in taking ODK’s core capabilities and extending them to be accessible by a broader and broader range of users. But, of course, our job here is never done: there is so much more we can do.

So, in the coming months and years, we’re excited about bringing in additional UI/UX design talent and further improving our product’s interface and accessibility. We’re going to keep making it more powerful and flexible all the time, and, ideally, we’ll keep making it easier to use at the same time.

Please feel free to comment below – even if it’s to push back!

By Christopher Robert and first published as Guiding principles for SurveyCTO

Filed Under: Thought Leadership

Written by Guest Writer: This Guest Post is an ICTworks community knowledge-sharing effort. We actively solicit original content and search for and re-publish quality ICT-related posts we find online. Please suggest a post (even your own) to add to our collective insight.

6 Comments to “Our 5 Guiding Principles for SurveyCTO. What Are Yours? – Your Weekend Long Reads”

  1. Would love to hear about others’ guiding principles. Please do comment!

  2. Neil Penman says:

    Excellent article Chris. It is impressive how you have applied these principles to build such a coherent high quality tool. And I remember receiving, many years ago, your prompt support when I asked a question about a SurveyCTO enhancement that had been applied to ODKCollect.

    The first principle Smap uses is “1. make it easy to collect and use data”. So we do offer that cloud-based map. As you point out, this can conflict with your principle number 1 on security.

    However, a second principle is “2. never to lose customer data”. If the customer data has been encrypted with their key and that key is then lost, then presumably the data would also be lost? Do you find this to be an issue?

    Another potential issue we see with end-to-end encryption is that it pushes the clear-text view of data to the end points, which may be the least secure part of the entire system.

    Those quality assurance features SurveyCTO has sound fantastic. One problem we face from time to time is when the customer’s expectation of what is on the system differs from what is actually there. A classic case was when a baseline survey had to be suspended for 6 months and no one looked at the data until almost a year after thousands of interviews had been completed. Advanced QA tooling doesn’t really help if an enumerator did not complete all of their allotted interviews.

    To address this, we are trying to encourage data to be used almost immediately: real-time monitoring with real-time decision support. We think that is a better way to use digital tools than just treating them as a more efficient alternative to paper.

    • Thanks, Neil! Focusing on ease of use and avoiding data loss makes a ton of sense, as does the push for more-immediate use of collected data. I’ve also witnessed the many ways in which those long lags between collection and use prove costly.

      The point you raise about potentially losing data because you lose the keys is a really good one. It’s frankly a terrifying risk, and I think it’s one that holds many users back from enabling form-level encryption. I only know of one case, however, where somebody did lose a key, and where it looked like a lot of data was lost – but, luckily, after a few days of extreme stress, the key was found, having simply been named incorrectly and saved into the wrong directory. One thing we’re looking into is integration with safe and secure key-management solutions, so that we can help users manage their private keys with less fear that they’ll lose them.

      In terms of the end points being the least secure, I definitely don’t disagree that end points can be major points of vulnerability. But, speaking from personal experience, I have never lost control of my credit card info, but Target has; and I have never lost control of my passwords, but LinkedIn and several others have. Cloud systems (and passwords in general) can be really vulnerable, despite the best efforts of those running the services. So I still think it’s prudent to doubly protect sensitive data from potential exposure between the end points.

      Thanks again for your reply. One great thing about a competitive ecosystem of technology providers is that we can all choose to focus in different ways, and then users can choose the provider that best meets their particular needs and preferences. Definitely our particular mix of principles has had upsides – but plenty of downsides as well!

  3. Michael Downey says:

    In the past several years, companies that leverage free & open source software to drive their core business have developed what’s now a standard guiding principle: giving back to the open-source communities that make their success possible. These have taken many forms, from code contributions to financial donations to supporting events and contributors on those platforms.

    Would be interesting to hear from the author about how SurveyCTO is contributing back to the OpenDataKit community to help maintain the public good that helps to enable their success.

  4. Great question!

    I personally have a view of open source as not only offering to reduce the up-front development costs of a platform like SurveyCTO (to the benefit of users, when the lower cost is passed on to them), but also offering a mechanism for constructive collaboration that can continue lowering the costs of ongoing innovation and maintenance (also to the benefit of users). So it is an ongoing disappointment to me that, today, we neither contribute to nor benefit from the ODK community in the ways we did during SurveyCTO’s early years.

    If you’re curious, two main factors seemed to undermine our ability to collaborate effectively with the broader community. The first was that we were simply pulled in different directions: concern for our growing user base pushed us to emphasize factors like data encryption, performance for long, complex forms, and consistency across different platforms (primarily Android and web), whereas the bulk of the community seemed drawn to other concerns; that made it really hard to agree on priorities or designs for new features. And then, when the ODK project transitioned out of the UW CS department a few years back, there was a fork in the road, and it seemed that the community was inclined to look at the ecosystem of user-fee-supported tools like SurveyCTO as a problem to solve rather than a strength to leverage; during a three-day workshop in Seattle, I failed to make the case for a thriving ecosystem that nurtured and built upon a shared foundation, and reluctantly concluded that we needed to basically go our separate ways.

    So I obviously haven’t made open-source contribution a key guiding principle here at Dobility. If I had, I honestly think the company might have gone under by now – rather than continuing to grow month-by-month, year-by-year. Our fanatic user-orientation has definitely helped us, as has our ability to move quickly and decisively. But it’s a disappointment to me as well, and I’m always looking for new ways to collaborate, particularly as we grow. I was just on a call today, in fact, regarding a potential open-source collaboration. Perhaps some day it will become a guiding principle..!

  5. Neil Penman says:

    Is ICT4D getting the benefits from open source that it should?

    This concept of a “thriving ecosystem that nurtured and built upon a shared foundation” put me in mind of the bazaar in Eric Raymond’s book “The cathedral and the bazaar”. In ICT4D we seem to be creating a few open source cathedrals built with top down control from large donors.

    OK, within each good open-source project it can work as it should, and you do get bottom-up development that creates high-quality tools. But at the whole-of-industry level, where the benefits of innovation could be an order of magnitude greater, it does not seem to be working well.

    Maybe open source without the “thriving ecosystem” has become part of the problem. That is, when you have a large donor saying: here is a tool, it’s open source so there’s no lock-in, and everyone has to use it so we can get economies of scale!