ASWB Exam Policy Changes: A Timeline of Sneak-Revisions to the Exam Guidebook

Timeline of an ASWB Exam Guidebook cover-up.

Summary

Test-takers preparing for Association of Social Work Boards (ASWB) examinations in 2025 have been presented with conflicting information about how much time they are allotted for each section of the examination, how many breaks are provided, whether they can review all or only half of their answers at the end of the exam, and other factors.

Contrary to best practices and testing standards, ASWB modified its Exam Guidebook without adequately notifying stakeholders–test-takers, boards, and the public. By using the Internet Archive’s Wayback Machine to view earlier versions of ASWB’s Exam Guidebook and website, one can reconstruct a clear timeline of the exam policy changes. That timeline demonstrates that no examinee in April or May was adequately notified or provided appropriate documentation to prepare for the exam.

On or about April 14, 2025, ASWB updated the language in the Exam Guidebook to match the policy it announced in an April 10, 2025 blog post and, according to social media reports from test-takers, implemented on or around March 30, 2025.

No test-taker has been given enough time to prepare for the new exam format. Furthermore, ASWB has not provided social work boards with any psychometric evidence that the new two-section format of the examination is psychometrically equivalent to one in which you could return to any item at any time. Many boards appeared as blindsided by the policy changes as test-takers were. 

Fundamentally, this is not how examination programs are supposed to operate. It is obviously unethical (and psychometrically invalid, unfair, and unreliable) to change the rules of an examination without adequate notification. Social workers spend thousands of dollars and months of their time preparing for the examination.

At minimum, the ASWB examination program requires a two-month moratorium period during which psychometricians can ensure the exam formats are equivalent and test-takers can have adequate time to study using the new ruleset. Anything less continues with the cowboy calculus and regulatory recklessness that pervades social work licensure examinations. 

ASWB produced two Exam Guidebooks in 2025

The PSI guidebook was valid through March 16, 2025. ASWB exams then went on a two-week hiatus. The Pearson VUE guidebook is valid from March 30, 2025 onward.

Visible in this March 11, 2025 screenshot from the Internet Archive:

https://web.archive.org/web/20250311234238/https://www.aswb.org/exam/getting-ready-for-the-exam/aswb-examination-guidebook/ 

In this February 12, 2025 screenshot from the Internet Archive, there is one set of policies: 

https://web.archive.org/web/20250212232242/https://www.aswb.org/exam/getting-ready-for-the-exam/aswb-examination-guidebook/

Exam Policy Rule Set # 1
(aswb.org/…/2023/10/ASWB-Exam-Guidebook.pdf)

  • Valid from October 2024 through March 16, 2025
  • “You may skip questions and go back to them later, flag questions for review, highlight text, and go back and change answers”
  • “You may take breaks of up to 10 minutes during the four-hour exam at your discretion. Testing time does not stop for breaks.”

Archived January 21, 2025 by web.archive.org

https://web.archive.org/web/20250121142057/https://www.aswb.org/wp-content/uploads/2023/10/ASWB-Exam-Guidebook.pdf 

This version of the handbook accurately described the exam as occurring in a single, uninterrupted period. Examinees could take informal breaks. However, the clock would continue to run, and the full length of the exam would remain accessible for review. 

This file was created on November 21, 2024. The same file, accessed from the ASWB website in 2025, looks very different!

Sneak-Revision: ASWB revised & reuploaded (2023/10/ASWB-Exam-Guidebook.pdf) in April 2025

This is how the Exam Guidebook for this period currently appears on ASWB’s website.

Source: https://www.aswb.org/wp-content/uploads/2023/10/ASWB-Exam-Guidebook.pdf 

Using the PDF metadata, it is clear that ASWB included language about exam halves and breaks after the policies went into effect. 
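For readers who want to check this themselves: a PDF’s creation and modification dates are stored as plain text inside the file. The sketch below is an illustration only (real-world files may store dates in other encodings, and tools like `pdfinfo` or `pypdf` do this more robustly); it pulls the dates out with a regular expression:

```python
import re
from datetime import datetime

# PDF date entries look like: /CreationDate (D:20250414093000-04'00')
# This pattern grabs the key name and the first eight digits (YYYYMMDD).
DATE_RE = re.compile(rb"/(CreationDate|ModDate)\s*\(D:(\d{8})")

def pdf_dates(raw: bytes) -> dict:
    """Return {'CreationDate': 'YYYY-MM-DD', ...} for each date found in raw PDF bytes."""
    return {
        key.decode(): datetime.strptime(digits.decode(), "%Y%m%d").date().isoformat()
        for key, digits in DATE_RE.findall(raw)
    }

# On a real file: pdf_dates(open("ASWB-Exam-Guidebook.pdf", "rb").read())
sample = b"<< /CreationDate (D:20250414093000-04'00') /ModDate (D:20250414101500-04'00') >>"
print(pdf_dates(sample))  # {'CreationDate': '2025-04-14', 'ModDate': '2025-04-14'}
```

Comparing these dates across the archived and currently hosted copies of the same filename is what reveals the re-uploads described in this timeline.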

I believe ASWB reused the same file names when retroactively documenting these changes in order to obscure their potential impact on test-takers, boards, and other stakeholders.

The Internet Archive did not crawl the ASWB Exam Guidebook between January 21, 2025 and May 16, 2025. I am therefore unable to verify when the version of the Exam Guidebook created on April 14, 2025 was shared with the public before I downloaded it on May 16, 2025.
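Anyone can replicate the snapshot search. The Wayback Machine exposes a CDX API; the sketch below (standard CDX parameters; fetching the resulting URL returns one line per capture) builds a query listing every 2025 capture of the guidebook, including a content hash that changes whenever the file silently changes:

```python
from urllib.parse import urlencode

# Internet Archive CDX API endpoint and the guidebook URL from this article.
CDX = "https://web.archive.org/cdx/search/cdx"
target = "https://www.aswb.org/wp-content/uploads/2023/10/ASWB-Exam-Guidebook.pdf"

# fl=timestamp,digest asks for the capture time plus a hash of the content.
query = urlencode({
    "url": target,
    "output": "json",
    "fl": "timestamp,digest",
    "from": "20250101",
    "to": "20250516",
})
print(f"{CDX}?{query}")  # paste this URL into a browser or fetch it with urllib
```

Because the `digest` column hashes the captured content, two captures of the same filename with different digests are direct evidence of a re-upload, even when no revision date appears in the document.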

“I could have sworn the ASWB Exam Guidebook used to say something else!”
– April & May test-takers

Correct. 

On April 14, 2025, ASWB retroactively changed their deprecated test documentation (valid October 2024-March 16, 2025) to align with the policy changes announced on April 10, 2025. 

Watch Dana Krobin, LMSW’s testimony at the April 30th Meeting of the Social Work Licensure Workgroup in Maryland. 

https://youtu.be/bwDh6TPdmEs?si=sjQ-AopX6uk4rtC-&t=7046

Exam Policy Rule Set # 2
(aswb.org/…/2025/02/2025-ASWB-Examination-Guidebook-Pearson-VUE.pdf)

  • Presented to test-takers at Pearson VUE between February 28, 2025 and April 14, 2025.
  • “You may skip questions and go back to them later, flag questions for review, highlight text, and go back and change answers”
  • “There are two types of breaks that you may take during your exam, a scheduled 10-minute break and unscheduled breaks.”
  • “You will be given the entire exam time at the beginning of the test.”
  • “There are no individually timed sections, so manage your time accordingly.”

Here is how ASWB’s Exam Guidebook appears in the April 6, 2025 snapshot in the Internet Archive: 

https://web.archive.org/web/20250406162401/https://www.aswb.org/wp-content/uploads/2025/02/2025-ASWB-Examination-Guidebook-Pearson-VUE.pdf 

Sneak-Revision: ASWB revised & reuploaded (2025/02/2025-ASWB-Examination-Guidebook-Pearson-VUE.pdf) in April 2025

The Exam Guidebook PDF crawled by the Internet Archive on April 6, 2025 was created on February 26, 2025 and last edited on April 2, 2025. 

The same file, downloaded from ASWB’s website on May 15, 2025, was created on April 14, 2025. 

This is the same day that ASWB’s deprecated documentation was changed to align with the April 10, 2025 announcement of exam administration policy changes. 

April 14, 2025 was a busy day for ASWB. They created and modified two new exam guidebooks, which are now available on their website. No revision date is visible on either document.

Exam Policy Rule Set # 3
(aswb.org/…/2025/02/2025-ASWB-Examination-Guidebook-Pearson-VUE.pdf)

  • Presented to test-takers for Pearson VUE between April 14, 2025 and today
  • “The exam is divided into two 85-question sections, each of which has a two-hour time limit.”
  • “After you have completed the first section, you will not be able to return to it.”

Why it matters

Examinees in any high-stakes exam are entitled to current and accurate information about the examination process. Furthermore, when substantive changes are made to that process, the exam developer is required to first establish measurement equivalency between the old and new processes. For April & May 2025 examinees, ASWB met neither of those obligations. These obligations are set by the AERA Standards for Educational and Psychological Testing. Test developers must abide by those standards as a foundational element of the legal defensibility of a high-stakes exam.

While the addition of a scheduled break is likely a welcome change for many examinees, it is unclear how it may impact exam results. To the degree that examinees are refreshed and able to refocus after a break, performance may improve. At the same time, for examinees who struggle with time management or test anxiety, having a clock run down toward zero twice instead of once may negatively impact performance.

In addition, the forced-break process is likely to shorten the total exam time available to examinees. Any time remaining at the completion of the first section is not carried over to the second. Time remaining in either half that the examinee might otherwise have used to review (and, if necessary, change) responses in the other half of the exam cannot be used for that purpose.
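The arithmetic is straightforward. A minimal sketch (using the guidebook’s figure of two sections at 120 minutes each; the function name is mine) shows how any unused section-one time simply evaporates:

```python
# Each of the two sections has its own 120-minute limit (per the guidebook).
SECTION_LIMIT_MIN = 120

def usable_time(section1_minutes_used: int) -> int:
    """Total minutes actually available under the split format.

    Unused section-one time is forfeited, so the most an examinee can
    use overall is whatever they spent in section one plus the full
    section-two allotment."""
    return min(section1_minutes_used, SECTION_LIMIT_MIN) + SECTION_LIMIT_MIN

# Finish section one in 90 minutes and 30 minutes vanish: 210 total,
# versus the 240 minutes the old single four-hour block provided.
print(usable_time(90))   # 210
print(usable_time(120))  # 240 (only by using every second of section one)
```

Only an examinee who runs section one down to zero gets the full four hours; everyone else gets less time than under the old format.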

Because ASWB apparently failed to meaningfully investigate the impacts of this structural change prior to implementing it, the effects on performance remain unknown. If boards had modified exam timing without ASWB’s approval, they would need to demonstrate their policies did not substantively impact test scores; yet, boards allow ASWB to make these changes without demonstrating measurement equivalence.

Ultimately, examinees cannot appropriately prepare for an exam whose structure is unclear or meaningfully different from what they were told it would be. Examinees who believed they could return to all questions at the end of the exam, and who followed ASWB’s explicit guidance to prepare accordingly, may have had their results negatively impacted by the novel time and break policies.

ASWB and its member boards have a duty to examinees to address the harms done by these shifting descriptions of the exam process, and to ensure a more professional process moving forward, in accordance with the AERA Standards.

Standard 4.4

If test developers prepare different versions of a test with some change to the test specifications, they should document the content and psychometric specifications of each version. The documentation should describe the impact of differences among versions on the validity of score interpretations for intended uses and on the precision and comparability of scores.

Comment: Test developers may have a number of reasons for creating different versions of a test, such as allowing different amounts of time for test administration by reducing or increasing the number of items on the original test, or allowing administration to different populations by translating test questions into different languages. 

Test developers should document the extent to which the specifications differ from those of the original test, provide a rationale for the different versions, and describe the implications of such differences for interpreting the scores derived from the different versions. 

Test developers and users should monitor and document any psychometric differences among versions of the test based on evidence collected during development and implementation. Evidence of differences may involve judgments when the number of examinees receiving a particular version is small (e.g., a braille version). Note that these requirements are in addition to the normal requirements for demonstrating the equivalency of scores from different forms of the same test.

Standard 6.5

Test takers should be provided appropriate instructions, practice, and other support necessary to reduce construct-irrelevant variance.

Comment: Instructions to test takers should clearly indicate how to make responses, except when doing so would obstruct measurement of the intended construct (e.g., when an individual’s spontaneous approach to the test-taking situation is being assessed). Instructions should also be given in the use of any equipment or software likely to be unfamiliar to test takers, unless accommodating to unfamiliar tools is part of what is being assessed. The functions or interfaces of computer-administered tests may be unfamiliar to some test takers, who may need to be shown how to log on, navigate, or access tools. Practice opportunities should be given when equipment is involved, unless use of the equipment is being assessed. Some test takers may need practice responding with particular means required by the test, such as filling in a multiple-choice “bubble” or interacting with a multimedia simulation…In addition, test takers should be clearly informed on how their rate of work may affect scores, and how certain responses, such as not responding, guessing, or responding incorrectly, will be treated in scoring, unless such directions would undermine the construct being assessed.

Standard 7.8

Test documentation should include detailed instructions on how a test is to be administered and scored.

Comment: Regardless of whether a test is to be administered in paper-and-pencil format, computer format, or orally, or whether the test is performance based, instructions for administration should be included in the test documentation. As appropriate, these instructions should include all factors related to test administration, including qualifications, competencies, and training of test administrators; equipment needed; protocols for test administrators; timing instructions; and procedures for implementation of test accommodations. When available, test documentation should also include estimates of the time required to administer the test to clinical, disabled, or other special populations for whom the test is intended to be used, based on data obtained from these groups during the norming of the test.

Two New #StopASWB papers

With gratitude to my collaborator, Dr. Mary Nienow, these manuscripts are under peer review at Advances in Social Work‘s special issue on social work exam equity. They are the culmination of a years-long open data project on social work licensing exams.

DeCarlo, M. P., & Nienow, M. (2024, October 15). Uniquely Biased: How ASWB Exams Violate Psychometric Best Practices. https://doi.org/10.31219/osf.io/2bxdt

After publication of the 2022 Pass Rate Analysis demonstrating that minoritized social workers pass at less than half the rate of White social workers, the Association of Social Work Boards (ASWB) Examination Guidebook (ASWB, 2023a) revised its psychometric reporting of exam fairness from “statistically free of race and gender bias” to “differences in exam performance for…different demographic groups…is influenced by many factors external to the exam,” upstream of the examination in the workforce pipeline (p. 9). Focusing only on factors external to the exam ignores the possibility that the internal properties of the exam may be invalid, unreliable, and unfair. Race, class, culture, and other structural factors do not impact ASWB exams in the same way over time: ASWB’s 2022 Exam Pass Rate Analysis reported 10-13% reductions in bachelors and masters examination pass rates after the introduction of the 2018 exam blueprint. Using extensive references to ASWB’s public statements, this article demonstrates how ASWB ignored evidence of examination flaws and presents external factors as the only possible explanation for disparities in pass rates. Beginning with the policy paradox created by national organizational disagreement on the cause of, and next steps for, exam score inequities, this article reviews the extant empirical evidence demonstrating how bias encoded in the language and theories underlying the examination converges with psychometric shortcomings in ASWB’s exam validation process to create a uniquely biased exam.

DeCarlo, M. P., & Nienow, M. (2024, October 15). In Pursuit of the Status Quo: ASWB’s Research, Grantmaking, and Regulatory Practices. https://doi.org/10.31219/osf.io/vbh7q

The Association of Social Work Boards (ASWB) Examination Program is used in all 50 states to regulate the practice of clinical social work and in the majority of states to regulate the practice of masters and bachelors-level social work. After releasing descriptive data demonstrating biases in exam pass rates by race, age, and dominant language, ASWB engaged in research and regulatory actions that violate social work ethics and psychometric best practices. This article will critique the research, grantmaking, and regulatory practices that support the ASWB Examination Program using extensive citations to psychometric standards, ASWB’s publications and exam documentation, and the researchers’ experiences engaging with ASWB to study the cause of exam score disparities. The analysis will reveal how, after their 2022 release of data demonstrating exam bias with respect to race, age, and language, ASWB funded researchers already affiliated with ASWB to support what it already tells test-takers in its exam guidebook–only structural factors bias exam scores, not psychometric flaws internal to the examination. Moreover, ASWB implemented solutions to exam bias without proper investigation and psychometric support. Because of ASWB’s position as the sole publisher and purchaser of licensing examinations, individual state boards are unable to make incremental changes to prevent biased ASWB examinations from closing the profession of social work to groups for whom the exam is invalid, unreliable, and unfair.

ASWB: Grandparenting In Social Work Licensure Harms the Public

To be fair to Dr. Kim, who produced the well-researched keynote, that is not how she frames the issue! I think that her interpretation of the scientific evidence is less plausible than the one I outline below. I will walk you through why Dr. Kim’s analysis proves ASWB’s main talking point–that exams protect the public–is based on fairy dust.

Dr. Kim was the recipient of ASWB’s previous round of grant funding for regulatory research. Her research produced the most comprehensive scientific survey of social work licensure to date, and the scholarship she produced was objective and fair. It must be emphasized, though, that her interpretation of those facts supports ASWB’s existing talking points and policy positions.

Dr. Kim’s literature review was extensive, spanning over 400 articles related to social work and occupational licensure (ASWB, 2022). From her early literature review, ASWB acknowledged a “bleak” reality:

“While some conceptual discussions on social work regulations were documented in the literature, there is very little documentation and research on specific regulatory practices in social work.”

(ASWB, 2022, para. 2)

In other words, the reality was much bleaker than ASWB expected.

*alarm bells ringing*

Not great when the regulatory body describes the evidence for its regulations as…bleak. Maybe maintaining this open secret isn’t great for the profession, and we should instead fight to establish evidence-based regulations. Or maybe at least ensure that the current regulations don’t further stratify the profession by class, race, age, disability, and culture?

As part of their final products delivered to ASWB under the grant, Dr. Kim produced a comprehensive literature summary and recommendations for future research. I encourage you to watch her keynote speech (via Vimeo) or read the text of her keynote in its entirety. You should judge for yourself how well the evidence supports arguments of public protection. Don’t take as gospel my argument that regulations instead protect the social and economic capital of established practitioners and bosses.

So, how does ASWB interpret the “bleak” lack of research supporting its regulatory practices? If you guessed clinging onto any supporting citation like it’s the last handle on a life raft, you’d be right!

ASWB and Dr. Kim conclude that examinations protect the public based on a dissertation (Kinderknecht, 1995) which analyzed 252 social work ethics complaints in Kansas from 1980-1995. That’s it!

That’s…a lot of explanatory power attributed to one study. Let’s dive into Kim’s summary!

The author examined ethical complaints filed against 252 social workers in the State of Kansas Behavioral Sciences Regulatory Board since the inception of 1980 with 15 years of data to identify social workers’ characteristics related to substantiated complaints. This study is critical because the sample included 55 grandfathered social workers who obtained licensure without taking a national standardized licensure exam. The analyses found that licensees who had taken the licensure exam were twice as likely to have unsubstantiated complaints than those who became licensed without the exam. The author concluded that whether or not licensed social workers were examined for licensure is related to the complaint outcomes. Kinderknecht’s finding may provide the only available evidence supporting the use of a standardized licensure exam as part of licensure requirements. It is the only empirical study beyond simply describing the type and prevalence of complaints and violations, examining factors related to the substantiation of ethical violations. (emphasis added)

Kim, 2023, p. 11-12

According to ASWB and Dr. Kim, we know licensing exams protect the public based solely on this dissertation. Indeed, ASWB paid a lot of money for a very good researcher to find any scientific evidence supporting the claim that licensing exams protect the public–the core argument ASWB makes–and they found this one thing in a bleak and almost empty evidence base. I had previously looked into the evidence supporting licensure and regulation of social work practice and didn’t find much. I am heartened to hear that my literature search skills are not as bad as I thought! Here, a grant-funded, accomplished, and tenured researcher could find but one dissertation.

As you can tell from the highlights I made, this study more accurately shows that grandparenting unqualified or unexamined social workers into new regulations endangers clients. Yet, ASWB’s advocacy agenda explicitly supports grandparenting (a.k.a. grandfathering), exceptions to new regulations for existing providers. ASWB publicly supported grandparenting unexamined social workers into the licensure compact, but forces all new social workers (even if their states don’t require it!) to sit for an ASWB examination to receive an interstate license. ASWB’s Model Practice Act includes grandparenting. Yet, according to the one study ASWB can find that has ever examined the relationship between grandparenting and public protection, this is a manifestly unethical decision.

Well, either that…or, by mystical convenience, the explanatory power of this study applies to a totally separate population: only those entering the profession. Why are those entering the profession the only ones who lose in regulatory changes? Due to interest convergence, ASWB is effectively a lobbying group for those who have already achieved full licensure. It is in the narrow self-interest of protected practitioners to continue practicing without examination, and for the barriers to licensure to be as high and inequitable as possible to protect their income. While it is not their primary motivation, it is difficult to interpret innocuously arguments that unexamined social workers will diminish professional standing when those arguments do not account for grandparented social workers who never took the exam. It is new entrants who will shape the profession’s reputation, which includes whatever reputational loss comes from not holding grandparented practitioners to the ever-increasing education, supervision, and examination requirements imposed on new practitioners.

Social work has never been shy about the self-interest at the heart of occupational licensure (and title protection). The purpose is to–in the absence of unions (sigh)–raise the income of practitioners. Indeed, Dr. Kim’s (2022) final study for ASWB demonstrated that licensed social workers earn a higher income premium from licensure than unlicensed practitioners. This is why you will see ASWB allying with the Clinical Social Work Association–a professional society of LCSWs–to protect the status quo in which the LBSW, LMSW, and LCSW exams make it ever more difficult for people to enter the profession of social work.

Yet, when that same study finds that there is a $110 earning premium for all social workers because of licensure, I find that hard to reconcile with these facts:

  1. In 2012, the Bureau of Labor Statistics found the median annual salary for all social workers to be $47,370.
  2. In 2015, the National Association of Social Workers’ workforce study found the median annual salary for a graduate-trained social worker to be $48,000.
  3. In 2021, the Bureau of Labor Statistics found the median annual wage for social workers to be $50,390.

I am not seeing the earnings premium from licensure here.

Finally, I just want to close out on some other very important flaws in using Kinderknecht’s dissertation to support the nearly 500,000 social work examinations given in the past 11 years. First, it should be noted that Dr. Kinderknecht framed this as a “pilot study” and explicitly envisioned future multi-state studies on ethical complaints that could provide greater explanatory power. Second, the education that social workers received in 1980 is far different from what they receive in 2023. Third, the diversity of the sample is not outlined in Dr. Kinderknecht’s dissertation, making it difficult to generalize to all social workers. Most importantly, the dissertation uses univariate and bivariate analyses, which do not control for any confounding factors (third variables) that might explain the connection between grandfathered social workers and ethics complaints. For example, these connections could be due to differences in how social workers were educated, not examined. Although Kinderknecht measured potentially confounding variables, she did not conduct any of those analyses. This is not a criticism of the rigor in Dr. Kinderknecht’s study or her communication. She is forthcoming about the limitations of her study and what a more comprehensive study might reveal about the relationship. It’s a great dissertation, and I wish we had more like it!

One thing that everyone agrees on (me, Dr. Kinderknecht, Dr. Kim, ASWB): We need a stronger evidence base for social work regulations to make informed policy decisions. I think we disagree about what the ethical path forward is from here. Our regulatory policy is fundamentally broken in a way that devastates aspiring practitioners’ ability to provide culturally responsive and evidence-based practice.

ASWB’s Unethical Research and Regulatory Practices

This was originally penned as a letter to the editor of the International Journal of Social Work Values & Ethics, Dr. Stephen Marson, upon his invitation to submit in the last volume.

Thank you for your offer to submit a letter to the editor about measurement bias and predictive bias in the exams produced by the Association of Social Work Boards (ASWB). As this is the social work values and ethics journal, I am going to confine my brief comments to the ethical problems in ASWB’s most recent Request for Proposals (RFP) entitled Regulatory Research Initiative to Advance Equity.

ASWB has publicly touted this RFP as the opportunity for social workers, regulators, and the concerned public to understand what issues are driving the vast disparities in exam scores which further privilege white, English-dominant, and younger social workers. Contrary to ASWB’s assertions, I will demonstrate ASWB’s abuse of the research and regulatory process to prevent researchers from investigating psychometric flaws in the examinations.

Instead of analyzing the exam itself, they will finance—using their $40 million in net assets, $30 million endowment of stocks and bonds, 29% profit margin, $1 million in executive compensation (ProPublica, n.d.), and $10 million examination defense fund (ASWB, 2019)—empirical research that supports what ASWB already tells test-takers in their Candidate Handbook.

ASWB works to ensure the fairness of each of its exam questions but acknowledges that there may be differences in exam performance outcomes for members of different demographic groups because exam performance is influenced by many factors external to the exams [emphasis added]. ASWB has committed to contributing to the conversation around diversity, equity, and inclusion by investing in a robust analysis of examination pass rate data.

(ASWB, 2023a, p. 12)

In the RFP, there is one funding area that supports researchers investigating biased examination scores. In a glaring lack of research ethics, ASWB suggests hypotheses that exculpate ASWB and match what they already tell test-takers. ASWB intends to use the research process to manufacture their foregone conclusions.

ASWB requires research on the variables associated with the licensing exam pass rate data to determine future steps and areas in need of continued research. Research proposals might address correlating external [emphasis added] variables that may influence the disparities in the licensing exam pass rate data. Such variables could include upstream [emphasis added] factors such as differences in education programs; considerations of intersectionality, including age, gender, race, health, socioeconomic status; and social determinants of health, including life experiences from early childhood to post-graduate.

(ASWB, 2023b, p. 4)

Although ASWB promises a “robust analysis of examination pass rate data,” only one of the twelve focus areas of the RFP will fund studies related to exam bias. Within that 1/12 of the RFP, ASWB explicitly suggests researchers’ hypotheses for them. The idea that early childhood experiences are more relevant than the psychometric functioning of the exam itself—i.e., the internal properties and multivariate functioning of the exam—is research and regulatory malpractice. Further, it is unethical for ASWB to attribute exam disparities to “many factors external to the exam” without evidence and seek to manifest that empirical reality through the grant process.

Crucially, only researchers who support ASWB’s hypotheses will have access to exam bias data. The most relevant data will only tell one story—factors external to the exam drive disparities—while leaving glaring internal issues within the exam unexplored and unfunded. This is a craven and unethical abuse of the research and grant funding process. While it is possible for a test maker like ASWB to overcome conflicts of interest and fund objective research into its own examinations, ASWB values their self-interest over the public interest.

To be clear, ASWB does not outright state that it will reject proposals examining psychometric functioning; however, its most recent misinformation-filled blog post about psychometrics makes the point clearly. The blog post incorrectly describes the purpose and procedure of differential test functioning analysis. It cites the psychometric standards that require independent tests of item and test functioning, but then states:

Although it is theoretically possible that DIF analyses may fail to identify some problematic items and small amounts of bias may accumulate to produce DTF, it is very unlikely that practically important DTF will result, because there is often high power to detect small magnitudes of DIF, and DIF does not typically favor one examinee group consistently

(ASWB, 2023c, para. 4)

Do ASWB exams fall into the typical case, in which DIF does not favor one group over another? Or are ASWB exams systematically biased in favor of some groups? ASWB ignores these questions and only looks for bias item-by-item (DIF), thus ignoring patterns that emerge across multiple questions. 

The descriptive data seem like strong evidence that would require multivariate analysis to completely understand. ASWB cites psychometricians in their blog post but refuses to apply the multivariate methodologies in those citations to assess the fairness of their examination. They simply state that exam-level bias is uncommon and commit to reporting descriptive data, never testing a single hypothesis about test functioning. 
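ASWB’s own quoted caveat concedes the mechanism. A toy calculation (invented numbers for illustration, not ASWB data) shows how per-item gaps far too small to trigger DIF flags can still accumulate into test-level DTF when they consistently favor one group:

```python
# Illustrative numbers only: a 170-item exam where every item is a tiny
# 1-percentage-point easier for group A than for group B at the same
# ability level. A gap this small typically falls well under common
# DIF-flagging thresholds, so item-by-item screening finds nothing.
N_ITEMS = 170
P_GROUP_A = 0.70       # per-item probability correct, group A
PER_ITEM_GAP = 0.01    # small but consistently one-directional

expected_a = N_ITEMS * P_GROUP_A
expected_b = N_ITEMS * (P_GROUP_A - PER_ITEM_GAP)

# Because the tiny gaps all point the same way, they sum at the test
# level instead of cancelling out.
gap = round(expected_a - expected_b, 2)
print(gap)  # 1.7 expected points on the total score
```

When the cut score sits near the middle of the score distribution, an expected-score gap of even a point or two can change pass/fail outcomes for examinees near the cut; that is precisely why the Standards require DTF to be investigated independently of DIF rather than assumed away.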

ASWB’s lies echo the misinformation in the public statements made by the editor in chief of the International Journal of Social Work Values & Ethics. He succinctly stated the central piece of ASWB gaslighting in his emails to the BPD-L Listserv. “I ask one and only one question: Each and every single item on all of ASWB’s tests demonstrates no sex or race bias, but the test as a whole does demonstrate race bias. I want an explanation of how that is possible.  That’s it.” (emphasis in original).

To answer this question, one would merely have to look at the Standards for Educational and Psychological Testing that ASWB and its former psychometrics consultant, Dr. Marson, say are used to validate ASWB examinations.

Differential test functioning (DTF) refers to differences in the functioning of tests (or sets of items) for different specially defined groups. When DTF occurs, individuals from different groups who have the same standing on the characteristic assessed by the test do not have the same expected test score. The term predictive bias may be used when evidence is found that differences exist in the patterns of associations between test scores and other variables for different groups, bringing with it concerns about bias in the inferences drawn from the use of test scores…(p.51)

When credible evidence indicates potential bias in measurement (i.e., lack of consistent construct meaning across groups, DIF, DTF) or bias in predictive relations, these potential sources of bias should be independently investigated because the presence or absence of one form of such bias may have no relationship with other forms of bias. For example, a predictor test may show no significant levels of DIF, yet show group differences in regression lines in predicting a criterion.

(American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014, p. 52)

To translate a bit from methodology-speak, that last sentence states that a test like ASWB’s exams may show no item-level bias (DIF) but show differences across groups in the overall exam score (DTF) that are not related to the criterion being measured (entry-level social work competence). This is the exact situation we find ourselves in!  

At the beginning of their blog post, ASWB (2023c) cites the Standards for Educational and Psychological Testing which states that these two sources of potential bias (DIF & DTF) should be investigated independently. Yet in the end, ASWB disagrees with the best practices of psychometricians and decides that DIF is good enough! This is what ASWB describes as meeting or exceeding psychometric standards.

If DIF vs. DTF feels too abstract, consider this score report from a test-taker who failed ASWB exams. Which content area displays the highest degree of differential functioning? We’ll never know because ASWB only assesses for bias item-by-item. Shouldn’t we know?

ASWB’s self-interested RFP would prevent researchers from investigating bias that emerges from specific topics, content areas, or subsets of the test. It would also prevent researchers from testing ASWB’s cut scores based on real-world performance of the examination. This fails to meet the ethical standard for exam validation, and the impacts of biased exams are manifestly clear in dire workforce shortages in social workers across the country.

While it is certainly possible for a regulator like ASWB to effectively manage conflicts of interest, it appears that ASWB is institutionally incapable of addressing these problems:

  • ASWB purchases examinations from itself. It does not allow for competitive bidding on examination development.
  • ASWB does not publish its exam validation methodology or results. Its member boards never ask for details on methods or results, which shields those details from disclosure under public records laws and prevents states and psychometricians from developing substantially equivalent exams, or Spanish-language exams, that could be used in place of ASWB exams under the new social work licensure compact.
  • ASWB conditions researchers’ access to exam bias data on testing ASWB’s preexisting hypotheses.
  • ASWB gaslights test-takers by stating as truth their untested hypotheses that only “external factors” cause biased exam scores.
  • ASWB restricts researchers from talking publicly about their projects (ASWB, 2023a) and requires researchers to sign confidentiality agreements that give ASWB final authority over what is published (ASWB, 2020, p. 25).
  • ASWB removes scored exam items due to biased functioning without notifying the test-takers who were denied licensure because their score fell one point below the (biased) cut score, or the boards that denied them (analysis in DeCarlo, 2023; original information in Owens, 2021).
  • ASWB has not published a procedure for its bias-detection methodology since 2010 (Marson et al., 2010), and those methods were outdated at the time they were published, using correlations instead of regressions to detect biased items (AERA, APA, & NCME, 2014, pp. 51–52).

ASWB will say whatever it wants and do whatever is necessary to maintain the oppressive status quo. I know this because I have seen it happen before. The last RFP they funded on regulatory research—the worthwhile studies by Dr. Joy Kim—produced results like this:

None of these factors—variations in state regulations, the field of practice, the type of employers, and social workers’ demographic vulnerability—helped to explain away the African American–White disparity and the odds of licensing for bachelor social workers. [emphasis in original] The odds of African American social workers holding any license were 43% lower than the odds of white social workers. For a required license, the odds of African Americans were 26% lower than those of Whites.

(Kim, 2022, p. 382)

No changes were made to baccalaureate licensure because of this finding. Even research by an ASWB research grant awardee, the keynote speaker at ASWB’s 2023 Education Meeting in New Orleans, LA, does not appear to convince ASWB that BSW licensure disparities stem from anything other than “external” or “pipeline” factors.

Despite funding results that say otherwise, ASWB still tells BSW examinees that external factors explain why fewer than 40% of Black social workers under 30, and fewer than 25% of Black social workers over 50, pass the LBSW exam (ASWB, 2022). The evidence does not matter. Ethics do not matter. ASWB profits matter.

My view is that the entire project of regulatory research at ASWB is a callous marketing ploy. Even if a researcher were permitted by ASWB to conduct a proper measurement equivalence study and ASWB allowed their results to be published unedited, researchers’ conclusions would not inform how ASWB thinks or acts about the functioning of its exams. Only results that support ASWB’s profits are actionable.

ASWB exams are extremely lucrative, producing $17.6 million in revenue during 2021 (ProPublica, n.d.). Despite holding a nonprofit status, ASWB has sustained industry-shattering, recession-proof profit margins for the past decade. While 2021’s profit margin of 29% is quite high, since 2011 its average profit margin was over 17% and its net assets increased over 447%. They are chokepoint capitalists—abusing their nonprofit status, enriching themselves off social workers, and further stratifying the profession by class, age, race, disability, language, and culture.

With their most recent RFP, ASWB has rigged the regulatory research game. They will provide access to exam bias data to researchers pursuing hypotheses that exculpate ASWB and ignore obvious internal, psychometric issues in the examinations. I hope social workers who seek to productively collaborate with ASWB as researchers, question writers, or volunteers keep their frothing, unrepentant self-interest at the top of their mind.

ASWB hides data from State Social Work Boards that could license thousands of excluded social workers

The debate over social work examinations created by the Association of Social Work Boards (ASWB) has been hampered by the lack of openly available methodology and data checked by neutral third parties. As a result, test-takers, researchers, employers, and social work boards are left with little information on which to gauge the quality of social work licensing exams.

When ASWB describes their methodology, they merely list DIF (Differential Item Functioning) and leave it at that. DIF is a choice to examine for bias at the item-level, but it does not provide any real information on what approach to DIF ASWB uses. Do they use a 2PL or 3PL model? How large is their sample size? Do they purify items with DIF when estimating a test-taker’s overall ability? There are many right answers to each of these questions, and providing any information on the process used by ASWB would help social workers and boards judge the strengths and limitations of their approach.
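ASWB discloses none of this, so one can only illustrate what a documented procedure might look like. Below is a minimal sketch of one standard DIF method, the Mantel-Haenszel procedure with the ETS delta classification. The item data, strata, and thresholds here are invented for illustration and are not ASWB’s; the classification rule is simplified (real ETS rules also require a significance test).

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF sketch for a single item.

    strata: list of 2x2 tables (ref_correct, ref_wrong, focal_correct,
    focal_wrong), one per ability stratum (e.g., total-score band).
    Returns the common odds ratio, the ETS delta statistic, and a
    simplified A/B/C severity category.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha_mh = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha_mh)  # ETS delta scale
    # Simplified ETS convention: |delta| < 1 -> "A" (negligible DIF)
    category = "A" if abs(delta) < 1 else ("C" if abs(delta) > 1.5 else "B")
    return alpha_mh, delta, category

# Hypothetical item: both groups answer at the same rate within each
# ability stratum, so the item shows no DIF (odds ratio ~ 1).
strata = [(30, 10, 30, 10), (20, 20, 20, 20), (35, 5, 35, 5)]
alpha, delta, cat = mantel_haenszel_dif(strata)
```

Publishing even this much (the stratification scheme, the sample size per stratum, the flagging threshold) would let boards and researchers judge the method’s strengths and limits.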

ASWB would tell you to watch their psychometrics webinar!

I was a part of that webinar call, along with my introductory research methods students. They did not appreciate the perfunctory review of validity and reliability (complete with target diagrams!); nor did they appreciate the study from 2004 in which the developers of the ACT college entrance exam actually provided data to researchers to examine differential test functioning (i.e., whether the test is biased).

ASWB also explained statistical significance does not actually mean large effect sizes in the real world…which really just reinforced everything I taught them about p-values vs. effect sizes in every statistical test, not just the ones used to validate exams…so thanks for the review?

Students did appreciate that their instructor asked unanswered questions in the chat to social work regulators like, “why can’t we repeat this exam bias study you cited with ASWB test data?”, “what is your sample size for DIF analysis,” and “what is the slope of the test information curve at the cut score, and why don’t you measure for that?” I guess we never had time for those questions.

I’ll save you the fifty minutes of video with the one screenshot that actually lists any information about the psychometric properties of the ASWB examinations and the procedures used to measure them. That’s it. The rest of the video is psychometrics 101 and a DTF study from 2004 that ASWB refuses to replicate with their own exam data.

Screengrab by https://twitter.com/alexandriaswkr

Hey, is Cronbach’s alpha (that squiggly α = .85–.91) a good estimate of exam reliability? Nope! Let’s look at a recent OER textbook chapter on Item Response Theory by Jerry Bean:

IRT approaches the concept of scale reliability differently than the traditional classical test theory approach using coefficient alpha or omega. The CTT approach assumes that reliability is based on a single value that applies to all scale scores.

ASWB does not assess the test information curve or conditional standard errors. Instead, they rely on outdated methods and assumptions.

Just ask the state boards! They judge whether the exam is good enough for their state!

You’d think state boards would get more information, but that’s not true either. #StopASWB organizers have watched every public meeting of ASWB’s Nationwide Gaslighting Tour, submitted FOIAs, and written a lot of emails. All we can get is general information like the chart below, assuring state boards that the examinations are perfectly fine.

Maryland Board of Social Work Open Meeting 1/13/2023

Once I watched enough of these presentations, I noticed a problem.

According to this diagram, differential item functioning analysis is performed on unscored items that do not impact pass/fail. Yet, as you will hear in this video (and in other public statements noted later), ASWB and Dr. Hardy-Chandler claim to continue monitoring for bias (differential item functioning) even after an item becomes scored.

ASWB never makes information about deleted, scored items public knowledge, let alone tells the social work boards who rejected qualified applicants based on exam items later removed by ASWB for biased functioning.

As you can hear in her testimony, ASWB tests for biased items even after they are part of the scored item pool.

Maryland Board of Social Work Open Meeting 1/13/2023 (Minute 28:00)

For those who would like to skip to the important part:

“Items only make it to the scored portion of the exam if they show good statistics, meaning they function consistently across self-reported groups and that’s monitored on an ongoing basis. And then even when an item makes it to the scored portion, the 150 of the exam, monitoring of its performance continues [emphasis added].”

Dr. Stacey Hardy-Chandler, Maryland Board of Social Work Open Meeting 1/13/2023 (Minute 28:00)

This accords with similar statements that ASWB has made on monitoring the performance of scored items. In volume 20 (2020) of Social Work Today, Lavina Harless of ASWB stated that

“Monitoring of item performance doesn’t end once an item moves out of pretest status. Scored items are continually monitored to ensure that performance doesn’t slip. If a scored item demonstrates a statistically significant drop in performance, it is taken out of use and returned to the examination committee for review. Should the committee decide to edit and keep the item, it returns to pretest status.”

In the September 2021 edition of the New Social Worker, Stacey Owens of ASWB stated that

“Scored items are continually monitored for DIF.  On an annual basis, less than 5% of all items released show DIF. Items flagged for DIF are removed from the bank of potential exam questions” (para. 3)

Yet, in their webinar, ASWB insists that removed items never impact scores. How is this possible?

ASWB is lying to the public. That is all I can say for sure. Prior to 2023, they publicly stated that scored items are removed; yet, in their most recent statements, ASWB denied removing scored items because of biased functioning.

It’s certainly not on their diagram of how the exam is validated and tested. Indeed, you can see where they chopped that entire process off of the diagram they shared with the board. Weird…

Think about the ethical implications of that hidden data and procedure. ASWB monitors live, scored exam items and removes them from the exam pool without informing test-takers who failed because of them (or the social work boards who denied their license). When they present to social work boards, they only allude to continuous monitoring without providing any information on items that have been removed due to biased functioning.

How often are scored items removed due to biased functioning?

The most recent statistics ASWB reported were in the September 2021 edition of the New Social Worker, in which Stacey Owens of ASWB stated

“Scored items are continually monitored for DIF.  On an annual basis, less than 5% of all items released show DIF. Items flagged for DIF are removed from the bank of potential exam questions” (para. 3)

So, that is about 5% of the total exam pool that gets eliminated every year due to differential item functioning. Ideally, most of those items would be pretest items that never impact test-taker scores.

However, it is likely that some of those removed items were scored items. Thus, biased items sit among those 150 scored questions and determine whether an examinee fails by 1 or 2 points. A test-taker’s 175-item exam, pulled at random from a bank in which 5% of all items show DIF, should itself contain about 5% DIF items. How many items is that?

That means approximately 8.75 exam items out of every 175 given to social workers, on every test they take right now as a condition for licensure, will be flagged for biased functioning. ASWB says “<5% of items,” so let’s round down to 8 items flagged for differential item functioning on every exam that social work boards require for licensure.

Let’s be generous and say that, on average, 6 of the 8 DIF-flagged items (75%) are in the pretesting section of the exam and had no impact on the test-taker’s score or licensure. Following this assumption would also mean that, on average, 6 of the 25 items approved provisionally by the examination committee for the average exam (nearly 25% of all pretest items on exams) displayed differential functioning. That’s not great, and we’ll put that in a little box as a little problem, and move on to the big one.
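The back-of-the-envelope arithmetic above can be written out explicitly. The 5% rate is from ASWB’s published statement; the 175/150/25 item split and the “generous” 75%-pretest assumption come from this post:

```python
TOTAL_ITEMS = 175    # items per exam form
SCORED_ITEMS = 150   # items that count toward pass/fail
PRETEST_ITEMS = 25   # unscored pilot items
DIF_RATE = 0.05      # "less than 5% of all items released show DIF" (Owens, 2021)

expected_flagged = DIF_RATE * TOTAL_ITEMS  # 8.75 items per form
flagged_rounded = int(expected_flagged)    # round down to 8, being generous

# Generous assumption from the text: 75% of flagged items are pretest items.
flagged_pretest = round(flagged_rounded * 0.75)     # 6 items
flagged_scored = flagged_rounded - flagged_pretest  # 2 items that affect scores

# The "little box" side problem: 6 flagged pretest items out of 25 is a 24%
# DIF rate among items the exam committee provisionally approved.
pretest_dif_rate = flagged_pretest / PRETEST_ITEMS
```

Even under the generous split, roughly two DIF-flagged items per form sit in the scored pool that decides pass/fail.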

THAT MEANS 1 or 2 BIASED, SCORED ITEMS, ON AVERAGE, DETERMINE IF SOMEONE PASSES OR FAILS ASWB EXAMS!

HOW MANY TEST-TAKERS FAIL BY ONE OR TWO POINTS?! THOUSANDS!

Social work boards in every state and DC rely on ASWB’s assertion that its cut score is well-tuned, that licensing decisions can turn on one exam item alone. Yet, according to ASWB’s public statements, it is likely that at least one or two items of EVERY EXAM SCORE SOCIAL WORK BOARDS USE TO MAKE LICENSING DECISIONS will later be removed, in secret, by ASWB due to differential item functioning.

ASWB has no publicly written procedure to inform the test-taker who was rejected, the board who rejected them, or the community that will now lose a qualified social worker, most likely an aspiring clinician from oppressed and historically underrepresented groups in professional social work.

ASWB’s cut-scores are guesses (with massive consequences)

The previous section described how, based on ASWB’s publicly reported data, boards can conclude that ASWB’s cut scores are poorly tuned. Moreover, they should make ethical judgments about an examination provider that removes items from circulation that impact test-takers’ lives in deeply profound ways without informing anyone: state boards, test-takers, or other stakeholders.

Even if ASWB did not hide data from these parties, its cut scores are–put charitably–best guesses about the actual functioning of its examinations. Let’s take a look at how the test information curve, a measure of differential test functioning that ASWB refuses to perform, could be used to determine how well cut scores differentiate between test-takers of low, medium, or high ability.

Shout out to Dr. Ainsworth for putting your lectures on YouTube!!!

If you are having trouble with the abstract concepts in the video, please watch for the examples related to math ability standardized tests as well as depression scales that follow right after.

Essentially, the “hump” of the test information curve tells us where a good cut score might be. A middling social worker–someone right on the line of competent vs. incompetent–that is what the test should be best at measuring.

ASWB does not measure the test information curve or ever investigate the psychometric functioning of its exam as a whole. Put simply, ASWB evaluates, item by item, how well each question distinguishes between social workers of different abilities. It does not look for bias at the test level. On the slide shown previously, it says they are “looking into” the use of differential test functioning approaches. They do not use them.

So, even if exam committee members were producing near-perfect exams with <1% of questions flagged for DIF, their cut scores–the things that determine whether social workers get licensed or not–are not determined by the actual psychometric properties of the exam. Instead, they are determined by the best guesses of subject matter experts. Actual real-world data from license-seeking test-takers is never used to establish or evaluate the exam’s cut-scores.

ASWB lies about DIF & DTF

Recently, the Association of Social Work Boards posted an article of misinformation about differential item functioning (whether an item on a test is biased) as compared to differential test functioning (whether the entire test is biased). I’m going to go over what is inaccurate about their blog post and why those misrepresentations matter.

First, the blog post is authored by a marketer, not a methodologist. As is common practice for ASWB, marketers and managers are using psychometric terminology to mislead social workers and regulators. A methodologist might have cited recent sources to talk about currently accepted psychometric practices. Of course, that would require understanding that it is unethical for Bobbie Hartman, the Marketing and Content Strategy Manager, to be speaking authoritatively about psychometrics.

This is a consistent theme for ASWB–their officers testify and make public statements about psychometrics but do not actually perform or receive enough training in psychometrics to make competent statements. Because ASWB outsources psychometrics to contractors, social workers asking about psychometrics end up with unsatisfying answers like “we’ll check with our psychometricians” and “our psychometricians assure us that DIF is a robust process” that are meaningless without data and procedure to objectively evaluate.

DTF is not about item elimination

The first section of the blog post badly misrepresents the purpose of DTF analysis, again due to the author’s lack of competence to write about the topic and their organizational self-interest in misrepresenting psychometrics. The purpose of DTF is to identify multivariate properties of the entire examination, whereas DIF provides multivariate properties of individual items. Bobbie could have read the standards she cites to find this definition:

Differential test functioning (DTF) refers to differences in the functioning of tests (or sets of items) for different specially defined groups. When DTF occurs, individuals from different groups who have the same standing on the characteristic assessed by the test do not have the same expected test score.

Standards for Educational and Psychological Testing (page 51)

One could perform the nearly 30-year-old approach cited by the author (Raju, 1995) to engage in item elimination. Indeed, Chalmers et al. (2016) revised and updated the DTF approach used by Raju. Let’s see what they say about DIF vs. DTF:

It is also possible, however, to obtain nontrivial DTF in applications where little to no DIF effects have been detected. Meaningful DTF can occur in testing situations where DIF analyses suggest that no individual item appears to demonstrate a large amount of DIF. Specifically, substantial DTF can occur when the freely estimated parameters systematically favor one group over another. The aggregate of these small and individually insignificant item differences can become quite substantial at the test level, and in turn bias the overall test in favor of one population over another. Therefore, studying DTF in isolation and in conjunction with DIF analyses can be a meaningful and informative endeavor for test evaluators.

Chalmers, Counsell, & Flora, 2016 (page 118)

Perhaps if ASWB had bothered to update themselves on Item Response Theory scholarship from this century, they would know that DTF is important because it evaluates a separate question than item-level functioning…and one that ASWB (because of their incompetence and self-interest) refuses to recognize…that differential functioning (e.g., bias) can happen at the item level, content area level, or at the whole-exam level because, as ASWB says “different types of biases should be evaluated independently of one another because they are not necessarily related.” Right on, ASWB. Now do the work!

Looking for differential functioning only at the item-level assumes that each item is independent of the last one. It does not investigate patterns on the content area or subset level. Differing test-taker perspectives on child welfare, social work theory, policing, supervision, and other hot topics on the exam may have different meaning across groups. Looking only item-by-item would miss patterns that emerge among relationships between questions. Using only DIF data would lead ASWB to mistakenly conclude they have an unbiased examination when in reality, there may be substantial DTF for clinicians who are older, not white, and English language learners.
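Chalmers et al.’s point, that individually negligible item differences can aggregate into test-level bias, is easy to demonstrate numerically. Everything below is simulated (150 hypothetical items, each shifted slightly harder for a focal group); it is not ASWB data:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected raw score at theta: the sum of item probabilities."""
    return sum(p_2pl(theta, a, b) for a, b in items)

N_ITEMS = 150
SHIFT = 0.1  # each item is slightly harder for the focal group; a per-item
             # difference this small would rarely be flagged by DIF analysis

ref_items = [(1.0, 0.0)] * N_ITEMS
focal_items = [(1.0, SHIFT)] * N_ITEMS

theta = 0.0  # a borderline test-taker
per_item_gap = p_2pl(theta, 1.0, 0.0) - p_2pl(theta, 1.0, SHIFT)
total_gap = expected_score(theta, ref_items) - expected_score(theta, focal_items)
```

Each item differs by under three hundredths of a point, yet the expected total score differs by several raw points, on an exam where the post argues people fail by one or two. That is DTF with no flaggable DIF.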

ASWB prevents researchers from investigating DTF

To be fair, there are citations from the 2000s in the blog post. Indeed…the cited studies develop and test procedures for investigating DTF using Item Response Theory and Confirmatory Factor Analysis. These are approaches to exam bias that ASWB refuses to perform! Yet, they cite the studies that established the standards for effect sizes that determine whether a test is biased.

Cruelly and hilariously, they do so because “DIF does not typically favor one examinee group consistently.” Does an ASWB examination fall into the typical case, or does it fail the test? Of course, it is impossible to know. ASWB cites, but does not perform, the DTF tests from Nye (CFA methods) and Stark (IRT methods). The methods section is there for a reason, ASWB! I’m pretty sure ASWB has the money to pay someone competent to perform the analysis.

Although they recently released a Request for Proposals to investigate ASWB’s exam data, ASWB suggests researchers investigating exam bias “address correlating external [emphasis added] variables that may influence the disparities in the licensing exam pass rate data. Such variables could include upstream [emphasis added] factors such as differences in education programs; considerations of intersectionality, including age, gender, race, health, socioeconomic status; and social determinants of health, including life experiences from early childhood to post-graduate.”

Missing from these exculpatory hypotheses is the actual psychometric functioning of the examination. Indeed, the areas of focus only welcome studies that investigate “pipeline” and “upstream” factors, not problems with the psychometric properties of the examination. Weird, because ASWB’s examination guidebook already states that external factors are the reason for different test scores across groups, not a broken examination.

ASWB works to ensure the fairness of each of its exam questions but acknowledges that there may be differences in exam performance outcomes for members of different demographic groups because exam performance is influenced by many factors external [emphasis added] to the exams. ASWB has committed to contributing to the conversation around diversity, equity, and inclusion by investing in a robust analysis of examination pass rate data.

ASWB Examination Guidebook (01/2023) pg. 12

It certainly sounds like ASWB would like to fund studies that use its data to investigate the statement it already uses to explain exam bias data to test-takers. Would researchers investigating DTF be able to kludge their proposal in under bullet #1 below? I dunno. I guess they could try…

  • Variables associated with the results reported in the 2022 ASWB Examination Pass Rate Analysis
  • The impact of licensure on the social work profession
  • Supervision’s role in social work licensure
  • Professional practice standards
  • Electronic practice
  • Regulatory enforcement

Without a measurement equivalence study that investigates the multivariate properties of the exam, the statements ASWB makes about the quality of the examination will continue to rest on the best guesses of the exam developer, rather than the test’s actual performance in the real world. I’m not optimistic that this analysis will be performed. The RFP is administered by…ASWB. And were any psychometrics researchers to get the data, ASWB retains final say over any publication created using their data (ASWB, “Methods of Operation” 7.14 Research Support pt. #7).

DTF analysis is required (no, really, read page 52)

ASWB publicly lies about the definition of bias. They maintain that the Testing Standards that everyone uses define bias as Differential Item Functioning. Here are a few examples of ASWB lying about that:

“Protocols for standardized testing require that bias be accounted for throughout the exam development process at the individual test question level. It is not the final pass rate data that is used to identify bias in exams.”

email from Jacqueline Braxton, MSW LCSW Licensed Examination Development Project Coordinator, to a test-taker.

“ASWB uses a testing industry statistical measurement called Differential Item Functioning (DIF). DIF indicates whether an exam question shows tendencies to advantage or disadvantage one group of test-takers over another (ASWB, 2020). DIF is identified by statistically analyzing responses to the exam questions—called items—during pretesting. Scored items are continually monitored for DIF. On an annual basis, less than 5% of all items released show DIF. Items flagged for DIF are removed from the bank of potential exam questions.”

Stacey Owens in the New Social Worker.

And here is how bias is actually defined in the testing standards. (Note how ASWB didn’t actually quote the standards in their blog post!)

The term predictive bias may be used when evidence is found that differences exist in the patterns of associations between test scores and other variables for different groups, bringing with it concerns about bias in the inferences drawn from the use of test scores.

Standards for Educational and Psychological Testing (page 51-52)

Clearly, the Standards define bias as differences in test scores, not individual exam items. Now that we understand that ASWB lies constantly about the conceptual definition of bias, we can understand why their bias detection methodology is similarly harebrained. When ASWB states in their blog post that DTF analysis is not required, they are not telling the whole truth. Let’s read together!

First, here is how the standards define “credible evidence [indicating] potential bias in measurement.” It is one of three factors: (a) “inconsistent item meaning across groups,” (b) differential item functioning, and (c) differential test functioning. Next, the standards state that when credible evidence of measurement bias or predictive bias exists, these three sources (a–c) must be investigated independently. So far, there is no mandate for DTF analysis.

ASWB restates this…but leaves off the next sentence…let’s see why!

The presence or absence of one form of such bias may have no relationship with other forms of bias. For example, a predictor test may show no significant levels of DIF, yet show group differences in regression lines in predicting a criterion.

Standards for Educational and Psychological Testing (page 52)

“Regression lines predicting a criterion” refers to differential test functioning predicting the criterion: social work competence. The standards state that differential test functioning can exist without differential item functioning, and that is why they need to be investigated separately.
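The Standards’ example, no DIF yet different group regression lines predicting a criterion, can be sketched with toy data. All numbers here are hypothetical: two groups whose test scores predict a competence criterion with the same slope but different intercepts, meaning the test systematically under-predicts one group’s real-world performance:

```python
def ols(xs, ys):
    """Least-squares slope and intercept for y ~ x (plain OLS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical data: identical test scores in both groups, but the same
# score corresponds to a higher criterion rating (on-the-job competence)
# in group B -- i.e., the test under-predicts group B's performance.
scores = [90.0, 100.0, 110.0, 120.0, 130.0]
criterion_a = [0.5 * s + 10.0 for s in scores]
criterion_b = [0.5 * s + 15.0 for s in scores]

slope_a, intercept_a = ols(scores, criterion_a)
slope_b, intercept_b = ols(scores, criterion_b)
intercept_gap = intercept_b - intercept_a  # nonzero gap = predictive bias
```

No item-level comparison would catch this: the gap only appears when test scores are regressed against an external criterion, separately by group, which is exactly the analysis the Standards call for.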

ASWB does not agree! According to the marketer writing their blog post, the literature can be summarized thusly:

Although it is theoretically possible that DIF analyses may fail to identify some problematic items and small amounts of bias may accumulate to produce DTF, it is very unlikely that practically important DTF will result, because there is often high power to detect small magnitudes of DIF,

ASWB, Bobbie Hartman

As we read, this omits a large part of the truth. Contra ASWB, the standards foresee this exact circumstance: an entire test demonstrates bias while showing little bias at the item level. That context seems necessary for understanding why independently conducting DIF and DTF analyses is recommended by the plain language of the standards.

Wait, I’m now remembering the first part of the blog post… ASWB said DIF and DTF are independent and need to be analyzed independently. Yet the conclusion clearly states that because ASWB’s DIF approach is so good and DIF is rarely systematically biased, no DTF analysis needs to be performed. Did the first part of the blog post ever meet the second?

Contrary to the incompetent writing at ASWB, the standards actually spell out what ASWB needs to do, now that it has finally bowed to decades of advocate pressure (while patting itself on the back as groundbreaking) and uncovered important evidence of problems with the entire exam.

Especially where credible evidence of potential bias exists, small sample methodologies should be considered. For example, potential bias for relevant subgroups may be examined through small-scale tryouts that use cognitive labs and/or interviews or focus groups to solicit evidence on the validity of interpretations made from the test scores

Standards for Educational and Psychological Testing (page 52)

ASWB is not performing any small-scale tryouts, using cognitive labs, or interviewing test-takers. While they are holding focus groups, those groups do not address the test-taking experience. Participants publicly report that they were directed not to talk about the exam bias report or issues of racism or ethnocentrism in the examination by the group facilitators. Instead, the focus groups address the broader social work journey through licensure. The focus groups are also facilitated by psychometric contractors employed by ASWB to consult on the examinations–a fact hidden from focus group participants until #StopASWB advocates complained.

DTF evaluates whether the test is biased…not which items are biased. ASWB does not want to perform a DTF analysis because it would test the hypothesis of whether the exam actually does what it says it does–assesses (fairly and impartially) the entry-level competence of social work practitioners. ASWB has the data necessary to perform this analysis–it is all contained in their descriptive report on examination bias. However, they will never commit to testing any hypotheses about the examination’s internal properties.

Right now, ASWB could also perform differential functioning analysis on the four content areas. Here is a screenshot of a failed examination report. Which content area displays the highest degree of differential functioning? We’ll never know.

Here is a study of Differential Test Functioning of the Praxis examination used to license teachers. It compares scores across components of the exam using data provided by the exam developers over five years. Yet, ASWB’s blog post makes it seem like no exam developer ever provides this information to the research community. Curious….

ASWB is unlikely to perform this analysis because it tests hypotheses that threaten its bottom line–that their exam tests a single construct, entry-level competence, fairly and accurately. Looking beyond item-by-item analysis opens the possibility of entire content areas in the examination needing to be removed entirely due to invalidity and bias (rather than individual exam items). Instead, ASWB seems content to throw up its hands and merely report the damage publicly while making vague gestures towards the need for greater social equity in society as a curative to its exam’s shoddy psychometrics.

ASWB: Chokehold capitalists

To perform a DTF analysis and test hypotheses on examination bias data using multivariate methods would create the real possibility that the examination is shown to be hopelessly biased and removed from use. There is no alternative to the ASWB exam, so removing the examination would grind licensure to a halt in every state.

ASWB calls looking only at the item-level for bias a “conservative approach”…which is apt, though not for the reasons they imply. It conserves the oppressive status quo by not exposing the multivariate properties of the entire examination (or its four separately-scored content areas) to differential functioning analysis.

ASWB has a financial stake in the status quo. Here are some facts:

Prepping a Zero Textbook Cost Course

Hi, everyone! Produced this for my fellow faculty, and I thought others might benefit from it. -Matt

Why prioritize textbook costs?

Textbooks are a social equity issue. Every student should have first-day access to all of the materials needed to succeed in the course. When my colleagues and I at Virginia schools of social work asked, “How do social work students in Virginia deal with textbook costs?” we found:

63.9% take on more work
51.0% take out more loans
50.3% delay purchasing required books
37.7% engage in piracy
25.8% do not visit family over breaks
22.6% skip a meal

Here is a representative quote from our qualitative responses: 
“In this past week, I couldn’t fill my car up with gas or buy groceries because I spent $500+ in textbooks, and I still have one more I can’t buy yet because I don’t get [paid] till Friday…The social work books are wonderful and VERY useful but I can’t afford food right now because of them.” (DeCarlo & Vandergrift, 2019). 

This is a pretty common finding across all disciplines in higher education, just ask…Cengage.

Zero textbook cost courses

There are many different ways to eliminate out-of-pocket costs for the required materials for your courses. One designer might favor a different book chapter or journal article per week, combining open and commercially published materials, depending on what fits best for their learning objectives. Another designer might use a combination of library licensed materials like documentaries, graphic novels, journal articles…plus a few chapters from different textbooks. There is no right or wrong way to do it, and you are the expert on how best to design your course. If the library does not have a license or copy of something you need, please notify your chair. The library is not able to get a license for recent textbooks, though, since publishers do not sell them to libraries–only to students at a ridiculous markup.

Open educational resources

OER are freely and publicly available teaching, learning, and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use and re-purposing by others. For example, instructors may download the material, tailor it to their course, save a copy locally to share with their students, and share it with public stakeholders. 

OER can include textbooks, course materials and full courses, modules, streaming videos, tests, software, and any other tools, materials, or techniques used to support access to knowledge. For example, redesigning a statistics course to use Jamovi or JASP instead of SPSS or Stata is adopting OER, since JASP and Jamovi use open copyright licenses and are free to use and remix while students will never afford access to SPSS or Stata after graduation. 

It is important to note that OER are not better than other types of resources. When compared with traditional resources, their impact on student learning is the same, but OER achieve those outcomes at no out-of-pocket cost to students. See the two most recent meta-analyses of empirical studies of OER: Clinton & Khan, 2019; Hilton, 2020.

Only adopt materials you think will work for your course. It’s your eye for the quality of any educational resource that is most important, not the copyright license. You do not have to adopt an open textbook or OER in your course, even if one is available. For example, I know of three human development open textbooks I would never adopt in SWK 510…and one good one I plan to adopt. I’m happy to help you look!

When people talk about OER, they are mostly talking about open textbooks. Open textbooks are in use on the majority of campuses in the United States, though only about 15-20% of courses. Here is where to find them: 

Open textbooks are unlikely to be one-for-one replacements for commercial textbooks at the graduate level; you may find an open textbook to be helpful as a supplemental resource, for a specific module, or for a single lesson. However, I have found a few open textbooks that have completely replaced (with some editing and customization) a commercial textbook I previously used. 

I’ve copied a graphic below on why I prefer to adopt OER in my syllabi. Mostly, OER give me greater control over what my students read the first time, the ability to localize and customize content, and the ability to tailor everything to learning goals and class activities. 

I cannot adopt OER in many courses I teach because there are no good OER for me to use. 

Library-licensed resources

While openly licensed content is not available for many graduate social work topics, there are many sources of content that are free to students available through the library. Because we are a virtual program, please prioritize materials that can be digitally accessed by students via the library’s website or a PDF uploaded by the professor to the Canvas course.

If you previously assigned a commercial textbook or book when teaching this course, check whether the library has purchased an ebook license for it. If it is not in the library, notify your chair and we can explore licensing options with the library, though it is not guaranteed that the library can purchase it. Ebook licenses are a gray area in which what counts as a textbook (not licensable) versus a book (licensable) varies based on the financial interest of the publisher. 

That said, there are many textbooks or academic books that are part of the library’s ebook collection. For example, we have the full-text PDF of Trauma Stewardship from 2009…a bit old, but also a seminal textbook I’d use at least part of. Check what we have! If you adopt a library-licensed ebook, please include the permalink in your syllabus and Canvas course.

Journal articles are expected to be included as part of course syllabi. If we do not have access to a social work journal, please notify your chair to touch base with the library. They have asked us for journals that are lacking in our collections.

A chapter or two from commercial books (fair use)

If you need a PDF of a chapter or two of a commercial book, but you cannot access it in the library, please email your chair. If there is no replacement for Chapter 6 in a famous textbook…we can assign Chapter 6 in the famous textbook without students needing to buy it. 

Free internet resources

Please use whatever materials you think are best, even if they are “nontraditional.” That includes textbooks, books, articles, blogs, videos, or any materials available on the internet. 

Readings as asynchronous learning activities

Social annotation of text (in Google Drive)

Share a link to an article or book PDF in your Google Drive with students via Canvas. Then, all students can mark up the same document in lieu of a discussion board.

You can also turn any web text document into a shareable Google Document that students can collaboratively annotate.

Social annotation of video (in VideoANT)

You can also annotate YouTube videos using VideoANT. 

I have asked students to use hashtags like #stats, link out to the open internet, reflect on personal experiences, and a variety of other prompts based on the lesson that week.

Annotations have made excellent fodder for early class discussion, resolving questions, eliciting how students define and apply key terms, etc. I still have some students who only open the document long enough to comment, so it certainly doesn’t guarantee genuine engagement.

Student Data Privacy in Field Education

Hey everyone! Starting a new thing here where I’m blogging instead of sending long emails that only one person reads. Hopefully more people find this useful.

One of the goals of Payment for Placements organizers is removing the cost of field education software. While many schools use some combination of spreadsheets, forms, poorly-formatted Microsoft Word documents and yes, paper…many schools rely on third-party software tools like Tevera. That’s the one that my school uses, and it’s the one that Payment for Placements at the University of Georgia highlighted in their campaign. So, that’s the one I’ll analyze here. I imagine that most field education tracking software companies will be similar to Tevera, though a more rigorous study is required.

The reason I looked is that the issue of student data privacy comes up a lot in open education when textbook platforms try to get faculty to mandate third-party homework/learning platforms that are not approved by university IT…and the companies who run those third-party platforms can and do monetize student data. See Billy Meinke’s article, which inspired mine.

This is an end-around of university IT departments’ controls on student data privacy, and it is one way that dominant education companies like textbook publishers and other vendors are pivoting to a digital-first business model. I wasn’t sure whether there were similar issues in these platforms, but I wanted to take a look. While field educators have well-established protocols they follow when choosing software vendors, sometimes those vendors can find ways to shortcut necessary review.

Tevera’s privacy policy seems anodyne, and it describes very clearly the limitations on data sharing. I’m curious about whether storing user data on AWS is FERPA-compliant, but I’m sure lots of ed tech companies do that. None of their privacy documents explicitly mention FERPA. Below is an excerpt from Tevera’s Terms of Service. Would you be comfortable if the software you used to create schoolwork required you to…

You hereby grant to Tevera and its affiliates, contractors, and suppliers a nonexclusive, perpetual, irrevocable, world-wide, royalty-free, assignable and sublicensable (through multiple tiers), license to reproduce, copy, use, host, store, sublicense, reproduce, create derivative works from, modify, publish, edit, translate, distribute, perform and display, including digitally or electronically, your submitted User Content and your name, voice and likeness (to the extent they are part of the User Content), (i) in connection with the Services, as specified under Third Party Licenses and/or for the interoperation of any third party products, (ii) if required by applicable law, where necessary to enforce these Terms of Use and/or to protect any of Tevera’s or other parties’ legal rights, (iii) in an aggregated form which does not include your identifying information, and (iv) as permitted by Tevera’s Privacy Notice.

Section 8 of Tevera’s Terms of Service

Billy’s post helpfully points to the United States Department of Education report Protecting Student Privacy While Using Online Educational Services: Model Terms of Service

Section 10: Rights and License in and to Data

Maintaining ownership of data to which the provider may have access allows schools/districts to retain control over the use and maintenance of FERPA-protected student information. The “GOOD!” provision will also protect against a provider selling information.

GOOD! This is a Best Practice: “Parties agree that all rights, including all intellectual property rights, shall remain the exclusive property of the [School/District], and Provider has a limited, nonexclusive license solely for the purpose of performing its obligations as outlined in the Agreement. This Agreement does not give Provider any rights, implied or otherwise, to Data, content, or intellectual property, except as expressly stated in the Agreement. This includes the right to sell or trade Data.”

WARNING! Provisions That Cannot or Should Not Be Included in TOS: “Providing Data or user content grants Provider an irrevocable right to license, distribute, transmit, or publicly display Data or user content.”

Section 10, https://studentprivacy.ed.gov/sites/default/files/resource_document/file/TOS_Guidance_Mar2016.pdf

From my reading, Tevera’s language is nearly identical to the all-caps WARNING example from the Department of Education. Moving down to the next section of Tevera’s Terms of Service, it also appears that students themselves are responsible for determining whether Tevera meets their needs.

Without limiting the foregoing, you understand the risks associated with the access to and use of the Services and any User Content and other data, content and materials made available through the Services, and acknowledge that you are using the Services and such other data, content, and materials at your own risk and that you are personally responsible for verifying their suitability for your needs through your own investigation.

Section 9 of Tevera’s Terms of Service

While this clause by Tevera is entirely understandable from a legal perspective, universities know school employees are actually selecting software that students must purchase. This is one reason students are not permitted to choose their field software. The other is that software companies do not design their platforms to be interoperable because it is incompatible with their business model for students to move their field education data to a better provider if one comes along.

Students are not free to decline Tevera’s terms of service and continue in their social work program. Yet, they must agree that they are entirely in charge of consenting to the terms of service and are therefore providing voluntary consent. Seems fishy…ethically…for the university to manufacture that truth.

Essentially, students are forced to purchase software that requires them to give the software company unlimited rights to use, analyze, republish, and basically have carte blanche with the educational records students enter into the platform–in perpetuity. And students are forced to say this deal was entirely their choosing, and that they entered into the agreement voluntarily.

Cool. Cool.

Open textbooks: Educational equity and innovation

This panel presentation is a re-recording of our presentation live at the Council on Social Work Education’s Annual Program Meeting in November of 2021. Panelists include Matt DeCarlo, Whitney Payne, Rebecca Mauldin, and Susan Tyler.

Slideshow: https://docs.google.com/presentation/d/1fCxPkBGHQvPBMCUXB2jGv0xMgPGilK2twSQd0cDSJsk/edit?usp=sharing

Abstract

Open textbooks are free, editable, and shareable alternatives to commercial textbooks. Our four panelists will detail their experiences creating and publishing open textbooks for social work, implementing them in their courses, and assessing their impact on student learning and teaching practices.

Background/Rationale

Open textbooks are unique. They are free to access, and they enable anyone to read, edit, and share educational content across the internet. Informed by faculty passion for their subject area as well as concern for social equity, open textbooks represent unique contributions to the community of social work education.

Open textbooks are, in part, a response to the broken textbook market. The price of textbooks has skyrocketed over 800 percent over the past forty years, outpacing the inflation of tuition, housing, and healthcare (Bureau of Labor Statistics, 2016; Kight, 2018). It is unsurprising that after tuition, course materials are the greatest source of stress for students (National Association of College Stores, 2019). A study of Virginia social work students found that because of high textbook costs, over half of students delayed purchasing textbooks, sought out additional work, or took on more student loans (DeCarlo & Vandergrift, 2019). A quarter of students in the multi-site sample engaged in piracy, skipped meals, and did not visit family over academic breaks because of textbook costs. The inequitable impacts of textbook costs such as earning lower grades, dropping and withdrawing from classes, and leaving school affect historically underserved groups like non-White, Pell-eligible, and part-time students the most (Colvard et al., 2017).

Adopting and creating open textbooks is a way for social work faculty to provide access to all of the materials needed for student success on the first day of class, regardless of ability to pay. Moreover, the open licenses attached to these books are a persistent invitation for collaboration across academia. This panel presentation will discuss the open textbook projects of four authors, how they built from existing materials, and what impact the books had on their classroom and educational practice. In particular, the presentation will focus on the spirit of OER–meeting students where they are by making the textbook more affordable and engaging to read. Insights from this presentation should be useful to any faculty member seeking to make their course more accessible and authentic.

Objectives

This panel will define open educational resources (OER), describe their use in social work education, and review OER outcomes in a variety of settings. It will present the experiences of four social work faculty who created and adopted OER, with an emphasis on practical guidance for others wishing to develop or use OER in their classrooms. Panelists will discuss the importance of institutional support for OER, challenges incorporating open textbooks into tenure and promotion guidelines and ensuring they would teach the course again in the future, and lessons learned. We will also discuss student experiences and evaluations of the OER, which include academic and personal benefits. For example, students found the open textbooks to be “more human” and “like a real person wrote them” (DeCarlo et al., 2019a) and reported using them more than commercial textbooks in other courses (DeCarlo et al., 2020). Students also reported that open textbooks alleviated the stress and workload associated with mitigating textbook costs and appreciated faculty eliminating this barrier to academic success.

Implications

Open textbooks are an immense time commitment, but represent a revolutionary new approach to course preparation. Not only do open textbook authors produce free materials customized to their course objectives, they publish and invite other scholars to use and build on their scholarship as a community of practice. These contributions would benefit from greater assistance and credibility from national and state social work organizations.

Video recording

Finding open opportunities in a closed curriculum: Strategies for junior and contingent faculty

Finding open opportunities in a closed curriculum: Strategies for junior and contingent faculty was a presentation delivered at the OpenEd21 conference by Matt DeCarlo on 10/21/21.

Slideshow: https://docs.google.com/presentation/d/1Vl58Qx5qKhjmBBeFVKVjHPY8ieb0XiotbNJx_5OYZl0/edit?usp=sharing

Abstract

Because universities predominantly employ contingent faculty—adjunct, part-time, or student teachers—it is important for the open education community to explore how to experiment with open pedagogy when the instructor’s power to change the syllabus, readings, and assignments is limited or nonexistent. By norm and policy, the power to determine instructional content largely rests with tenured faculty members who require contingent and junior faculty members to teach to a predetermined syllabus. This open space is dedicated to sharing the insights of teachers who find ways to integrate open pedagogy in disempowering environments. 

I would prefer to host this open space on a forum that allows for multiple methods of storytelling (video, audio, text) as well as commenting. I prefer to use Google Slides, but I would be happy to redesign it for Discord, Padlet, Flipgrid, or other platforms based on suggestions from reviewers or conference organizers. 

I will submit two examples to the platform. The first example uses Wiley and Hilton’s OER-enabled pedagogy framework to redesign a quiz into a collaborative book of case studies. Quizzes are one example of assignments for which contingent instructors likely have some flexibility—deciding topics, format, and platform—and this example will review how to find open opportunities in a closed syllabus. 

For assignments on a syllabus with detailed prompts and rubrics that standardize implementation, faculty may consider completing assignments themselves in collaboration with students. Faculty projects are unconstrained by the syllabus and can be redesigned to use open pedagogy. The second open pedagogy example applies Bali, Cronin, and Jhangiani’s open educational practices framework to a faculty-student advocacy project fighting unpaid student labor in human service agencies. 

The closed curriculum is a persistent message that the ideas of junior and contingent faculty (as well as students!) are not valuable and not welcome. By celebrating stories of clandestine experimentation, this space will help teachers recognize hidden opportunities and freedoms within prefabricated courses to introduce open pedagogy. Of course, open pedagogy is not a panacea. It is simply one way for faculty to engage students within an often moribund and exclusionary curriculum through experimentation, self-determination, and collaboration. 

Video recording