USC Norris Comprehensive Cancer Center

An NCI-designated Comprehensive Cancer Center

Data Science Core

The Data Science Core (DSC) at the USC Norris Cancer Center is a unified research support facility integrating biostatistics and informatics to advance high-impact cancer research. We collaborate with investigators across the full research lifecycle — from study conception and design through data acquisition, integration, analysis, and dissemination. 

By combining rigorous statistical methodology with scalable data engineering and informatics infrastructure, DSC ensures that research is methodologically sound, analytically robust, and reproducible at scale. Our team works as a long-term partner to investigators, supporting projects ranging from early-stage experimental studies to large, multi-site clinical and population-based research. 

Integrated Capabilities 

  • End-to-End Research Support: From grant development and study design to data integration, analysis, and publication
  • Unified Biostatistics & Informatics Expertise: Seamless collaboration between statistical scientists and data engineers
  • Reproducible & Scalable Workflows: Modern, version-controlled pipelines supporting transparency and efficiency
  • Multi-Modal Data Integration: Harmonization of clinical, survey, laboratory, and omics data sources
  • Regulatory & Compliance Alignment: Secure, governed environments supporting data integrity and sharing requirements

Biostatistics

 DSC Biostatistics Support 

 DSC provides comprehensive biostatistical support for cancer‑related research, spanning all stages of the scientific process. We partner with investigators on study design for clinical trials, observational studies, and laboratory experiments; offer rigorous data management and advanced analytical methods tailored to modern biomedical data; and actively support grant preparation through study design consultation, power and sample size justification, and development of statistical sections. We also assist with manuscript and abstract preparation, including clear reporting of methods and results and the creation of publication‑ready tables and figures. Our goal is to serve as a collaborative, long‑term partner to Cancer Center investigators, enhancing the quality, impact, and translational relevance of their research. 

    • Developing comprehensive clinical trial protocols, including detailed plans for pilot studies and Phase I, II, and III trials.
    • Creating a Statistical Analysis Plan (SAP) that specifies design schema, primary and secondary endpoints, statistical methods, and sample size and power calculations. 
    • Defining safety monitoring procedures, including adverse event monitoring, stopping rules, and interim analysis guidelines. 
    • Advising on eligibility criteria, randomization schemes, stratification factors, and endpoint definitions to ensure methodological rigor. 
    • Implementing and overseeing data quality assurance procedures to ensure accurate, consistent, and reliable data collection. 
    • Specifying the format, structure, and key content of interim reports and the final clinical study report to support publication and regulatory submission. 
    •  Formulating clear research questions and testable hypotheses for laboratory studies. 
    • Designing experiments and determining appropriate sample sizes for cell line and/or animal models with defined endpoints (e.g., tumor size, cell viability, gene expression). 
    • Recommending suitable statistical methods for analyzing experimental data. 
    • Developing and implementing quality control procedures to ensure measurements are collected consistently and according to standardized protocols. 
    • Performing data analysis, interpreting results, and summarizing key conclusions and biological implications. 
    • Formulating clear research questions and testable hypotheses for observational studies. 
    • Selecting an appropriate study design (e.g., cross-sectional, case-control, cohort, longitudinal, ecological). 
    • Providing guidance on sampling methods (e.g., random, stratified, cluster) to ensure representativeness of the target population. 
    • Calculating required sample size and statistical power and developing an analysis plan that addresses potential confounding using appropriate statistical models. 
    • Conducting data analysis, interpreting results, and summarizing key conclusions and implications for clinical or public health practice. 
    • Providing expert support to Cancer Center investigators during grant preparation to ensure proposed research is well designed and statistically rigorous. 
    • Advising on study design and hypothesis formulation tailored to the scientific aims of the proposal. 
    • Performing sample size and power calculations to justify feasibility and ensure adequate statistical precision. 
    • Developing detailed statistical analysis plans aligned with the study objectives and data structure. 
    • Drafting or refining the statistical considerations sections to meet funding agency expectations and review criteria. 
    • Collaborating with investigators to enhance the overall methodological quality and impact of the proposed research. 
    • Providing comprehensive statistical analysis support for data from a wide range of study designs (clinical trials, observational studies, laboratory experiments, and registries). 
    • Applying both conventional statistical methods (e.g., regression modeling, survival analysis, longitudinal and multilevel models) and advanced techniques tailored to complex data structures. 
    • Leveraging modern approaches such as machine learning and predictive modeling to identify patterns, build risk scores, and improve outcome prediction. 
    • Supporting omics and high‑dimensional data analysis (e.g., genomics, transcriptomics, proteomics), including feature selection, dimension reduction, and integrative multi‑omics methods. 
    • Developing customized analysis pipelines and reproducible workflows, including code, documentation, and version‑controlled scripts for transparency and reusability. 
    • Interpreting and translating analytical results into clear scientific conclusions, preparing tables and figures, and assisting with manuscript, abstract, and presentation development. 
    • Assisting with the drafting and refinement of Methods and Results sections to ensure statistical accuracy and clarity. 
    • Preparing high‑quality tables, figures, and supplementary materials that clearly present key findings. 
    • Reviewing full manuscripts and abstracts for coherence, logical flow, and consistency between aims, methods, results, and conclusions. 
    • Ensuring appropriate reporting of statistical methods and results in line with journal and reporting‑guideline requirements (e.g., CONSORT, STROBE). 
    • Providing detailed editorial feedback on wording, organization, and interpretation to strengthen scientific messaging and improve the likelihood of acceptance. 
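
As an illustration of the sample size and power calculations mentioned in the list above, the following is a minimal sketch and not the DSC's actual procedure. It assumes the simplest case: a two-arm comparison of means with equal allocation, a two-sided test, and the normal approximation. The function name `n_per_group` and the example inputs are hypothetical; real studies usually need design-specific methods (for example, for survival endpoints, clustering, or interim analyses).

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    effect_size is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for target power

    # n per group = 2 * (z_alpha + z_beta)^2 / d^2, rounded up
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Illustrative input only: a medium standardized effect, 5% alpha, 80% power
print(n_per_group(0.5))  # 63
```

The normal approximation slightly understates the sample size that a t-test-based calculation would give, which is one reason a statistician should confirm the final numbers for a protocol or grant.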

Informatics

DSC Informatics Services

 The Data Science Core provides end-to-end research data engineering and informatics support — from study design and governance through secure integration and delivery of analysis-ready datasets. Our services span the full research data lifecycle to ensure rigor, reproducibility, compliance, and scalability. 

    •  Plan and implement scalable research data infrastructure with robust data governance and lifecycle management to ensure data integrity, security, and regulatory compliance. 
    • Establish data standards and governance practices, including variable naming conventions, controlled vocabularies, and metadata documentation to ensure consistency across systems and study waves. 
    • Provide proactive review of survey instruments and database designs to prevent inconsistencies, preserve longitudinal comparability, and protect analytic validity. 
    • Provide secure computing environments, including relational databases and cloud-based integration and processing environments with appropriate access controls and data protection safeguards. 
    • Design and deploy secure, automated data pipelines to streamline ingestion, transformation, and delivery of research data in R and related data engineering frameworks, with versioning and reproducibility built in. 
    • Generate identified and de-identified analytic datasets for regulatory reporting, data sharing, and secondary analysis in compliance with regulatory and data sharing requirements. 
    • Harmonize and reconcile multi-source datasets across REDCap, OpenSpecimen, Medidata Rave EDC, and other research systems. 
    • Integrate specialized third-party data sources such as NutritionQuest dietary assessment instruments into unified research data workflows. 
    • Translate heterogeneous source data into interoperable, analysis-ready formats using structured transformation pipelines in R. 
    • Resolve instrument and database design issues proactively to protect analytic accuracy and longitudinal consistency. 
    • Harmonize variables comprehensively, standardizing naming conventions, coding schemes, and response categories across studies and survey waves.
    • Perform rigorous data validation in R, including identification of out-of-range values, errant entries, logical inconsistencies, and patterns of missingness.
    • Transform and reshape data (wide/long, pivots), aggregate and regroup variables, and construct analytic datasets using reproducible R workflows.
    • Support data capture, wrangling, and structured data management workflows. 
    • Develop custom research software and web applications including recruitment tracking systems, biospecimen inventory management tools, and cohort identification solutions. 
    • Implement automated scheduling and participant management integrations (e.g., Calendly) to streamline study operations. 
    • Provide custom programming for data unification, website development, and scalable cluster computing solutions. 
    • Develop and implement reporting solutions within clinical trials management systems to support study monitoring and operations.
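
To make the validation and reshaping steps in the list above concrete, here is a minimal sketch under stated assumptions. The DSC's own workflows are described as R-based; Python is used here purely for illustration. The function names, field names, plausible ranges, and records are all hypothetical, made-up examples, and the range check assumes numeric values.

```python
def validate_records(records, ranges):
    """Flag out-of-range values and count missing values per field.

    records: list of dicts, one per participant (hypothetical layout)
    ranges:  {field: (low, high)} plausible numeric bounds (assumed)
    Returns (out_of_range, missing_counts).
    """
    out_of_range = []
    missing_counts = {field: 0 for field in ranges}
    for row_index, record in enumerate(records):
        for field, (low, high) in ranges.items():
            value = record.get(field)
            if value is None or value == "":
                missing_counts[field] += 1
            elif not (low <= value <= high):
                out_of_range.append((row_index, field, value))
    return out_of_range, missing_counts


def wide_to_long(records, id_field, value_fields):
    """Reshape one-row-per-participant data to one row per measurement."""
    return [
        {id_field: record[id_field], "variable": field, "value": record.get(field)}
        for record in records
        for field in value_fields
    ]


# Made-up records for illustration only
records = [
    {"participant_id": "P001", "age": 54, "bmi": 27.3},
    {"participant_id": "P002", "age": 430, "bmi": None},
    {"participant_id": "P003", "age": 61, "bmi": 24.8},
]
ranges = {"age": (18, 110), "bmi": (10.0, 80.0)}

flags, missing = validate_records(records, ranges)
print(flags)    # [(1, 'age', 430)]
print(missing)  # {'age': 0, 'bmi': 1}
print(len(wide_to_long(records, "participant_id", ["age", "bmi"])))  # 6
```

Returning the flagged entries rather than correcting them in place keeps the source data untouched, so each query can be resolved with the study team and documented.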

DSC Policies and Procedures

Scope: The mission of the Data Science Core (DSC) is to provide consultation and support for research project design, informatics, data wrangling, and data analysis for cancer-focused peer-reviewed grant applications, cancer-focused clinical trials, and cancer-related research projects. See the full list of services above. Individuals requesting services for non-cancer-related research should contact the SC CTSI at https://sc-ctsi.org/bbr-consult.

The DSC website serves as the entry point for DSC activities, and all Norris Comprehensive Cancer Center (NCCC) members are required to use the system to submit service requests.

        1. Once a service request is registered in the system, the DSC Director receives a “project request notification” and schedules a meeting with the PI to briefly discuss the project.
        2. An initial meeting of up to one hour will be scheduled to determine the feasibility and scope of the project. The anticipated outputs of the initial consultation period are a feasibility assessment, a project scope, and a project plan that includes a cost estimate and timeline. Once the project plan is developed and agreed upon, all further activities will generate Core charges.
        3. After the initial meeting, the DSC Director will triage the project to DSC members with matching expertise.
        4. Project turnaround timeframes are set at the start of a project and are closely adhered to. The DSC Director will communicate directly and regularly with the DSC user regarding project progress and expected completion. If additional tasks are needed, they will be specified and documented, and new estimated hours and charges will be agreed upon.
        5. Investigators seeking help should contact the DSC with the following lead times:
          1. NIH P series, U series, and SPORE grants (and similar sized foundation grants): 4 months prior to deadline
          2. R series and K series grants (and similar sized foundation grants): 2 months prior to deadline
          3. Data analysis for manuscript or abstract submission: 1 month prior to deadline
    • Letters of Support
      1. DSC will provide a copy of their standard letter of support detailing the biostatistical and informatics services available at USC Norris at no charge to the Core User
      2. If a more detailed, specific letter is needed, the work will be done as part of the Grant Feasibility/Development Phase
    • Grant Feasibility/Development Phase
      1. DSC will provide the Core User with an estimate of the time needed for each grant application, following the timelines outlined in item 5 above.
      2. USC investigators who are not Norris Comprehensive Cancer Center full members will be billed through the USC FBS System at an hourly rate of $150 for grant application support.
      3. For Norris Comprehensive Cancer Center (NCCC) full members only, DSC will provide up to 1 hour of support at no charge to each NCCC full member Core User per grant application/grant development.
        1. The NCCC full member Core User agrees to include effort for the DSC Core Director and/or a DSC master's-level biostatistician in the grant budget as stated in the MOU, or will need to pay an hourly rate of $125 for any support time beyond the initial 1 hour per grant application. This will be billed through the USC FBS System.
      4.  All Core Users will sign an MOU indicating their agreement to include funding for the DSC in the grant budget if the grant is funded at any time in the future.
    • All DSC work on manuscripts and abstracts will be billed at the appropriate hourly rate; no free DSC service is provided for manuscript or abstract writing and preparation. DSC members who provide statistical support are expected to be included as co-authors on publications. Informatics professionals who make significant contributions should also be considered for inclusion as co-authors. Acknowledgment of the Norris Comprehensive Cancer Center CCSG grant (NIH-NCI grant # P30CA014089) in scientific manuscripts is required when Data Science Core services have been provided.

DSC Leadership

Ming Li, PhD

Director

mli69131@usc.edu

David Birtwell

Co-Director

David.Birtwell@med.usc.edu